Beware the Risks of ContentProvider
Background
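Many Android apps today run in more than one process. Take a music app with a main process (Main Process) and a playback process (Play Process) that exchange large amounts of data through a ContentProvider: the Main Process hosts the ContentProvider, and the Play Process accesses it.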
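Processes running on an operating system normally have separate address spaces and do not interfere with one another. When one process exits or crashes, the others are unaffected, and the operating system guarantees this isolation. On Android, however, things are not quite that intuitive.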
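Suppose the Main Process is killed or crashes, whether because of the Low Memory Killer, a user action, or some other cause, while the Play Process is accessing the ContentProvider. What happens to the Play Process? This article starts from a single log line and works out how ContentProvider is tied to processes being killed.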
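A Case of Playback Stopping
Users have long reported that playback in this app pauses or stops on its own. Not long ago, while tracking a playback-stop problem on one particular phone model, I happened to catch the "crime scene" in the log: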
06-22 16:03:04.148 998 2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)
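The case is simple: a song was playing normally and then suddenly stopped. The log line left at the scene is our clue. It is terse but carries a fair amount of information. Read literally, it says: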
- The system is about to kill the process net.poemcode.music:playservice
- net.poemcode.music:playservice depends on the provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl
- net.poemcode.music/.sharedfileaccessor.ContentProviderImpl lives inside the process net.poemcode.music
- The process net.poemcode.music is dying
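That is all the log line says. Taken together, it comes down to three points:
- Process A is dying
- Process B is using a ContentProvider hosted in process A
- Process B gets killed as well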
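When many such lines need to be checked, the reading above can be automated. The following is a minimal sketch, assuming the message always has the shape of the one line quoted earlier; `KillLogParser` and its regular expression are hypothetical helpers written for this article, not part of Android.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits an ActivityManager "Killing ..." line into its fields. */
final class KillLogParser {
    // Assumed message shape, taken from the single log line quoted in this article:
    // Killing <pid>:<process>/<uid> (adj <n>): depends on provider <provider>
    //     in dying proc <process> (adj <n>)
    private static final Pattern KILL_LINE = Pattern.compile(
            "Killing (\\d+):(\\S+)/(\\S+) \\(adj (\\S+)\\): "
            + "depends on provider (\\S+) in dying proc (\\S+) \\(adj (\\S+)\\)");

    /** Returns {victim pid, victim process, provider, dying process}, or null if no match. */
    static String[] parse(String line) {
        Matcher m = KILL_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(5), m.group(6)};
    }

    public static void main(String[] args) {
        String line = "06-22 16:03:04.148 998 2071 I ActivityManager: Killing "
                + "12398:net.poemcode.music:playservice/u0a59 (adj 200): "
                + "depends on provider "
                + "net.poemcode.music/.sharedfileaccessor.ContentProviderImpl "
                + "in dying proc net.poemcode.music (adj 200)";
        String[] fields = parse(line);
        System.out.println("victim pid:     " + fields[0]);
        System.out.println("victim process: " + fields[1]);
        System.out.println("provider:       " + fields[2]);
        System.out.println("dying process:  " + fields[3]);
    }
}
```

The four extracted fields map directly onto the reading above: the victim (process B), the provider that links the two, and the dying host (process A).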
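Tracing the Log Back to Its Source
ContentProvider is a data-access mechanism provided by Android and used by the vast majority of apps, and multi-process designs, as noted above, are common too. If two processes are linked through a ContentProvider, it hardly seems reasonable that one of them terminating should take the other down with it. Let's start from the code and find out why.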
A search of the AOSP source shows that the log line comes from ActivityManagerService [1]:
capp.kill("depends on provider "
        + cpr.name.flattenToShortString()
        + " in dying proc " + (proc != null ? proc.processName : "??")
        + " (adj " + (proc != null ? proc.setAdj : "??") + ")", true);
In the log line, this string corresponds to the following part:
depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)
If that is not convincing enough, consider the following code from ProcessRecord [2] as well:
void kill(String reason, boolean noisy) {
    if (!killedByAm) {
        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "kill");
        if (noisy) {
            Slog.i(TAG, "Killing " + toShortString() + " (adj " + setAdj + "): " + reason);
        }
        EventLog.writeEvent(EventLogTags.AM_KILL, userId, pid, processName, setAdj, reason);
        Process.killProcessQuiet(pid);
        ActivityManagerService.killProcessGroup(uid, pid);
        if (!persistent) {
            killed = true;
            killedByAm = true;
        }
        Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
    }
}
Note the Slog.i call: its output matches the first half of the log line:
Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200):
With the origin of the log line established, let's return to ActivityManagerService. A fuller excerpt of the code reads:
private final boolean removeDyingProviderLocked(ProcessRecord proc,
        ContentProviderRecord cpr, boolean always) {
    // ......
    for (int i = cpr.connections.size() - 1; i >= 0; i--) {
        ContentProviderConnection conn = cpr.connections.get(i);
        // ......
        ProcessRecord capp = conn.client;
        conn.dead = true;
        if (conn.stableCount > 0) {
            if (!capp.persistent && capp.thread != null
                    && capp.pid != 0
                    && capp.pid != MY_PID) {
                capp.kill("depends on provider "
                        + cpr.name.flattenToShortString()
                        + " in dying proc " + (proc != null ? proc.processName : "??")
                        + " (adj " + (proc != null ? proc.setAdj : "??") + ")", true);
            }
        }
        // ......
    }
    // ......
}
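The decisive condition in this excerpt is `conn.stableCount > 0` combined with `!capp.persistent`. The sketch below is a toy model of just that condition, not AOSP code: `DyingProviderModel`, `Client`, and the process names other than the two from the log are hypothetical. The `thread`, `pid`, and `MY_PID` checks are left out, and since the excerpt elides what happens to connections without a stable reference, the model simply leaves those clients alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not AOSP code) of the kill condition in removeDyingProviderLocked. */
final class DyingProviderModel {
    /** Simplified stand-in for a provider connection together with its client process. */
    static final class Client {
        final String processName;
        final int stableCount;    // stable references this client holds on the provider
        final boolean persistent; // persistent processes are spared
        boolean killed;

        Client(String processName, int stableCount, boolean persistent) {
            this.processName = processName;
            this.stableCount = stableCount;
            this.persistent = persistent;
        }
    }

    /** Kills every non-persistent client that holds a stable reference; returns their names. */
    static List<String> removeDyingProvider(List<Client> connections) {
        List<String> victims = new ArrayList<>();
        // Iterate from the end, as the original loop does.
        for (int i = connections.size() - 1; i >= 0; i--) {
            Client client = connections.get(i);
            if (client.stableCount > 0 && !client.persistent) {
                client.killed = true;
                victims.add(client.processName);
            }
        }
        return victims;
    }

    public static void main(String[] args) {
        List<Client> connections = new ArrayList<>();
        connections.add(new Client("net.poemcode.music:playservice", 1, false));
        connections.add(new Client("net.poemcode.music:widget", 0, false)); // hypothetical
        connections.add(new Client("com.example.persistent", 2, true));     // hypothetical
        System.out.println("killed: " + removeDyingProvider(connections));
    }
}
```

In this model only the play service is killed: it is the one client that is not persistent and still holds a stable reference when the provider's host process goes away.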
removeDyingProviderLocked is called from four places. Which of them is responsible in our scenario? Here is the first:
final boolean forceStopPackageLocked(String packageName, int appId,
        boolean callerWillRestart, boolean purgeCache, boolean doit,
        boolean evenPersistent, boolean uninstalling, int userId, String reason) {
    // ... ...
    ArrayList<ContentProviderRecord> providers = new ArrayList<>();
    if (mProviderMap.collectPackageProvidersLocked(packageName, null, doit, evenPersistent,
            userId, providers)) {
        if (!doit) {
            return true;
        }
        didSomething = true;
    }
    for (i = providers.size() - 1; i >= 0; i--) {
        removeDyingProviderLocked(null, providers.get(i), true);
    }
    // ... ...
}
Here is the second:
private void cleanupDisabledPackageComponentsLocked(
        String packageName, int userId, boolean killProcess, String[] changedClasses) {
    // ......
    // Clean-up disabled providers.
    ArrayList<ContentProviderRecord> providers = new ArrayList<>();
    mProviderMap.collectPackageProvidersLocked(
            packageName, disabledClasses, true, false, userId, providers);
    for (int i = providers.size() - 1; i >= 0; i--) {
        removeDyingProviderLocked(null, providers.get(i), true);
    }
    // ......
}
Is it either of these two? It is not. Look again at the adj part of the message that removeDyingProviderLocked builds and that ends up in the Slog output:
" (adj " + (proc != null ? proc.setAdj : "??")
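If the proc argument of removeDyingProviderLocked were null, the log output would contain something like (adj ??). The actual log shows a number there, and both call sites above pass null for proc, so both can be ruled out.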
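To make the elimination concrete, here is a minimal sketch that rebuilds only the tail of the kill reason for a null and a non-null proc. `AdjSuffixDemo` and its `Proc` class are hypothetical stand-ins for ProcessRecord; the string expression itself is copied from the excerpt quoted earlier.

```java
/** Hypothetical demo: how the tail of the kill reason differs for null and non-null proc. */
final class AdjSuffixDemo {
    /** Minimal stand-in for ProcessRecord with only the two fields the message uses. */
    static final class Proc {
        final String processName;
        final int setAdj;

        Proc(String processName, int setAdj) {
            this.processName = processName;
            this.setAdj = setAdj;
        }
    }

    /** Same expression as in removeDyingProviderLocked, limited to the trailing part. */
    static String reasonTail(Proc proc) {
        return " in dying proc " + (proc != null ? proc.processName : "??")
                + " (adj " + (proc != null ? proc.setAdj : "??") + ")";
    }

    public static void main(String[] args) {
        // First and second call sites: proc is null.
        System.out.println(reasonTail(null));
        // Third and fourth call sites: proc is the dying process.
        System.out.println(reasonTail(new Proc("net.poemcode.music", 200)));
    }
}
```

A null proc yields `in dying proc ?? (adj ??)`, which does not match the captured log, whereas a non-null proc yields the process name and a numeric adj, as observed.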
That leaves two of the four call sites. Here is the third:
boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
    // Look through the content providers we are waiting to have launched,
    // and if any run in this process then either schedule a restart of
    // the process or kill the client waiting for it if this process has
    // gone bad.
    boolean restart = false;
    for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = mLaunchingProviders.get(i);
        if (cpr.launchingApp == app) {
            if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
                restart = true;
            } else {
                removeDyingProviderLocked(app, cpr, true);
            }
        }
    }
    return restart;
}
And the fourth:
private final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart, int index, boolean replacingPid) {
    Slog.d(TAG, "cleanUpApplicationRecord -- " + app.pid);
    // ......
    // Remove published content providers.
    for (int i = app.pubProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = app.pubProviders.valueAt(i);
        final boolean always = app.bad || !allowRestart;
        boolean inLaunching = removeDyingProviderLocked(app, cpr, always);
        if ((inLaunching || always) && cpr.hasConnectionOrHandle()) {
            // We left the provider in the launching list, need to
            // restart it.
            restart = true;
        }
        cpr.provider = null;
        cpr.proc = null;
    }
    app.pubProviders.clear();
    // ......
}
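The third and fourth call sites cannot be told apart with the same trick, because both pass a non-null proc. Instead, look at the first line of cleanUpApplicationRecordLocked: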
Slog.d(TAG, "cleanUpApplicationRecord -- " + app.pid);
If this message also shows up in the log, the call site is identified. It does:
06-22 16:03:04.146 998 2071 D ActivityManager: cleanUpApplicationRecord -- 19826
06-22 16:03:04.147 998 2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.service.MainService in 1000ms
06-22 16:03:04.147 998 2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.business.lockscreen.LockScreenService in 11000ms
06-22 16:03:04.148 998 2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)
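The cleanUpApplicationRecord message appears on the same thread just 2 ms before the Killing line, so the call must have come from the fourth site. From here on, tracing further back through the code alone becomes much harder.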
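According to its doc comment, cleanUpApplicationRecordLocked is called not only after a process has died but also directly when a process is stopped in single process mode: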
/**
* Main code for cleaning up a process when it has gone away. This is
* called both as a result of the process dying, or directly when stopping
* a process when running in single process mode.
*
* @return Returns true if the given process has been restarted, so the
* app that was passed in must remain on the process lists.
 */
So we go back to the log and look one line earlier:
06-22 16:03:04.145 998 2071 I ActivityManager: Process net.poemcode.music (pid 19826) has died
06-22 16:03:04.146 998 2071 D ActivityManager: cleanUpApplicationRecord -- 19826
06-22 16:03:04.147 998 2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.service.MainService in 1000ms
06-22 16:03:04.147 998 2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.business.lockscreen.LockScreenService in 11000ms
06-22 16:03:04.148 998 2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)
The log shows that the Main Process (pid 19826) had already died, so cleanUpApplicationRecordLocked must have been called from handleAppDiedLocked:
/**
 * Main function for removing an existing process from the activity manager
 * as a result of that process going away. Clears out all connections
 * to the process.
 */
private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
            false /*replacingPid*/);
    // ... ...
}
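Following the chain upward, handleAppDiedLocked is in turn called by appDiedLocked, which also emits the "has died" message seen in the log: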
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    // ... ...
    if (!app.killedByAm) {
        Slog.i(TAG, "Process " + app.processName + " (pid " + pid + ") has died");
        mAllowLowerMemLevel = true;
    } else {
        // Note that we always want to do oom adj to update our state with the
        // new number of procs.
        mAllowLowerMemLevel = false;
        doLowMem = false;
    }
    EventLog.writeEvent(EventLogTags.AM_PROC_DIED, app.userId, app.pid, app.processName);
    if (DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
            "Dying app: " + app + ", pid: " + pid + ", thread: " + thread.asBinder());
    handleAppDiedLocked(app, false, true);
    // ......
}
Finally, tracing all the way to the root, we arrive here:
private final class AppDeathRecipient implements IBinder.DeathRecipient {
    // ......
    @Override
    public void binderDied() {
        if (DEBUG_ALL) Slog.v(
                TAG, "Death received in " + this
                + " for thread " + mAppThread.asBinder());
        synchronized (ActivityManagerService.this) {
            appDiedLocked(mApp, mPid, mAppThread, true);
        }
    }
    // ......
}
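Read in call order rather than in the order of discovery, the chain traced in this section can be summarized with a toy model. This is only a sketch: the method names mirror the AOSP methods quoted above, but `DeathChainModel` is hypothetical, its bodies are placeholders that record the call order, and no Binder or ActivityManagerService is involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not AOSP code): records the order of the calls traced in this section. */
final class DeathChainModel {
    final List<String> trace = new ArrayList<>();

    // Entry point; in AOSP this is AppDeathRecipient.binderDied().
    void binderDied() {
        trace.add("binderDied");
        appDiedLocked();
    }

    // In AOSP this may log "Process ... has died".
    void appDiedLocked() {
        trace.add("appDiedLocked");
        handleAppDiedLocked();
    }

    void handleAppDiedLocked() {
        trace.add("handleAppDiedLocked");
        cleanUpApplicationRecordLocked();
    }

    // In AOSP this logs "cleanUpApplicationRecord -- <pid>".
    void cleanUpApplicationRecordLocked() {
        trace.add("cleanUpApplicationRecordLocked");
        removeDyingProviderLocked();
    }

    // In AOSP this is where a dependent client may be killed ("Killing ...").
    void removeDyingProviderLocked() {
        trace.add("removeDyingProviderLocked");
    }

    public static void main(String[] args) {
        DeathChainModel model = new DeathChainModel();
        model.binderDied();
        System.out.println(String.join(" -> ", model.trace));
    }
}
```

The order of the recorded calls matches the order of the three ActivityManager messages in the captured log: "has died", then "cleanUpApplicationRecord", then "Killing".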