Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Android Cached App Freezer from freezing the replay server #3504

Closed
wants to merge 1 commit into from

Conversation

thisisjimmyfb
Copy link
Contributor

@thisisjimmyfb thisisjimmyfb commented Dec 14, 2024

Since Android assigns a low oom_score_adj to foreground services, which prevent cached app freezer from freezing the replay server. Therefore adding a dummy service to RenderDoc will prevent cached app freezer from freezing the replay server.

Cached app freezer stops execution for cached processes and reduces resource usage by misbehaving apps that might attempt to operate while cached. This can happen for the RenderDoc replay server when user simply stopped using RenderDoc for a short period of time because the RenderDoc replay server only produces user perceptible changes when actions are taken on the RenderDoc client.

@baldurk
Copy link
Owner

baldurk commented Dec 16, 2024

I'm not quite sure of the purpose of this PR. You've explained the implementation detail but not why this is necessary or what is causing the disconnections. Android has developer features to prevent the screen from going off or from sleeping while applications are running, so this hasn't been a widespread problem. Can you explain exactly what the problem is that you're solving here beyond just 'prevents disconnections'?

Android is an extremely unstable and unreliable platform where almost nothing works as expected - so any Android-specific changes must be required or well justified to outweigh any risk of breaking or causing unexpected crashes/problems if they use features which are unavailable, changed, or broken on some devices.

@thisisjimmyfb
Copy link
Contributor Author

Cached app freezer stops execution for cached processes and reduces resource usage by misbehaving apps that might attempt to operate while cached. This can happen for the RenderDoc replay server when user simply stopped using RenderDoc for a short period of time because the RenderDoc replay server only produces user perceptible changes when actions are taken on the RenderDoc client.

The WakeLock is a more speculative fix for an issue where the adb command process hangs, which I presume is due to cpu suspend (and the issue has stopped happening ever since the wake lock is introduced in our fork). I can leave that out if you are not comfortable with speculative fixes.

@cmannett85-arm
Copy link
Collaborator

The freezer is disabled when the flag use_freezer is set to false or the developer option is disabled.

You can just turn it off in the developer options.

The WakeLock is a more speculative fix for an issue where the adb command process hangs, which I presume is due to cpu suspend

When does this occur? Once the server is running the UI will ping it repeatedly which should keep it alive.

@thisisjimmyfb
Copy link
Contributor Author

thisisjimmyfb commented Dec 18, 2024

Each Android implementation can choose to expose or not expose certain set of developer options so users might not always have the option to disable it. In the case of Cached App Freezer, it can really help free up system resources so disabling it can actually impede RenderDoc's execution.

The adb command process hang happened while the RenderDoc server was running and connected to the client running on my PC. If network activity was supposed to prevents cpu suspend I will remove the Wake Lock from this PR.

@thisisjimmyfb thisisjimmyfb force-pushed the v1.x branch 2 times, most recently from ae89ba1 to d83699d Compare December 18, 2024 20:07
@thisisjimmyfb thisisjimmyfb changed the title Prevent Android cpu suspend and cached app freezer Prevent Android Cached App Freezer from freezing the replay server Dec 18, 2024
@cmannett85-arm
Copy link
Collaborator

Hi @thisisjimmyfb, what devices and Android versions have you tested this on?

@thisisjimmyfb
Copy link
Contributor Author

thisisjimmyfb commented Jan 6, 2025

Hi @thisisjimmyfb, what devices and Android versions have you tested this on?

This was tested on a Meta Quest 3 running Android12.

@baldurk
Copy link
Owner

baldurk commented Jan 8, 2025

Does this happen on any other devices? right now this seems like much too risky of a change to make for only one device where the issue seems to be device-specific and created by an optional feature that device has enabled (and removed a way to disable).

@thisisjimmyfb
Copy link
Contributor Author

Does this happen on any other devices? right now this seems like much too risky of a change to make for only one device where the issue seems to be device-specific and created by an optional feature that device has enabled (and removed a way to disable).

This shouldn't be a Meta Quest specific issue. Cached App Freezer is a feature that existed since android11. I spoke to our AOSP team and they believe this is a general AOSP problem. Is it possible that people are indeed experiencing the disconnection but simply reconnects and not reporting it?

Besides solving the disconnection problem, adding the dummy service also has the side effect of lowing the oom_score_adj value which can help keep the RenderDoc replay server survive low memory situations.

@tnovak
Copy link

tnovak commented Jan 9, 2025

@baldurk My understanding is that this is fundamentally about RenderDoc being at odds with the Android app lifecycle.

RenderDoc uses an Activity (Loader.java) to bootstrap the remote server, which is implemented entirely in native code. However, when the user navigates away from the loader activity (e.g. by launching another app) and the activity transitions into the stopped state, Android expects it to no longer do any work, and can enforce this by suspending the process. This, of course, breaks native threads that may still attempt to communicate with the host.

A common solution is to use a foreground service, signaling to the OS that the process is actively doing work. This affects its priority wrt lowmemorykiller, cached apps freezer, etc.

This reproduces the issue for me on a regular Android device/emulator, provided that the freezer is enabled (adb shell dumpsys activity |grep use_freezer returns true), which I believe is the case by default as of Android 12 (commit).

  1. Launch the remote server: adb shell am start -n org.renderdoc.renderdoccmd.arm64/.Loader -e renderdoccmd remoteserver
  2. Now start a few other activities (e.g. via am start, or by pressing the home button and then launching another app)
  3. Check RenderDoc's oom_score_adj.

This is what it looks like on my device:

$ adb shell dumpsys activity oom |grep -A 1 org.renderdoc.renderdoccmd
    Proc # 3: cch    b/ /CAC  ------  t: 0 8367:org.renderdoc.renderdoccmd.arm64/u0a191 (cch-act)
        oom: max=1001 curRaw=900 setRaw=900 cur=900 set=900

This indicates that the renderdoccmd process is cached (oom_score_adj >= 900), and can therefore be frozen.

$ adb logcat -s ActivityManager
[...]
01-09 15:43:11.383   592   690 D ActivityManager: freezing 8367 org.renderdoc.renderdoccmd.arm64

@@ -246,6 +246,7 @@ ReplayProxy::ReplayProxy(ReadSerialiser &reader, WriteSerialiser &writer, IRepla
m_Proxy(proxy),
m_Remote(NULL),
m_Replay(NULL),
m_PreviewWindow(NULL),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unrelated fix, consider moving to a separate PR.

import android.os.Message;
import android.os.Process;

public class DummyService extends android.app.Service
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something like KeepaliveService might be more descriptive?

.setPriority(Notification.PRIORITY_LOW)
.setSmallIcon(R.drawable.icon)
.setContentTitle("RenderDoc Dummy Foreground Service")
.setContentText("RenderDoc needs this dummy foreground service to prevent cached app freezer from interrupting the network connection")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like "RenderDoc server running" as the text, with "RenderDoc" as the title? Cached app freezer is an implementation detail that's unimportant to the user.

@@ -17,6 +18,11 @@ protected void onCreate(android.os.Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);

Context context = getApplicationContext();
if(context != null) {
context.startService(new Intent(this, DummyService.class));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also stop the foreground service when the activity is destroyed, or if the remote server exits for whatever reason?

@tnovak
Copy link

tnovak commented Jan 10, 2025

2. Now start a few other activities

I should clarify that cached apps freezer shouldn't cause problems in what's presumably a common case where one first starts org.renderdoc.renderdoccmd.arm64/.Loader, and then the app being debugged. In that case, renderdoccmd is treated as the previous app and isn't susceptible to being frozen.

It's when an additional activity is involved (e.g. user navigates back to the home app) that renderdoccmd is demoted to the cached state.

In any case, this patch should make the server more robust on Android by being explicit about still having work to do, even when the Loader activity is no longer visible to the user.

@baldurk
Copy link
Owner

baldurk commented Jan 10, 2025

Please do not create reviews on pull requests on this repository, as you are not a code owner.

My ultimate concern is that this is a risky change with a significant dependency on entirely new Android APIs that RenderDoc has never used before. There is no way to know what proportion of Android devices support this service API, or for those that do whether they implement it fully and correctly and don't cause different problems due to using it. The tradeoff is working around this issue on devices where we know it happens (so far only one confirmed) vs the risk that other devices with no problems currently break (on Android this is always a significant risk). That means the cost/benefit ratio is not enough to justify this change in my view.

Is it possible that people are indeed experiencing the disconnection but simply reconnects and not reporting it?

This is certainly possible but it's not reasonable to act on speculated problems that have not been reported. We should certainly monitor this to see if it is reported on more devices, but right now we only know for sure that it happens on the Quest 3. Cam tests on a variety of devices and I have a handful and have not hit this issue so that implies it is not universal.

@thisisjimmyfb since Oculus has its own fork of RenderDoc that quest developers will generally be using my recommendation for now is that you apply this change there locally to resolve the issue for your devices. This code changes extremely rarely so there should be minimal issues with having it as a local patch.

If this becomes a more widespread and serious issue my first step would be to recommend that users turn off this freezer cache via a developer option as Cam mentioned above (#3504 (comment)). We should only look into making risky changes if it's also widespread that that option is unavailable, and ideally find a way to scope it so that it only runs on the affected devices to limit the potential problems. Since this is Android I don't have any trust that this change is safe to make generally and while that doesn't rule out fixes, it does mean that it requires more justification for me to be comfortable merging risky changes.

@baldurk baldurk closed this Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants