Evolution of the Android backend #2293

@rib

Hi,

Sorry in advance, this is a fairly long read :) This aims to open a discussion as much as it tries to highlight some practical issues I found recently while working with the Android backend. The winit Discussions don't seem super active, and since I also had some practical issues to bring up I figured I'd post here, but I'm happy to try and split this up if it makes sense...

I've recently been experimenting with an alternative "glue" layer for building native Rust applications on Android that's based on the GameActivity class (in turn based on the widely used AppCompatActivity class) that's part of Google's Android Game Development Kit:

Ref: https://developer.android.com/games/agdk
and https://developer.android.com/games/agdk/integrate-game-activity

Here's the glue layer I've implemented so far: https://github.com/rib/agdk-rust/tree/main/game-activity

The README there also has a bit more context, including a bit about my motivation for looking at this. Essentially, NativeActivity isn't practically usable in many circumstances, at least not without subclassing it in Java to add functionality. Even then it's impossible to use AppCompatActivity, which has become a very standard foundational class for Android development and provides numerous back-ported compatibility APIs on Android.

I originally aimed to base the game-activity glue API on ndk-glue, but as I made progress I found I needed to diverge from the ndk-glue API in a number of ways, such as for robustly handling synchronization between the Java main thread and the native thread. I also chose to define a standard extern "C" android_main() entry point ABI for applications that doesn't require any special macros.

So although I originally imagined I might be able to enable winit support simply by being a drop in alternative to ndk-glue I ended up working on more substantial winit backend changes to be able to build on this work further.

Just for reference, at this point I've published these three minimal Android app examples based on this work, to test creating a winit + wgpu app and a winit-based egui app. I have an experimental Bevy app I've been poking at locally, but for now Bevy doesn't really work on Android due to its lack of handling of lifecycle events.

Please find my example test apps here:
https://github.com/rib/agdk-rust/tree/main/examples

Please find my initial Android backend for Winit here: https://github.com/rib/winit/tree/agdk-game-activity

Now it's a bit tricky to determine the next steps for this work, considering that ideally I'd like to be able to enable this functionality in winit upstream, not just a private branch.

For starters I figured it would be good to share the current status of this work, and then I was thinking I'd list some of the most actionable topics that have come to light in the process of working on this that might help determine some next steps...

Technical backend issues that were noticed:

  • The existing Android backend in winit could be made more consistent with e.g. the X11 backend, since both are based on polling file descriptors.

Something I found while working on the Android backend was that it was initially unclear (at least to me) whether there were multiple points where the loop could block on IO, since it didn't follow what I would consider a more traditional polling model with one clear point of polling (via a Looper, which is Android's wrapper over epoll). For some time I was convinced the existing backend was going to block on its main event reads in addition to doing a looper poll, but I think to some extent it's just the layout that I found a little misleading to follow. In comparison I found the organisation of the X11 backend quite clear, with a separate function for running a single iteration of the loop and a single point where the loop blocks to poll for file descriptor events. When I enabled my own glue layer I ended up following the same structure as the X11 backend.

  • It's subjective but the Android event loop feels like it's kind of upside down - where it dispatches events and then polls in different ways according to the control_flow. Again I found the X11 backend structure clearer in this regard - it would poll for events and then run an iteration of event dispatching (which was encapsulated in a function where it was clear to see the logical order of dispatching matched the documented order that's expected).

  • A small detail, but the Android backend uses a call_event_handler macro which doesn't need to be a macro. I ended up swapping in a sticky_exit_callback function that was consistent with the X11/Wayland backends.

  • Redraw requests are converted into an event in such a way that if they don't get handled by triggering a redraw then the request is lost. I think redraw requests are conceptually expected to be persistent requests that should only be cleared when they are fulfilled. E.g. if a redraw is requested while the loop isn't 'running' (i.e. suspended) then (currently) the internal event is take()en after waking from the poll, but redraws aren't emitted while an app is suspended, so the request is dropped. I'd intuitively expect that the request should be queued until the app resumes.

  • Queuing redraw requests as events introduces some unnecessary latency, and potentially multiple iterations of the event loop before the request might be honoured. At the point of queuing the redraw request there might already be other events pending that will wake up the looper, such as input events and Android's Looper implementation doesn't prioritize delivering a Poll::Wake over any other file descriptor events. Since the redraw request will only be acknowledged once the looper specifically wakes with a Poll::Wake it's possible to handle multiple other pending events before catching up with the redraw request.

Something I did a little differently here, which could potentially be re-used in other backends too was to create a RedrawRequester that encapsulated a shared atomic boolean flag and a 'waker' (i.e. looper.wake()). This gives more ready access to the flag for any subsequent iteration of the event loop (woken for any reason) and doesn't lead to any buffering of data (except for potentially redundant wakes which Android already tries to expunge automatically). The lack of buffering is notable compared to e.g. the X11 backend that uses an mpsc channel for buffering redraw requests which also doesn't seem like an ideal fit for the problem.
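To make the idea concrete, here is a minimal sketch of what such a RedrawRequester could look like. It is not the code from my branch: the closure-based waker stands in for something like looper.wake(), and the names `request_redraw` and `take_if` are only illustrative assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Cloneable handle that windows can use to ask the event loop for a redraw.
#[derive(Clone)]
pub struct RedrawRequester {
    /// Shared flag: set on request, cleared only when the redraw is emitted.
    pending: Arc<AtomicBool>,
    /// Hypothetical stand-in for the platform wake-up (e.g. a looper wake).
    waker: Arc<dyn Fn() + Send + Sync>,
}

impl RedrawRequester {
    pub fn new(waker: impl Fn() + Send + Sync + 'static) -> Self {
        Self {
            pending: Arc::new(AtomicBool::new(false)),
            waker: Arc::new(waker),
        }
    }

    /// Record the request first, then wake the loop. Repeated requests
    /// coalesce into the single flag, so nothing is buffered.
    pub fn request_redraw(&self) {
        self.pending.store(true, Ordering::Release);
        (*self.waker)();
    }

    /// Called by the event loop on every iteration, whatever woke it up.
    /// The flag is only consumed when `can_draw` is true, so a request made
    /// while the app is suspended survives until it resumes.
    pub fn take_if(&self, can_draw: bool) -> bool {
        can_draw && self.pending.swap(false, Ordering::AcqRel)
    }
}
```

Because the loop checks the flag on every iteration rather than waiting for a specific wake event, a redraw can be honoured on the very next iteration even if that iteration was triggered by input.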

Java <-> Native Synchronization in ndk-glue

This is a key area where I was very uncertain about the current ndk-glue design and how robustly it handles synchronization between the Java main thread and the Rust native thread for operations such as destroying the application's native window and saving state.

As far as I've seen, ndk-glue employs a purely cooperative synchronization scheme that documents what downstream users of the API must do to ensure synchronization.

There is this comment for the WindowDestroyed event enum:

/// If the window is in use by ie. a graphics API, make sure the lock from
/// [`native_window()`] is held on to until after freeing those resources.
///
/// After receiving this [`Event`] `ndk_glue` will block until that read-lock
/// is released before returning to Android and allowing it to free up the window.
WindowDestroyed,

which essentially documents an implementation detail that says: if you keep the read lock guard after querying the current native_window() you can effectively block ndk-glue from being able to clear the native window, which will ultimately force synchronization with the Java thread in case it gets notified of a window termination (because it won't be able to take the write lock to make the change).
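As a rough illustration of that cooperative scheme, here is a sketch using a plain std::sync::RwLock. The `NativeWindow` type and both functions are hypothetical stand-ins rather than ndk-glue's actual API, and `try_write()` is used in place of a blocking `write()` only so that the effect can be observed from a single thread.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the native window handle.
pub struct NativeWindow {
    pub id: u32,
}

/// What the glue does when Java reports that the window is going away:
/// it needs the write lock to clear the slot. Returns false if a reader
/// still holds the lock (a real glue layer would block here instead).
pub fn try_clear(slot: &RwLock<Option<NativeWindow>>) -> bool {
    match slot.try_write() {
        Ok(mut window) => {
            *window = None;
            true
        }
        Err(_) => false,
    }
}

/// What the application is expected to do: hold the read guard for as long
/// as the window is in use. Returns the window id that was seen and whether
/// the glue managed to clear the window in the middle of "rendering".
pub fn render_with_guard(slot: &RwLock<Option<NativeWindow>>) -> (Option<u32>, bool) {
    let guard = slot.read().unwrap();
    let id = guard.as_ref().map(|w| w.id);
    // Simulate the window termination arriving while we are still drawing.
    let cleared_during_render = try_clear(slot);
    drop(guard); // rendering finished, the glue may now proceed
    (id, cleared_during_render)
}
```

The protection only exists while the guard is alive, which is why the scheme depends entirely on every downstream user remembering to hold it.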

In practice though winit's Android backend doesn't appear to take a read lock during particularly critical event callbacks, such as for Redraw events so winit doesn't seem to honor the documented synchronization scheme to help block the native window from being torn out in the middle of rendering.

What's also notably different to the original android_native_app_glue provided for use with NativeActivity by Google is that there's no guarantee that the native window remains accessible at least until the WindowDestroyed event has been received and the application has had an opportunity to react. By the time the application sees a WindowDestroyed event the native window could already be long gone.

Something I did differently in the game-activity glue implementation was to provide a poll_events() API that takes an FnMut closure, which is in some ways comparable to how winit takes a closure that's called for each event. The important thing this enables is that the implementation can place arbitrary pre- and post- logic around the handling of any Android event, which provides a robust place to handle any necessary synchronization with the Java main thread as required for different events, including for window termination and state saving. Since this design also fully encapsulates synchronization, there's no need for downstream users to handle anything, and synchronization details can also be changed without affecting applications.
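A simplified sketch of the shape of that API follows. Everything here is a hypothetical model (the `Glue` struct, the `GlueEvent` enum, the `u32` window handle and the `acknowledged` list standing in for signalling the Java thread); the real implementation differs, but the pre/post structure around the callback is the point.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GlueEvent {
    WindowCreated,
    WindowDestroyed,
    SaveState,
}

pub struct Glue {
    pending: Vec<GlueEvent>,
    window: Option<u32>,
    /// Events for which the (simulated) Java thread has been released.
    acknowledged: Vec<GlueEvent>,
}

impl Glue {
    pub fn new(pending: Vec<GlueEvent>) -> Self {
        Self { pending, window: None, acknowledged: Vec::new() }
    }

    pub fn window(&self) -> Option<u32> {
        self.window
    }

    pub fn acknowledged(&self) -> &[GlueEvent] {
        &self.acknowledged
    }

    /// Dispatch every pending event through `callback`, wrapping each call
    /// in glue-owned pre- and post- logic.
    pub fn poll_events<F: FnMut(GlueEvent, Option<u32>)>(&mut self, mut callback: F) {
        for event in std::mem::take(&mut self.pending) {
            // Pre: make new state visible before the application sees the event.
            if event == GlueEvent::WindowCreated {
                self.window = Some(1);
            }

            // The window is still valid while the application reacts.
            callback(event, self.window);

            // Post: only now tear state down and let the Java side continue.
            match event {
                GlueEvent::WindowDestroyed => {
                    self.window = None;
                    self.acknowledged.push(event);
                }
                GlueEvent::SaveState => self.acknowledged.push(event),
                GlueEvent::WindowCreated => {}
            }
        }
    }
}
```

With this structure the application can still drop its surfaces against a live window inside the WindowDestroyed callback, without needing to know about any locks.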

This is something that would also be good to discuss as an ndk-glue issue to see if it makes sense to change its current design but I figure it also makes sense to highlight here too.

High-level Android portability questions

In the process of testing the winit backend and e.g. looking to get egui and Bevy running I realized that if you follow most existing examples for how to use Winit and look at existing integrations you don't tend to end up with an application that is portable to Android (ignoring things like main() function quirks.)

In particular Android is currently unique in requiring applications to be aware that the .native_window() for a winit window will be invalidated between Suspended and Resumed event pairs, and also requiring applications to recreate any render surfaces each time their application is Resumed.

Existing winit integrations tend to assume they can create a window and initialize all graphics state + rendering surfaces up front when they create their event loop, which will ultimately just lead to a panic on Android once the app tries to access a native_window before it has been Resumed.

Here's an example of an upstream PR for egui that attempts to update their winit + wgpu integration abstractions so that it can support Android: emilk/egui#1634

I think there may be some opportunities within Winit to help steer downstream users into building portable integrations and applications. E.g. one idea I've wondered about is whether we could make all platforms consistently deliver a Resumed event (even desktop window systems) and then encourage (by updating examples) that this should be the standard place for all applications to lazily initialize all their graphics state and create their rendering surfaces. There would still be the separate requirement on Android to re-create new surfaces for each future Resumed event, but it would already be quite a big improvement in terms of consistent application structure that could help encourage portability by default.
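The application structure I have in mind looks roughly like the following sketch. To keep it self-contained it uses a hypothetical `AppEvent` enum and a dummy `Surface` type rather than the real winit and graphics API types; only the lifecycle handling is meant to carry over.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppEvent {
    Resumed,
    Suspended,
    RedrawRequested,
}

/// Dummy stand-in for a render surface tied to the native window.
pub struct Surface {
    pub generation: u32,
}

#[derive(Default)]
pub struct App {
    surface: Option<Surface>,
    surfaces_created: u32,
    pub frames: u32,
}

impl App {
    pub fn handle(&mut self, event: AppEvent) {
        match event {
            // Create (or re-create) the surface lazily on every Resumed,
            // instead of up front when the event loop is built.
            AppEvent::Resumed => {
                self.surfaces_created += 1;
                self.surface = Some(Surface { generation: self.surfaces_created });
            }
            // The native window is about to become invalid: drop the surface.
            AppEvent::Suspended => {
                self.surface = None;
            }
            // Only draw when a surface exists, rather than panicking.
            AppEvent::RedrawRequested => {
                if self.surface.is_some() {
                    self.frames += 1;
                }
            }
        }
    }

    pub fn surface_generation(&self) -> Option<u32> {
        self.surface.as_ref().map(|s| s.generation)
    }
}
```

On a desktop platform that delivers a single Resumed event at startup, the same code simply creates its surface once, so nothing about this structure is Android-specific.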

Next steps

... okay, I guess I'll stop here for now, since this is already a pretty big dump of information. :)

I'd be interested to gauge interest in any of this, and would be happy to split out separate issues for some of the things mentioned above.

One big question that could be interesting to discuss is whether there might be an interest in updating Winit's Android backend to work with this game-activity glue layer, or something similar considering:

  • Some of the concerns around synchronization that the current ndk-glue design appears to have.
  • The ability to support AppCompatActivity based Android applications, which includes back-ported Android APIs that make it much more practical to develop Android applications that are compatible with a wider range of Android versions.
  • Being in a better position to leverage more of the AGDK native libraries, e.g. for improved IME text input handling, game controller support, "swappy" synchronization for rendering etc.

Alternatively maybe it'd be possible to define a standard "glue" API, and maybe have something similar to ndk-context for the glue layer that would make it possible for applications to choose their glue (though I suspect that adding additional abstractions here might also just impede improvements to Android support more than help). I think initially the main challenge with this direction would be with defining a standard input API considering that game-activity is not based on AInputQueue which NativeActivity apps tend to use.

Thanks for your time if you made it this far! :-D
