|
| 1 | +# Replace logging and error reporting infrastructure with tracing. |
| 2 | + |
| 3 | +* Status: proposed |
| 4 | +* Deciders: |
| 5 | + * AppServices team: ? |
| 6 | + * Firefox Android: ? |
| 7 | + * Firefox iOS: ? |
| 8 | + * Firefox Desktop: ? |
| 9 | + |
| 10 | +* Date: Mar 06, 2024 |
| 11 | +* Feedback deadline: Mar 20, 2024 |
| 12 | + |
| 13 | +## Context and Problem Statement |
| 14 | + |
| 15 | +Currently all Rust code in [app-services](https://github.com/mozilla/application-services/) |
| 16 | +uses the `log` crate for debugging and diagnostics. |
| 17 | +Log output is often captured during tests where it's just printed to stdout/stderr if the test |
| 18 | +fails. Further, both our mobile apps and Firefox Desktop have techniques for having this log |
| 19 | +output be sent across the FFI so it can be captured in whatever "native" logging system is |
| 20 | +in place (eg, logcat on Android, custom file logging on iOS, Log.sys.mjs/Console.sys.mjs logging on Desktop) |
| 21 | + |
| 22 | +application-services also has an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error), |
| 23 | +designed explicitly for error reporting for the applications. |
| 24 | +Android and iOS both hook into this to report errors to Sentry, whereas Desktop has not yet implemented this functionality. |
| 25 | +While the error support crate has capabilities beyond just moving the error across the FFI, this capability is the focus of this document, |
| 26 | +and any changes proposed to this crate by this document are limited to just this. |
| 27 | + |
| 28 | +### Problems with the current approaches. |
| 29 | + |
| 30 | +#### Problems using `log` |
| 31 | + |
| 32 | +The main problem with the `log` module is a design choice made by that crate, |
| 33 | +which is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html) |
| 34 | + |
| 35 | +In Gecko, simply setting this max level to, say, `Trace` can cause performance regressions in other |
| 36 | +crates - eg, [see this desktop bug](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215) for one example. |
| 37 | + |
| 38 | +In practice, this means that we are unable to selectively get `Debug` or `Trace` logs for individual components, |
| 39 | +because the max level for the logger is, effectively, fixed at `Info`. |
| 40 | + |
| 41 | +### Problems with the error reporter |
| 42 | + |
| 43 | +The error reporter actually works OK for mobile, but is not implemented at all for Desktop. |
| 44 | +However, the existing implementation is quite ad-hoc and not very "rich" in terms of data which |
| 45 | +can be sent to the error reporter. |
| 46 | + |
| 47 | +So while there's actually no real problem here, there is an opportunity to better align the |
| 48 | +logging and error-reporting requirements into a single facility before introducing this capability |
| 49 | +to desktop. |
| 50 | + |
| 51 | +## The Rust tracing crate. |
| 52 | + |
| 53 | +An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate, |
| 54 | +which comes from the tokio project. |
| 55 | +This document is not going to rehash the excellent existing documentation, but some key features |
| 56 | +include: |
| 57 | + |
| 58 | +* For simple logging it has a very similar API to the `log` crate. Almost all existing use of `log` in non |
| 59 | + test code can be upgrade to tracing simply by replacing `log::trace!` with `tracing::trace!` etc. |
| 60 | +* Richer logging semantics mean that it's quite easy and efficient to attach additional data to each log entry. |
| 61 | + ie, more than simple strings can be logged. |
| 62 | +* A [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) mechanism that allows |
| 63 | + efficient filtering etc of log entries without enforcing a global `max_level`. |
| 64 | + Each subscriber is able to efficiently filter log events without impacting other crates. |
| 65 | + |
| 66 | +This means that it should be viable to have some crates capture `Trace` output without impacting any other crates |
| 67 | +or the overall performance of the application.. |
| 68 | + |
| 69 | +## Proposal: Move to tracing for all app-services crates. |
| 70 | + |
| 71 | +This document proposes that: |
| 72 | + |
| 73 | +* All app-services crates move to using `tracing` instead of `log`. |
| 74 | + There is a `tracing_log` crate which should mean we don't need to move all crates, but using this |
| 75 | + crate does have some limitations, and given the move to `tracing` is largely mechanical we should probably |
| 76 | + avoid it if we can - but it's a reasonable fallback if we need log output from crates we don't control. |
| 77 | + |
| 78 | +* All techniques we have for reporting log events and error reports to the application should be replaced with |
| 79 | + a new mechanism based on [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) |
| 80 | + to move logs and error-reports across the FFI. |
| 81 | + |
| 82 | +* This same mechanism, and the same raw data, should be used on all platforms (ie, Android, iOS and Desktop). |
| 83 | + |
| 84 | +## Implementation: Move to tracing for all app-services crates. |
| 85 | + |
| 86 | +This section describes the changes necessary to implement the above. |
| 87 | + |
| 88 | +### Move all our crates to `tracing` |
| 89 | + |
| 90 | +Most crates tend to use `log::debug!()` (or `trace!`, `error!` etc). We'd just change the crate to depend on `tracing` instead of `log`, |
| 91 | +and probably change each module to have as a prelude `use tracing::{trace, debug, info, error};` and change each `log::error!` to `error!`. |
| 92 | +Or maybe we just spell it out entirely as `tracing::log!(...)` - this is a choice for individual crate maintainers. |
| 93 | + |
| 94 | +Many tests start with `env_logger::try_init().unwrap()` or similar - these would all be replaced with the more-verbose but equivalent |
| 95 | + |
| 96 | +``` |
| 97 | + tracing_subscriber::registry() |
| 98 | + .with(fmt::layer()) |
| 99 | + .with(EnvFilter::from_default_env()) |
| 100 | + .init(); |
| 101 | +``` |
| 102 | + |
| 103 | +(probably via a helper of some sort) |
| 104 | + |
| 105 | +### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism. |
| 106 | + |
| 107 | +This subscriber mechanism would require the application to know all tracing `targets` it cares about - and by default, |
| 108 | +each crate is its own target. The app must explicitly "subscribe" to all such targets individually. |
| 109 | +It will *not* be possible to subscribe to all events for performance reasons, and it's unlikely we'll allow |
| 110 | +any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match. |
| 111 | + |
| 112 | +When subscribing to a target, the app will also need to specify the max-level for that target. In other words, a |
| 113 | +subscription can be considered a tuple of `(target, level)`. |
| 114 | + |
| 115 | +This implies that checking whether an individual tracing event has a subscriber will be a `HashMap` lookup for the target name, |
| 116 | +then a numeric comparison for the level. Each target will allow only one subscriber, identified by the `target`. |
| 117 | +This means that changing the level for an existing subscriber will involve replacing the subscription with a new one which holds the new level. |
| 118 | +It is expected that the implementation will simply be "set a new subscription", which will replace the old subscription. |
| 119 | + |
| 120 | +This assumes that the application will subscribe with the actual level needed by that subscriber - ie, that apps will not |
| 121 | +simply subscribe to the highest log level and perform additional filtering after the log event has been dispatched. |
| 122 | +This means that the "performance critical path" for this process is determining if we have a subscriber; |
| 123 | +once we have determined a subscriber matches an event, we can perform relatively expensive operations |
| 124 | +on the event because we assume action will be taken on the event. |
| 125 | + |
| 126 | +An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data. |
| 127 | + |
| 128 | +You will note that this is generic enough to handle traditional "log" messages, our error-reporting logs, and more structured logging. |
| 129 | +The shape of the structs etc (eg, try and use one struct for all use-cases, one struct per use-case, etc) is intentionally omitted to avoid bike-shedding on those details in the first instance. |
| 130 | + |
| 131 | +### Replace all existing "subscribers" |
| 132 | + |
| 133 | +There are 3 main places which would change in the first instance, broken down by platform |
| 134 | + |
| 135 | +#### Mobile |
| 136 | + |
| 137 | +The existing [rust_log_forwarder](https://github.com/mozilla/application-services/tree/main/components/support/rust-log-forwarder) and [error-support](https://github.com/mozilla/application-services/tree/main/components/support/error) crates would change how they collect "events" to using `tracing-subscriber`. Initially we'd not change the API exposed to the mobile apps, but very soon after declaring success on Desktop we'd look to consolidate the APIs as seen by the applications. |
| 138 | + |
| 139 | +#### Desktop |
| 140 | + |
| 141 | +Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). This would be removed entirely and a uniffi-based callback mechanism is used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire and forget", ending up in the main thread automatically. |
| 142 | + |
| 143 | +The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change: |
| 144 | +* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) - |
| 145 | +app-services would not rely on `log` at all in this world. |
| 146 | +* `gecko-logger` (or a similar crate next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber. |
| 147 | + |
| 148 | +The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber. |
| 149 | + |
| 150 | +#### Application Services |
| 151 | + |
| 152 | +* All crates move to `tracing` instead of `logging` |
| 153 | + |
| 154 | +* A new crate would be added which defines the application callback interfaces (via UniFFI) and |
| 155 | + also the new tracing-subscriber implementation. |
| 156 | + |
| 157 | +* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface |
| 158 | + but would have their internal implementation replaced with the subscriber. This is for backwards |
| 159 | + compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete |
| 160 | + these crates entirely. |
| 161 | + |
| 162 | +#### Mobile |
| 163 | + |
| 164 | +Mobile would not require changes in the first version, although we'd move to the new mechanism within |
| 165 | +a month or 2, at which time we can kill the 2 "legacy" app-services crates. |
0 commit comments