Skip to content

Commit f796798

Browse files
committed
Add adr-8: Replace logging and error reporting infrastructure with tracing.
1 parent 47ef7c8 commit f796798

File tree

2 files changed

+167
-0
lines changed

2 files changed

+167
-0
lines changed

docs/adr/0008-use-tracing.md

Lines changed: 165 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,165 @@
1+
# Replace logging and error reporting infrastructure with tracing.
2+
3+
* Status: proposed
4+
* Deciders:
5+
* AppServices team: ?
6+
* Firefox Android: ?
7+
* Firefox iOS: ?
8+
* Firefox Desktop: ?
9+
10+
* Date: Mar 06, 2024
11+
* Feedback deadline: Mar 20, 2024
12+
13+
## Context and Problem Statement
14+
15+
Currently all Rust code in [app-services](https://github.com/mozilla/application-services/)
16+
uses the `log` crate for debugging and diagnostics.
17+
Log output is often captured during tests where it's just printed to stdout/stderr if the test
18+
fails. Further, both our mobile apps and Firefox Desktop have techniques for having this log
19+
output be sent across the FFI so it can be captured in whatever "native" logging system is
20+
in place (eg, logcat on Android, custom file logging on iOS, Log.sys.mjs/Console.sys.mjs logging on Desktop)
21+
22+
application-services also has an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error),
23+
designed explicitly for error reporting for the applications.
24+
Android and iOS both hook into this to report errors to Sentry, whereas Desktop has not yet implemented this functionality.
25+
While the error support crate has capabilities beyond just moving the error across the FFI, this capability is the focus of this document,
26+
and any changes proposed to this crate by this document are limited to just this.
27+
28+
### Problems with the current approaches.
29+
30+
#### Problems using `log`
31+
32+
The main problem with the `log` module is a design choice made by that crate,
33+
which is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html)
34+
35+
In Gecko, simply setting this max level to, say, `Trace` can cause performance regressions in other
36+
crates - eg, [see this desktop bug](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215) for one example.
37+
38+
In practice, this means that we are unable to selectively get `Debug` or `Trace` logs for individual components,
39+
because the max level for the logger is, effectively, fixed at `Info`.
40+
41+
### Problems with the error reporter
42+
43+
The error reporter actually works OK for mobile, but is not implemented at all for Desktop.
44+
However, the existing implementation is quite ad-hoc and not very "rich" in terms of data which
45+
can be sent to the error reporter.
46+
47+
So while there's actually no real problem here, there is an opportunity to better align the
48+
logging and error-reporting requirements into a single facility before introducing this capability
49+
to desktop.
50+
51+
## The Rust tracing crate.
52+
53+
An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate,
54+
which comes from the tokio project.
55+
This document is not going to rehash the excellent existing documentation, but some key features
56+
include:
57+
58+
* For simple logging it has a very similar API to the `log` crate. Almost all existing use of `log` in non
59+
test code can be upgrade to tracing simply by replacing `log::trace!` with `tracing::trace!` etc.
60+
* Richer logging semantics mean that it's quite easy and efficient to attach additional data to each log entry.
61+
ie, more than simple strings can be logged.
62+
* A [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) mechanism that allows
63+
efficient filtering etc of log entries without enforcing a global `max_level`.
64+
Each subscriber is able to efficiently filter log events without impacting other crates.
65+
66+
This means that it should be viable to have some crates capture `Trace` output without impacting any other crates
67+
or the overall performance of the application..
68+
69+
## Proposal: Move to tracing for all app-services crates.
70+
71+
This document proposes that:
72+
73+
* All app-services crates move to using `tracing` instead of `log`.
74+
There is a `tracing_log` crate which should mean we don't need to move all crates, but using this
75+
crate does have some limitations, and given the move to `tracing` is largely mechanical we should probably
76+
avoid it if we can - but it's a reasonable fallback if we need log output from crates we don't control.
77+
78+
* All techniques we have for reporting log events and error reports to the application should be replaced with
79+
a new mechanism based on [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
80+
to move logs and error-reports across the FFI.
81+
82+
* This same mechanism, and the same raw data, should be used on all platforms (ie, Android, iOS and Desktop).
83+
84+
## Implementation: Move to tracing for all app-services crates.
85+
86+
This section describes the changes necessary to implement the above.
87+
88+
### Move all our crates to `tracing`
89+
90+
Most crates tend to use `log::debug!()` (or `trace!`, `error!` etc). We'd just change the crate to depend on `tracing` instead of `log`,
91+
and probably change each module to have as a prelude `use tracing::{trace, debug, info, error};` and change each `log::error!` to `error!`.
92+
Or maybe we just spell it out entirely as `tracing::log!(...)` - this is a choice for individual crate maintainers.
93+
94+
Many tests start with `env_logger::try_init().unwrap()` or similar - these would all be replaced with the more-verbose but equivalent
95+
96+
```
97+
tracing_subscriber::registry()
98+
.with(fmt::layer())
99+
.with(EnvFilter::from_default_env())
100+
.init();
101+
```
102+
103+
(probably via a helper of some sort)
104+
105+
### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism.
106+
107+
This subscriber mechanism would require the application to know all tracing `targets` it cares about - and by default,
108+
each crate is its own target. The app must explicitly "subscribe" to all such targets individually.
109+
It will *not* be possible to subscribe to all events for performance reasons, and it's unlikely we'll allow
110+
any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match.
111+
112+
When subscribing to a target, the app will also need to specify the max-level for that target. In other words, a
113+
subscription can be considered a tuple of `(target, level)`.
114+
115+
This implies that checking whether an individual tracing event has a subscriber will be a `HashMap` lookup for the target name,
116+
then a numeric comparison for the level. Each target will allow only one subscriber, identified by the `target`.
117+
This means that changing the level for an existing subscriber will involve replacing the subscription with a new one which holds the new level.
118+
It is expected that the implementation will simply be "set a new subscription", which will replace the old subscription.
119+
120+
This assumes that the application will subscribe with the actual level needed by that subscriber - ie, that apps will not
121+
simply subscribe to the highest log level and perform additional filtering after the log event has been dispatched.
122+
This means that the "performance critical path" for this process is determining if we have a subscriber;
123+
once we have determined a subscriber matches an event, we can perform relatively expensive operations
124+
on the event because we assume action will be taken on the event.
125+
126+
An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data.
127+
128+
You will note that this is generic enough to handle traditional "log" messages, our error-reporting logs, and more structured logging.
129+
The shape of the structs etc (eg, try and use one struct for all use-cases, one struct per use-case, etc) is intentionally omitted to avoid bike-shedding on those details in the first instance.
130+
131+
### Replace all existing "subscribers"
132+
133+
There are 3 main places which would change in the first instance, broken down by platform
134+
135+
#### Mobile
136+
137+
The existing [rust_log_forwarder](https://github.com/mozilla/application-services/tree/main/components/support/rust-log-forwarder) and [error-support](https://github.com/mozilla/application-services/tree/main/components/support/error) crates would change how they collect "events" to using `tracing-subscriber`. Initially we'd not change the API exposed to the mobile apps, but very soon after declaring success on Desktop we'd look to consolidate the APIs as seen by the applications.
138+
139+
#### Desktop
140+
141+
Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). This would be removed entirely and a uniffi-based callback mechanism is used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire and forget", ending up in the main thread automatically.
142+
143+
The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change:
144+
* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) -
145+
app-services would not rely on `log` at all in this world.
146+
* `gecko-logger` (or a similar crate next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber.
147+
148+
The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber.
149+
150+
#### Application Services
151+
152+
* All crates move to `tracing` instead of `logging`
153+
154+
* A new crate would be added which defines the application callback interfaces (via UniFFI) and
155+
also the new tracing-subscriber implementation.
156+
157+
* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface
158+
but would have their internal implementation replaced with the subscriber. This is for backwards
159+
compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete
160+
these crates entirely.
161+
162+
#### Mobile
163+
164+
Mobile would not require changes in the first version, although we'd move to the new mechanism within
165+
a month or 2, at which time we can kill the 2 "legacy" app-services crates.

docs/adr/README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@ This log lists the architectural decisions for MADR.
1111
* [ADR-0004](0004-early-startup-experiments.md) - Running experiments on first run early startup
1212
* [ADR-0005](0005-remote-settings-client.md) - A remote-settings client for our mobile browsers.
1313
* [ADR-0007](0007-limit-visits-migration-to-10000.md) - Limit Visits Migrated to Places History in Firefox iOS
14+
* [ADR-0008](0008-use-tracing.md) - Replace logging and error reporting infrastructure with tracing.
15+
1416

1517
<!-- adrlogstop -->
1618

0 commit comments

Comments
 (0)