Commit 1f7c192

Add adr-8: Replace logging and error reporting infrastructure with tracing.
1 parent 815d73c commit 1f7c192

File tree

2 files changed

+139
-0
lines changed

docs/adr/0008-use-tracing.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
# Replace logging and error reporting infrastructure with tracing.
2+
3+
* Status: proposed
4+
* Deciders:
5+
* AppServices team: ..
6+
* moz-central releng/build teams: ?
7+
* Date: Mar 2025
8+
* Feedback deadline: ?
9+
10+
## Context and Problem Statement
11+
12+
Rust code in [application-services](https://github.com/mozilla/application-services/) needs support for diagnostics.
13+
Specifically, we need support for logging and for error reporting.
14+
15+
### Logging
16+
17+
Our components must be able to send log output to the browser - every platform captures at least some of it.
18+
19+
We currently use the `log` crate and wire this up as best we can.
20+
21+
### Error Reporting
22+
There's an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error),
23+
designed explicitly for error reporting for the applications.
24+
Android and iOS both hook into this to report errors to Sentry, whereas Desktop has not yet implemented this functionality, but could probably benefit if it did.
25+
26+
### Problems with the current approaches.
27+
28+
#### Problems using `log`
29+
30+
The main problem with the `log` crate is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html)
31+
that's set for all crates.
32+
Gecko sets the global max level to `Info` - anything more verbose causes [performance regressions in other crates](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215).
33+
34+
In practice, this means that we are unable to get debug logs for individual components.
35+
36+
#### Problems with the error reporter
37+
38+
None on mobile - but it is not implemented at all on Desktop.
39+
40+
So while there's actually no real problem here, there is an opportunity to better align the
41+
logging and error-reporting requirements into a single facility while introducing this capability
42+
to desktop.
43+
44+
## The Rust tracing crate.
45+
46+
An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate,
47+
which comes from the tokio project.
48+
49+
`tracing` has a very similar API to the `log` crate - `log::trace!` becomes `tracing::trace!` etc.
50+
It has richer semantics than `log` (eg, async support) and largely acts as a replacement -
51+
paired with `tracing-subscriber`, it supports the `RUST_LOG` environment variable and writes to `stdout`,
52+
so developers who are running tests and our CI etc should notice no difference.
53+
54+
Importantly, it has a [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
55+
mechanism that allows efficient, more targeted filtering instead of a global `max_level`.
56+
Each subscriber is able to filter log events without impacting crates it isn't subscribed to.
57+
58+
This means that it should be viable to have some crates capture `trace!()` output without impacting any other crates
59+
or the overall performance of the application.
60+
61+
## Proposal: Move to tracing for all app-services crates.
62+
63+
This document proposes that:
64+
65+
* All app-services crates move to using `tracing` instead of `log`.
66+
67+
* All existing handling of log events be replaced with a new mechanism based on
68+
[`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
69+
to move logs and error-reports across the FFI.
70+
71+
* We use the same mechanism for error reporting, leveraging the richer metadata offered by tracing.
72+
73+
There are ways to make `log` and `tracing` work side by side, but we should avoid that if we can and just lean into `tracing`.
74+
75+
## Implementation: Move to tracing for all app-services crates.
76+
77+
This section describes the changes necessary to implement the above.
78+
79+
### Move all our crates to `tracing`
80+
81+
Most crates use `log::debug!()`/`log::error!()`, which change to `tracing::...` (do we change them to just `debug!()`?)
82+
83+
Many tests start with `env_logger::try_init().unwrap()` or similar - we'll provide a test helper to replace this.
84+
85+
### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism.
86+
87+
This subscriber mechanism requires the application to know all tracing `targets` it cares about.
88+
Each crate is its own target and the app must explicitly "subscribe" to all targets individually.
89+
It will *not* be possible to subscribe to all targets and it's unlikely we'll allow
90+
any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match.
91+
92+
This requires our applications to configure their own subscriptions to each `target` with the level for that target,
93+
making it possible to avoid a single, global max-level.
94+
95+
We'll implement this subscriber with a simple `HashMap` mapping the target name to a level.
96+
Once we have determined a subscriber matches an event, we can perform relatively expensive operations
97+
on the event because we assume action will be taken on the event.
98+
This assumption seems fine because we own all the subscribers.
99+
100+
An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data,
101+
and using them to format a string, and dispatching the end result to the underlying logging
102+
system.
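
The filtering described above can be modelled without any `tracing` types at all. The following is a minimal sketch using only the standard library, not the proposed implementation: `Level`, `Event`, `TargetFilter`, `subscribe`, `enabled` and `dispatch` are hypothetical names, and the target names are purely illustrative. It shows the two properties the text relies on: matching is an exact string lookup in a `HashMap`, and the message is only rendered after the lookup has succeeded.

```rust
use std::collections::HashMap;

/// Verbosity levels, ordered from least to most verbose.
/// (Hypothetical stand-in for the levels used by `tracing`.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical stand-in for an event: a target, a level, and a message
/// that is only rendered when someone asks for it.
pub struct Event<'a> {
    pub target: &'a str,
    pub level: Level,
    pub message: &'a dyn Fn() -> String,
}

/// Exact-match filter: target name -> most verbose level wanted.
#[derive(Default)]
pub struct TargetFilter {
    levels: HashMap<String, Level>,
}

impl TargetFilter {
    /// The application subscribes to each target individually.
    pub fn subscribe(&mut self, target: &str, level: Level) {
        self.levels.insert(target.to_string(), level);
    }

    /// Cheap check: one hash lookup, no prefix or pattern matching.
    pub fn enabled(&self, target: &str, level: Level) -> bool {
        matches!(self.levels.get(target), Some(max) if level <= *max)
    }

    /// The "relatively expensive" formatting runs only after a match.
    pub fn dispatch(&self, event: &Event<'_>) -> Option<String> {
        if !self.enabled(event.target, event.level) {
            return None;
        }
        Some(format!(
            "[{:?}] {}: {}",
            event.level,
            event.target,
            (event.message)()
        ))
    }
}

fn main() {
    let mut filter = TargetFilter::default();
    filter.subscribe("places", Level::Debug);

    let render = || String::from("opened database");
    let event = Event { target: "places", level: Level::Debug, message: &render };
    if let Some(line) = filter.dispatch(&event) {
        println!("{line}");
    }
}
```

In this model an unsubscribed target costs a single failed lookup, which is the behaviour that lets one component run at `Debug` or `Trace` without affecting the others.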
103+
104+
Note that this is generic enough to handle traditional "log" messages and our error reporting requirements. It's a general event reporting system.
105+
106+
[There's a WIP for all the above here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
107+
108+
### Replace all existing "subscribers"
109+
110+
There are 3 main places which would change in the first instance, broken down by platform:
111+
112+
#### Mobile
113+
114+
[A WIP for this is also included here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
115+
116+
#### Desktop
117+
118+
Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). This would be removed entirely and replaced with a uniffi-based callback mechanism. Rust code calling back into JavaScript has the same semantics as `golden_gate`: the log calls are "fire and forget", ending up on the main thread automatically.
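
As a rough illustration of what "fire and forget" means for the caller, here is a standard-library-only sketch. It is a model of the semantics, not of how UniFFI or `golden_gate` actually dispatch calls: `LogRecord`, `Forwarder` and `forwarder` are hypothetical names, an `mpsc` channel stands in for the hand-off to the main thread, and the target name is only an example.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical record handed across the boundary.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LogRecord {
    pub target: String,
    pub message: String,
}

/// Sending half, held by the Rust components.
#[derive(Clone)]
pub struct Forwarder {
    tx: Sender<LogRecord>,
}

impl Forwarder {
    /// Queues the record and returns immediately. The caller never waits
    /// for the receiver, and a closed channel is silently ignored.
    pub fn log(&self, target: &str, message: &str) {
        let _ = self.tx.send(LogRecord {
            target: target.to_string(),
            message: message.to_string(),
        });
    }
}

/// Creates the sending half plus the queue the "main thread" drains.
pub fn forwarder() -> (Forwarder, Receiver<LogRecord>) {
    let (tx, rx) = channel();
    (Forwarder { tx }, rx)
}

fn main() {
    let (fwd, rx) = forwarder();

    // A background thread logs without blocking on the consumer.
    let worker = thread::spawn(move || {
        fwd.log("sync15", "sync started");
        fwd.log("sync15", "sync finished");
    });
    worker.join().unwrap();

    // The sender was dropped with the worker, so this loop terminates
    // once the queued records have been drained.
    for record in rx {
        println!("{}: {}", record.target, record.message);
    }
}
```

The point of the model is that the logging side has no return value to wait for and no error to handle, so a slow or absent consumer cannot stall the component doing the logging.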
119+
120+
The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change:
121+
* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) -
122+
app-services would not rely on `log` at all in this world.
123+
* `gecko-logger` (or a similar crate next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber.
124+
125+
The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber.
126+
127+
#### Application Services
128+
129+
* All crates move to `tracing` instead of `log`
130+
131+
* A new crate would be added which defines the application callback interfaces (via UniFFI) and
132+
also the new tracing-subscriber implementation.
133+
134+
* The `rust-log-forwarder` and `error-reporter` crates would keep their external interface
135+
but would have their internal implementation replaced with the subscriber. This is for backwards
136+
compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete
137+
these crates entirely.

docs/adr/README.md

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ This log lists the architectural decisions for MADR.
1111
* [ADR-0004](0004-early-startup-experiments.md) - Running experiments on first run early startup
1212
* [ADR-0005](0005-remote-settings-client.md) - A remote-settings client for our mobile browsers.
1313
* [ADR-0007](0007-limit-visits-migration-to-10000.md) - Limit Visits Migrated to Places History in Firefox iOS
14+
* [ADR-0008](0008-use-tracing.md) - Replace logging and error reporting infrastructure with tracing.
15+
1416

1517
<!-- adrlogstop -->
1618
