Start reporting a few process metrics, like CPU and memory usage for backend platforms #49178
Replies: 3 comments
-
The Profiling team started collecting CPU and memory metrics on Android and iOS as part of the profile payload. The schema for "measurements" (time series metrics that are collected over the timeframe of a profile) is defined here: https://github.com/getsentry/relay/blob/fcbd996ace227a9fe9c69b736db91482e5e178d5/relay-profiling/src/measurements.rs#L6. The MDX team has also started working on a proof-of-concept to display the CPU & memory metrics for Android in the transaction details view: #46532. We are interested in an effort to expand support for this across other platforms.
-
I'd love to see some minimal memory profiling (probably not much more than a few simple quantities, so that it has no noticeable performance impact) and maybe some thresholds that trigger warnings.
-
Taking one SDK, Laravel, as an example: there are already quite a few metrics that Laravel developers can collect via, for instance, Pulse.
-
As we look into extending the capabilities of our SDKs, I would like to collect some feedback on the topic of process metrics.
Thesis
Metrics like memory or CPU usage can provide insights into the overall health of an application.
How could we send such metrics to Sentry?
Each backend platform provides some API to collect such metrics from userland.
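To make this concrete, here is a minimal sketch of what such a userland read could look like in Python, using only the standard library's `resource` module (Unix only). The function name `read_process_metrics` and the dictionary keys are made up for illustration and are not part of any SDK.

```python
import resource
import sys
import time


def read_process_metrics():
    """Return a snapshot of CPU time and peak memory for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    rss_unit = 1 if sys.platform == "darwin" else 1024
    return {
        # Hypothetical key names, chosen for this sketch only.
        "cpu_user_seconds": usage.ru_utime,
        "cpu_system_seconds": usage.ru_stime,
        "peak_rss_bytes": usage.ru_maxrss * rss_unit,
        "timestamp": time.time(),
    }


if __name__ == "__main__":
    print(read_process_metrics())
```

Note that this reports cumulative CPU time and peak resident memory, not current usage. A real implementation would likely diff CPU time between two samples to get a utilization figure.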
These metrics don't directly relate to traces or transactions, and tracking them in the context of a single trace would add little value.
To avoid creating a dedicated metrics ingest, we could still use transactions as the transport and report the metrics by adding them to spans. The data volume and performance impact would be negligible, especially if we limit how frequently the collection happens.
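As a rough sketch of the "collect at some frequency" idea, assuming a span exposes a plain key/value data bag: the class below samples at most once per interval and skips every other span. `ThrottledProcessMetrics`, `maybe_attach`, the `process.*` key names and the 10-second default are all hypothetical, and a plain `dict` stands in for the span's data.

```python
import resource
import time


class ThrottledProcessMetrics:
    """Attach process metrics to a span at most once per `interval` seconds."""

    def __init__(self, interval=10.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_sample = None

    def maybe_attach(self, span_data):
        """Write metrics into `span_data` if the interval has elapsed.

        Returns True if a sample was attached, False if it was skipped.
        """
        now = self.clock()
        if self._last_sample is not None and now - self._last_sample < self.interval:
            return False  # Sampled recently: this span costs nothing extra.
        usage = resource.getrusage(resource.RUSAGE_SELF)
        # Hypothetical attribute names, not an agreed-upon convention.
        span_data["process.cpu.user_seconds"] = usage.ru_utime
        span_data["process.cpu.system_seconds"] = usage.ru_stime
        # Raw value: the unit is platform dependent (KB on Linux, bytes on macOS).
        span_data["process.memory.max_rss_raw"] = usage.ru_maxrss
        self._last_sample = now
        return True
```

With this shape, the per-span cost is one clock read and a comparison, and the system call only happens once per interval no matter how many spans are created.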
What would we do with this data?
Relay would need to extract these metrics from the spans to make them available. More ideation is needed to see how we could embed them into our performance product and detect anomalies automatically.
Caveats
If we choose the span transport, metrics can only be reported when spans are created, i.e., when the application is handling traffic.
This means that for apps with very little traffic, creating a continuous time series would be impossible.
Request for feedback
As I said, we are currently discussing this idea and would love your feedback.