Sentry Developer Metrics - What should that look like? 💡 #58584
Replies: 9 comments 19 replies
-
Lines of code, thanks.
-
There are a few global features I tend to want in metrics, for both front- and back-end:
Drilling down, I think there are two major contexts: (1) something should not be changing, most importantly when it is changing anyway; and (2) something should be changing and I'm observing the overall impacts.

(1) is the context of operations and incident response. Things that matter are the ability to monitor and alert in order to surface anomalies, the ability to tie context like metrics/spans/traces/logs/source code/releases/feature flag state together, the ability to identify patterns in errors (one of my favorite of Sentry's capabilities!), and the ability to exclude extraneous information. If something like overall response time—which I would prefer to measure both from the client and the server, but often have to settle for server—is going up, my first priority is determining what's common about the contributing measurements, to assess impact/severity, and then to identify areas to focus on for remediation.

(2) is the context of deployment and release, especially with CI/CD and feature flags. When I deploy or adjust a dial to control a release, I want to be able to see the impacts of that. What percentage of traffic is going down various branches? How is latency responding for each branch? What error/log patterns are new?
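The release questions at the end could be answered by slicing request metrics on a feature-flag branch tag. A minimal sketch of that aggregation, with an invented data shape (nothing here is a real Sentry API):

```javascript
// Sketch: per-branch traffic share and median latency for a flag rollout.
// The request records and field names are hypothetical, for illustration.
function summarizeByBranch(requests) {
  const branches = new Map();
  for (const { branch, durationMs } of requests) {
    if (!branches.has(branch)) branches.set(branch, []);
    branches.get(branch).push(durationMs);
  }
  const total = requests.length;
  const summary = {};
  for (const [branch, durations] of branches) {
    durations.sort((a, b) => a - b);
    summary[branch] = {
      trafficShare: durations.length / total,          // % of traffic down this branch
      p50Ms: durations[Math.floor(durations.length * 0.5)], // median latency
    };
  }
  return summary;
}

// Example: a canary branch receiving a quarter of traffic, with worse latency.
const summary = summarizeByBranch([
  { branch: "control", durationMs: 100 },
  { branch: "control", durationMs: 110 },
  { branch: "control", durationMs: 120 },
  { branch: "canary", durationMs: 400 },
]);
```

The point of the sketch is the grouping key: if every metric point carries the flag state as a tag, "how is each branch doing?" becomes a simple group-by rather than a separate dashboard.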
-
I work with a few relatively lightweight monoliths (backend-heavy, just dumping HTML or JSON; no heavyweight clients or distributed parts in motion) that usually run within 10s of RPS, and I'm very much at the start of my adventure with metrics at any meaningful scale, so most of this probably isn't useful. Buuut, if you wanna read this...

When I have metrics, it's usually Grafana + Prometheus aimed at node_exporter for the VMs (CPU/RAM/IO/disk space/etc.), some generic framework-specific plugins (request/application server metrics), and one more custom endpoint that gets polled frequently and reports some critical "business" metrics (most often object counts across different dimensions). I don't think Sentry should do all that, unless you basically reinvent the two solutions above, and frankly there doesn't seem to be a lot of value in that.

When I'm writing a new endpoint or refactoring something existing, sometimes I look at a query or an API client call and I'm like "hmm, this looks sus, how long does that take anyway?" Sentry has some integrated spans and whatnot, but it'd be great if I could just point at a line of code (ideally, directly in the editor) and ask what metrics you can tell me about it. More often than not there's a bunch of operations I wish I had added spans around to measure, but I noticed only after they became issues. I can't go back in time to figure that out, but Sentry has sampled profiles, so I could ask "hey, how long does it take on average for this function to run, and under what conditions, based on past profiles?" Let's say p95 is 0.65ms when it's called from ViewUser (okay), 0.14ms from ViewGroup (that's fine), 792ms from ViewDocuments (oh dear). I can do this today, but only if I instrument interesting things ahead of time, or wade through sampled profiles manually.
The views mentioned above are pretty obvious, but less obvious would be "extreme users": your average user has around 100 documents, but one power user comes around with hundreds of thousands, and your massive Tsar Bomba query with 10 JOINs inside brings the poor database to its knees. It'd be cool if something could tell me there's something going on here and suggest adding some sort of wrapper that could collect more specific context (sampled arguments? sizes of input/output collections?). When a crash occurs, a lot of this sort of stuff gets collected, but not when things just get unusually slow.

Also, integrating metrics into issues. If an issue occurs under a specific view, can you point me at recent profiles and collected metrics to see how it behaves for successful/failed cases? In breadcrumbs, how long do queries take, and how much data did they return? Can I go from breadcrumb metrics into a graph of whatever-you-can-show-me? Can Sentry itself bump the sampling rate on a given code path for more detailed data collection if a completely new issue starts rarely occurring, without my intervention? I usually don't know what I'm looking for, just that there are hints in whatever data is collected around here. A lot of this isn't very organized and probably doesn't belong in a "metrics" solution, but that's what I'm thinking about here.
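The "ask the past profiles" idea in the previous comment could be approximated offline by grouping sampled durations by caller. A minimal sketch with made-up data (the sample shape is invented, not a Sentry profile format):

```javascript
// Sketch: estimate per-caller p95 duration of one function from sampled
// profile data. The input records are hypothetical, for illustration only.
function p95ByCaller(samples) {
  const byCaller = new Map();
  for (const { caller, durationMs } of samples) {
    if (!byCaller.has(caller)) byCaller.set(caller, []);
    byCaller.get(caller).push(durationMs);
  }
  const result = {};
  for (const [caller, durations] of byCaller) {
    durations.sort((a, b) => a - b);
    // Nearest-rank p95 index, clamped to the last element.
    const idx = Math.min(durations.length - 1, Math.ceil(durations.length * 0.95) - 1);
    result[caller] = durations[idx];
  }
  return result;
}

// The slow ViewDocuments path stands out immediately in the grouped view.
const stats = p95ByCaller([
  { caller: "ViewUser", durationMs: 0.5 },
  { caller: "ViewUser", durationMs: 0.65 },
  { caller: "ViewDocuments", durationMs: 792 },
]);
```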
-
I would like to track the usage of external integrations like CloudConvert etc. I use Laravel for all my applications, so being able to bump a number in Sentry would be really cool.
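"Bumping a number" is essentially a counter metric. A tiny sketch of the shape such an API could take (shown in JavaScript for illustration, though this commenter uses Laravel/PHP; nothing here is a real Sentry SDK call):

```javascript
// Sketch: count external-integration calls (e.g. CloudConvert jobs) with a
// hypothetical counter client. A real SDK would buffer and ship these.
class MetricCounters {
  constructor() {
    this.counts = new Map();
  }
  increment(name, by = 1) {
    this.counts.set(name, (this.counts.get(name) ?? 0) + by);
  }
  flush() {
    // In a real integration this would send the counts to the backend.
    return Object.fromEntries(this.counts);
  }
}

const metrics = new MetricCounters();
metrics.increment("cloudconvert.jobs"); // one bump per conversion started
metrics.increment("cloudconvert.jobs");
const flushed = metrics.flush();
```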
-
One thing to keep in mind is: which metrics do we need, and how do we use them properly and usefully?
That brings us to some criteria:
But there are risks:
So, what metrics would be useful?
-
Having watched the preview of this feature in the launch week, I think it would be very useful to allow users to define alerts/notifications for when something happens to a metric. For example, if I have a numeric metric that I would not want to drop below a certain value, but it does, I would like Sentry to notify me as it does with exceptions. It would be even better if Sentry could detect changes in a metric's usual pattern and alert you about it, like AWS CloudWatch does.
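The two checks described above (a hard threshold and a deviation from the usual pattern) can be sketched over a metric series. This is hypothetical logic, not a Sentry feature; the z-score cutoff is an arbitrary stand-in for real anomaly detection:

```javascript
// Sketch: evaluate the latest point of a metric series against a minimum
// threshold and a crude z-score anomaly check. Names are hypothetical.
function checkMetric(series, { min, zThreshold = 3 }) {
  const latest = series[series.length - 1];
  const alerts = [];
  if (min !== undefined && latest < min) {
    alerts.push(`value ${latest} dropped below minimum ${min}`);
  }
  const mean = series.reduce((a, b) => a + b, 0) / series.length;
  const std = Math.sqrt(
    series.reduce((a, b) => a + (b - mean) ** 2, 0) / series.length
  );
  if (std > 0 && Math.abs(latest - mean) / std > zThreshold) {
    alerts.push(`value ${latest} deviates from the usual pattern`);
  }
  return alerts;
}

// A metric that suddenly drops below its configured floor fires the alert.
const alerts = checkMetric([50, 52, 49, 51, 10], { min: 40 });
```

A real pattern-change detector (as in CloudWatch anomaly detection) would model seasonality rather than use a single z-score, but the alert plumbing would look similar.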
-
In addition to what you've described here (metrics closely correlated w/ other event details to be able to holistically diagnose what's happening), something I'd be very interested in is just simple metrics that we can send from ~100% of requests. For example, it would be very nice to get an accurate idea of throughput and transaction durations, and I'd love to send that data from 100% of requests while still sampling at just a couple percent for full transactions or profile data. It's not currently possible to do that in any form in the product, is it?
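The split being asked for above can be sketched as: record a cheap duration metric on every request, and only promote a small sampled fraction to full transactions. Everything here is hypothetical plumbing, not current SDK behavior:

```javascript
// Sketch: lightweight metrics from 100% of requests, full traces from ~2%.
// recordMetric/startTransaction are hypothetical callbacks, not Sentry APIs.
const TRACES_SAMPLE_RATE = 0.02;

function handleRequest(recordMetric, startTransaction, rng) {
  const start = Date.now();
  const traced = rng() < TRACES_SAMPLE_RATE; // only a few get full traces
  if (traced) startTransaction();
  // ... actual request handling would happen here ...
  recordMetric("request.duration_ms", Date.now() - start); // every request
  return traced;
}

// Deterministic "random" source so the sampling outcome is repeatable here.
let tick = 0;
const rng = () => (tick++ % 100) / 100;

let metricCount = 0;
let traceCount = 0;
for (let n = 0; n < 1000; n++) {
  handleRequest(() => metricCount++, () => traceCount++, rng);
}
// All 1000 requests produce a metric; only ~2% produce a transaction.
```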
-
I'd also like to be able to fully work with my own metrics: conveniently send them to Sentry and work with them as Sentry metrics. Something like:
Sentry.sendMetrics({
  customMetric1: {
    value: 1,
    unit: "second"
  },
  // ...more metrics
});
-
I've been looking at implementing DDM as an experimental feature in the .NET SDK. At first glance, this looks really similar to OpenTelemetry Metrics. Has any evaluation already been done on whether it's better to build this out as a proprietary Sentry thing or to build on top of OpenTelemetry Metrics? I did some (very preliminary) head scratching about how OpenTelemetry might be used to implement this in the .NET SDK (see the description on getsentry/sentry-dotnet#2880) but didn't want to burn too many calories on this if it was already evaluated as part of the pilot implementation for Python or the PHP/JavaScript work that's already underway.
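For readers unfamiliar with it, the OpenTelemetry Metrics API being compared here follows a getMeter → createCounter → add shape. The shim below mimics that surface without depending on the real OTel packages, purely to illustrate what a metrics layer could map onto (the real API lives in @opentelemetry/api and differs in detail):

```javascript
// Sketch: a stand-alone shim of the OpenTelemetry-style Meter/Counter shape.
// This is not the real OTel implementation; it only illustrates the API
// surface a Sentry metrics feature might build on top of.
class Counter {
  constructor(name) {
    this.name = name;
    this.points = [];
  }
  add(value, attributes = {}) {
    this.points.push({ value, attributes });
  }
}

class Meter {
  constructor(name) {
    this.name = name;
    this.instruments = new Map();
  }
  createCounter(name) {
    const counter = new Counter(name);
    this.instruments.set(name, counter);
    return counter;
  }
}

// Usage mirrors the OTel pattern: obtain a meter, create an instrument,
// record measurements with attributes (tags).
const meter = new Meter("sentry-dotnet-experiment");
const requests = meter.createCounter("http.server.requests");
requests.add(1, { route: "/users" });
requests.add(1, { route: "/users" });
const total = requests.points.reduce((sum, p) => sum + p.value, 0);
```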
-
Hi, everyone!
I’m Alexandra from the product team here at Sentry, and I’m very excited to say that we are building an application metrics solution for developers. There are many metrics tools out there, but they are often not created for developers or with their specific needs in mind, and we have some cool ideas to build something better.
Here’s the core premise: generic metrics are not ideal for developers, and context/correlation is key.
Every day at Sentry, we look at metrics for our own use, but sometimes they leave us hanging. We need context and connected signals, like traces, spans, or releases, to find the root cause and solve a problem. We also found that this context is needed throughout the stack, from the backend to the frontend to a mobile app. Today, we use many different tools and jump between them to find the issue.
We believe this to be an issue you, our users, are facing as well.
This is how we imagine it to work:
It could look like this completely made-up mockup:
Are we on the right track?
As we explore this new venture, your input is invaluable. We would love to understand better what you’re currently using, what challenges you face, and how Sentry could help you solve them.
Here are a few questions to get us started:
Thanks for taking the time to contribute. We’re very excited to hear your ideas!
P.S. We’re aiming to start an early access phase soon, so stay tuned for more news to come 🚀
This is also a living document. Subscribe for updates as we continue our work.