Commit c971199

andrewlock and bouwkast authored
Create System.Diagnostics.Metrics-based runtime metrics listener (#8027)
## Summary of changes

Creates a .NET 6+ only implementation of `IRuntimeMetricsListener` that uses the `System.Diagnostics.Metrics` (and other) APIs.

## Reason for change

.NET Core (probably all versions, but at least .NET 6+) has a memory leak with the event pipes, which means that if we enable runtime metrics, we likely have a slow memory leak 😬

[This was raised ~1 year ago with the .NET team](dotnet/runtime#111368), specifically citing dd-trace-dotnet, but it doesn't have a fix yet. A PR with a tentative fix has also been open on the .NET repo for ~2 months, so _at best_ this _might_ be fixed in .NET 11.

Separately, the `System.Diagnostics.Metrics` APIs were introduced in .NET 6, with support for aspnetcore-based metrics added in .NET 8, and support for "runtime" metrics in .NET 9. This PR introduces a new (experimental for now) `IRuntimeMetricsListener` implementation that doesn't use `EventListener` and instead uses the `System.Diagnostics.Metrics` APIs, aiming to provide essentially the same runtime metrics we currently do, just from a different source.

## Implementation details

- Created a new `IRuntimeMetricsListener` implementation, `DiagnosticsMetricsRuntimeMetricsListener`
- Added a config to enable it in .NET 6+ only, `DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED`
  - Open to suggestions here. Other options include having an "enum" type for the listener instead of just this one flag. That's harder for customers to consume, but theoretically more extensible.
- Added tests

To give as wide compatibility as possible, and to avoid any additional overhead, whenever the built-in runtime metrics use existing APIs (e.g. via `GC` calls), we use those APIs instead of the metrics.
In summary:

Thread metrics:

- `runtime.dotnet.threads.workers_count`: via `ThreadPool.ThreadCount` (same as `RuntimeEventListener`)
- `runtime.dotnet.threads.contention_count`: via `Monitor.LockContentionCount`

GC metrics:

- `runtime.dotnet.gc.size.gen#`: from the info in `GC.GetGCMemoryInfo()`, which mirrors [the built-in approach](https://github.com/dotnet/runtime/blob/v10.0.1/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs#L185).
- `runtime.dotnet.gc.memory_load`: this was a tricky one, as the built-in approach uses a new API, but I think the info we get from `GC.GetGCMemoryInfo()` is broadly good enough.
- `runtime.dotnet.gc.count.gen#`: uses `GC.CollectionCount()`, same as [the built-in approach](https://github.com/dotnet/runtime/blob/v10.0.1/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs#L159).
- `runtime.dotnet.gc.pause_time`: this was also a tricky one; more on it below.

`runtime.dotnet.gc.pause_time` is a runtime metric that's available in .NET 9, so when we're running on .NET 9 we just use that value. There's also a public API introduced in .NET 6, `GC.GetTotalPauseDuration()`, but [it's _only_ available from 6.0.21](dotnet/runtime#87143), so we can't reference it directly. Instead, we use a simple `CreateDelegate` call to invoke it in these cases. We could use duck typing, but it didn't seem worth it. If we're running on a version earlier than 6.0.21, there's no feasible way to get the value, so we just don't emit it.
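As a rough illustration of the "use existing APIs where possible" approach described above, the following C# sketch samples the thread and GC values directly from the runtime, assuming .NET 6+. `RuntimeApiSampler` and its `Snapshot` record are hypothetical names, not the listener's actual code, and the memory-load percentage formula is an assumption. `GC.GetTotalPauseDuration()` is looked up by reflection and bound with `Delegate.CreateDelegate`, so the sketch also runs on runtimes where that method is not public (it then reports `null`).

```csharp
#nullable enable
using System;
using System.Reflection;
using System.Threading;

// Hypothetical sampler (not dd-trace-dotnet's code) that reads thread and GC values
// from existing runtime APIs instead of from System.Diagnostics.Metrics instruments.
public static class RuntimeApiSampler
{
    public sealed record Snapshot(
        int ThreadPoolThreadCount,
        long LockContentionCount,
        int[] CollectionCounts,
        long[] GenerationSizesAfterGc,
        double MemoryLoadPercent,
        TimeSpan? TotalPauseDuration);

    // GC.GetTotalPauseDuration() is only public from 6.0.21, so it can't be referenced
    // directly when compiling against net6.0; bind to it at runtime if it exists.
    private static readonly Func<TimeSpan>? TotalPauseDurationGetter = CreatePauseDurationGetter();

    private static Func<TimeSpan>? CreatePauseDurationGetter()
    {
        MethodInfo? method = typeof(GC).GetMethod("GetTotalPauseDuration", Type.EmptyTypes);
        return method is null
            ? null
            : (Func<TimeSpan>)Delegate.CreateDelegate(typeof(Func<TimeSpan>), method);
    }

    public static Snapshot Sample()
    {
        // GC.CollectionCount(gen) is cumulative since process start.
        var collectionCounts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen < collectionCounts.Length; gen++)
        {
            collectionCounts[gen] = GC.CollectionCount(gen);
        }

        // Details of the most recent GC; GenerationInfo has one entry per generation/heap.
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        ReadOnlySpan<GCGenerationInfo> generations = info.GenerationInfo;
        var sizesAfterGc = new long[generations.Length];
        for (int i = 0; i < generations.Length; i++)
        {
            sizesAfterGc[i] = generations[i].SizeAfterBytes;
        }

        // Assumption: express memory load as a percentage of the memory available to the GC.
        double memoryLoadPercent = info.TotalAvailableMemoryBytes > 0
            ? 100.0 * info.MemoryLoadBytes / info.TotalAvailableMemoryBytes
            : 0;

        return new Snapshot(
            ThreadPool.ThreadCount,
            Monitor.LockContentionCount,
            collectionCounts,
            sizesAfterGc,
            memoryLoadPercent,
            TotalPauseDurationGetter?.Invoke());
    }
}
```

Calling `RuntimeApiSampler.Sample()` once per collection interval returns the raw values; mapping them onto the `runtime.dotnet.*` metric names and sending them to DogStatsD is left out of the sketch.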
ASP.NET Core metrics:

- `runtime.dotnet.aspnetcore.requests.current`
- `runtime.dotnet.aspnetcore.requests.failed`
- `runtime.dotnet.aspnetcore.requests.total`
- `runtime.dotnet.aspnetcore.requests.queue_length`
- `runtime.dotnet.aspnetcore.connections.current`
- `runtime.dotnet.aspnetcore.connections.queue_length`
- `runtime.dotnet.aspnetcore.connections.total`

Note that the `.total` and `.failed` request metrics are recorded as _gauges_ (which increase monotonically). That doesn't feel right to me (surely they should be counters), but it's what `RuntimeEventListener` uses, so we have to stick with it: metric types are global per metric, so we can't change them. It means there's a risk of overflow, but that's already the case for `RuntimeEventListener`, so I guess we just ignore it 🤷‍♂️

I couldn't find a way to get the following metric at all without using `EventListener`:

- `runtime.dotnet.threads.contention_time`

## Test coverage

Added unit and integration tests for the listener behavior.

I also manually ran an aspnetcore app in a loop with both the `RuntimeEventListener` and the new listener producing metrics (hacked in; we won't ever do this in "normal" execution), and manually compared the metrics. Overall, the values were in broad agreement (slightly off, due to skew in sampling time), and the comparison helped identify some cases where I'd made incorrect assumptions (e.g. aspnetcore `.total` metrics are never "reset" to 0).

## Other details

Relates to:

- #5862 (comment)
- dotnet/runtime#111368
- dotnet/runtime#118415
- https://datadoghq.atlassian.net/browse/LANGPLAT-916

---------

Co-authored-by: Steven Bouwkamp <steven.bouwkamp@datadoghq.com>
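The ASP.NET Core metrics listed in the commit message come from meters published through `System.Diagnostics.Metrics`, which an in-process consumer typically reads with a `MeterListener`. The sketch below is a hypothetical `AggregatingMeterListener`, not the tracer's implementation, assuming .NET 6+. It enables every non-histogram instrument on the named meters, sums the increments reported by non-observable counters, and keeps the latest value reported by observable instruments, which only report when `RecordObservableInstruments()` is called. The meter names a real caller would pass (likely `Microsoft.AspNetCore.Hosting` and `Microsoft.AspNetCore.Server.Kestrel` on .NET 8+, and `System.Runtime` for the .NET 9 runtime metrics) are assumptions to check against the .NET built-in metrics documentation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical listener, not dd-trace-dotnet's implementation: aggregates the
// measurements of every non-histogram instrument published by the given meters.
public sealed class AggregatingMeterListener : IDisposable
{
    private readonly MeterListener _listener = new MeterListener();
    private readonly ConcurrentDictionary<string, double> _values = new ConcurrentDictionary<string, double>();

    public AggregatingMeterListener(params string[] meterNames)
    {
        var wanted = new HashSet<string>(meterNames, StringComparer.Ordinal);

        // Invoked for instruments that already exist when Start() is called and for any created later.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            Type type = instrument.GetType();
            bool isHistogram = type.IsGenericType && type.GetGenericTypeDefinition() == typeof(Histogram<>);
            if (wanted.Contains(instrument.Meter.Name) && !isHistogram)
            {
                listener.EnableMeasurementEvents(instrument);
            }
        };

        // One callback per numeric type; measurements of other types (e.g. float) are ignored here.
        _listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) => OnMeasurement(instrument, value));
        _listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => OnMeasurement(instrument, value));
        _listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) => OnMeasurement(instrument, value));
        _listener.Start();
    }

    // Observable instruments only report values when polled.
    public void Poll() => _listener.RecordObservableInstruments();

    public bool TryGetValue(string instrumentName, out double value) => _values.TryGetValue(instrumentName, out value);

    public void Dispose() => _listener.Dispose();

    private void OnMeasurement(Instrument instrument, double value)
    {
        if (instrument.IsObservable)
        {
            // Observable gauges/counters report a current (or cumulative) value: keep the latest.
            // Tagged measurements from the same instrument overwrite each other in this sketch.
            _values[instrument.Name] = value;
        }
        else
        {
            // Assumes non-observable instruments are counters whose measurements are increments.
            // AddOrUpdate retries until its update applies, so concurrent increments are not lost.
            _values.AddOrUpdate(instrument.Name, value, (_, current) => current + value);
        }
    }
}
```

The split between "sum the deltas" and "keep the latest value" mirrors the two kinds of instruments: `Counter<T>.Add` reports increments as they happen, whereas observable instruments are sampled on demand, which suits a listener that reports on a fixed interval.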
1 parent 243f2c8 commit c971199

17 files changed (+673 / -15 lines)

tracer/src/Datadog.Trace.Trimming/build/Datadog.Trace.Trimming.xml

Lines changed: 1 addition & 0 deletions
```diff
@@ -614,6 +614,7 @@
 <type fullname="System.Func`5" />
 <type fullname="System.Func`6" />
 <type fullname="System.GC" />
+<type fullname="System.GCGenerationInfo" />
 <type fullname="System.GCMemoryInfo" />
 <type fullname="System.Globalization.CultureInfo" />
 <type fullname="System.Globalization.DateTimeStyles" />
```

tracer/src/Datadog.Trace/Configuration/TracerSettings.cs

Lines changed: 19 additions & 0 deletions
```diff
@@ -179,6 +179,16 @@ not null when string.Equals(value, "otlp", StringComparison.OrdinalIgnoreCase) =
 
             RuntimeMetricsEnabled = runtimeMetricsEnabledResult.WithDefault(false);
 
+            RuntimeMetricsDiagnosticsMetricsApiEnabled = config.WithKeys(ConfigurationKeys.RuntimeMetricsDiagnosticsMetricsApiEnabled).AsBool(false);
+
+#if !NET6_0_OR_GREATER
+            if (RuntimeMetricsEnabled && RuntimeMetricsDiagnosticsMetricsApiEnabled)
+            {
+                Log.Warning(
+                    $"{ConfigurationKeys.RuntimeMetricsDiagnosticsMetricsApiEnabled} was enabled, but System.Diagnostics.Metrics is only available on .NET 6+. Using standard runtime metrics collector.");
+                telemetry.Record(ConfigurationKeys.RuntimeMetricsDiagnosticsMetricsApiEnabled, false, ConfigurationOrigins.Calculated);
+            }
+#endif
             OtelMetricExportIntervalMs = config
                 .WithKeys(ConfigurationKeys.OpenTelemetry.MetricExportIntervalMs)
                 .AsInt32(defaultValue: 10_000);
@@ -1053,6 +1063,15 @@ not null when string.Equals(value, "otlp", StringComparison.OrdinalIgnoreCase) =
         /// </summary>
         internal bool RuntimeMetricsEnabled { get; }
 
+        /// <summary>
+        /// Gets a value indicating whether the experimental runtime metrics collector which uses the
+        /// <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+        /// This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+        /// when using .NET 8+.
+        /// </summary>
+        /// <seealso cref="ConfigurationKeys.RuntimeMetricsDiagnosticsMetricsApiEnabled"/>
+        internal bool RuntimeMetricsDiagnosticsMetricsApiEnabled { get; }
+
         /// <summary>
         /// Gets a value indicating whether libdatadog data pipeline
         /// is enabled.
```
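One way to read the settings change above: the new flag presumably only takes effect when runtime metrics are enabled and the tracer is running on .NET 6+; otherwise the standard collector is kept, matching the warning logged on older targets. The sketch below encodes that decision as a pure function. `RuntimeMetricsCollector` and `RuntimeMetricsCollectorSelector.SelectCollector` are hypothetical names rather than the tracer's actual factory, and the .NET 6+ check is passed in as a parameter (the real code uses `#if !NET6_0_OR_GREATER`) so the rule can be exercised on any runtime.

```csharp
// Hypothetical names: illustrates the selection rule, not the tracer's actual factory.
public enum RuntimeMetricsCollector
{
    None,
    Standard,           // the existing collector (e.g. RuntimeEventListener on .NET Core)
    DiagnosticsMetrics, // the experimental System.Diagnostics.Metrics-based collector
}

public static class RuntimeMetricsCollectorSelector
{
    public static RuntimeMetricsCollector SelectCollector(
        bool runtimeMetricsEnabled,
        bool diagnosticsMetricsApiEnabled,
        bool isNet6OrGreater)
    {
        // DD_RUNTIME_METRICS_ENABLED gates everything.
        if (!runtimeMetricsEnabled)
        {
            return RuntimeMetricsCollector.None;
        }

        // DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED is ignored before .NET 6,
        // where the tracer logs a warning and keeps the standard collector.
        return diagnosticsMetricsApiEnabled && isNet6OrGreater
            ? RuntimeMetricsCollector.DiagnosticsMetrics
            : RuntimeMetricsCollector.Standard;
    }
}
```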

tracer/src/Datadog.Trace/Configuration/supported-configurations-docs.yaml

Lines changed: 7 additions & 0 deletions
```diff
@@ -555,6 +555,13 @@ DD_RUNTIME_METRICS_ENABLED: |
   Configuration key for enabling or disabling runtime metrics sent to DogStatsD.
   Default value is <c>false</c> (disabled).
 
+DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED: |
+  Enables an experimental runtime metrics collector which uses the
+  <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+  This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+  when using .NET 8+.
+  Default value is <c>false</c> (disabled).
+
 DD_SERVICE: |
   Configuration key for the application's default service name.
   Used as the service name for top-level spans,
```

tracer/src/Datadog.Trace/Configuration/supported-configurations.json

Lines changed: 5 additions & 0 deletions
```diff
@@ -868,6 +868,11 @@
       "A"
     ]
   },
+  "DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED": {
+    "version": [
+      "A"
+    ]
+  },
   "DD_SERVICE": {
     "version": [
       "A"
```

tracer/src/Datadog.Trace/Generated/net461/Datadog.Trace.SourceGenerators/ConfigurationKeysGenerator/ConfigurationKeys.g.cs

Lines changed: 9 additions & 0 deletions
```diff
@@ -215,6 +215,15 @@ internal static partial class ConfigurationKeys
         [System.Obsolete("This parameter is obsolete and should be replaced by `DD_TRACE_RATE_LIMIT`")]
         public const string MaxTracesSubmittedPerSecond = "DD_MAX_TRACES_PER_SECOND";
 
+        /// <summary>
+        /// Enables an experimental runtime metrics collector which uses the
+        /// <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+        /// This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+        /// when using .NET 8+.
+        /// Default value is <c>false</c> (disabled).
+        /// </summary>
+        public const string RuntimeMetricsDiagnosticsMetricsApiEnabled = "DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED";
+
         /// <summary>
         /// Configuration key for enabling or disabling runtime metrics sent to DogStatsD.
         /// Default value is <c>false</c> (disabled).
```

tracer/src/Datadog.Trace/Generated/net6.0/Datadog.Trace.SourceGenerators/ConfigurationKeysGenerator/ConfigurationKeys.g.cs

Lines changed: 9 additions & 0 deletions
```diff
@@ -215,6 +215,15 @@ internal static partial class ConfigurationKeys
         [System.Obsolete("This parameter is obsolete and should be replaced by `DD_TRACE_RATE_LIMIT`")]
         public const string MaxTracesSubmittedPerSecond = "DD_MAX_TRACES_PER_SECOND";
 
+        /// <summary>
+        /// Enables an experimental runtime metrics collector which uses the
+        /// <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+        /// This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+        /// when using .NET 8+.
+        /// Default value is <c>false</c> (disabled).
+        /// </summary>
+        public const string RuntimeMetricsDiagnosticsMetricsApiEnabled = "DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED";
+
         /// <summary>
         /// Configuration key for enabling or disabling runtime metrics sent to DogStatsD.
         /// Default value is <c>false</c> (disabled).
```

tracer/src/Datadog.Trace/Generated/netcoreapp3.1/Datadog.Trace.SourceGenerators/ConfigurationKeysGenerator/ConfigurationKeys.g.cs

Lines changed: 9 additions & 0 deletions
```diff
@@ -215,6 +215,15 @@ internal static partial class ConfigurationKeys
         [System.Obsolete("This parameter is obsolete and should be replaced by `DD_TRACE_RATE_LIMIT`")]
         public const string MaxTracesSubmittedPerSecond = "DD_MAX_TRACES_PER_SECOND";
 
+        /// <summary>
+        /// Enables an experimental runtime metrics collector which uses the
+        /// <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+        /// This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+        /// when using .NET 8+.
+        /// Default value is <c>false</c> (disabled).
+        /// </summary>
+        public const string RuntimeMetricsDiagnosticsMetricsApiEnabled = "DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED";
+
         /// <summary>
         /// Configuration key for enabling or disabling runtime metrics sent to DogStatsD.
         /// Default value is <c>false</c> (disabled).
```

tracer/src/Datadog.Trace/Generated/netstandard2.0/Datadog.Trace.SourceGenerators/ConfigurationKeysGenerator/ConfigurationKeys.g.cs

Lines changed: 9 additions & 0 deletions
```diff
@@ -215,6 +215,15 @@ internal static partial class ConfigurationKeys
         [System.Obsolete("This parameter is obsolete and should be replaced by `DD_TRACE_RATE_LIMIT`")]
         public const string MaxTracesSubmittedPerSecond = "DD_MAX_TRACES_PER_SECOND";
 
+        /// <summary>
+        /// Enables an experimental runtime metrics collector which uses the
+        /// <a href="https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics">System.Diagnostics.Metrics</a> API.
+        /// This collector can only be enabled when using .NET 6+, and will only include ASP.NET Core metrics
+        /// when using .NET 8+.
+        /// Default value is <c>false</c> (disabled).
+        /// </summary>
+        public const string RuntimeMetricsDiagnosticsMetricsApiEnabled = "DD_RUNTIME_METRICS_DIAGNOSTICS_METRICS_API_ENABLED";
+
         /// <summary>
         /// Configuration key for enabling or disabling runtime metrics sent to DogStatsD.
         /// Default value is <c>false</c> (disabled).
```
