Add container tags hash to DBM queries (if enabled) #8061
base: vandonr/process3
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fd01fab6f
else
{
    // PropagateDataViaComment (service) - this injects various trace information as a comment in the query
    if (tracer.Settings.InjectSqlBasehash && !string.IsNullOrEmpty(baseHash))
    {
        tags.BaseHash = baseHash;
Set BaseHash even when DBM comment already present
This new BaseHash tagging only happens in the else branch when the command text is not already DBM-injected. In the cached‑command scenario (or when users pre‑inject DBM comments), alreadyInjected is true, so _dd.propagated_hash is never set on subsequent spans even though the query still carries ddsh in the SQL comment. If DBM looks up container tags by scanning recent spans for that hash, later queries can’t be enriched once the first span ages out. Consider setting tags.BaseHash whenever the feature is enabled (and baseHash is non‑empty), regardless of the alreadyInjected branch.
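For illustration only, a minimal sketch of what that suggestion could look like, reusing the names from the snippet above (the surrounding alreadyInjected structure is assumed here, this is not the PR's actual code):

// Tag the span whenever the feature is enabled, independently of whether
// the DBM comment was already present in the command text.
if (tracer.Settings.InjectSqlBasehash && !string.IsNullOrEmpty(baseHash))
{
    tags.BaseHash = baseHash;
}

if (!alreadyInjected)
{
    // PropagateDataViaComment (service) - existing comment-injection logic unchanged
}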
hmm, yes, that's an interesting point, but I'm not sure we care: we only need one span with the hash to get the values, so we don't really need to tag all spans. I think in practice it works well like this.
Benchmarks

Benchmark execution time: 2026-01-14 16:02:15
Comparing candidate commit 4fd01fa in PR branch.
Some scenarios are present only in baseline or only in candidate runs. If you didn't create or remove some scenarios in your branch, this may be a sign of crashed benchmarks 💥💥💥
Scenarios present only in baseline:

Found 13 performance improvements and 11 performance regressions! Performance is the same for 153 metrics, 9 unstable metrics.

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net472
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net6.0
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net6.0
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch netcoreapp3.1
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1
scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1
scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes netcoreapp3.1
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0
I just realized I need to put process tags in there too
bouwkast
left a comment
I think the main question I have is that this appears to correctly follow the RFC in how we propagate the hash, but the merged Python implementation recomputes the hash.
However, the RFC isn't precise enough in describing the hash and the expected behavior / requirements for me to know which is really correct.
if (!string.IsNullOrEmpty(baseHash))
{
    propagatorStringBuilder.Append(',').Append(SqlCommentBaseHash).Append("='").Append(Uri.EscapeDataString(baseHash)).Append('\'');
}
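For reference, with this change the propagated SQL comment would carry the hash under the ddsh key next to the usual DBM keys, roughly like this (the key set and values are purely illustrative, not taken from the PR):

/*dddbs='orders-db',dde='prod',ddps='orders-api',ddpv='1.2.3',ddsh='a1b2c3d4e5f60718'*/ SELECT * FROM orders WHERE id = @id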
If I understand the Python implementation correctly they appear to be recomputing the hash:
https://github.com/DataDog/dd-trace-py/blob/b5008005cd637a07e5c249f8d60ed07ee8328dc5/ddtrace/internal/process_tags/__init__.py#L92-L94
I think, based on the linked document, that the method here of just passing in the hash as given from the agent is the correct approach, but seeing that Python has been merged I'm wondering if the approach has been changed outside of the RFC? If not, I think the Python implementation must not work.
From the RFC:
The agent would compute a hash on low cardinality container tags that can be used to identify a workload
The agent will then propagate that hash to the tracing libraries that can then use it to complete outbound communications with container information
Could we validate this?
the hash is recomputed here because it's only injected from the instrumentation site, where we get it fresh from the global instance, so if the global instance changes, we'd pick up the latest one.
However, here I just implemented container tags, but I realized as you were reviewing that I need to add process tags in there too (like in Python), so there will be some computation to do.
basically, the agent computes a hash that's ready to use based on container tags, but we also need to pack process tags into that hash, which is why a re-hashing step is then needed in the tracer
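As a rough sketch of that re-hashing step (the hash function, helper names, and tag format below are hypothetical; the real implementation would have to match the Python reference):

using System.Text;

internal static class WorkloadHash
{
    // Fold the process tags into the agent-provided container-tags hash so the
    // propagated value identifies both (FNV-1a is used here purely for illustration).
    public static string Combine(string agentContainerHash, string processTags)
    {
        ulong hash = 14695981039346656037UL;
        foreach (var b in Encoding.UTF8.GetBytes(agentContainerHash + "," + processTags))
        {
            hash ^= b;
            hash *= 1099511628211UL;
        }

        return hash.ToString("x16");
    }
}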
Okay 👍
I also noticed we were putting it as a string, compared to Python, but also noticed that the RFC actually seems to point to that Python PR as being the source of truth, so I think that answers both this comment and the one I just left here.
public string DbmTraceInjected { get; set; }

[Tag(Tags.BaseHash)]
public string BaseHash { get; set; }
Just noticing another discrepancy between .NET / Python
.NET is adding the hash to the meta as a string
Python is adding the hash to the metrics as an int (or whatever python uses 🤷 )
https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/propagation/_database_monitoring.py#L87-L88
I don't think the RFC clarifies exactly what is expected here though:
A tag (_dd.propagated_hash) should be set on the current span when injection happens in order to correlate the container hash set on the query with the spans, and retrieve the container tags from there. The value of the tag should be the back propagated container tags hash, see PR
Wait, that comment on the RFC points back to the Python PR as the source of truth, so is it really the case that the Python PR is the source of truth here and not the RFC?
yes, it is really that. I agree that this is not great, but that's the world we live in. Even worse, the "real spec" used to be the Java implementation, but then, as the Python implementation was written, it turned out that there were some holes in it, so now the reference is the Python code 🙃
I'll double-check the type of the tag, because it's important that we get all the digits, and I'm not 100% sure metric-type tags do that.
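One concrete reason to double-check: span metrics are stored as IEEE 754 doubles, which only carry 53 bits of integer precision, so a full 64-bit hash round-tripped through a numeric metric can silently lose its low-order bits. A quick illustration (not taken from the PR):

ulong hash = 0x123456789ABCDEF7UL;           // a hash value needing more than 53 bits
double asMetric = hash;                      // what a numeric (metrics) tag would store
Console.WriteLine(hash == (ulong)asMetric);  // False: the low-order bits were rounded away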
Summary of changes
Add the ability to write the container tags hash to DBM queries + to the related span.
The goal is that DBM would then query the spans bearing that hash, and use the container tags on that span (or those spans) to enrich the queries.
This is controlled by a setting that is disabled by default, and would be enabled if the propagation mode is "service" or greater.
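As a sketch of how that default could be derived (the names below are assumptions for illustration, not necessarily the actual configuration surface):

// Hypothetical names for illustration only.
enum DbmPropagationLevel { Disabled, Service, Full }

static bool ShouldInjectSqlBaseHash(bool? explicitSetting, DbmPropagationLevel mode)
    => explicitSetting ?? mode >= DbmPropagationLevel.Service;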
see RFC: https://docs.google.com/document/d/15GtNOKGBCt6Dc-HsDNnMmCdZwhewFQx8yUlI9in5n3M
related PR in python: DataDog/dd-trace-py#15293
Reason for change
Implementation details
Test coverage
Adding a test in DbScopeFactoryTests.cs forced me to inject the value from pretty high up, which I find a bit "dirty", but at least we don't have to rely on a global static instance in tests.
Other details