Skip to content

Conversation

Manik2708
Copy link
Contributor

Which problem is this PR solving?

Description of the changes

  • Currently it is impossible to record scope information completely as there is no way to store scope attributes in ES DB Span. Therefore it is necessary to add a scope field in db span.

How was this change tested?

  • Unit And integration tests

Checklist

@Manik2708 Manik2708 requested a review from a team as a code owner April 16, 2025 17:21
@Manik2708 Manik2708 requested a review from mahadzaryab1 April 16, 2025 17:21
Copy link

codecov bot commented Apr 16, 2025

Codecov Report

❌ Patch coverage is 95.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.09%. Comparing base (734e1be) to head (3ddaf4b).
⚠️ Report is 280 commits behind head on main.

Files with missing lines Patch % Lines
...torage/v2/elasticsearch/tracestore/from_dbmodel.go 92.68% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7039      +/-   ##
==========================================
+ Coverage   96.04%   96.09%   +0.04%     
==========================================
  Files         352      352              
  Lines       20821    20820       -1     
==========================================
+ Hits        19998    20007       +9     
+ Misses        620      613       -7     
+ Partials      203      200       -3     
Flag Coverage Δ
badger_v1 9.96% <0.00%> (-0.01%) ⬇️
badger_v2 2.07% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v1-manual 14.99% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v2-auto 2.06% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v2-manual 2.06% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v1-manual 14.99% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v2-auto 2.06% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v2-manual 2.06% <0.00%> (-0.01%) ⬇️
elasticsearch-6.x-v1 19.98% <100.00%> (+0.03%) ⬆️
elasticsearch-7.x-v1 20.06% <100.00%> (+0.03%) ⬆️
elasticsearch-8.x-v1 20.23% <100.00%> (+0.03%) ⬆️
elasticsearch-8.x-v2 2.07% <0.00%> (-0.01%) ⬇️
grpc_v1 11.51% <0.00%> (-0.01%) ⬇️
grpc_v2 9.14% <0.00%> (-0.01%) ⬇️
kafka-3.x-v1 10.24% <0.00%> (-0.01%) ⬇️
kafka-3.x-v2 2.07% <0.00%> (-0.01%) ⬇️
memory_v2 2.07% <0.00%> (-0.01%) ⬇️
opensearch-1.x-v1 20.11% <100.00%> (+0.03%) ⬆️
opensearch-2.x-v1 20.11% <100.00%> (+0.03%) ⬆️
opensearch-2.x-v2 2.07% <0.00%> (-0.01%) ⬇️
tailsampling-processor 0.56% <0.00%> (-0.01%) ⬇️
unittests 94.85% <95.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Name string `json:"name,omitempty"`
Version string `json:"version,omitempty"`
Tags []KeyValue `json:"tags,omitempty"`
Tag map[string]any `json:"tag,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we add another field TagType to capture the type of value and avoid data loss? We'd want to do that for all 3 flavors of Tag field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it in the DotReplacement PR as tag map handling is taking place there!

operationNameField = "operationName"
objectTagsField = "tag"
objectProcessTagsField = "process.tag"
objectScopeTagsField = "scope.tag"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this is the only change in the reader? Doesn't it need to alter queries sent to ES to look in the new fields?

And how will it work if the ES template doesn't have those fields (legacy data)? Will the query fail or just not find anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this is the only change in the reader? Doesn't it need to alter queries sent to ES to look in the new fields?

The changes are automatically captured by the array to reflect in the query! Query looks for all the tags therefore we just need to add it in the nestedTagFieldList and objectTagFieldList

And how will it work if the ES template doesn't have those fields (legacy data)? Will the query fail or just not find anything?

For nested field query it will fail as we need to inform to ES about nested fields to make nested query success and for object fields, it will return nothing

Copy link
Contributor Author

@Manik2708 Manik2708 Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this is the only change in the reader? Doesn't it need to alter queries sent to ES to look in the new fields?

The changes are automatically captured by the array to reflect in the query! Query looks for all the tags therefore we just need to add it in the nestedTagFieldList and objectTagFieldList

And how will it work if the ES template doesn't have those fields (legacy data)? Will the query fail or just not find anything?

For nested field query it will fail as we need to inform to ES about nested fields to make nested query success and for object fields, it will return nothing

My bad! It will fail only the nested query, nothing will happen for object (everything will be fine)!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have queries against old data fail. What's the solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought 2 step solution. We should understand that the only thing we need to ensure is: Mappings are up-to-date.

  1. The users which have enabled CreateIndexTemplates: We can introduce a feature gate to automatically update the mappings.
  2. The users which have disabled CreateIndexTemplates: We can have an update log to inform them the changes they might have to make in the mappings before updating jaeger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we detect automatically if the mapping is missing something? Don't we allow users to make manual changes to the mappings too? You need to think through possible permutations of starting conditions and have a path for each of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to know a thing, when users will take the update, the ES Factory will be re-initialized, right? If yes then I think new templates will automatically be created if user has enabled CreateIndexTemplates. Please see this code:

if cfg.CreateIndexTemplates && !cfg.UseILM {
mappingBuilder := mappingBuilderFromConfig(cfg)
spanMapping, serviceMapping, err := mappingBuilder.GetSpanServiceMappings()
if err != nil {
return nil, err
}
if err := writer.CreateTemplates(spanMapping, serviceMapping, cfg.Indices.IndexPrefix); err != nil {
return nil, err
}

Internally we use *elastic.IndicesPutTemplateService which replaces the mapping if it already exists. For the users who have turned off CreateIndexTemplates we can inform them to make the necessary changes before updating jaeger.

Don't we allow users to make manual changes to the mappings too?

Can you please explain, how users can do this? I can understand 2 cases around this:

  1. Users who have enabled CreateIndexTemplates and manually edited the mappings later, the new mappings will be overrided on the old mappings automatically.
  2. Users who have disabled CreateIndexTemplates, as said above we need to inform them about the necessay changes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying writer.CreateTemplates will always override the mappings on collector restart? Then yes we'd just need to provide migration instructions for users who have the setting set to manual creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but firstly I am thinking to test inserting some old data and then restarting jaeger and then see whether the queries are running fine or not. Have to make some changes in bash scripts for this locally. Will get back with enough evidences and then I think work on migration guide!

Copy link
Member

@yurishkuro yurishkuro Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. All of that should be documented in the PR description, which we can label as breaking change to call people's attention.

@yurishkuro yurishkuro added the changelog:exprimental Change to an experimental part of the code label Apr 18, 2025
@Manik2708
Copy link
Contributor Author

Manik2708 commented Apr 19, 2025

@yurishkuro Good news! Now queries are not crashing no matter whether mappings are updated or not. This has become possible by using IgnoreUnmapped of nested queries. We can test this (I did) by copying only the reader and model changes to a new branch and running integration tests. They are passing without any error! Now taking about mapping changes then yes they are updated by jaeger restart. I have tested this by following steps:

  1. Checkout to the main branch and modify the bash script of E2E tests of ES so that storage is not teared down after the tests.
  2. Modify the tests so that indices and mappings are not cleared (Comment out cleanup functions)
  3. Run the integration tests!
  4. Curl the ES end-point to get the mappings.
curl -X GET "http://localhost:9200/_mapping?pretty"
  1. Now checkout to this branch
  2. Modify the integration tests by commenting out all the spanstore tests except those of FindTraces
  3. When these tests are running, curl the ES endpoint to see the mappings (you will see scope is added to the mapping).

@yurishkuro
Copy link
Member

Good news! Now queries are not crashing no matter whether mappings are updated or not. This has become possible by using IgnoreUnmapped of nested queries.

Indeed good news. However, the scope attributes are not being tested in the e2e tests today, we need to add that to the suite. It may require some refactoring because the existing e2e tests work off the v1 model which cannot adequately represent scope attributes

s.writeTrace(t, trace)

We need to add the ability to read fixtures in OTLP and use v2 Storage API to write from the test (in the v2 e2e tests these writes are already automatically upgraded to v2 Storage API, but since the input is v1 we still can't test the scope attributes). This could be a separate prequel PR - ideally we would upgrade all fixtures to OTLP first, then it will be easier to extend fixtures to include resource and scope attributes.

@yurishkuro
Copy link
Member

We need to add the ability to read fixtures in OTLP and use v2 Storage API to write from the test

NB: this doesn't need to be a big bang, we can just upgrade the path to write via V2 API first and remove unnecessary translations on the way ... #7050

@Manik2708
Copy link
Contributor Author

We need to add the ability to read fixtures in OTLP and use v2 Storage API to write from the test

NB: this doesn't need to be a big bang, we can just upgrade the path to write via V2 API first and remove unnecessary translations on the way ... #7050

@yurishkuro Then before that I think we need to implement the factory for ES, or should we directly use the v2 writer? I am thinking to first refactor the v1 factory, then implement the v2 factory and then finally upgrade the integration tests! Does this look fine?

@yurishkuro
Copy link
Member

I think we need to implement the factory for ES

well, refactoring e2e tests is a good chunk of changes by itself. It might be easier to finish v2 memory storage first and use it as a guinea pig, rather than introducing dependency on ES just for the e2e tests upgrade. It's also much easier/faster to run e2e tests for memory storage.

@Manik2708
Copy link
Contributor Author

I think we need to implement the factory for ES

well, refactoring e2e tests is a good chunk of changes by itself. It might be easier to finish v2 memory storage first and use it as a guinea pig, rather than introducing dependency on ES just for the e2e tests upgrade. It's also much easier/faster to run e2e tests for memory storage.

Have a doubt over this! Currently we have implemented v2 APIs for two backends: ElasticSearch and Memory. The problem is both of them would give different traces (only structure wise), although no information loss is there but take an example, every Span is appended to a different resource span in ES but this is not so in memory. So how should we do testing now?

@yurishkuro
Copy link
Member

v2 API is also implemented for GRPC storage.

@yurishkuro
Copy link
Member

would give different traces (only structure wise)

we can apply some normalization function in the test before comparing with expected fixture, e.g. to split the whole payload such that each span has its own copy of resource & scope objects (even if they are duplicated).

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time.

@github-actions github-actions bot added the stale The issue/PR has become stale and may be auto-closed label Jul 28, 2025
@yurishkuro
Copy link
Member

@Manik2708 is this still needed?

@Manik2708
Copy link
Contributor Author

@Manik2708 is this still needed?

@yurishkuro Extremely sorry for not looking into this. But this can be completed once refactoring of e2e tests is done and we could add integration tests related to scope

@github-actions github-actions bot removed the stale The issue/PR has become stale and may be auto-closed label Aug 4, 2025
Copy link

github-actions bot commented Oct 6, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time.

@github-actions github-actions bot added the stale The issue/PR has become stale and may be auto-closed label Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog:exprimental Change to an experimental part of the code stale The issue/PR has become stale and may be auto-closed storage/elasticsearch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants