[v2][adjuster] Enhance Span Hash Adjuster For Spans That Have Already Been Hashed #6393

Open
mahadzaryab1 opened this issue Dec 22, 2024 · 2 comments
Labels
good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@mahadzaryab1
Collaborator

mahadzaryab1 commented Dec 22, 2024

Originally posted by @yurishkuro in #6391 (comment)

  1. Some storage backends (Cassandra in particular) perform similar deduping by computing a hash before the span is saved and using it as part of the partition key. Saving an identical span two or more times creates tombstones, but reads return no duplicates. So we could make this hashing part of the ingestion pipeline (e.g. in sanitizers) and simply store the hash as an attribute on the span. Then this adjuster would be "lazy": it would recompute the hash only if it doesn't already exist in the storage.

  2. If we do this on the write path, we would want it to be as efficient as possible, so we would need to implement manual hashing by iterating through the attributes (pre-sorting them to avoid dependence on their order) and by manually going through all fields of the Span / SpanEvent / SpanLink. The reason I was reluctant to do that in the past was the risk of unintended bugs if the data model changed, such as a new field being added that we'd forget to add to the hash function. To protect against that, we could probably use some fuzzing tests that set / unset each field individually and check that the hash code changes as a result.

@mahadzaryab1 mahadzaryab1 changed the title Enhance Span Hash Adjuster For Spans That Have Already Been Hashed [v2][adjuster] Enhance Span Hash Adjuster For Spans That Have Already Been Hashed Dec 22, 2024
@yurishkuro yurishkuro added help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners and removed performance storage/cassandra v2 labels Dec 22, 2024
@yurishkuro
Member

@suryaaprakassh What guidance do you need?

@zzzk1
Contributor

zzzk1 commented Dec 25, 2024

@yurishkuro Has anyone started this? If not, I’d like to try.

3 participants