[v2][adjuster] Enhance Span Hash Adjuster For Spans That Have Already Been Hashed #6393

_Originally posted by @yurishkuro in https://github.com/jaegertracing/jaeger/pull/6391#discussion_r1894657750_

1. Some storage backends (Cassandra, in particular) perform similar deduping by computing a hash _before_ the span is saved and using it as part of the partition key (this creates tombstones if an identical span is saved two or more times, but no duplicates on read). So we could make this hashing part of the ingestion pipeline (e.g. in sanitizers) and simply store the hash as an attribute on the span. Then this adjuster would be "lazy": it would only recompute the hash if the span read from storage does not already carry one.
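A minimal sketch of the "lazy" behavior described in point 1, not the actual Jaeger implementation. The `Span` struct is a simplified stand-in for the real span type, and the attribute key `jaeger.hash`, the FNV hash, and the function names `computeHash`, `ensureHash`, and `dedupe` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// Span is a simplified, hypothetical stand-in for the real span type.
type Span struct {
	Name       string
	Attributes map[string]string
}

// hashAttr is a hypothetical attribute key under which the hash is stored.
const hashAttr = "jaeger.hash"

// computeHash hashes the name and all attributes except the stored hash itself.
// Keys are sorted so the result does not depend on map iteration order.
func computeHash(s Span) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s.Name))
	h.Write([]byte{0})
	keys := make([]string, 0, len(s.Attributes))
	for k := range s.Attributes {
		if k != hashAttr {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(s.Attributes[k]))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

// ensureHash returns the span's hash and reports whether it had to be computed.
// A hash already stored on the span (e.g. by the write path) is reused as-is.
func ensureHash(s *Span) (uint64, bool) {
	if v, ok := s.Attributes[hashAttr]; ok {
		if parsed, err := strconv.ParseUint(v, 16, 64); err == nil {
			return parsed, false
		}
	}
	sum := computeHash(*s)
	if s.Attributes == nil {
		s.Attributes = map[string]string{}
	}
	s.Attributes[hashAttr] = strconv.FormatUint(sum, 16)
	return sum, true
}

// dedupe keeps the first span seen for each hash value.
func dedupe(spans []Span) []Span {
	seen := map[uint64]struct{}{}
	out := make([]Span, 0, len(spans))
	for i := range spans {
		h, _ := ensureHash(&spans[i])
		if _, dup := seen[h]; dup {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, spans[i])
	}
	return out
}

func main() {
	a := Span{Name: "GET /", Attributes: map[string]string{"k": "v"}}
	b := Span{Name: "GET /", Attributes: map[string]string{"k": "v"}}
	ensureHash(&a) // simulate the write path having stored the hash already
	fmt.Println(len(dedupe([]Span{a, b}))) // prints 1
}
```

Here the read path only pays for hashing on spans that were written without a hash, which is the case for data ingested before such a sanitizer existed.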

2. If we do this on the write path, we would want it to be as efficient as possible, so we would need to implement manual hashing by iterating through the attributes (pre-sorting them so the hash does not depend on their order) and by manually going through all fields of the Span / SpanEvent / SpanLink. The reason I was reluctant to do that in the past was to avoid unintended bugs if the data model changed, such as a new field being added that we would forget to add to the hash function. To protect against that we could probably use some fuzzing tests that set / unset each field individually and make sure the hash code changes as a result.


            
