SpanKind support for badger #6376

Manik2708 · 2024-12-17T07:43:28Z

Which problem is this PR solving?

Fixes: Badger storage plugin: query service to support spanKind when retrieve operations for a given service. #1922

Description of the changes

Queries with span kind will now be supported for Badger

How was this change tested?

Writing unit tests

Checklist

I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
I have signed all commits
I have added unit tests for the new functionality
I have run lint and test steps successfully
- for jaeger: make lint test
- for jaeger-ui: npm run lint and npm run test

Signed-off-by: Manik2708 <[email protected]>

Manik2708 · 2024-12-17T07:50:10Z

I have changed the structure of the cache, which raises these concerns:

Will a 3D map be a viable option for production?
The cache will never be able to retrieve operations from old data! When the user does not send a kind, only operations from new data will be returned. I have a probable solution: we might introduce a boolean which, when true, loads the cache from old data (the old index key) and marks all of those spans as kind UNSPECIFIED.
To maintain consistency, we must take the service name from the newly created index, but extracting the service name from serviceName+operationName+kind is the challenge. The solution I have thought of is to reserve the last 7 places of the new index for len(serviceName)+len(operationName)+kind (three digits for each length and one for the kind). The drawback is that serviceName and operationName are each limited to a length of 999. This way we can also get rid of the c.services map. Removing this map is optional and a matter for discussion, because we have to choose between storage and iteration: removing it leads to extra iterations in GetServices. I also thought of a solution for this:

data map[string]struct {
	expiryTime uint64
	operations map[trace.SpanKind]map[string]uint64
}

Once the correct approach is agreed, I will handle some more edge cases and make the e2e tests pass (by setting GetOperationsMissingSpanKind: false).
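
The "last 7 places" idea above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: `encodeKey`, `decodeKey` and `suffixLen` are hypothetical names, the kind is assumed to fit in one decimal digit, and the real index key would also carry a prefix byte, a timestamp and a trace ID, which are left out here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// suffixLen covers the 7 reserved trailing characters: 3 digits for
// len(serviceName), 3 digits for len(operationName), 1 digit for the kind.
const suffixLen = 7

// encodeKey (hypothetical helper) builds serviceName+operationName+suffix.
// Names longer than 999 bytes cannot be represented in 3 digits.
func encodeKey(service, operation string, kind int) (string, error) {
	if len(service) > 999 || len(operation) > 999 {
		return "", errors.New("service and operation names are limited to 999 bytes")
	}
	if kind < 0 || kind > 9 {
		return "", errors.New("kind must be a single digit")
	}
	return fmt.Sprintf("%s%s%03d%03d%d", service, operation, len(service), len(operation), kind), nil
}

// decodeKey (hypothetical helper) reads the fixed-width suffix first and
// then uses the two lengths to split the remaining bytes.
func decodeKey(key string) (string, string, int, error) {
	if len(key) < suffixLen {
		return "", "", 0, errors.New("key is shorter than the reserved suffix")
	}
	body, suffix := key[:len(key)-suffixLen], key[len(key)-suffixLen:]
	sLen, err := strconv.Atoi(suffix[0:3])
	if err != nil {
		return "", "", 0, err
	}
	oLen, err := strconv.Atoi(suffix[3:6])
	if err != nil {
		return "", "", 0, err
	}
	kind, err := strconv.Atoi(suffix[6:])
	if err != nil {
		return "", "", 0, err
	}
	if sLen < 0 || oLen < 0 || sLen+oLen != len(body) {
		return "", "", 0, errors.New("suffix lengths do not match the key body")
	}
	return body[:sLen], body[sLen:], kind, nil
}

func main() {
	key, _ := encodeKey("frontend", "GET /api", 2)
	service, operation, kind, _ := decodeKey(key)
	fmt.Println(key, service, operation, kind)
}
```

The fixed-width suffix keeps decoding unambiguous even when names contain digits, at the cost of the 999-byte limit mentioned above.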

codecov · 2024-12-17T07:52:50Z

Codecov Report

Attention: Patch coverage is 33.17972% with 145 lines in your changes missing coverage. Please review.

Project coverage is 49.30%. Comparing base (54ceda2) to head (fec96b1).

Files with missing lines	Patch %	Lines
plugin/storage/badger/spanstore/cache.go	30.48%	106 Missing and 8 partials ⚠️
model/keyvalue.go	33.33%	22 Missing ⚠️
plugin/storage/badger/spanstore/reader.go	42.85%	4 Missing ⚠️
model/converter/thrift/zipkin/to_domain.go	0.00%	3 Missing ⚠️
model/span.go	0.00%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (54ceda2) and HEAD (fec96b1).

HEAD has 1 upload less than BASE

Flag BASE (54ceda2) HEAD (fec96b1)

unittests 1 0

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #6376       +/-   ##
===========================================
- Coverage   96.22%   49.30%   -46.93%     
===========================================
  Files         363      180      -183     
  Lines       20748    11257     -9491     
===========================================
- Hits        19965     5550    -14415     
- Misses        599     5257     +4658     
- Partials      184      450      +266

Flag	Coverage Δ
badger_v1	`9.29% <32.25%> (+0.24%)`	⬆️
badger_v2	`1.62% <0.00%> (-0.03%)`	⬇️
cassandra-4.x-v1-manual	`14.90% <5.52%> (-0.15%)`	⬇️
cassandra-4.x-v2-auto	`1.56% <0.00%> (-0.03%)`	⬇️
cassandra-4.x-v2-manual	`1.56% <0.00%> (-0.03%)`	⬇️
cassandra-5.x-v1-manual	`14.90% <5.52%> (-0.15%)`	⬇️
cassandra-5.x-v2-auto	`1.56% <0.00%> (-0.03%)`	⬇️
cassandra-5.x-v2-manual	`1.56% <0.00%> (-0.03%)`	⬇️
elasticsearch-6.x-v1	`18.50% <0.00%> (-0.26%)`	⬇️
elasticsearch-7.x-v1	`18.56% <0.00%> (-0.28%)`	⬇️
elasticsearch-8.x-v1	`18.73% <0.00%> (-0.27%)`	⬇️
elasticsearch-8.x-v2	`1.62% <0.00%> (-0.02%)`	⬇️
grpc_v1	`10.64% <5.52%> (-0.08%)`	⬇️
grpc_v2	`7.94% <5.52%> (-0.04%)`	⬇️
kafka-v1	`9.34% <5.52%> (-0.06%)`	⬇️
kafka-v2	`1.62% <0.00%> (-0.03%)`	⬇️
memory_v2	`1.62% <0.00%> (-0.03%)`	⬇️
opensearch-1.x-v1	`18.62% <0.00%> (-0.27%)`	⬇️
opensearch-2.x-v1	`18.62% <0.00%> (-0.27%)`	⬇️
opensearch-2.x-v2	`1.61% <0.00%> (-0.04%)`	⬇️
tailsampling-processor	`0.46% <0.00%> (-0.01%)`	⬇️
unittests	`?`

Manik2708 · 2024-12-18T03:57:22Z

@yurishkuro Please review the approach and problems!

Signed-off-by: Manik2708 <[email protected]>

Manik2708 · 2024-12-19T08:20:09Z

@yurishkuro I have added more changes which reduce the iterations in prefill to 1, but this limits serviceName to a length of 999. Please review!

Manik2708 · 2024-12-19T08:50:52Z

I have an idea for handling old data without a migration script! We can store the old data in two other data structures in the cache (without kind). The only question that then arises is: what should be returned when the user gives no span kind? Operations from new data of all kinds, operations from old data (kind marked as unspecified), or the union of both?

yurishkuro · 2024-12-20T00:50:20Z

plugin/storage/badger/spanstore/cache.go

@@ -18,7 +21,7 @@ type CacheStore struct {
 	// Given the small amount of data these will store, we use the same structure as the memory store
 	cacheLock  sync.Mutex // write heavy - Mutex is faster than RWMutex for writes
 	services   map[string]uint64
-	operations map[string]map[string]uint64
+	operations map[string]map[trace.SpanKind]map[string]uint64


please add a comment explaining the structure of the map, which is quite complex
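
A comment of the kind requested here might look like the sketch below. It assumes the innermost `uint64` is an expiry time, as the `expiryTime` field proposed earlier in this thread suggests. `SpanKind` is a local stand-in for `trace.SpanKind` so the example compiles on its own, and `operationsCache` and `update` are hypothetical names, not the PR's code.

```go
package main

import "fmt"

// SpanKind is a local stand-in for trace.SpanKind (assumption: same
// numbering as the OpenTelemetry Go enum).
type SpanKind int

const (
	SpanKindUnspecified SpanKind = iota
	SpanKindInternal
	SpanKindServer
	SpanKindClient
	SpanKindProducer
	SpanKindConsumer
)

// operationsCache has three levels:
//
//	service name -> span kind -> operation name -> expiry time
//
// so operations[service][kind] is the set of operation names seen for that
// service with that span kind, each mapped to the time its entry expires.
type operationsCache map[string]map[SpanKind]map[string]uint64

// update records an operation and creates the intermediate maps on demand.
// An existing entry is only extended, never shortened.
func (c operationsCache) update(service string, kind SpanKind, operation string, expiry uint64) {
	byKind, ok := c[service]
	if !ok {
		byKind = make(map[SpanKind]map[string]uint64)
		c[service] = byKind
	}
	byOperation, ok := byKind[kind]
	if !ok {
		byOperation = make(map[string]uint64)
		byKind[kind] = byOperation
	}
	if expiry > byOperation[operation] {
		byOperation[operation] = expiry
	}
}

func main() {
	cache := operationsCache{}
	cache.update("frontend", SpanKindServer, "GET /api", 100)
	cache.update("frontend", SpanKindClient, "call-backend", 120)
	fmt.Println(len(cache["frontend"]), cache["frontend"][SpanKindServer]["GET /api"])
}
```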

yurishkuro · 2024-12-20T00:53:35Z

plugin/storage/badger/spanstore/writer.go

+	kind, _ := span.GetSpanKind()
+	kindString := strconv.FormatInt(int64(rune(kind)), 10)
+	// This format will convert length of service name to formatted 3-digit number (string) like for 9 it will change to "009"
+	formattedLengthOfService := fmt.Sprintf("%03d", len(span.Process.ServiceName))


why is this needed?

I am trying to get the service name, operation name and kind from a single index. The kind will be the last character of the key, but we also have to tell the service name apart from the operation name, so I am storing the length of the service name. Currently I am thinking of changing the key to len(serviceName)+"L"+serviceName+operationName+kind.
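
The key layout described in this reply, len(serviceName)+"L"+serviceName+operationName+kind, can be sketched as below. It is a simplified illustration: `encodeIndexKey` and `decodeIndexKey` are hypothetical names, the kind is assumed to be a single decimal digit, and the index prefix byte, timestamp and trace ID of a real Badger key are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// encodeIndexKey (hypothetical helper) writes the service name length in
// decimal, the separator "L", both names, and the kind as a trailing digit.
func encodeIndexKey(service, operation string, kind int) string {
	return strconv.Itoa(len(service)) + "L" + service + operation + strconv.Itoa(kind)
}

// decodeIndexKey (hypothetical helper) splits a key built by encodeIndexKey.
// The first 'L' always ends the length prefix because the prefix holds only
// digits, so names that themselves contain 'L' are handled correctly.
func decodeIndexKey(key string) (string, string, int, error) {
	sep := strings.IndexByte(key, 'L')
	if sep <= 0 {
		return "", "", 0, errors.New("missing length prefix")
	}
	n, err := strconv.Atoi(key[:sep])
	if err != nil {
		return "", "", 0, err
	}
	rest := key[sep+1:]
	// rest must hold n bytes of service name plus at least the kind digit.
	if n < 0 || n > len(rest)-1 {
		return "", "", 0, errors.New("length prefix exceeds key size")
	}
	last := rest[len(rest)-1]
	if last < '0' || last > '9' {
		return "", "", 0, errors.New("kind is not a digit")
	}
	return rest[:n], rest[n : len(rest)-1], int(last - '0'), nil
}

func main() {
	key := encodeIndexKey("frontend", "GET /api", 2)
	service, operation, kind, err := decodeIndexKey(key)
	fmt.Println(key, service, operation, kind, err)
}
```

Compared with a fixed 3-digit length, the "L" separator removes the 999-byte limit on the service name, since the prefix can grow to as many digits as needed.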

yurishkuro · 2024-12-20T00:58:01Z

model/span.go

+	return trace.SpanKindUnspecified
+}
+
+func GetSpanKindFromStringOfSpanKind(s string) trace.SpanKind {


what does this mean?

This converts "0" (and other integer strings) to the corresponding trace.SpanKind.
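
A conversion like the one described in this reply could look like the following sketch. `SpanKind` is a local stand-in for `trace.SpanKind`, `spanKindFromString` is a hypothetical name, and falling back to the unspecified kind for unknown input is an assumption rather than something confirmed by the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// SpanKind is a local stand-in for trace.SpanKind (assumption: same
// numbering as the OpenTelemetry Go enum).
type SpanKind int

const (
	SpanKindUnspecified SpanKind = iota
	SpanKindInternal
	SpanKindServer
	SpanKindClient
	SpanKindProducer
	SpanKindConsumer
)

// spanKindFromString (hypothetical helper) parses a decimal string such as
// "2" into a SpanKind. Anything that is not a known kind maps to
// SpanKindUnspecified.
func spanKindFromString(s string) SpanKind {
	n, err := strconv.Atoi(s)
	if err != nil || n < int(SpanKindUnspecified) || n > int(SpanKindConsumer) {
		return SpanKindUnspecified
	}
	return SpanKind(n)
}

func main() {
	fmt.Println(spanKindFromString("2") == SpanKindServer)
	fmt.Println(spanKindFromString("oops") == SpanKindUnspecified)
}
```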

yurishkuro · 2024-12-20T01:40:35Z

What to return when no span kind is given by user?

then we should return all operations regardless of the span kind
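
The behaviour asked for here, returning every operation when no kind is given, can be sketched against the three-level cache map from this PR. The sketch is an assumption-laden illustration: `SpanKind` stands in for `trace.SpanKind`, `Operation` and `getOperations` are hypothetical names, a nil `kind` pointer represents "no span kind given", and expiry handling is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// SpanKind is a local stand-in for trace.SpanKind.
type SpanKind int

const (
	SpanKindUnspecified SpanKind = iota
	SpanKindInternal
	SpanKindServer
	SpanKindClient
	SpanKindProducer
	SpanKindConsumer
)

// Operation pairs an operation name with the span kind it was seen with.
type Operation struct {
	Name string
	Kind SpanKind
}

// getOperations (hypothetical helper) returns the operations of one service.
// With kind == nil every span kind is included; otherwise only operations
// of the requested kind are returned. Results are sorted for stable output.
func getOperations(cache map[string]map[SpanKind]map[string]uint64, service string, kind *SpanKind) []Operation {
	var result []Operation
	for k, operations := range cache[service] {
		if kind != nil && *kind != k {
			continue
		}
		for name := range operations {
			result = append(result, Operation{Name: name, Kind: k})
		}
	}
	sort.Slice(result, func(i, j int) bool {
		if result[i].Name != result[j].Name {
			return result[i].Name < result[j].Name
		}
		return result[i].Kind < result[j].Kind
	})
	return result
}

func main() {
	cache := map[string]map[SpanKind]map[string]uint64{
		"frontend": {
			SpanKindServer: {"GET /api": 100},
			SpanKindClient: {"call-backend": 100},
		},
	}
	server := SpanKindServer
	fmt.Println(len(getOperations(cache, "frontend", nil)))
	fmt.Println(len(getOperations(cache, "frontend", &server)))
}
```

If old data is cached separately without a kind, its operations could be merged into the nil-kind result as SpanKindUnspecified entries; that part is not shown.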

Manik2708 · 2024-12-20T02:58:03Z

What to return when no span kind is given by user?

then we should return all operations regardless of the span kind

Does that mean including all spans from old data as well (whose kind is not in the cache)?

Signed-off-by: Manik Mehta <[email protected]>

Signed-off-by: Manik2708 <[email protected]>

Manik2708 · 2024-12-22T19:49:31Z

My current approach leads to errors in the unit tests in factory_test.go. Badger throws this error endlessly:

runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1700, retrying
badger 2024/12/23 01:12:11 ERROR: error flushing memtable to disk: error while creating table err: while creating table: /tmp/badger116881967/000002.sst error: open /tmp/badger116881967/000002.sst: no such file or directory
unable to open: /tmp/badger116881967/000002.sst
github.com/dgraph-io/ristretto/v2/z.OpenMmapFile

This is probably because f.Close is called before prefill completes, which implies that creating the new index for old data is slow. Hence I think there is only one way if we want to skip even auto migration, and that is to use this function:

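// getSpanKind probes the tag index once per span kind and returns the first
// kind for which a matching key exists, or SpanKindUnspecified if none does.
func getSpanKind(txn *badger.Txn, service string, timestampAndTraceId string) model.SpanKind {
	// Try each of the six span kind values in turn.
	for i := 0; i < 6; i++ {
		value := service + model.SpanKindKey + model.SpanKind(i).String()
		valueBytes := []byte(value)
		// Key layout: index prefix byte, tag value, 8-byte timestamp, trace ID.
		operationKey := make([]byte, 1+len(valueBytes)+8+sizeOfTraceID)
		operationKey[0] = tagIndexKey
		copy(operationKey[1:], valueBytes)
		copy(operationKey[1+len(valueBytes):], timestampAndTraceId)
		// Any error, including a missing key, means this kind did not match.
		_, err := txn.Get(operationKey)
		if err == nil {
			return model.SpanKind(i)
		}
	}
	return model.SpanKindUnspecified
}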

The only problem is that prefilling will issue up to 6*NumberOfOperations Get queries. @yurishkuro, please review this approach. I think we need to discuss whether to auto-create the new index, or to skip creating any new index and use the function above instead.

SpanKind support for badger

864f201

Signed-off-by: Manik2708 <[email protected]>

Manik2708 requested a review from a team as a code owner December 17, 2024 07:43

Manik2708 requested a review from jkowall December 17, 2024 07:43

Merge branch 'main' into kind

c2193e8

dosubot bot added enhancement storage/badger Issues related to badger storage labels Dec 17, 2024

Manik2708 added 4 commits December 18, 2024 12:17

e2e test fix

8a4cff6

Signed-off-by: Manik2708 <[email protected]>

Merge branch 'main' into kind

cda2bbd

serviceIndexKey excluded from cache

2a49f5e

Signed-off-by: Manik2708 <[email protected]>

Merge branch 'main' into kind

f7568c2

yurishkuro added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label Dec 20, 2024

yurishkuro reviewed Dec 20, 2024

model/span.go (outdated)

yurishkuro reviewed Dec 20, 2024

model/span.go (outdated)

yurishkuro reviewed Dec 20, 2024

Manik2708 added 3 commits December 22, 2024 19:20

Merge branch 'main' into kind

b9ba1b5

Signed-off-by: Manik Mehta <[email protected]>

auto migration

291cc61

Signed-off-by: Manik2708 <[email protected]>

conflicts resolved

1ac092d

Signed-off-by: Manik2708 <[email protected]>

Manik2708 marked this pull request as draft December 22, 2024 14:04

Manik2708 added 3 commits December 22, 2024 22:06

conflicts resolved

c8317d6

Signed-off-by: Manik2708 <[email protected]>

Merge branch 'main' into kind

30039a7

unit test fixed

d02df1b

Signed-off-by: Manik2708 <[email protected]>

unit test fixed

fec96b1

Signed-off-by: Manik2708 <[email protected]>

Manik2708 marked this pull request as ready for review December 22, 2024 19:16

dosubot bot added the area/storage label Dec 22, 2024

Manik2708 requested a review from yurishkuro December 23, 2024 19:28
