-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[clickhouse] Implement FindTraceIDs for ClickHouse Storage For Primitive Parameters #7648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7648 +/- ##
==========================================
+ Coverage 96.53% 96.57% +0.03%
==========================================
Files 384 384
Lines 19499 19548 +49
==========================================
+ Hits 18823 18878 +55
+ Misses 490 486 -4
+ Partials 186 184 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Metrics Comparison SummaryTotal changes across all snapshots: 106 Detailed changes per snapshotsummary_metrics_snapshot_cassandra📊 Metrics Diff SummaryTotal Changes: 53
❌ Removed Metrics
View diff sample-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...View diff sample-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...View diff sample-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...Total Changes: 53
🆕 Added Metrics
View diff sample+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...View diff sample+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...View diff sample+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
... |
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
yurishkuro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to revisit the schema & index design. Looking at the spans table, all I see is
) ENGINE = MergeTree PRIMARY KEY (trace_id)
which doesn't make sense to me. Do we have a design doc about how we want CH to lay out the data?
At minimum there needs to be ORDER BY. The typical way to organize telemetry data is to group data by time range.
internal/storage/v2/clickhouse/sql/create_service_name_index.sql
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,2 @@ | |||
| ALTER TABLE spans | |||
| ADD INDEX idx_service_name service_name TYPE set(0) GRANULARITY 1; No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set(0) and granularity 1? What motivates these values? Can we document this in an ADR document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro The set(0) needs to be changed to a reasonable number depending on the cardinality of this column. The number specifies the maximum number of unique values to track for that column. The granularity defines the number of granules in an indexed block. I removed the indexes for now to avoid confusion. I'll create an ADR document once we have the queries lined up to optimize our schema for the queries.
| q += " AND name = ?" | ||
| args = append(args, query.OperationName) | ||
| } | ||
| if query.SearchDepth > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we introduce a default value if not specified , as well as a max value which user input cannot exceed (expose both in the config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| continue | ||
| } | ||
|
|
||
| if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(b)}}, nil) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be returning timestamps with found trace IDs, this was one critical requirement why we added them to the GetTraces API because in CH finding trace by ID is not cheap without time range.
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
| defaultSearchDepth = 100 | ||
| defaultMaxSearchDepth = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro Thoughts on these values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultSearchDepth could be 1000
| return err | ||
| } | ||
|
|
||
| func (cfg *Configuration) applyDefaults() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro For a lot of the ClickHouse configurations - the driver has its own defaults. Should we let the driver handle those or do we want to define our own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we already expose them in config?
when we do it makes sense to default to driver's defaults in most cases.
Signed-off-by: Mahad Zaryab <[email protected]>
| ) | ||
|
|
||
| const ( | ||
| defualtProtocol = "native" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in constant name: defualtProtocol should be defaultProtocol. This typo is propagated to the test file (config_test.go line 90) and should be fixed in both places.
Spotted by Graphite Agent
Is this helpful? React 👍 or 👎 to let us know.
Signed-off-by: Mahad Zaryab <[email protected]>
@yurishkuro For the most part - the primary key and the sort key are the same (https://clickhouse.com/docs/best-practices/choosing-a-primary-key). A primary key in ClickHouse doesn't define uniqueness. I did change this to be the sort key however since that looks to be the convention in ClickHouse. I'm going to create a design document to optimize the schema via the sort key and data-skipping indexes. I can make those optimizations in a follow-up PR. |
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
| b, err := hex.DecodeString(traceID) | ||
| if err != nil { | ||
| if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) { | ||
| return | ||
| } | ||
| continue | ||
| } | ||
|
|
||
| if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(b)}}, nil) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type conversion error: Cannot convert []byte to pcommon.TraceID (which is [16]byte). The hex.DecodeString(traceID) returns a slice []byte, but pcommon.TraceID expects a fixed-size array [16]byte. This will cause a compilation error.
Fix by copying the slice into a fixed-size array:
b, err := hex.DecodeString(traceID)
if err != nil {
if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) {
return
}
continue
}
var traceIDArray [16]byte
if len(b) != 16 {
if !yield(nil, fmt.Errorf("invalid trace ID length: expected 16 bytes, got %d", len(b))) {
return
}
continue
}
copy(traceIDArray[:], b)
if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(traceIDArray)}}, nil) {
return
}| b, err := hex.DecodeString(traceID) | |
| if err != nil { | |
| if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) { | |
| return | |
| } | |
| continue | |
| } | |
| if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(b)}}, nil) { | |
| b, err := hex.DecodeString(traceID) | |
| if err != nil { | |
| if !yield(nil, fmt.Errorf("failed to decode trace ID: %w", err)) { | |
| return | |
| } | |
| continue | |
| } | |
| var traceIDArray [16]byte | |
| if len(b) != 16 { | |
| if !yield(nil, fmt.Errorf("invalid trace ID length: expected 16 bytes, got %d", len(b))) { | |
| return | |
| } | |
| continue | |
| } | |
| copy(traceIDArray[:], b) | |
| if !yield([]tracestore.FoundTraceID{{TraceID: pcommon.TraceID(traceIDArray)}}, nil) { | |
| return | |
| } |
Spotted by Graphite Agent
Is this helpful? React 👍 or 👎 to let us know.
Which problem is this PR solving?
Description of the changes
FindTraceIDsfunction for ClickHouse Storage. It only adds the primitive query parameters (service name and operation name). The other query parameters will be added in follow-up PRs.How was this change tested?
Checklist
jaeger:make lint testjaeger-ui:npm run lintandnpm run test