Conversation

SomilJain0112
Contributor

Which problem is this PR solving?

#6467

Description of the changes

Added chunked encoding

How was this change tested?

Tested locally

Checklist

@SomilJain0112 SomilJain0112 requested a review from a team as a code owner October 11, 2025 20:09
@dosubot dosubot bot added the enhancement label Oct 11, 2025
Comment on lines +179 to +181
if len(traces) == 0 {
return true
}
Contributor


The current handling of empty trace arrays may lead to ambiguity in the API response. When all iterations return empty arrays, tracesFound remains false and a 404 error is returned. This approach doesn't distinguish between "no traces exist for this query" and "traces exist but contain no data." Consider adding a flag that indicates whether the query matched any traces at all, separate from whether those traces contain spans. This would provide more accurate feedback to API consumers about whether their query parameters matched anything in the system.

Spotted by Graphite Agent

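The distinction the comment above suggests could be modeled roughly as follows. This is an illustrative sketch only, not the PR's actual code; the `classifyResult` helper, the `queryMatched` flag, and the `spanCounts` parameter are hypothetical names:

```go
package main

import "fmt"

// classifyResult distinguishes "the query matched nothing" (404) from
// "the query matched traces, but they contained no spans" (200, empty).
// queryMatched would be set as soon as any iteration returns a trace;
// spanCounts holds the number of spans per matched trace.
func classifyResult(queryMatched bool, spanCounts []int) string {
	if !queryMatched {
		return "404: no traces matched the query"
	}
	for _, n := range spanCounts {
		if n > 0 {
			return "200: traces with data"
		}
	}
	return "200: traces matched but contain no spans"
}

func main() {
	fmt.Println(classifyResult(false, nil))
	fmt.Println(classifyResult(true, []int{0, 0}))
	fmt.Println(classifyResult(true, []int{0, 3}))
}
```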

Member


This is a reasonable callout. Can we add a test for this use case?


codecov bot commented Oct 11, 2025

Codecov Report

❌ Patch coverage is 83.87097% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.40%. Comparing base (4b78f8a) to head (97748d2).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
cmd/query/app/apiv3/http_gateway.go 83.87% 7 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7555      +/-   ##
==========================================
- Coverage   96.52%   96.40%   -0.12%     
==========================================
  Files         385      385              
  Lines       23316    23422     +106     
==========================================
+ Hits        22506    22581      +75     
- Misses        623      648      +25     
- Partials      187      193       +6     
Flag Coverage Δ
badger_v1 9.02% <ø> (+<0.01%) ⬆️
badger_v2 1.58% <ø> (-0.01%) ⬇️
cassandra-4.x-v1-manual 11.67% <ø> (+<0.01%) ⬆️
cassandra-4.x-v2-auto 1.57% <ø> (-0.01%) ⬇️
cassandra-4.x-v2-manual 1.57% <ø> (-0.01%) ⬇️
cassandra-5.x-v1-manual 11.67% <ø> (+<0.01%) ⬆️
cassandra-5.x-v2-auto 1.57% <ø> (-0.01%) ⬇️
cassandra-5.x-v2-manual 1.57% <ø> (-0.01%) ⬇️
clickhouse 1.52% <ø> (-0.01%) ⬇️
elasticsearch-6.x-v1 16.57% <ø> (+<0.01%) ⬆️
elasticsearch-7.x-v1 16.61% <ø> (+<0.01%) ⬆️
elasticsearch-8.x-v1 16.75% <ø> (+<0.01%) ⬆️
elasticsearch-8.x-v2 1.58% <ø> (-0.01%) ⬇️
elasticsearch-9.x-v2 1.58% <ø> (-0.01%) ⬇️
grpc_v1 10.22% <ø> (+<0.01%) ⬆️
grpc_v2 1.58% <ø> (-0.01%) ⬇️
kafka-3.x-v1 9.66% <ø> (+<0.01%) ⬆️
kafka-3.x-v2 1.58% <ø> (-0.01%) ⬇️
memory_v2 1.58% <ø> (-0.01%) ⬇️
opensearch-1.x-v1 16.66% <ø> (+<0.01%) ⬆️
opensearch-2.x-v1 16.66% <ø> (+<0.01%) ⬆️
opensearch-2.x-v2 1.58% <ø> (-0.01%) ⬇️
opensearch-3.x-v2 1.58% <ø> (-0.01%) ⬇️
query 1.58% <ø> (-0.01%) ⬇️
tailsampling-processor 0.43% <ø> (-0.01%) ⬇️
unittests 95.41% <83.87%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.



github-actions bot commented Oct 11, 2025

Metrics Comparison Summary

Total changes across all snapshots: 53

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 53

  • 🆕 Added: 0 metrics
  • ❌ Removed: 53 metrics
  • 🔄 Modified: 0 metrics

❌ Removed Metrics

  • `http_server_request_body_size_bytes` (18 variants)
  • `http_server_request_duration_seconds` (17 variants)
  • `http_server_response_body_size_bytes` (18 variants)


@yurishkuro
Member

there was a previous attempt to fix this issue and it was blocked. How is your approach different from that one?

Signed-off-by: Somil Jain <[email protected]>
@SomilJain0112
Contributor Author

Hi @yurishkuro,
Let me explain what's different from #6479.

  1. I used Flush instead of manual transfer handling; Go's HTTP server manages chunked transfer automatically.
  2. Gzip buffers output internally, which defeats the point of streaming, so I set `Content-Encoding: identity` instead.
  3. In [WIP] Implement proper payload chunked encoding in HTTP api_v3 #6479, the chunks were concatenated, producing invalid JSON; I used NDJSON (newline-delimited) instead.

Test coverage is 83.87 percent; I don't think it can be increased further, please correct me if I'm wrong.

@SomilJain0112
Contributor Author

Hi @yurishkuro, could you please review this PR as well?


w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("Content-Encoding", "identity")
Member


what is "identity" encoding?


marshaler := jsonpb.Marshaler{}
if err := marshaler.Marshal(w, response); err != nil {
h.Logger.Error("Failed to marshal trace chunk", zap.Error(err))
Member


what happens to the output stream if we just log error and exit?

return false
}

flusher.Flush()
Member


So if I understand correctly you serialize each chunk and flush it to the client. What happens to content-length header in this case? And how does the client know that there are multiple chunks to be received?
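For context on the Content-Length question: Go's net/http server omits Content-Length and switches to `Transfer-Encoding: chunked` (on HTTP/1.1) whenever the handler writes a body whose length is not known up front, for example by calling Flush before returning. A minimal probe, not the PR's code, illustrating what the client then observes:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// chunkedInfo spins up a server whose handler flushes mid-response,
// then reports what the client sees: the response ContentLength and
// the transfer encodings applied.
func chunkedInfo() (int64, string) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "part1\n")
		w.(http.Flusher).Flush() // headers go out before the body is complete
		io.WriteString(w, "part2\n")
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	_, _ = io.ReadAll(resp.Body) // drain the stream
	return resp.ContentLength, strings.Join(resp.TransferEncoding, ",")
}

func main() {
	cl, te := chunkedInfo()
	fmt.Println("ContentLength:", cl)    // -1: length unknown up front
	fmt.Println("TransferEncoding:", te) // chunked
}
```

The chunked framing itself (length-prefixed chunks, terminated by a zero-length chunk) is how the client knows more data is coming; the standard library handles that transparently on both sides.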

yield([]ptrace.Traces{trace2}, nil)
})).Once()

r, err := http.NewRequest(http.MethodGet, "/api/v3/traces/1", http.NoBody)
Member


Please elaborate on what and how you're testing here. All asserts seem to happen against the recorder w, but what about the client side? How is the client supposed to handle the chunked response? If the only thing you're doing is avoiding writing a single huge payload from the server to a client connection, then yes, it protects the server against keeping too much state in memory, but it doesn't really help the client process the results in a streaming fashion: it still needs to read the whole cumulative payload.

Also, what happens to the adjusters in the query service? I suspect we still have to load the complete trace into memory to adjust it. That doesn't mean there's no benefit to streaming - we can at least chunk up the stream on individual traces rather than on ALL traces in the response coming as a single payload.
