

ADITYATIWARI342005

@ADITYATIWARI342005 commented Sep 30, 2025

Which problem is this PR solving?

Addresses #7495.

Description of the changes

Adds a max_trace_size parameter to the v2 query service. Reading stops early once the limit is reached, which prevents out-of-memory (OOM) errors when loading very large traces.

Key changes:

  • V1TracesFromSeq2 now accepts an explicit maxTraceSize int parameter
  • Spans are counted incrementally, and the iterator is terminated early once the limit is exceeded
  • The applyTraceSizeLimit helper truncates oversized traces and logs warnings
  • The limit is threaded through the v2 adapter (0 = unlimited)
  • v2 only; no v1 changes
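The incremental span counting with early iterator termination described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's code: Span, Seq2, and collectWithLimit are made-up names, spans are reduced to a name-only struct, and Seq2 only mirrors the shape of Go 1.23's iter.Seq2 so the sketch does not depend on a particular Go version.

```go
package main

// Span is a hypothetical, simplified stand-in for a span; the real adapter
// works with OpenTelemetry pdata types.
type Span struct{ Name string }

// Seq2 mirrors the shape of Go 1.23's iter.Seq2[K, V]. It is declared locally
// (an assumption for this sketch) so no iter package import is needed.
type Seq2[K, V any] func(yield func(K, V) bool)

// collectWithLimit (hypothetical) drains span chunks from seq while counting
// spans. maxTraceSize == 0 means unlimited. When the next chunk would push the
// count past the limit, only the spans that still fit are kept, truncated is
// set, and the callback returns false so the producer stops pulling chunks.
func collectWithLimit(seq Seq2[[]Span, error], maxTraceSize int) (spans []Span, truncated bool, err error) {
	seq(func(chunk []Span, e error) bool {
		if e != nil {
			err = e
			return false // stop on the first error from the source
		}
		if maxTraceSize > 0 && len(spans)+len(chunk) > maxTraceSize {
			spans = append(spans, chunk[:maxTraceSize-len(spans)]...)
			truncated = true
			return false // early termination: no further chunks are read
		}
		spans = append(spans, chunk...)
		return true
	})
	return spans, truncated, err
}
```

Returning false from the callback is what bounds memory: a producer that honors the iterator contract stops fetching, so with a limit of 3 and chunks of 2, 2, and 1 spans, only the first two chunks are ever read.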

Modified files:

  • internal/storage/v2/v1adapter/translator.go - limiting logic
  • internal/storage/v2/v1adapter/translator_test.go - test coverage
  • internal/storage/v2/v1adapter/{tracereader,spanreader}.go - parameter wiring
  • Call sites updated to pass 0 (unlimited)

How was this change tested?

  • Unit tests for unlimited mode and truncation scenarios
  • go test ./internal/storage/v2/v1adapter -count=1 passes
  • Manual verification with traces exceeding limit

Checklist

@ADITYATIWARI342005 requested a review from a team as a code owner September 30, 2025 07:46
@ADITYATIWARI342005 changed the title from "feat(query/v2): enforce max-trace-size with incremental limiting and …" to "feat: add max-trace-size limit with early termination to prevent OOM" Sep 30, 2025
@ADITYATIWARI342005 (Author)

Hi @yurishkuro @AnmolxSingh,
Thank you for your reviews on PR #7499. I realized that approach was overly complicated and inefficient, so I closed it.

This PR takes a leaner approach and fully addresses issue #7495.

Please review it.
Thank you.

@ADITYATIWARI342005 changed the title from "feat: add max-trace-size limit with early termination to prevent OOM" to "feat(query/v2): add max-trace-size limit with early termination to prevent OOM" Sep 30, 2025

codecov bot commented Oct 4, 2025

Codecov Report

❌ Patch coverage is 91.89189% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.48%. Comparing base (f135545) to head (82ea91b).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/storage/v2/v1adapter/translator.go | 90.42% | 6 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7527      +/-   ##
==========================================
- Coverage   96.50%   96.48%   -0.03%     
==========================================
  Files         385      385              
  Lines       23316    23422     +106     
==========================================
+ Hits        22501    22598      +97     
- Misses        627      633       +6     
- Partials      188      191       +3     
| Flag | Coverage Δ |
|---|---|
| badger_v1 | 8.94% <0.00%> (-0.08%) ⬇️ |
| badger_v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| cassandra-4.x-v1-manual | 11.57% <0.00%> (-0.10%) ⬇️ |
| cassandra-4.x-v2-auto | 1.57% <0.00%> (-0.02%) ⬇️ |
| cassandra-4.x-v2-manual | 1.57% <0.00%> (-0.02%) ⬇️ |
| cassandra-5.x-v1-manual | 11.57% <0.00%> (-0.10%) ⬇️ |
| cassandra-5.x-v2-auto | 1.57% <0.00%> (-0.02%) ⬇️ |
| cassandra-5.x-v2-manual | 1.57% <0.00%> (-0.02%) ⬇️ |
| clickhouse | 1.52% <0.00%> (-0.02%) ⬇️ |
| elasticsearch-6.x-v1 | 16.42% <0.00%> (-0.15%) ⬇️ |
| elasticsearch-7.x-v1 | 16.46% <0.00%> (-0.15%) ⬇️ |
| elasticsearch-8.x-v1 | 16.60% <0.00%> (-0.15%) ⬇️ |
| elasticsearch-8.x-v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| elasticsearch-9.x-v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| grpc_v1 | 10.14% <2.00%> (-0.07%) ⬇️ |
| grpc_v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| kafka-3.x-v1 | 9.58% <0.00%> (-0.09%) ⬇️ |
| kafka-3.x-v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| memory_v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| opensearch-1.x-v1 | 16.51% <0.00%> (-0.15%) ⬇️ |
| opensearch-2.x-v1 | 16.51% <0.00%> (-0.15%) ⬇️ |
| opensearch-2.x-v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| opensearch-3.x-v2 | 1.58% <0.00%> (-0.02%) ⬇️ |
| query | 1.58% <0.00%> (-0.02%) ⬇️ |
| tailsampling-processor | 0.43% <0.00%> (-0.01%) ⬇️ |
| unittests | 95.48% <91.89%> (-0.02%) ⬇️ |



github-actions bot commented Oct 4, 2025

Metrics Comparison Summary

Total changes across all snapshots: 53

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 53

  • 🆕 Added: 53 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics

🆕 Added Metrics

  • http_server_request_body_size_bytes (18 variants)
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_request_duration_seconds (17 variants)
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_response_body_size_bytes (18 variants)
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...


@ADITYATIWARI342005 (Author)

@yurishkuro
Sorry to bother you, but could you please add a label to this pull request so that the CI checks can pass?
Also, could you please tell me how I should improve the Codecov score?

Add configurable trace size limiting to prevent OOM issues when retrieving
large traces. Implements early termination with proper iterator stopping,
comprehensive unit tests, and full configuration wiring through the v1/v2
query services and the Jaeger extension. Addresses maintainer feedback by
preserving the original function signatures and adding validation tests.

Signed-off-by: ADITYATIWARI342005 <[email protected]>
Fix critical bug where truncated traces would prevent processing of
subsequent traces in the same sequence. Now properly resets truncated
state when a new trace ID is detected, allowing multiple traces to be
processed correctly even when some exceed the size limit.

Signed-off-by: ADITYATIWARI342005 <[email protected]>
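The fix in this commit (resetting the truncated state when a new trace ID is detected) can be sketched as below. Chunk, Result, and limitPerTrace are hypothetical names, spans are reduced to strings, and the sketch assumes that all chunks belonging to one trace arrive contiguously in the sequence; that ordering is an assumption, not something stated in the PR.

```go
package main

// Chunk is a hypothetical, simplified batch of spans from the storage reader.
// A large trace may arrive as several consecutive chunks with the same TraceID.
type Chunk struct {
	TraceID string
	Spans   []string
}

// Result holds the spans kept for one trace and whether it was cut short.
type Result struct {
	TraceID   string
	Spans     []string
	Truncated bool
}

// limitPerTrace (hypothetical) applies maxSpans per trace; 0 means unlimited.
// Per-trace state, including the Truncated flag, starts fresh whenever the
// trace ID changes, so one oversized trace does not stop later traces from
// being processed. It assumes chunks of the same trace are contiguous.
func limitPerTrace(chunks []Chunk, maxSpans int) []Result {
	var out []Result
	for _, c := range chunks {
		if len(out) == 0 || out[len(out)-1].TraceID != c.TraceID {
			// New trace ID detected: reset per-trace state.
			out = append(out, Result{TraceID: c.TraceID})
		}
		cur := &out[len(out)-1]
		if cur.Truncated {
			// This trace already hit the limit; skip its remaining chunks.
			continue
		}
		if maxSpans > 0 && len(cur.Spans)+len(c.Spans) > maxSpans {
			// Keep only the spans that still fit, then mark the trace truncated.
			cur.Spans = append(cur.Spans, c.Spans[:maxSpans-len(cur.Spans)]...)
			cur.Truncated = true
			continue
		}
		cur.Spans = append(cur.Spans, c.Spans...)
	}
	return out
}
```

For example, with a limit of 3, a trace A delivered as chunks of 2, 2, and 1 spans is cut to 3 spans and marked truncated, its third chunk is skipped, and a following trace B with a single span is still returned intact.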
…ot review on the test

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Signed-off-by: ADITYA TIWARI <[email protected]>
Add tests to cover edge cases in applyTraceSizeLimit:
- Error handling in sequence processing
- Truncated trace continuation across multiple chunks
- Multiple traces where some exceed limit while others are processed

These tests verify that the max-trace-size feature behaves correctly in each of these scenarios.

Signed-off-by: ADITYA TIWARI <[email protected]>
@ADITYATIWARI342005 (Author)

@yurishkuro
I have added tests to address the Codecov report. This pull request is now ready for review.
Please review it.



Development

Successfully merging this pull request may close these issues.

[Feature]: Implement max trace size parameter in the query service

2 participants