
Conversation

@chen-anders chen-anders commented Oct 12, 2025

addresses: #1931

This PR significantly enhances the debugging experience for OTLP exporters by:

  1. Adding rich context to export failure results
  2. Introducing comprehensive debug-level logging throughout the export pipeline
  3. Maintaining full backwards compatibility with existing exporter implementations

These changes already helped me debug a really gnarly issue where a slightly outdated version of the sentry-ruby SDK was interfering with how the OpenTelemetry Ruby SDK bubbled up errors (incorrect IPv6 parsing), causing all my traces to be dropped with a one-line error: Unable to export X spans.

Reviewer's Note

Significant AI assistance was used in the process of getting this PR working.

Motivation

Previously, when OTLP exports failed, developers had minimal information to diagnose the root cause. The exporters simply returned a FAILURE constant without any context about:

  • What type of error occurred
  • HTTP response codes and messages
  • Response bodies from the collector
  • Retry attempts and their outcomes
  • Exception details

This made troubleshooting production issues extremely difficult, especially for:

  • Network connectivity problems
  • SSL/TLS certificate issues
  • Collector endpoint configuration errors
  • HTTP timeout scenarios
  • Server-side errors (4xx/5xx responses)

Changes

1. Enhanced Export Result Type (sdk/lib/opentelemetry/sdk/trace/export.rb)

Introduced a new ExportResult class that wraps result codes with optional error context:

class ExportResult
  attr_reader :code, :error, :message

  # Factory methods
  def self.success
  def self.failure(error: nil, message: nil)
  def self.timeout
end

Backwards Compatibility: The ExportResult class overloads the == operator and provides to_i to ensure existing code comparing results to SUCCESS, FAILURE, or TIMEOUT constants continues to work seamlessly.
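
For illustration, a minimal sketch of that compatibility shim (method bodies here are illustrative; the actual implementation lives in export.rb):

class ExportResult
  attr_reader :code, :error, :message

  def initialize(code, error: nil, message: nil)
    @code = code
    @error = error
    @message = message
  end

  # Comparing against the integer constants (SUCCESS, FAILURE, TIMEOUT) or
  # against another ExportResult keeps existing call sites working unchanged.
  def ==(other)
    case other
    when Integer then @code == other
    when ExportResult then @code == other.code
    else super
    end
  end

  # Callers that expect an integer result code can still get one.
  def to_i
    @code
  end
end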

2. Comprehensive Debug Logging

Added detailed debug-level logging at key points in the export pipeline:

Entry/Exit Points

  • Function entry with parameters (span count, timeout values)
  • Function exit with return values
  • Byte sizes (compressed vs uncompressed)

HTTP Request Flow

  • Request preparation and compression
  • Timeout calculations and retry counts
  • HTTP response codes and messages
  • Response bodies for error cases

Exception Handling

  • Exception type and message for all caught exceptions
  • Retry attempt tracking
  • Max retry exceeded scenarios

3. Rich Failure Context

All failure scenarios now return detailed context via Export.failure():

HTTP Error Responses

OpenTelemetry::SDK::Trace::Export.failure(
  message: "export failed with HTTP #{response.code} (#{response.message}) after #{retry_count} retries: #{body}"
)

Network Exceptions

OpenTelemetry::SDK::Trace::Export.failure(
  error: e,
  message: "export failed due to SocketError after #{retry_count} retries: #{e.message}"
)

Timeout Scenarios

OpenTelemetry::SDK::Trace::Export.failure(
  message: 'timeout exceeded before sending request'
)

4. Enhanced BatchSpanProcessor Error Reporting

Updated BatchSpanProcessor to extract and log error context:

def report_result(result_code, span_array, error: nil, message: nil)
  if result_code == SUCCESS
    # ... metrics ...
  else
    error_message = if error
                      "BatchSpanProcessor: export failed due to #{error.class}: #{error.message}"
                    elsif message
                      "BatchSpanProcessor: export failed: #{message}"
                    else
                      "BatchSpanProcessor: export failed (no error details available)\nCall stack: #{caller.join("\n")}"
                    end

    OpenTelemetry.handle_error(exception: ExportError.new(span_array), message: error_message)
  end
end

5. Updated Exporters

Applied consistent changes to both:

  • OTLP default Exporter (exporter/otlp/lib/opentelemetry/exporter/otlp/exporter.rb)
  • OTLP HTTP Exporter (exporter/otlp-http/lib/opentelemetry/exporter/otlp/http/trace_exporter.rb)

Both now capture exception objects and maintain the error context through the entire export pipeline.

Example Scenarios

Before

ERROR -- : OpenTelemetry error: Unable to export 10 spans

After (with debug logging enabled)

DEBUG -- : OTLP::Exporter#export: Called with 10 spans, timeout=30.0
DEBUG -- : OTLP::Exporter#export: Calling encode for 10 spans
DEBUG -- : OTLP::Exporter#send_bytes: Sending HTTP request
DEBUG -- : OTLP::Exporter#send_bytes: Caught SocketError: Connection refused, retry_count=1
DEBUG -- : OTLP::Exporter#send_bytes: Max retries exceeded for SocketError
ERROR -- : BatchSpanProcessor: export failed due to SocketError: Connection refused - connect(2) for "localhost" port 4318
ERROR -- : OpenTelemetry error: Unable to export 10 spans
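
For context, debug logging can be enabled either by setting OTEL_LOG_LEVEL=debug before the SDK is configured (honored by recent SDK versions) or by assigning a DEBUG-level logger directly; a minimal sketch under those assumptions:

require 'logger'
require 'opentelemetry/sdk'

# Option 1: environment variable, read when the SDK is configured
# (assumes an SDK version that honors OTEL_LOG_LEVEL).
ENV['OTEL_LOG_LEVEL'] = 'debug'
OpenTelemetry::SDK.configure

# Option 2: assign a DEBUG-level logger explicitly.
OpenTelemetry.logger = Logger.new($stdout, level: Logger::DEBUG)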

@chen-anders chen-anders force-pushed the anders/improve-debugging-ux branch from 3096a1c to 0dbb1a8 on October 12, 2025 13:38
@tkling tkling commented Oct 16, 2025

Random passerby here ~ just want to say thank you @chen-anders! I am knee-deep in debugging errors between my Ruby app and my OTLP collector, and the improvements in this PR would vastly help my efforts.

Contributor

@kaylareopelle kaylareopelle left a comment

Thanks for opening this PR! This is a problem I've run into myself and I'm glad to see work toward improving the situation.

I'm a little worried about the cost of adding all of these log messages to our existing exporters. I believe the previous approach was taken for performance reasons. We may need to find a middle ground between your current design and the old system to craft a solution.


One small adjustment in the name of performance would be to pass any log message that uses interpolation or other method calls as a block rather than a string. This delays evaluation of the string until the message is actually logged, rather than running the interpolation regardless of log level. See this post for details.
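
As a quick standalone illustration of the suggestion (not code from this PR), the block form defers interpolation until the logger knows the message will be emitted:

require 'logger'

logger = Logger.new($stdout, level: Logger::INFO)
batch  = Array.new(10)

# Eager: the interpolated string is allocated even though DEBUG is filtered out.
logger.debug("Exporting batch of #{batch.size} spans")

# Lazy: the block only runs when a DEBUG message would actually be emitted.
logger.debug { "Exporting batch of #{batch.size} spans" }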

I need to think about this a little more and can do so next week. Just wanted to let you know we're taking a look.

Contributor

@robertlaurin robertlaurin left a comment

I'm blocking this change based on the amount of duplicate logs alone.

Comment on lines 189 to 190
OpenTelemetry.logger.debug("BatchSpanProcessor#export_batch: exporter=#{@exporter.class.name}")
OpenTelemetry.logger.debug("BatchSpanProcessor#export_batch: Exporting batch of #{batch.size} spans with timeout #{timeout}")
Contributor

So there are a lot of instances where you're emitting two lines when a single log invocation (and a single string allocation) would do.

Directly after these logs are emitted, when the export function is called, this log line is emitted with the exact same information:

 OpenTelemetry.logger.debug("OTLP::HTTP::TraceExporter#export: Called with #{span_data&.size || 0} spans, timeout=#{timeout.inspect}")

return OpenTelemetry::SDK::Trace::Export.failure(message: 'send_bytes called with nil bytes') if bytes.nil?

@metrics_reporter.record_value('otel.otlp_exporter.message.uncompressed_size', value: bytes.bytesize)
OpenTelemetry.logger.debug("OTLP::Exporter#send_bytes: Uncompressed size=#{bytes.bytesize} bytes")
Contributor

This information is already being reported in the line above.

request.add_field('Content-Encoding', 'gzip')
body = Zlib.gzip(bytes)
@metrics_reporter.record_value('otel.otlp_exporter.message.compressed_size', value: body.bytesize)
OpenTelemetry.logger.debug("OTLP::Exporter#send_bytes: Compressed size=#{body.bytesize} bytes")
Contributor

Reported in line above

OpenTelemetry.logger.debug("OTLP::Exporter#send_bytes: Compressed size=#{body.bytesize} bytes")
else
body = bytes
OpenTelemetry.logger.debug('OTLP::Exporter#send_bytes: No compression applied')
Contributor

@robertlaurin robertlaurin Oct 23, 2025

Does the logger need to be invoked every single time we export without compression? It is configured during initialization and isn't mutable.


case response
when Net::HTTPOK
OpenTelemetry.logger.debug('OTLP::Exporter#send_bytes: SUCCESS - HTTP 200 OK')
Contributor

A few lines above this you log

OpenTelemetry.logger.debug("OTLP::Exporter#send_bytes: Received response code=#{response.code}, message=#{response.message}")

Then on this line you log the response code, and in most of the subsequent logs you log the response code again.

Contributor

@fbogsany fbogsany left a comment

There's a lot of unnecessary code here. While this may be valuable to add during a debugging session, it does not belong in production code. I've provided some concrete feedback, but there is a lot more to address as well.

def initialize(spans)
super("Unable to export #{spans.size} spans")
@spans = spans
@error = error
Contributor

This does nothing AFAICT. The only error accessible at this point is the method created by attr_reader and it'll return @error, which is nil.

#
# @return [Array<OpenTelemetry::SDK::Trace::Span>]
attr_reader :spans
attr_reader :spans, :error
Contributor

I assume error is intended to hold a wrapped error of some sort. This is already exposed by StandardError#cause, which will be populated automatically in cases like:

rescue FooError
  raise ExportError
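
As a standalone illustration (FooError and ExportError are defined here only for the example), Ruby populates #cause automatically when an exception is raised from inside a rescue block:

class FooError < StandardError; end
class ExportError < StandardError; end

begin
  begin
    raise FooError, 'Connection refused - connect(2)'
  rescue FooError
    # Re-raising inside the rescue sets ExportError#cause automatically.
    raise ExportError, 'Unable to export 10 spans'
  end
rescue ExportError => e
  e.message # => "Unable to export 10 spans"
  e.cause   # => #<FooError: Connection refused - connect(2)>
end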

# Factory method for creating a success result
# @return [ExportResult]
def self.success
ExportResult.new(SUCCESS)
Contributor

This is an unnecessary allocation on every successful export call.
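
If the wrapper type is kept, one way to avoid the per-call allocation (just one option, not something in the diff) would be to reuse a frozen instance for the no-context case:

# Hypothetical tweak: a single frozen result shared by all successful exports.
SUCCESS_RESULT = ExportResult.new(SUCCESS).freeze

def self.success
  SUCCESS_RESULT
end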

Comment on lines 46 to 53
case other
when Integer
  @code == other
when ExportResult
  @code == other.code
else
  super
end
Contributor

I think this is sufficient:

Suggested change:

other.to_i == @code

Comment on lines 216 to 225
# Log detailed error information if available
if error
  OpenTelemetry.logger.error("BatchSpanProcessor: export failed due to #{error.class}: #{error.message}")
elsif message
  OpenTelemetry.logger.error("BatchSpanProcessor: export failed: #{message}")
else
  OpenTelemetry.logger.error('BatchSpanProcessor: export failed (no error details available)')
  OpenTelemetry.logger.error("BatchSpanProcessor: call stack:\n#{caller.join("\n")}")
end
end

Contributor

All of this can be implemented effectively in a custom error handler (i.e. via the OpenTelemetry.handle_error call below).
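
For example, assuming OpenTelemetry.error_handler is assignable and receives the same exception:/message: keywords that OpenTelemetry.handle_error forwards, a custom handler could surface the details globally (sketch only):

# Hypothetical custom error handler; handle_error delegates to whatever
# callable is assigned here.
OpenTelemetry.error_handler = lambda do |exception: nil, message: nil|
  OpenTelemetry.logger.error("export error: #{message}") if message
  OpenTelemetry.logger.error("#{exception.class}: #{exception.message}") if exception
  OpenTelemetry.logger.error(exception.backtrace.join("\n")) if exception&.backtrace
end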
