Skip to content

Comments

Add OpenTelemetry OTLP exporter with full SDK support#218

Open
etiennep wants to merge 2 commits intomainfrom
add-opentelemetry-otlp-support
Open

Add OpenTelemetry OTLP exporter with full SDK support#218
etiennep wants to merge 2 commits intomainfrom
add-opentelemetry-otlp-support

Conversation

@etiennep
Copy link

@etiennep etiennep commented Feb 6, 2026

Summary

Adds production-ready OpenTelemetry Protocol (OTLP) exporter using the official OpenTelemetry SDK with comprehensive support for both gRPC and HTTP/Protobuf transports.

Features

Dual Transport Support: gRPC and HTTP/Protobuf protocols
Environment Variables: Full OTEL_* environment variable support
Resource Detection: AWS (EC2, ECS, EKS, Lambda), GCP, Azure, K8s, host, process
All Metric Types: Counter, Gauge, Histogram with proper semantics
Tag Conversion: Automatic stats tags → OpenTelemetry attributes
Production Ready: Thread-safe, tested, documented

Usage

// Simple - uses environment variables
handler, err := otlp.NewSDKHandlerFromEnv(ctx)
stats.Register(handler)

// Or explicit configuration
handler, err := otlp.NewSDKHandler(ctx, otlp.SDKConfig{
    Protocol: otlp.ProtocolGRPC,
    Endpoint: "localhost:4317",
})

Implementation Highlights

  • Gauge Semantics: Uses UpDownCounter with delta calculation to maintain absolute value semantics (workaround until stable OTel SDK adds Gauge)
  • Context Handling: Background context for metric recording prevents context cancellation issues
  • Performance: Lock-free reads for instrument lookup, efficient caching
  • Resource Detection: Automatic cloud provider metadata detection

Documentation

Testing

  • ✅ Unit tests for all metric types
  • ✅ Gauge behavior verification
  • ✅ HTTP and gRPC protocol tests
  • ✅ Value type conversion tests
  • ✅ Performance benchmarks

Changes

  • Added otlp/sdk_handler.go - Main OpenTelemetry SDK integration
  • Added otlp/sdk_handler_test.go - Comprehensive tests
  • Added otlp/example_test.go - Usage examples
  • Added otlp/README.md - Complete documentation
  • Added otlp/IMPLEMENTATION_NOTES.md - Design decisions
  • Updated README.md - Added OpenTelemetry backend overview
  • Updated HISTORY.md - Added v5.9.0 release notes
  • Updated version/version.go - Bumped to 5.9.0
  • Updated otlp/go.mod - Added OpenTelemetry SDK dependencies

Backward Compatibility

Fully backward compatible - This is a new feature addition that doesn't change existing APIs. The legacy otlp.Handler remains available for existing users.

🤖 Generated with Claude Code

@etiennep etiennep force-pushed the add-opentelemetry-otlp-support branch from bad79e6 to e6d05d8 Compare February 6, 2026 10:26
sccoache
sccoache previously approved these changes Feb 6, 2026
@etiennep etiennep force-pushed the add-opentelemetry-otlp-support branch 2 times, most recently from 724f5f7 to a82c5e2 Compare February 6, 2026 17:01
Implement production-ready OpenTelemetry Protocol (OTLP) exporter using
the official OpenTelemetry SDK with support for both gRPC and HTTP
transports.

Features:
- gRPC and HTTP/Protobuf protocol support
- Full OTEL_* environment variable integration
- Automatic resource detection (AWS, GCP, Azure, K8s)
- Counter, Gauge, and Histogram metric types
- Tag to attribute conversion
- Thread-safe instrument caching
- Proper gauge semantics via delta calculation

Implementation:
- Uses UpDownCounter for gauges with delta tracking to maintain
  absolute value semantics (workaround until stable SDK adds Gauge)
- Background context for recording to avoid cancellation issues
- Lock-free reads for instrument lookup in hot path
- Comprehensive tests and benchmarks

Documentation:
- Complete README with configuration examples
- Cloud resource detector usage guides
- Implementation notes explaining design decisions
- Example code for common use cases

Bumps version to 5.9.0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@etiennep etiennep force-pushed the add-opentelemetry-otlp-support branch from a82c5e2 to 82249fe Compare February 6, 2026 17:08

### 3. Instrument Caching

**Implementation**: Thread-safe two-level locking pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell if this is internal to the stats library or external - if users need to know about this.


### Default: Cumulative Temporality

**Decision**: Use cumulative temporality for all metric instruments (Prometheus-compatible)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rewrite this to be aimed more at people who might be curious - it's weird to talk about a "decision" without a discussion of the tradeoffs that led to that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it just needs to be presented in reverse order

- ✅ Resource detection
- ✅ Production-ready

2. **Handler** (Legacy): Custom OTLP implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have internal use of this?

If so I'd like to expand on why people shouldn't use this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked claude to search for usage and it didn't find any. Adding clear deprecation notice.

// For gRPC: "localhost:4317"
// For HTTP: "http://localhost:4318"
// If empty, uses OTEL_EXPORTER_OTLP_ENDPOINT environment variable
Endpoint string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note people frequently get tripped up between "Endpoint" and "EndpointURL" we should probably note the difference here and say this is explicitly "Endpoint"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh good point.

This commit improves the OTLP handler API and prepares for v6 by
deprecating the legacy alpha handler implementation.

Breaking Changes:
- Rename SDKConfig.Endpoint to EndpointURL to clarify it requires
  a full URL with scheme (http:// or https://)
- Use WithEndpointURL instead of WithEndpoint to avoid known gRPC bug
  when using http:// scheme
- Remove enforced defaults for ExportInterval and ExportTimeout,
  allowing SDK to use its own defaults (60s and 30s respectively)

Deprecations:
- Deprecate otlp.Handler (Alpha since 2022, minimal usage)
- Deprecate otlp.HTTPClient
- Deprecate otlp.NewHTTPClient()
All will be removed in v6.0.0. Migration path provided in deprecation
notices with code examples.

Improvements:
- Add Example_fullyConfiguredByEnvironment showing empty SDKConfig usage
- Enhance IMPLEMENTATION_NOTES.md with user-focused temporality explanation
- Clarify instrument caching is internal implementation detail
- Update all examples and tests to use EndpointURL with proper scheme
- Add blank lines around markdown lists to fix linting warnings

Documentation:
- Update HISTORY.md with comprehensive v5.9.0 release notes
- Document breaking changes and migration path
- Add notes about exponential histograms and temporality configuration
- Update all code examples throughout documentation

All tests pass. No functional changes to SDKHandler behavior.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants