

@carlesarnal commented Dec 15, 2025

Summary

Add comprehensive OpenTelemetry (OTel) support to Apicurio Registry for unified observability, including distributed tracing, metrics export via OTLP, and log correlation with trace context.

Fixes #6939

Root Cause

Apicurio Registry lacked native OpenTelemetry support for distributed tracing and observability. While Prometheus metrics were available via Micrometer, there was no built-in support for:

  • Distributed tracing across REST API and storage layer operations
  • OTLP export to modern observability backends (Jaeger, Grafana Tempo, OpenTelemetry Collector)
  • Trace context correlation in logs
  • Kubernetes operator configuration for OpenTelemetry

This made it difficult to debug performance issues and trace requests across microservice deployments.

Changes

Core Application (app/)

  • pom.xml: Added the quarkus-opentelemetry dependency
  • application.properties: Added OpenTelemetry configuration (disabled by default)
    • OTLP exporter endpoint and protocol settings
    • Trace sampling configuration
    • Kafka tracing instrumentation
  • application-prod.properties: Added a production profile with 10% trace sampling and JSON logging
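The 10% figure in the production profile is a trace sampling ratio. The sketch below is a rough illustration of how ratio sampling can work. It derives the keep/drop decision deterministically from the trace ID, in the spirit of the TraceIdRatioBased sampler in the OpenTelemetry specification, so every service that sees the same trace ID makes the same choice. The class name RatioSampler and the use of the trace ID's lower 64 bits are assumptions of this sketch. It is not Quarkus or Apicurio code, and it does not reproduce the production profile's actual settings.

```java
/**
 * Hypothetical sketch of trace-ID ratio sampling (not Quarkus or Apicurio code).
 * The decision depends only on the trace ID, so it is deterministic per trace.
 */
final class RatioSampler {
    private final double ratio;
    private final long idUpperBound;

    RatioSampler(double ratio) {
        // The negated range check also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in [0, 1]: " + ratio);
        }
        this.ratio = ratio;
        this.idUpperBound = (long) (ratio * Long.MAX_VALUE);
    }

    /** traceId: 32 hex characters, as in W3C trace context. */
    boolean shouldSample(String traceId) {
        if (traceId.length() != 32) {
            throw new IllegalArgumentException("trace ID must be 32 hex characters: " + traceId);
        }
        // Assumption of this sketch: take the lower 64 bits of the trace ID and
        // drop one bit so the value is non-negative (0..Long.MAX_VALUE).
        long randomPart = Long.parseUnsignedLong(traceId.substring(16), 16) >>> 1;
        // A ratio of 1.0 keeps everything; otherwise keep IDs below ratio * MAX.
        return ratio >= 1.0 || randomPart < idUpperBound;
    }
}
```

With a ratio of 0.1, roughly one in ten random trace IDs is kept. Calling shouldSample twice with the same ID always gives the same answer, which keeps a trace either wholly sampled or wholly dropped.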

Custom Instrumentation

  • StorageTracingInterceptor.java: New interceptor, attached through the @StorageMetricsApply interceptor binding, that creates OpenTelemetry spans for storage-layer operations
  • TracingFilter.java: New JAX-RS filter that enriches REST API spans with Apicurio-specific attributes (groupId, artifactId, version)
  • OTelMetricsProvider.java: New provider for custom OpenTelemetry metrics (artifacts created/deleted, versions, schema validations, rule evaluations, search requests)
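StorageTracingInterceptor is described as an interceptor on storage operations. In a CDI setting, an @AroundInvoke method would typically:

  • start a span from an OpenTelemetry Tracer
  • call InvocationContext.proceed()
  • mark the span as errored if an exception is thrown
  • end the span

The stdlib-only sketch below shows the same around-invoke shape using a JDK dynamic proxy. It uses a hypothetical ArtifactStore interface and records a simple RecordedSpan (name, duration, error flag) in place of a real OpenTelemetry span. All names here are illustrative assumptions, not the classes added in this PR.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** Hypothetical stand-in for a registry storage interface. */
interface ArtifactStore {
    String createArtifact(String groupId, String artifactId);

    void deleteArtifact(String groupId, String artifactId);
}

/** Stand-in for an OpenTelemetry span: name, duration and error flag. */
record RecordedSpan(String name, long durationNanos, boolean error) {}

final class TracingProxy {
    private TracingProxy() {}

    /** Returns a proxy that records one RecordedSpan per call on the target. */
    static <T> T wrap(Class<T> iface, T target, List<RecordedSpan> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            String spanName = "storage." + method.getName(); // naming is an assumption
            long start = System.nanoTime();
            boolean error = false;
            try {
                return method.invoke(target, args); // the equivalent of proceed()
            } catch (InvocationTargetException e) {
                error = true;          // a real span would also record the exception
                throw e.getCause();    // rethrow the storage exception unchanged
            } finally {
                sink.add(new RecordedSpan(spanName, System.nanoTime() - start, error));
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

The "storage." name prefix is an assumption; the PR description does not say how the real interceptor names its spans. A real interceptor would also make the span current so that nested client spans attach to it as children.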
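TracingFilter enriches REST spans with groupId, artifactId, and version attributes. A JAX-RS filter could read these through UriInfo.getPathParameters(). The sketch below instead pulls them from a raw request path, assuming the v3 layout /apis/registry/v3/groups/{groupId}/artifacts/{artifactId}/versions/{version}. The class name and the attribute keys (apicurio.groupId and so on) are hypothetical, and values are returned exactly as they appear in the path, without percent-decoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives span attributes from a registry REST path. */
final class RegistryPathAttributes {
    // Path segment that precedes a value -> hypothetical span attribute key.
    private static final Map<String, String> MARKERS = Map.of(
            "groups", "apicurio.groupId",
            "artifacts", "apicurio.artifactId",
            "versions", "apicurio.version");

    private RegistryPathAttributes() {}

    /** Values are returned as they appear in the path (no percent-decoding). */
    static Map<String, String> extract(String path) {
        Map<String, String> attributes = new LinkedHashMap<>();
        String[] segments = path.split("/");
        for (int i = 0; i + 1 < segments.length; i++) {
            String key = MARKERS.get(segments[i]);
            if (key != null && !segments[i + 1].isEmpty()) {
                attributes.put(key, segments[i + 1]);
                i++; // skip the value so it is never treated as a marker
            }
        }
        return attributes;
    }
}
```

The loop skips the value segment after each marker. This stops a group literally named "versions" from being mistaken for the version marker.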

Operator (operator/)

  • OTelSpec.java: New CRD spec class with fields: enabled, endpoint, protocol, traceSamplingRatio
  • AppSpec.java: Added an otel field to expose OpenTelemetry configuration in the CRD
  • EnvironmentVariables.java: Added OpenTelemetry environment variable constants
  • OTel.java: New feature class to configure environment variables based on CR spec
  • AppDeploymentResource.java: Wired the OTel feature into reconciliation
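The OTel feature class turns the CR's otel block into container environment variables. The sketch below shows one way that mapping could look, using a hypothetical OTelSpecSketch record with the four fields listed for OTelSpec. Only QUARKUS_OTEL_ENABLED appears in this PR's test plan. The other variable names are assumptions derived from Quarkus's property-to-environment-variable naming (for example, quarkus.otel.exporter.otlp.endpoint becomes QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT), and they may not match what the operator actually sets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mirror of the four OTelSpec fields listed in this PR. */
record OTelSpecSketch(Boolean enabled, String endpoint, String protocol, Double traceSamplingRatio) {}

/** Hypothetical mapping from the spec to container environment variables. */
final class OTelEnvSketch {
    private OTelEnvSketch() {}

    static Map<String, String> toEnv(OTelSpecSketch spec) {
        Map<String, String> env = new LinkedHashMap<>();
        if (spec == null || !Boolean.TRUE.equals(spec.enabled())) {
            return env; // nothing set: the application default (disabled) applies
        }
        env.put("QUARKUS_OTEL_ENABLED", "true");
        // The names below are assumptions based on Quarkus property naming.
        putIfPresent(env, "QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT", spec.endpoint());
        putIfPresent(env, "QUARKUS_OTEL_EXPORTER_OTLP_PROTOCOL", spec.protocol());
        Double ratio = spec.traceSamplingRatio();
        if (ratio != null) {
            if (!(ratio >= 0.0 && ratio <= 1.0)) {
                throw new IllegalArgumentException("traceSamplingRatio must be in [0, 1]: " + ratio);
            }
            env.put("QUARKUS_OTEL_TRACES_SAMPLER", "parentbased_traceidratio");
            env.put("QUARKUS_OTEL_TRACES_SAMPLER_ARG", Double.toString(ratio));
        }
        return env;
    }

    private static void putIfPresent(Map<String, String> env, String name, String value) {
        if (value != null && !value.isBlank()) {
            env.put(name, value);
        }
    }
}
```

Unset optional fields add no variables, so the application's own defaults continue to apply.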

Documentation

  • assembly-configuring-the-registry.adoc: Added a new section, "Configuring observability with OpenTelemetry"
  • ref-registry-all-configs.adoc: Added 11 OpenTelemetry configuration options to the observability section
  • proc-registry-observability-otel.adoc: New operator documentation for OTel configuration
  • apicurioregistry3_otel_cr.yaml: Example CR with OpenTelemetry configuration

Docker Compose Example

  • distro/docker-compose/in-memory-with-observability/: New example with a complete observability stack
    • docker-compose.yml: Apicurio Registry + Jaeger + Prometheus + Grafana
    • prometheus.yml: Prometheus scrape configuration
    • grafana/provisioning/datasources/datasources.yml: Pre-configured datasources
    • README.md: Usage documentation

Test plan

  • Build the app module: ./mvnw clean install -pl app -am -DskipTests
  • Build the operator module: cd operator && ../mvnw clean compile -DskipTests
  • Verify new classes are in JAR: jar tf app/target/apicurio-registry-app-*.jar | grep -E "(StorageTracingInterceptor|TracingFilter|OTelMetricsProvider)"
  • Verify OTel configuration in properties: unzip -p app/target/apicurio-registry-app-*.jar application.properties | grep otel
  • Test Docker Compose observability stack:
  • Test with OTel disabled (default): Verify the application starts normally with QUARKUS_OTEL_ENABLED=false
  • Test with OTel enabled: Verify traces are exported with QUARKUS_OTEL_ENABLED=true
  • Verify backwards compatibility: The Prometheus metrics endpoint /q/metrics still works
  • Operator integration test: Deploy a CR with spec.app.otel.enabled: true and verify the OpenTelemetry environment variables are set

@carlesarnal force-pushed the implement-otel-support branch 3 times, most recently from 19534a3 to df90419 (December 17, 2025 10:35)
  - Skip redundant build in local tests job by reusing build artifacts
  - Reduce Awaitility timeouts (LONG: 300s→180s, MEDIUM: 75s→60s, POLL: 5s→2s)
  - Add ensureStrimziInstalled() with double-checked locking to avoid reinstalling
  - Add JUnit @Tag annotations for test categorization:
    - SMOKE: basic smoke tests
    - KAFKA: Strimzi/Kafka tests
    - AUTH: Keycloak authentication tests
    - DATABASE: PostgreSQL/MySQL tests
    - FEATURE: general feature tests
    - SLOW: long-running tests

  This enables selective test execution via Maven (e.g., -Dgroups=smoke)
  and reduces overall CI execution time by an estimated 10-20 minutes.
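The commit notes add ensureStrimziInstalled() with double-checked locking so that Strimzi is installed at most once per test run. A minimal, stdlib-only sketch of that pattern follows. OperatorInstaller and its Runnable install action are hypothetical stand-ins, not the test suite's actual code. The flag is volatile so that a thread taking the unlocked fast path sees the write made inside the synchronized block. The flag is set only after the install action succeeds, so a failed install is retried on the next call.

```java
/**
 * Hypothetical sketch of double-checked locking for a one-time installation
 * step such as ensureStrimziInstalled().
 */
final class OperatorInstaller {
    private final Runnable installAction; // e.g. applying Strimzi manifests (assumption)
    private final Object lock = new Object();
    private volatile boolean installed;   // volatile: the unlocked read sees the locked write

    OperatorInstaller(Runnable installAction) {
        this.installAction = installAction;
    }

    void ensureInstalled() {
        if (installed) {
            return; // fast path: no lock once installation has completed
        }
        synchronized (lock) {
            if (!installed) {        // re-check: another thread may have installed meanwhile
                installAction.run(); // if this throws, the flag stays false and a later call retries
                installed = true;
            }
        }
    }
}
```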
@carlesarnal force-pushed the implement-otel-support branch from 63cf86b to 22571aa (December 19, 2025 08:31)
@carlesarnal added this to the 3.1.7 milestone Jan 7, 2026
@carlesarnal moved this to Backlog in Registry 3.0 Jan 7, 2026
@carlesarnal removed the status in Registry 3.0 Jan 7, 2026
@carlesarnal moved this to In Progress in Registry 3.0 Jan 7, 2026
EricWittmann previously approved these changes Jan 7, 2026

@EricWittmann left a comment


Very thorough! Docs are updated, docker compose example, even Operator enhancement. 👍

@carlesarnal force-pushed the implement-otel-support branch from 4ee6c93 to 0a44017 (January 7, 2026 12:58)
@carlesarnal merged commit 9acf7a7 into Apicurio:main Jan 8, 2026
34 checks passed
@carlesarnal deleted the implement-otel-support branch January 8, 2026 07:23
@github-project-automation bot moved this from In Progress to Done in Registry 3.0 Jan 8, 2026


Development

Successfully merging this pull request may close these issues.

  • Observability, logs, metrics, and tracing
  • Request for clearer documentation over metrics support for Apicurio-registry
