1. Background
OpenTelemetry (OTel) is an open standard for instrumenting, collecting, and exporting telemetry data (traces, metrics, and logs) from applications and systems. OTel provides vendor-neutral telemetry generation across languages and platforms, with standardized data delivery to backend systems. OTel's scope is limited to data creation and delivery; it doesn't manage observability backends, storage systems, or data analysis.
1.1 Key Components of the OTel Specification
- APIs & SDKs: Language-specific implementations providing standardized ways to instrument code with runtime logic for collecting, processing, and exporting telemetry data.
- Instrumentation Libraries: Pre-built, language-specific libraries that automatically add telemetry to popular frameworks and tools, reducing manual instrumentation requirements.
- Semantic Conventions: Standard naming rules ensuring consistent terminology across telemetry data (e.g., "http.status_code" for web response codes), making traces, metrics, and logs uniform across applications and tools.
- Collector: A configurable component that receives, processes, and exports telemetry data. Deployable as an agent or gateway, it supports customization through processors for filtering, transforming, or routing data.
- OTLP (OpenTelemetry Protocol): A transport protocol defining the encoding and delivery mechanism for telemetry data between sources, intermediate nodes, and backends.
1.2 OpenTelemetry Schema
Telemetry sources and consumers often depend on specific data structures, which makes it hard to evolve telemetry data without breaking compatibility with existing consumers. OpenTelemetry addresses this through telemetry schemas, which define the transformations needed to map data between different versions of the semantic conventions. These schemas let producers and consumers evolve independently by transforming data from one version to another, as shown in the example Telemetry Schema for V1.30.0. The transformation can be bidirectional: from a higher version to a lower one and vice versa.
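A minimal sketch of such a bidirectional transformation follows. It assumes a heavily simplified, hypothetical stand-in for a schema file that only records attribute renames per version; the rename shown is the `messaging.consumer_id` example used later in this document, and the version number attached to it is assumed for illustration.

```python
from typing import Dict, Tuple

# Hypothetical, simplified stand-in for the rename section of a telemetry
# schema file. Keys are the versions in which the renames were introduced
# (the version used here is an assumption for illustration).
RENAMES_BY_VERSION: Dict[str, Dict[str, str]] = {
    "1.17.0": {"messaging.consumer_id": "messaging.consumer.id"},
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.17.0' into (1, 17, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def transform_attributes(attributes: dict, source: str, target: str) -> dict:
    """Rename attribute keys to move a record from `source` to `target`."""
    src, dst = parse_version(source), parse_version(target)
    upgrading = src < dst
    low, high = (src, dst) if upgrading else (dst, src)
    # Only renames introduced in versions within (low, high] apply.
    steps = sorted(
        (v for v in RENAMES_BY_VERSION if low < parse_version(v) <= high),
        key=parse_version,
        reverse=not upgrading,  # walk versions backwards when downgrading
    )
    result = dict(attributes)
    for version in steps:
        renames = RENAMES_BY_VERSION[version]
        if not upgrading:
            # Invert the map so new names are turned back into old names.
            renames = {new: old for old, new in renames.items()}
        for old_key, new_key in renames.items():
            if old_key in result:
                result[new_key] = result.pop(old_key)
    return result
```

Real schema files describe more change types than renames, so this only illustrates the direction handling.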
1.3 Simple Schema in OpenSearch
The OpenSearch ecosystem has introduced a simple schema for observability that standardizes the storage layer so that data can be analyzed, visualized, and correlated efficiently. This simple schema is currently based on the OTLP data model and OpenTelemetry semantic conventions. It includes:
- A mapping file for OpenSearch
- A dictionary file with field definitions
- A schema file using JSON schema format
The sections below discuss the future of the simple schema in more detail.
2. Goals and Requirements for OTel Integration
2.1 Primary Goals
- Standards Alignment: Fully embrace OpenTelemetry as the industry standard for observability data, ensuring OpenSearch remains interoperable with the broader observability ecosystem.
- Schema Evolution Support: Provide mechanisms to handle OpenTelemetry schema changes without breaking existing visualizations or the Dashboards plugin.
- Semantic-Aware Visualization: Leverage OpenTelemetry's semantic conventions to build rich, meaningful dashboards that understand the context and relationships of telemetry data fields.
- Seamless Upgrade Path: Enable smooth transitions between OpenSearch versions with minimal disruption to observability workflows during schema evolution.
2.2 Key Requirements
- Version Support Documentation: Each OpenSearch plugin that integrates with OpenTelemetry must clearly document:
  - Supported OTel schema versions
  - Migration paths between versions
  - Breaking changes
- Transformation Flexibility: Support in-flight transformation of telemetry data between schema versions.
- Custom Attributes: Allow OpenSearch users to extend standard storage mappings with domain-specific attributes while maintaining compatibility.
- Balanced Migration: Support existing workloads while enabling adoption of newer schema versions.
3. Integration Approaches
3.1 Proposed Approach: Version-Aware OTel Integration
This approach uses OpenTelemetry semantic conventions throughout the OpenSearch Observability ecosystem, with ingest-time transformation handling schema evolution.
Architecture Overview
The architecture uses Data Prepper to transform data to a consistent target OpenTelemetry version before storage:
- Telemetry sources publish data via OTLP with potentially varying schema versions
- Data Prepper transforms the data to a target OTel version (e.g., 1.30.0)
- Transformed data is stored in OpenSearch with a consistent schema
- OpenSearch Dashboards queries the standardized data structure
For each OpenSearch Dashboards (OSD) version, the Observability plugin will publish a documented range of OTel versions it supports, based on the fields used in its visualizations and UI components. For example:
OSD Version | Supported OTel Version Range |
---|---|
2.8.0 | 1.15.0 - 1.18.0 |
2.9.0 | 1.17.0 - 1.21.0 |
3.0.0 | 1.20.0 - 1.25.0 |
When customers upgrade their OpenSearch version, they must also update their Data Prepper configuration to target an OTel version within the supported range for the new OpenSearch version. This maintains compatibility between ingested data and dashboard expectations.
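The compatibility check described above can be sketched as follows. The ranges are the illustrative values from the table, not published figures, and the function and constant names are hypothetical.

```python
from typing import Dict, Tuple

# Example ranges copied from the table above; real values would come from
# the published compatibility matrix.
SUPPORTED_OTEL_RANGES: Dict[str, Tuple[str, str]] = {
    "2.8.0": ("1.15.0", "1.18.0"),
    "2.9.0": ("1.17.0", "1.21.0"),
    "3.0.0": ("1.20.0", "1.25.0"),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.17.0' into (1, 17, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_supported(osd_version: str, target_otel_version: str) -> bool:
    """Return True if the target OTel version is inside the OSD range."""
    low, high = SUPPORTED_OTEL_RANGES[osd_version]
    target = parse_version(target_otel_version)
    return parse_version(low) <= target <= parse_version(high)


def needs_target_change(new_osd_version: str, current_target: str) -> bool:
    """Return True if an upgrade requires a new target_otel_version."""
    return not is_supported(new_osd_version, current_target)
```

With these example ranges, a pipeline targeting 1.18.0 could stay unchanged when moving from OSD 2.8.0 to 2.9.0, but would need a new target before moving to 3.0.0.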
Version Publication and Compatibility Matrix:
- For each OpenSearch release, the Observability plugin will publish:
  - A minimum supported OTel version
  - A maximum supported OTel version
- Documentation will include a compatibility matrix showing which OTel versions work with the Observability plugin for each OpenSearch version
- Migration guides will outline required changes when moving between versions
Ingest-Time Transformation:
- A schema transformation processor in Data Prepper will:
  - Convert data from the source OTel version to the target version
  - Support configuration via the `target_otel_version` parameter
  - Handle data with and without an explicit `schema_url`
  - Apply transformations based on the published OpenTelemetry schema registry
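As a rough illustration of how such a processor might decide what to do with a record, the sketch below derives the source version from the `schema_url` and compares it with the target. It assumes schema URLs end in `/schemas/<version>` as OpenTelemetry's published schema URLs do; the function names and the returned labels are hypothetical and not part of Data Prepper.

```python
import re
from typing import Optional, Tuple

# Matches the trailing version of URLs such as
# https://opentelemetry.io/schemas/1.17.0
SCHEMA_URL_PATTERN = re.compile(r"/schemas/(\d+)\.(\d+)\.(\d+)$")


def version_from_schema_url(schema_url: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from a schema URL, or None if absent."""
    if not schema_url:
        return None
    match = SCHEMA_URL_PATTERN.search(schema_url)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def plan_transformation(schema_url: Optional[str], target_otel_version: str) -> str:
    """Classify the work needed to bring a record to the target version."""
    target = tuple(int(part) for part in target_otel_version.split("."))
    source = version_from_schema_url(schema_url)
    if source is None:
        return "unknown-source"  # no usable schema_url on the record
    if source == target:
        return "none"
    return "upgrade" if source < target else "downgrade"
```

Records classified as `unknown-source` would need a separate policy, since the source version cannot be inferred from the data itself.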
Schema Evolution Handling:
- Field aliases will maintain backward compatibility
- New indices can be created when major schema changes occur
- Schema versioning metadata will be stored with each document
Dashboard Compatibility:
- Dashboards will automatically adapt to schema versions within their supported range
3.2 Alternative Approaches
3.2.1 Canonical Internal Schema
Define a single, stable canonical schema in OpenSearch and map all incoming OTel data into it.
Advantages:
- Provides consistent queries independent of OTel version changes
- Simplifies dashboard development with a stable data model
- Reduces need for field aliases or runtime mapping
Disadvantages:
- Diverges from industry standard, creating a proprietary schema
- Limits storage of raw OTel fields that may be valuable to users
- Requires ongoing maintenance to incorporate new OTel conventions
This approach is not recommended as it creates a new standard with an internal schema, diverging from OTel standards.
In addition, the dashboards layer in the proposed approach relies on stable fields from the OpenTelemetry conventions that rarely change, so dashboards stay compliant with new OTel versions without modification.
3.2.2 Raw Storage with Version-Aware Queries
Store OTel data in its raw form with the `schema_url`, handling version differences at query time.
Advantages:
- Preserves complete fidelity of original telemetry data
- Supports advanced users who need access to raw attributes
- Eliminates ingest-time transformation overhead
Disadvantages:
- Significantly complicates dashboard development with version-aware queries
- Creates performance overhead from runtime transformations during queries
- Increases storage requirements with redundant or deprecated fields
- Makes cross-version correlation more complex
4. OpenTelemetry Storage Strategy
4.1 Design Principles
The OpenSearch storage strategy for OpenTelemetry data follows these key principles:
- OTLP Alignment: Our storage model aligns with the OpenTelemetry Protocol (OTLP) transport model, utilizing its JSON representation with lowerCamelCase field naming conventions.
- Semantic Convention Adherence: Field names under `attributes`, `resourceAttributes`, and `instrumentationScopeAttributes` adhere to OpenTelemetry's semantic conventions, following dot notation to maintain a clear namespace hierarchy.
- Naming Standardization: Custom keys under attributes follow the naming conventions specified by OTel to ensure consistency.
- Index Organization: We provide recommended index naming conventions for different telemetry signals to simplify data organization and lifecycle management.
This alignment ensures consistency between the OpenSearch storage model and the OTLP transport model, facilitating easy data interchange and reducing the need for complex transformations during data ingestion and retrieval.
4.2 Component Templates for Telemetry Indices
To provide flexibility while maintaining standardization, we suggest using composable index templates:
{
"index_templates": [
{
"name": "otel-traces-template",
"index_patterns": ["ss4o_traces-*-*-*"],
"composed_of": [
"otel-core-fields", // Required fields for visualization
"otel-http-attributes", // Optional HTTP semantic conventions
"otel-k8s-attributes", // Optional Kubernetes conventions
"customer-extensions" // Customer-specific extensions
],
"priority": 100
}
]
}
This approach allows:
- Core fields to be strictly enforced
- Optional semantic convention categories to be selectively included
- Custom attributes to be added according to the user's requirements
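The layering above can be sketched as request bodies for the OpenSearch `_component_template` and `_index_template` APIs. The helper names are hypothetical, and the three core fields are only a small sample of the mapping; the template names are the ones used in the example above.

```python
import json
from typing import Dict, Iterable


def component_template(properties: Dict[str, dict]) -> dict:
    """Body for PUT _component_template/<name> holding a set of mappings."""
    return {"template": {"mappings": {"properties": properties}}}


def index_template(patterns: Iterable[str], components: Iterable[str],
                   priority: int = 100) -> dict:
    """Body for PUT _index_template/<name> composed of component templates."""
    return {
        "index_patterns": list(patterns),
        "composed_of": list(components),
        "priority": priority,
    }


# A small sample of core fields; the full list is larger.
core_fields = component_template({
    "traceId": {"type": "keyword"},
    "spanId": {"type": "keyword"},
    "startTime": {"type": "date_nanos"},
})

traces_template = index_template(
    ["ss4o_traces-*-*-*"],
    ["otel-core-fields", "otel-http-attributes",
     "otel-k8s-attributes", "customer-extensions"],
)

if __name__ == "__main__":
    print(json.dumps(traces_template, indent=2))
```

Keeping core fields in their own component template means optional or customer-specific templates can change without touching the fields that visualizations depend on.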
4.3 Tracing Core Fields Mapping
Field Path | Type | Properties | Comments |
---|---|---|---|
traceId | keyword | ||
spanId | keyword | ||
parentSpanId | keyword | ||
traceState | keyword | ||
traceFlags | integer | ||
kind | keyword | ||
startTime | date_nanos | ||
endTime | date_nanos | ||
durationInNanos | long | ||
status.code | integer | ||
status.message | text | ||
events | nested | timestamp: date_nanos; name: text; attributes: object (dynamic); droppedAttributesCount: integer | |
droppedEventsCount | integer | ||
resource.schemaUrl | keyword | ||
resource.attributes | object | dynamic: true | |
resource.droppedAttributesCount | integer | ||
instrumentationScope.name | keyword | ||
instrumentationScope.version | keyword | ||
instrumentationScope.schemaUrl | keyword | ||
instrumentationScope.attributes | object | dynamic: true | |
links | nested | traceId: keyword; spanId: keyword; traceState: keyword; attributes: object (dynamic); droppedAttributesCount: integer | |
droppedLinksCount | integer | ||
attributes | object | dynamic: true | |
attributes.data_stream.dataset | keyword | ||
attributes.data_stream.namespace | keyword | ||
attributes.data_stream.type | keyword | ||
droppedAttributesCount | integer | ||
4.4 Logging Core Fields Mapping
Field Path | Type | Properties | Comments |
---|---|---|---|
@timestamp | date_nanos | ||
observedTimestamp | date_nanos | ||
traceId | keyword | ||
spanId | keyword | ||
severity.text | keyword | ||
severity.number | integer | ||
body | text | ||
droppedAttributesCount | integer | ||
eventName | text | ||
resource.schemaUrl | keyword | ||
resource.attributes | object | dynamic: true | |
instrumentationScope.name | keyword | ||
instrumentationScope.version | keyword | ||
instrumentationScope.schemaUrl | keyword | ||
instrumentationScope.attributes | object | dynamic: true | |
attributes | object | dynamic: true | |
attributes.data_stream.dataset | keyword | ||
attributes.data_stream.namespace | keyword | ||
attributes.data_stream.type | keyword | ||
4.5 Handling Mapping Explosion
To prevent mapping explosion, we implement a multi-tier attribute approach:
{
"mappings": {
"properties": {
"attributes": {
"type": "object",
"dynamic": true,
"properties": {
// Core attributes with proper mapping
"http.method": { "type": "keyword" },
"http.status_code": { "type": "integer" }
// Additional mapped fields...
}
},
"attributes_flat": {
"type": "flattened",
"depth_limit": 10,
"ignore_above": 1024
},
"attributes_raw": {
"type": "keyword",
"index": false,
"doc_values": false
}
}
}
}
Data Prepper configuration routes attributes appropriately:
processor:
  otel_attribute_router:
    default_target: "attributes"
    routing_rules:
      - pattern: "custom.*"
        target: "attributes_flat"
      - pattern: "experimental.*"
        target: "attributes_flat"
      - pattern: "high.cardinality.*"
        target: "attributes_flat"
Fields utilized by the Dashboards plugin should always follow the mappings specified in the `otel-core-fields` template.
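The routing rules in the configuration above could behave roughly like the sketch below. It assumes the patterns are shell-style globs evaluated in order with the first match winning; the proposal does not specify the matching semantics, so this is an assumption, and `route_attributes` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Rules mirror the configuration above; first matching pattern wins.
ROUTING_RULES: List[Tuple[str, str]] = [
    ("custom.*", "attributes_flat"),
    ("experimental.*", "attributes_flat"),
    ("high.cardinality.*", "attributes_flat"),
]
DEFAULT_TARGET = "attributes"


def route_attributes(attributes: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Group attribute keys by the document field they should be stored in."""
    routed: Dict[str, Dict[str, object]] = {}
    for key, value in attributes.items():
        target = next(
            (t for pattern, t in ROUTING_RULES if fnmatchcase(key, pattern)),
            DEFAULT_TARGET,
        )
        routed.setdefault(target, {})[key] = value
    return routed
```

For example, `http.method` would stay under `attributes` with its explicit mapping, while `custom.team` would land in `attributes_flat` and add no new mapped fields.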
4.6 Custom Semantic Convention Support
Organizations often extend OpenTelemetry with custom attributes specific to their domain. Our approach supports these custom semantic conventions through:
- Index Template Customization:
  - Index templates include extension points for custom attributes
  - Organizations can add their own mapping templates that extend the base OTel templates
- Registry Integration:
  - To be explored while integrating OpenTelemetry Weaver to generate component templates
4.7 OTel Schema Evolution Reference
The table below summarizes common schema changes in OpenTelemetry and how they're handled:
Change Type | Description | Handling Strategy | Example |
---|---|---|---|
Field Renames | Attribute names change between versions | Transform at ingest time; use field aliases | messaging.consumer_id → messaging.consumer.id |
Added Fields | New attributes introduced | Allow through dynamic mapping | New http.request.method_original field |
Type Changes | Field type modifications | Requires reindex because of datatype changes. | String → integer conversion |
Deprecations | Fields marked as obsolete | Maintain backward compatibility with aliases | Older net.peer.ip still accessible |
Namespace Changes | Fields moved to different contexts | Transform at ingest time; use field aliases | DB-specific fields moved to common namespace |
5. Telemetry Schema Version and OpenSearch Upgrade Strategy
5.1 Index Naming Convention
For all OpenTelemetry data, we recommend the following standardized index naming pattern:
ss4o_{type}-{dataset}-{namespace}-{otel_version}
Where:
- type: Signal type (traces, logs, metrics)
- dataset: Application or service domain (e.g., web, database, backend)
- namespace: Environment or tenant (e.g., prod, dev, customer1)
- otel_version: The target OpenTelemetry version (e.g., v1_18, v1_30)
Examples:
ss4o_traces-payment-prod-v1_30
ss4o_logs-webserver-staging-v1_18
ss4o_metrics-database-dev-v1_25
This naming strategy enables:
- Clear separation between different telemetry types
- Isolation of data from different OTel versions
- Easy implementation of lifecycle policies by signal type
- Simplified migration during upgrades
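The naming pattern can be generated and parsed with a few lines of code. This is a sketch with hypothetical helper names; it assumes that `dataset` and `namespace` contain no hyphens, since the pattern uses hyphens as separators.

```python
import re

VALID_TYPES = {"traces", "logs", "metrics"}

INDEX_NAME_PATTERN = re.compile(
    r"^ss4o_(?P<type>traces|logs|metrics)"
    r"-(?P<dataset>[^-]+)-(?P<namespace>[^-]+)"
    r"-v(?P<major>\d+)_(?P<minor>\d+)$"
)


def otel_version_suffix(otel_version: str) -> str:
    """Convert '1.30.0' to the 'v1_30' suffix used in index names."""
    major, minor = otel_version.split(".")[:2]
    return f"v{major}_{minor}"


def index_name(signal_type: str, dataset: str, namespace: str,
               otel_version: str) -> str:
    """Build an index name following ss4o_{type}-{dataset}-{namespace}-{otel_version}."""
    if signal_type not in VALID_TYPES:
        raise ValueError(f"unknown signal type: {signal_type}")
    suffix = otel_version_suffix(otel_version)
    return f"ss4o_{signal_type}-{dataset}-{namespace}-{suffix}"


def parse_index_name(name: str) -> dict:
    """Split an index name back into its components (all values are strings)."""
    match = INDEX_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an ss4o index name: {name}")
    return match.groupdict()
```

Note that the patch component of the OTel version is dropped in the suffix, which matches the examples above.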
5.2 Data Prepper Schema Version Strategy
- Maintain a consistent `target_otel_version` in Data Prepper to ensure reliable data correlation
- Only update `target_otel_version` when upgrading OpenSearch to a version requiring a newer OTel schema
- Each OpenSearch release supports specific OTel version ranges:
  - Newer releases support newer OTel versions
  - Backward compatibility is maintained
  - Version requirements and compatibility are documented
Example:
processor:
  otel_schema_transformer:
    target_otel_version: "1.30.0" # Keep stable unless upgrade requires change
    store_metadata: true
5.3 OpenSearch Upgrade Strategy
When upgrading OpenSearch with OTel integration, follow this strategy:
- Pre-Upgrade Assessment:
  - Check if the current `target_otel_version` is supported in the new OpenSearch version
  - Review the compatibility matrix to identify OTel version changes required
- If Target OTel Version Change Is Required:
  - Create new indices with the updated naming convention reflecting the new target OTel version
  - Example: `ss4o_traces-payment-prod-v1_30` replaces `ss4o_traces-payment-prod-v1_18`
- Field Alias Configuration:
  - Apply field aliases to older indices to maintain compatibility with the new OTel version
  - Create an alias index pattern that spans both old and new indices
  - Example:
    PUT ss4o_traces-payment-prod-v1_18/_mapping
    {
      "properties": {
        "attributes.messaging.consumer.id": {
          "type": "alias",
          "path": "attributes.messaging.consumer_id"
        }
      }
    }
- Data Prepper Configuration Update:
  - Update Data Prepper to target the new OTel version
  - Direct new data to the new indices
  - Example:
    processor:
      otel_schema_transformer:
        target_otel_version: "1.30.0"
        store_metadata: true
    sink:
      opensearch:
        index: "ss4o_traces-payment-prod-v1_30"
- Dashboard Transition:
  - Create index patterns that span both old and new indices
  - Validate dashboards against the new indices
  - Update visualization references as needed
- Legacy Data Management:
  - Implement appropriate retention policies for older indices
  - Consider reindexing critical historical data to the new schema if needed
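The field alias and index pattern steps in the strategy above could be generated from a rename map rather than written by hand. The sketch below uses hypothetical helper names and assumes the renamed attributes live under the top-level `attributes` object, as in the alias example.

```python
from typing import Dict


def alias_mapping(renames: Dict[str, str], prefix: str = "attributes") -> dict:
    """Build a PUT <old-index>/_mapping body exposing new names as aliases.

    `renames` maps old attribute names to their new names.
    """
    return {
        "properties": {
            f"{prefix}.{new}": {"type": "alias", "path": f"{prefix}.{old}"}
            for old, new in renames.items()
        }
    }


def spanning_pattern(index_name: str) -> str:
    """Replace the version suffix with a wildcard to span old and new indices."""
    base, _version = index_name.rsplit("-", 1)
    return f"{base}-*"
```

Applied to `{"messaging.consumer_id": "messaging.consumer.id"}`, `alias_mapping` produces the same body as the mapping example in the list above, so queries written against the new name also match documents in the older index.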
5.4 Instrumentation Library Upgrades
When telemetry sources (instrumentation libraries) upgrade their OTel version:
- Assess Compatibility:
  - Review changes between source OTel version and target OTel version
  - Check if Data Prepper's schema transformation can handle the differences
- Gradual Rollout:
  - Roll out instrumentation upgrades incrementally
  - Monitor for transformation errors or missing data
6. Next Steps
6.1 Open Questions
- Missing Schema URL Strategy
  For data without a schema URL, we recommend:
  - Default to the highest supported OTel version for the target OpenSearch version
  - Allow explicit configuration in Data Prepper to specify the assumed version
  - Store the assumed version in the document metadata for future reference
- Performance Optimization
  - To minimize transformation overhead, use the OpenTelemetry Collector to pre-transform high-volume telemetry
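One way to read the missing schema URL recommendation is sketched below. The metadata field name `schemaVersionMetadata` and both function names are hypothetical; the proposal leaves the storage format of the assumed version open.

```python
from typing import Optional, Tuple


def resolve_source_version(schema_url: Optional[str],
                           supported_range: Tuple[str, str],
                           configured_default: Optional[str] = None
                           ) -> Tuple[str, bool]:
    """Return (version, assumed) for a record.

    Order of precedence: explicit schema URL, then the configured default,
    then the highest version in the supported range.
    """
    if schema_url:
        # Schema URLs end with the version, e.g. .../schemas/1.17.0
        return schema_url.rstrip("/").rsplit("/", 1)[-1], False
    if configured_default:
        return configured_default, True
    return supported_range[1], True


def annotate(document: dict, schema_url: Optional[str],
             supported_range: Tuple[str, str],
             configured_default: Optional[str] = None) -> dict:
    """Copy the document and record which source version was used."""
    version, assumed = resolve_source_version(
        schema_url, supported_range, configured_default
    )
    enriched = dict(document)
    # Hypothetical metadata field; the real field name is not yet decided.
    enriched["schemaVersionMetadata"] = {
        "sourceOtelVersion": version,
        "assumed": assumed,
    }
    return enriched
```

Recording the `assumed` flag alongside the version makes it possible to find and reprocess documents later if the assumption turns out to be wrong.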
6.2 Implementation Tasks
Component | Task | Description |
---|---|---|
Data Prepper | Schema transformation processor | Implement converter supporting all OTel versions in supported range |
Data Prepper | Configuration options | Add target_otel_version parameter and validation logic |
Data Prepper | Validation mechanisms | Add validation for schema URL and version compatibility |
OpenSearch Catalog | Version compatibility matrix | Create documentation of supported OTel versions per OpenSearch release |
OpenSearch Catalog | Index templates | Update index templates to support ss4o naming convention with OTel version |
Observability Plugin | UI compatibility | Update visualizations to handle version-specific fields |
Documentation | Migration guides | Create guides for upgrading between versions |