Description
Updated Simple Schema for Logs, Traces, and Metrics
This document is still a draft, but it is ready for an initial read.
Background
Currently, different schemas are used across various layers of the OpenSearch ecosystem. Data Prepper understands the OpenTelemetry (OTel) schema and ingests data into OpenSearch using its own schema, while the Observability plugin assumes a separate Simple Schema for visualization. Although both aim to align with OTel, they diverge in semantics and naming conventions, leading to interoperability challenges, a lack of standardization, and complex schema migrations.
Simple Schema Vision
We aim to upgrade the Simple Schema into a standardized storage-layer schema that aligns with OpenTelemetry (OTel) semantic conventions while optimizing query latency for visualization in OpenSearch Dashboards (OSD). By enforcing exact OTel naming conventions, this schema standardizes observability data across the OpenSearch ecosystem, simplifying integrations and reducing inconsistencies. A unified approach allows users to analyze, visualize, and correlate data efficiently while ensuring long-term compatibility with evolving observability standards.
Tracing and Log Schema Design
- In designing our tracing and log schemas, we have aligned our initial model with the OpenTelemetry Protocol (OTLP) transport model, using its JSON representation with lowerCamelCase field naming conventions.
- For field names under `attributes`, `resourceAttributes`, and `instrumentationScopeAttributes`, we adhere to OpenTelemetry's semantic conventions, following dot notation to maintain a clear namespace hierarchy.
- Any custom keys under `attributes` should follow the naming conventions specified by OTel.
- The Simple Schema also suggests an index naming convention for the different signals.
This alignment ensures consistency between our storage schema and the OTLP transport model, facilitating easy data interchange and reducing the need for complex transformations during data ingestion and retrieval.
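The attribute-key naming rule above can be checked mechanically. The following is a minimal sketch, assuming a simplified reading of the OTel naming guidance (lowercase, dot-delimited namespaces, snake_case within each component). The regular expression and both function names are illustrative and are not part of any OTel or OpenSearch API.

```python
import re

# Simplified approximation of OTel attribute naming guidance (an assumption,
# not the official grammar): lowercase, dot-delimited namespaces, and
# snake_case within each dot-separated component.
_OTEL_STYLE_KEY = re.compile(
    r"^[a-z][a-z0-9]*(_[a-z0-9]+)*(\.[a-z][a-z0-9]*(_[a-z0-9]+)*)*$"
)


def is_otel_style_key(key: str) -> bool:
    """Return True if the key matches the simplified naming pattern."""
    return _OTEL_STYLE_KEY.fullmatch(key) is not None


def find_nonconforming_keys(attributes: dict) -> list:
    """Return the attribute keys that do not match the pattern, sorted."""
    return sorted(k for k in attributes if not is_otel_style_key(k))


if __name__ == "__main__":
    sample = {"http.request.method": "GET", "userID": "42", "http@url": "/cart"}
    print(find_nonconforming_keys(sample))
```

A check like this could run at ingestion time to flag custom keys before they reach the index; whether to reject or merely log such keys is a policy decision this sketch leaves open.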
Tracing Schema Mapping
{
  "mappings": {
    "properties": {
      // Trace Identification Fields
      "traceId": { "type": "keyword" },
      "spanId": { "type": "keyword" },
      "parentSpanId": { "type": "keyword" },
      "traceState": { "type": "keyword" },
      "traceFlags": { "type": "integer" },
      // Span Metadata
      "kind": { "type": "keyword" },
      "startTime": { "type": "date_nanos" },
      "endTime": { "type": "date_nanos" },
      "durationInNanos": { "type": "long" },
      // Span Status Information
      "status": {
        "properties": {
          "code": { "type": "integer" },
          "message": { "type": "text" }
        }
      },
      // Span Events
      "events": {
        "type": "nested",
        "properties": {
          "timestamp": { "type": "date_nanos" },
          "name": { "type": "text" },
          "attributes": { "type": "object", "dynamic": true },
          "droppedAttributesCount": { "type": "integer" }
        }
      },
      "droppedEventsCount": { "type": "integer" },
      // Resource Attributes
      "resource": {
        "properties": {
          "schemaUrl": { "type": "keyword" }, // Schema URL for resource
          "attributes": { "type": "object", "dynamic": true },
          "droppedAttributesCount": { "type": "integer" }
        }
      },
      // Instrumentation Scope
      "instrumentationScope": {
        "properties": {
          "name": { "type": "keyword" },
          "version": { "type": "keyword" },
          "schemaUrl": { "type": "keyword" }, // Schema URL for instrumentation scope
          "attributes": { "type": "object", "dynamic": true }
        }
      },
      // Span Links
      "links": {
        "type": "nested",
        "properties": {
          "traceId": { "type": "keyword" },
          "spanId": { "type": "keyword" },
          "traceState": { "type": "keyword" },
          "attributes": { "type": "object", "dynamic": true },
          "droppedAttributesCount": { "type": "integer" }
        }
      },
      "droppedLinksCount": { "type": "integer" },
      // Span Attributes
      "attributes": {
        "type": "object", "dynamic": true,
        "properties": {
          "data_stream": {
            "properties": {
              "dataset": { "type": "keyword" },
              "namespace": { "type": "keyword" },
              "type": { "type": "keyword" }
            }
          }
        }
      },
      "droppedAttributesCount": { "type": "integer" }
    }
  }
}
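To make the tracing mapping concrete, here is a minimal sketch of turning an OTLP-style span into a document for this schema. It assumes the input carries `startTimeUnixNano` and `endTimeUnixNano` as decimal strings (as in OTLP JSON) and that attributes have already been reduced to a plain key/value dict. The helper names `nanos_to_iso` and `build_span_document` are hypothetical. Timestamps are rendered as ISO-8601 strings because, as we understand the default `date_nanos` format, a bare integer would be read as epoch milliseconds rather than nanoseconds.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def nanos_to_iso(epoch_nanos: int) -> str:
    """Render epoch nanoseconds as an ISO-8601 UTC string with 9 fractional digits."""
    seconds, nanos = divmod(epoch_nanos, NANOS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{nanos:09d}Z"


def build_span_document(span: dict) -> dict:
    """Build a document shaped like the tracing mapping (subset of fields only)."""
    start = int(span["startTimeUnixNano"])
    end = int(span["endTimeUnixNano"])
    return {
        "traceId": span["traceId"],
        "spanId": span["spanId"],
        "parentSpanId": span.get("parentSpanId", ""),
        "kind": span.get("kind"),
        "startTime": nanos_to_iso(start),
        "endTime": nanos_to_iso(end),
        # Precomputed at ingestion so dashboards do not need scripted fields.
        "durationInNanos": end - start,
        "status": span.get("status", {"code": 0}),
        # Assumption: attributes arrive as a flat dict keyed by dotted OTel names.
        "attributes": dict(span.get("attributes", {})),
    }


if __name__ == "__main__":
    example = {
        "traceId": "5b8efff798038103d269b633813fc60c",
        "spanId": "eee19b7ec3c1b174",
        "startTimeUnixNano": "1700000000123456789",
        "endTimeUnixNano": "1700000000623456789",
        "attributes": {"http.request.method": "GET"},
    }
    print(build_span_document(example))
```

Events, links, resource, and instrumentation scope are omitted here for brevity; they would follow the same pattern.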
Logging Schema Mapping
{
  "mappings": {
    "properties": {
      // Base Fields
      "@timestamp": { "type": "date_nanos" },
      "observedTimestamp": { "type": "date_nanos" },
      "traceId": { "type": "keyword" },
      "spanId": { "type": "keyword" },
      "severity": {
        "properties": {
          "text": { "type": "keyword" },
          "number": { "type": "integer" }
        }
      },
      "body": { "type": "text" },
      "droppedAttributesCount": { "type": "integer" },
      "eventName": { "type": "text" },
      // Resource Attributes
      "resource": {
        "properties": {
          "schemaUrl": { "type": "keyword" }, // Schema URL for resource
          "attributes": { "type": "object", "dynamic": true }
        }
      },
      // Instrumentation Scope Attributes
      "instrumentationScope": {
        "properties": {
          "name": { "type": "keyword" },
          "version": { "type": "keyword" },
          "schemaUrl": { "type": "keyword" }, // Schema URL for instrumentation scope
          "attributes": { "type": "object", "dynamic": true }
        }
      },
      // Log Record Attributes
      "attributes": {
        "type": "object", "dynamic": true,
        "properties": {
          "data_stream": {
            "properties": {
              "dataset": { "type": "keyword" },
              "namespace": { "type": "keyword" },
              "type": { "type": "keyword" }
            }
          }
        }
      }
    }
  }
}
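Because `traceId` and `spanId` are `keyword` fields in both the tracing and logging mappings, logs can be correlated with a trace using exact-match filters. The sketch below builds such a query body as a plain dict. The function name and the default page size are assumptions made for illustration, and sending the request to a cluster is left out.

```python
from typing import Optional


def logs_for_trace_query(trace_id: str, span_id: Optional[str] = None, size: int = 100) -> dict:
    """Build a search body that selects logs belonging to a trace (and optionally a span)."""
    # term filters are exact matches, which suits keyword-typed ID fields.
    filters = [{"term": {"traceId": trace_id}}]
    if span_id is not None:
        filters.append({"term": {"spanId": span_id}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        # Oldest first, so logs read in the order they were emitted.
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


if __name__ == "__main__":
    import json

    body = logs_for_trace_query("5b8efff798038103d269b633813fc60c", "eee19b7ec3c1b174")
    print(json.dumps(body, indent=2))
```

The same body works unchanged against legacy indexes only if their ID fields share these names and types, which is part of the motivation for a common schema.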
Handling Alternative Storage Conventions/Schemas
For users unable to fully adopt the Simple Schema due to legacy data structures, two possible transition options will be offered.
### 1. Custom Correlations
Cons:
- Correlated ... Only a minimal set of analytics will be available with this solution. To unlock the full roadmap of analytics correlated across traces, logs, and metrics, we recommend an eventual transition to the standardized Simple Schema outlined above.
### 2. Aliasing
If [need info condition on when aliasing would work], OpenSearch’s field alias mechanism could be leveraged. This allows field names in legacy schemas to be mapped to the standardized schema without breaking existing queries, enabling a gradual transition to the new schema while maintaining compatibility with current data pipelines. For the long term, we recommend that customers eventually transition to the new standardized schema to unlock upcoming observability log analytics and metrics feature sets.
- stitch together container blah blah blah
Example: Field Aliasing in OpenSearch
If a user previously stored timestamps under `eventTime` but now needs to align with the standardized `@timestamp` field, they can define an alias in the OpenSearch mapping:
{
  "mappings": {
    "properties": {
      "eventTime": {
        "type": "alias",
        "path": "@timestamp"
      }
    }
  }
}
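This allows queries that reference `eventTime` to automatically resolve to `@timestamp`, so existing queries keep working during the migration. The alias target (`@timestamp`) must already exist as a concrete field in the same mapping, and aliases apply to search requests only: documents must still be indexed using the concrete field name.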
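The alias definition can also be generated programmatically. The following is a minimal sketch that adds an alias entry to a mapping dict only when the target is a concrete field. The helper `add_field_alias` is hypothetical and handles top-level fields only; nested paths such as `severity.text` would need extra traversal.

```python
import copy


def add_field_alias(mapping: dict, alias: str, target: str) -> dict:
    """Return a copy of the mapping with `alias` pointing at the top-level field `target`."""
    properties = mapping.get("mappings", {}).get("properties", {})
    if target not in properties:
        raise ValueError(f"alias target {target!r} is not defined in the mapping")
    if properties[target].get("type") == "alias":
        raise ValueError(f"alias target {target!r} is itself an alias")
    if alias in properties:
        raise ValueError(f"field {alias!r} already exists in the mapping")
    updated = copy.deepcopy(mapping)  # leave the caller's mapping untouched
    updated["mappings"]["properties"][alias] = {"type": "alias", "path": target}
    return updated


if __name__ == "__main__":
    base = {"mappings": {"properties": {"@timestamp": {"type": "date_nanos"}}}}
    print(add_field_alias(base, "eventTime", "@timestamp"))
```

Validating the target before submitting the mapping surfaces mistakes locally instead of as a rejected request.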
Cons:
- If the UI is built around the standard schema, users familiar with the ingestion process might initially find it confusing when queries reference different field names.
- This may not work for all customer use cases or architectures. For example, if [list out when this would not work].
Next Steps
- Remove default dedotting and flattening in the Data Prepper ingestion layer.
- Change current schemas to follow the OTLP transport model and OTel Semantic Conventions:
  Ensure all fields across logs, traces, and metrics strictly adhere to OTel guidelines. Remove dedotting and flattening at the ingestion layer.
- Correct Discrepancies in Old Mapping:
  Identify and fix inconsistencies in the mapping types of the previous schema.
- Schema Awareness Across the Board:
  Ensure the schema remains consistent across the Data Prepper ingestion, OpenSearch storage, and OSD visualization layers, preventing inconsistencies in data interpretation.
- Optimize for Query Performance:
  Introduce precomputed or auto-calculated fields to speed up queries and improve overall efficiency.
- Ensure Schema Evolution Support:
  Implement schema versioning and transformation strategies that align with OTel's schema evolution model.
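To illustrate what removing dedotting means for existing data, the sketch below restores dotted OTel keys from dedotted ones. It assumes, purely for illustration, that legacy documents store attribute keys with each dot replaced by a placeholder character (`@` here). The actual legacy format should be confirmed against the ingestion pipeline, and the function name is hypothetical.

```python
def restore_dotted_keys(attributes: dict, placeholder: str = "@") -> dict:
    """Return a copy of `attributes` with the placeholder in each key turned back into a dot."""
    restored = {}
    for key, value in attributes.items():
        dotted = key.replace(placeholder, ".")
        # Two legacy keys could map to the same dotted key; fail loudly instead
        # of silently overwriting one of them.
        if dotted in restored:
            raise ValueError(f"key collision after restoring dots: {dotted!r}")
        restored[dotted] = value
    return restored


if __name__ == "__main__":
    legacy = {"http@method": "GET", "service@name": "checkout"}
    print(restore_dotted_keys(legacy))
```

A transformation like this would only be needed when reindexing legacy data; new data ingested without dedotting already carries the dotted keys.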