| 1 | +# Environment Variable Specification for Context and Baggage Propagation |
| 2 | + |
| 3 | +This is a proposal to add Environment Variables to the OpenTelemetry |
| 4 | +specification as carriers for context and baggage propagation between |
| 5 | +processes. |
| 6 | + |
| 7 | +## Table of Contents |
| 8 | + |
| 9 | +* [Motivation](#motivation) |
| 10 | +* [Design](#design) |
| 11 | + * [Example Context](#example-context) |
| 12 | + * [Distributed Tracing in OpenTofu Prototype Example](#distributed-tracing-in-opentofu-prototype-example) |
| 13 | +* [Core Specification Changes](#core-specification-changes) |
| 14 | + * [UNIX Limitations](#unix-limitations) |
| 15 | + * [Windows Limitations](#windows-limitations) |
| 16 | + * [Allowed Characters](#allowed-characters) |
| 17 | +* [Trade-offs and Mitigations](#trade-offs-and-mitigations) |
| 18 | + * [Case-sensitivity](#case-sensitivity) |
| 19 | + * [Security](#security) |
| 20 | +* [Prior Art and Alternatives](#prior-art-and-alternatives) |
| 21 | + * [Alternatives and why they were not chosen](#alternatives-and-why-they-were-not-chosen) |
| 22 | +* [Open Questions](#open-questions) |
| 23 | +* [Future Possibilities](#future-possibilities) |
| 24 | + |
| 25 | +## Motivation |
| 26 | + |
| 27 | +The motivation for specifying context and baggage propagation with |
| 28 | +environment variables as carriers stems from the long-open |
| 29 | +[issue #740][issue-740] on the OpenTelemetry Specification repository. This |
| 30 | +issue has been open for so long that multiple implementations now exist |
| 31 | +using the `TRACEPARENT` and `TRACESTATE` environment variables. |
| 32 | + |
| 33 | +[Issue #740][issue-740] identifies several use cases in systems that do not |
| 34 | +communicate across process boundaries over the network, such as: |
| 35 | + |
| 36 | +* ETL |
| 37 | +* Batch |
| 38 | +* CI/CD systems |
| 39 | + |
| 40 | +Adding arbitrary [Text Map propagation][tmp] through environment variable carriers into |
| 41 | +the OpenTelemetry Specification will enable distributed tracing within the |
| 42 | +systems listed above. |
| 43 | + |
| 44 | +A significant amount of [Prior Art](#prior-art-and-alternatives) has already been built |
| 45 | +within the industry and **within OpenTelemetry** to meet the immediate needs; |
| 46 | +however, OpenTelemetry does not currently define a specification for this |
| 47 | +form of propagation. |
| 48 | + |
| 49 | +Notably, as we define semantic conventions within the [CI/CD Working Group][cicd-wg], |
| 50 | +we'll need the specification defined for the industry to be able to adopt |
| 51 | +native tracing within CI/CD systems. |
| 52 | + |
| 53 | +[cicd-wg]: https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md |
| 54 | +[issue-740]: https://github.com/open-telemetry/opentelemetry-specification/issues/740#issue-665588273 |
| 55 | +[tmp]: https://opentelemetry.io/docs/specs/otel/context/api-propagators/#textmap-propagator |
| 56 | + |
| 57 | +## Design |
| 58 | + |
| 59 | +To propagate context and baggage between parent, sibling, and child processes |
| 60 | +in systems where network communication does not occur between processes, this |
| 61 | +proposal specifies key-value pairs that are injected into the environment and |
| 62 | +can be written and read by an arbitrary TextMapPropagator. |
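
As a minimal sketch of that idea, the environment can be treated as a plain string-to-string mapping that a propagator writes to and reads from. The `ENV_NAMES` table and the `inject`/`extract` helpers below are hypothetical names used only for illustration; they stand in for a real TextMapPropagator and are not part of any OpenTelemetry SDK.

```python
# Hypothetical mapping from propagator field names to environment variable
# names. A real implementation would likely make these names configurable.
ENV_NAMES = {
    "traceparent": "TRACEPARENT",
    "tracestate": "TRACESTATE",
    "baggage": "BAGGAGE",
}


def inject(fields, env):
    """Write propagator fields into an environment mapping (the carrier)."""
    for key, value in fields.items():
        env[ENV_NAMES[key]] = value
    return env


def extract(env):
    """Read the known propagator fields back out of an environment mapping."""
    return {key: env[name] for key, name in ENV_NAMES.items() if name in env}


# The carrier here is an ordinary dict; os.environ behaves the same way.
carrier = inject(
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
    {},
)
fields = extract(carrier)
```

Keeping the carrier a plain mapping means the same helpers work against a copy of the environment prepared for a child process, not only against the live process environment.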
| 63 | + |
| 64 | +### Example Context |
| 65 | + |
| 66 | +Consider the following diagram in the context of process forking: |
| 67 | + |
| 68 | +> Note: The diagram is simply an example and simplification of process forking. |
| 69 | +> There are other ways to spawn processes which are more performant like |
| 70 | +> exec(). |
| 71 | + |
| 72 | + |
| 73 | + |
| 74 | +In the above diagram, a parent process is forked to spawn a child process, |
| 75 | +which inherits the environment variables of the original parent. The environment |
| 76 | +variables defined here, `TRACEPARENT`, `TRACESTATE`, and `BAGGAGE`, are used to |
| 77 | +propagate context to the child process so that it can be tied to the parent. |
| 78 | +Without `TRACEPARENT`, a tracing backend would not be able to connect the child |
| 79 | +process spans to the parent span to form an end-to-end trace. |
| 80 | + |
| 81 | +> Note: While the below exclusively follows the W3C Specification translated |
| 82 | +> into environment variables, this proposal is not exclusive to W3C and is |
| 83 | +> instead focused on the mechanism of Text Map Propagation with a potential set |
| 84 | +> of well-known environment variable names. See the [Core Specification |
| 85 | +> Changes](#core-specification-changes) section for more information. |
| 86 | + |
| 87 | +Given the above example aligning with the W3C Specification, the following is |
| 88 | +a contextual mapping of environment variables to headers defined by W3C. |
| 89 | + |
| 90 | +The `traceparent` (lowercase) header is defined in the [W3C |
| 91 | +Trace-Context][w3c-parent] specification and includes the following valid |
| 92 | +fields: |
| 93 | + |
| 94 | +* `version` |
| 95 | +* `trace-id` |
| 96 | +* `parent-id` |
| 97 | +* `trace-flags` |
| 98 | + |
| 99 | +This could be set in the environment as follows: |
| 100 | + |
| 101 | +```bash |
| 102 | +export TRACEPARENT=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 |
| 103 | +``` |
| 104 | + |
| 105 | +> Note: The value of `TRACEPARENT` is the combination of the above field values, |
| 106 | +> each serialized as a lowercase hexadecimal ASCII string, delimited by `-`. |
| 107 | + |
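
The field layout above can be checked with a short parser. This is a sketch for version `00` values only, assuming the W3C field widths (2, 32, 16, and 2 lowercase hexadecimal digits); `parse_traceparent` is a hypothetical helper name, not an SDK function.

```python
import re

# Field widths follow the W3C Trace Context layout for version 00:
# version (2 hex), trace-id (32 hex), parent-id (16 hex), trace-flags (2 hex).
TRACEPARENT_RE = re.compile(
    r"(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})"
)


def parse_traceparent(value):
    """Return the four fields as a dict, or None if the value is malformed."""
    match = TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return None
    fields = match.groupdict()
    # All-zero trace-id and parent-id values are invalid per W3C Trace Context.
    if fields["trace_id"] == "0" * 32 or fields["parent_id"] == "0" * 16:
        return None
    return fields


parsed = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```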
| 108 | +The `tracestate` (lowercase) header is defined in [W3C |
| 109 | +Trace-State][w3c-state] and can include any opaque value in a key-value pair |
| 110 | +structure. Its goal is to provide additional vendor-specific trace information. |
| 111 | + |
| 112 | +The `baggage` (lowercase) header is defined in [W3C Baggage][w3c-bag] |
| 113 | +and is a set of key-value pairs to propagate context between signals. In |
| 114 | +OpenTelemetry, baggage is propagated through the [Baggage API][bag-api]. |
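
To illustrate what a `BAGGAGE` value might look like, the sketch below serializes a dictionary into a comma-separated list of percent-encoded `key=value` members and parses it back. The helper names and the `module.version` and `module.source` keys are hypothetical; the sketch assumes keys are already valid tokens and ignores optional member properties when decoding.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Serialize entries as comma-separated key=value members.

    Values are percent-encoded; keys are assumed to be valid tokens already.
    """
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in entries.items()
    )


def decode_baggage(header):
    """Parse a baggage string back into a dict, dropping member properties."""
    entries = {}
    for member in header.split(","):
        pair = member.strip().split(";", 1)[0]  # ignore ";property" suffixes
        key, sep, value = pair.partition("=")
        if sep:
            entries[key.strip()] = unquote(value.strip())
    return entries


original = {"module.version": "1.2.3", "module.source": "git::https://example.com/mod"}
encoded = encode_baggage(original)
decoded = decode_baggage(encoded)
```

The encoded string could then be exported in the same way as `TRACEPARENT`, for example by assigning it to `BAGGAGE` in the environment passed to a child process.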
| 115 | + |
| 116 | +[w3c-parent]: https://www.w3.org/TR/trace-context-2/#traceparent-header-field-values |
| 117 | +[w3c-state]: https://www.w3.org/TR/trace-context-2/#tracestate-header |
| 118 | +[w3c-bag]: https://www.w3.org/TR/baggage/#baggage-http-header-format |
| 119 | + |
| 120 | +#### Distributed Tracing in OpenTofu Prototype Example |
| 121 | + |
| 122 | +Consider this real-world example of an OpenTofu Controller deployment. |
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | +In this model, the OpenTofu Controller is the start of the trace, containing |
| 127 | +the actual trace_id and generating the root span. The OpenTofu Controller |
| 128 | +deploys a runner which has its own environment and processes to run OpenTofu |
| 129 | +commands. If one were to trace these processes without a carrier mechanism, |
| 130 | +they would all show up as unrelated root spans in separate traces. However, by |
| 131 | +leveraging environment variables as carriers, each span can be tied back |
| 132 | +to the root span, creating a single trace as shown in the image of a real |
| 133 | +OpenTofu trace below. |
| 134 | + |
| 135 | + |
| 136 | + |
| 137 | +Additionally, the `init` span is able to pass baggage to the `plan` and `apply` |
| 138 | +spans. One example of this is module version and repository information. This |
| 139 | +information is only determined and known during the `init` process. Subsequent |
| 140 | +processes only know about the module by name. With `BAGGAGE`, the rest of the |
| 141 | +processes have access to a key piece of information that allows |
| 142 | +errors to be tied back to the original module version and source code. |
| 143 | + |
| 144 | +Defining the specification for Environment Variables as carriers will have a |
| 145 | +wide impact on the industry by enabling better observability of systems outside |
| 146 | +of the normal HTTP microservice architecture. |
| 147 | + |
| 148 | +[w3c-bag]: https://www.w3.org/TR/baggage/#header-name |
| 149 | +[bag-api]: https://opentelemetry.io/docs/specs/otel/baggage/api/ |
| 150 | + |
| 151 | +The above prototype example came from the resources mentioned in [this |
| 152 | +comment][otcom] on the [OpenTofu Tracing RFC][otrfc]. |
| 153 | + |
| 154 | +[otcom]: https://github.com/opentofu/opentofu/pull/2028#issuecomment-2411588695 |
| 155 | +[otrfc]: https://github.com/opentofu/opentofu/pull/2028 |
| 156 | + |
| 157 | +## Core Specification Changes |
| 158 | + |
| 159 | +The OpenTelemetry Specification should be updated with the definitions for |
| 160 | +extending context propagation into the environment through Text Map |
| 161 | +propagators. |
| 162 | + |
| 163 | +This update should include: |
| 164 | + |
| 165 | +* A common set of environment variables like `TRACEPARENT`, `TRACESTATE`, and |
| 166 | + `BAGGAGE` that can be used to propagate context between processes. These |
| 167 | + environment variable names should be overridable for legacy support reasons |
| 168 | + (like using B3), but the default standard should align with the W3C |
| 169 | + specification. |
| 170 | +* A specification for allowed environment names and values due to operating |
| 171 | + system limitations. |
| 172 | +* A specification for how implementers can inject and extract context from the |
| 173 | + environment through a TextMapPropagator. |
| 174 | +* A specification for how processes should update environment variables before |
| 175 | + spawning new processes. |
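
The last point in the list above, updating the environment before spawning a new process, can be sketched as follows. `spawn_with_context` is a hypothetical helper, and the `TRACEPARENT` value is hard-coded for illustration; a real implementation would presumably inject the current span context through a propagator instead.

```python
import os
import subprocess
import sys


def spawn_with_context(args, context_env):
    """Run a child process with the context variables added to its environment.

    The parent's own environment is copied rather than mutated, so the
    propagated values apply only to the child being spawned.
    """
    child_env = os.environ.copy()
    child_env.update(context_env)
    return subprocess.run(
        args, env=child_env, capture_output=True, text=True, check=True
    )


# The child simply echoes the variable it inherited.
result = spawn_with_context(
    [sys.executable, "-c", "import os; print(os.environ['TRACEPARENT'])"],
    {"TRACEPARENT": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
)
```

Copying the environment per child, instead of setting variables on the parent process, avoids leaking one child's context into unrelated children spawned later.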
| 176 | + |
| 177 | +Defining the specification for Environment Variables as carriers for context |
| 178 | +will enable SDKs and other tools to implement getters and setters of context |
| 179 | +in a standard, observable way. Therefore, current OpenTelemetry language |
| 180 | +maintainers will need to develop language-specific implementations that adhere |
| 181 | +to the specification. |
| 182 | + |
| 183 | +Two implementations already exist within OpenTelemetry for environment |
| 184 | +variables through the TextMap Propagator: |
| 185 | + |
| 186 | +* [Python SDK][python-env] - This implementation uses the environment dictionary |
| 187 | + as the carrier in Python to propagate context from the invoking process to the |
| 188 | + invoked process. This pull request does not appear to have been merged. |
| 189 | +* [Swift SDK][swift-env] - This implementation uses `TRACEPARENT` and |
| 190 | + `TRACESTATE` environment variables alongside the W3C Propagator to inject and |
| 191 | + extract context. |
| 192 | + |
| 193 | +Due to programming conventions, operating system limitations, prior art, and |
| 194 | +the information below, it is recommended to use upper-case environment |
| 195 | +variables for the carrier that align with context propagator specifications. |
| 196 | + |
| 197 | +[python-env]: https://github.com/Div95/opentelemetry-python/tree/feature/env_propagator/propagator/opentelemetry-propagator-env |
| 198 | +[swift-env]: https://github.com/open-telemetry/opentelemetry-swift/blob/main/Sources/OpenTelemetrySdk/Trace/Propagation/EnvironmentContextPropagator.swift |
| 199 | + |
| 200 | +### UNIX Limitations |
| 201 | + |
| 202 | +UNIX system utilities use upper-case names for environment variables, and |
| 203 | +lower-case names are reserved for applications. Using upper-case names will |
| 204 | +prevent conflicts with internal application variables. |
| 205 | + |
| 206 | +> Environment variable names used by the utilities in the Shell and Utilities |
| 207 | +> (XCU) specification consist solely of upper-case letters, digits and the "_" |
| 208 | +> (underscore) from the characters defined in Portable Character Set. Other |
| 209 | +> characters may be permitted by an implementation; applications must tolerate |
| 210 | +> the presence of such names. Upper-case and lower-case letters retain their |
| 211 | +> unique identities and are not folded together. The name space of environment |
| 212 | +> variable names containing lower-case letters is reserved for applications. |
| 213 | +> Applications can define any environment variables with names from this name |
| 214 | +> space without modifying the behaviour of the standard utilities. |
| 215 | + |
| 216 | +Source: [The Open Group, The Single UNIX® Specification, Version 2, Environment Variables](https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html) |
| 217 | + |
| 218 | +### Windows Limitations |
| 219 | + |
| 220 | +Environment variable names are case-insensitive on Windows. Despite this, the |
| 221 | +recommendation is to use upper-case names across operating systems. |
| 222 | + |
| 223 | +Some languages already do this. This [CPython issue][cpython] discusses how |
| 224 | +Python automatically upper-cases environment variable names on Windows. The issue |
| 225 | +was resolved and this [documentation][cpython-doc] was added to clarify the behavior. |
| 226 | + |
| 227 | +[cpython]: https://github.com/python/cpython/issues/101754 |
| 228 | +[cpython-doc]: https://docs.python.org/3/library/os.html#os.environ |
| 229 | + |
| 230 | +### Allowed characters |
| 231 | + |
| 232 | +To ensure compatibility, the specification for Environment Variables SHOULD adhere |
| 233 | +to the current specification for `TextMapPropagator` where key/value pairs MUST |
| 234 | +only consist of US-ASCII characters that make up valid HTTP header fields as |
| 235 | +per RFC 7230. |
| 236 | + |
| 237 | +Environment variable keys SHOULD NOT conflict with commonly known environment |
| 238 | +variables like those described in [IEEE Std 1003.1-2017][std1003]. |
| 239 | + |
| 240 | +One key note is that Windows disallows the use of the `=` character in |
| 241 | +environment variable names. See [MS Env Vars][ms-env] for more information. |
| 242 | + |
| 243 | +On Windows, there is also a limit on the size of an environment variable, |
| 244 | +which is 32,767 characters. |
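
The constraints in this section could be checked before a variable is written. The following is a rough sketch, not a complete validator: `validate_env_pair` is a hypothetical helper, the length limit is applied to the value alone as a simplifying assumption, and the RFC 7230 header-field grammar is reduced to a plain US-ASCII check.

```python
# Limit taken from the Windows documentation referenced in this section.
MAX_ENV_LENGTH = 32767


def validate_env_pair(name, value):
    """Return a list of problems found with a candidate environment variable."""
    problems = []
    if not name or "=" in name:
        problems.append("name must be non-empty and must not contain '='")
    if name != name.upper():
        problems.append("name should be upper-case")
    if not (name + value).isascii():
        problems.append("name and value must be US-ASCII")
    # Assumption: the limit is checked against the value only.
    if len(value) > MAX_ENV_LENGTH:
        problems.append("value exceeds 32,767 characters")
    return problems


ok = validate_env_pair(
    "TRACEPARENT", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
bad = validate_env_pair("trace=parent", "x")
```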
| 245 | + |
| 246 | +[std1003]: https://pubs.opengroup.org/onlinepubs/9799919799/ |
| 247 | + |
| 248 | +[ms-env]: https://learn.microsoft.com/en-us/windows/win32/procthread/environment-variables |
| 249 | + |
| 250 | +## Trade-offs and Mitigations |
| 251 | + |
| 252 | +### Case-sensitivity |
| 253 | + |
| 254 | +On Windows, because environment variable keys are case-insensitive, there is a |
| 255 | +chance that automatically instrumented context propagation variables could |
| 256 | +conflict with existing application environment variables. It will be important |
| 257 | +to note this behavior and document how languages mitigate this issue. |
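
One possible mitigation is to look for existing variables that differ from the carrier names only by case before injecting. The sketch below assumes the three default carrier names from this proposal; `find_case_collisions` is a hypothetical helper.

```python
CARRIER_NAMES = ("TRACEPARENT", "TRACESTATE", "BAGGAGE")


def find_case_collisions(env, carrier_names=CARRIER_NAMES):
    """Return keys that match a carrier name only when case is ignored."""
    wanted = {name.upper() for name in carrier_names}
    return sorted(
        key for key in env if key.upper() in wanted and key not in carrier_names
    )


collisions = find_case_collisions(
    {"TraceParent": "app-specific", "PATH": "/bin", "BAGGAGE": "k=v"}
)
```

On a case-sensitive system such keys are distinct variables, but on Windows they would refer to the same variable as the carrier, so an instrumentation layer might warn or skip injection when the list is non-empty.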
| 258 | + |
| 259 | +### Security |
| 260 | + |
| 261 | +Do not put sensitive information in environment variables. Due to the nature of |
| 262 | +environment variables, an attacker with the right access could obtain |
| 263 | +information they should not be privy to. Additionally, the integrity of the |
| 264 | +environment variables could be compromised. |
| 265 | + |
| 266 | +## Prior Art and Alternatives |
| 267 | + |
| 268 | +There are many users of `TRACEPARENT` and/or `TRACESTATE` environment variables |
| 269 | +mentioned in [opentelemetry-specification #740](https://github.com/open-telemetry/opentelemetry-specification/issues/740): |
| 270 | + |
| 271 | +* [Jenkins OpenTelemetry Plugin](https://github.com/jenkinsci/opentelemetry-plugin) |
| 272 | +* [otel-cli generic wrapper](https://github.com/equinix-labs/otel-cli) |
| 273 | +* [Maven OpenTelemetry Extension](https://github.com/cyrille-leclerc/opentelemetry-maven-extension) |
| 274 | +* [Ansible OpenTelemetry Plugin](https://github.com/ansible-collections/community.general/pull/3091) |
| 275 | +* [go-test-trace](https://github.com/rakyll/go-test-trace/commit/22493612be320e0a01c174efe9b2252924f6dda9) |
| 276 | +* [Concourse CI](https://github.com/concourse/docs/pull/462) |
| 277 | +* [BuildKite agent](https://github.com/buildkite/agent/pull/1548) |
| 278 | +* [pytest](https://github.com/chrisguidry/pytest-opentelemetry/issues/20) |
| 279 | +* [Kubernetes test-infra Prow](https://github.com/kubernetes/test-infra/issues/30010) |
| 280 | +* [hotel-california](https://github.com/parsonsmatt/hotel-california/issues/3) |
| 281 | + |
| 282 | +Additionally, there was a prototype implementation for environment variables as |
| 283 | +context carriers written in the [Python SDK][python-env-comment]. |
| 284 | + |
| 285 | +[python-env-comment]: https://github.com/open-telemetry/opentelemetry-specification/issues/740#issuecomment-919657003 |
| 286 | + |
| 287 | +## Alternatives and why they were not chosen |
| 288 | + |
| 289 | +### Using a file for the carrier |
| 290 | + |
| 291 | +Using a JSON file that is stored on the filesystem and referenced through an |
| 292 | +environment variable would eliminate the need to work around case-insensitivity |
| 293 | +issues on Windows; however, it would introduce a number of issues: |
| 294 | + |
| 295 | +1. It would introduce an out-of-band file that would need to be created and |
| 296 | + reliably cleaned up. |
| 297 | +2. Managing permissions on the file might be non-trivial in some circumstances |
| 298 | + (for example, if `sudo` is used). |
| 299 | +3. This would deviate from significant prior art that currently uses |
| 300 | + environment variables. |
| 301 | + |
| 302 | +## Open questions |
| 303 | + |
| 304 | +The author has no open questions at this point. |
| 305 | + |
| 306 | +## Future possibilities |
| 307 | + |
| 308 | +1. Enabling distributed tracing in systems that do not communicate over network |
| 309 | + protocols that allow trace context being propagated through headers, |
| 310 | + metadata, or other means. |