This repository was archived by the owner on Dec 6, 2024. It is now read-only.

Commit 3c044d4

adrielp, pellared, lmolkova
authored
[otep] Propose adding env variables as context carriers to specification (#258)
Based on conversations last week in the Specification and Semantic Conventions SIGs, I'm opening this duplicate pull request, which was originally set as a [Draft](https://github.com/open-telemetry/oteps/pull/241/files) and hasn't had movement since last November. There are real use cases coming to fruition, namely in the CI/CD working group, that will benefit from this being accepted. Once accepted, we can work on getting the specification added for both general context propagation and baggage. On the note of baggage: baggage is a form of context propagation and was not originally mentioned directly by name in this OTEP. It is, however, absolutely essential. I've had the pleasure of prototyping tracing within an OpenTofu controller system where some context was only available in the parent/child at the very start of the trace. Baggage was the means of transferring this critical context to subsequent siblings that would not have had it otherwise. Thanks to the original author (@deejgregor) for all the hard work and for opening the draft #241. CC TC sponsors @jsuereth @carlosalberto --------- Co-authored-by: Robert Pająk <[email protected]> Co-authored-by: Liudmila Molkova <[email protected]>
1 parent d1f73ee commit 3c044d4

4 files changed

+310
-0
lines changed
@@ -0,0 +1,310 @@
1+
# Environment Variable Specification for Context and Baggage Propagation
2+
3+
This is a proposal to add Environment Variables to the OpenTelemetry
4+
specification as carriers for context and baggage propagation between
5+
processes.
6+
7+
## Table of Contents
8+
9+
* [Motivation](#motivation)
10+
* [Design](#design)
11+
* [Example Context](#example-context)
12+
* [Distributed Tracing in OpenTofu Prototype Example](#distributed-tracing-in-opentofu-prototype-example)
13+
* [Core Specification Changes](#core-specification-changes)
14+
* [UNIX](#unix-limitations)
15+
* [Windows](#windows-limitations)
16+
* [Allowed Characters](#allowed-characters)
17+
* [Trade-offs and Mitigations](#trade-offs-and-mitigations)
18+
* [Case-sensitivity](#case-sensitivity)
19+
* [Security](#security)
20+
* [Prior Art and Alternatives](#prior-art-and-alternatives)
21+
* [Alternatives and why they were not chosen](#alternatives-and-why-they-were-not-chosen)
22+
* [Open Questions](#open-questions)
23+
* [Future Possibilities](#future-possibilities)
24+
25+
## Motivation
26+
27+
The motivation for defining the specification for context and baggage
28+
propagation by using environment variables as carriers stems from the long open
29+
[issue #740][issue-740] on the OpenTelemetry Specification repository. This
30+
issue has been open for such a long time that multiple implementations now
31+
exist using `TRACEPARENT` and `TRACESTATE` environment variables.
32+
33+
[Issue #740][issue-740] identifies several use cases in systems that do not
34+
communicate across process boundaries over the network, such as:
35+
36+
* ETL
37+
* Batch
38+
* CI/CD systems
39+
40+
Adding arbitrary [Text Map propagation][tmp] through environment variable carriers into
41+
the OpenTelemetry Specification will enable distributed tracing within the
42+
above listed systems.
43+
44+
There has already been a significant amount of [Prior Art](#prior-art-and-alternatives) built
45+
within the industry and **within OpenTelemetry** to accomplish the immediate needs;
46+
however, OpenTelemetry at this time does not define the specification for this
47+
form of propagation.
48+
49+
Notably, as we define semantic conventions within the [CI/CD Working Group][cicd-wg],
50+
we'll need the specification defined for the industry to be able to adopt
51+
native tracing within CI/CD systems.
52+
53+
[cicd-wg]: https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md
54+
[issue-740]: https://github.com/open-telemetry/opentelemetry-specification/issues/740#issue-665588273
55+
[tmp]: https://opentelemetry.io/docs/specs/otel/context/api-propagators/#textmap-propagator
56+
57+
## Design
58+
59+
To propagate context and baggage between parent, sibling, and child processes
60+
in systems where network communication does not occur between processes, a
61+
specification using key-value pairs injected into the environment can be read
62+
and produced by an arbitrary TextMapPropagator.
63+
64+
### Example Context
65+
66+
Consider the following diagram in the context of process forking:
67+
68+
> Note: The diagram is simply an example and simplification of process forking.
69+
> There are other ways to spawn processes which are more performant like
70+
> `posix_spawn()`.
71+
72+
![Environment Variable Context Propagation](./img/0258-env-context-parent-child-process.png)
73+
74+
In the above diagram, a parent process is forked to spawn a child process,
75+
inheriting the environment variables from the original parent. The environment
76+
variables defined here, `TRACEPARENT`, `TRACESTATE`, and `BAGGAGE`, are used to
77+
propagate context to the child process such that it can be tied to the parent.
78+
Without `TRACEPARENT`, a tracing backend would not be able to connect the child
79+
process spans to the parent span to form an end-to-end trace.
80+
81+
> Note: While the below exclusively follows the W3C Specification translated
82+
> into environment variables, this proposal is not exclusive to W3C and is
83+
> instead focused on the mechanism of Text Map Propagation with a potential set
84+
> of well-known environment variable names. See the [Core Specification
85+
> Changes](#core-specification-changes) section for more information.
86+
87+
Given the above example aligning with the W3C Specification, the following is
88+
a contextual mapping of environment variables to headers defined by W3C.
89+
90+
The `traceparent` (lowercase) header is defined in the [W3C
91+
Trace-Context][w3c-parent] specification and includes the following valid
92+
fields:
93+
94+
* `version`
95+
* `trace-id`
96+
* `parent-id`
97+
* `trace-flags`
98+
99+
This could be set in the environment as follows:
100+
101+
```bash
102+
export TRACEPARENT=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
103+
```
104+
105+
> Note: The value of TRACEPARENT is a combination of the above field values as
106+
> lowercase hexadecimal ASCII strings, delimited by `-`.
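
As a rough illustration of the format described in the note above, the following sketch splits a `TRACEPARENT` value into its four fields. The function name `parse_traceparent` and the returned dictionary shape are assumptions made for this example only; they are not part of this proposal or of any OpenTelemetry SDK. The sketch only handles the version `00` layout.

```python
import re

# Field widths follow the W3C Trace Context layout for version 00:
# 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars, joined by "-".
_TRACEPARENT_RE = re.compile(
    r"(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<trace_flags>[0-9a-f]{2})"
)


def parse_traceparent(value):
    """Return the four fields as a dict, or None if the value is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Version "ff" and all-zero IDs are invalid per W3C Trace Context.
    if fields["version"] == "ff":
        return None
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        return None
    return fields
```

A child process could call `parse_traceparent(os.environ.get("TRACEPARENT", ""))` at startup and treat a `None` result as "no parent context".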
107+
108+
The `tracestate` (lowercase) header is defined in [W3C
109+
Trace-State][w3c-state] and can include any opaque value in a key-value pair
110+
structure. Its goal is to provide additional vendor-specific trace information.
111+
112+
The `baggage` (lowercase) header is defined in [W3C Baggage][w3c-bag]
113+
and is a set of key-value pairs to propagate context between signals. In
114+
OpenTelemetry, baggage is propagated through the [Baggage API][bag-api].
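
To make the key-value structure of `BAGGAGE` concrete, here is a minimal sketch that reads a baggage string into a dictionary. The function name `parse_baggage` and the example keys in the usage note are hypothetical. The sketch assumes the W3C Baggage list format (comma-separated members, optional `;`-separated properties, percent-encoded values) and simply drops properties, which a real propagator would not necessarily do.

```python
from urllib.parse import unquote


def parse_baggage(value):
    """Parse a BAGGAGE string into {key: value}, ignoring member properties."""
    entries = {}
    for member in value.split(","):
        # Properties follow the first ";" and are dropped in this sketch.
        key_value = member.split(";", 1)[0]
        if "=" not in key_value:
            continue  # skip empty or malformed members
        key, _, raw = key_value.partition("=")
        key = key.strip()
        if key:
            # Values are percent-encoded in the W3C format.
            entries[key] = unquote(raw.strip())
    return entries
```

For example, a value such as `module.version=1.2.3,module.source=github.com%2Fexample%2Fmod` (hypothetical keys) would yield two entries, with the second value decoded to `github.com/example/mod`.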
115+
116+
[w3c-parent]: https://www.w3.org/TR/trace-context-2/#traceparent-header-field-values
117+
[w3c-state]: https://www.w3.org/TR/trace-context-2/#tracestate-header
118+
[w3c-bag]: https://www.w3.org/TR/baggage/#baggage-http-header-format
119+
120+
#### Distributed Tracing in OpenTofu Prototype Example
121+
122+
Consider this real-world example of an OpenTofu Controller deployment.
123+
124+
![OpenTofu Run](./img/0258-env-context-opentofu-tracing.png)
125+
126+
In this model, the OpenTofu Controller is the start of the trace, containing
127+
the actual trace_id and generating the root span. The OpenTofu controller
128+
deploys a runner which has its own environment and processes to run OpenTofu
129+
commands. If one were to trace these processes without a carrier mechanism, then
130+
they would all show up as unrelated root spans in separate traces. However, by
131+
leveraging environment variables as carriers, each span is able to be tied back
132+
to the root span, creating a single trace as shown in the image of a real
133+
OpenTofu trace below.
134+
135+
![OpenTofu Trace](./img/0258-env-context-opentofu-trace.png)
136+
137+
Additionally, the `init` span is able to pass baggage to the `plan` and `apply`
138+
spans. One example of this is module version and repository information. This
139+
information is only determined and known during the `init` process. Subsequent
140+
processes only know about the module by name. With `BAGGAGE` the rest of the
141+
processes are able to understand a key piece of information which allows
142+
errors to be tied back to the original module version and source code.
143+
144+
Defining the specification for Environment Variables as carriers will have a
145+
wide impact on the industry by enabling better observability of systems outside
146+
of the normal HTTP microservice architecture.
147+
148+
[w3c-bag]: https://www.w3.org/TR/baggage/#header-name
149+
[bag-api]: https://opentelemetry.io/docs/specs/otel/baggage/api/
150+
151+
The above prototype example came from the resources mentioned in [this
152+
comment][otcom] on the [OpenTofu Tracing RFC][otrfc].
153+
154+
[otcom]: https://github.com/opentofu/opentofu/pull/2028#issuecomment-2411588695
155+
[otrfc]: https://github.com/opentofu/opentofu/pull/2028
156+
157+
## Core Specification Changes
158+
159+
The OpenTelemetry Specification should be updated with the definitions for
160+
extending context propagation into the environment through Text Map
161+
propagators.
162+
163+
This update should include:
164+
165+
* A common set of environment variables like `TRACEPARENT`, `TRACESTATE`, and
166+
`BAGGAGE` that can be used to propagate context between processes. These
167+
environment variable names should be overridable for legacy support reasons
168+
(like using B3), but the default standard should align with the W3C
169+
specification.
170+
* A specification for allowed environment names and values due to operating
171+
system limitations.
172+
* A specification for how implementers can inject and extract context from the
173+
environment through a TextMapPropagator.
174+
* A specification for how processes should update environment variables before
175+
spawning new processes.
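
The items above can be sketched roughly as follows. This is not the proposed API: the names `DEFAULT_ENV_NAMES`, `inject_into_env`, `extract_from_env`, and `spawn_with_context` are hypothetical, and the sketch uses only the Python standard library rather than a real `TextMapPropagator`. It assumes the default upper-case names, allows them to be overridden through the `names` argument, and copies the environment before spawning instead of mutating the parent's own environment.

```python
import os
import subprocess

# Assumed default mapping from lowercase propagator fields to the
# upper-case environment variable names discussed in this proposal.
DEFAULT_ENV_NAMES = {
    "traceparent": "TRACEPARENT",
    "tracestate": "TRACESTATE",
    "baggage": "BAGGAGE",
}


def inject_into_env(fields, base_env, names=DEFAULT_ENV_NAMES):
    """Return a copy of base_env with propagation fields written into it."""
    env = dict(base_env)
    for field, value in fields.items():
        # Unknown fields fall back to an upper-cased, underscore-only name
        # (an assumption of this sketch, not a rule from the proposal).
        fallback = field.upper().replace("-", "_")
        env[names.get(field, fallback)] = value
    return env


def extract_from_env(env, names=DEFAULT_ENV_NAMES):
    """Read propagation fields back out of an environment mapping."""
    return {field: env[name] for field, name in names.items() if name in env}


def spawn_with_context(argv, fields):
    """Run a child process whose environment carries the given fields."""
    child_env = inject_into_env(fields, os.environ)
    return subprocess.run(argv, env=child_env, capture_output=True, text=True)
```

Passing a different `names` mapping is one way the override for legacy formats could look; the child side would call `extract_from_env(os.environ)` and hand the result to its propagator.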
176+
177+
Defining the specification for Environment Variables as carriers for context
178+
will enable SDKs and other tools to implement getters and setters of context
179+
in a standard, observable way. Therefore, current OpenTelemetry language
180+
maintainers will need to develop language specific implementations that adhere
181+
to the specification.
182+
183+
Two implementations already exist within OpenTelemetry for environment
184+
variables through the TextMap Propagator:
185+
186+
* [Python SDK][python-env] - This implementation uses the environment dictionary as
187+
the carrier in Python for invoking process to invoked process context
188+
propagation. This pull request does not appear to have been merged.
189+
* [Swift SDK][swift-env] - This implementation uses `TRACEPARENT` and
190+
`TRACESTATE` environment variables alongside the W3C Propagator to inject and
191+
extract context.
192+
193+
Due to programming conventions, operating system limitations, prior art, and
194+
information below, it is recommended to leverage upper-cased environment
195+
variables for the carrier that align with context propagator specifications.
196+
197+
[python-env]: https://github.com/Div95/opentelemetry-python/tree/feature/env_propagator/propagator/opentelemetry-propagator-env
198+
[swift-env]: https://github.com/open-telemetry/opentelemetry-swift/blob/main/Sources/OpenTelemetrySdk/Trace/Propagation/EnvironmentContextPropagator.swift
199+
200+
### UNIX Limitations
201+
202+
UNIX system utilities use upper-case environment variable names; lower-case names
203+
are reserved for applications. Using upper-case will prevent conflicts with
204+
internal application variables.
205+
206+
Environment variable names used by the utilities in the Shell and Utilities
207+
(XCU) specification consist solely of upper-case letters, digits and the "_"
208+
(underscore) from the characters defined in Portable Character Set. Other
209+
characters may be permitted by an implementation; applications must tolerate
210+
the presence of such names. Upper-case and lower-case letters retain their
211+
unique identities and are not folded together. The name space of environment
212+
variable names containing lower-case letters is reserved for applications.
213+
Applications can define any environment variables with names from this name
214+
space without modifying the behaviour of the standard utilities.
215+
216+
Source: [The Open Group, The Single UNIX® Specification, Version 2, Environment Variables](https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html)
217+
218+
### Windows Limitations
219+
220+
Windows is case-insensitive with environment variables. Despite this, the
221+
recommendation is to use upper-case names across operating systems.
222+
223+
Some languages already do this. This [CPython issue][cpython] discusses how
224+
Python automatically upper-cases environment variable names on Windows. The issue was resolved and
225+
this [documentation][cpython-doc] was added to clarify the behavior.
226+
227+
[cpython]: https://github.com/python/cpython/issues/101754
228+
[cpython-doc]: https://docs.python.org/3/library/os.html#os.environ
229+
230+
### Allowed characters
231+
232+
To ensure compatibility, the specification for Environment Variables SHOULD adhere
233+
to the current specification for `TextMapPropagator` where key/value pairs MUST
234+
only consist of US-ASCII characters that make up valid HTTP header fields as
235+
per RFC 7230.
236+
237+
Environment variable keys SHOULD NOT conflict with commonly known environment
238+
variables like those described in [IEEE Std 1003.1-2017][std1003].
239+
240+
One key note is that Windows disallows the use of the `=` character in
241+
environment variable names. See [MS Env Vars][ms-env] for more information.
242+
243+
On Windows, there is also a limit on how many characters an environment variable can
244+
support, which is 32,767 characters.
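
The constraints in this section could be checked before a value is written to the environment. The following is a minimal sketch under stated assumptions: the function name `is_valid_carrier_entry` is hypothetical, names are restricted to upper-case letters, digits, and underscore (and may not start with a digit, following POSIX convention), and the 32,767-character limit is applied on every platform as a conservative cap even though it is documented for Windows.

```python
import re

# Upper-case letters, digits, and underscore, not starting with a digit.
_NAME_RE = re.compile(r"[A-Z_][A-Z0-9_]*")
# Limit documented for Windows; used here as a conservative cap everywhere.
MAX_VALUE_LENGTH = 32767


def is_valid_carrier_entry(name, value):
    """Check an environment variable name/value pair against the rules above."""
    if not _NAME_RE.fullmatch(name):
        return False  # also rejects names containing "="
    if len(value) > MAX_VALUE_LENGTH:
        return False
    # Visible US-ASCII plus space and horizontal tab, as in HTTP field values.
    return all(ch == "\t" or 0x20 <= ord(ch) <= 0x7E for ch in value)
```

An injector could skip, truncate, or log entries that fail this check; which of those is appropriate is left open here.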
245+
246+
[std1003]: https://pubs.opengroup.org/onlinepubs/9799919799/
247+
248+
[ms-env]: https://learn.microsoft.com/en-us/windows/win32/procthread/environment-variables
249+
250+
## Trade-offs and Mitigations
251+
252+
### Case-sensitivity
253+
254+
On Windows, because environment variable keys are case insensitive, there is a
255+
chance that automatically instrumented context propagation variables could
256+
conflict with existing application environment variables. It will be important
257+
to denote this behavior and document how languages mitigate this issue.
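
One possible mitigation is to look for names that differ from the carrier names only by case before injecting. The helper below is a hypothetical sketch (`find_case_conflicts` is not an existing API). It assumes the three default carrier names and works on any list of variable names, for example one read from a job definition or configuration file.

```python
def find_case_conflicts(
    existing_names,
    carrier_names=("TRACEPARENT", "TRACESTATE", "BAGGAGE"),
):
    """Return existing names that match a carrier name only when case is ignored."""
    wanted = {name.upper() for name in carrier_names}
    return sorted(
        name
        for name in existing_names
        if name.upper() in wanted and name not in carrier_names
    )
```

On a case-insensitive platform, any name this returns would refer to the same variable as the carrier, so instrumentation could warn or skip injection rather than silently overwrite it.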
258+
259+
### Security
260+
261+
Do not put sensitive information in environment variables. Due to the nature of
262+
environment variables, an attacker with the right access could obtain
263+
information they should not be privy to. Additionally, the integrity of the
264+
environment variables could be compromised.
265+
266+
## Prior Art and Alternatives
267+
268+
There are many users of `TRACEPARENT` and/or `TRACESTATE` environment variables
269+
mentioned in [opentelemetry-specification #740](https://github.com/open-telemetry/opentelemetry-specification/issues/740):
270+
271+
* [Jenkins OpenTelemetry Plugin](https://github.com/jenkinsci/opentelemetry-plugin)
272+
* [otel-cli generic wrapper](https://github.com/equinix-labs/otel-cli)
273+
* [Maven OpenTelemetry Extension](https://github.com/cyrille-leclerc/opentelemetry-maven-extension)
274+
* [Ansible OpenTelemetry Plugin](https://github.com/ansible-collections/community.general/pull/3091)
275+
* [go-test-trace](https://github.com/rakyll/go-test-trace/commit/22493612be320e0a01c174efe9b2252924f6dda9)
276+
* [Concourse CI](https://github.com/concourse/docs/pull/462)
277+
* [BuildKite agent](https://github.com/buildkite/agent/pull/1548)
278+
* [pytest](https://github.com/chrisguidry/pytest-opentelemetry/issues/20)
279+
* [Kubernetes test-infra Prow](https://github.com/kubernetes/test-infra/issues/30010)
280+
* [hotel-california](https://github.com/parsonsmatt/hotel-california/issues/3)
281+
282+
Additionally, there was a prototype implementation for environment variables as
283+
context carriers written in the [Python SDK][python-env].
284+
285+
[python-env]: https://github.com/open-telemetry/opentelemetry-specification/issues/740#issuecomment-919657003
286+
287+
## Alternatives and why they were not chosen
288+
289+
### Using a file for the carrier
290+
291+
Using a JSON file that is stored on the filesystem and referenced through an
292+
environment variable would eliminate the need to work around case-insensitivity
293+
issues on Windows; however, it would introduce a number of issues:
294+
295+
1. Would introduce an out-of-band file that would need to be created and
296+
reliably cleaned up.
297+
2. Managing permissions on the file might be non-trivial in some circumstances
298+
(for example, if `sudo` is used).
299+
3. This would deviate from significant prior art that currently uses
300+
environment variables.
301+
302+
## Open questions
303+
304+
The author has no open questions at this point.
305+
306+
## Future possibilities
307+
308+
1. Enabling distributed tracing in systems that do not communicate over network
309+
protocols that allow trace context to be propagated through headers,
310+
metadata, or other means.