View the [documentation](https://hexdocs.pm/spandex)
Spandex is a platform-agnostic tracing library. Currently there is only a Datadog APM adapter, but it is designed so that more adapters can be written for it.
This library is undergoing some structural changes for future versions. This documentation will be kept up to date, but if you find any inconsistencies, don't hesitate to open an issue.
## Installation
```elixir
def deps do
  [{:spandex, "~> 1.4.0"}]
end
```
## Setup and Configuration
Define your tracer:
```elixir
defmodule MyApp.Tracer do
  use Spandex.Tracer, otp_app: :my_app
end
```
Configure it:
```elixir
config :my_app, MyApp.Tracer,
  service: :my_api,
  adapter: Spandex.Adapters.Datadog,
  disabled?: false,
  env: "PROD"
```
Or at runtime, by calling `configure/1` (usually during your application's startup):
```elixir
MyApp.Tracer.configure(disabled?: Mix.env() == :test)
```
For more information on tracer configuration, view the docs for `Spandex.Tracer`, where you will find the documentation for the opts schema. The entire configuration can also be passed into each function in your tracer to override it for that call.
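For example, a per-call override might look like the following sketch (the span name and the `service:` value here are made up for illustration):

```elixir
MyApp.Tracer.start_span("checkout", service: :payment_service)
```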
The options you pass in are merged with the configuration in your config files, so you don't need to specify the full configuration every time.
To bypass the tracer pattern entirely, you can call directly into the functions in `Spandex`, like `Spandex.start_span("span_name", [adapter: Foo, service: :bar])`.
### Adapter-specific configuration
To start the Datadog adapter, add a worker to your application's supervisor.
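The exact snippet isn't reproduced here; a minimal sketch, assuming `Spandex.Datadog.ApiServer` is started with the host/port and sender options discussed elsewhere in this README, might look like:

```elixir
# In your application's supervision tree (e.g. MyApp.Application.start/2).
# The option names mirror the adapter options discussed in this README;
# treat the exact values, and the child spec shape, as illustrative.
children = [
  %{
    id: Spandex.Datadog.ApiServer,
    start:
      {Spandex.Datadog.ApiServer, :start_link,
       [[host: "localhost", port: 8126, batch_size: 10, sync_threshold: 20]]}
  }
]

Supervisor.start_link(children, strategy: :one_for_one)
```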
`Spandex.Plug.AddContext` can be modified to include options for `:allowed_route_replacements` and `:disallowed_route_replacements`, so a route of `:base_route/:id/:relationship` would only have `:base_route` and `:relationship` swapped to their param values if included in `:allowed_route_replacements` and not included in `:disallowed_route_replacements`.
* `Spandex.Plug.StartTrace` - See moduledocs for options. Goes as early in your pipeline as possible.
* `Spandex.Plug.AddContext` - See moduledocs for options. Either after the router, or inside a pipeline in the router.
* `Spandex.Plug.EndTrace` - Must go *after* your router, so that rendering the response is included in the trace timing (see the placement sketch below).
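A placement sketch, assuming a standard Phoenix endpoint (the plug options are illustrative, and passing the `AddContext` route-replacement options as plug options is an assumption; check each plug's moduledoc for what it actually accepts):

```elixir
defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  # As early as possible, so the whole request is timed.
  plug Spandex.Plug.StartTrace

  # ... your other plugs ...

  plug MyAppWeb.Router

  # After the router (could also live inside a router pipeline). The
  # route-replacement options from the paragraph above are assumed to be plug options.
  plug Spandex.Plug.AddContext, allowed_route_replacements: [:base_route, :relationship]

  # After the router, so rendering the response is included in the timing.
  plug Spandex.Plug.EndTrace
end
```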
## Distributed Tracing
Distributed tracing is supported via headers `x-datadog-trace-id` and `x-datadog-parent-id`. If they are set, the `StartTrace` plug will act accordingly, continuing that trace and span instead of starting a new one. *Both* must be set for distributed tracing to work.
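For instance, an upstream Elixir service could propagate its current context by attaching those headers to its outgoing request. A sketch, where `MyApp.Tracer` is the tracer defined earlier and converting the ids to strings is assumed to be all the downstream `StartTrace` plug needs:

```elixir
# Headers for an outgoing HTTP request, so the downstream service's
# StartTrace plug continues this trace instead of starting a new one.
trace_headers = [
  {"x-datadog-trace-id", to_string(MyApp.Tracer.current_trace_id())},
  {"x-datadog-parent-id", to_string(MyApp.Tracer.current_span_id())}
]
```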
## Logger metadata
In general, you'll probably want the current `span_id` and `trace_id` in your logs, so that you can find them in your tracing service. Make sure to add `span_id` and `trace_id` to your Logger metadata.
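A sketch of the corresponding Logger configuration (the `:request_id` key and the format string are incidental; the relevant part is adding `:trace_id` and `:span_id` to the metadata list):

```elixir
# config/config.exs
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id, :trace_id, :span_id]
```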
Traces and spans can also be managed manually (abridged example):

```elixir
defmodule ManuallyTraced do
  alias MyApp.Tracer

  # Does not handle exceptions for you.
  def trace_me() do
    _ = Tracer.start_trace("my_trace") # also opens a span
    ...
  end

  # Handles exceptions at the span level. Trace still must be reported.
  def span_me_also() do
    Tracer.span("span_me_also") do
      ...
    end
  end
end
```
## Asynchronous Processes
The current trace_id and span_id can be retrieved with `Tracer.current_trace_id()` and `Tracer.current_span_id()`. These can then be used with `Tracer.continue_trace("new_trace", trace_id, span_id)`. New spans can then be logged from there and will be sent in a separate batch.
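A sketch of handing that context to a `Task` (the span name is illustrative, and the final `finish_trace/0` call is assumed to be how the continued trace gets reported):

```elixir
# Capture the context in the calling process...
trace_id = MyApp.Tracer.current_trace_id()
span_id = MyApp.Tracer.current_span_id()

Task.start(fn ->
  # ...and continue the trace from the spawned process.
  MyApp.Tracer.continue_trace("async_work", trace_id, span_id)

  # ... do the work, creating spans as usual ...

  MyApp.Tracer.finish_trace()
end)
```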
## Datadog API Sender Performance
Originally, the library had an API server and spans were sent via `GenServer.cast`, but we've seen the need to introduce backpressure and limit the overall number of requests made. As such, the Datadog API sender accepts `batch_size` and `sync_threshold` options.
Batch size refers to *traces*, not spans, so if you send a large number of spans per trace, you probably want to keep that number low. If you only send a few spans per trace, you could set it significantly higher.
Sync threshold refers to the *number of processes concurrently sending spans*, *not* the number of traces queued up waiting to be sent. It is used to apply backpressure while still taking advantage of parallelism. Ideally, the sync threshold is set to a point you wouldn't reasonably reach often, but that is low enough not to cause systemic performance issues when you do exceed it.

A simple way to think about it: if you are seeing 1,000 requests per second and your batch size is 10, you'll be making 100 requests per second to Datadog (probably a bad config). If your `sync_threshold` is set to 10, you'll almost certainly exceed it, because many of those 100 requests will be in flight at the same time. When that happens, the work is done synchronously in the calling process, without even waiting for the in-flight asynchronous sends to complete. This concept of backpressure is very important, and strategies that switch to synchronous operation under load are often surprisingly more performant than purely asynchronous strategies, and much more predictable.