
[FEATURE]: Creating 1 span that combines latency/flamegraph/waterfall of two separate spans that live in different threads. #13683

Open
@kevin-hermeneutic

Description

Package Name

ddtrace

Package Version(s)

2.14.4

Describe the goal of the feature

Current Architecture
Get Data Phase (has its span):
One or more fetching threads each send a GET request to retrieve data from a website.

Process Data Phase (each topic has its own span):
Multiple worker threads (n workers) consume data from a queue populated by the fetching threads and process each item concurrently.
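A minimal sketch of the architecture described above, using only the standard library and no tracing. The function bodies, the `SENTINEL` shutdown marker, and `run_pipeline` are hypothetical stand-ins for illustration, not the actual application code.

```python
import queue
import threading
import time

SENTINEL = None  # hypothetical marker telling a worker to stop


def get_data(topic_id):
    """Stand-in for the GET request; a real fetcher would call the website here."""
    time.sleep(0.01)
    return f"payload-{topic_id}"


def process_data(topic_id, payload):
    """Stand-in for the per-topic processing step."""
    time.sleep(0.01)
    return payload.upper()


def fetcher(topic_ids, work_queue):
    for topic_id in topic_ids:
        payload = get_data(topic_id)
        work_queue.put((topic_id, payload))  # enqueue after the GET returns


def worker(work_queue, results, lock):
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        topic_id, payload = item
        processed = process_data(topic_id, payload)
        with lock:
            results[topic_id] = processed


def run_pipeline(topic_batches, n_workers=3):
    """topic_batches holds one list of topic ids per fetch thread."""
    work_queue = queue.Queue()
    results, lock = {}, threading.Lock()
    workers = [threading.Thread(target=worker, args=(work_queue, results, lock))
               for _ in range(n_workers)]
    fetchers = [threading.Thread(target=fetcher, args=(batch, work_queue))
                for batch in topic_batches]
    for t in workers + fetchers:
        t.start()
    for t in fetchers:
        t.join()
    for _ in workers:
        work_queue.put(SENTINEL)  # one stop marker per worker
    for t in workers:
        t.join()
    return results
```

Calling `run_pipeline([[1, 2], [3, 4]])` models the n-to-n case with two fetch threads; a single inner list models the 1-to-n case.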

I’m already able to track metrics in Datadog such as:

The duration of each get_data execution.

The duration of each process_data execution, along with flamegraphs and waterfalls of its executed functions.

The Goal
I want to introduce a new span in Datadog called round_trip, which would represent the total time from:

Starting the GET request (get_data)

To the completion of processing that specific response (process_data) in the worker thread that handles it (topic X)

In other words, I want a parent span:
round_trip(topic X) = get_data(topic X) + process_data(topic X)
This should be visualized in the flame graph or waterfall as:

round_trip(topic 1)
    get_data(topic 1)
    process_data(topic 1)

round_trip(topic 2)
    get_data(topic 2)
    process_data(topic 2)

...

This structure needs to support:

1-to-n: One fetch thread → Multiple process threads

n-to-n: Multiple fetch threads → Multiple process threads

Each data item has a topic_id (or similar identifier) and is processed independently by its worker thread once that thread receives the data from the fetch thread (get_data).
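To make the desired shape concrete, here is a small sketch of the per-topic tree as plain data. `SpanRecord`, `build_round_trip`, and `render` are hypothetical names for illustration only and are not part of ddtrace; the timestamps are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a span: a name, a time range, and children."""
    name: str
    start: float
    end: float
    children: List["SpanRecord"] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_round_trip(topic_id, get_start, get_end, process_start, process_end):
    # The parent runs from the start of the GET to the end of processing.
    root = SpanRecord(f"round_trip(topic {topic_id})", get_start, process_end)
    root.children.append(
        SpanRecord(f"get_data(topic {topic_id})", get_start, get_end))
    root.children.append(
        SpanRecord(f"process_data(topic {topic_id})", process_start, process_end))
    return root


def render(span, depth=0):
    """Return the tree as indented lines, one per span."""
    lines = [f"{'    ' * depth}{span.name} [{span.duration:.2f}s]"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_round_trip(1, 0.0, 0.5, 0.7, 1.0))))
```

Note that in this sketch the parent also covers any time the item waits in the queue, so its duration can exceed the sum of the two children.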

Problem
When I create the round_trip span in the fetch thread (where get_data runs), the resulting trace does not include the process_data span, which runs later and in a different thread. I tried creating one round_trip span per topic ID before the GET request, activating the first span (its context), and passing that span through the queue to the worker thread. With that approach, only one trace (out of the n worker threads) shows up, and its flame graph/waterfall contains only the GET request.

I want to understand:
How can I properly create a trace/span that encompasses both the GET request and its associated data processing, even across different threads? I don't think I can pass the span ID here (it doesn't work), and creating a child span does not give me a span that contains both the GET request and the process_data span.
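The handoff I have in mind can be sketched without any tracer: a per-topic handle is opened in the fetch thread, travels through the queue next to the payload, and is closed in whichever worker processes it. `RoundTrip` below is a hypothetical stand-in for a span or its context, not a ddtrace class. With ddtrace, the worker would presumably create process_data as a child of the handed-over span (for example via `tracer.start_span(..., child_of=...)`) and finish the parent afterwards; that mapping is an assumption and is not verified here against 2.14.4.

```python
import queue
import threading
import time


class RoundTrip:
    """Hypothetical per-topic handle; stands in for a tracer span or context."""

    def __init__(self, topic_id):
        self.topic_id = topic_id
        self.start = time.monotonic()
        self.end = None
        self.children = []  # (name, start, end, thread name)

    def record_child(self, name, start, end):
        # The queue handoff orders the fetcher's write before the worker's.
        self.children.append((name, start, end, threading.current_thread().name))

    def finish(self):
        self.end = time.monotonic()


def fetcher(topic_ids, work_queue):
    for topic_id in topic_ids:
        round_trip = RoundTrip(topic_id)  # opened before the GET starts
        start = time.monotonic()
        time.sleep(0.01)  # stand-in for the GET request
        payload = f"payload-{topic_id}"
        round_trip.record_child("get_data", start, time.monotonic())
        work_queue.put((round_trip, payload))  # the handle travels with the item


def worker(work_queue, finished, lock):
    while True:
        item = work_queue.get()
        if item is None:  # stop marker
            break
        round_trip, payload = item
        start = time.monotonic()
        time.sleep(0.01)  # stand-in for processing the payload
        round_trip.record_child("process_data", start, time.monotonic())
        round_trip.finish()  # closed in the worker, not in the fetcher
        with lock:
            finished.append(round_trip)


def run(topic_batches, n_workers=2):
    work_queue = queue.Queue()
    finished, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(work_queue, finished, lock),
                                name=f"worker-{i}") for i in range(n_workers)]
    fetchers = [threading.Thread(target=fetcher, args=(batch, work_queue),
                                 name=f"fetch-{i}")
                for i, batch in enumerate(topic_batches)]
    for t in workers + fetchers:
        t.start()
    for t in fetchers:
        t.join()
    for _ in workers:
        work_queue.put(None)
    for t in workers:
        t.join()
    return finished
```

The key point of the sketch is that the parent is finished by the worker, after process_data ends, rather than when the fetch thread leaves its own scope.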

Constraints & Notes
get_data() puts items into a queue after the GET request returns.

process_data() pulls items from the queue and handles them.

There can be multiple fetchers and multiple workers.

I want the tracing to be scoped per-topic (e.g., per message or record), not per batch or per thread.
