Store sampling context in ContinuationPreservedEmbedderData #186


Open · wants to merge 10 commits into main from szegedi/cped-context

Conversation


@szegedi szegedi commented Dec 11, 2024

What does this PR do?:
Moves the profiler's sampling context into the V8 isolate's ContinuationPreservedEmbedderData (CPED). More precisely, it moves it into a private symbol-keyed property on that object, since Node.js 24 and later store AsyncContextFrame objects in the CPED themselves.
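Conceptually, the write path looks like this hedged sketch (the symbol key and helper name are illustrative; GetContinuationPreservedEmbedderData is the real V8 entry point, though its availability can depend on the V8 build):

```cpp
#include <v8.h>

// Sketch: attach the profiler's sample context to the AsyncContextFrame
// object that Node.js 24+ stores in the CPED, under our own symbol key so
// we don't collide with Node's data. Not the PR's exact code.
void StoreSampleContext(v8::Isolate* isolate,
                        v8::Local<v8::Symbol> key,
                        v8::Local<v8::Value> sampleContext) {
  v8::Local<v8::Value> cped = isolate->GetContinuationPreservedEmbedderData();
  if (!cped->IsObject()) return;  // no async context frame is active
  cped.As<v8::Object>()
      ->Set(isolate->GetCurrentContext(), key, sampleContext)
      .Check();
}
```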

Motivation:
Before this change, we rely on async_hooks in dd-trace-js to update the sampling context on every context switch. Doing this on every async context switch is somewhat costly, and we should migrate away from async hooks anyway: profiling is their last user, they'll be deprecated in Node, and they bring their own performance overhead.

This PR includes a document explaining the design.

Additional Notes:

The code using this in dd-trace-js' wall.js sets the context on AsyncLocalStorage.{enterWith|run} and in the before callback of AsyncHooks.createHook. With CPED storage enabled, we can eliminate the before callback and keep only the AsyncLocalStorage.{enterWith|run} callbacks; they are the minimum amount of instrumentation needed for the feature to work correctly.

This branch can be tested with the same-named branch in dd-trace-js.

Jira: PROF-11771

@szegedi szegedi changed the base branch from main to szegedi/atomic-context on December 11, 2024 20:10
@szegedi szegedi force-pushed the szegedi/cped-context branch from 5b796ea to 1b15c3a on December 11, 2024 20:48
@szegedi szegedi changed the base branch from szegedi/atomic-context to main on December 11, 2024 20:53
@szegedi szegedi force-pushed the szegedi/cped-context branch from 1b15c3a to 3decb46 on December 11, 2024 21:04
@szegedi szegedi force-pushed the szegedi/cped-context branch 4 times, most recently from bfa23b5 to 32b2170 on December 17, 2024 11:26
@szegedi szegedi force-pushed the szegedi/cped-context branch from 32b2170 to 9f398f2 on January 20, 2025 16:43
@szegedi szegedi force-pushed the szegedi/cped-context branch from 9f398f2 to 965a627 on February 26, 2025 16:40
@szegedi szegedi changed the base branch from main to szegedi/async-id-2 on February 26, 2025 16:41
@szegedi szegedi force-pushed the szegedi/cped-context branch from 965a627 to 0c1791e on February 26, 2025 16:51
@szegedi szegedi force-pushed the szegedi/cped-context branch from 0c1791e to 8f88f3d on February 27, 2025 10:32
@@ -66,6 +66,7 @@ export interface TimeProfilerOptions {
withContexts?: boolean;
workaroundV8Bug?: boolean;
collectCpuTime?: boolean;
useCPED?: boolean;


Code Quality Violation

Prop name (useCPED) doesn't match rule (is|has)

Enforces a consistent naming pattern for boolean props.

The pattern is: "^(is|has)[A-Z]([A-Za-z0-9]?)+" to enforce is and has prefixes.


@szegedi szegedi force-pushed the szegedi/cped-context branch from 8f88f3d to 717f17b on March 5, 2025 12:53
@szegedi szegedi force-pushed the szegedi/async-id-2 branch from 3c86526 to 79992a0 on March 6, 2025 17:05
Base automatically changed from szegedi/async-id-2 to main on March 19, 2025 15:27
@szegedi szegedi force-pushed the szegedi/cped-context branch from 717f17b to 2b9041d on March 20, 2025 15:12
github-actions bot commented Mar 20, 2025

Overall package size

Self size: 1.62 MB
Deduped: 1.99 MB
No deduping: 1.99 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| source-map | 0.7.4 | 226 kB | 226 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| p-limit | 3.1.0 | 7.75 kB | 13.78 kB |
| delay | 5.0.0 | 11.17 kB | 11.17 kB |
| node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


pr-commenter bot commented Mar 20, 2025

Benchmarks

Benchmark execution time: 2025-06-11 11:29:00

Comparing candidate commit 98b683c in PR branch szegedi/cped-context with baseline commit 2f6f2f8 in branch main.

Found 1 performance improvement and 0 performance regressions. Performance is unchanged for 89 metrics; 30 metrics were unstable.

scenario:profiler-light-load-with-wall-profiler-20

  • 🟩 cpu_user_time [-11.651ms; -2.413ms] or [-11.063%; -2.291%]

@szegedi szegedi changed the title from "Experiment for storing profiling context in the ContinuationPreservedEmbeddedData" to "Store profiling context in ContinuationPreservedEmbeddedData" on March 20, 2025
@szegedi szegedi changed the title from "Store profiling context in ContinuationPreservedEmbeddedData" to "Store sampling context in ContinuationPreservedEmbeddedData" on March 20, 2025
@szegedi szegedi added the semver-minor (Usually minor non-breaking improvements) label on March 20, 2025
@szegedi szegedi force-pushed the szegedi/cped-context branch from 2b9041d to c475e2f on March 20, 2025 16:14
@szegedi szegedi force-pushed the szegedi/cped-context branch 3 times, most recently from 9ba3bf1 to 9c4090e on May 7, 2025 12:00
@szegedi szegedi requested a review from nsavoire as a code owner on May 20, 2025 13:32
@szegedi szegedi changed the title from "Store sampling context in ContinuationPreservedEmbeddedData" to "Store sampling context in ContinuationPreservedEmbedderData" on May 20, 2025
@szegedi szegedi marked this pull request as draft on May 20, 2025 13:41
@szegedi szegedi force-pushed the szegedi/cped-context branch 5 times, most recently from 89ff3a8 to d774391 on May 21, 2025 12:58
@szegedi szegedi marked this pull request as ready for review on May 21, 2025 13:15
@@ -0,0 +1,190 @@
# Storing Sample Context in V8 Continuation-Preserved Embedder Data


huge thanks for the very comprehensive doc

@@ -521,12 +523,16 @@ WallProfiler::WallProfiler(std::chrono::microseconds samplingPeriod,
workaroundV8Bug_ = workaroundV8Bug && DD_WALL_USE_SIGPROF && detectV8Bug_;
collectCpuTime_ = collectCpuTime && withContexts;
collectAsyncId_ = collectAsyncId && withContexts;
#if NODE_MAJOR_VERSION >= 23


I thought this was Node 24? Can we really rely on 23 being OK?

szegedi (Author):

The AsyncContextFrame implementation was released in Node 23 but needed a command-line option to activate it. Node 24 just turned it on by default. It's pretty much the same code in both. I developed and tested against 23 for most of the time, as 24 was only released recently. (That is also the reason it says 23 here :-) )

FWIW, now that Node 24 is out, we no longer publish a release for 23, so we might as well bump this to 24. Technically, though, there's no reason not to support this in source code from 23 onwards.
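For illustration, the compile-time gate might look like this (DD_CPED_SUPPORTED is a hypothetical macro for this sketch; NODE_MAJOR_VERSION comes from Node's headers):

```cpp
// Node 23 shipped AsyncContextFrame behind a command-line flag; Node 24
// enabled it by default. Support can be compiled in from 23 onwards.
#if NODE_MAJOR_VERSION >= 23
#define DD_CPED_SUPPORTED 1  // hypothetical macro for this sketch
#else
#define DD_CPED_SUPPORTED 0
#endif
```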

isolate->RemoveGCPrologueCallback(&GCPrologueCallback, this);
isolate->RemoveGCEpilogueCallback(&GCEpilogueCallback, this);
}

for (auto it = liveContextPtrs_.begin(); it != liveContextPtrs_.end();) {
@r1viollet commented May 22, 2025:

Is it OK to have non-atomic operations on this structure?
Edit: it looks like we no longer rely on GC cleanups (so no risk there).
We have one instance per isolate and we always sample from the same thread?

szegedi (Author):

V8 isolates can only be safely used from a single thread at a time (they have enter/exit APIs for threads for this purpose). The way Node uses them, there's a 1:1 correspondence between an isolate and a thread. Normally this disposal method is invoked from WallProfiler.StopImpl() when the profiler is not restarted, and that itself happens from JS code for TimeProfiler.stop(), so this is expected to always execute on the thread belonging to the current isolate (the same one that calls SetContext, which puts elements into the set).

deadContextPtrs_, on the other hand, is inserted into from the GC callback, and I'm not 100% sure whether GC callbacks are always invoked from the thread currently running JS code. I'd think yes, as GC is triggered during execution of JS code in its allocation path when it needs more memory. But some GC algorithms can also be parallel. That's something we should look into and get 100% certainty on.
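A hedged sketch of the single-threaded reaping being discussed (ReapDeadContexts is a hypothetical helper name; the PR's actual loop over liveContextPtrs_ differs in its details):

```cpp
// Sketch: reap C++ context objects whose ACF was garbage collected.
// Assumes it runs on the isolate's thread, per the discussion above.
void WallProfiler::ReapDeadContexts() {  // hypothetical helper name
  for (auto* ptr : deadContextPtrs_) {
    liveContextPtrs_.erase(ptr);
    delete ptr;
  }
  deadContextPtrs_.clear();
}
```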


auto cpedObj = cped.As<Object>();
auto localSymbol = cpedSymbol_.Get(isolate);
auto maybeProfData = cpedObj->Get(v8Ctx, localSymbol);


I might need a refresher on re-entrant V8 calls and how to protect ourselves (mainly from GC?). The getters call into V8, right?

szegedi (Author):

We register callbacks with the V8 isolate to observe the start and end of GC operations, and when we know the engine is GCing, the signal handler won't touch the CPED. It's actually even better: when our GC prologue callback is invoked, we can still safely read the current sample context and store its shared_ptr<Global<Value>> in a profiler instance field, and the signal handler then just copies that shared_ptr into struct elements in a ring buffer that associates samples with contexts. This way, even GC samples will have a sample context associated with them: the one that was active when the GC started.

And yeah, Object::Get will read from the object. As it is an ordinary property (meaning it does not have a getter function) it will essentially execute a few dependent loads (pointer traversals and offset reads). I believe this is safe to do from a signal handler. There's a V8 blog post that describes this in more detail.

There can still be a concurrency issue between the property being set on the frame object by SetContext and it being read by the signal handler. In SetContext we define the property only once on every frame object. Its JS value is immutable: a v8::External with a pointer to a C++ PersistentContextPtr instance. All mutation happens within this C++ object, which is a subclass of AtomicContextPtr and is coded to be atomic and signal-safe. This leaves only making sure the signal handler does not try to execute Object::Get if it interrupted SetContext while it was executing Object::DefineProperty; we use the setInProgress atomic boolean for this purpose.

This covers all cases I can see. Some of this is captured in the design doc too.
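A hedged sketch of the GC bracketing described above (ProfilerState, gcInProgress, and ReadCurrentContext are illustrative names; the Add/RemoveGCPrologueCallback overloads taking a void* are real V8 API, mirroring the Remove* calls shown in an earlier diff):

```cpp
#include <atomic>
#include <memory>
#include <v8.h>

// Sketch only; stands in for the WallProfiler fields discussed above.
struct ProfilerState {
  std::atomic<bool> gcInProgress{false};
  std::shared_ptr<v8::Global<v8::Value>> gcContext;  // context at GC start
};

static void GCPrologue(v8::Isolate* isolate, v8::GCType, v8::GCCallbackFlags,
                       void* data) {
  auto* state = static_cast<ProfilerState*>(data);
  // Still safe to read the CPED here: capture the active sample context so
  // samples taken during this GC can be attributed to it.
  // state->gcContext = ReadCurrentContext(isolate);  // hypothetical helper
  state->gcInProgress.store(true, std::memory_order_relaxed);
}

static void GCEpilogue(v8::Isolate*, v8::GCType, v8::GCCallbackFlags,
                       void* data) {
  static_cast<ProfilerState*>(data)->gcInProgress.store(
      false, std::memory_order_relaxed);
}

// Registration, mirroring the removal calls shown in the earlier diff:
//   isolate->AddGCPrologueCallback(&GCPrologue, state);
//   isolate->AddGCEpilogueCallback(&GCEpilogue, state);
```

The signal handler would check gcInProgress and, when it is set, use gcContext instead of touching the CPED.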

PersistentContextPtr* contextPtr = nullptr;
auto profData = maybeProfData.ToLocalChecked();
if (profData->IsUndefined()) {
contextPtr = new PersistentContextPtr(&deadContextPtrs_);


could this be too many new operations?

szegedi (Author):

There will be at most as many as there are AsyncContextFrame (ACF) objects in Node (practically, there'll be exactly as many). With GC callbacks we'll keep deleting them in sync with how ACFs are being GCd in the V8 heap. V8 can create new ACF instances on its heap cheaply (as it allocates bigger chunks of memory up front), but we can't allocate our C++ object on that heap, so each one is a separate allocation. I would like to think it is very unlikely to matter. We might want to add comments here in the code to explain this, though.


So O(1000) elements per minute?

szegedi (Author):

Essentially, any time there's an AsyncLocalStorage.enterWith, so quite likely a lot. I could expose a counter and see if we can report it through telemetry?
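A minimal sketch of such a counter (contextPtrAllocations and the flush hook are hypothetical names, not existing telemetry API):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical allocation counter, bumped where PersistentContextPtr is
// created and drained periodically by a telemetry reporter.
static std::atomic<uint64_t> contextPtrAllocations{0};

// At the allocation site:
//   contextPtrAllocations.fetch_add(1, std::memory_order_relaxed);
// At telemetry flush time:
//   uint64_t n = contextPtrAllocations.exchange(0, std::memory_order_relaxed);
```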


auto external = External::New(isolate, contextPtr);
setInProgress.store(true, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_release);


The release fence should be placed before the atomic operation.
This always confuses me, but the pattern should be:

     Thread                                          Signal handler

fence(Release);         A --------------
m.store(true, Relaxed); X ---------    |
                                  |    |
                                  |    |
                                  -------------> Y  if m.load(Relaxed) {
                                       |-------> B      fence(Acquire);
                                                        ...
                                                    }

szegedi (Author):

Yeah, it's tricky and confuses me too, especially since the specification only discusses acquire/release in the context of loads and stores, not fences, and when the documentation on signal fences says:

a fence with release semantics prevents reads or writes from being moved past subsequent writes and a fence with acquire semantics prevents reads or writes from being moved ahead of preceding reads

it does not say whether it's the atomic reads/writes or the non-atomic ones that are prevented from being moved.

I decided to review all uses of atomic_signal_fence in our code and frankly, there's a bunch of them that look incorrect by the above logic (you can see for yourself). We even have one release after a load, but we surely have a bunch of releases after stores too. I'll canonicalize 'em all.

@szegedi szegedi (Author) commented Jun 10, 2025:

I decided to fix fencing on main first: #217. Let me know there if everything looks right.


I found https://doc.rust-lang.org/std/sync/atomic/fn.compiler_fence.html clearer in its explanations.
From what I understand, the atomic variable provides the synchronization between the thread and the signal handler, and the fences ensure that:

  • no non-atomic writes are moved after the atomic write
  • no non-atomic reads are moved before the atomic read
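Putting that agreed-upon pattern into C++ terms, a minimal sketch (generic names, not the profiler's actual fields):

```cpp
#include <atomic>

int payload = 0;                  // non-atomic data shared with the handler
std::atomic<bool> ready{false};   // flag the signal handler polls

void writer() {
  payload = 42;                                         // write data first
  std::atomic_signal_fence(std::memory_order_release);  // data stays above...
  ready.store(true, std::memory_order_relaxed);         // ...the flag store
}

void handler() {  // runs on the same thread, interrupting writer()
  if (ready.load(std::memory_order_relaxed)) {          // read flag first
    std::atomic_signal_fence(std::memory_order_acquire);// data reads stay below
    int v = payload;                                    // safe to read now
    (void)v;
  }
}
```

Since the handler interrupts the same thread, atomic_signal_fence only has to prevent compiler reordering; no CPU-level fence is needed.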


std::atomic<int> gcCount = 0;
std::atomic<bool> setInProgress = false;


nit:

Suggested change
std::atomic<bool> setInProgress = false;
std::atomic<bool> setInProgress_ = false;

double gcAsyncId;
ContextPtr gcContext;


nit:

Suggested change
ContextPtr gcContext;
ContextPtr gcContext_;

contextPtr = new PersistentContextPtr(&deadContextPtrs_);

auto external = External::New(isolate, contextPtr);
setInProgress.store(true, std::memory_order_relaxed);


I really like the idea of the setInProgress atomic that prevents the read during the update.
With this, AtomicContextPtr should not be needed for CPED, right?
And we could use the same mechanism for the non-CPED flow and remove AtomicContextPtr entirely.

szegedi (Author):

🤔 The behavior would be different for sure. If we manage to sample just when the value is being set, then currently we can still bind the previous value (good question whether that's valid or not), whereas now we'd bail out.
However, it would allow for a fairly strong simplification. We still couldn't just store the JS value in the CPED, I think. At the very minimum, we would still need a v8::External with a std::shared_ptr<v8::Global<v8::Value>>, as the signal handler just copies the shared_ptr into the structs in the ring buffer. We could "just" be resetting the shared_ptr, though. And we'd still need to do the live/dead pointer tracking. But still simpler.


Just bailing out when a context set is in progress seems reasonable; we would lose context for a few samples.
I agree that, sadly, we would still need a shared_ptr on a v8::Global.
We could explore storing only the span ID / endpoint in an array buffer, but it's unclear to me whether that's possible.
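A sketch of the bail-out read path being discussed (illustrative names; assumes the External wraps the shared_ptr<Global<Value>> mentioned above):

```cpp
#include <atomic>
#include <memory>
#include <v8.h>

using SampleContext = std::shared_ptr<v8::Global<v8::Value>>;
extern std::atomic<bool> setInProgress;  // set by SetContext around the write

// Called from the signal handler: copy the current context's shared_ptr for
// the ring buffer, or return empty when a set is mid-flight (losing the
// context for that one sample, as discussed).
SampleContext ReadContext(v8::Isolate* isolate, v8::Local<v8::Symbol> key) {
  if (setInProgress.load(std::memory_order_relaxed)) return nullptr;
  std::atomic_signal_fence(std::memory_order_acquire);
  v8::Local<v8::Value> cped = isolate->GetContinuationPreservedEmbedderData();
  if (!cped->IsObject()) return nullptr;
  v8::Local<v8::Value> val;
  if (!cped.As<v8::Object>()
           ->Get(isolate->GetCurrentContext(), key)
           .ToLocal(&val) ||
      !val->IsExternal()) {
    return nullptr;
  }
  // The External's payload is the shared_ptr; reading it is plain dependent
  // loads, per the earlier discussion about signal safety of Object::Get.
  return *static_cast<SampleContext*>(val.As<v8::External>()->Value());
}
```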

}

WallProfiler::~WallProfiler() {
Dispose(nullptr);
}

class PersistentContextPtr : AtomicContextPtr {


nit: move this class to the top of the file instead of putting it in the middle of the WallProfiler methods

@szegedi szegedi force-pushed the szegedi/cped-context branch 2 times, most recently from c6d75ac to 6d6b07a on June 6, 2025 14:34
@szegedi szegedi force-pushed the szegedi/cped-context branch from 6d6b07a to 98b683c on June 11, 2025 11:25
Labels
semver-minor Usually minor non-breaking improvements
3 participants