[SCP-759] Fix profiler shutdown when isolate is terminated #215

Merged
szegedi merged 4 commits into main from nsavoire/fix_profiler_shutdown
Jun 10, 2025

Conversation

@nsavoire nsavoire commented Jun 5, 2025

What does this PR do?:

When an isolate is terminated abruptly (e.g. by Worker.terminate()), the profiler is not stopped and is not removed from the isolate->profiler map. This can lead to a crash.

This commit adds a cleanup hook with node::AddEnvironmentCleanupHook that is called when the isolate is destroyed.
The cleanup hook stops the profiler and removes it from the profiler map.
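
To make the mechanism concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the code from this PR: `FakeIsolate`, `FakeEnvironment`, `SketchProfiler` and `g_profilerMap` are hypothetical stand-ins so the example compiles without the Node.js or V8 headers. In a real addon the registration would go through `node::AddEnvironmentCleanupHook`, and the map would be the PR's own `g_profilers` type.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for v8::Isolate (hypothetical; the real type comes from V8).
struct FakeIsolate {};

// Stand-in for Node's environment cleanup machinery. In an addon the
// registration call would be node::AddEnvironmentCleanupHook(isolate, fn, arg).
struct FakeEnvironment {
  using Hook = void (*)(void*);
  std::vector<std::pair<Hook, void*>> hooks;

  void AddCleanupHook(Hook fn, void* arg) { hooks.emplace_back(fn, arg); }

  // Called when the environment is torn down, including after termination.
  void RunCleanupHooks() {
    // Run hooks in last-in first-out order.
    for (auto it = hooks.rbegin(); it != hooks.rend(); ++it) {
      it->first(it->second);
    }
    hooks.clear();
  }
};

class SketchProfiler;

// Simplified isolate->profiler map (an assumption; the PR uses its own type).
std::unordered_map<FakeIsolate*, SketchProfiler*> g_profilerMap;

class SketchProfiler {
 public:
  SketchProfiler(FakeIsolate* isolate, FakeEnvironment* env)
      : isolate_(isolate), env_(env) {}

  void Start() {
    running_ = true;
    g_profilerMap[isolate_] = this;
    // Register the hook so teardown reaches us even without an explicit stop.
    env_->AddCleanupHook(&SketchProfiler::CleanupHook, this);
  }

  bool running() const { return running_; }

 private:
  // Private static member: it has the void(void*) shape a C-style hook
  // needs, and it can still reach private state through `data`.
  static void CleanupHook(void* data) {
    auto* self = static_cast<SketchProfiler*>(data);
    self->running_ = false;               // stop the profiler
    g_profilerMap.erase(self->isolate_);  // drop the stale map entry
  }

  FakeIsolate* isolate_;
  FakeEnvironment* env_;
  bool running_ = false;
};
```

In this sketch, calling `Start()` and then `RunCleanupHooks()` on the environment leaves the profiler stopped and the map empty, which is the state the fix aims for when a worker is terminated without an explicit stop.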

pr-commenter bot commented Jun 5, 2025

Benchmarks

Benchmark execution time: 2025-06-10 09:16:55

Comparing candidate commit 17343f3 in PR branch nsavoire/fix_profiler_shutdown with baseline commit bfa2637 in branch main.

Found 2 performance improvements and 0 performance regressions! Performance is the same for 88 metrics, 30 unstable metrics.

scenario:profiler-light-load-no-wall-profiler-22

  • 🟩 cpu_user_time [-5.500ms; -1.111ms] or [-7.485%; -1.512%]

scenario:profiler-light-load-with-wall-profiler-24

  • 🟩 cpu_user_time [-10.178ms; -2.901ms] or [-8.759%; -2.497%]

@nsavoire nsavoire added the semver-patch Bug or security fixes, mainly label Jun 5, 2025
@nsavoire nsavoire force-pushed the nsavoire/fix_profiler_shutdown branch 2 times, most recently from 2da68aa to 488b05f Compare June 5, 2025 23:50

github-actions bot commented Jun 5, 2025

Overall package size

Self size: 1.58 MB
Deduped: 1.95 MB
No deduping: 1.95 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| source-map | 0.7.4 | 226 kB | 226 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| p-limit | 3.1.0 | 7.75 kB | 13.78 kB |
| delay | 5.0.0 | 11.17 kB | 11.17 kB |
| node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@nsavoire nsavoire force-pushed the nsavoire/fix_profiler_shutdown branch 2 times, most recently from 87e6b98 to f9a54c2 Compare June 6, 2025 00:32
@nsavoire nsavoire marked this pull request as ready for review June 6, 2025 00:38
@nsavoire nsavoire requested a review from szegedi as a code owner June 6, 2025 00:38
Comment on lines 17 to 44
it('should not crash when worker is terminated', async function () {
  this.timeout(20000);
  const nruns = 5;
  const concurrentWorkers = 100;
  for (let i = 0; i < nruns; i++) {
    const workers = [];
    for (let j = 0; j < concurrentWorkers; j++) {
      const worker = new Worker('./out/test/worker2.js')
      worker.postMessage('hello')

      worker.on('message', () => {
        worker.terminate()
      })

      workers.push(new Promise<void>((resolve, reject) => {
        worker.on('exit', (exitCode) => {
          if (exitCode === 1) {
            resolve()
          } else {
            reject(new Error('Worker exited with code 0'))
          }
        })
      }))
    }
    await Promise.all(workers)
  }
});

Code Quality Violation

Unexpected unnamed function.

It is easier to debug your application code when you avoid anonymous functions, because the stack trace can then show meaningful error messages. This rule enforces that all your functions are consistently declared with a name.

@nsavoire nsavoire force-pushed the nsavoire/fix_profiler_shutdown branch 3 times, most recently from a03101a to 10b3b89 Compare June 6, 2025 15:30
nsavoire added 3 commits June 6, 2025 15:35
@nsavoire nsavoire force-pushed the nsavoire/fix_profiler_shutdown branch from 10b3b89 to ab2ddb2 Compare June 6, 2025 15:36
szegedi previously approved these changes Jun 10, 2025

@szegedi szegedi left a comment

Overall looks great! Going to approve as-is but I wouldn't mind if you considered my suggestion for that if (isolate != nullptr) block in Dispose.

   if (cpuProfiler_ != nullptr) {
     cpuProfiler_->Dispose();
     cpuProfiler_ = nullptr;

-    g_profilers.RemoveProfiler(isolate, this);
+    if (removeFromMap) {

You could have one big if (isolate != nullptr) block here, and move both this and the if (collectAsyncId_) logic (and the cleanup hook removal) below into it. I know they're no-ops when isolate is null, but I think it'd read better as in "here's what is executed when isolate is known".
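
As an illustration of the shape suggested here, the following standalone sketch groups every isolate-dependent step under a single `if (isolate != nullptr)` block. It is only a model under stated assumptions: `DisposeSketch`, `FakeIsolate` and the recorded step names are hypothetical, and the real `if (collectAsyncId_)` logic and hook removal are replaced by placeholders because their bodies are not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FakeIsolate {};  // stand-in for v8::Isolate (hypothetical)

struct DisposeSketch {
  bool hasCpuProfiler = true;      // models cpuProfiler_ != nullptr
  bool collectAsyncId = true;      // models collectAsyncId_
  std::vector<std::string> steps;  // records which cleanup steps ran

  void Dispose(FakeIsolate* isolate, bool removeFromMap) {
    if (!hasCpuProfiler) return;  // already disposed, nothing to do
    steps.push_back("dispose cpu profiler");
    hasCpuProfiler = false;

    // Everything that needs a known isolate sits in one block, so the
    // code reads as "here is what is executed when the isolate is known".
    if (isolate != nullptr) {
      if (removeFromMap) steps.push_back("remove from profiler map");
      if (collectAsyncId) steps.push_back("async id cleanup");  // placeholder
      steps.push_back("remove cleanup hook");                   // placeholder
    }
  }
};
```

With a non-null isolate all four steps are recorded. With `nullptr` only the CPU profiler disposal runs, whatever value `removeFromMap` has.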

@nsavoire nsavoire (Author)

I removed the call to Dispose from destructor

Well, that's one way to simplify it 😁. I like it. I did feel our disposal and destruction story was a bit overcomplicated, so this is 👍.

@nsavoire nsavoire (Author)

The Dispose call in the destructor felt like a desperate attempt to clean up. Now that we have the environment cleanup hook, it should not be needed anymore (and I am not even sure in which cases, other than profiler stop, the destructor is called).

   return {};
 }

-std::string WallProfiler::StartInternal() {
+v8::ProfilerId WallProfiler::StartInternal() {

Nice that we can use this now 😄

@@ -548,20 +583,25 @@ WallProfiler::WallProfiler(std::chrono::microseconds samplingPeriod,
 }

 WallProfiler::~WallProfiler() {
-  Dispose(nullptr);
+  Dispose(nullptr, true);

isolate == nullptr will make g_profilers.RemoveProfiler(isolate, this); in Dispose a no-op anyway, so you might as well pass false. On the other hand, if we do the transformation in Dispose that I suggested, it'd get short-circuited by the nullptr anyway, so the value passed here doesn't matter.

@nsavoire nsavoire (Author) Jun 10, 2025

I think we should remove this call to Dispose. The only case where it's not a no-op is when Dispose is called on a running profiler, and that should be an error.

@@ -366,6 +380,27 @@ static int64_t GetV8ToEpochOffset() {
   return V8toEpochOffset;
 }

+void WallProfiler::CleanupHook(void* data) {

Making this a private static method on WallProfiler actually isn't a bad idea. I should probably do an equivalent for GC callbacks so I can make them private.

Comment on lines +16 to +44
it('should not crash when worker is terminated', async function () {
  this.timeout(30000);
  const nruns = 5;
  const concurrentWorkers = 20;
  for (let i = 0; i < nruns; i++) {
    const workers = [];
    for (let j = 0; j < concurrentWorkers; j++) {
      const worker = new Worker('./out/test/worker2.js');
      worker.postMessage('hello');

      worker.on('message', () => {
        worker.terminate();
      });

      workers.push(
        new Promise<void>((resolve, reject) => {
          worker.on('exit', exitCode => {
            if (exitCode === 1) {
              resolve();
            } else {
              reject(new Error('Worker exited with code 0'));
            }
          });
        })
      );
    }
    await Promise.all(workers);
  }
});

Code Quality Violation

Unexpected unnamed function.

It is easier to debug your application code when you avoid anonymous functions, because the stack trace can then show meaningful error messages. This rule enforces that all your functions are consistently declared with a name.

@szegedi szegedi left a comment

🚢

@szegedi szegedi merged commit 2f6f2f8 into main Jun 10, 2025
57 checks passed
@szegedi szegedi deleted the nsavoire/fix_profiler_shutdown branch June 10, 2025 09:24
@nsavoire nsavoire changed the title Fix profiler shutdown when isolate is terminated [SCP-759] Fix profiler shutdown when isolate is terminated Jun 10, 2025
szegedi pushed a commit that referenced this pull request Jun 10, 2025
szegedi pushed a commit that referenced this pull request Jun 12, 2025
Labels
semver-patch Bug or security fixes, mainly