feat: add automatic graceful exit handlers for abort and migrating events #561

Open
B4nan wants to merge 9 commits into master from claude/slack-add-abort-event-handler-JYZBh

@B4nan
Member

@B4nan B4nan commented Feb 18, 2026

Register event handlers in Actor.init() that automatically call Actor.exit() when the platform signals an abort or migration. This ensures a graceful shutdown without requiring developers to manually handle these events.

Key changes:

  • Add graceful exit handler for aborting and migrating events
  • Handler waits 1 second to allow other event handlers (like Crawlee's state persistence) to be triggered first
  • Add isExiting flag to prevent double-exit when Actor.exit() is already in progress
  • Add test for the new graceful exit behavior
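
A minimal TypeScript sketch of the behavior listed above, not the SDK's actual implementation. `SketchEvents`, `SketchActor`, `exitCalls`, and the injectable `schedule` parameter are hypothetical stand-ins introduced for illustration; only the event names, the 1-second delay, and the `isExiting` flag come from this description.

```typescript
// Hypothetical stand-ins for illustration only; not the Apify SDK implementation.
type PlatformEvent = 'aborting' | 'migrating';
type Listener = () => void;
type Scheduler = (fn: () => void, delayMillis: number) => void;

class SketchEvents {
    private listeners: Record<PlatformEvent, Listener[]> = { aborting: [], migrating: [] };

    on(event: PlatformEvent, listener: Listener): void {
        this.listeners[event].push(listener);
    }

    emit(event: PlatformEvent): void {
        for (const listener of this.listeners[event]) listener();
    }
}

class SketchActor {
    readonly events = new SketchEvents();
    isExiting = false;
    exitCalls = 0;

    // The scheduler is injectable so the delay can be skipped in tests (an assumption of this sketch).
    constructor(private readonly schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); }) {}

    init(): void {
        const gracefulExit = (): void => {
            // Wait 1 second so other handlers (e.g. state persistence) get triggered first.
            this.schedule(() => {
                if (this.isExiting) return; // exit() is already in progress
                this.exit();
            }, 1000);
        };
        this.events.on('aborting', gracefulExit);
        this.events.on('migrating', gracefulExit);
    }

    exit(): void {
        this.isExiting = true;
        this.exitCalls += 1; // a real exit would tear down and end the process
    }
}
```

With this shape, a second event that arrives while an exit is already in progress is ignored instead of triggering a second exit.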

Thread: https://apify.slack.com/archives/C07ED56TA1K/p1771414614569569?thread_ts=1771347955.370739&cid=C07ED56TA1K

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV

@B4nan B4nan added the adhoc label (Ad-hoc unplanned task added during the sprint.) Feb 18, 2026
@github-actions github-actions bot added this to the 134th sprint - Tooling team milestone Feb 18, 2026
@github-actions github-actions bot added the t-tooling (Issues with this label are in the ownership of the tooling team.) and tested (Temporary label used only programmatically for some analytics.) labels Feb 18, 2026
@janbuchar
Contributor

isn't this a breaking change?

@B4nan
Member Author

B4nan commented Feb 18, 2026

How exactly? If one of those events fires, the actor will shut down either way in ~30s.

@janbuchar
Contributor

janbuchar commented Feb 18, 2026

> How exactly? If one of those events fires, the actor will shut down either way in ~30s.

Exactly like this, 30s is a lot and I can imagine a lot of Actors relying on this unknowingly.

Imagine doing await crawler.run() and then a bunch of sequential stuff, such as storing aggregated run results somewhere. The crawler will exit cleanly because it's bound to the event system, but the rest of the code was pretty much guaranteed to finish in 30s. If the Actor suddenly starts terminating in 1s, that might break someone's stuff.

@B4nan
Member Author

B4nan commented Feb 18, 2026

First of all, I don't think the 30s were ever guaranteed. And if they relied on this, it feels like a bug in their code.

IMO it's fine, worst case we can revert it, but it's a much better solution than polluting all the templates with an explicit handler (see the Slack conversation for details). I'll add an opt-out for this, too.

…able

- Use setTimeout instead of async sleep in graceful exit handler to avoid
  deadlock with waitForAllListenersToComplete() in exit()
- Remove unnecessary 1s delay (exit() already waits for all handlers)
- Add gracefulShutdown option to disable auto-exit on aborting/migrating
- Add gracefulShutdownDelayMillis option for configurable delay
- Reset isExiting flag after exit() completes to fix singleton reuse
- Add tests for migrating event and gracefulShutdown: false option
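
To illustrate the first bullet, here is a hedged sketch of why awaiting inside the handler can deadlock. `SketchEventManager`, `sketchExit`, and `exitedCleanly` are hypothetical names, and the real `waitForAllListenersToComplete()` may track listeners differently; this only models the circular wait.

```typescript
// Hypothetical model of an event manager that tracks in-flight listeners.
type MaybeAsyncListener = () => void | Promise<void>;

class SketchEventManager {
    private listeners: MaybeAsyncListener[] = [];
    private inFlight: Promise<void>[] = [];

    on(listener: MaybeAsyncListener): void {
        this.listeners.push(listener);
    }

    emit(): void {
        for (const listener of this.listeners) {
            const run: Promise<void> = Promise.resolve(listener());
            this.inFlight.push(run);
            // Drop the promise from the in-flight list once it settles.
            const done = (): void => {
                this.inFlight = this.inFlight.filter((p) => p !== run);
            };
            run.then(done, done);
        }
    }

    // Resolves once every listener that is currently running has settled.
    async waitForAllListenersToComplete(): Promise<void> {
        await Promise.all(this.inFlight);
    }
}

const sketchManager = new SketchEventManager();
let exitedCleanly = false;

async function sketchExit(): Promise<void> {
    await sketchManager.waitForAllListenersToComplete();
    exitedCleanly = true;
}

// Deadlock-prone shape (commented out): the listener's promise stays in flight until
// sketchExit() resolves, while sketchExit() waits for that same promise to settle.
// sketchManager.on(async () => { await sketchExit(); });

// setTimeout shape: the listener returns right away, so nothing is in flight
// by the time sketchExit() runs on a later tick.
sketchManager.on(() => {
    setTimeout(() => { void sketchExit(); }, 0);
});
```

Calling `sketchManager.emit()` then lets `sketchExit()` complete, because the listener that scheduled it has already settled.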

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV

Change gracefulShutdown option from default true (opt-out) to default false (opt-in)
to avoid potential breaking changes for existing users.

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV

- Add graceful shutdown section to actor_lifecycle.mdx with examples
- Update Actor.init() JSDoc to document gracefulShutdown option
- Update PlatformEventManager JSDoc to mention gracefulShutdown for aborting/migrating events

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV

src/actor.ts Outdated
export interface InitOptions {
storage?: StorageClient;
/**
* Whether to automatically call `Actor.exit()` when the platform sends an `aborting` or `migrating` event.
Member

This is not correct, migrating needs to trigger Actor.reboot, not Actor.exit. I think I tested it some time ago and it just shuts down forever.

Member Author

How come? Actor.exit will just call process.exit, there are no API calls, if the actor is migrating, that's handled by the platform itself, no?

As opposed to Actor.reboot which is an API call to the platform to actually restart the Actor (so something that should be already in progress if you see migrating event).

Member

I just validated in a test that if you call Actor.exit after receiving migrating, the Actor will simply finish with SUCCEEDED while it should keep running through the migration and after it. Funnily, if you throw, the migration will happen immediately and successfully continue the run.

But the correct recommended behavior is to call Actor.reboot on migrating to speed it up.

Whether it's an API call or not doesn't really matter here.

Member Author

Ok, thanks!

@metalwarrior665
Member

metalwarrior665 commented Feb 19, 2026

Regarding the breaking change, I'm leaning slightly toward yolo-ing it out before v4. I think the risk of breaking someone's Actor is quite low, and we will not need another configuration in the templates.

  1. Migrations are now extremely rare since, as of 2026, we no longer relocate runs between workers. Abort + resurrect is also quite rare.
  2. All guides have long recommended handling persistence either via useState or in a persistState handler. The biggest risk is someone forgetting to await, but that's on them.
  3. Migration times were never really stable; the effort to do so only materialized in the last year. It used to be randomly 15 seconds or 45 seconds (the 30 sec was never true lol).
  4. So you couldn't rely on any of your requests finishing. In practice some will finish but some will not, and those will get processed again, so the situation would be the same.

Calling exit() on migration would terminate the run, but migrations should
continue on a new worker. reboot() speeds up the migration and lets the
run resume. exit() is still used for aborting events.

Updated implementation, tests, JSDoc, and docs accordingly.
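
The split described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `dispatchShutdown`, `ShutdownActions`, and the recording stubs are hypothetical, with `exit` and `reboot` standing in for `Actor.exit()` and `Actor.reboot()`.

```typescript
// Hypothetical dispatcher; `exit` and `reboot` stand in for Actor.exit() and Actor.reboot().
type ShutdownEvent = 'aborting' | 'migrating';

interface ShutdownActions {
    exit(): void;
    reboot(): void;
}

function dispatchShutdown(event: ShutdownEvent, actions: ShutdownActions): void {
    if (event === 'migrating') {
        // Speed up the migration; the run continues on a new worker.
        actions.reboot();
    } else {
        // 'aborting': finish the run.
        actions.exit();
    }
}

// Usage with recording stubs instead of the real Actor methods:
const shutdownCalls: string[] = [];
const recordingActions: ShutdownActions = {
    exit: () => { shutdownCalls.push('exit'); },
    reboot: () => { shutdownCalls.push('reboot'); },
};
dispatchShutdown('aborting', recordingActions);
dispatchShutdown('migrating', recordingActions);
```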

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV
Member

@metalwarrior665 metalwarrior665 left a comment

Otherwise it looks good


If you're using Crawlee crawlers (like `CheerioCrawler`, `PlaywrightCrawler`, etc.), graceful shutdown is handled automatically. The crawler listens for these events and stops accepting new requests while finishing the ones in progress.

### Without Crawlee crawlers
Member

I think the with-vs-without-Crawlee framing mixes two unrelated things:

  1. The crawler stops fetching new requests from the queue. This is mainly to save resources like proxies on requests that would not finish anyway. It might still finish the ones in progress before the 30 sec migration.
  2. The SDK (with or without Crawlee) with gracefulShutdown: true will exit immediately (no waiting for requests in progress) after persisting state. This is mainly to make abort faster and sync state better.

src/actor.ts Outdated
* **Graceful shutdown:** When running on the Apify platform, the Actor may receive `aborting` or `migrating`
* events. By setting `options.gracefulShutdown` to `true`, the SDK will automatically call `Actor.exit()`
* on `aborting` events and `Actor.reboot()` on `migrating` events (to speed up the migration and continue the
* run on a new worker). This is useful for Actors that don't use Crawlee crawlers (which handle this internally)
Member

Again, not related to Crawlee?

Member

@metalwarrior665 metalwarrior665 left a comment

Not sure if you were waiting for my review but it is good by me. As I said above, I would set this to true by default.

Co-authored-by: Lukáš Křivka <lukaskrivka@gmail.com>
@B4nan
Member Author

B4nan commented Mar 2, 2026

I was out last week. Let's move this forward. I would also just enable this. If we see real issues reported, we can reconsider and go with enabling this in the templates instead.

cc @patrikbraborec

@danpoletaev danpoletaev force-pushed the claude/slack-add-abort-event-handler-JYZBh branch from 45ab49b to 59dd3e7 Compare March 6, 2026 22:01
@B4nan B4nan force-pushed the claude/slack-add-abort-event-handler-JYZBh branch from 59dd3e7 to 45ab49b Compare March 6, 2026 22:54
Changed gracefulShutdown from opt-in to opt-out:
- Default is now true (handlers registered automatically)
- Set gracefulShutdown: false to disable

Updated docs and tests accordingly.
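
A small sketch of the opt-out default described in this commit, under assumptions: `SketchInitOptions` and `shouldRegisterShutdownHandlers` are hypothetical helpers that only mirror the described default and are not the SDK's own types or functions.

```typescript
// Hypothetical mirror of the option's default; not the SDK's InitOptions type.
interface SketchInitOptions {
    gracefulShutdown?: boolean;
}

function shouldRegisterShutdownHandlers(options: SketchInitOptions = {}): boolean {
    // Opt-out: handlers are registered unless the caller passes `false`.
    return options.gracefulShutdown ?? true;
}

// Per this commit message, opting out in an Actor would look like:
//   await Actor.init({ gracefulShutdown: false });
```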

https://claude.ai/code/session_017U4px7vy75ioiTaC5teCaV