Description
In a comment on #336, @grondo pointed out that any dynamically loaded plugin that does asynchronous work may cause the broker to crash:
If the plugin is unloaded while the watcher is still active, this could cause a broker crash since the epilog_timeout_cb symbol won't exist anymore.
Unfortunately, in these situations, the plugin needs to keep a list of outstanding watchers so it can stop them and/or destroy them on exit. i.e. you can't rely on using the job aux item for this purpose.
I asked:
How could I trigger cleanup on plugin removal? By using flux_plugin_set_aux or something, with the destructor set to my cleanup function?
and @grondo said:
Yeah, I think the approach would be to have a global context stored in the plugin aux cache and add a list of objects to that ctx that would need to be freed on plugin removal.
Actually looking at existing plugins in core, many of them suffer from this same issue 🤦. Feel free to add an issue to address this at some point in the future. For now we'll have to be careful reloading most plugins that do asynchronous work. Maybe we can add something to the API to make this more manageable and obvious.
Both the dws-jobtap and cray_pals_port_distributor plugins have this vulnerability.
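To make the suggested approach concrete, here is a minimal, self-contained sketch of a per-plugin context that tracks outstanding watchers and tears them down from the aux destructor. It does not use the Flux headers: `struct plugin`, `struct watcher`, `plugin_unload`, and every `ctx_*` function are hypothetical stand-ins. In a real plugin the context would presumably be stored with something like `flux_plugin_aux_set()`, and the watchers stopped and freed with `flux_watcher_stop()` / `flux_watcher_destroy()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reactor watcher (not the Flux type). */
struct watcher {
    int active;
    struct watcher *next;
};

/* Hypothetical stand-in for a plugin handle with one aux slot. */
struct plugin {
    void *aux;
    void (*aux_free) (void *);
};

/* Global plugin context: owns the list of outstanding watchers. */
struct plugin_ctx {
    struct watcher *watchers;
};

/* Demonstration counter so the cleanup can be observed. */
int watchers_destroyed = 0;

/* Stop the watcher if it is still active, then free it. */
void watcher_destroy (struct watcher *w)
{
    if (w->active)
        w->active = 0;
    free (w);
    watchers_destroyed++;
}

/* Aux destructor: runs on plugin removal, before the plugin's
 * callback symbols disappear, so no watcher can fire afterwards. */
void ctx_destroy (void *arg)
{
    struct plugin_ctx *ctx = arg;
    while (ctx->watchers) {
        struct watcher *w = ctx->watchers;
        ctx->watchers = w->next;
        watcher_destroy (w);
    }
    free (ctx);
}

/* Create the context and hand ownership to the plugin's aux slot. */
struct plugin_ctx *ctx_create (struct plugin *p)
{
    struct plugin_ctx *ctx = calloc (1, sizeof (*ctx));
    if (!ctx)
        return NULL;
    p->aux = ctx;
    p->aux_free = ctx_destroy;
    return ctx;
}

/* Start a watcher and record it in the context's list. */
struct watcher *ctx_add_watcher (struct plugin_ctx *ctx)
{
    struct watcher *w = calloc (1, sizeof (*w));
    if (!w)
        return NULL;
    w->active = 1;
    w->next = ctx->watchers;
    ctx->watchers = w;
    return w;
}

/* Normal completion path: unlink the watcher, then destroy it. */
void ctx_remove_watcher (struct plugin_ctx *ctx, struct watcher *w)
{
    struct watcher **pp = &ctx->watchers;
    while (*pp && *pp != w)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = w->next;
        watcher_destroy (w);
    }
}

/* Simulated plugin removal: invoke the aux destructor. */
void plugin_unload (struct plugin *p)
{
    if (p->aux_free)
        p->aux_free (p->aux);
    p->aux = NULL;
    p->aux_free = NULL;
}
```

The point of the design is that a watcher is owned by the plugin-level context rather than by the job aux item, so it is cleaned up on plugin removal regardless of whether the job it belongs to has finished.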