Skip to content

broker: convert groups to a module #6959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

garlick
Copy link
Member

@garlick garlick commented Aug 8, 2025

Problem: the broker uses the groups service for quorum management, but groups is a standalone service that doesn't need to be as tightly coupled to the broker as it is.

Convert groups to a built-in broker module with autoload=true. The reduced coupling between broker and groups should make both easier to maintain.

garlick added 5 commits August 8, 2025 16:02
Problem: the groups subsystem uses callbacks to monitor overlay
activity, which prevents it from being converted to a built-in
broker module.

Add an alternative message-based interface for overlay monitoring.
Problem: the broker groups subsystem tracks the broker context,
complicating conversion to a built-in broker module.

Since groups only needs ctx->h and ctx->rank, update 'struct groups'
to track those two items rather than the entire context.
Problem: the state machine calls the groups.get RPC during its
own initialization, earlier than necessary and before built-in
broker modules are loaded.

Call it on entry to QUORUM state to allow the groups subsystem
to move to a built-in broker module.
Problem: the broker uses the groups service for quorum
management, but groups is a standalone service that doesn't
need to be as tightly coupled to the broker as it is.

Convert groups to a built-in broker module with autoload=true.
The reduced coupling between broker and groups should make both
easier to maintain.
Problem: there is no test coverage for the overlay.monitor RPC

Add one test.
@grondo
Copy link
Contributor

grondo commented Aug 8, 2025

Only somewhat related to this PR, but should builtin modules with autoload=true and that cause extreme failure if unloaded be listed in flux module list output by default? Seems like maybe connector-local and perhaps this new groups module should be excluded from this output since they are not really meant to be reloaded.

@garlick
Copy link
Member Author

garlick commented Aug 8, 2025

Good question. Maybe we could add an unloadable flag? It is mainly connector-local that should never be unloaded since there is no way to load it back again. groups actually can be reloaded. I don't think we should make them invisible because it might be useful to have their Idle, Sendq, Recvq fields in flux module list, remember that we can flux module stats them, etc..

By the way, when I reloaded groups I noticed that resource didn't seem to notice. I better fix that.

@grondo
Copy link
Contributor

grondo commented Aug 9, 2025

Ok, that makes sense. It only came up because modprobe wants a list of all modules, but has to treat connector-local differently since it is auto-loaded and should not be unloaded, etc. Easy enough to work around though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants