Satellite recovery after controller failover #403

Open
dniasoff opened this issue Apr 19, 2024 · 1 comment

Comments

@dniasoff

Hi

I'm testing controller HA (using drbd-reactor) by simulating failures.

drbd-reactor is working nicely, but the satellite nodes aren't recovering automatically.

Here is what I see after a failure:

image

However, if I restart the satellite daemon on both nodes, here is what I see:

image

I could work out a process to restart all satellite nodes after a controller failure, but I assume there should be a way for the satellite service to "self-heal" without a restart.
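
For illustration only, the restart workaround mentioned above might look roughly like the sketch below. It assumes the satellite runs as the `linstor-satellite` systemd unit and that each node is reachable over SSH; the function name `restart_satellites`, the `SSH_CMD` variable, and the node names in the example are hypothetical, not part of LINSTOR or drbd-reactor.

```shell
#!/bin/sh
# Hypothetical stopgap: restart the satellite service on each node given as an argument.
# SSH_CMD is an assumption of this sketch; set SSH_CMD=echo for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"

restart_satellites() {
    for node in "$@"; do
        # Assumes the satellite is managed by systemd as linstor-satellite.
        $SSH_CMD "$node" systemctl restart linstor-satellite \
            || echo "restart failed on $node" >&2
    done
}

# Example (node names are placeholders):
#   restart_satellites node-a node-b
```

This only automates the manual restart described above; it does not address why the satellites fail to recover on their own.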

Thanks

Daniel

@ghernadi
Contributor

Yes, once the satellite is up again, the controller should automatically reconnect to it and bring the resource back to UpToDate. Can you verify that the satellite 1) did start and 2) is shown as Online in linstor node list?

If that looks fine, please check for possible ErrorReports (via err list) to see whether something happened that could help us understand why LINSTOR did not properly restore the state.
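
The checks above could be collected into something like the following sketch. It assumes the satellite is managed by systemd as `linstor-satellite` and that the `linstor` client is installed and pointed at the active controller; the `check_recovery` function and the `RUN` variable are hypothetical helpers introduced only for this example.

```shell
#!/bin/sh
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

check_recovery() {
    # 1) Did the satellite start? Run this on the satellite node itself.
    $RUN systemctl is-active linstor-satellite
    # 2) Does the controller list the satellite as Online?
    $RUN linstor node list
    # 3) Any error reports? This is the long form of "err list".
    $RUN linstor error-reports list
}

# Example dry run:
#   RUN=echo check_recovery
```

Note that the first command checks the local node only, so it would need to be run on each satellite, while the two `linstor` commands can be run from any host with a configured client.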
