feat(app): allow robot restarts to track boot ID and timeout #7589
Should be looked at by a JS person.
This seems pretty tight to me. Anecdotally, I think I've seen restarts take multiple minutes, depending on whether the motor controller board needs to update its firmware, and potentially even how old the SD card is. Admittedly, though, that's measuring from power-on to API server readiness, not update server readiness. Should we instead set the timeout very, very conservatively? Like 10 minutes? Edit: And the current "Robot is restarting..." dialog copy says "this may take up to 3 minutes," which still seems kind of tight to me.
Codecov Report
@@ Coverage Diff @@
## edge #7589 +/- ##
=======================================
Coverage ? 82.31%
=======================================
Files ? 328
Lines ? 22062
Branches ? 0
=======================================
Hits ? 18160
Misses ? 3902
Partials ? 0
Overview
This PR moves restart tracking logic from a smear across the robot-admin and buildroot modules into a single epic in the robot-admin module. The new epic:
Closes #6585
Changelog
Review requests
This is kind of a gnarly test matrix, but here are the behaviors we want to test. The "Restart" button on the robot page is the easiest way to trigger a restart. This works best with a non-Wi-Fi robot so you can pull the cable at any point to simulate failures.
We also need to test both of these for robot updates:
With those behaviors, we have two types of robots to test on:
bootId in their update server's health endpoint
robotAdmin > ${robotName} > restart > bootId to make sure it's being tracked
robotAdmin:RESTART_STATUS_CHANGED actions, there should be a bootId in the action that changes the status to restart-pending
bootId
Finally:
60 seconds, changed to 300 seconds (5 minutes), was an arbitrary choice! Is that too short? Too long?
If you need a refresher on what the buildroot update epics are supposed to do, check out the app cookbook, which has a mostly complete outline + flowcharts of the app-side logic of the update procedure.
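For reviewers who want a mental model of the behaviors above, here is a minimal sketch of how a restart status could be derived from health polls using a boot ID and a timeout. It is not the epic in this PR: the type names, the `nextRestartStatus` function, the poll shape, and every status other than restart-pending are hypothetical, and the fallback for robots without a bootId (wait for the robot to go down and come back up) is an assumption about how such robots might be handled.

```typescript
// Illustrative sketch only. All names here are hypothetical, not the app's
// actual state shape or epic. Only 'restart-pending' appears in the PR text.
type RestartStatus =
  | 'restart-pending'
  | 'restart-in-progress'
  | 'restart-succeeded'
  | 'restart-timed-out'

// 300 seconds (5 minutes), the value the PR description calls arbitrary
const RESTART_TIMEOUT_MS = 300 * 1000

interface RestartTracker {
  status: RestartStatus
  // boot ID seen when the restart was requested; null if the robot's
  // update server health endpoint did not report one
  bootId: string | null
  startTimeMs: number
}

interface HealthPoll {
  reachable: boolean
  bootId: string | null
  timeMs: number
}

function nextRestartStatus(
  tracker: RestartTracker,
  poll: HealthPoll
): RestartStatus {
  // Terminal statuses never change
  if (
    tracker.status === 'restart-succeeded' ||
    tracker.status === 'restart-timed-out'
  ) {
    return tracker.status
  }

  if (poll.reachable) {
    if (tracker.bootId !== null && poll.bootId !== null) {
      // A different boot ID means the robot has rebooted
      if (poll.bootId !== tracker.bootId) return 'restart-succeeded'
    } else if (tracker.status === 'restart-in-progress') {
      // Assumed fallback without boot IDs: the robot went down and came back
      return 'restart-succeeded'
    }
  }

  // No success yet, so check whether the restart has taken too long
  if (poll.timeMs - tracker.startTimeMs > RESTART_TIMEOUT_MS) {
    return 'restart-timed-out'
  }

  return poll.reachable ? tracker.status : 'restart-in-progress'
}
```

Keeping the decision in a pure function like this would make the test matrix above easy to cover in unit tests, with pulling the cable corresponding to a run of unreachable polls.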
Risk assessment
Medium.
I have not tested this on a robot yet, and it required some changes to the robot update logic.
After robot testing, this PR appears to be doing its job well when combined with #7608.
Mitigating factors: