I've been wondering how I corrupted an SD card before, and now I have found at least one way: with an active deployment running and its fwup in progress, I also started a local fwup via ssh. It looks like the two update processes wrote to the "unused" partition simultaneously, resulting in complete garbage.
As an aside, is the on-disk checksum not verified on firmware updates?
fwup uses checksums to verify that the .fw file is correct, and by default it reads back what it writes. It does the latter progressively as it applies the update. This means that if one firmware update writes block 1 and another firmware update changes that block after the first one has verified the write, you can get corruption. Even if fwup were to verify everything at the very end, this could still happen, but the window of time would be smaller.
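To make that race concrete, here is a small Python simulation. It is not fwup code: the in-memory `disk`, the 4-byte block size, and the names `write_with_readback` and `simulate` are made up for illustration. It only shows how every per-block read-back can pass while the final contents are still wrong.

```python
import hashlib

BLOCK_SIZE = 4  # tiny blocks keep the simulation readable


def write_with_readback(disk, image, after_block=None):
    """Write image block by block, reading each block back right after writing it.

    after_block is a hook that stands in for a second updater touching
    the same partition between our writes.
    """
    all_verified = True
    for offset in range(0, len(image), BLOCK_SIZE):
        chunk = image[offset:offset + BLOCK_SIZE]
        disk[offset:offset + len(chunk)] = chunk
        # Progressive verification: compare what is on "disk" right now.
        if bytes(disk[offset:offset + len(chunk)]) != chunk:
            all_verified = False
        if after_block is not None:
            after_block(disk, offset)
    return all_verified


def simulate():
    image_a = b"AAAAAAAA"
    image_b = b"BBBBBBBB"
    disk = bytearray(len(image_a))

    def second_updater(disk, offset):
        # The other update overwrites block 0 after the first update verified it.
        if offset == 0:
            disk[0:BLOCK_SIZE] = image_b[0:BLOCK_SIZE]

    progressive_ok = write_with_readback(disk, image_a, second_updater)
    final_ok = hashlib.sha256(bytes(disk)).digest() == hashlib.sha256(image_a).digest()
    return progressive_ok, final_ok, bytes(disk)
```

In this sketch the per-block checks all succeed, but a hash over the whole partition at the end no longer matches the first image.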
What you're looking for is a mutex that only allows one firmware update to happen at a time. That doesn't exist yet. Given that we can change nerves_hub_link, fwup, and ssh_subsystem_fwup, it seems possible to make something. Could you make a proposal on how to implement it?
Having both a final checksum and some mutex mechanism seems worthwhile to me, but I know too little about each of those subsystems, and there are probably some caveats. For example, when "updating" an SD card outside of the device, we'd probably want that not to be blocked by an aborted update process, and similarly a crashed subsystem should not block a new update attempt.
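One possible direction for the mutex, sketched in Python only to show the idea: an advisory `flock` on a lock file. The kernel drops the lock when the holding process exits or crashes, so a dead updater would not block the next attempt, and a lock file kept on the device's own filesystem would not affect an SD card written elsewhere. The lock path and the names `firmware_update_lock` and `UpdateInProgress` are hypothetical, and none of nerves_hub_link, fwup, or ssh_subsystem_fwup does this today. All of them would have to take the same lock for it to help.

```python
import fcntl
import os
from contextlib import contextmanager


class UpdateInProgress(Exception):
    """Raised when another updater already holds the lock."""


@contextmanager
def firmware_update_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of one update.

    lock_path is a hypothetical location, e.g. a file on a tmpfs on the device.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the other update.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise UpdateInProgress(lock_path)
        yield
    finally:
        # Closing the descriptor releases the lock; the kernel does the
        # same if the process dies before reaching this point.
        os.close(fd)
```

An updater would wrap its whole write-and-verify phase in `with firmware_update_lock(path):` and report a clear error on `UpdateInProgress` rather than writing to the partition.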
It does seem like a robust mutex mechanism would get somewhat complicated once such edge cases are accounted for, and right now I'm unsure what could work. The one condition that has to hold is that a partition is not marked good unless it has been checked in full; how a retry is done is another matter.
Maybe write one checksum before the data and one after it, then do a final check of the pre and post checksums against the actual data? That way, if another sequential write interferes, it would necessarily have changed the pre checksum before final validation, so validation would fail. The last update to complete wins, provided there was no interference.
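A rough Python model of that idea, to check that the reasoning holds at least in the simple case. The `Slot` layout (pre checksum, data, post checksum) and the `Updater` class are invented for this sketch and do not reflect how fwup lays out a partition.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class Slot:
    """Hypothetical inactive partition: pre checksum, payload, post checksum."""

    def __init__(self, size):
        self.pre = None
        self.data = bytearray(size)
        self.post = None

    def is_valid(self):
        # Good only if both checksums agree with each other and with the data.
        actual = digest(bytes(self.data))
        return self.pre is not None and self.pre == self.post == actual


class Updater:
    def __init__(self, slot, image):
        self.slot = slot
        self.image = image

    def begin(self):
        # Writing the pre checksum also invalidates any earlier post checksum.
        self.slot.pre = digest(self.image)
        self.slot.post = None

    def write(self, offset, size):
        self.slot.data[offset:offset + size] = self.image[offset:offset + size]

    def finish(self):
        self.slot.post = digest(self.image)
        return self.slot.is_valid()
```

If a second updater calls `begin()` while the first is still writing, the first updater's `finish()` sees a pre checksum that is no longer its own and reports the slot as not good. An update that runs without interference still validates.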