FWUP via nerves_hub_link can be run while fwup is performed via ssh, resulting in corruption #98

dognotdog · 2021-11-29T23:03:50Z

I've been wondering how I corrupted an SD card before, and now I've found at least one way: with an active deployment running and fwup in progress, while also locally performing an fwup via ssh, it looks like the two update processes wrote to the "unused" partition simultaneously, resulting in complete garbage.

As an aside, is the on-disk checksum not verified on firmware updates?

fhunleth · 2021-11-30T00:07:15Z

fwup uses checksums to verify that the .fw file is correct, and by default it reads back what it writes. It does the latter progressively as it applies the update. This means that if one firmware update writes block 1 and another firmware update changes it after the first one has verified the write, you can get corruption. Even if fwup were to verify everything at the very end, this could still happen, but the window of time would be smaller.
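To make the race concrete, here is a toy Python model, not fwup's implementation: two hypothetical updaters, A and B, write the same four-block partition, and each block is read back immediately after it is written. The schedule is an assumed interleaving in which A writes blocks 0 and 1, B then writes all four blocks, and A finishes with blocks 2 and 3.

```python
NUM_BLOCKS = 4


def write_and_verify(partition, block, data):
    """Write one block, then immediately read it back (progressive verification)."""
    partition[block] = data
    return partition[block] == data


def simulate(schedule):
    """Replay (updater, block) write events; report whether every read-back passed."""
    images = {
        "A": [b"A%d" % i for i in range(NUM_BLOCKS)],
        "B": [b"B%d" % i for i in range(NUM_BLOCKS)],
    }
    partition = [b""] * NUM_BLOCKS
    all_verified = True
    for updater, block in schedule:
        ok = write_and_verify(partition, block, images[updater][block])
        all_verified = all_verified and ok
    return partition, images, all_verified


# Hypothetical interleaving: B overwrites blocks 0-1 after A has already verified them.
schedule = [("A", 0), ("A", 1),
            ("B", 0), ("B", 1), ("B", 2), ("B", 3),
            ("A", 2), ("A", 3)]
partition, images, all_verified = simulate(schedule)
print(all_verified)   # True
print(partition)      # [b'B0', b'B1', b'A2', b'A3']
```

Every per-block read-back passes, yet the partition ends up matching neither image A nor image B.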

What you're looking for is a mutex that only allows one firmware update to happen at a time. That doesn't exist yet. Given that we can change nerves_hub_link, fwup, and ssh_subsystem_fwup, it seems possible to make something. Could you make a proposal on how to implement it?
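A minimal sketch of such a mutex, assuming every updater (fwup itself, or the nerves_hub_link and ssh_subsystem_fwup code that launches it) agrees to take an exclusive advisory lock on a shared file before touching the partition. It uses Python's `fcntl.flock`, a thin wrapper over the Linux/BSD `flock(2)` call; the lock path and helper names are hypothetical and not part of any of these projects.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file shared by every updater; no existing project uses it.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "firmware_update.lock")


def try_acquire_update_lock(path=LOCK_PATH):
    """Take the update lock without blocking.

    Returns the open file that holds the lock, or None if another
    updater already holds it.
    """
    f = open(path, "a")
    try:
        # Exclusive and non-blocking: raises BlockingIOError if the lock is held.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def release_update_lock(f):
    """Release the lock explicitly; closing the file would also release it."""
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()


first = try_acquire_update_lock()
second = try_acquire_update_lock()   # a second updater is turned away
print(first is not None, second is None)   # True True
release_update_lock(first)
third = try_acquire_update_lock()    # free again once the first updater is done
print(third is not None)             # True
release_update_lock(third)
```

`LOCK_NB` makes a second updater fail fast instead of waiting. Because the kernel releases a `flock` lock once the file holding it is closed, which includes the holding process exiting or crashing (as long as no other process has inherited the descriptor), a dead updater would not leave the lock stuck.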

dognotdog · 2021-11-30T00:48:19Z

Having both a final checksum and some mutex mechanism seems worthwhile to me, but I know too little about each of those subsystems, and there are probably some caveats. For example, when "updating" an SD card outside of the device, we'd probably want that not to be blocked by an aborted update process, and similarly a crashed subsystem should not block a new update attempt.

It does seem that a mutex robust enough to account for such edge cases would be somewhat complicated, and right now I'm unsure what could work. The only hard requirement is that a partition is not marked good unless it has been checked in its entirety; how a retry is done is another matter.

Maybe write a checksum before the update, another one after it, and do a final check of both the pre- and post-update checksums against the actual data? That way, if another sequential write interferes, the pre-update checksum would necessarily have changed before final validation, causing the validation to fail? The last update to complete wins, provided there was no interference.
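One way to read this proposal, sketched as a toy Python model rather than anything fwup does today: an updater computes the expected digest of its image up front, writes an ownership token into a hypothetical partition header before writing data, and at the end marks the partition good only if it still owns the header and a full read-back matches the expected digest. The names (`apply_update`, `header`, the `interfere` hook) are illustrative assumptions, and the interleaving is chosen so that updater B runs to completion in the middle of updater A's write.

```python
import hashlib

NUM_BLOCKS = 4


def digest(blocks):
    """SHA-256 over the blocks, in order."""
    h = hashlib.sha256()
    for block in blocks:
        h.update(block)
    return h.hexdigest()


def apply_update(partition, header, token, image, interfere=None):
    """Hypothetical updater: returns True only if it would mark the partition good."""
    expected = digest(image)      # "pre" checksum, taken from the image being installed
    header["owner"] = token       # claim the partition before writing any data
    for i, block in enumerate(image):
        partition[i] = block
        if interfere is not None:
            interfere(i)          # lets another writer run between our block writes
    still_owner = header["owner"] == token
    actual = digest(partition)    # "post" checksum over what is actually on disk
    return still_owner and actual == expected


partition = [b""] * NUM_BLOCKS
header = {}
image_a = [b"A%d" % i for i in range(NUM_BLOCKS)]
image_b = [b"B%d" % i for i in range(NUM_BLOCKS)]
results = {}


def b_runs_midway(i):
    # After A has written block 1, updater B performs its entire update.
    if i == 1:
        results["B"] = apply_update(partition, header, "B", image_b)


results["A"] = apply_update(partition, header, "A", image_a, interfere=b_runs_midway)
print(results)     # {'B': True, 'A': False}
print(partition)   # [b'B0', b'B1', b'A2', b'A3']
```

A's final check correctly refuses to mark the partition good. The model also shows the window fhunleth mentioned: B validated successfully before A's remaining writes landed, so a partition that B already considers good ends up holding a mix of both images. End-of-update checks alone narrow the race rather than eliminate it.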

dognotdog mentioned this issue Nov 29, 2021

ssh fwup vs. nerves_hub fwup race condition? nerves-project/ssh_subsystem_fwup#36

