Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPT table in Genimage-created images may be mangled by Windows OS #262

Open
sairon opened this issue Jul 24, 2024 · 6 comments · May be fixed by #273
Open

GPT table in Genimage-created images may be mangled by Windows OS #262

sairon opened this issue Jul 24, 2024 · 6 comments · May be fixed by #273

Comments

@sairon
Copy link
Contributor

sairon commented Jul 24, 2024

Windows (at least starting with version 10) contain logic that alters the GPT table of a drive connected to it in some cases. More specifically it's been found it relocates the backup GPT table to the end of the drive and the partition array to LBA address equal to first_usable_lba - 32. There is no (easy) way to prevent this behavior in Genimage-created images because first_usable_lba is always set to the start of the first partition. Although this is rather a Windows bug/misfeature, images created using other utilities (like sgdisk) are not affected by this, as they generally set first_usable_lba to LBA 34, so it should be possible to achieve the same with Genimage.

For details see:

These issues show a real world regression triggered by the fact that Genimage works slightly different compared to other utilities. In theory I can imagine this relocation might also overwrite data located between the GPT header and the first actual partition if this area of the disk contained for example some bootloader data - although this is purely hypothetical.

I'd like to discuss the potential ways to mitigate this issue. Should we add a flag for setting first_usable_lba to the lowest possible value, or an arbitrary one, or even set it always to 34 like sgdisk does? Currently it's correlated to the align value but setting it to a one-sector size and juggling with offset values afterwards feels rather clumsy to me.

@michaelolbrich
Copy link
Member

So my understanding was that first_usable_lba is the first LBA that can be used for partitions, and anything before that is off limits. That's why it is implemented as it is right now: There are real use-cases where bootloader data is placed between the partition table and the first partition.

But my understanding is clearly mistaken, or at least windows has a different understanding of what should happen here. So we should probably always do:

header.first_usable_lba = hd->gpt_location + (GPT_SECTORS - 1)*512;

That's 2 + 32 = 34 for the default partition table location. If the partition table is moved to make room for a bootloader, then that should still be acceptable based on you're description above.

@sairon
Copy link
Contributor Author

sairon commented Oct 9, 2024

Hi Michael, first of all, sorry for the delayed reply.

So my understanding was that first_usable_lba is the first LBA that can be used for partitions, and anything before that is off limits. That's why it is implemented as it is right now: There are real use-cases where bootloader data is placed between the partition table and the first partition.

Well, I'd assume the same but obviously Microsoft engineers thought it's good idea to do it like this. I haven't found any reasoning why it's happening, and I think it's too esoteric to cause any other issues. Just find the cause of the problem was quite a detective work.

But my understanding is clearly mistaken, or at least windows has a different understanding of what should happen here. So we should probably always do:
...

It's essentially the same what I ended up with: home-assistant/operating-system#3497 (just taking into the fact that header.first_usable_lba is actually the LBA number, not byte offset). So if you agree, I'll put it into a PR here.

FWIW, the root cause for the RPi bootloader issue has been found and fix is on the way, but I think it'd be still good to stay on the safe side and adjust the Genimage behavior, as I'm not aware of any side effects.

@Villemoes
Copy link

FWIW, I've also been bitten by the current default behaviour. Not because of any Windows interference, but simply because I expected to be able to define a new partition on an existing target that had originally been created using genimage, between the (disk-order-)first and the GPT array, where I knew there was a large gap. But sgdisk wouldn't let me because the first-usable-lba was set to that disk-order-first partition.

So yes, I think that we should by default set first-usable-lba so it follows the primart GPT array, not being the start of the first (in disk-order) partition. We could make it an image option to set first-usable-lba explicitly, with some sanity check that it is after the end-of-array and before the first in-partition-table=true partition.

I can't really think of a situation where one would have some area between the partition array and the first partition which is known to be off-limits, where one wouldn't just define that area as a partition. But there are certainly situations where the first defined partition ends up being quite far from the end of the partition array (e.g. because it is defined with align=1M or something like that), and where one later wants to use some of that left-over area for a new small partition.

@Villemoes
Copy link

Hm, so digging a little, I found that I touched this area years ago.

commit 9190a2baadbf22374bd30369bc968c5e98a5ec05
Author: Rasmus Villemoes <[email protected]>
Date:   Sat Mar 27 22:26:55 2021 +0100

    image-hd.c: correct computation of first_usable_lba

[...]    
    
    I considered instead just unconditionally using the LBA immediately
    following the primary GPT array. However, if there is a bootloader
    spanning the MBR+GPT header+GPT array, the area on the other side of
    the GPT array should not be available for adding a new partition. And
    doing it this way is in any case more aligned with what was done
    previously.

But yes, I've changed my mind, and now do believe that we should default to that GPT_array+32 placement.

Essentially, there are two extreme possible values for first_usable_lba:
(1) minimum, gpt-array + 32, proposed new default
(2) maximum, start of first in-partition-table partition, current default

and then there's the case where some in-partition-table=false defined image straddles the gpt-array (e.g. test/hole.config), in which case we would probably want to bump the default to (3) the lba following the end of that "bootloader" partition.

So how about the following: We add a new hdimage option, first-usable-offset (I want it to be in bytes and not LBA, similar to gpt-location), defaulting to the sentinel value 0. If it is given, we check during hdimage_setup that it is compatible with the above min and max values. If it is not given, we set it at the end of hdimage_setup to either (1) or (3), depending on whether there was such a bootloader image.

sairon added a commit to sairon/genimage that referenced this issue Nov 5, 2024
Currently first usable LBA in the GPT header is pointing to offset of
the first partition, ignoring the gpt-location specified in the config.
This can lead to some issues as explained in [1]. Disabling this
behavior doesn't break any tests and allows for generating of images
that have same layout as those generated by sgdisk or other utilities.

[1] pengutronix#262

Signed-off-by: Jan Čermák <[email protected]>
@sairon
Copy link
Contributor Author

sairon commented Nov 5, 2024

I have opened a PR with the patch that we use for a while for Home Assistant OS. I'll leave it up to Michael if he wants to merge it as-is or extend it with some toggles to accommodate for the scenarios discussed above.

@michaelolbrich michaelolbrich linked a pull request Nov 8, 2024 that will close this issue
@michaelolbrich
Copy link
Member

Take a look at #273. This sets first_usable_lba to the lba after either a bootloader or the partition table.
So without the bootloader, the result should be the same as #270.
If you need a bootloader and combine that with Windows, then you should probably put the bootloader in an explicit partition, otherwise Windows will consider that space as unused...

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants