-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[Bug]: default OVMF build overwrites memory on FreePages() causing page fault if pages are not writable #11074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@os-d If FreePages() call is from UnloadImage(), then the memory attributes should be reset in UnloadImage() before FreePages() is called. Is that the current behavior? |
@mdkinney that is the expected behavior, however in this case I believe that the Linux shim is not using EFI load/unload image and instead is calling AllocatePages(), does a custom load/unload image, then calls FreePages(), with the attributes being set to RO. So when DebugClearMemory comes in, it hits the RO page. From the attached shim PR, they are just directly calling FreePages(): https://github.com/rhboot/shim/pull/750/files. I believe we have a fix for this situation in Mu, I am looking for it now and assessing upstreamability (I am generally going through all the Mu memory protections fixes and attempting to upstream). I believe in edk2 the expectation in FreePages() is that memory is writeable (for DebugClearMemory), but I think it is fair to say that in today's world we may have more complex situations like where attributes are changed on a page and the core doesn't know, so likely the core should check the attributes and update as necessary. I will try to get a PR up soon, I'm working with @bluca offline to confirm the Mu fix fixes this. This is one where a fix from both sides is reasonable, the shim loader should unset attributes on unload (at least back to RW) and DXE core should not write to pages it doesn't know if they are writeable or not. |
@mdkinney , I can't assign this issue to myself currently (I'm guessing because maintainer write permissions revoked during the freeze), but feel free to assign me. |
Thank you for the additional details - leaving some more pointers here. Yes shim is overriding the system tab with its own LoadImage/UnloadImage. The load image is at: https://github.com/rhboot/shim/blob/main/loader-proto.c#L167 It will call AllocatePages() where the image is stored, and set the W- X+ bits here: https://github.com/rhboot/shim/blob/main/pe.c#L558 Then on UnloadImage FreePages() is called here: https://github.com/rhboot/shim/blob/main/loader-proto.c#L342 |
FYI, this seems very similar to the issue that I reported on OVMF in Xen a few days before you did: #11046 |
@bluca if shim can match the I don't think it's wrong for EDIT: I have no particular horse in this race, I just dislike an improper (or at least asymmetrical) distribution of responsibilities. What @mdkinney describes here (as something that's happening in edk2) is (AIUI) precisely what shim should be doing, too. (edit2: disentangled the two expressions "horse in this race" and "skin in this game"; sorry) |
I disagree. The application is already responsible for tracking (releasing) the memory it allocates; the core does not track memory allocations for applications or drivers. If the core hands out memory with a particular set of attributes, I think it can expect the memory to be returned with the same set of attributes. Trying to cover up missing application actions is a recipe for disaster, IMO. |
It will, in the next version
EDK2 should not write over pages that it doesn't know are writable, regardless of what Shim does or does not. The NX/W^X stuff is becoming a hard requirement soon, so this situation will become more and more common. The allocator should just ensure that memory it gets back is configured as it needs it to be, before using it further, as a matter of robustness. The application has no idea what the allocator will or will not do, and shouldn't really care about it. Setting attributes is part of a standard protocol, so it has to be expected that applications use it as they see fit. |
@lersek , well unfortunately this is an area where the spec doesn't say anything about this. I agree that shim should be fixed and reset the attributes before freeing the pages (I'm told this fix is happening). However, there are released versions of shim (v16 at least) that have this behavior. So, users will either see a build with debug clear memory crash or who knows what happen if those pages are handed out again on a release build. On the flip side of the argument, the core is responsible for handing out pages that are RW. If it simply accepts pages back without enforcing the attributes, it opens up problems for random parts of the system and can't make that guarantee. I think it is much better to catch it and fix it at that point. Especially as OPROMs or bootloaders (as in this case) can call the memory attribute protocol and do whatever they want, i.e. things not fixable in edk2. I think it is riskier for the core to not guarantee pages it hands out are RW than to fix it up. Now I can definitely see the argument that we use the memory attribute protocol Get API in the core to determine whether attributes need to be fixed up at all instead of blindly doing it, as the majority of cases should be fine. |
Why should the core assume that the page attributes changed from allocation till release?
Which is great.
Therefore applications will have to extend their cleanup paths carefully, such that those mirror their normal (construction) paths.
This is where we disagree. It's a matter of principle (and in this case, a misbehaving application triggers a crash really soon, which helps with improving the application's correctness).
I agree.
I agree. The application should not restore the page attributes because it expects Instead, the application should roll back the attributes on principle. Mirror every construction step with a proper destruction step.
I agree.
This does not follow, in my opinion. UEFI defines a large number of standard protocols, and the only case that I can recall, offhand anyway, where you don't need to undo an opening action is |
The core is responsible for upholding its specified invariants and promises as long as the application doesn't violate interface contracts. That's the entire characterization of the C language, more generally speaking. If you invoke undefined behavior, no promises from the environment hold.
Indeed. The core shouldn't make that guarantee, if the application didn't perform the proper cleanup.
That's my point precisely. By design, there is no hardware-based privilege separation between modules in UEFI (let's put SMM aside for now). Even if we implement W^X pervasively (which we should), that won't prevent UEFI applications and DXE / runtime DXE / UEFI drivers from scribbling over each other's data areas, or from doing untold damage to hardware resources (in general). Trying to contain the havoc that modules may wreak is futile, per the original UEFI design of no address space separation and no CPL/rings separation. Therefore I dislike diluting the responsibility of application writers, and masking application bugs.
I agree. Less tolerance for application issues results in more crashes and misbehavior. It should also result in more correct applications over time. Anyway: don't let me block you. I've voiced my disagreement; thanks for considering it. |
I thought we were working on a PR that would change FreePages() behavior to not free pages whose memory attributes were not in the same state as returned from AllocatePages(). This would mean there could be a memory leak, but the application would run. I agree we want well behaved applications and never like putting a workaround for an OS in FW. But we also want compatibility with widely distributed OS binaries. Always hard to find the right balance. Could make this proposed behavior optional with default disabled and a platform van choose to opt-in to compatibility. |
Crossing fingers and hoping all applications that ever existed, and that will ever exist, do everything absolutely right or the entire system crashes is very risky. The allocator's job is to hand out usable pages. If the pages cannot be used by my application because of a third one that misbehaved, and the allocator gives me pages that make my application crash the system, then I'll have to spend time to do some extremely difficult triage and debugging. This is not a useful way for developers to be spending their time. Applying a bit of defensive programming where it matters and makes a difference is also a good principle to follow. It's a bug if an application marks pages W^X and doesn't undo it, sure. But it's also a bug if I ask the allocator to give me some pages, and it gives me unwritable ones because of a previous application's bug.
Please don't - as we have seen with the debug overwrite, distributors just use whatever is the default. There are tons of options, and they are not easy to grasp, so making it optional in practice it means it will be rarely used. |
We are, but this PR also tries to set the attributes correctly, which is what I believe @lersek disagrees with. Another option is to check the attributes and leak the pages if they don't have the correct attributes, but since the same protocol that provides the core the ability to get the attributes also lets the core set the attributes, it seems a bit silly to just drop the pages without trying to fix the attributes. Definitely shim should be fixed. But, taking a step back, the core does already take a stance on what the state of free memory should be (dependent on configuration). If the NX memory protection policy is set for EfiConventionalMemory, this doesn't matter (well the debug clear memory piece is still an issue but that can be moved), the core will set the pages as RW (i.e. NX). So I don't see this as new behavior for the core, just changing where it does it. Arguably that can be cleaned up so we don't do it twice, but that doesn't apply cleanly with the idea of leaking memory we couldn't change attributes on.
This is why in my original implementation, I had asserts here. I want to catch the situations where applications are misbehaving and fix them, but I also want the core to be robust to failures, especially known issues in the wild. |
Is there an existing issue for this?
Bug Type
What packages are impacted?
MdePkg
Which targets are impacted by this bug?
RELEASE
Current Behavior
The default value of gEfiMdePkgTokenSpaceGuid.PcdDebugPropertyMask in the OvmfPKgX64.dsc config enables the 0x08 bit, which causes DebugClearMemory() to attempt to overwrite memory freed by FreePages() with a fixed pattern.
In shim we have started marking memory used to load executable images as executable and non-writable, according to the NX workflow requirement. We use AllocatePages() to allocate memory for the image, and FreePages to free it.
When an image loading/starting goes wrong, we attempt to free it. Given it's an executable image, it's marked as non-writable. DebugClearMemory causes a page fault when it attempts to overwrite it:
Expected Behavior
Firmware doesn't crash :-)
Steps To Reproduce
Either using OVMF from Debian testing, or building from latest master, and booting with qemu, running shim 16.0 as the first stage loader with a UKI that loads PE addons.
This should be reproducible with a standalone PE app that calls AllocatePages, uses the protocol added by efaa102 to mark the pages as executable and non-writable, and calls FreePages, but haven't attempted to create this yes.
Build Environment
Version Information
Urgency
Low
Are you going to fix this?
Someone else needs to fix it
Do you need maintainer feedback?
Maintainer feedback requested
Anything else?
Should DebugClearMemory attempt to check/reset the writable bit before overwriting?
Should this feature be enabled by default in the default OVMF configuration file?
We can work around it for now, but it would be good to answer these questions. Shim PR with workaround:
rhboot/shim#750
The text was updated successfully, but these errors were encountered: