Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bin/admin-upgrade: add kexec support #3362

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Mstrodl
Copy link

@Mstrodl Mstrodl commented Dec 19, 2024

Adds a new --kexec flag to ostree admin upgrade which will cause
the deployment to be loaded into kexec after the upgrade completes.
It is particularly useful in conjunction with the --reboot flag to
perform a reboot into the new deployment without waiting for the
(often slow) firmware initialization to take place. (And in my case,
allows me to avoid a normal reboot, which can be unreliable on my
hardware).

After an image has been loaded (using the kexec_file_load syscall),
the systemctl-reboot command (which is called when the existing
-r flag is included) will trigger a kexec on the loaded image
rather than a normal reboot. From systemctl(1):

If a new kernel has been loaded via kexec --load, a kexec will be
performed instead of a reboot, unless "SYSTEMCTL_SKIP_AUTO_KEXEC=1"
has been set. If a new root file system has been set up on
"/run/nextroot/", a soft-reboot will be performed instead of a
reboot, unless "SYSTEMCTL_SKIP_AUTO_SOFT_REBOOT=1" has been set.

A good in-depth technical explanation of kexec can be found here:
https://web.archive.org/web/20090505132901/http://www.ibm.com/developerworks/linux/library/l-kexec.html

My implementation uses the kexec_file_load syscall rather than the
older kexec_load syscall, which allows the kernel to verify the
signatures of the new kernel. It is supported on Linux 3.17 and
newer. I assume this probably won't be an issue, but if it is, it's
not that hard to put a preprocessor directive around the kexec stuff
to disable it for older kernels. Even RHEL is new enough now to
not be an issue :)

Closes: #435

Copy link

openshift-ci bot commented Dec 19, 2024

Hi @Mstrodl. Thanks for your PR.

I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@cgwalters
Copy link
Member

/ok-to-test

Thanks for working on this! It's worth noting the intersection with e.g. systemctl kexec; if we supported writing the BLS type 1 entries (ref #3359 ) then on UEFI that would work instead.

I think this would be best as something like ostree admin kexec-load <deployment> in the general case, which would allow targeting for kexec e.g. a rollback as well in a general way.

Then ostree admin upgrade --kexec would just be sugar for that.

@cgwalters
Copy link
Member

BTW just to emphasize again, thanks for hacking on this! A wonderful thing about working on FOSS is when people appear out of the blue with nice contributions, and this is one of those!

@Mstrodl
Copy link
Author

Mstrodl commented Dec 19, 2024

Thanks for working on this! It's worth noting the intersection with e.g. systemctl kexec; if we supported writing the BLS type 1 entries (ref #3359 ) then on UEFI that would work instead.

I didn't do a ton of research into systemd's own automatic detection, but I got some weird errors that seemed like there was more going on with it than it not liking my BLS entries:

Cannot automatically load kernel: ESP mount point not found.

I think this would be best as something like ostree admin kexec-load <deployment> in the general case, which would allow targeting for kexec e.g. a rollback as well in a general way.

Sure. I may add a new ostree_sysroot_deployment_kexec_load like you proposed and have admin-upgrade and admin-switch call that before reboot when -k is passed.

BTW just to emphasize again, thanks for hacking on this! A wonderful thing about working on FOSS is when people appear out of the blue with nice contributions, and this is one of those!

I've been working on getting my employer's debian-based appliances using ostree for immutability. The only issue is that the hardware doesn't always reboot correctly (especially if it's got a lot of uptime, which makes it extra-hard to reproduce) which means I've had a lot of embarrassing "hey can you powercycle your box?" phonecalls :)

@cgwalters
Copy link
Member

I've been working on getting my employer's debian-based appliances using ostree for immutability.

Makes sense! Can I ask you the same question asked here - have you investigated bootc? I'd love to help enable this use case there if the container-native approach is of interest to you! There's a lot of new feature work happening in bootc and that's where most of my time is going. What bootloader stack are you using?

Regardless just a side note at some point we should have bootc upgrade --apply --kexec or so once this lands, would be a trivial patch.

@Mstrodl
Copy link
Author

Mstrodl commented Dec 19, 2024

Makes sense! Can I ask you the #3359 (comment) - have you investigated bootc? I'd love to help enable this use case there if the container-native approach is of interest to you! There's a lot of new feature work happening in bootc and that's where most of my time is going. What bootloader stack are you using?
I actually discovered bootc a few weeks ago and it's pretty close but different to what we do right now. If I had known about bootc when I was making this stuff I probably would have used it :)
We're using grub and dracut, mostly because I was having some issues with getting systemd-boot to work correctly. I would sort of rather use systemd-boot. Part of the issue is that we use ext4 for /boot, which I did because ostree threw some errors when I used FAT (I think because it wanted to use symlinks? I was probably just doing something wrong, but at the time I just couldn't figure it out).

It would probably not be an insane amount of work for us to move to bootc; Our images right now are generated by building a docker image, exporting it, and committing that to the ostree repository. The big thing we'd need to figure out is migrating our existing ostree-based systems and our existing ordinary debian-based systems to bootc. Right now I have a scary bash script that migrates our existing debian installs to the new ostree-managed stuff, so I'd need to figure out how to do that with bootc.

@Mstrodl Mstrodl force-pushed the feature/mstrodl/kexec2 branch from 27f8347 to 33c412d Compare December 19, 2024 21:50
@Mstrodl
Copy link
Author

Mstrodl commented Dec 19, 2024

Okay, here's a new version that does it how you suggested. I haven't done much testing yet, so I guess consider this as a "draft" for now :)

@Mstrodl Mstrodl force-pushed the feature/mstrodl/kexec2 branch from 33c412d to d5bd0e0 Compare December 19, 2024 23:15
@Mstrodl
Copy link
Author

Mstrodl commented Dec 19, 2024

Okay, I didn't realize this until now, but the SYS_kexec_file_load syscall is only defined on some architectures. Now, it checks to make sure the architecture/kernel is supported. This is why the tests were failing on the 32 bit systems.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good!

src/libostree/ostree-sysroot-upgrader.c Show resolved Hide resolved
@@ -44,6 +44,7 @@ typedef enum
OSTREE_SYSROOT_UPGRADER_FLAGS_NONE = (1 << 0),
OSTREE_SYSROOT_UPGRADER_FLAGS_IGNORE_UNCONFIGURED = (1 << 1),
OSTREE_SYSROOT_UPGRADER_FLAGS_STAGE = (1 << 2),
OSTREE_SYSROOT_UPGRADER_FLAGS_KEXEC = (1 << 3),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not opposed but do you think we still need this API flag though, versus just having callers chose to do it?

It feels like an optional secondary step, and it's basically a flag that just adds an API call that the invoking code can just run directly too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did it this way was because the updater doesn't expose the deployment to the caller. If there's some way to do it that I'm missing though, let me know!

src/libostree/ostree-sysroot.h Show resolved Hide resolved
@cgwalters
Copy link
Member

Can you squash this into your commit?

From 63356cd077f195fa44a89db18b78edc3dc971287 Mon Sep 17 00:00:00 2001
From: Colin Walters <[email protected]>
Date: Thu, 19 Dec 2024 18:53:41 -0500
Subject: [PATCH] Add new API to symbols and docs

---
 Makefile-libostree.am             | 6 +++---
 apidoc/ostree-sections.txt        | 1 +
 src/libostree/libostree-devel.sym | 5 +++++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/Makefile-libostree.am b/Makefile-libostree.am
index 11a7bbed..915b20b8 100644
--- a/Makefile-libostree.am
+++ b/Makefile-libostree.am
@@ -175,9 +175,9 @@ endif # USE_GPGME
 symbol_files = $(top_srcdir)/src/libostree/libostree-released.sym
 
 # Uncomment this include when adding new development symbols.
-#if BUILDOPT_IS_DEVEL_BUILD
-#symbol_files += $(top_srcdir)/src/libostree/libostree-devel.sym
-#endif
+if BUILDOPT_IS_DEVEL_BUILD
+symbol_files += $(top_srcdir)/src/libostree/libostree-devel.sym
+endif
 
 # http://blog.jgc.org/2007/06/escaping-comma-and-space-in-gnu-make.html
 wl_versionscript_arg = -Wl,--version-script=
diff --git a/apidoc/ostree-sections.txt b/apidoc/ostree-sections.txt
index b46e606c..e934b859 100644
--- a/apidoc/ostree-sections.txt
+++ b/apidoc/ostree-sections.txt
@@ -582,6 +582,7 @@ ostree_sysroot_repo
 ostree_sysroot_get_repo
 ostree_sysroot_get_staged_deployment
 ostree_sysroot_init_osname
+ostree_sysroot_deployment_kexec_load
 ostree_sysroot_deployment_set_kargs
 ostree_sysroot_deployment_set_kargs_in_place
 ostree_sysroot_deployment_set_mutable
diff --git a/src/libostree/libostree-devel.sym b/src/libostree/libostree-devel.sym
index 6640e11c..0af241dc 100644
--- a/src/libostree/libostree-devel.sym
+++ b/src/libostree/libostree-devel.sym
@@ -20,6 +20,11 @@
    - uncomment the include in Makefile-libostree.am
 */
 
+LIBOSTREE_2024.11 {
+global:
+  ostree_sysroot_deployment_kexec_load;
+} LIBOSTREE_2024.7;
+
 /* Stub section for the stable release *after* this development one; don't
  * edit this other than to update the year.  This is just a copy/paste
  * source.  Replace $LASTSTABLE with the last stable version, and $NEWVERSION
-- 
2.47.0

@Mstrodl
Copy link
Author

Mstrodl commented Dec 20, 2024

Will do tomorrow when I test. Didn't realize there was more stuff to update, sorry!

@Mstrodl Mstrodl force-pushed the feature/mstrodl/kexec2 branch from d5bd0e0 to 40ef30b Compare December 23, 2024 18:41
@Mstrodl
Copy link
Author

Mstrodl commented Dec 23, 2024

@cgwalters Sorry I ended up being much busier friday than I expected. I rolled your changes into my commit. Let me know what you'd like with the kexec flag. I don't disagree that having it at the call-site might make more sense, but as far as I could tell, there wasn't a great way to get access to the deployment the upgrader ultimately made. Now that I'm looking more at the API, I suppose I could call ostree_sysroot_get_staged_deployment, but that feels kinda inelegant? Let me know what you think! (And particularly if there's a better way to do this!)

Adds a new `--kexec` flag to `ostree admin upgrade` which will cause
the deployment to be loaded into kexec after the upgrade completes.
It is particularly useful in conjunction with the `--reboot` flag to
perform a reboot into the new deployment without waiting for the
(often slow) firmware initialization to take place. (And in my case,
allows me to avoid a normal reboot, which can be unreliable on my
hardware).

After an image has been loaded (using the `kexec_file_load` syscall),
the `systemctl-reboot` command (which is called when the existing
`-r` flag is included) will trigger a kexec on the loaded image
rather than a normal reboot. From `systemctl(1)`:

  If a new kernel has been loaded via kexec --load, a kexec will be
  performed instead of a reboot, unless "SYSTEMCTL_SKIP_AUTO_KEXEC=1"
  has been set. If a new root file system has been set up on
  "/run/nextroot/", a soft-reboot will be performed instead of a
  reboot, unless "SYSTEMCTL_SKIP_AUTO_SOFT_REBOOT=1" has been set.

A good in-depth technical explanation of kexec can be found here:
https://web.archive.org/web/20090505132901/http://www.ibm.com/developerworks/linux/library/l-kexec.html

My implementation uses the `kexec_file_load` syscall rather than the
older `kexec_load` syscall, which allows the kernel to verify the
signatures of the new kernel. It is supported on Linux 3.17 and
newer. I assume this probably won't be an issue, but if it is, it's
not that hard to put a preprocessor directive around the kexec stuff
to disable it for older kernels. Even RHEL is new enough now to
not be an issue :)

Closes: ostreedev#435
@Mstrodl Mstrodl force-pushed the feature/mstrodl/kexec2 branch from 40ef30b to b04cb8f Compare December 23, 2024 18:59
@Mstrodl
Copy link
Author

Mstrodl commented Dec 23, 2024

/retest -- Looks like an error accessing mirrors

Copy link

openshift-ci bot commented Dec 23, 2024

@Mstrodl: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test fcos-e2e
/test images

Use /test all to run all jobs.

In response to this:

/retest -- Looks like an error accessing mirrors

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Mstrodl
Copy link
Author

Mstrodl commented Dec 23, 2024

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ostree admin prepare-kexec
2 participants