-
Notifications
You must be signed in to change notification settings - Fork 143
Initial factory reset code #1588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
387f495
to
ab75caf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new experimental bootc install reset
command, which appears to function as a factory reset. The implementation is well-structured, refactoring existing deployment logic to use a new MergeState
enum, which is a clean way to handle different deployment sources. The changes also include new utilities for managing stateroots. I've found a couple of issues that need to be addressed before merging.
I think we'll want tests at least before merging, and supporting that with tmt is probably going to be an excellent driver for the feature. I haven't dug in but I think tmt is relying on injecting ssh keys w/cloud-init, so it might Just Work to do a reset there? |
b45e272
to
a0cd215
Compare
Updated with a basic test. |
a0cd215
to
2ad17c6
Compare
crates/lib/src/deploy.rs
Outdated
let deployment_dir = ostree.deployment_dirpath(&staged); | ||
let deployment_dir = std::path::Path::new(deployment_dir.as_str()); | ||
if deployment_dir.exists() { | ||
let marker_path = deployment_dir.join(".bootc-factory-reset"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have clear const
for things like this with an explanation there.
But backing up a second, in the ostree side we have "deployment backing" directories already where we can put metadata.
Though of course we should also be thinking about how this would work with the composefs backend.
Hmmm hmm. Here's an idea, how about we use the deployment's etc
directory for state like this instead? I think we could save e.g. system.bootc.merged_from
xattr or so?
Basically I lean towards extended attributes over "stamp files" for metadata.
BTW though do note the intersection here with etc.transient
- I wonder if we should make that more visible in status output too. If that's enabled each upgrade is already like a reset of /etc
.
crates/lib/src/status.rs
Outdated
if is_factory_reset { | ||
// Factory reset deployments don't support soft reboots | ||
// this is primarily because the kargs validation will fail when checking for soft reboot | ||
// compatibility in the ostree code, which will cause bootc status and upgrade to fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we need to special case this. Why is bootc status failing exactly?
If we're changing (usually dropping) kernel arguments won't the existing karg check work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup - we talked about this in a live chat and there seems to be a libostree bug here where the staged deployment is missing ostree=
and that trips an assertion in the soft reboot checks.
Ideally we root cause and fix that.
But we may be able to work around this by detecting when there's no kargs in the staged deployment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fix the factory resest issue: ostreedev/ostree#3532. Still investigating why the deployment is missing the ostree
arg.
Just saw the demo for this. A few random thoughts. It'd be nice to be able to have different reset levels. One is just resetting Should this use stateroots at all? This would be another thing to implement in the composefs backend (or maybe that code and concept already exist there?). But also it's quite common for One alternative though is basically to just move everything in (There's still the case of submounts in |
2ad17c6
to
6003aa7
Compare
be8b439
to
46e0668
Compare
Signed-off-by: Colin Walters <[email protected]>
This is a nondestructive variant of `to-existing-root`. Signed-off-by: Colin Walters <[email protected]>
Also cleans up some bugs from rebasing previous commits. Signed-off-by: ckyrouac <[email protected]>
Signed-off-by: ckyrouac <[email protected]>
46e0668
to
def648d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's some CI failures, but otherwise we could try to get this in?
It'd be a bit ugly but maybe we could detect the ostree version and avoid crashing if it's too old?
Add ostree version check to has_soft_reboot_capability() to ensure soft reboot is only enabled when ostree 2025.7 or later is present. This prevents attempting soft reboots on older ostree versions that have a bug when validating kargs during a factory reset. The check happens before the systemd and deployment capability checks, providing a fast path return for incompatible versions. Signed-off-by: ckyrouac <[email protected]>
def648d
to
ea5896f
Compare
Just pushed up a commit to check the ostree version in the has_soft_reboot_capability function where the crash happens. I haven't had a chance to dig into the deeper root cause of why the ostree= karg is getting set but the version check should unblock this PR's CI. The result is on systems with an outdated ostree, soft reboot will be disabled for factory reset deployments. If it works for you, I'll just file an issue to track fixing the ostree= karg issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall sane to me!
One thing that would probably help is to add docs for this in our experimental section.
assert not equal $reset_status.status.otherDeployments.0.ostree.stateroot "default" | ||
|
||
# we need tmt in the new stateroot for second_boot | ||
print "Copying tmt into new stateroot" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this one is really asking for enhancing reset to support copying data in a nicer way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on adding bootc mount
This is a rebased version of #1389 with some fixes on top to make
bootc install reset --experimental
work with the latest soft reboot code. I'd like to get this in hidden behind--experimental
, then iterate on it. I plan to add some e2e tests as a followup but I can work on including them as part of this PR if you want.