
osbuild: conditionally use the built payload as the buildroot #3808


Merged
merged 7 commits into coreos:main from dustymabe:dusty-osbuild
May 22, 2024

Conversation

dustymabe
Member

With this we now use a buildroot that is derived from the OCI container that was built by the pipeline. This allows us to use the exact same versions of software from the payload we built when we construct the images that we will ship, which will be better for us over time.

The benefits of this are immediately apparent in this commit as we are able to drop configuration that tries to set feature flags for our ext4 filesystems based on what we think are the current defaults in RHEL.

For now we aren't able to do this with FCOS because FCOS doesn't have python in it. This should be OK for now because COSA is almost always based on the latest version of Fedora. Though one benefit we would have if we did switch to doing this for FCOS is that we would test newer versions of "build tools" from rawhide alongside the rawhide pipeline builds that we do.
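
To make the idea concrete, here is a rough sketch in osbuild mpp terms of what "use the built payload as the buildroot" means; the pipeline and stage names below are illustrative assumptions, not the exact manifest from this PR:

```
# Hypothetical sketch: build a buildroot pipeline by unpacking the
# payload's OCI container, then run later image-assembly stages inside
# it via the `build:` reference. Names here are illustrative.
pipelines:
  - name: buildroot
    stages:
      - type: org.osbuild.container-deploy   # unpack the built container
        inputs:
          images:
            type: org.osbuild.containers
            origin: org.osbuild.source
  - name: raw-image
    build: name:buildroot                    # stages run in the payload buildroot
    stages:
      - type: org.osbuild.truncate
        options:
          filename: disk.img
```

The point is that tools like mkfs run from inside `raw-image` now come from the payload itself, so filesystem defaults automatically match the OS being shipped.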

dustymabe added 4 commits May 21, 2024 12:14
The copy/paste from 1d51ca4 didn't account for the fact that the
image filename is renamed to disk.raw for GCP. Fix it and also
add a comment with some context on why the filename differs here
from the others.
Drop a few patches here:

- 0004-fscache-add-eviction-log-statement.patch
- 0001-stages-qemu-sanity-check-created-image.patch

The fscache eviction log statement was just informational anyway
and the qemu sanity check should no longer be needed since we got
a fixed XFS/kernel in RHEL 9.4 and Fedora.

We also drop OSBuild version locking since things are a lot more
stable now and it's probably better to follow latest and find issues
immediately rather than forget to update this for long periods of
time.
We'll now make deploying the OSTree directly conditional on whether
ostree_repo is passed in. The only use case today where we are
deploying via OSTree and not via container is our FCOS non-rawhide
streams, so this shouldn't affect any non-FCOS users out there today.

This is prep for some re-work to create a buildroot from the
container. We always have the container image in our pipelines
so we want to unconditionally pass it in to OSBuild now (i.e.
so we can create the buildroot from it).
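
The conditionality described above might look roughly like this in mpp form (a hedged sketch; the variable and stage names are assumptions, not the exact manifest):

```
# Hypothetical sketch: deploy from the OSTree repo only when
# `ostree_repo` is passed in; otherwise deploy from the container
# image, which is now always passed in.
- mpp-if: ostree_repo != ''
  then:
    type: org.osbuild.ostree.deploy
- mpp-if: ostree_repo == ''
  then:
    type: org.osbuild.ostree.deploy.container
```
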
@dustymabe
Member Author

We think implementing this will mean issues like openshift/os#1504 go away.

@dustymabe
Member Author

This will fix #3801

```
cat /usr/lib/coreos-assembler/0004-fscache-add-eviction-log-statement.patch \
    /usr/lib/coreos-assembler/0001-stages-qemu-sanity-check-created-image.patch \
    | patch -d /usr/lib/osbuild -p1
#cat your.patch \
```
Member

I think we should keep a comment here that says to freeze osbuild if we ever need to add patches here again. Or alternatively, delete all of patch_osbuild().

Reverting this commit to add patches back will also bring back the freeze, which can then be updated.

Member Author

Depending on the situation, I don't think it's required that we pin OSBuild if we patch.

If it's an area of code that changes a lot then it might be worth it to pin; if not, then maybe not.

I'm OK with deleting the whole function, but thought for now it might be worth keeping the scaffolding in place.

Let me know what you think is best.

Member

Leaning towards nuking it for now with the understanding that it's only a git revert away if we ever need to carry patches again in the future.

Member Author

will do

Member Author

sorry I missed this in the final upload

followup in #3809

```
@@ -79,6 +81,8 @@ pipelines:
           mpp-format-int: '{image.layout[''boot''].partnum}'
         target: /boot
   - name: qemu
+    build:
+      mpp-format-string: '{qemu_stage_buildroot}'
```
Member

I'm confused why we need this. Since the buildroot is scoped to the pipeline and here we're defining a different pipeline, isn't it going to use the host buildroot if we don't specify anything?

Member Author

Yes, it will use the "host" buildroot if we don't specify anything, but I do like the explicitness that goes along with the comment where this is defined:

```
# For the qemu stages we'll use the host buildroot (COSA)
# because we definitely don't include qemu-img in CoreOS
qemu_stage_buildroot: ""
```
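
For contrast, a hedged sketch of the two possible values (mpp semantics assumed; the pipeline name is illustrative):

```
# An empty string leaves the stage on the default/host buildroot,
# while a pipeline reference runs the stage inside the built payload.
qemu_stage_buildroot: ""               # host (COSA) buildroot
# vs.
qemu_stage_buildroot: name:buildroot   # buildroot derived from the OCI container
```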

Member

OK to leave a comment but maybe let's make it clearer that that's the default anyway and we're just being explicit?

Member Author

Sounds good. I'll enhance the comment further.

@dustymabe
Member Author

so coreos.unique.boot.ignition.failure is failing here in ci/prow/rhcos. I can also reproduce this locally.

It looks like the Ensure Unique 'boot' Filesystem Label in the console is happening before Ignition even runs, but the coreos.unique.boot.ignition.failure adds a boot labeled filesystem using Ignition.

Is it really required for this unit to run after Ignition is complete? I guess so since that's a test case we want to cover, but we'll probably have to strengthen the unit dependencies.
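
Strengthening the dependencies could look something like this systemd drop-in (the unit names here are assumptions for illustration, not taken from the PR):

```
# Hypothetical drop-in for the uniqueness-check unit: only run the
# check once Ignition has finished writing filesystems.
[Unit]
After=ignition-files.service
```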

@dustymabe
Member Author

Ahh, interestingly enough we have two units that check if boot is unique, but they both have very similar descriptions so the log messages are hard to distinguish to the untrained eye.

So it looks like maybe the coreos-ignition-unique-boot.service somehow isn't doing its job here.

@dustymabe
Member Author

ok I think I found the real error earlier up in the log:

```
[  OK  ] Finished Generate New UUID For Boot Disk GPT.
[    4.126289] systemd[1]: Finished Generate New UUID For Boot Disk GPT.
[    4.137089] ignition-ostree-transposefs[908]: Moving bootfs to RAM...
         Starting Ignition OSTree: Save Partitions...
[    4.137822] systemd[1]: Starting Ignition OSTree: Save Partitions...
[    4.140833] ignition-ostree-transposefs[908]: Mounting /dev/disk/by-label/boot ro (/dev/vdb3) to /var/tmp/mnt
         Starting Ignition OSTree: Regenerate Filesystem UUID (boot)...
[    4.144472] systemd[1]: Starting Ignition OSTree: Regenerate Filesystem UUID (boot)...
[    4.155819] ignition-ostree-firstboot-uuid[918]: e2fsck 1.46.5 (30-Dec-2021)
[    4.155947] ignition-ostree-firstboot-uuid[918]: /dev/disk/by-label/boot is in use.
[    4.155973] ignition-ostree-firstboot-uuid[918]: e2fsck: Cannot continue, aborting.
[    4.158304] EXT4-fs (vdb3): mounted filesystem 96d15588-3596-4b3c-adca-a2ff7279ea63 ro with ordered data mode. Quota mode: none.
[FAILED] Failed to start Ignition OSTree: Regenerate Filesystem UUID (boot).
```

```
# really use FCOS as the buildroot so we'll set it to "" so that
# the host (COSA) gets used as the buildroot there.
buildroot:
  mpp-if: osname == 'rhcos'
```
Member

Not related to the current CI failure, but shouldn't this be

Suggested change:

```
- mpp-if: osname == 'rhcos'
+ mpp-if: osname in ['rhcos', 'scos']
```

?

@jlebon
Member

jlebon commented May 21, 2024

I think the CI issue is roughly:

  • metadata_csum_seed is still off by default in el9
  • so with this patch the feature in RHCOS will go from on to off
  • changing the filesystem UUID while it's mounted doesn't work without the metadata_csum_seed feature, so we'd have to serialize things more to ensure no mounts are happening at the same time

But also, this would make us enter a completely different path in ignition-ostree-firstboot-uuid.

I think it'd be safer to still enable the feature. If we just hard-enable it, then we can simplify ignition-ostree-firstboot-uuid.
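
To see the feature concretely, here is a minimal sketch (the scratch file path and size are illustrative, not from the PR) that creates an ext4 image with metadata_csum_seed enabled and lists its features; with this feature set, the checksum seed is stored in the superblock, which is what allows the UUID to be changed later even while the filesystem is mounted:

```shell
# Create a scratch ext4 image (file-backed, so no root needed).
truncate -s 32M /tmp/bootfs.img
# Enable metadata_csum_seed at mkfs time (needs e2fsprogs >= 1.43).
mkfs.ext4 -q -O metadata_csum_seed /tmp/bootfs.img
# Confirm the feature shows up in the superblock feature list.
dumpe2fs -h /tmp/bootfs.img 2>/dev/null | grep 'Filesystem features'
```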

dustymabe added 3 commits May 21, 2024 17:00
With this we now use a buildroot that is derived from the OCI
container that was built by the pipeline. This allows us to use
the exact same versions of software from the payload we built
when we construct the images that we will ship, which will be
better for us over time.

The benefits of this are immediately apparent in this commit
as we are able to drop configuration that tries to set feature
flags for our ext4 filesystems based on what we think are the
current defaults in RHEL.

For now we aren't able to do this with FCOS because FCOS doesn't
have python in it. This should be OK for now because COSA is
almost always based on the latest version of Fedora. Though one
benefit we would have if we did switch to doing this for FCOS is
that we would test newer versions of "build tools" from `rawhide`
alongside the `rawhide` pipeline builds that we do.
Since we are now storing the OSBuild buildroot in the cache2.qcow2
let's bump the size again.
Here we just unconditionally set it because otherwise changing
the filesystem while it is mounted doesn't work and we get errors
like this during startup:

```
[    4.155947] ignition-ostree-firstboot-uuid[918]: /dev/disk/by-label/boot is in use.
```

Anyway, we are going to delete the fallback code there
(coreos/fedora-coreos-config#3008), so the feature is required
to be set. Let's set it here and then we can drop this once
RHEL has caught up.
@dustymabe
Member Author

CI is fixed and all comments should be addressed now.

@dustymabe dustymabe enabled auto-merge (rebase) May 21, 2024 23:53
Member

@jlebon jlebon left a comment

LGTM!

@dustymabe dustymabe merged commit f9f9c5b into coreos:main May 22, 2024
5 checks passed
@dustymabe dustymabe deleted the dusty-osbuild branch May 22, 2024 01:02
ravanelli added a commit to ravanelli/fedora-coreos-pipeline that referenced this pull request May 29, 2024
 - Reflect changes done in coreos/coreos-assembler#3808

Signed-off-by: Renata Ravanelli <[email protected]>
jlebon pushed a commit to coreos/fedora-coreos-pipeline that referenced this pull request Jun 3, 2024