Commit 242a88e

mantle/kola: Add function to enhance upgrade stability
This commit introduces the `waitForUpgradeToBeStaged` function to improve the stability of the kola upgrade tests by reducing timeout-related failures. The new function sets up a systemd path unit that watches the `/ostree/deploy` directory and stops `wait.service` once a change is detected. By waiting until later in the upgrade process, we minimize the wait inside `runFnAndWaitForRebootIntoVersion` so that it covers only the actual reboot phase.

Author: Dusty Mabe <[email protected]>
Ref: coreos/fedora-coreos-tracker#1805
1 parent b379402 commit 242a88e

1 file changed, +24 -0 lines changed

mantle/kola/tests/upgrade/basic.go

@@ -313,10 +313,33 @@ func runFnAndWaitForRebootIntoVersion(c cluster.TestCluster, m platform.Machine,
 	}
 }

+func waitForUpgradeToBeStaged(c cluster.TestCluster, m platform.Machine) {
+	// Here we set up a systemd path unit to watch for when ostree
+	// behind the scenes updates the refs in the repo under the
+	// /ostree/deploy directory.
+	// We use /ostree/deploy as the canonical API for monitoring deployment changes;
+	// ostree updates this path whenever the deployments change.
+	// refchanged.path will trigger when it gets updated and will then stop wait.service.
+	// The systemd-run --wait causes it to not return here (and thus
+	// continue execution of code here) until wait.service has been
+	// stopped by refchanged.service. This is an effort to make us
+	// not start waiting inside runFnAndWaitForRebootIntoVersion until
+	// later in the upgrade process because we are seeing failures due
+	// to timeouts and we're trying to reduce the variability by
+	// minimizing the wait inside that function to just the actual reboot.
+	// https://github.com/coreos/fedora-coreos-tracker/issues/1805
+	//
+	// Note: if systemd-run ever gains the ability to --wait when
+	// generating a path unit then the below can be simplified.
+	c.RunCmdSync(m, "sudo systemd-run -u refchanged --path-property=PathChanged=/ostree/deploy systemctl stop wait.service")
+	c.RunCmdSync(m, "sudo systemd-run --wait -u wait sleep infinity")
+}
+
 func waitForUpgradeToVersion(c cluster.TestCluster, m platform.Machine, version string) {
 	runFnAndWaitForRebootIntoVersion(c, m, version, func() {
 		// Start Zincati so it will apply the update
 		c.RunCmdSync(m, "sudo systemctl start zincati.service")
+		waitForUpgradeToBeStaged(c, m)
 	})
 }
322345

@@ -328,6 +351,7 @@ func rpmostreeRebase(c cluster.TestCluster, m platform.Machine, ref, version str
 		// we use systemd-run here so that we can test the --reboot path
 		// without having SSH not exit cleanly, which would cause an error
 		c.RunCmdSyncf(m, "sudo systemd-run rpm-ostree rebase --reboot %s", ref)
+		waitForUpgradeToBeStaged(c, m)
 	})
 }
