Skip to content

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Sep 3, 2024

Currently, we only try to detect if the QEMU process exited by actually wait()ing for it in the kola qemuexec path. We should do it in the kola testing path as well so that it's easy to tell if e.g. it was killed while the test was running.

Note this doesn't actually stop the test early if QEMU exited. That would require some tricky wiring into the harness. But at least what it prints helps diagnose the issue when we see the test time out on SSH. And the QEMU process won't just hang there as defunct.

@dustymabe
Copy link
Member

I observe this properly in a test pipeline run:

[2024-09-03T19:14:08.996Z] 2024-09-03T19:14:06Z platform/machine/qemu: QEMU process finished abnormally: signal: killed

@dustymabe
Copy link
Member

Note this doesn't actually stop the test early if QEMU exited. That would require some tricky wiring into the harness. But at least what it prints helps diagnose the issue when we see the test time out on SSH. And the QEMU process won't just hang there as defunct.

I wonder if we should add this as a comment in the code somewhere (i.e. explaining the more ideal future state).

@dustymabe
Copy link
Member

I'm guessing we don't need this code from the other day then?

$ git diff mantle/platform/qemu.go
diff --git a/mantle/platform/qemu.go b/mantle/platform/qemu.go
index 6fe76da9f..5106b42b5 100644
--- a/mantle/platform/qemu.go
+++ b/mantle/platform/qemu.go
@@ -208,7 +208,9 @@ func (inst *QemuInstance) SSHAddress() (string, error) {
 
 // Wait for the qemu process to exit
 func (inst *QemuInstance) Wait() error {
-       return inst.qemu.Wait()
+       r := inst.qemu.Wait()
+       plog.Debugf("Waited for qemu. r=%v", r)
+       return r
 }
 
 // WaitIgnitionError will only return if the instance

Currently, we only try to detect if the QEMU process exited by actually
`wait()`ing for it in the `kola qemuexec` path. We should do it in
the kola testing path as well so that it's easy to tell if e.g. it was
killed while the test was running.

Note this doesn't actually stop the test early if QEMU exited. That
would require some tricky wiring into the harness. But at least what it
prints helps diagnose the issue when we see the test time out on SSH.
And the QEMU process won't just hang there as defunct.
@jlebon
Copy link
Member Author

jlebon commented Sep 4, 2024

I'm guessing we don't need this code from the other day then?

Yeah, we don't need it anymore.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dustymabe dustymabe enabled auto-merge (rebase) September 4, 2024 16:15
@dustymabe dustymabe merged commit 30480db into coreos:main Sep 4, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants