Skip to content

Conversation

@BulaYoungR
Copy link

@BulaYoungR BulaYoungR commented Jan 11, 2026

Reorder teardown operations to use current working VM config instead of backup config when removing kernel parameters. Add serial console fallback if SSH login fails.

Couldn't reproduce the failure locally but adding these changes proactively to prevent potential timeout issues in CI.

Summary by CodeRabbit

  • Tests
    • Enhanced teardown procedures with more robust VM lifecycle management including destruction and restart handling
    • Introduced configurable login timeout with automatic fallback to serial console when SSH authentication fails
    • Improved session management to ensure consistent boot parameter updates during test cleanup

✏️ Tip: You can customize this high-level summary in your review settings.

Reorder teardown operations to use current working VM config instead of backup config when removing kernel parameters. Add serial console fallback if SSH login fails.
@coderabbitai
Copy link

coderabbitai bot commented Jan 11, 2026

Walkthrough

This change modifies the teardown method in a test file that handles virtio memory device size requests. The teardown logic now includes enhanced VM cleanup when kernel parameters are removed: the VM is destroyed if alive and restarted, followed by an attempted SSH login with a configurable timeout that falls back to serial console login on failure. Once a session is established, boot options are updated with the removed kernel parameter, the session closes, and final VM cleanup occurs. A login_timeout parameter (default 360 seconds) was added to control login attempts, and the sync operation was repositioned to the end of teardown.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main change: reordering teardown to remove kernel parameters before XML restoration to fix timeout issues.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@BulaYoungR BulaYoungR changed the title Fix teardown timeout by removing kernel param before XML restore memory.devices: Fix teardown timeout by removing kernel param before XML restore Jan 11, 2026
@BulaYoungR
Copy link
Author

Before fix

2025-10-08 04:26:30,686 avocado.app human            L0129 DEBUG|  (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest:
2025-10-08 04:26:30,686 avocado.app human            L0136 DEBUG| INTERRUPTED: Test interrupted: Timeout reached (1198.87 s)
2025-10-08 04:26:30,687 avocado.job testlogs         L0138 INFO | type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: INTERRUPTED
2025-10-08 04:26:30,687 avocado.job testlogs         L0139 INFO | More information in /var/log/avocado/job-results/job-2025-10-08T04.06-1c08d65/test-results/1-type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest
2025-10-08 04:26:30,729 avocado.job job              L0659 INFO | Test results available in /var/log/avocado/job-results/job-2025-10-08T04.06-1c08d65
2025-10-08 04:26:30,729 avocado.app human            L0143 INFO |
2025-10-08 04:26:30,729 avocado.app human            L0146 INFO | RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0

After fix

JOB ID     : 476f4df2d90707cb90c4f6b61b395cc1da18bdbf
JOB LOG    : /var/log/avocado/job-results/job-2026-01-11T07.11-476f4df/job.log
 (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: STARTED
 (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: PASS (170.89 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /var/log/avocado/job-results/job-2026-01-11T07.11-476f4df/results.html
JOB TIME   : 173.19 s

1 similar comment
@BulaYoungR
Copy link
Author

Before fix

2025-10-08 04:26:30,686 avocado.app human            L0129 DEBUG|  (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest:
2025-10-08 04:26:30,686 avocado.app human            L0136 DEBUG| INTERRUPTED: Test interrupted: Timeout reached (1198.87 s)
2025-10-08 04:26:30,687 avocado.job testlogs         L0138 INFO | type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: INTERRUPTED
2025-10-08 04:26:30,687 avocado.job testlogs         L0139 INFO | More information in /var/log/avocado/job-results/job-2025-10-08T04.06-1c08d65/test-results/1-type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest
2025-10-08 04:26:30,729 avocado.job job              L0659 INFO | Test results available in /var/log/avocado/job-results/job-2025-10-08T04.06-1c08d65
2025-10-08 04:26:30,729 avocado.app human            L0143 INFO |
2025-10-08 04:26:30,729 avocado.app human            L0146 INFO | RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0

After fix

JOB ID     : 476f4df2d90707cb90c4f6b61b395cc1da18bdbf
JOB LOG    : /var/log/avocado/job-results/job-2026-01-11T07.11-476f4df/job.log
 (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: STARTED
 (1/1) type_specific.io-github-autotest-libvirt.memory.devices.virtio_mem.change_memory.memfd.zero_requested.running_guest: PASS (170.89 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /var/log/avocado/job-results/job-2026-01-11T07.11-476f4df/results.html
JOB TIME   : 173.19 s

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py:
- Around line 365-377: The teardown currently catches broad Exception for
vm.wait_for_login and vm.wait_for_serial_login which can hide unexpected errors;
update the try/except to catch specific login-related exceptions if available
(e.g., remote.LoginTimeoutError, aexpect.Expect* or whatever framework exposes)
for both vm.wait_for_login and vm.wait_for_serial_login, and if you must keep a
broad except, replace test.log.error/test.log.warning calls with
test.log.exception(...) to record stack traces (ensure the exception variable
like ssh_err/serial_err is passed or included). Locate the block around
vm.wait_for_login and vm.wait_for_serial_login and apply these changes
consistently so both failure paths log full traceback or narrow-catch the proper
exception types.
- Around line 357-387: The teardown currently can replace an original test
exception and skip bkxml.sync() because it raises from the nested login except
and doesn't guarantee session.close()/vm.destroy() run; modify the
kernel_params_remove block to capture any exception from the main test, then
perform login attempts (vm.wait_for_login / vm.wait_for_serial_login) without
re-raising so they don't mask the original error, and wrap update_boot_option,
session.close(), and vm.destroy() in a finally that always runs; ensure
bkxml.sync() is called after that finally (or call bkxml.sync() in the outer
finally) and reference functions/variables kernel_params_remove, vm.is_alive(),
vm.destroy(), vm.start(), vm.wait_for_login(), vm.wait_for_serial_login(),
utils_test.update_boot_option(), session.close(), and bkxml.sync() when applying
the changes.
🧹 Nitpick comments (2)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py (2)

409-409: Harden login_timeout parsing to avoid ValueError from malformed params

int(params.get("login_timeout", 360)) will throw if login_timeout is "", "None", etc., which can break teardown unexpectedly.

Proposed tweak
-    login_timeout = int(params.get("login_timeout", 360))
+    try:
+        login_timeout = int(params.get("login_timeout", 360))
+    except (TypeError, ValueError):
+        login_timeout = 360

360-362: Double-destroy in teardown may be redundant (and add time)

You explicitly vm.destroy() before starting, and again at the end. Given the framework commonly handles VM cleanup in teardown paths already (per retrieved learnings), consider whether the extra destroy(s) are necessary here, especially since this change is trying to reduce timeout risk. Based on learnings, cleanup is typically centralized in teardown helpers.

Also applies to: 384-386

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f35e2b and 7511ac5.

📒 Files selected for processing (1)
  • libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.

Applied to files:

  • libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
🧬 Code graph analysis (1)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py (1)
libvirt/tests/src/serial/serial_functional.py (1)
  • wait_for_login (235-258)
🪛 Ruff (0.14.10)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py

369-369: Do not catch blind exception: Exception

(BLE001)


375-376: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.11
  • GitHub Check: Python 3.12
  • GitHub Check: Python 3.8
  • GitHub Check: Python 3.9

Comment on lines 357 to +387
if kernel_params_remove:
if not vm.is_alive():
vm.start()
test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'",
kernel_params_remove)
if vm.is_alive():
vm.destroy()
vm.start()

session = None
try:
test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
login_timeout)
session = vm.wait_for_login(timeout=login_timeout)
except Exception as ssh_err:
test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err)
test.log.debug("TEST_TEARDOWN: Trying serial console login")
try:
session = vm.wait_for_serial_login(timeout=login_timeout)
except Exception as serial_err:
test.log.error("TEST_TEARDOWN: Serial login also failed: %s",
serial_err)
raise

utils_test.update_boot_option(
vm, args_removed=kernel_params_remove, guest_arch_name=vm_arch_name)
vm, args_removed=kernel_params_remove,
guest_arch_name=vm_arch_name)
session.close()

if vm.is_alive():
vm.destroy()

bkxml.sync()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find wait_for_serial_login and update_boot_option definitions
echo "=== Searching for wait_for_serial_login ==="
rg -n "def wait_for_serial_login" --type py

echo "=== Searching for update_boot_option definition ==="
rg -n "def update_boot_option" --type py

echo "=== Searching for teardown_test and test structure ==="
rg -n "def teardown_test|test_obj.teardown" --type py | head -20

Repository: autotest/tp-libvirt

Length of output: 2178


🏁 Script executed:

#!/bin/bash
# Read the file to understand imports and structure
echo "=== Reading the reviewed file ==="
wc -l libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
head -80 libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py

echo -e "\n=== Searching for utils_test import and usage ==="
grep -n "import.*utils_test\|from.*utils_test\|utils_test\." libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | head -10

echo -e "\n=== Searching for update_boot_option in utils modules ==="
find . -path ./libvirt/tests -name "utils_test.py" 2>/dev/null | head -5
rg -l "def update_boot_option" --type py | head -10

Repository: autotest/tp-libvirt

Length of output: 2867


🏁 Script executed:

#!/bin/bash
echo "=== Reading teardown_test context (lines 350-420) ==="
sed -n '350,420p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py

echo -e "\n=== Searching for wait_for_serial_login ==="
rg "wait_for_serial_login" --type py -A 2 | head -30

echo -e "\n=== Looking for teardown_test() call site (finally block) ==="
sed -n '390,424p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py

Repository: autotest/tp-libvirt

Length of output: 6266


🏁 Script executed:

#!/bin/bash
echo "=== Checking if there's any test infrastructure teardown beyond local teardown_test ==="
grep -n "class\|def run\|finally:" libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | tail -20

echo -e "\n=== Full view of run_test function wrapper ==="
sed -n '240,424p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | grep -E "^def run_test|^    try:|^    finally:|^        teardown_test"

Repository: autotest/tp-libvirt

Length of output: 369


Teardown can suppress the original test failure and skip bkxml.sync() on login failure

If SSH and serial login both fail, the raise inside the nested try/except will propagate from the finally block, replacing any original test exception and preventing bkxml.sync() from running—leaving the VM XML unrestored. Additionally, session.close() and the final vm.destroy() are not protected by a finally block, so an exception in update_boot_option() can leak the session and leave the VM running.

Proposed fix (ensure restore and cleanup always happen, avoid masking original failures)
 def teardown_test():
     """
     Clean data.
     """
     test.log.info("TEST_TEARDOWN: Clean up env.")
     utils_memory.set_num_huge_pages(0)
-    if kernel_params_remove:
-        test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'",
-                      kernel_params_remove)
-        if vm.is_alive():
-            vm.destroy()
-        vm.start()
-
-        session = None
-        try:
-            test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
-                           login_timeout)
-            session = vm.wait_for_login(timeout=login_timeout)
-        except Exception as ssh_err:
-            test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err)
-            test.log.debug("TEST_TEARDOWN: Trying serial console login")
-            try:
-                session = vm.wait_for_serial_login(timeout=login_timeout)
-            except Exception as serial_err:
-                test.log.error("TEST_TEARDOWN: Serial login also failed: %s",
-                               serial_err)
-                raise
-
-        utils_test.update_boot_option(
-            vm, args_removed=kernel_params_remove,
-            guest_arch_name=vm_arch_name)
-        session.close()
-
-        if vm.is_alive():
-            vm.destroy()
-
-    bkxml.sync()
+    session = None
+    try:
+        if kernel_params_remove:
+            test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'",
+                          kernel_params_remove)
+            if vm.is_alive():
+                vm.destroy()
+            vm.start()
+
+            try:
+                test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
+                               login_timeout)
+                session = vm.wait_for_login(timeout=login_timeout)
+            except Exception:
+                test.log.exception("TEST_TEARDOWN: SSH login failed")
+                test.log.debug("TEST_TEARDOWN: Trying serial console login")
+                try:
+                    session = vm.wait_for_serial_login(timeout=login_timeout)
+                except Exception:
+                    # Don't raise from teardown; restore XML and continue cleanup.
+                    test.log.exception("TEST_TEARDOWN: Serial login also failed")
+                    return
+
+            utils_test.update_boot_option(
+                vm, args_removed=kernel_params_remove,
+                guest_arch_name=vm_arch_name)
+    finally:
+        try:
+            if session:
+                session.close()
+        finally:
+            if vm.is_alive():
+                vm.destroy()
+            bkxml.sync()
🧰 Tools
🪛 Ruff (0.14.10)

369-369: Do not catch blind exception: Exception

(BLE001)


375-376: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
around lines 357 - 387, The teardown currently can replace an original test
exception and skip bkxml.sync() because it raises from the nested login except
and doesn't guarantee session.close()/vm.destroy() run; modify the
kernel_params_remove block to capture any exception from the main test, then
perform login attempts (vm.wait_for_login / vm.wait_for_serial_login) without
re-raising so they don't mask the original error, and wrap update_boot_option,
session.close(), and vm.destroy() in a finally that always runs; ensure
bkxml.sync() is called after that finally (or call bkxml.sync() in the outer
finally) and reference functions/variables kernel_params_remove, vm.is_alive(),
vm.destroy(), vm.start(), vm.wait_for_login(), vm.wait_for_serial_login(),
utils_test.update_boot_option(), session.close(), and bkxml.sync() when applying
the changes.

Comment on lines +365 to +377
try:
test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
login_timeout)
session = vm.wait_for_login(timeout=login_timeout)
except Exception as ssh_err:
test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err)
test.log.debug("TEST_TEARDOWN: Trying serial console login")
try:
session = vm.wait_for_serial_login(timeout=login_timeout)
except Exception as serial_err:
test.log.error("TEST_TEARDOWN: Serial login also failed: %s",
serial_err)
raise
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid “blind” except Exception (or at least log stack traces consistently)

Ruff’s BLE001/TRY400 are pointing at a real maintainability issue here: catching Exception twice makes it easy to hide unexpected teardown bugs. If the framework has specific login exceptions (e.g., remote.LoginTimeoutError, aexpect.Expect*), prefer catching those; otherwise keep the broad catch but use test.log.exception(...) so you don’t lose stack traces. (The diff above already adjusts logging.)

🧰 Tools
🪛 Ruff (0.14.10)

369-369: Do not catch blind exception: Exception

(BLE001)


375-376: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
around lines 365 - 377, The teardown currently catches broad Exception for
vm.wait_for_login and vm.wait_for_serial_login which can hide unexpected errors;
update the try/except to catch specific login-related exceptions if available
(e.g., remote.LoginTimeoutError, aexpect.Expect* or whatever framework exposes)
for both vm.wait_for_login and vm.wait_for_serial_login, and if you must keep a
broad except, replace test.log.error/test.log.warning calls with
test.log.exception(...) to record stack traces (ensure the exception variable
like ssh_err/serial_err is passed or included). Locate the block around
vm.wait_for_login and vm.wait_for_serial_login and apply these changes
consistently so both failure paths log full traceback or narrow-catch the proper
exception types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant