-
Notifications
You must be signed in to change notification settings - Fork 184
memory.devices: Fix teardown timeout by removing kernel param before XML restore #6760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reorder teardown operations to use current working VM config instead of backup config when removing kernel parameters. Add serial console fallback if SSH login fails.
WalkthroughThis change modifies the teardown method in a test file that handles virtio memory device size requests. The teardown logic now includes enhanced VM cleanup when kernel parameters are removed: the VM is destroyed if alive and restarted, followed by an attempted SSH login with a configurable timeout that falls back to serial console login on failure. Once a session is established, boot options are updated with the removed kernel parameter, the session closes, and final VM cleanup occurs. A Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
|
Before fix After fix |
1 similar comment
|
Before fix After fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py:
- Around line 365-377: The teardown currently catches broad Exception for
vm.wait_for_login and vm.wait_for_serial_login which can hide unexpected errors;
update the try/except to catch specific login-related exceptions if available
(e.g., remote.LoginTimeoutError, aexpect.Expect* or whatever framework exposes)
for both vm.wait_for_login and vm.wait_for_serial_login, and if you must keep a
broad except, replace test.log.error/test.log.warning calls with
test.log.exception(...) to record stack traces (ensure the exception variable
like ssh_err/serial_err is passed or included). Locate the block around
vm.wait_for_login and vm.wait_for_serial_login and apply these changes
consistently so both failure paths log full traceback or narrow-catch the proper
exception types.
- Around line 357-387: The teardown currently can replace an original test
exception and skip bkxml.sync() because it raises from the nested login except
and doesn't guarantee session.close()/vm.destroy() run; modify the
kernel_params_remove block to capture any exception from the main test, then
perform login attempts (vm.wait_for_login / vm.wait_for_serial_login) without
re-raising so they don't mask the original error, and wrap update_boot_option,
session.close(), and vm.destroy() in a finally that always runs; ensure
bkxml.sync() is called after that finally (or call bkxml.sync() in the outer
finally) and reference functions/variables kernel_params_remove, vm.is_alive(),
vm.destroy(), vm.start(), vm.wait_for_login(), vm.wait_for_serial_login(),
utils_test.update_boot_option(), session.close(), and bkxml.sync() when applying
the changes.
🧹 Nitpick comments (2)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py (2)
409-409: Hardenlogin_timeoutparsing to avoidValueErrorfrom malformed params
int(params.get("login_timeout", 360))will throw iflogin_timeoutis"","None", etc., which can break teardown unexpectedly.Proposed tweak
- login_timeout = int(params.get("login_timeout", 360)) + try: + login_timeout = int(params.get("login_timeout", 360)) + except (TypeError, ValueError): + login_timeout = 360
360-362: Double-destroy in teardown may be redundant (and add time)You explicitly
vm.destroy()before starting, and again at the end. Given the framework commonly handles VM cleanup in teardown paths already (per retrieved learnings), consider whether the extra destroy(s) are necessary here, especially since this change is trying to reduce timeout risk. Based on learnings, cleanup is typically centralized in teardown helpers.Also applies to: 384-386
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
🧬 Code graph analysis (1)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py (1)
libvirt/tests/src/serial/serial_functional.py (1)
wait_for_login(235-258)
🪛 Ruff (0.14.10)
libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
369-369: Do not catch blind exception: Exception
(BLE001)
375-376: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
| if kernel_params_remove: | ||
| if not vm.is_alive(): | ||
| vm.start() | ||
| test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'", | ||
| kernel_params_remove) | ||
| if vm.is_alive(): | ||
| vm.destroy() | ||
| vm.start() | ||
|
|
||
| session = None | ||
| try: | ||
| test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)", | ||
| login_timeout) | ||
| session = vm.wait_for_login(timeout=login_timeout) | ||
| except Exception as ssh_err: | ||
| test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err) | ||
| test.log.debug("TEST_TEARDOWN: Trying serial console login") | ||
| try: | ||
| session = vm.wait_for_serial_login(timeout=login_timeout) | ||
| except Exception as serial_err: | ||
| test.log.error("TEST_TEARDOWN: Serial login also failed: %s", | ||
| serial_err) | ||
| raise | ||
|
|
||
| utils_test.update_boot_option( | ||
| vm, args_removed=kernel_params_remove, guest_arch_name=vm_arch_name) | ||
| vm, args_removed=kernel_params_remove, | ||
| guest_arch_name=vm_arch_name) | ||
| session.close() | ||
|
|
||
| if vm.is_alive(): | ||
| vm.destroy() | ||
|
|
||
| bkxml.sync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find wait_for_serial_login and update_boot_option definitions
echo "=== Searching for wait_for_serial_login ==="
rg -n "def wait_for_serial_login" --type py
echo "=== Searching for update_boot_option definition ==="
rg -n "def update_boot_option" --type py
echo "=== Searching for teardown_test and test structure ==="
rg -n "def teardown_test|test_obj.teardown" --type py | head -20Repository: autotest/tp-libvirt
Length of output: 2178
🏁 Script executed:
#!/bin/bash
# Read the file to understand imports and structure
echo "=== Reading the reviewed file ==="
wc -l libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
head -80 libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
echo -e "\n=== Searching for utils_test import and usage ==="
grep -n "import.*utils_test\|from.*utils_test\|utils_test\." libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | head -10
echo -e "\n=== Searching for update_boot_option in utils modules ==="
find . -path ./libvirt/tests -name "utils_test.py" 2>/dev/null | head -5
rg -l "def update_boot_option" --type py | head -10Repository: autotest/tp-libvirt
Length of output: 2867
🏁 Script executed:
#!/bin/bash
echo "=== Reading teardown_test context (lines 350-420) ==="
sed -n '350,420p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
echo -e "\n=== Searching for wait_for_serial_login ==="
rg "wait_for_serial_login" --type py -A 2 | head -30
echo -e "\n=== Looking for teardown_test() call site (finally block) ==="
sed -n '390,424p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.pyRepository: autotest/tp-libvirt
Length of output: 6266
🏁 Script executed:
#!/bin/bash
echo "=== Checking if there's any test infrastructure teardown beyond local teardown_test ==="
grep -n "class\|def run\|finally:" libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | tail -20
echo -e "\n=== Full view of run_test function wrapper ==="
sed -n '240,424p' libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py | grep -E "^def run_test|^ try:|^ finally:|^ teardown_test"Repository: autotest/tp-libvirt
Length of output: 369
Teardown can suppress the original test failure and skip bkxml.sync() on login failure
If SSH and serial login both fail, the raise inside the nested try/except will propagate from the finally block, replacing any original test exception and preventing bkxml.sync() from running—leaving the VM XML unrestored. Additionally, session.close() and the final vm.destroy() are not protected by a finally block, so an exception in update_boot_option() can leak the session and leave the VM running.
Proposed fix (ensure restore and cleanup always happen, avoid masking original failures)
def teardown_test():
"""
Clean data.
"""
test.log.info("TEST_TEARDOWN: Clean up env.")
utils_memory.set_num_huge_pages(0)
- if kernel_params_remove:
- test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'",
- kernel_params_remove)
- if vm.is_alive():
- vm.destroy()
- vm.start()
-
- session = None
- try:
- test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
- login_timeout)
- session = vm.wait_for_login(timeout=login_timeout)
- except Exception as ssh_err:
- test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err)
- test.log.debug("TEST_TEARDOWN: Trying serial console login")
- try:
- session = vm.wait_for_serial_login(timeout=login_timeout)
- except Exception as serial_err:
- test.log.error("TEST_TEARDOWN: Serial login also failed: %s",
- serial_err)
- raise
-
- utils_test.update_boot_option(
- vm, args_removed=kernel_params_remove,
- guest_arch_name=vm_arch_name)
- session.close()
-
- if vm.is_alive():
- vm.destroy()
-
- bkxml.sync()
+ session = None
+ try:
+ if kernel_params_remove:
+ test.log.info("TEST_TEARDOWN: Removing kernel parameter '%s'",
+ kernel_params_remove)
+ if vm.is_alive():
+ vm.destroy()
+ vm.start()
+
+ try:
+ test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)",
+ login_timeout)
+ session = vm.wait_for_login(timeout=login_timeout)
+ except Exception:
+ test.log.exception("TEST_TEARDOWN: SSH login failed")
+ test.log.debug("TEST_TEARDOWN: Trying serial console login")
+ try:
+ session = vm.wait_for_serial_login(timeout=login_timeout)
+ except Exception:
+ # Don't raise from teardown; restore XML and continue cleanup.
+ test.log.exception("TEST_TEARDOWN: Serial login also failed")
+ return
+
+ utils_test.update_boot_option(
+ vm, args_removed=kernel_params_remove,
+ guest_arch_name=vm_arch_name)
+ finally:
+ try:
+ if session:
+ session.close()
+ finally:
+ if vm.is_alive():
+ vm.destroy()
+ bkxml.sync()🧰 Tools
🪛 Ruff (0.14.10)
369-369: Do not catch blind exception: Exception
(BLE001)
375-376: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
around lines 357 - 387, The teardown currently can replace an original test
exception and skip bkxml.sync() because it raises from the nested login except
and doesn't guarantee session.close()/vm.destroy() run; modify the
kernel_params_remove block to capture any exception from the main test, then
perform login attempts (vm.wait_for_login / vm.wait_for_serial_login) without
re-raising so they don't mask the original error, and wrap update_boot_option,
session.close(), and vm.destroy() in a finally that always runs; ensure
bkxml.sync() is called after that finally (or call bkxml.sync() in the outer
finally) and reference functions/variables kernel_params_remove, vm.is_alive(),
vm.destroy(), vm.start(), vm.wait_for_login(), vm.wait_for_serial_login(),
utils_test.update_boot_option(), session.close(), and bkxml.sync() when applying
the changes.
| try: | ||
| test.log.debug("TEST_TEARDOWN: Trying SSH login (timeout=%ds)", | ||
| login_timeout) | ||
| session = vm.wait_for_login(timeout=login_timeout) | ||
| except Exception as ssh_err: | ||
| test.log.warning("TEST_TEARDOWN: SSH login failed: %s", ssh_err) | ||
| test.log.debug("TEST_TEARDOWN: Trying serial console login") | ||
| try: | ||
| session = vm.wait_for_serial_login(timeout=login_timeout) | ||
| except Exception as serial_err: | ||
| test.log.error("TEST_TEARDOWN: Serial login also failed: %s", | ||
| serial_err) | ||
| raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid “blind” except Exception (or at least log stack traces consistently)
Ruff’s BLE001/TRY400 are pointing at a real maintainability issue here: catching Exception twice makes it easy to hide unexpected teardown bugs. If the framework has specific login exceptions (e.g., remote.LoginTimeoutError, aexpect.Expect*), prefer catching those; otherwise keep the broad catch but use test.log.exception(...) so you don’t lose stack traces. (The diff above already adjusts logging.)
🧰 Tools
🪛 Ruff (0.14.10)
369-369: Do not catch blind exception: Exception
(BLE001)
375-376: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In @libvirt/tests/src/memory/memory_devices/change_virtio_mem_request_size.py
around lines 365 - 377, The teardown currently catches broad Exception for
vm.wait_for_login and vm.wait_for_serial_login which can hide unexpected errors;
update the try/except to catch specific login-related exceptions if available
(e.g., remote.LoginTimeoutError, aexpect.Expect* or whatever framework exposes)
for both vm.wait_for_login and vm.wait_for_serial_login, and if you must keep a
broad except, replace test.log.error/test.log.warning calls with
test.log.exception(...) to record stack traces (ensure the exception variable
like ssh_err/serial_err is passed or included). Locate the block around
vm.wait_for_login and vm.wait_for_serial_login and apply these changes
consistently so both failure paths log full traceback or narrow-catch the proper
exception types.
Reorder teardown operations to use current working VM config instead of backup config when removing kernel parameters. Add serial console fallback if SSH login fails.
Couldn't reproduce the failure locally but adding these changes proactively to prevent potential timeout issues in CI.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.