Skip to content

Conversation

@heywji
Copy link
Contributor

@heywji heywji commented Dec 17, 2025

Refactor the test to use a percentage-based memory leak threshold instead of a fixed value, making it more reliable for VMs with different memory sizes. The wait times for hotplug and unplug events are also now configurable parameters.

ID: 1480
Signed-off-by: Wenkang Ji [email protected]

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable sleep timing parameters for network interface hotplug and hotunplug operations.
    • Implemented per-operating system memory leak detection thresholds using percentage-based evaluation metrics.
    • Enhanced memory leak reporting to include both absolute and relative measurements for improved analysis.

✏️ Tip: You can customize this high-level summary in your review settings.

Refactor the test to use a percentage-based memory leak threshold
instead of a fixed value, making it more reliable for VMs with different
memory sizes. The wait times for hotplug and unplug events are also now
configurable parameters.

Signed-off-by: Wenkang Ji <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

Walkthrough

The changes configure memory leak detection thresholds and timing parameters for NIC hotplug testing. The configuration file introduces hotplug and hotunplug sleep durations set to 3 seconds each, along with per-OS memory leak thresholds via a variants block (1.0% for Windows, 0.2% for Linux). The test script replaces fixed 3-second sleep durations with configurable parameters and implements percentage-based memory leak evaluation, computing the ratio of memory reduced to free memory before hotplug and comparing against configurable thresholds instead of hard-coded logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Percentage-based calculation logic: Verify that the mem_reduced_percent formula (mem_reduced / free_mem_before_nichotplug) * 100 correctly represents the intended metric and handles edge cases (e.g., zero free memory).
  • Parameter parsing and defaults: Ensure hotplug_sleep and hotunplug_sleep parameters are correctly parsed from configuration and have appropriate fallback values if missing.
  • Per-OS threshold application: Confirm that the variants in the configuration file correctly apply OS-specific thresholds (1.0% vs. 0.2%) and that the Python code reads these values appropriately.
  • Error messaging accuracy: Review updated error messages and error_context outputs to ensure they accurately reflect both absolute memory values (in KB) and percentage metrics with threshold comparisons.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: making the NIC hotplug memory leak test configurable through percentage-based thresholds and configurable wait times.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@heywji
Copy link
Contributor Author

heywji commented Dec 17, 2025

@yanglei-rh Hi Lei, could you please help review this patch? Especially the "mem_leak_threshold_percent" for your Linux side, if it makes sense.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qemu/tests/memory_leak_after_nichotplug.py (1)

66-66: Inconsistent with PR objectives: Linux path still uses hardcoded sleep.

The PR description states "Makes wait times for hotplug and unplug events configurable parameters," but the Linux code path (line 66) still uses a hardcoded time.sleep(3). This is inconsistent with the Windows implementation (lines 43-44, 54, 56) and the configuration parameters defined in the cfg file.

Apply this diff to use the configurable parameter:

+    hotplug_sleep = int(params.get("hotplug_sleep", 3))
+    hotunplug_sleep = int(params.get("hotunplug_sleep", 3))
     session.cmd_output_safe("swapoff -a")
     mac = vm.get_mac_address()
     guest_nic = utils_net.get_linux_ifname(session, mac)
     for i in range(1, 300):
         session.cmd_output_safe(
             "ip link add link %s name %s.%s type vlan id %s"
             % (guest_nic, guest_nic, i, i)
         )
-    time.sleep(3)
+    time.sleep(hotplug_sleep)
     for i in range(1, 300):
         session.cmd_output_safe("ip link delete %s.%s" % (guest_nic, i))
+        time.sleep(hotunplug_sleep)
🧹 Nitpick comments (1)
qemu/tests/cfg/memory_leak_after_nichotplug.cfg (1)

6-8: Clarify comment to cover both parameters.

The comment mentions "device initialization" but hotunplug_sleep also needs clarification. Consider specifying that both parameters are in seconds and apply to different operations.

Apply this diff to improve clarity:

-    # Wait time for device initialization (seconds)
+    # Wait time in seconds: hotplug_sleep for device initialization, hotunplug_sleep for device removal
     hotplug_sleep = 3
     hotunplug_sleep = 3
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89b4300 and edaa6e5.

📒 Files selected for processing (2)
  • qemu/tests/cfg/memory_leak_after_nichotplug.cfg (1 hunks)
  • qemu/tests/memory_leak_after_nichotplug.py (3 hunks)
🔇 Additional comments (5)
qemu/tests/cfg/memory_leak_after_nichotplug.cfg (1)

9-14: LGTM!

The OS-specific thresholds are well-chosen, with Windows having a higher tolerance (1.0%) than Linux (0.2%), which aligns with typical OS memory management differences.

qemu/tests/memory_leak_after_nichotplug.py (4)

43-44: LGTM!

The configurable sleep parameters are correctly parsed with sensible defaults for the Windows hotplug flow.


54-56: LGTM!

The configurable sleep times are correctly applied in the Windows hotplug/unplug loop.


77-82: LGTM!

The percentage-based comparison logic and updated error message are clear and informative, providing both absolute and relative metrics for debugging.


83-86: LGTM!

The context message correctly displays both absolute and percentage values for the memory reduction.

Comment on lines +74 to +75
mem_reduced_percent = (mem_reduced / free_mem_before_nichotplug) * 100
threshold_percent = float(params.get("mem_leak_threshold_percent"))
Copy link

@coderabbitai coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Division by zero and missing default value.

Two critical issues:

  1. Division by zero: Line 74 will raise ZeroDivisionError if free_mem_before_nichotplug is 0.
  2. Missing default: Line 75 calls float() on params.get("mem_leak_threshold_percent") without a default value. If the parameter is missing, params.get() returns None and float(None) raises TypeError.

Apply this diff to add safeguards:

     mem_reduced = free_mem_before_nichotplug - free_mem_after_nichotplug
-    mem_reduced_percent = (mem_reduced / free_mem_before_nichotplug) * 100
-    threshold_percent = float(params.get("mem_leak_threshold_percent"))
+    
+    if free_mem_before_nichotplug == 0:
+        test.error("Free memory before hotplug is 0, cannot calculate percentage")
+    
+    mem_reduced_percent = (mem_reduced / free_mem_before_nichotplug) * 100
+    threshold_percent = float(params.get("mem_leak_threshold_percent", "1.0"))
🤖 Prompt for AI Agents
In qemu/tests/memory_leak_after_nichotplug.py around lines 74-75, protect
against division-by-zero and missing parameter: before computing
mem_reduced_percent, check if free_mem_before_nichotplug is zero and set
mem_reduced_percent to 0 (or skip/fail the check) to avoid ZeroDivisionError;
when reading the threshold, use params.get("mem_leak_threshold_percent", "0")
(or another sensible default string) and then cast with float(...) so float()
never receives None—optionally validate the parsed float and raise a clear error
if it's not numeric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, will include it together when Lei also reviews the Linux value part.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heywji, sounds good! I'll wait for the updated changes after Lei's review. Feel free to tag me if you have any questions when implementing the fixes.

pci_model = params.get("pci_model")
netdst = params.get("netdst", "virbr0")
nettype = params.get("nettype", "bridge")
hotplug_sleep = int(params.get("hotplug_sleep", 3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please switch to get_numeric.

queues=params.get("queues"),
)
time.sleep(3)
time.sleep(hotplug_sleep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't would like to using hard code, you can combine them an one parameters . While hotplug_sleep and hotunplug_sleep are clear, they are somewhat redundant.

Copy link
Contributor Author

@heywji heywji Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you. We can set the same value for the hotplug and hot-unplug operation, which doesn't matter and makes the code cleaner.

@yanglei-rh
Copy link
Contributor

Hi @heywji I don't object to you using percentages to calculate the results. But please double-check with the developers to ensure the count is correct, so we don't overlook any bug. And remember to paste the results of your discussion with the developers into the automation bug.

@heywji
Copy link
Contributor Author

heywji commented Dec 18, 2025

@yanglei-rh OK, but does the 0.2% make sense for your Linux side? Could you please help confirm that as well?

@yanglei-rh
Copy link
Contributor

@yanglei-rh OK, but does the 0.2% make sense for your Linux side? Could you please help confirm that as well?

In fact, it not related to windows or linux guest, you should confirm it with memory developer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants