
WIP: feat: implement context manager-based resource cleanup #246

Closed
krcmarik wants to merge 2 commits into RedHatQE:main from krcmarik:_vm_cleanup

Conversation

@krcmarik
Contributor

@krcmarik krcmarik commented Jan 12, 2026

Implement automatic resource cleanup using Python context managers with
immediate cleanup when contexts exit, eliminating resource accumulation
during test sessions.

Key changes:

  • Convert create_and_store_resource() to @contextmanager for automatic cleanup
  • Convert create_storagemap_and_networkmap() and helpers to context managers
  • Convert run_migration() to context manager that yields Plan
  • Update migrate_vms() to use context manager for Plan/Migration
  • Update all tests to use with blocks for explicit resource scope
  • Mark cleaned resources in tracking (cleaned_by_context) for session verification
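
A rough sketch of what the create_and_store_resource() conversion looks like — the Resource stub, the deploy()/clean_up() method names, and the skip_teardown default are assumptions for illustration, not the project's actual API:

```python
from contextlib import contextmanager
from typing import Iterator, TypeVar


class Resource:
    """Stand-in for the project's base resource class (assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def deploy(self) -> "Resource":
        return self

    def clean_up(self, wait: bool = True) -> None:
        pass


T = TypeVar("T", bound=Resource)


@contextmanager
def create_and_store_resource(resource: T, skip_teardown: bool = False) -> Iterator[T]:
    """Deploy a resource, yield it, and clean it up when the context exits."""
    deployed = resource.deploy()
    try:
        yield deployed
    finally:
        # Immediate cleanup on context exit unless teardown is explicitly skipped
        if not skip_teardown:
            deployed.clean_up(wait=True)
```

The try/finally around the yield is what makes cleanup run even when the test body raises, which is the core of the "immediate cleanup when contexts exit" behavior described above.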

Resource cleanup order (outer to inner):

  1. NetworkMap - deleted first (inner context)
  2. StorageMap - deleted second
  3. Migration - deleted third
  4. Plan - deleted last (outer context)
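
A minimal, self-contained illustration of this last-in-first-out unwinding (resource names only, no real cleanup logic):

```python
from contextlib import contextmanager

teardown_order = []


@contextmanager
def managed(name):
    """Record the teardown order as each context exits."""
    try:
        yield name
    finally:
        teardown_order.append(name)


# Plan is the outermost context, NetworkMap the innermost
with managed("Plan"), managed("Migration"), managed("StorageMap"), managed("NetworkMap"):
    pass

# Contexts unwind innermost-first: NetworkMap, StorageMap, Migration, Plan
```

Nested with blocks always exit in reverse order of entry, so the outer-to-inner numbering above falls out of the nesting for free.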

Implementation follows the context manager pattern from the plan:

  • Phase 1: Added --skip-teardown CLI option and fixture
  • Phase 2: Converted create_and_store_resource() to context manager
  • Phase 3: Converted migration helpers to context managers
  • Phase 4: Updated migrate_vms() and run_migration()
  • Phase 5: Updated all test files to use with blocks

Critical fixes applied:

  • run_migration() yields Plan instead of returning (keeps context alive)
  • Resources marked as cleaned but kept in tracking for session teardown
  • Proper nested context cleanup order maintained
  • Fixed all mypy type annotation errors

Tests now use explicit with blocks showing resource lifecycle:

    with create_storagemap_and_networkmap(...) as (sm, nm):
        with run_migration(...) as plan:
            # Use resources
            pass
        # Plan and Migration cleaned up here
    # Storage and Network maps cleaned up here

Summary by CodeRabbit

  • New Features

    • Added a new SecretCreationError to surface secret creation failures.
    • Resource creation functions now support context-managed usage and optional skip-teardown control.
  • Bug Fixes

    • More reliable automatic cleanup with improved teardown ordering and logging.
    • Reduced risk of leftover resources by honoring session-level teardown flags.
  • Tests

    • Migration tests updated to use scoped context-managed resource maps for safer setup/teardown.


@coderabbitai
Contributor

coderabbitai bot commented Jan 12, 2026

Walkthrough

This PR refactors resource lifecycle management to use context managers throughout fixtures, utilities, and tests. Many resource-creation functions now return context-managed iterators (via @contextmanager) and accept a new skip_teardown option; tests and fixtures were updated to acquire and consume those resources inside with blocks to ensure deterministic cleanup.

Changes

Cohort / File(s) — Summary

  • Context Manager Core (utilities/resources.py, CLAUDE.md): Converted create_and_store_resource to a @contextmanager yielding Iterator[T]; added T = TypeVar("T", bound=Resource) and a skip_teardown: bool parameter.
  • Migration Flow (utilities/mtv_migration.py): Converted run_migration, get_storage_migration_map, get_network_migration_map, and create_storagemap_and_networkmap to context managers (Iterator[...] return types); nested managers for storage/network maps; archive/teardown moved into context-exit paths; migrate flow now runs inside with blocks.
  • Fixture & Test Integration (conftest.py, tests/*_migration.py — all three tests): Reworked fixtures to use ExitStack and yield resources from with blocks; added a session-level skip_teardown flag and propagated it into teardown logic; tests updated to call migration-map creators with "with ... as ..." and run migrate_vms inside those blocks.
  • Provider / Secrets / VMs (utilities/utils.py, exceptions/exceptions.py): Secret creation moved into context-managed flows; introduced SecretCreationError; create_source_provider and CNV VM creation now use nested with contexts and skip_teardown=True for session-tracked VMs.
  • Tests: map usage changes (tests/test_copyoffload_migration.py, tests/test_mtv_cold_migration.py, tests/test_mtv_warm_migration.py): Replaced direct map-return calls with context-managed "with get_*_migration_map(...) as ..." and "with create_storagemap_and_networkmap(...) as (...)"; relocated migrate_vms calls into those with blocks so maps are scoped and cleaned up automatically.
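
The ExitStack fixture pattern mentioned above can be sketched roughly like this — fake_resource, the map names, and plan_fixture are illustrative stand-ins, not the repository's actual fixtures:

```python
from contextlib import ExitStack, contextmanager
from typing import Iterator, List, Tuple


@contextmanager
def fake_resource(name: str, log: List[str]) -> Iterator[str]:
    """Illustrative stand-in for a context-managed resource creator."""
    try:
        yield name
    finally:
        log.append(f"cleaned {name}")


def plan_fixture(log: List[str]) -> Iterator[Tuple[str, str]]:
    """Generator shaped like a pytest yield-fixture that owns several contexts."""
    with ExitStack() as stack:
        storage_map = stack.enter_context(fake_resource("storage-map", log))
        network_map = stack.enter_context(fake_resource("network-map", log))
        # pytest would run the test body while suspended at this yield
        yield storage_map, network_map
    # On resume, ExitStack unwinds the registered contexts in LIFO order
```

ExitStack lets a single fixture enter an arbitrary number of context managers and guarantees they are all exited, in reverse order, when the fixture finalizes.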

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The PR title is vague and uses a generic 'WIP: feat:' prefix that doesn't clearly summarize the main objective of immediate VM cleanup to prevent resource exhaustion. Resolution: revise to a specific, concrete title like 'feat: implement immediate VM cleanup in plan fixture' or 'feat: delete destination VMs after each test to prevent resource exhaustion'.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.



@redhat-qe-bot1

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Pre-commit Checks: pre-commit runs automatically if .pre-commit-config.yaml exists
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest pre-commit - Run pre-commit hooks and checks
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 0 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No WIP, hold, or conflict labels
  5. Verified: PR must be marked as verified (if verification is enabled)

📊 Review Process

Approvers and Reviewers

Approvers:

  • myakove

Reviewers:

  • AmenB
  • Chenli-Hu
  • krcmarik
  • myakove
  • rnetser
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @conftest.py:
- Around line 739-740: Replace the LOGGER.error call in the VM cleanup except
block so that the full traceback is preserved: change the except block that
catches Exception (current code using LOGGER.error(f"Failed to delete VM
{vm['name']}: {exc}")) to call LOGGER.exception with %-style formatting (e.g.,
LOGGER.exception("Failed to delete VM %s", vm["name"])) and remove the unused
exc variable; this ensures the stack trace is logged and avoids f-string lint
warnings.
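
The suggested fix amounts to the following pattern — delete_vm here is a hypothetical stand-in for the real deletion call, not the project's code:

```python
import logging

LOGGER = logging.getLogger("vm-cleanup")


def delete_vm(vm: dict) -> None:
    """Hypothetical stand-in for the real VM deletion call."""
    raise RuntimeError("simulated delete failure")


def cleanup_vm(vm: dict) -> None:
    try:
        delete_vm(vm)
    except Exception:
        # LOGGER.exception logs at ERROR level and appends the traceback;
        # %-style arguments defer string interpolation (avoids Ruff G004).
        LOGGER.exception("Failed to delete VM %s", vm["name"])
```

Because logging.exception captures the active exception automatically, the "as exc" binding becomes unnecessary and the full stack trace survives into the logs.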
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2063edf and 66a9ff7.

📒 Files selected for processing (2)
  • conftest.py
  • utilities/pytest_utils.py
💤 Files with no reviewable changes (1)
  • utilities/pytest_utils.py
🧰 Additional context used
🪛 Ruff (0.14.10)
conftest.py

737-737: Logging statement uses f-string

(G004)


739-739: Do not catch blind exception: Exception

(BLE001)


740-740: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


740-740: Logging statement uses f-string

(G004)

🔇 Additional comments (1)
conftest.py (1)

727-738: LGTM! Immediate VM cleanup aligns with PR objectives

The cleanup logic correctly:

  • Targets destination VMs in target_namespace (not source VMs)
  • Uses exists check to avoid NotFoundError on already-deleted VMs
  • Employs wait=True for synchronous deletion, preventing resource contention between sequential tests
  • Handles the case where vm["name"] was updated to the migrated name on line 711

This directly addresses the resource exhaustion issue on smaller clusters.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @conftest.py:
- Around line 739-740: Remove the unused exception variable by changing the
except clause from "except Exception as exc:" to "except Exception:" around the
block that calls LOGGER.exception("Failed to delete VM %s", vm["name"]);
LOGGER.exception already logs the current exception context so drop the "as exc"
binding and leave the LOGGER.exception call unchanged.
- Line 737: Replace the eager f-string in the LOGGER.info call with lazy logging
interpolation: change the LOGGER.info invocation that currently uses f"Deleting
VM {vm_obj.name} after test" to use a format string and argument(s) so the
message is constructed only if the log level is enabled (i.e.,
LOGGER.info("Deleting VM %s after test", vm_obj.name)); update the specific
LOGGER.info call that references vm_obj.name accordingly.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66a9ff7 and 4a99f8a.

📒 Files selected for processing (1)
  • conftest.py
🧰 Additional context used
🪛 Ruff (0.14.10)
conftest.py

737-737: Logging statement uses f-string

(G004)


739-739: Local variable exc is assigned to but never used

Remove assignment to unused variable exc

(F841)

🔇 Additional comments (2)
conftest.py (2)

727-729: LGTM on the cleanup strategy.

Moving VM cleanup from session-level to function-level addresses the resource exhaustion issue on smaller clusters. The comment about Pods being garbage collected via ownerReferences is accurate — Kubernetes will handle pod cleanup automatically.


730-740: No issues found. The vm["name"] mutation at line 711 is intentional and correct. The source_provider.vm_dict() method returns the actual cloned/renamed VM name in source_vm_details["name"]. Updating vm["name"] with this value ensures cleanup uses the correct actual VM name in the target namespace, which is the proper design. The updated name is consistently used throughout (line 717 waits for the updated name, line 733 cleans up the updated name).

@krcmarik
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@krcmarik
Contributor Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @conftest.py:
- Around line 727-740: The LOGGER.info call uses an f-string while other logs
use lazy % formatting; change the info call inside the VM cleanup loop to use
lazy formatting by replacing the f-string with LOGGER.info("Deleting VM %s after
test", vm_obj.name) so logging is consistent with LOGGER.exception("Failed to
delete VM %s", vm["name"]) and avoids eager interpolation.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2063edf and c223a54.

📒 Files selected for processing (2)
  • conftest.py
  • utilities/pytest_utils.py
💤 Files with no reviewable changes (1)
  • utilities/pytest_utils.py
🧰 Additional context used
🪛 Ruff (0.14.10)
conftest.py

737-737: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: can-be-merged
  • GitHub Check: can-be-merged
  • GitHub Check: can-be-merged
  • GitHub Check: tox
  • GitHub Check: pre-commit
  • GitHub Check: build-container
🔇 Additional comments (1)
conftest.py (1)

729-740: Immediate per-test VM cleanup is a solid improvement.

This change directly addresses the resource exhaustion issue described in the PR objectives. Deleting VMs immediately after each test rather than deferring to session end prevents OCP clusters from running out of CPU/memory when running sequential warm migrations or migrations with target power state "on".

The assumption that Kubernetes will garbage-collect pods via ownerReferences is correct - VMs own their virt-launcher pods, so pod cleanup is automatic.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 12, 2026
"module": pod.__module__,
})
try:
vm_obj = VirtualMachine(
Collaborator


This change defeats the whole purpose of having a functional environment for debugging if someone needs it.
If the issue is resources, then I would prefer to stop the VMs rather than delete them here.
And do not remove the existing code; we also store them in order to get leftovers for the current run.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@conftest.py`:
- Around line 581-583: Replace the f-string log messages with lazy `%`-style
logging to match the codebase convention: change LOGGER.info(f"Created NAD:
{nad_name} in namespace {target_namespace}") to use LOGGER.info("Created NAD: %s
in namespace %s", nad_name, target_namespace) and change LOGGER.info(f"Created
{len(created_nads)} NADs for migration: {created_nads}") to LOGGER.info("Created
%d NADs for migration: %s", len(created_nads), created_nads); edit the logging
calls where LOGGER.info is used with variables nad_name, target_namespace,
created_nads to use the `%` formatting signature so interpolation is deferred.

In `@utilities/mtv_migration.py`:
- Line 202: Fix the typo in the error message logged by LOGGER.error in
utilities/mtv_migration.py: change the string "Destinaion provider:
{dest_provider.instance}" to "Destination provider: {dest_provider.instance}" so
the message reads correctly; locate the LOGGER.error call that references
dest_provider.instance and update the f-string accordingly.
- Around line 190-230: Replace bare LOGGER.error calls in the
TimeoutExpiredError handling with LOGGER.exception so the stack trace is
preserved (references: Plan.wait_for_condition, the TimeoutExpiredError except
block that logs plan and provider instances). In the finally block where
archive_plan(plan) is called, stop using LOGGER.error for the archive exception
and instead catch Exception as archive_exc and call LOGGER.exception("Failed to
archive Plan %s", plan.name) (referencing archive_plan and the archive_exc
variable) so the full traceback is kept; keep suppressing the archive exception
to avoid masking the original error but ensure the log contains the exception
details.

In `@utilities/resources.py`:
- Around line 99-106: The condition kwargs.get("teardown", True) is redundant
because the key is always set earlier (kwargs["teardown"] = not skip_teardown);
change the check to use the direct key (e.g., if kwargs["teardown"]: ) or read
into a local variable (teardown = kwargs["teardown"]; if teardown:) and keep the
existing removal/exception handling in the block that manipulates
fixture_store["teardown"][deployed.kind] and logs via LOGGER.debug.

In `@utilities/utils.py`:
- Around line 267-268: Replace the generic ValueError with a dedicated exception
type to allow programmatic handling: add a SecretCreationError (e.g., class
SecretCreationError(ValueError)) in your exceptions module and update the raise
site in utilities/utils.py to raise SecretCreationError("Failed to create source
provider secret") where source_provider_secret is checked; ensure any callers
that catch ValueError are updated if they should handle the new
SecretCreationError explicitly.
- Around line 327-351: The loop uses the create_and_store_resource context
manager with skip_teardown=True (resource class VirtualMachineFromInstanceType)
but immediately exits the with-block, which is semantically confusing; update
the code to make intent explicit by either adding a concise inline comment above
the with-call explaining that skip_teardown=True is intentionally used to only
register the VM in fixture_store while deferring cleanup to session teardown, or
replace the pattern with a small helper function (e.g.,
register_vm_without_teardown) that calls create_and_store_resource with
skip_teardown=True and returns the VM before appending to vms_to_create so the
purpose is clear; ensure references to create_and_store_resource, skip_teardown,
VirtualMachineFromInstanceType, and vms_to_create are updated accordingly.
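
The dedicated-exception suggestion above could be sketched like this — the subclass base and the require_secret call site are illustrations of the review's proposal, not the merged code:

```python
class SecretCreationError(ValueError):
    """Raised when the source provider secret cannot be created."""


def require_secret(source_provider_secret):
    """Sketch of the check at the raise site in utilities/utils.py."""
    if not source_provider_secret:
        # Subclassing ValueError keeps any existing `except ValueError`
        # callers working while allowing targeted handling of this failure.
        raise SecretCreationError("Failed to create source provider secret")
    return source_provider_secret
```

Callers that want to handle secret-creation failures specifically can now catch SecretCreationError, while broader ValueError handlers continue to behave as before.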
♻️ Duplicate comments (1)
conftest.py (1)

751-756: MEDIUM: Use LOGGER.exception for full traceback in cleanup failures.

Per static analysis (TRY400) and past review feedback, LOGGER.exception automatically captures the traceback, which is critical for diagnosing intermittent cleanup failures in CI.

♻️ Proposed fix
             try:
                 LOGGER.info("Deleting VM %s in plan fixture teardown", vm_obj.name)
                 vm_obj.clean_up(wait=True)
                 LOGGER.info("Successfully deleted VM %s", vm_obj.name)
-            except Exception as exc:
-                LOGGER.error("Failed to delete VM %s: %s", vm_obj.name, exc)
+            except Exception:
+                LOGGER.exception("Failed to delete VM %s", vm_obj.name)

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 20, 2026
- Simplify redundant condition check in utilities/resources.py
- Add custom SecretCreationError exception for clearer error handling
- Replace generic ValueError with SecretCreationError in utilities/utils.py
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

@krcmarik krcmarik changed the title from "Implement immediate VM cleanup in plan fixture" to "feat: implement context manager-based resource cleanup" Jan 20, 2026
@krcmarik
Contributor Author

/wip

@redhat-qe-bot2 redhat-qe-bot2 changed the title from "feat: implement context manager-based resource cleanup" to "WIP: feat: implement context manager-based resource cleanup" Jan 26, 2026
@krcmarik krcmarik closed this Mar 6, 2026
@krcmarik krcmarik deleted the _vm_cleanup branch March 11, 2026 15:22
