Skip to content

Conversation

@deanq
Copy link
Member

@deanq deanq commented Nov 25, 2025

Summary

Refactors the flash undeploy command architecture in response to PR #121 review feedback. Moves endpoint deletion logic from CLI to proper abstraction layers following polymorphic design patterns.

Problem

The reviewer (jhcipar) noted that _delete_endpoint() logic was tightly coupled to the CLI layer, suggesting it should be self-contained in ResourceManager or another layer for better separation of concerns and reusability.

Solution

Implemented a comprehensive architectural refactoring following object-oriented design principles:

1. Abstract Contract - DeployableResource Base Class

Added abstract undeploy() method to establish polymorphic lifecycle contract:

@abstractmethod
async def undeploy(self) -> bool:
    """Undeploy/delete the resource."""

2. Concrete Implementations

  • ServerlessResource: Full implementation using RunpodGraphQLClient
  • NetworkVolume: NotImplementedError stub with helpful message for future implementation

3. Orchestration Layer - ResourceManager

Refactored API to clarify public vs internal interfaces:

Protected methods (internal state management):

  • add_resource()_add_resource()
  • remove_resource()_remove_resource()

New public method:

  • undeploy_resource(resource_id, resource_name) - Orchestrates polymorphic undeploy + tracking removal

Public API surface:

  • get_or_deploy_resource() - Deploy resources
  • undeploy_resource() - Remove resources
  • list_all_resources() - Query tracking
  • find_resources_by_name() - Search by name

4. Presentation Layer - CLI

CLI now acts as thin presentation layer:

  • Removed _delete_endpoint() helper
  • Uses manager.undeploy_resource() throughout
  • Focuses on user interaction, confirmation prompts, and output formatting

Benefits

  1. Separation of Concerns: CLI handles UI, ResourceManager orchestrates, resources manage their own lifecycle
  2. Polymorphism: Each resource type knows how to undeploy itself
  3. Reusability: undeploy_resource() can be used programmatically outside CLI
  4. Extensibility: Easy to add undeploy support for new resource types
  5. Testability: Clean boundaries make mocking and testing easier
  6. Consistency: Symmetric deploy()/undeploy() API

Changes by Commit

Commit 1: Add abstract undeploy() method

  • Establishes polymorphic contract in DeployableResource base class
  • Mirrors existing deploy() pattern

Commit 2: Implement undeploy() in resource classes

  • ServerlessResource: Full implementation with error handling
  • NetworkVolume: NotImplementedError stub for future work

Commit 3: Refactor ResourceManager API

  • Protected internal methods: _add_resource(), _remove_resource()
  • Public orchestration: undeploy_resource()
  • Clear separation of concerns

Commit 4: Update CLI to use new architecture

  • Remove _delete_endpoint() from undeploy.py
  • Use manager.undeploy_resource() in all undeploy modes
  • CLI becomes thin presentation layer

Commit 5: Update all tests

  • 253 tests passing with 56.78% coverage
  • Updated mocking to use new API
  • Removed obsolete TestDeleteEndpoint class

Testing

  • ✅ All 253 unit tests passing
  • ✅ Code coverage: 56.78% (exceeds 35% requirement)
  • ✅ Linting: Clean (ruff check)
  • ✅ Formatting: Clean (ruff format)
  • ✅ All undeploy modes tested: list, by-name, --all, --interactive, --cleanup-stale

Related

deanq added 12 commits November 25, 2025 00:04
Implements comprehensive error handling for missing API keys with
actionable guidance for users.

When RUNPOD_API_KEY is missing, users now receive helpful error
messages that include:
- Documentation URL for obtaining API keys
- Three setup methods (env var, .env file, shell profile)
- Context about which operation requires the key
- Troubleshooting guidance for .env file location

Implementation:
- Created RunpodAPIKeyError with helpful default message
- Added validate_api_key() helper functions
- Updated API clients to use custom exception
- Added resource deployment error context
- Enhanced flash init with API key documentation link
- Added 15 comprehensive tests (all passing)

Users previously saw generic "Runpod API key is required" errors.
Now they get clear, actionable steps to resolve the issue.
Simplify validation logic and extract duplicated error handling:

- Simplify API key validation condition in validation.py
  Changed `api_key.strip() == ""` to `not api_key.strip()` for clarity

- Extract duplicated error handling in resource_manager.py
  Created `_deploy_with_error_context` helper method to handle
  RunpodAPIKeyError with resource context, eliminating code duplication

All tests pass (231 unit tests).
Implements comprehensive undeploy command with multiple interaction modes:
- List all tracked endpoints with status indicators
- Delete specific endpoint by name
- Delete all endpoints with double confirmation
- Interactive checkbox selection for batch deletion

Changes:
- Add undeploy.py command with Rich/questionary UI
- Extend ResourceManager with list_all_resources() and find_resources_by_name()
- Integrate undeploy command into CLI
- Add comprehensive unit tests with mocking
- Fix duplicate import in resource_manager.py

Safety features include confirmation prompts, keyboard interrupt handling,
and detailed error reporting per endpoint.
Fixes two critical bugs preventing flash undeploy from working correctly:

1. GraphQL deleteEndpoint success detection
   - RunPod API returns {"deleteEndpoint": null} on success
   - Changed from checking null value to checking key presence
   - Added detailed comments explaining GraphQL response pattern

2. Unclosed aiohttp session
   - Wrapped RunpodGraphQLClient in async context manager
   - Ensures proper session cleanup after deletion
   - Eliminates "Unclosed client session" warnings

3. Updated test mocks to support async context manager usage

Tested with real endpoint deletion - successfully removes endpoint
from both RunPod and local .tetra_resources.pkl tracking file.
Adds ability to clean up stale endpoint entries from .tetra_resources.pkl
when endpoints have been deleted externally (via RunPod UI/API).

Changes:
- Added --cleanup-stale flag to flash undeploy command
- New _cleanup_stale_endpoints() function that:
  - Checks all tracked endpoints for inactive status
  - Lists inactive endpoints for user review
  - Prompts for confirmation before removal
  - Removes only from tracking (endpoints already deleted remotely)
- Added imports: Dict, DeployableResource, Confirm

Use case:
When users delete endpoints via RunPod UI/API instead of flash CLI,
the tracking file (.tetra_resources.pkl) becomes stale. This flag
identifies and removes those orphaned tracking entries.

Usage:
  flash undeploy --cleanup-stale

Note: The "Status" column in `flash undeploy list` makes a health check
API call for each endpoint to determine Active/Inactive state. While this
adds latency (6 endpoints = 6 API calls), it's valuable for identifying
stale entries that need cleanup.
- Remove debugging print statements from test_undeploy.py
- Use Tuple from typing module instead of lowercase tuple for Python 3.9 consistency
- Update type hints in undeploy.py and resource_manager.py
- Add Tuple to imports in both files

All 289 tests pass, quality checks pass.
Add abstract undeploy() method to establish polymorphic contract for
resource lifecycle management. Mirrors existing deploy() pattern for
symmetry.

Returns bool indicating success/failure of undeploy operation.
…rkVolume

ServerlessResource:
- Full implementation using RunpodGraphQLClient
- Handles missing endpoint ID gracefully
- Logs success/failure for debugging

NetworkVolume:
- NotImplementedError stub with helpful message
- Guides users to manual deletion via RunPod UI/API
- Ready for future implementation

Both implementations fulfill the DeployableResource contract.
…ndeploy_resource()

Refactor ResourceManager API to clarify public vs internal interfaces:

Protected methods (internal use):
- add_resource() → _add_resource()
- remove_resource() → _remove_resource()

New public method:
- undeploy_resource(resource_id, resource_name) -> Dict[str, Any]
  - Orchestrates polymorphic resource.undeploy()
  - Removes from tracking on success
  - Handles NotImplementedError gracefully
  - Returns detailed status dict for CLI consumption

Public API surface:
- get_or_deploy_resource() for deployment
- undeploy_resource() for removal
- list_all_resources() for querying
- find_resources_by_name() for searching

This establishes clear separation between internal state management
and public resource lifecycle operations.
…oy_resource()

Major architectural improvement moving deletion logic from CLI to
proper abstraction layers:

undeploy.py:
- Remove _delete_endpoint() helper function
- Remove RunpodGraphQLClient import
- Use manager.undeploy_resource() in all undeploy modes:
  - _undeploy_by_name()
  - _undeploy_all()
  - _interactive_undeploy()
- Use manager._remove_resource() in _cleanup_stale_endpoints()

resource.py:
- Update to use manager._remove_resource() for tracking cleanup
- Clarify comment: removal is tracking-only, no remote deletion

CLI now acts as thin presentation layer focused on user interaction,
while ResourceManager handles orchestration and resources handle their
own lifecycle. This addresses PR review feedback about separation of
concerns.
Update all tests to use new protected method names and public API:

test_resource_manager.py:
- Use _add_resource() instead of add_resource()
- Use _remove_resource() instead of remove_resource()

test_undeploy.py:
- Remove obsolete TestDeleteEndpoint class (_delete_endpoint no longer exists)
- Update mocking to use manager.undeploy_resource()
- Remove AsyncMock import (no longer needed)
- Mock asyncio.run to properly execute coroutines in tests

test_concurrency_issues.py:
- Add undeploy() implementation to MockDeployableResource
- Use _add_resource() instead of add_resource()

All 253 tests pass with 56.78% coverage.
@deanq deanq requested review from Copilot and jhcipar November 25, 2025 08:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the endpoint deletion logic to follow proper abstraction layers and polymorphic design patterns. The changes move the _delete_endpoint() function from the CLI layer into appropriate abstraction layers (resource classes and ResourceManager), improving separation of concerns and enabling reusability.

Key changes:

  • Added abstract undeploy() method to the DeployableResource base class
  • Implemented undeploy() in ServerlessResource and NetworkVolume classes
  • Refactored ResourceManager to make internal methods protected and added public undeploy_resource() method
  • Simplified CLI code to use the new ResourceManager API

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/tetra_rp/core/resources/base.py Added abstract undeploy() method to establish polymorphic contract
src/tetra_rp/core/resources/serverless.py Implemented undeploy() method for serverless endpoints
src/tetra_rp/core/resources/network_volume.py Added undeploy() stub that raises NotImplementedError
src/tetra_rp/core/resources/resource_manager.py Renamed public methods to protected (_add_resource, _remove_resource) and added public undeploy_resource() orchestration method
src/tetra_rp/cli/commands/undeploy.py Removed _delete_endpoint() helper and updated to use manager.undeploy_resource()
src/tetra_rp/cli/commands/resource.py Updated to use renamed _remove_resource() method
tests/unit/test_concurrency_issues.py Added undeploy() implementation to MockDeployableResource and updated method calls
tests/unit/resources/test_resource_manager.py Updated test calls to use renamed protected methods
tests/unit/cli/test_undeploy.py Updated mocking strategy and removed obsolete TestDeleteEndpoint class

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Remove clean_command, generate_cleanup_preview, and unused imports
(Progress, SpinnerColumn, TextColumn, questionary) as the command
was never connected to the CLI (commented out in main.py).

This removes 77 lines of dead code and 4 unused dependencies.
Address PR #124 feedback by extracting the duplicated run_coro helper
function into a reusable pytest fixture. This reduces code duplication
in test_undeploy_by_name_success and test_undeploy_all_flag methods.
@deanq deanq requested a review from Copilot November 25, 2025 12:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants