
OmniMCP Development Plan and Implementation Strategy #2

@abrichr


OmniMCP Development Plan

This development plan outlines a focused, test-driven approach to OmniMCP development, prioritizing the OmniParser integration (Issue #1) with a path toward a compelling demonstration of the framework's capabilities.

Phase 1: OmniParser Integration Foundation (Days 1-3)

Priority: Highest - Addresses Issue #1
Dependencies: AWS credentials

1.1 Test Framework Enhancement [Complexity: S]

- Leverage the existing test_synthetic_ui.py for automated image generation
- Extend it to generate more complex UI scenarios
- Implement deterministic testing approach with mockable services
- Create pytest fixtures for common testing needs
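
Deterministic scenarios are what make synthetic-UI tests reproducible. The sketch below is one possible approach, assuming a simple grid layout. `SyntheticElement` and `generate_scenario` are hypothetical names, not taken from the existing test_synthetic_ui.py. The generator seeds a private random generator, so the same seed always yields the same non-overlapping layout:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticElement:
    """One element of a generated UI scenario; bounds are normalized to 0-1."""
    type: str
    content: str
    x: float
    y: float
    width: float
    height: float


ELEMENT_TYPES = ("button", "text_field", "checkbox", "label")  # illustrative set


def generate_scenario(seed: int, rows: int = 4, cols: int = 3) -> list[SyntheticElement]:
    """Place one element per grid cell with seeded jitter, so elements never overlap.

    The same seed always produces the same scenario, keeping tests deterministic.
    """
    rng = random.Random(seed)  # private RNG: global random state is untouched
    cell_w, cell_h = 1.0 / cols, 1.0 / rows
    elements = []
    for r in range(rows):
        for c in range(cols):
            kind = rng.choice(ELEMENT_TYPES)
            # The element fills 50-90% of its cell, offset within the leftover slack
            w = cell_w * rng.uniform(0.5, 0.9)
            h = cell_h * rng.uniform(0.5, 0.9)
            x = c * cell_w + rng.uniform(0, cell_w - w)
            y = r * cell_h + rng.uniform(0, cell_h - h)
            elements.append(SyntheticElement(kind, f"{kind}_{r}_{c}", x, y, w, h))
    return elements
```

A pytest fixture could simply return `generate_scenario(seed=0)`, so every test that requests it sees the same layout. Rendering the elements into an image would stay with the existing generator.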

1.2 OmniParser Client Reliability [Complexity: M]

- Refactor client.py to improve error handling
- Add consistent retry logic with exponential backoff
- Create mock implementation for testing without deployment
- Fix any bugs in the existing implementation
- Add full client-side logging for troubleshooting
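
The retry item can be illustrated with a small, dependency-free helper. This is a sketch, not the existing client code. `retry_with_backoff` is a hypothetical name, the default delays are illustrative, and the default retryable exception types are an assumption. An HTTP library such as requests raises its own exception classes, which would need to be passed in through `retry_on`. Injecting `sleep` makes the backoff schedule testable without real waiting:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> T:
    """Call fn, retrying transient failures with exponential backoff.

    The delay before retry n (1-based) is min(max_delay, base_delay * 2**(n-1)),
    optionally scaled by a random factor in [0.5, 1.0] so that many clients
    do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            if jitter:
                delay *= random.uniform(0.5, 1.0)
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Exceptions outside `retry_on` propagate immediately. As a result, a bad request fails fast, while a dropped connection gets retried.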

1.3 Deployment Automation & Reliability [Complexity: M]

- Build reliable auto-discovery for existing instances (including stopped ones)
- Implement CloudWatch alarm-based idle shutdown mechanism
- Add automatic restart of stopped instances when needed
- Handle transitions between instance states (running/stopped) smoothly
- Ensure proper cleanup of resources when no longer needed
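
Auto-discovery can be reduced to a pure function over an EC2 `DescribeInstances` response, which makes it unit-testable without AWS. In this sketch, the helper name `pick_existing_instance` and the identifying tag `Name=omniparser` are hypothetical. The response shape (`Reservations` → `Instances` → `State.Name`, `Tags`) is the one boto3's `describe_instances` returns. Running instances are preferred, then pending ones, then stopped ones that only need a restart:

```python
from typing import Any, Dict, Optional

# Lower value = preferred. Terminated / shutting-down instances are never reused.
STATE_PRIORITY = {"running": 0, "pending": 1, "stopped": 2, "stopping": 3}


def pick_existing_instance(
    response: Dict[str, Any],
    tag_key: str = "Name",          # hypothetical identifying tag
    tag_value: str = "omniparser",  # hypothetical tag value
) -> Optional[Dict[str, Any]]:
    """Return the best reusable tagged instance from a describe_instances response."""
    candidates = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            state = inst.get("State", {}).get("Name")
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get(tag_key) == tag_value and state in STATE_PRIORITY:
                candidates.append(inst)
    if not candidates:
        return None  # nothing reusable: the caller should deploy a new instance
    return min(candidates, key=lambda i: STATE_PRIORITY[i["State"]["Name"]])
```

In practice, the response would come from `boto3.client("ec2").describe_instances(...)`, optionally narrowed server-side with `Filters=[{"Name": "tag:Name", "Values": ["omniparser"]}]`. In accounts with many instances, `get_paginator("describe_instances")` avoids missing later pages.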

1.4 End-to-End Tests [Complexity: S]

- Create minimal test that proves OmniParser works start-to-finish
- Implement test that verifies auto-discovery functionality
- Test instance state transitions (running → stopped → running)
- Document setup requirements in README

Milestone 1: A reliable OmniParser integration that anyone can use by setting AWS credentials in a .env file, with automatic cost management through idle shutdown.

Phase 2: MCP Core Framework (Days 4-5)

Priority: High - Core functionality
Dependencies: Completed Phase 1

2.1 Core Types Implementation [Complexity: S]

- Implement the UIElement, ScreenState, and InteractionResult types
- Add serialization/deserialization support
- Create test suite for type validation
- Ensure proper typing throughout the codebase
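
The core types could start as plain dataclasses with JSON round-tripping and range validation. This is a sketch under stated assumptions. The element fields mirror the dictionaries the mock client returns (`type`, `content`, normalized `bounds`, `confidence`). The `id`, `width`/`height`, `timestamp`, and `context` fields and the `Bounds` class are illustrative additions, not the project's definitions:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Bounds:
    """Element bounds, normalized to the 0-1 range of the screenshot."""
    x: float
    y: float
    width: float
    height: float

    def __post_init__(self) -> None:
        for name in ("x", "y", "width", "height"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")


@dataclass
class UIElement:
    id: int
    type: str
    content: str
    bounds: Bounds
    confidence: float = 1.0

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "UIElement":
        return cls(**{**data, "bounds": Bounds(**data["bounds"])})


@dataclass
class ScreenState:
    elements: List[UIElement]
    width: int   # screenshot size in pixels
    height: int
    timestamp: float

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ScreenState":
        data = json.loads(text)
        data["elements"] = [UIElement.from_dict(e) for e in data["elements"]]
        return cls(**data)


@dataclass
class InteractionResult:
    success: bool
    element: Optional[UIElement] = None
    error: Optional[str] = None
    context: Dict[str, Any] = field(default_factory=dict)
```

Keeping bounds normalized means a `ScreenState` stays valid if the screenshot is resized. Pixel coordinates are only computed at the moment of interaction.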

2.2 MCP Protocol Implementation [Complexity: S]

- Expose a clean API for model interaction
- Implement proper tool registration
- Create standardized response formats
- Handle error cases appropriately
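
Tool registration and a standardized response format can be illustrated independently of any MCP library. This is not the MCP SDK's API: the official Python SDK provides its own tool decorator (for example on its `FastMCP` server). The hypothetical `ToolRegistry` below only shows the idea of registering tools by name and wrapping every result or error in one envelope:

```python
import inspect
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Minimal name -> callable registry with a uniform response envelope."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator: register fn under its function name."""
        if fn.__name__ in self._tools:
            raise ValueError(f"tool {fn.__name__!r} is already registered")
        self._tools[fn.__name__] = fn
        return fn

    def describe(self) -> List[Dict[str, Any]]:
        """List tools with their docstrings and parameter names (for the model)."""
        return [
            {
                "name": name,
                "description": inspect.getdoc(fn) or "",
                "parameters": list(inspect.signature(fn).parameters),
            }
            for name, fn in self._tools.items()
        ]

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        """Invoke a tool; errors are returned in the envelope, never raised."""
        fn = self._tools.get(name)
        if fn is None:
            return {"ok": False, "error": {"type": "UnknownTool", "message": f"no tool named {name!r}"}}
        try:
            return {"ok": True, "result": fn(**kwargs)}
        except Exception as exc:  # includes TypeError from bad arguments
            return {"ok": False, "error": {"type": type(exc).__name__, "message": str(exc)}}
```

Returning errors as data rather than raising lets the model see what went wrong (for example "element not found") and choose a different action.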

2.3 Basic Tool Implementation [Complexity: M]

- Implement get_screen_state() tool
- Implement find_element() tool
- Implement click_element() tool
- Implement type_text() tool
- Add tests for each tool

Milestone 2: Working MCP implementation that handles basic UI interactions with proper error handling.

Phase 3: Demonstration and Documentation (Days 6-7)

Priority: High - Showcase functionality
Dependencies: Completed Phase 2

3.1 Demo Implementation [Complexity: S]

- Create a demonstration script showcasing core functionality
- Add visualization of detected elements
- Implement element search and interaction capabilities
- Create visual output of interactions

3.2 Documentation Enhancement [Complexity: S]

- Create comprehensive README with setup instructions
- Document architecture and design decisions
- Add API reference documentation
- Include example usage patterns

3.3 Error Recovery & Handling [Complexity: M]

- Add basic error recovery strategies
- Implement verification for actions
- Add detailed error reporting
- Create troubleshooting guide
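
Action verification can be framed as "act, re-observe, check a postcondition, retry a bounded number of times." The sketch below is a minimal version of that idea. `act_and_verify` and `VerifiedResult` are hypothetical names, and the caller supplies the action, the observation, and the expected condition:

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # whatever observe() returns, e.g. a screen state


@dataclass
class VerifiedResult:
    success: bool
    attempts: int
    error: Optional[str] = None


def act_and_verify(
    action: Callable[[], None],
    observe: Callable[[], S],
    expect: Callable[[S], bool],
    max_attempts: int = 3,
    settle: Callable[[], None] = lambda: None,
) -> VerifiedResult:
    """Run action, then confirm its effect via observe/expect, retrying on failure."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            action()
        except Exception as exc:
            last_error = f"action failed: {exc}"
            continue
        settle()  # e.g. wait for the UI to redraw before re-parsing the screen
        if expect(observe()):
            return VerifiedResult(True, attempt)
        last_error = "postcondition not met after action"
    return VerifiedResult(False, max_attempts, last_error)
```

Blind retries are only safe for idempotent actions such as clicking a button that should open a dialog. For type_text(), a retry would need to clear the field first, or the text would be entered twice.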

Milestone 3: Compelling demonstration of core functionality with comprehensive documentation.

Implementation Plan for Issue #1: OmniParser Deployment

Day 1: Test Infrastructure

  • Create GitHub Actions workflow for CI
    • Run tests when a PR is opened or updated
    • Run linting and type checking
    • Skip tests requiring GPU/AWS in CI
  • Enhance test_synthetic_ui.py to generate more complex scenarios
  • Implement mocking infrastructure for OmniParser
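
Skipping GPU/AWS tests in CI needs a single, testable switch. GitHub Actions sets the environment variable `CI=true` on its runners. The hypothetical helper below uses that, plus the presence of the standard AWS credential variables, to decide whether AWS-backed tests should run. It does not detect credentials supplied through AWS profiles or instance roles:

```python
import os
from typing import Mapping


def aws_tests_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when AWS-backed tests should run.

    Assumes credentials arrive via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    (e.g. loaded from .env); profile- or role-based credentials are not detected.
    """
    if env.get("CI", "").lower() == "true":
        return False  # never touch AWS (or require a GPU) from CI runners
    return bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
```

In a test module, `pytestmark = pytest.mark.skipif(not aws_tests_enabled(), reason="requires AWS; disabled on CI")` would skip every test in that file. This would let the CI job run the whole suite instead of a single hand-picked file.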

Day 2: OmniParser Client Refinement

  • Refactor client.py to add consistent error handling
  • Add retry logic for intermittent failures
  • Implement better logging for deployment operations
  • Add configurable timeouts
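
Configurable timeouts could be collected in one small settings object that reads environment overrides. This is a sketch: the class name `ClientTimeouts`, the `OMNIPARSER_*_TIMEOUT` variable names, and the default values are all assumptions, not existing configuration:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClientTimeouts:
    """Timeouts in seconds; the defaults are illustrative, not measured."""
    connect: float = 5.0    # opening the HTTP connection
    read: float = 60.0      # waiting for a parse response
    deploy: float = 1200.0  # waiting for a fresh instance to become healthy

    @classmethod
    def from_env(
        cls, env: Mapping[str, str] = os.environ, prefix: str = "OMNIPARSER_"
    ) -> "ClientTimeouts":
        """Override defaults from variables such as OMNIPARSER_READ_TIMEOUT=120."""
        defaults = cls()

        def read_value(name: str, default: float) -> float:
            var = f"{prefix}{name.upper()}_TIMEOUT"
            raw = env.get(var, "")
            if not raw:
                return default
            value = float(raw)  # raises ValueError on non-numeric input
            if value <= 0:
                raise ValueError(f"{var} must be positive, got {raw!r}")
            return value

        return cls(
            connect=read_value("connect", defaults.connect),
            read=read_value("read", defaults.read),
            deploy=read_value("deploy", defaults.deploy),
        )
```

If the client uses requests, the first two values map directly onto its `(connect, read)` tuple form, e.g. `requests.post(url, json=payload, timeout=(t.connect, t.read))`.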

Day 3: Deployment Reliability

  • Fix any bugs in the current deployment automation
  • Implement auto-discovery of existing deployments
  • Add CloudWatch alarm for idle instance shutdown
  • Create automatic restart functionality
  • Test deployment flow end-to-end
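
Automatic restart boils down to mapping the discovered instance state to an action and looping until the instance is running. In the sketch below, the names `Action`, `next_action`, and `ensure_running` are hypothetical, and `start`/`deploy` stand in for real calls such as boto3's `start_instances(InstanceIds=[...])`. One constraint reflected in the mapping is that EC2 only starts instances that have fully stopped, so a `stopping` instance has to be waited on:

```python
import time
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    USE = "use"        # running: nothing to do
    WAIT = "wait"      # transition in progress: poll again later
    START = "start"    # stopped: restart it (EBS-backed disk contents are kept)
    DEPLOY = "deploy"  # nothing usable: create a new instance


def next_action(state: Optional[str]) -> Action:
    """Map an EC2 state name (None if no instance was found) to an action."""
    if state is None or state in ("shutting-down", "terminated"):
        return Action.DEPLOY
    if state == "running":
        return Action.USE
    if state == "stopped":
        return Action.START
    if state in ("pending", "stopping"):
        return Action.WAIT  # a stopping instance cannot be started yet
    raise ValueError(f"unexpected instance state {state!r}")


def ensure_running(
    get_state: Callable[[], Optional[str]],
    start: Callable[[], None],
    deploy: Callable[[], None],
    poll_interval: float = 15.0,
    max_polls: int = 80,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drive the instance to 'running', starting or deploying as needed."""
    for _ in range(max_polls):
        action = next_action(get_state())
        if action is Action.USE:
            return
        if action is Action.START:
            start()
        elif action is Action.DEPLOY:
            deploy()
        sleep(poll_interval)
    raise TimeoutError(f"instance not running after {max_polls} polls")
```

The sketch assumes that `get_state` follows a newly deployed instance and that the API reflects a start request by the next poll. A production version would remember that it already requested a start or deployment, so that it does not repeat either one.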

Server-Side Idle Shutdown Implementation

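The idle shutdown mechanism will use native AWS capabilities (a CloudWatch alarm with the built-in EC2 stop action), so the instance incurs no compute charges while idle. The EBS volumes of a stopped instance are still billed, so the idle cost is close to zero rather than exactly zero: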

```python
import logging

import boto3

from omnimcp.config import config  # assumed import path; must expose AWS_REGION

logger = logging.getLogger(__name__)  # stdlib logging; the project may use its own logger


def setup_idle_shutdown_mechanism(instance_id: str) -> bool:
    """Set up automated shutdown after an idle period using server-side mechanisms.

    A CloudWatch alarm stops the instance once average CPU stays below 5% for
    two consecutive 15-minute periods (roughly 30 minutes of inactivity).
    """
    region = config.AWS_REGION
    try:
        # Create the client in the same region as the stop-action ARN below,
        # otherwise the alarm could end up in a different (default) region
        cloudwatch = boto3.client('cloudwatch', region_name=region)

        # Create a CPU utilization alarm that triggers when average CPU usage
        # stays below the threshold for the whole evaluation window
        cloudwatch.put_metric_alarm(
            AlarmName=f"OmniMCP-IdleShutdown-{instance_id}",
            ComparisonOperator='LessThanThreshold',
            EvaluationPeriods=2,  # Consecutive periods that must breach
            MetricName='CPUUtilization',
            Namespace='AWS/EC2',
            Period=900,  # 15 minutes (in seconds)
            Statistic='Average',
            Threshold=5.0,  # CPU utilization below 5%
            ActionsEnabled=True,
            AlarmDescription='Stop EC2 instance when idle',
            AlarmActions=[
                f'arn:aws:automate:{region}:ec2:stop'  # Built-in EC2 stop (not terminate) action
            ],
            Dimensions=[
                {
                    'Name': 'InstanceId',
                    'Value': instance_id
                },
            ]
        )

        # Add a tag indicating this instance has automated shutdown
        ec2 = boto3.resource('ec2', region_name=region)
        instance = ec2.Instance(instance_id)
        instance.create_tags(
            Tags=[
                {
                    'Key': 'AutoShutdown',
                    'Value': 'Enabled'
                }
            ]
        )

        logger.info(f"Automatic idle shutdown configured for instance {instance_id}")
        return True
    except Exception as e:
        logger.error(f"Failed to set up idle shutdown: {e}")
        return False
```
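
This alarm stops the instance after roughly Period × EvaluationPeriods = 900 s × 2 = 30 minutes of average CPU below 5%. The hypothetical helper `idle_alarm_params` derives those arguments from a desired idle time, which makes the arithmetic explicit and unit-testable. CPU is only a proxy for activity. A GPU-bound inference server could be busy while its CPU stays low, so the threshold is worth checking against real traffic:

```python
import math
from typing import Any, Dict


def idle_alarm_params(
    instance_id: str,
    region: str,
    idle_minutes: int = 30,
    period_seconds: int = 900,
    cpu_threshold: float = 5.0,
) -> Dict[str, Any]:
    """Build put_metric_alarm kwargs that stop an instance after ~idle_minutes of low CPU."""
    if period_seconds <= 0 or period_seconds % 60 != 0:
        # Standard-resolution EC2 metrics need a period that is a multiple of 60 s
        raise ValueError("period_seconds must be a positive multiple of 60")
    # Number of consecutive low-CPU periods needed to cover the requested idle time
    evaluation_periods = max(1, math.ceil(idle_minutes * 60 / period_seconds))
    return {
        "AlarmName": f"OmniMCP-IdleShutdown-{instance_id}",
        "ComparisonOperator": "LessThanThreshold",
        "EvaluationPeriods": evaluation_periods,
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Period": period_seconds,
        "Statistic": "Average",
        "Threshold": cpu_threshold,
        "ActionsEnabled": True,
        "AlarmDescription": "Stop EC2 instance when idle",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
    }
```

Usage would be `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**idle_alarm_params(instance_id, region))`. With the defaults, this produces the same alarm as the function above.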

Mock OmniParser Client for Testing

To enable testing without an actual deployment, a mock client can stand in for the real one:

```python
from typing import Dict, List, Optional

from PIL import Image

from omnimcp.omniparser.client import OmniParserClient  # assumed import path


class MockOmniParserClient(OmniParserClient):
    """Mock client for testing without actual deployment."""

    def __init__(self, synthetic_results: Optional[List[Dict]] = None):
        """Initialize with optional synthetic results.

        The parent constructor is not called, so none of its server setup runs.
        """
        self.synthetic_results = synthetic_results or []
        self.auto_deploy = False
        self.server_url = "http://mock-server"

    def _ensure_server(self) -> None:
        """Always succeeds for mock."""
        pass

    def _check_server(self) -> None:
        """Always succeeds for mock."""
        pass

    def parse_image(self, image: Image.Image) -> Dict:
        """Return the provided synthetic results, or a fixed default set."""
        if self.synthetic_results:
            return {"parsed_content_list": self.synthetic_results}

        # Default elements. Bounds are normalized to 0-1, so they are valid for
        # any image size; the image content itself is not inspected.
        results = [
            {
                "type": "button",
                "content": "Submit",
                "bounds": {"x": 0.1, "y": 0.1, "width": 0.1, "height": 0.05},
                "confidence": 0.95
            },
            {
                "type": "text_field",
                "content": "Username",
                "bounds": {"x": 0.3, "y": 0.1, "width": 0.2, "height": 0.05},
                "confidence": 0.9
            }
        ]

        return {"parsed_content_list": results}
```
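
A shared schema check helps keep the mock honest: the same assertions can run against the mock in CI and against the real server in the AWS-gated end-to-end test. This sketch validates the format the mock emits (normalized `bounds`, optional `confidence`). The plan does not specify the real OmniParser response format, so the required keys are an assumption to align with client.py, and `validate_parsed_content` is a hypothetical name:

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("type", "content", "bounds")
BOUND_KEYS = ("x", "y", "width", "height")


def validate_parsed_content(result: Dict[str, Any], tol: float = 1e-6) -> List[str]:
    """Return a list of schema problems (empty when the result looks valid)."""
    items = result.get("parsed_content_list")
    if not isinstance(items, list):
        return ["'parsed_content_list' is missing or not a list"]
    problems: List[str] = []
    for i, el in enumerate(items):
        if not isinstance(el, dict):
            problems.append(f"element {i}: not a dict")
            continue
        missing = [k for k in REQUIRED_KEYS if k not in el]
        if missing:
            problems.append(f"element {i}: missing {missing}")
            continue
        b = el["bounds"]
        if not isinstance(b, dict) or any(k not in b for k in BOUND_KEYS):
            problems.append(f"element {i}: bounds need keys {list(BOUND_KEYS)}")
            continue
        if any(not 0.0 <= b[k] <= 1.0 for k in BOUND_KEYS):
            problems.append(f"element {i}: bounds outside [0, 1]")
        elif b["x"] + b["width"] > 1.0 + tol or b["y"] + b["height"] > 1.0 + tol:
            problems.append(f"element {i}: box extends past the image edge")
        conf = el.get("confidence")
        if conf is not None and not 0.0 <= conf <= 1.0:
            problems.append(f"element {i}: confidence {conf} outside [0, 1]")
    return problems
```

A test would then assert `validate_parsed_content(client.parse_image(img)) == []` for both the mock and the deployed client. Returning all problems at once, rather than failing on the first, makes schema drift easier to diagnose.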

GitHub CI Workflow

```yaml
name: OmniMCP CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install uv
        ./install.sh
    - name: Lint with ruff
      run: |
        pip install ruff
        ruff check .
    - name: Test with pytest (non-AWS)
      run: |
        pytest omnimcp/tests/test_synthetic_ui.py -v
```

First Steps to Get Started

  1. Create a new branch feature/omniparser-integration
  2. Set up the CI workflow in .github/workflows/ci.yml
  3. Enhance the test infrastructure in tests/
  4. Begin refactoring the OmniParser client for reliability
  5. Implement the server-side idle shutdown mechanism

This development plan focuses on building a solid foundation with OmniParser integration, then expanding to a full-featured MCP implementation. By following test-driven development practices, we'll create a reliable framework that can be easily used and extended by contributors.
