OmniMCP Development Plan and Implementation Strategy

# OmniMCP Development Plan

This development plan outlines a focused, test-driven approach to OmniMCP development, prioritizing the OmniParser integration (Issue #1) with a path toward a compelling demonstration of the framework's capabilities.

## Phase 1: OmniParser Integration Foundation (Days 1-3)

> **Priority**: Highest - Addresses Issue #1
> **Dependencies**: AWS credentials

### 1.1 Test Framework Enhancement [Complexity: S]
```
- Leverage existing test_synthetic_ui.py for automated image generation
- Enhance to generate more complex UI scenarios
- Implement deterministic testing approach with mockable services
- Create pytest fixtures for common testing needs
```

### 1.2 OmniParser Client Reliability [Complexity: M]
```
- Refactor client.py to improve error handling
- Add consistent retry logic with exponential backoff
- Create mock implementation for testing without deployment
- Fix any bugs in the existing implementation
- Add full client-side logging for troubleshooting
```
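The retry item above could take roughly the following shape. This is a minimal sketch, not the existing `client.py` code: the function name `retry_with_backoff`, its defaults, and the choice of retryable exceptions are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the listed exceptions with exponential backoff.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # The cap doubles on every attempt: base, 2*base, 4*base, ...
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random time in [0, cap] so that several
            # clients retrying at once do not hit the server in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps tests deterministic and fast, since a test can record the requested delays instead of actually waiting.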

### 1.3 Deployment Automation & Reliability [Complexity: M]
```
- Build reliable auto-discovery for existing instances (including stopped ones)
- Implement CloudWatch alarm-based idle shutdown mechanism
- Add automatic restart of stopped instances when needed
- Create smooth transition between instance states (running/stopped)
- Ensure proper cleanup of resources when no longer needed
```
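One way the discovery and restart items could fit together is sketched below. It assumes instances carry a `Project` tag (a hypothetical tagging scheme, not confirmed by this plan) and takes the EC2 client as a parameter, for example `boto3.client("ec2")`, so the logic can be tested against a fake.

```python
from typing import Any, Dict, Optional

# Lower number = preferred. A running instance wins over a stopped one.
_STATE_PRIORITY = {"running": 0, "pending": 1, "stopping": 2, "stopped": 3}


def find_existing_instance(
    ec2_client: Any, project_tag: str = "omnimcp"
) -> Optional[Dict[str, Any]]:
    """Return the best reusable instance, including stopped ones, or None.

    Assumes a hypothetical `Project=<project_tag>` tag on each instance.
    """
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "tag:Project", "Values": [project_tag]},
            {"Name": "instance-state-name", "Values": list(_STATE_PRIORITY)},
        ]
    )
    instances = [
        inst
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
    ]
    if not instances:
        return None
    return min(instances, key=lambda inst: _STATE_PRIORITY[inst["State"]["Name"]])


def ensure_running(ec2_client: Any, instance: Dict[str, Any]) -> str:
    """Bring a discovered instance to the running state and return its ID."""
    instance_id = instance["InstanceId"]
    state = instance["State"]["Name"]
    if state == "stopping":
        # An instance cannot be started until it has fully stopped.
        ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
        state = "stopped"
    if state == "stopped":
        ec2_client.start_instances(InstanceIds=[instance_id])
    if state != "running":
        ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

The sketch reads a single page of `describe_instances` results, which should be enough for a handful of instances; a larger account would need pagination.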

### 1.4 End-to-End Tests [Complexity: S]
```
- Create minimal test that proves OmniParser works start-to-finish
- Implement test that verifies auto-discovery functionality
- Test instance state transitions (running → stopped → running)
- Document setup requirements in README
```

**Milestone 1**: A reliable OmniParser integration that anyone can use by setting AWS credentials in a `.env` file, with automatic cost management through idle shutdown.

## Phase 2: MCP Core Framework (Days 4-5)

> **Priority**: High - Core functionality
> **Dependencies**: Completed Phase 1

### 2.1 Core Types Implementation [Complexity: S]
```
- Implement UIElement, ScreenState, and InteractionResult
- Add serialization/deserialization support
- Create test suite for type validation
- Ensure proper typing throughout the codebase
```
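As a starting point for discussion, the three types and their serialization might look like the sketch below. Only the type names come from this plan; every field name is an assumption, and the `(x, y, width, height)` bounds layout is a guess that would need to match whatever OmniParser actually returns.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    """One detected element. Field names are illustrative assumptions."""
    type: str
    content: str
    bounds: Tuple[float, float, float, float]  # assumed (x, y, width, height)
    confidence: float = 1.0

    def to_dict(self) -> Dict[str, Any]:
        return {"type": self.type, "content": self.content,
                "bounds": list(self.bounds), "confidence": self.confidence}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "UIElement":
        x, y, w, h = (float(v) for v in data["bounds"])
        return cls(type=str(data["type"]), content=str(data["content"]),
                   bounds=(x, y, w, h),
                   confidence=float(data.get("confidence", 1.0)))


@dataclass
class ScreenState:
    """Snapshot of the screen: detected elements plus basic metadata."""
    elements: List[UIElement] = field(default_factory=list)
    dimensions: Tuple[int, int] = (0, 0)  # (width, height) in pixels
    timestamp: float = 0.0

    def to_json(self) -> str:
        return json.dumps({"elements": [e.to_dict() for e in self.elements],
                           "dimensions": list(self.dimensions),
                           "timestamp": self.timestamp})

    @classmethod
    def from_json(cls, text: str) -> "ScreenState":
        data = json.loads(text)
        width, height = data["dimensions"]
        return cls(elements=[UIElement.from_dict(e) for e in data["elements"]],
                   dimensions=(int(width), int(height)),
                   timestamp=float(data["timestamp"]))


@dataclass
class InteractionResult:
    """Outcome of one UI action."""
    success: bool
    element: Optional[UIElement] = None
    error: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        return {"success": self.success,
                "element": self.element.to_dict() if self.element else None,
                "error": self.error}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "InteractionResult":
        element = data.get("element")
        return cls(success=bool(data["success"]),
                   element=UIElement.from_dict(element) if element else None,
                   error=data.get("error"))
```

A round-trip check such as `ScreenState.from_json(state.to_json()) == state` is a natural first entry for the type validation test suite.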

### 2.2 MCP Protocol Implementation [Complexity: S]
```
- Ensure clean API for model interaction
- Implement proper tool registration
- Create standardized response formats
- Handle error cases appropriately
```
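To make the registration and response-format items concrete, here is a small stand-in written in plain Python. It does not use the MCP SDK; `ToolRegistry` and the `ok`/`result`/`error` envelope are hypothetical names chosen only to illustrate the idea of one response shape for success and failure.

```python
from typing import Any, Callable, Dict, Optional


class ToolRegistry:
    """Hypothetical registry mapping tool names to Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, name: Optional[str] = None) -> Callable:
        """Decorator that registers a function under `name` (or its own name)."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            tool_name = name or func.__name__
            if tool_name in self._tools:
                raise ValueError(f"tool already registered: {tool_name}")
            self._tools[tool_name] = func
            return func
        return decorator

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        """Invoke a tool and always return the same response envelope."""
        if name not in self._tools:
            return {"ok": False, "result": None, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs), "error": None}
        except Exception as exc:  # report tool failures instead of raising
            return {"ok": False, "result": None,
                    "error": f"{type(exc).__name__}: {exc}"}


registry = ToolRegistry()


@registry.tool()
def echo(text: str) -> str:
    """Trivial example tool."""
    return text
```

With this shape, `registry.call("echo", text="hi")` yields an envelope with `ok` set to `True`, while an unknown tool or a raised exception yields `ok` set to `False` and a message in `error`, so the model never has to handle two different failure formats.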

### 2.3 Basic Tool Implementation [Complexity: M]
```
- Implement get_screen_state() tool
- Implement find_element() tool
- Implement click_element() tool
- Implement type_text() tool
- Add tests for each tool
```
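For `find_element()`, one simple matching strategy is a case-insensitive substring search over the parsed elements, picking the highest-confidence hit. The sketch below assumes elements are dicts with `type`, `content`, and `confidence` keys, as in the mock client's output; the signature and the `min_confidence` default are illustrative, not a settled API.

```python
from typing import Any, Dict, List, Optional


def find_element(
    elements: List[Dict[str, Any]],
    query: str,
    element_type: Optional[str] = None,
    min_confidence: float = 0.5,
) -> Optional[Dict[str, Any]]:
    """Return the best element whose content contains `query`, or None.

    Hypothetical matching rule: case-insensitive substring match, optional
    type filter, and highest confidence wins among the candidates.
    """
    needle = query.strip().lower()
    if not needle:
        return None  # an empty query would otherwise match every element
    candidates = [
        element
        for element in elements
        if needle in str(element.get("content", "")).lower()
        and (element_type is None or element.get("type") == element_type)
        and float(element.get("confidence", 0.0)) >= min_confidence
    ]
    return max(
        candidates,
        key=lambda element: float(element.get("confidence", 0.0)),
        default=None,
    )
```

Returning `None` rather than raising keeps "element not on screen" a normal outcome that a tool wrapper can report in its response.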

**Milestone 2**: Working MCP implementation that handles basic UI interactions with proper error handling.

## Phase 3: Demonstration and Documentation (Days 6-7)

> **Priority**: High - Showcase functionality
> **Dependencies**: Completed Phase 2

### 3.1 Demo Implementation [Complexity: S]
```
- Create a demonstration script showcasing core functionality
- Add visualization of detected elements
- Implement element search and interaction capabilities
- Create visual output of interactions
```

### 3.2 Documentation Enhancement [Complexity: S]
```
- Create comprehensive README with setup instructions
- Document architecture and design decisions
- Add API reference documentation
- Include example usage patterns
```

### 3.3 Error Recovery & Handling [Complexity: M]
```
- Add basic error recovery strategies
- Implement verification for actions
- Add detailed error reporting
- Create troubleshooting guide
```
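Action verification could be sketched as "capture the elements, act, then re-check until the expected change shows up". Everything in the block below is an assumption for illustration: the name `run_verified`, the callable-based interface, and the result dict are not part of any existing OmniMCP API.

```python
import time
from typing import Any, Callable, Dict, List

Elements = List[Dict[str, Any]]


def run_verified(
    action: Callable[[], None],
    get_elements: Callable[[], Elements],
    expect: Callable[[Elements, Elements], bool],
    attempts: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Perform `action`, then poll the screen until `expect(before, after)` holds.

    Hypothetical helper: `get_elements` would wrap a screenshot plus parse,
    and `expect` encodes what the action should have changed.
    """
    before = get_elements()
    action()
    for attempt in range(1, attempts + 1):
        after = get_elements()
        if expect(before, after):
            return {"success": True, "checks": attempt, "error": None}
        if attempt < attempts:
            sleep(delay)  # give the UI time to settle before re-checking
    return {"success": False, "checks": attempts,
            "error": "expected screen change not observed"}
```

For `type_text()`, for instance, `expect` could check that some element's content now contains the typed string; the failure message then feeds directly into the detailed error reporting item.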

**Milestone 3**: Compelling demonstration of core functionality with comprehensive documentation.

## Implementation Plan for Issue #1: OmniParser Deployment

### Day 1: Test Infrastructure
- Create GitHub Actions workflow for CI
  - Run tests when PR is opened/updated
  - Run linting and type checking
  - Skip tests requiring GPU/AWS in CI
- Enhance test_synthetic_ui.py to generate more complex scenarios
- Implement mocking infrastructure for OmniParser

### Day 2: OmniParser Client Refinement
- Refactor client.py to add consistent error handling
- Add retry logic for intermittent failures
- Implement better logging for deployment operations
- Add configurable timeouts
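
Configurable timeouts could be read from the same `.env`-style environment as the AWS credentials. The sketch below is one possible shape; the `ClientSettings` class, its defaults, and the `OMNIPARSER_*` variable names are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical client settings; names and defaults are illustrative."""
    connect_timeout: float = 5.0   # seconds to establish the connection
    read_timeout: float = 60.0     # seconds to wait for the parse response
    max_attempts: int = 4          # total tries, including the first

    def __post_init__(self) -> None:
        if self.connect_timeout <= 0 or self.read_timeout <= 0:
            raise ValueError("timeouts must be positive")
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ClientSettings":
        """Build settings from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            connect_timeout=float(env.get("OMNIPARSER_CONNECT_TIMEOUT", cls.connect_timeout)),
            read_timeout=float(env.get("OMNIPARSER_READ_TIMEOUT", cls.read_timeout)),
            max_attempts=int(env.get("OMNIPARSER_MAX_ATTEMPTS", cls.max_attempts)),
        )

    def as_tuple(self) -> Tuple[float, float]:
        """(connect, read) pair, the form the `requests` library accepts as `timeout=`."""
        return (self.connect_timeout, self.read_timeout)
```

Validating in `__post_init__` means a bad value in `.env` fails at startup with a clear message rather than surfacing later as a confusing network error.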

### Day 3: Deployment Reliability
- Fix any bugs in current deployment automation
- Implement auto-discovery of existing deployments
- Add CloudWatch alarm for idle instance shutdown
- Create automatic restart functionality
- Test deployment flow end-to-end

## Server-Side Idle Shutdown Implementation

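The idle shutdown mechanism uses native AWS capabilities, a CloudWatch alarm with an EC2 stop action, so that no compute charges accrue while the service is unused. With the settings below, the instance stops after its average CPU utilization stays under 5% for two consecutive 15-minute periods (30 minutes). A stopped instance still incurs storage charges for its attached EBS volumes, so the cost is near zero rather than zero: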

```python
import logging

import boto3

# `config` is the project's settings module and must expose AWS_REGION.
# Standard-library logger; swap in the project's own logger if it has one.
logger = logging.getLogger(__name__)


def setup_idle_shutdown_mechanism(instance_id):
    """Set up automated shutdown after idle period using server-side mechanisms."""
    try:
        # Create the alarm in the same region that the stop-action ARN names;
        # a mismatch between the two would prevent the action from firing.
        cloudwatch = boto3.client('cloudwatch', region_name=config.AWS_REGION)

        # Alarm when average CPU stays below the threshold for
        # EvaluationPeriods consecutive periods (2 x 15 min = 30 min idle)
        cloudwatch.put_metric_alarm(
            AlarmName=f"OmniMCP-IdleShutdown-{instance_id}",
            ComparisonOperator='LessThanThreshold',
            EvaluationPeriods=2,  # Number of consecutive periods to evaluate
            MetricName='CPUUtilization',
            Namespace='AWS/EC2',
            Period=900,  # 15 minutes (in seconds)
            Statistic='Average',
            Threshold=5.0,  # CPU utilization below 5%
            ActionsEnabled=True,
            AlarmDescription='Shutdown EC2 instance when idle',
            AlarmActions=[
                f'arn:aws:automate:{config.AWS_REGION}:ec2:stop'  # Automatic instance stop action
            ],
            Dimensions=[
                {
                    'Name': 'InstanceId',
                    'Value': instance_id
                },
            ]
        )

        # Add a tag indicating this instance has automated shutdown
        ec2 = boto3.resource('ec2', region_name=config.AWS_REGION)
        instance = ec2.Instance(instance_id)
        instance.create_tags(
            Tags=[
                {
                    'Key': 'AutoShutdown',
                    'Value': 'Enabled'
                }
            ]
        )

        logger.info(f"Automatic idle shutdown configured for instance {instance_id}")
        return True
    except Exception as e:
        logger.error(f"Failed to set up idle shutdown: {e}")
        return False
```

## Mock OmniParser Client for Testing

To enable testing without actual deployment:

```python
from typing import Any, Dict, List, Optional

from PIL import Image

# OmniParserClient is the project's real client (client.py); import it from
# wherever that module lives in the package.


class MockOmniParserClient(OmniParserClient):
    """Mock client for testing without actual deployment."""

    def __init__(self, synthetic_results: Optional[List[Dict[str, Any]]] = None):
        """Initialize with optional synthetic results.

        Does not call OmniParserClient.__init__, so no parent setup runs.
        """
        self.synthetic_results = synthetic_results or []
        self.auto_deploy = False
        self.server_url = "http://mock-server"

    def _ensure_server(self) -> None:
        """Always succeeds for mock."""

    def _check_server(self) -> None:
        """Always succeeds for mock."""

    def parse_image(self, image: Image.Image) -> Dict[str, Any]:
        """Return the configured synthetic results, or fixed defaults."""
        if self.synthetic_results:
            return {"parsed_content_list": self.synthetic_results}

        # Fixed placeholder elements; the image content is not inspected
        results = [
            {
                "type": "button",
                "content": "Submit",
                "bounds": {"x": 0.1, "y": 0.1, "width": 0.1, "height": 0.05},
                "confidence": 0.95
            },
            {
                "type": "text_field",
                "content": "Username",
                "bounds": {"x": 0.3, "y": 0.1, "width": 0.2, "height": 0.05},
                "confidence": 0.9
            }
        ]

        return {"parsed_content_list": results}
```
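
The mock's `bounds` values look like fractions of the image size. Under that assumption (and assuming `x`/`y` mark the top-left corner, which this plan does not state), a click target in pixels could be derived as in the sketch below. The helper name `bounds_to_pixel_center` is hypothetical.

```python
from typing import Dict, Tuple


def bounds_to_pixel_center(
    bounds: Dict[str, float], image_size: Tuple[int, int]
) -> Tuple[int, int]:
    """Convert normalized bounds to the pixel at the element's center.

    Assumes bounds are fractions in [0, 1] with (x, y) at the top-left corner,
    and image_size is (width, height) as returned by PIL's `Image.size`.
    """
    width, height = image_size
    center_x = (bounds["x"] + bounds["width"] / 2) * width
    center_y = (bounds["y"] + bounds["height"] / 2) * height
    # Clamp so that slightly out-of-range detections still map onto the image.
    pixel_x = min(max(int(round(center_x)), 0), width - 1)
    pixel_y = min(max(int(round(center_y)), 0), height - 1)
    return pixel_x, pixel_y


# The mock's "Submit" button on a 1000x800 screenshot:
submit_bounds = {"x": 0.1, "y": 0.1, "width": 0.1, "height": 0.05}
print(bounds_to_pixel_center(submit_bounds, (1000, 800)))  # (150, 100)
```

Keeping this conversion in one helper means tests against the mock and runs against a real deployment share the same coordinate logic.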

## GitHub CI Workflow

```yaml
name: OmniMCP CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install uv
        ./install.sh
    - name: Lint with ruff
      run: |
        pip install ruff
        ruff check .
    - name: Test with pytest (non-AWS)
      run: |
        pytest omnimcp/tests/test_synthetic_ui.py -v
```
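
The workflow above avoids AWS by running a single test file. As the suite grows, AWS-dependent tests could instead be skipped automatically with a marker. The following `conftest.py` sketch assumes a hypothetical `aws` marker (applied as `@pytest.mark.aws`) and relies on GitHub Actions setting `CI=true` in the environment.

```python
import os
from typing import Mapping, Optional


def aws_tests_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether tests that need real AWS resources should run."""
    env = os.environ if env is None else env
    if env.get("CI", "").lower() == "true":
        return False  # never touch AWS from CI runs
    # Locally, require credentials to be present (e.g. loaded from .env).
    return bool(env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"))


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip every test marked `aws` when AWS tests are disabled."""
    import pytest  # imported here so the helper above works without pytest

    if aws_tests_enabled():
        return
    skip_aws = pytest.mark.skip(reason="requires AWS credentials; skipped in CI")
    for item in items:
        if "aws" in item.keywords:
            item.add_marker(skip_aws)
```

The `aws` marker would also need to be registered in the pytest configuration to avoid unknown-marker warnings. With this in place the CI step could run plain `pytest` rather than naming one file.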

## First Steps to Get Started

1. Create a new branch `feature/omniparser-integration`
2. Set up the CI workflow in `.github/workflows/ci.yml`
3. Enhance the test infrastructure in `tests/`
4. Begin refactoring the OmniParser client for reliability
5. Implement the server-side idle shutdown mechanism

This development plan focuses on building a solid foundation with OmniParser integration, then expanding to a full-featured MCP implementation. By following test-driven development practices, we'll create a reliable framework that can be easily used and extended by contributors.
