OmniMCP Development Plan
This development plan outlines a focused, test-driven approach to OmniMCP development, prioritizing the OmniParser integration (Issue #1) with a path toward a compelling demonstration of the framework's capabilities.
Phase 1: OmniParser Integration Foundation (Days 1-3)
Priority: Highest - Addresses Issue #1
Dependencies: AWS credentials
1.1 Test Framework Enhancement [Complexity: S]
- Leverage existing test_synthetic_ui.py for automated image generation
- Enhance to generate more complex UI scenarios
- Implement deterministic testing approach with mockable services
- Create pytest fixtures for common testing needs
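Deterministic testing becomes much easier when the synthetic scenarios are seeded. A minimal sketch of a reproducible element generator that fixtures could wrap (the function name `make_synthetic_elements` and the exact schema fields are illustrative assumptions, though the schema mirrors the mock-client example later in this plan):

```python
import random

def make_synthetic_elements(seed: int, n_buttons: int = 2, n_fields: int = 2) -> list:
    """Generate a reproducible list of synthetic UI elements for tests."""
    rng = random.Random(seed)  # seeded so every test run sees the same layout
    elements = []
    for i in range(n_buttons):
        elements.append({
            "type": "button",
            "content": f"Button {i}",
            "bounds": {
                "x": round(rng.uniform(0.0, 0.8), 2),
                "y": round(rng.uniform(0.0, 0.9), 2),
                "width": 0.15,
                "height": 0.05,
            },
            "confidence": round(rng.uniform(0.85, 0.99), 2),
        })
    for i in range(n_fields):
        elements.append({
            "type": "text_field",
            "content": f"Field {i}",
            "bounds": {
                "x": round(rng.uniform(0.0, 0.7), 2),
                "y": round(rng.uniform(0.0, 0.9), 2),
                "width": 0.25,
                "height": 0.05,
            },
            "confidence": round(rng.uniform(0.80, 0.99), 2),
        })
    return elements
```

A pytest fixture would then just call this with a fixed seed, so failing tests can be replayed byte-for-byte.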
1.2 OmniParser Client Reliability [Complexity: M]
- Refactor client.py to improve error handling
- Add consistent retry logic with exponential backoff
- Create mock implementation for testing without deployment
- Fix any bugs in the existing implementation
- Add full client-side logging for troubleshooting
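The retry behavior could be factored into a small decorator so every client call gets the same backoff policy. A sketch under the assumption that transient failures surface as exceptions (the name `with_retries` and the default exception tuple are illustrative):

```python
import time
import functools

def with_retries(max_attempts: int = 4, base_delay: float = 0.5,
                 retry_on: tuple = (ConnectionError, TimeoutError)):
    """Retry a flaky call with exponential backoff: 0.5s, 1s, 2s, ..."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the original error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```

In the real client this would likely also log each failed attempt, feeding the troubleshooting logs described above.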
1.3 Deployment Automation & Reliability [Complexity: M]
- Build reliable auto-discovery for existing instances (including stopped ones)
- Implement CloudWatch alarm-based idle shutdown mechanism
- Add automatic restart of stopped instances when needed
- Create smooth transition between instance states (running/stopped)
- Ensure proper cleanup of resources when no longer needed
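The discovery step would call EC2 `describe_instances` (filtered by a project tag) and then choose which instance to reuse. The selection logic itself is pure and testable without AWS; a sketch operating on `describe_instances`-shaped records (the function name `select_omnimcp_instance` is a hypothetical helper, not existing code):

```python
def select_omnimcp_instance(instances: list):
    """Pick the best existing deployment from describe_instances-style records.

    Prefers a running instance; falls back to a stopped one (which the
    client can then restart); ignores terminated/terminating instances.
    """
    running = [i for i in instances if i.get("State", {}).get("Name") == "running"]
    if running:
        return running[0]
    stopped = [i for i in instances if i.get("State", {}).get("Name") == "stopped"]
    if stopped:
        return stopped[0]
    return None
```

Keeping the choice logic separate from the boto3 call means the "including stopped ones" requirement can be covered by a unit test with no cloud access.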
1.4 End-to-End Tests [Complexity: S]
- Create minimal test that proves OmniParser works start-to-finish
- Implement test that verifies auto-discovery functionality
- Test instance state transitions (running → stopped → running)
- Document setup requirements in README
Milestone 1: A reliable OmniParser integration that anyone can use by setting AWS credentials in .env file, with automatic cost management through idle shutdown.
Phase 2: MCP Core Framework (Days 4-5)
Priority: High - Core functionality
Dependencies: Completed Phase 1
2.1 Core Types Implementation [Complexity: S]
- Implement UIElement, ScreenState, and InteractionResult
- Add serialization/deserialization support
- Create test suite for type validation
- Ensure proper typing throughout the codebase
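Dataclasses give these types serialization nearly for free. A minimal sketch of what the three types could look like (field names beyond those stated above are assumptions, not settled API):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class UIElement:
    type: str        # e.g. "button", "text_field"
    content: str     # visible text or label
    bounds: dict     # normalized {"x", "y", "width", "height"}
    confidence: float

@dataclass
class ScreenState:
    elements: list = field(default_factory=list)
    timestamp: float = 0.0

@dataclass
class InteractionResult:
    success: bool
    element: UIElement = None
    error: str = None

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses, so the element
        # serializes along with the result
        return asdict(self)
```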
2.2 MCP Protocol Implementation [Complexity: S]
- Ensure clean API for model interaction
- Implement proper tool registration
- Create standardized response formats
- Handle error cases appropriately
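One way to get both a standardized response format and uniform error handling is to wrap every tool call in a single envelope function, so failures become structured responses instead of crashes. A sketch (the names `tool_response` and `run_tool` are illustrative, not existing API):

```python
def tool_response(success: bool, data=None, error: str = None) -> dict:
    """Uniform envelope every tool returns, so the model parses
    results the same way regardless of which tool ran."""
    return {
        "success": success,
        "data": data if data is not None else {},
        "error": error,
    }

def run_tool(fn, *args, **kwargs) -> dict:
    """Wrap a tool call so exceptions become structured error responses."""
    try:
        return tool_response(True, data=fn(*args, **kwargs))
    except Exception as e:
        return tool_response(False, error=f"{type(e).__name__}: {e}")
```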
2.3 Basic Tool Implementation [Complexity: M]
- Implement get_screen_state() tool
- Implement find_element() tool
- Implement click_element() tool
- Implement type_text() tool
- Add tests for each tool
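For `find_element()`, a simple fuzzy match over parsed element content is enough for a first pass. A sketch using stdlib `difflib` (the 0.5 similarity cutoff is an assumed tuning knob, and the dict-based element shape stands in for whatever type Phase 2.1 settles on):

```python
from difflib import SequenceMatcher

def find_element(elements: list, description: str):
    """Return the element whose content best matches the description,
    or None if nothing scores above a minimum similarity."""
    def score(el):
        return SequenceMatcher(None, el["content"].lower(),
                               description.lower()).ratio()
    best = max(elements, key=score, default=None)
    return best if best is not None and score(best) >= 0.5 else None
```

`click_element()` and `type_text()` would then resolve their target through this same lookup before acting, which keeps the matching behavior consistent across tools.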
Milestone 2: Working MCP implementation that handles basic UI interactions with proper error handling.
Phase 3: Demonstration and Documentation (Days 6-7)
Priority: High - Showcase functionality
Dependencies: Completed Phase 2
3.1 Demo Implementation [Complexity: S]
- Create a demonstration script showcasing core functionality
- Add visualization of detected elements
- Implement element search and interaction capabilities
- Create visual output of interactions
3.2 Documentation Enhancement [Complexity: S]
- Create comprehensive README with setup instructions
- Document architecture and design decisions
- Add API reference documentation
- Include example usage patterns
3.3 Error Recovery & Handling [Complexity: M]
- Add basic error recovery strategies
- Implement verification for actions
- Add detailed error reporting
- Create troubleshooting guide
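The verification idea can be reduced to a small act-then-check loop: perform the action, re-observe the screen, and retry if the expected change is not visible. A sketch (the name `perform_verified` is hypothetical; in practice `verify` would capture and parse a fresh screenshot):

```python
def perform_verified(action, verify, max_attempts: int = 3) -> bool:
    """Run an action, then confirm it took effect; retry if verification fails.

    `action` performs the interaction (e.g. a click); `verify` re-checks
    screen state and returns True when the expected change is visible.
    """
    for _ in range(max_attempts):
        action()
        if verify():
            return True
    return False  # caller reports a detailed error / screenshot here
```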
Milestone 3: Compelling demonstration of core functionality with comprehensive documentation.
Implementation Plan for Issue #1: OmniParser Deployment
Day 1: Test Infrastructure
- Create GitHub Actions workflow for CI
- Run tests when PR is opened/updated
- Run linting and type checking
- Skip tests requiring GPU/AWS in CI
- Enhance test_synthetic_ui.py to generate more complex scenarios
- Implement mocking infrastructure for OmniParser
Day 2: OmniParser Client Refinement
- Refactor client.py to add consistent error handling
- Add retry logic for intermittent failures
- Implement better logging for deployment operations
- Add configurable timeouts
Day 3: Deployment Reliability
- Fix any bugs in current deployment automation
- Implement auto-discovery of existing deployments
- Add CloudWatch alarm for idle instance shutdown
- Create automatic restart functionality
- Test deployment flow end-to-end
Server-Side Idle Shutdown Implementation
The idle shutdown mechanism will use AWS-native capabilities (a CloudWatch alarm wired to the built-in EC2 stop action), so an idle instance stops itself and incurs only EBS storage charges rather than hourly GPU-instance rates:
```python
import boto3

# `config` and `logger` are the module-level objects already used
# elsewhere in the deployment code.

def setup_idle_shutdown_mechanism(instance_id):
    """Set up automated shutdown after idle period using server-side mechanisms."""
    try:
        # Create a CPU utilization alarm that triggers when average CPU
        # stays below the threshold for two consecutive 15-minute periods,
        # i.e. the instance stops after roughly 30 minutes of inactivity.
        cloudwatch = boto3.client('cloudwatch')
        cloudwatch.put_metric_alarm(
            AlarmName=f"OmniMCP-IdleShutdown-{instance_id}",
            ComparisonOperator='LessThanThreshold',
            EvaluationPeriods=2,        # number of periods to evaluate
            MetricName='CPUUtilization',
            Namespace='AWS/EC2',
            Period=900,                 # 15 minutes (in seconds)
            Statistic='Average',
            Threshold=5.0,              # CPU utilization below 5%
            ActionsEnabled=True,
            AlarmDescription='Shutdown EC2 instance when idle',
            AlarmActions=[
                # Built-in EC2 stop action -- no Lambda or cron needed
                f'arn:aws:automate:{config.AWS_REGION}:ec2:stop'
            ],
            Dimensions=[
                {'Name': 'InstanceId', 'Value': instance_id},
            ],
        )

        # Tag the instance so auto-discovery can see shutdown is configured
        ec2 = boto3.resource('ec2')
        instance = ec2.Instance(instance_id)
        instance.create_tags(Tags=[{'Key': 'AutoShutdown', 'Value': 'Enabled'}])

        logger.info(f"Automatic idle shutdown configured for instance {instance_id}")
        return True
    except Exception as e:
        logger.error(f"Failed to set up idle shutdown: {e}")
        return False
```
Mock OmniParser Client for Testing
To enable testing without actual deployment:
```python
from typing import Dict

import numpy as np
from PIL import Image


class MockOmniParserClient(OmniParserClient):
    """Mock client for testing without actual deployment."""

    def __init__(self, synthetic_results=None):
        """Initialize with optional synthetic results."""
        self.synthetic_results = synthetic_results or []
        self.auto_deploy = False
        self.server_url = "http://mock-server"

    def _ensure_server(self) -> None:
        """Always succeeds for mock."""
        pass

    def _check_server(self) -> None:
        """Always succeeds for mock."""
        pass

    def parse_image(self, image: Image.Image) -> Dict:
        """Return synthetic results or generate them."""
        if self.synthetic_results:
            return {"parsed_content_list": self.synthetic_results}

        # Generate basic placeholder elements based on the image
        img_array = np.array(image)
        height, width = img_array.shape[:2]
        results = [
            {
                "type": "button",
                "content": "Submit",
                "bounds": {"x": 0.1, "y": 0.1, "width": 0.1, "height": 0.05},
                "confidence": 0.95,
            },
            {
                "type": "text_field",
                "content": "Username",
                "bounds": {"x": 0.3, "y": 0.1, "width": 0.2, "height": 0.05},
                "confidence": 0.9,
            },
        ]
        return {"parsed_content_list": results}
```
GitHub CI Workflow
```yaml
name: OmniMCP CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install uv
          ./install.sh
      - name: Lint with ruff
        run: |
          pip install ruff
          ruff check .
      - name: Test with pytest (non-AWS)
        run: |
          pytest omnimcp/tests/test_synthetic_ui.py -v
```
First Steps to Get Started
- Create a new branch: feature/omniparser-integration
- Set up the CI workflow in .github/workflows/ci.yml
- Enhance the test infrastructure in tests/
- Begin refactoring the OmniParser client for reliability
- Implement the server-side idle shutdown mechanism
This development plan focuses on building a solid foundation with OmniParser integration, then expanding to a full-featured MCP implementation. By following test-driven development practices, we'll create a reliable framework that can be easily used and extended by contributors.