🧩 Description
We need to define and implement a minimal but extensible protocol for representing GUI interaction sequences. This protocol will unify the visual state, action metadata, and interaction history into a single structured format—enabling consistent logging, dataset creation, LLM training, planning, and replay.
This format serves as the foundation for downstream systems including the Action Graph (#10), ModelDrivenVisualState, and planner/LLM interfaces.
🧠 Background
OmniMCP currently:
- Captures visual state via OmniParser
- Plans actions using an LLM
- Executes actions via InputController
But there is no standardized, reusable format for representing:
- What was seen
- What was done
- Why it was done (optional)
This protocol fills that gap—similar to what OpenAI Operator, Adept’s AWL, and WebArena’s annotated programs use.
📦 Proposed Data Model (v0.1)
Using `pydantic` for type safety and validation.
```python
from typing import Literal, Optional

from pydantic import BaseModel


class BoundingBox(BaseModel):
    x1: int
    y1: int
    x2: int
    y2: int


class GUIElement(BaseModel):
    element_id: str
    tag: Optional[str] = None
    text: Optional[str] = None
    role: Optional[str] = None
    bbox: Optional[BoundingBox] = None
    visible: bool = True


class VisualState(BaseModel):
    screenshot_path: str
    screen_resolution: tuple[int, int]
    elements: list[GUIElement]
    timestamp: float


class GUIAction(BaseModel):
    type: Literal["click", "type", "hover", "launch_app", "scroll"]
    target_id: Optional[str] = None
    bbox: Optional[BoundingBox] = None
    text: Optional[str] = None
    delay: Optional[float] = None  # e.g. pause before typing


class InteractionStep(BaseModel):
    timestamp: float
    visual_state: VisualState
    action: GUIAction
```
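Each model can also export a JSON schema for downstream consumers (e.g. LLM function-calling interfaces). A minimal sketch, assuming the pydantic v2 API (`model_json_schema`; v1 uses `.schema()` instead), shown here with `BoundingBox` alone:

```python
from pydantic import BaseModel


class BoundingBox(BaseModel):
    x1: int
    y1: int
    x2: int
    y2: int


# Every model in the protocol can emit its JSON schema the same way.
schema = BoundingBox.model_json_schema()
print(schema["required"])  # ['x1', 'y1', 'x2', 'y2']
```

The same call on `InteractionStep` yields a nested schema covering the full step, which satisfies the "JSON schema export" acceptance criterion below.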
🧪 Examples
```json
{
  "timestamp": 4.1,
  "visual_state": {
    "screenshot_path": "frames/frame_002.png",
    "screen_resolution": [1920, 1080],
    "elements": [
      {
        "element_id": "url_bar",
        "text": "Search or type URL",
        "bbox": {"x1": 120, "y1": 80, "x2": 800, "y2": 120},
        "visible": true
      }
    ],
    "timestamp": 4.1
  },
  "action": {
    "type": "click",
    "target_id": "url_bar",
    "bbox": {"x1": 120, "y1": 80, "x2": 800, "y2": 120}
  }
}
```
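A loader/validator for such log entries can be a thin wrapper over pydantic's parsing. The following is a sketch, assuming pydantic v2 (`model_validate_json` / `model_dump_json`); the models are restated so the snippet runs standalone:

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel


class BoundingBox(BaseModel):
    x1: int
    y1: int
    x2: int
    y2: int


class GUIElement(BaseModel):
    element_id: str
    tag: Optional[str] = None
    text: Optional[str] = None
    role: Optional[str] = None
    bbox: Optional[BoundingBox] = None
    visible: bool = True


class VisualState(BaseModel):
    screenshot_path: str
    screen_resolution: tuple[int, int]
    elements: list[GUIElement]
    timestamp: float


class GUIAction(BaseModel):
    type: Literal["click", "type", "hover", "launch_app", "scroll"]
    target_id: Optional[str] = None
    bbox: Optional[BoundingBox] = None
    text: Optional[str] = None
    delay: Optional[float] = None


class InteractionStep(BaseModel):
    timestamp: float
    visual_state: VisualState
    action: GUIAction


def load_step(raw: str) -> InteractionStep:
    """Validate one logged step; raises pydantic.ValidationError on bad input."""
    return InteractionStep.model_validate_json(raw)


# A log entry shaped like the example above.
raw = json.dumps({
    "timestamp": 4.1,
    "visual_state": {
        "screenshot_path": "frames/frame_002.png",
        "screen_resolution": [1920, 1080],
        "elements": [{
            "element_id": "url_bar",
            "text": "Search or type URL",
            "bbox": {"x1": 120, "y1": 80, "x2": 800, "y2": 120},
            "visible": True,
        }],
        "timestamp": 4.1,
    },
    "action": {"type": "click", "target_id": "url_bar"},
})
step = load_step(raw)
assert step.action.type == "click"
assert step.visual_state.elements[0].bbox.x2 == 800

# Round-trip: serialize and re-parse, checking equality of the typed models.
assert load_step(step.model_dump_json()) == step

# Pretty-print for log inspection.
print(step.model_dump_json(indent=2))
```

This same pattern covers the round-trip I/O unit tests in the acceptance criteria: serialize with `model_dump_json`, re-parse, and compare models.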
✅ Acceptance Criteria
- Protocol spec exists as Python `pydantic` models with JSON schema export
- Example logs (real or synthetic) stored in a versioned `protocol/` directory
- Validator for loading, validating, and pretty-printing logs
- Unit tests for schema validity and round-trip I/O
- Integration into the `AgentExecutor` logging pipeline (optional, stub OK)
📚 References
- [Operator JSON schema examples](https://platform.openai.com/docs/guides/function-calling)
- [WebArena annotated programs](https://github.com/web-arena/WebArena)
- [Adept’s AWL DSL](https://www.adept.ai/blog/act-1)
- [MiniWoB++ trajectories](https://github.com/google/miniwob-plusplus)
- OmniMCP Action Graph issue: "Feature: Generate Action Graph from Interaction Log + Scene Snapshots" (#10)
📌 Priority
High. This is foundational to planning, replay, dataset creation, and eventual fine-tuning. Enables reuse of traces across components and simplifies future evaluation and debugging.