Summary
Implement a module that constructs an Action Graph from an interaction log and corresponding scene snapshots. The Action Graph models UI states as nodes and UI actions as edges, capturing transitions between visual states triggered by user or agent interactions.
The module should support both real and synthetic data sources.
Motivation
- Provides a unified, structured representation of recorded UI behavior over time.
- Enables downstream planning, summarization, visualization, and analysis.
- Forms the backbone of OmniMCP’s process abstraction stack:
Parser → Segments → Tracks → Scene Graph → Action Graph → Plan → Actions → API
- Can optionally use an Interaction Log (real or synthetic) to help derive the Action Graph.
- Can later be converted into symbolic process logs for use with PM4Py or other process mining tools.
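As a rough illustration of that conversion, the sketch below flattens Action Graph edges into one event per edge, using the column names PM4Py conventionally expects in dataframes (case:concept:name, concept:name, time:timestamp). It assumes edges shaped like the Example section's output, treats one recorded session as one case, and invents one-second-apart timestamps from step numbers when real timestamps are missing; to_event_log_rows and the "action:element" activity naming are hypothetical choices, not part of OmniMCP.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical edges, shaped like those in the Example section.
edges = [
    {"source": "n0", "target": "n1", "action": "type", "element": "email", "step": 0},
    {"source": "n1", "target": "n2", "action": "type", "element": "password", "step": 1},
    {"source": "n2", "target": "n3", "action": "click", "element": "login_button", "step": 2},
]


def to_event_log_rows(edges, case_id, start):
    """Flatten Action Graph edges into event-log rows, one event per edge.

    Assumption: steps become synthetic timestamps one second apart,
    starting at `start`, for logs that carry no real timestamps.
    """
    rows = []
    for edge in sorted(edges, key=lambda e: e["step"]):
        rows.append({
            "case:concept:name": case_id,                     # one session = one case
            "concept:name": f'{edge["action"]}:{edge["element"]}',  # activity label
            "time:timestamp": (start + timedelta(seconds=edge["step"])).isoformat(),
        })
    return rows


rows = to_event_log_rows(edges, "session-1", datetime(2024, 1, 1, tzinfo=timezone.utc))

# Write the rows as CSV, a format most process mining tools can ingest.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

From there, loading the CSV with pandas and passing it through pm4py.format_dataframe (with case_id, activity_key and timestamp_key set to the three column names above) should give a dataframe PM4Py's discovery functions accept; that step is omitted so the sketch runs with the standard library alone.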
Diagram
```mermaid
graph TD
    Parser --> Segments
    Segments --> Tracks
    Tracks --> SceneGraph
    SceneGraph --> ActionGraph
    InteractionLog -.-> ActionGraph
    ActionGraph --> Plan
    Plan --> Actions
    Actions --> API
    InteractionLog[Interaction Log]
```
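The same Mermaid flowchart syntax could double as a lightweight visualization hook for the Action Graph itself. A minimal sketch, assuming nodes and edges shaped like the Example's JSON; to_mermaid is a hypothetical helper, not an existing OmniMCP function.

```python
def to_mermaid(nodes, edges):
    """Render an Action Graph as Mermaid flowchart text (hypothetical helper).

    nodes: list of {"id", "description"}
    edges: list of {"source", "target", "action", "element"}
    """
    lines = ["graph TD"]
    for node in nodes:
        # Quoted labels allow spaces; swap double quotes so the label stays valid.
        label = node["description"].replace('"', "'")
        lines.append(f'    {node["id"]}["{label}"]')
    for edge in edges:
        # Edge label is the action plus its target element, if any.
        label = f'{edge["action"]} {edge.get("element") or ""}'.strip()
        lines.append(f'    {edge["source"]} -->|{label}| {edge["target"]}')
    return "\n".join(lines)


nodes = [
    {"id": "n0", "description": "Login page with empty fields"},
    {"id": "n1", "description": "Login page with email filled"},
]
edges = [{"source": "n0", "target": "n1", "action": "type", "element": "email"}]
print(to_mermaid(nodes, edges))
```

Emitting plain text keeps the hook dependency-free; the output can be pasted into any Mermaid renderer, including GitHub issues.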
Scope
Inputs
- interaction_log (optional): list of structured user/agent interactions (click, type, scroll, etc.), each with:
  - timestamp or step
  - action_type
  - element_id or selector
  - (optional) element_description, bounding_box, value
- scene_snapshots: list of scene graph snapshots (UI state summaries or raw graph objects), aligned with interaction steps.
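One way to pin down the interaction log entry is a small dataclass that enforces the "timestamp or step" rule. This is a sketch under assumptions: the class and parse_log are hypothetical, timestamp is assumed to be seconds since the epoch, bounding_box is assumed to be (x, y, width, height), and element_id/selector are left optional because actions such as wait target no element.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Interaction:
    """One interaction log entry (hypothetical model of the fields listed above)."""
    action_type: str                                  # "click", "type", "scroll", ...
    step: Optional[int] = None                        # either step or timestamp must be set
    timestamp: Optional[float] = None                 # assumed: seconds since the epoch
    element_id: Optional[str] = None                  # element_id or selector; both may be
    selector: Optional[str] = None                    # absent for actions like "wait"
    element_description: Optional[str] = None
    bounding_box: Optional[tuple[float, float, float, float]] = None  # assumed (x, y, w, h)
    value: Optional[Any] = None

    def __post_init__(self) -> None:
        if self.step is None and self.timestamp is None:
            raise ValueError("interaction needs a step or a timestamp")

    def order_key(self) -> float:
        # Assumes a given log uses step or timestamp consistently.
        return float(self.step if self.step is not None else self.timestamp)


def parse_log(entries: list[dict]) -> list[Interaction]:
    """Turn raw dicts into Interactions sorted by step/timestamp."""
    known = {f.name for f in fields(Interaction)}
    parsed = []
    for raw in entries:
        entry = dict(raw)
        if "action" in entry and "action_type" not in entry:
            entry["action_type"] = entry.pop("action")  # accept the Example's shorter key
        # Keys outside the model (e.g. "scene", "duration") are ignored here.
        parsed.append(Interaction(**{k: v for k, v in entry.items() if k in known}))
    return sorted(parsed, key=Interaction.order_key)
```

Keeping scene data out of this model mirrors the Inputs split: interactions and scene snapshots arrive as separate, aligned lists.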
Outputs
- action_graph: a JSON or in-memory object with:
  - nodes: one per unique UI state (e.g., identified via a hash or semantic description)
  - edges: one per interaction, with source_node_id, target_node_id, action_type, element_id, and timestamp
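A possible in-memory form of this output, with a JSON round trip, is sketched below using dataclasses. The class names Node, Edge and ActionGraph and the optional state_hash field are assumptions for illustration; the edge fields follow the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Node:
    id: str
    description: str                  # semantic description of the UI state
    state_hash: Optional[str] = None  # assumed: optional key used for deduplication


@dataclass
class Edge:
    source_node_id: str
    target_node_id: str
    action_type: str
    element_id: Optional[str] = None
    timestamp: Optional[float] = None


@dataclass
class ActionGraph:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def to_json(self, indent: int = 2) -> str:
        # asdict recursively converts the nested Node/Edge dataclasses to dicts.
        return json.dumps(asdict(self), indent=indent)

    @classmethod
    def from_json(cls, text: str) -> "ActionGraph":
        data = json.loads(text)
        return cls(
            nodes=[Node(**n) for n in data["nodes"]],
            edges=[Edge(**e) for e in data["edges"]],
        )


graph = ActionGraph(
    nodes=[Node("n0", "Login page with both fields filled"),
           Node("n1", "Dashboard with welcome message")],
    edges=[Edge("n0", "n1", "click", "login_button", 2.0)],
)
print(graph.to_json())
```

Plain dataclasses keep the model dependency-free; a validation library could replace them later without changing the JSON shape.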
Features
- Node deduplication: similar scene snapshots map to the same node
- Edge labeling with action metadata
- Optional use of interaction log for state transition alignment
- Support for synthetic logs to bootstrap development and testing
- Easy export to JSON for visualization/debugging
- Integration-ready for prompt-based planners and an optional PM4Py pipeline
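Node deduplication could start with the simpler option named under Tasks: an exact match on a normalized scene description. In this sketch normalize, state_key and NodeRegistry are hypothetical names, and the normalization (lowercasing, dropping punctuation, collapsing whitespace) is an assumed choice; a genuinely fuzzy scheme, such as embedding similarity, would replace state_key.

```python
import hashlib
import re


def normalize(description: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", description.lower())
    return " ".join(text.split())


def state_key(description: str) -> str:
    # Exact hash of the normalized text (not fuzzy matching).
    return hashlib.sha256(normalize(description).encode("utf-8")).hexdigest()[:16]


class NodeRegistry:
    """Map scene descriptions to stable node ids, reusing ids for equivalent states."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}   # state key -> node id
        self.nodes: list[dict] = []      # first description seen for each node

    def get_or_create(self, description: str) -> str:
        key = state_key(description)
        if key not in self._ids:
            node_id = f"n{len(self._ids)}"
            self._ids[key] = node_id
            self.nodes.append({"id": node_id, "description": description})
        return self._ids[key]


registry = NodeRegistry()
a = registry.get_or_create("Login page with empty fields")
b = registry.get_or_create("login page  with empty fields.")  # same state, noisier text
c = registry.get_or_create("Dashboard with welcome message")
print(a, b, c)
```

Hashing the normalized text rather than storing it keeps lookups constant-time and gives each state a compact key that could also be stored on the node.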
Example
Given the following interaction log, in which each entry's scene describes the UI state before that entry's action is performed:
```json
[
  { "step": 0, "action": "type", "element_id": "email", "value": "[email protected]", "scene": "Login page with empty fields" },
  { "step": 1, "action": "type", "element_id": "password", "value": "hunter2", "scene": "Login page with email filled" },
  { "step": 2, "action": "click", "element_id": "login_button", "scene": "Login page with both fields filled" },
  { "step": 3, "action": "wait", "duration": 2, "scene": "Dashboard with welcome message" }
]
```
The resulting Action Graph (the final wait step produces no edge, because no later snapshot exists to serve as its target):
```json
{
  "nodes": [
    { "id": "n0", "description": "Login page with empty fields" },
    { "id": "n1", "description": "Login page with email filled" },
    { "id": "n2", "description": "Login page with both fields filled" },
    { "id": "n3", "description": "Dashboard with welcome message" }
  ],
  "edges": [
    { "source": "n0", "target": "n1", "action": "type", "element": "email", "step": 0 },
    { "source": "n1", "target": "n2", "action": "type", "element": "password", "step": 1 },
    { "source": "n2", "target": "n3", "action": "click", "element": "login_button", "step": 2 }
  ]
}
```
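The construction step behind this example can be sketched as pairing each entry with the one after it: the entry's scene is the source state, the next entry's scene is the target, and the entry's action labels the edge. This assumes the "scene is the state before the action" reading of the example and uses exact-match deduplication; build_action_graph is a hypothetical function name.

```python
import json


def build_action_graph(log: list[dict]) -> dict:
    """Build an Action Graph where each entry's "scene" is the state before its action.

    Each action becomes an edge from its own scene to the next entry's scene;
    the last action has no successor snapshot, so it yields no edge.
    """
    ids: dict[str, str] = {}   # scene description -> node id (exact-match dedup)
    nodes: list[dict] = []

    def node_for(scene: str) -> str:
        if scene not in ids:
            ids[scene] = f"n{len(ids)}"
            nodes.append({"id": ids[scene], "description": scene})
        return ids[scene]

    entries = sorted(log, key=lambda e: e["step"])
    edges = []
    # Walk consecutive (current, following) pairs.
    for current, following in zip(entries, entries[1:]):
        edges.append({
            "source": node_for(current["scene"]),
            "target": node_for(following["scene"]),
            "action": current["action"],
            "element": current.get("element_id"),
            "step": current["step"],
        })
    if entries:
        node_for(entries[-1]["scene"])  # keep the final state even if it has no edge
    return {"nodes": nodes, "edges": edges}


log = [
    {"step": 0, "action": "type", "element_id": "email", "scene": "Login page with empty fields"},
    {"step": 1, "action": "type", "element_id": "password", "scene": "Login page with email filled"},
    {"step": 2, "action": "click", "element_id": "login_button", "scene": "Login page with both fields filled"},
    {"step": 3, "action": "wait", "scene": "Dashboard with welcome message"},
]
graph = build_action_graph(log)
print(json.dumps(graph, indent=2))
```

Pairing consecutive entries avoids needing a separate "after" snapshot per action; if the logger recorded post-action snapshots instead, the target could come from the entry itself and the final wait would gain an edge.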
Tasks
- Define the ActionGraph data model (nodes, edges)
- Implement graph construction logic
- Handle node deduplication (exact match or fuzzy hash of scene descriptions)
- Add support for synthetic log generation (for testing)
- Add JSON export + optional visualization hooks
- Unit tests with synthetic and real logs
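For synthetic log generation, a seeded generator that replays a scripted flow keeps tests reproducible. A minimal sketch, assuming the Example's entry format; LOGIN_FLOW, synthetic_log and the "failed login returns to the empty page" variant are hypothetical, chosen so that some logs revisit a state and exercise node deduplication.

```python
import random

# Hypothetical login flow: (action, element_id, scene before the action).
LOGIN_FLOW = [
    ("type", "email", "Login page with empty fields"),
    ("type", "password", "Login page with email filled"),
    ("click", "login_button", "Login page with both fields filled"),
    ("wait", None, "Dashboard with welcome message"),
]

# A failed attempt: after clicking login, the UI returns to the empty login page.
FAILED_ATTEMPT = LOGIN_FLOW[:3]


def synthetic_log(seed: int, retry_probability: float = 0.3) -> list[dict]:
    """Generate a synthetic interaction log in the Example's format.

    With probability retry_probability, prepend one failed login so the
    log revisits "Login page with empty fields".
    """
    rng = random.Random(seed)  # seeded for reproducible tests
    steps = list(LOGIN_FLOW)
    if rng.random() < retry_probability:
        steps = list(FAILED_ATTEMPT) + steps
    return [
        {"step": i, "action": action, "element_id": element, "scene": scene}
        for i, (action, element, scene) in enumerate(steps)
    ]


for entry in synthetic_log(seed=0, retry_probability=1.0):
    print(entry)
```

In a retry log, steps 0 and 3 share the same scene, so a correct builder should map both to a single node while still recording two distinct outgoing edges from it.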
Notes
- Later extensions may include loops, branches, and hierarchical grouping to derive high-level process graphs.
- LLM prompting may be used to generate symbolic descriptions for nodes or edge annotations.
- Should remain model-agnostic; planner integration will be handled in downstream stages.
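Detecting loops is a natural first step toward the loop extension mentioned above. A minimal sketch, assuming edges in the Example's format: a depth-first search that reports each back edge, i.e. an action leading to a state still on the current path, such as a failed login returning to the empty login page. find_cycles is a hypothetical helper; it reports one cycle per back edge rather than enumerating every simple cycle, and it recurses once per node on the path, so very long logs would need an iterative version.

```python
from collections import defaultdict


def find_cycles(edges: list[dict]) -> list[list[str]]:
    """Return one node path per back edge found by depth-first search."""
    graph = defaultdict(list)
    for e in edges:
        graph[e["source"]].append(e["target"])

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = defaultdict(int)       # every node starts WHITE
    stack: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: the path from nxt to node, closed by returning to nxt.
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                visit(nxt)
        stack.pop()
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles


retry_edges = [
    {"source": "n0", "target": "n1"},
    {"source": "n1", "target": "n2"},
    {"source": "n2", "target": "n0"},  # failed login returns to the empty page
    {"source": "n2", "target": "n3"},
]
print(find_cycles(retry_edges))
```

Detected cycles could later be collapsed into a single "retry" node when grouping the graph hierarchically, but that grouping is beyond this sketch.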