Enable users to launch OpenAdapt workflows using natural language commands like “do my taxes,” replacing the current manual replay model. This aligns with modern AI UX expectations and leverages existing recordings more effectively.
Problem
Requiring users to manually select and replay a Recording is unintuitive and limits accessibility. A natural language interface would allow users to describe tasks in plain English and let the system infer the most relevant automation.
Goal
Let users initiate task automation by typing a natural language description. The system finds relevant past demonstrations and uses them to guide replay or plan next steps adaptively.
Components
1. UI Input
Launch from system tray icon or similar.
Prompt: “What do you want help with today?”
Accepts a free-form natural language input.
2. Embedding Generation
Generate an embedding from user input using a model like sentence-transformers/all-MiniLM-L6-v2.
3. Embedding Search
Use sqlite-vss to find nearest matches among stored Recording.description embeddings.
Store embeddings in SQLite (RecordingEmbedding table or similar).
Leverage Recording.description as initial source of semantic content.
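As a rough illustration of components 2 and 3, the sketch below stores one vector per recording in SQLite and ranks recordings by cosine similarity to a query vector. It makes several assumptions that are not part of this proposal: the table and column names are hypothetical, the three-dimensional vectors are hand-written placeholders for real model output (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the search is a brute-force scan in Python standing in for a sqlite-vss index.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a vector as float32 bytes for storage in a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def init_db(conn):
    # Hypothetical schema: the issue only says "RecordingEmbedding table or similar".
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recording_embedding ("
        "recording_id INTEGER PRIMARY KEY, "
        "description TEXT NOT NULL, "
        "embedding BLOB NOT NULL)"
    )


def add_recording(conn, recording_id, description, embedding):
    """Insert or update the embedding stored for one recording."""
    conn.execute(
        "INSERT OR REPLACE INTO recording_embedding VALUES (?, ?, ?)",
        (recording_id, description, to_blob(embedding)),
    )


def search(conn, query_embedding, top_n=3):
    """Brute-force nearest neighbours; a vector index would replace this scan."""
    rows = conn.execute(
        "SELECT recording_id, description, embedding FROM recording_embedding"
    )
    scored = [
        (cosine(query_embedding, from_blob(blob)), rid, desc)
        for rid, desc, blob in rows
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_n]


# Placeholder vectors; real ones would come from the embedding model.
conn = sqlite3.connect(":memory:")
init_db(conn)
add_recording(conn, 1, "file taxes", [1.0, 0.0, 0.0])
add_recording(conn, 2, "book a flight", [0.0, 1.0, 0.0])
add_recording(conn, 3, "send weekly report", [0.0, 0.0, 1.0])

results = search(conn, [0.9, 0.1, 0.0], top_n=2)
print([(rid, desc) for _, rid, desc in results])
```

A full scan is likely adequate for a small number of recordings; the point of a vector index such as sqlite-vss would be to avoid reading every row as the library of recordings grows.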
4. Reranking & Demonstration Injection (instead of hard selection)
Instead of selecting a single best match or displaying a list:
Retrieve top-N semantically similar recordings.
Use reranking (e.g. with a cross-encoder) to sort them.
Inject the top few (or most relevant subsections) as demonstrations into the model prompt.
At each step, the model sees:
Current GUI state (screenshot + accessible DOM or bounding box data).
Prior user query.
Retrieved demonstrations (replay logs or summaries).
The model then decides what to do next.
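One possible shape for the rerank-then-inject step is sketched below. Everything here is illustrative: the Demo class, the prompt layout, and the word_overlap scorer are assumptions, and the scorer is only a trivial stand-in for a cross-encoder. The GUI state is passed as text; how screenshots would be attached depends on the model API and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Demo:
    """Hypothetical container for one retrieved demonstration."""
    recording_id: int
    description: str
    summary: str


def word_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query: str, candidates: List[Demo],
           score_fn: Callable[[str, str], float], keep: int = 2) -> List[Demo]:
    """Sort candidates by score_fn(query, description), best first, and keep a few."""
    ranked = sorted(
        candidates,
        key=lambda demo: score_fn(query, demo.description),
        reverse=True,
    )
    return ranked[:keep]


def build_prompt(query: str, gui_state: str, demos: List[Demo]) -> str:
    """Assemble the per-step prompt: user query, current GUI state, demonstrations."""
    lines = [
        f"User request: {query}",
        "",
        "Current GUI state:",
        gui_state,
        "",
        "Retrieved demonstrations:",
    ]
    for index, demo in enumerate(demos, start=1):
        lines.append(f"{index}. [{demo.recording_id}] {demo.description}: {demo.summary}")
    lines += ["", "Decide the next action."]
    return "\n".join(lines)


candidates = [
    Demo(1, "book a flight", "open airline site, search, pay"),
    Demo(2, "file my taxes", "log in to TurboTax, enter income, submit"),
    Demo(3, "pay taxes online", "open bank site, add payee, pay"),
]
top = rerank("do my taxes", candidates, word_overlap, keep=2)
prompt = build_prompt("do my taxes", "TurboTax login page is visible", top)
print(prompt)
```

Passing the scorer in as a function keeps the retrieval code unchanged if the toy scorer is later swapped for a cross-encoder.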
5. Hierarchical / Recursive Summarization
For each recording, generate multi-resolution summaries:
High level: “file taxes”
Mid-level: “log in to TurboTax”
Low level: “click 'T4 Income' tab”
Use these summaries to:
Improve retrieval accuracy.
Enable context-aware planning and prompt construction.
Eventually support segmentation and partial replays.
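The multi-resolution summaries could be represented as a small tree, flattened into rows so that each level can be embedded and searched separately. This is only a sketch: SummaryNode and flatten are hypothetical names, and the nesting of the three example summaries is chosen just to show three levels.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class SummaryNode:
    """One summary at some resolution, with finer-grained summaries as children."""
    text: str
    children: List["SummaryNode"] = field(default_factory=list)


def flatten(node: SummaryNode, depth: int = 0,
            path: Tuple[str, ...] = ()) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, breadcrumb, text) for every node, depth-first."""
    here = path + (node.text,)
    yield depth, " > ".join(here), node.text
    for child in node.children:
        yield from flatten(child, depth + 1, here)


# Example summaries taken from the list above; the nesting is illustrative.
tree = SummaryNode(
    "file taxes",
    [SummaryNode("log in to TurboTax", [SummaryNode("click 'T4 Income' tab")])],
)

rows = list(flatten(tree))
for depth, breadcrumb, text in rows:
    print(depth, breadcrumb)
```

Keeping the breadcrumb alongside each low-level summary gives the retriever the surrounding context, which may help when a short step description is ambiguous on its own.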
UX Considerations
Lightweight, interruptible input box.
System should confirm before executing anything destructive.
If confidence is low, suggest clarifications or fallback paths.
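The confirmation and low-confidence rules above might reduce to a small gate run before every action. The sketch below is an assumption-heavy illustration: the keyword list, the 0.6 threshold, and the idea that the model reports a numeric confidence are all hypothetical.

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"    # safe to run without asking
    CONFIRM = "confirm"    # ask the user before running
    CLARIFY = "clarify"    # ask the user what they meant


# Hypothetical keyword list; a real check would need something more robust.
DESTRUCTIVE_KEYWORDS = ("delete", "submit", "send", "pay", "overwrite")


def decide(action_description: str, confidence: float,
           min_confidence: float = 0.6) -> Decision:
    """Low confidence always asks for clarification; destructive actions ask for consent."""
    if confidence < min_confidence:
        return Decision.CLARIFY
    lowered = action_description.lower()
    if any(word in lowered for word in DESTRUCTIVE_KEYWORDS):
        return Decision.CONFIRM
    return Decision.EXECUTE


print(decide("click 'T4 Income' tab", 0.9))
print(decide("submit tax return", 0.9))
print(decide("submit tax return", 0.3))
```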
Alternatives Considered
Keyword search: brittle and non-semantic.
Command line or dropdown replay: slower and less intuitive.
Static top-k display: less adaptive than demonstration-based inference.
Acceptance Criteria
User can enter a natural language query from a tray icon.
System embeds the query and retrieves similar past demos.
The model uses these demos to determine how to proceed.
Summaries and embeddings are auto-generated for future replays.