
Natural Language Interface to Trigger Relevant Recordings #951

Open

abrichr opened this issue May 23, 2025 · 0 comments
Labels: enhancement (New feature or request)

abrichr commented May 23, 2025

Feature request

Enable users to launch OpenAdapt workflows using natural language commands like “do my taxes,” replacing the current manual replay model. This aligns with modern AI UX expectations and leverages existing recordings more effectively.

Problem

Requiring users to manually select and replay a Recording is unintuitive and limits accessibility. A natural language interface would allow users to describe tasks in plain English and let the system infer the most relevant automation.

Goal

Let users initiate task automation by typing a natural language description. The system finds relevant past demonstrations and uses them to guide replay or plan next steps adaptively.

Components

1. UI Input

  • Launched from the system tray icon or a similar entry point.
  • Prompt: “What do you want help with today?”
  • Accepts free-form natural language input (see the sketch below).
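A minimal sketch of the input flow, assuming pystray for the tray icon and tkinter for the prompt dialog (both assumptions; OpenAdapt’s actual tray app may use a different toolkit, and `handle_query` is a hypothetical entry point into steps 2–4):

```python
# Sketch only: assumes pystray + Pillow + tkinter are available.
import threading
import tkinter as tk
from tkinter import simpledialog

import pystray
from PIL import Image


def handle_query(query: str) -> None:
    print(f"User asked: {query}")  # placeholder for steps 2-4


def prompt_user(icon, item):
    """Open a lightweight input box and hand the query to the pipeline."""
    def ask():
        root = tk.Tk()
        root.withdraw()  # hide the empty root window
        query = simpledialog.askstring(
            "OpenAdapt", "What do you want help with today?"
        )
        root.destroy()
        if query:
            handle_query(query)

    # Run the dialog off the tray callback thread so the icon stays responsive.
    threading.Thread(target=ask, daemon=True).start()


icon = pystray.Icon(
    "openadapt",
    Image.new("RGB", (64, 64), "white"),  # placeholder icon image
    menu=pystray.Menu(pystray.MenuItem("Ask OpenAdapt…", prompt_user)),
)
icon.run()
```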

2. Embedding Generation

  • Generate an embedding from the user’s input using a model like sentence-transformers/all-MiniLM-L6-v2, as sketched below.
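A minimal sketch using the sentence-transformers package:

```python
# Sketch: embed the user's query with a small sentence-transformer model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# encode() returns a numpy vector; for this model it is 384-dimensional,
# which also fixes the vector dimension of the vss0 table in step 3.
query_embedding = model.encode("do my taxes", normalize_embeddings=True)
print(query_embedding.shape)  # (384,)
```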

3. Embedding Search

  • Use sqlite-vss to find the nearest matches among stored Recording.description embeddings (sketched below).
  • Store embeddings in SQLite (RecordingEmbedding table or similar).
  • Leverage Recording.description as the initial source of semantic content.
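A minimal sketch of the storage and search path, assuming the sqlite-vss Python bindings; `recording_id`, `description_embedding`, and `query_embedding` are illustrative (the latter carries over from step 2):

```python
import json
import sqlite3

import sqlite_vss  # pip install sqlite-vss

db = sqlite3.connect("openadapt.db")
db.enable_load_extension(True)
sqlite_vss.load(db)

# vss0 virtual table holding one 384-dim vector per Recording row;
# rowid ties each vector back to its Recording.
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS vss_recordings USING vss0(embedding(384))"
)
db.execute(
    "INSERT INTO vss_recordings(rowid, embedding) VALUES (?, ?)",
    (recording_id, json.dumps(description_embedding.tolist())),
)

# Nearest-neighbor search for the user's query embedding.
# (The LIMIT form requires a recent SQLite; older versions need
# vss_search_params(?, 5) instead.)
rows = db.execute(
    "SELECT rowid, distance FROM vss_recordings "
    "WHERE vss_search(embedding, ?) LIMIT 5",
    (json.dumps(query_embedding.tolist()),),
).fetchall()
```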

4. Reranking & Demonstration Injection (instead of hard selection)

  • Instead of selecting a single best match or displaying a list:

    • Retrieve top-N semantically similar recordings.
    • Use reranking (e.g. with a cross-encoder, as sketched below) to sort them.
    • Inject the top few (or most relevant subsections) as demonstrations into the model prompt.
  • At each step, the model sees:

    • Current GUI state (screenshot + accessible DOM or bounding box data).
    • Prior user query.
    • Retrieved demonstrations (replay logs or summaries).
  • The model then decides what to do next.
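A minimal sketch of the rerank-and-inject step, assuming sentence-transformers’ CrossEncoder; the `description` and `summary` fields on each candidate are illustrative stand-ins for whatever the Recording model exposes:

```python
from sentence_transformers import CrossEncoder

# Sketch: rerank retrieved recordings, then build the per-step prompt.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, candidates: list[dict], top_k: int = 3) -> list[dict]:
    """Sort candidate recordings by cross-encoder relevance to the query."""
    scores = reranker.predict([(query, c["description"]) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [c for _, c in ranked[:top_k]]


def build_prompt(query: str, gui_state: str, demos: list[dict]) -> str:
    """Assemble the per-step prompt: GUI state + query + demonstrations."""
    demo_text = "\n\n".join(d["summary"] for d in demos)
    return (
        f"Current GUI state:\n{gui_state}\n\n"
        f"User request: {query}\n\n"
        f"Relevant past demonstrations:\n{demo_text}\n\n"
        "Decide the next action."
    )
```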

5. Hierarchical / Recursive Summarization

  • For each recording, generate multi-resolution summaries (see the sketch after this list):

    • High-level: “file taxes”
    • Mid-level: “log in to TurboTax”
    • Low-level: “click the ‘T4 Income’ tab”
  • Use these summaries to:

    • Improve retrieval accuracy.
    • Enable context-aware planning and prompt construction.
    • Eventually support segmentation and partial replays.
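A minimal sketch of the recursive summarization, where `complete` is a hypothetical wrapper around whatever LLM API OpenAdapt uses:

```python
# Sketch: build low -> mid -> high summaries, each level feeding the next.

def summarize(lines: list[str], level: str) -> str:
    instructions = {
        "low": "Describe each UI action as one short imperative phrase, one per line.",
        "mid": "Group these steps into a few sub-task descriptions, one per line.",
        "high": "State the overall task in a single phrase.",
    }
    prompt = instructions[level] + "\n\n" + "\n".join(lines)
    return complete(prompt)  # hypothetical LLM call


def summarize_recording(action_log: list[str]) -> dict[str, str]:
    low = summarize(action_log, "low")
    mid = summarize(low.splitlines(), "mid")
    high = summarize(mid.splitlines(), "high")
    # e.g. {"high": "file taxes", "mid": "log in to TurboTax\n...", ...}
    return {"low": low, "mid": mid, "high": high}
```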

UX Considerations

  • Lightweight, interruptible input box.
  • System should confirm before executing anything destructive.
  • If confidence is low, suggest clarifications or fallback paths.

Alternatives Considered

  • Keyword search: brittle and non-semantic.
  • Command line or dropdown replay: slower and less intuitive.
  • Static top-k display: less adaptive than demonstration-based inference.

Acceptance Criteria

  • User can enter a natural language query from a tray icon.
  • The system embeds the query and retrieves similar past demonstrations.
  • The model uses these demos to determine how to proceed.
  • Summaries and embeddings are auto-generated for future replays.

