## PRD: Stakeholder Query Insights Platform

### 1. Objective & Summary

This document outlines the requirements for a new **Query Insights Platform** for Cairo Coder. The platform's objective is to systematically capture, analyze, and surface user query data to provide actionable insights into developer trends, pain points, and areas of interest within the Cairo and Starknet ecosystem.

This will empower stakeholders in the Starknet ecosystem, from ecosystem managers to library developers, to make data-driven decisions that improve the overall developer and user experience on Starknet.

The project will be delivered in two phases:
1. **Phase 1 (Data Persistence):** Establish a system for storing user interactions and LLM-generated analyses in our own database, creating a permanent and queryable record.
2. **Phase 2 (Insight Delivery):** Expose this data through a set of API endpoints for raw data retrieval and for persisted, on-demand analyses that extract key topics, issues, and pain points, and give insight into what people are building on Starknet.

### 2. Background & Strategic Fit

Currently, we persist LLM traces of every query in LangSmith for observability purposes. This makes it easy to monitor each internal component of the system and identify potential bottlenecks in latency or answer quality, but it is difficult to answer critical questions about the developer and user journey, such as:

* **For Documentation Teams:** *What are the biggest knowledge gaps in our current docs? Are users finding the information they need for core concepts related to Cairo (storage, testing), integrations (starknetjs, frontend, indexers), and the blockchain (tx fees, account abstraction)?*
* **For Core Protocol & Tooling Teams:** *Which compiler errors are causing the most friction? How are developers adopting new features we've shipped? Where are the rough edges in our tooling (Scarb, Foundry) that cause confusion?*
* **For the Ecosystem (e.g., Starknet Foundation):** *What are the emerging development patterns? Are more developers building DeFi, Gaming, or NFT projects? What topics are they most curious about this quarter?*

While this platform will not surface everything that already works well *(people usually don't ask about things they find clear or simple!)*, it will help catch much of what would benefit from improvement.

By building this platform, we transform Cairo Coder from a developer assistance tool into a source of quantitative data, making it considerably easier to gather feedback and prioritize improvements across the entire Starknet ecosystem.

### 3. Personas & User Stories

| Persona | Description | User Stories |
| :--- | :--- | :--- |
| **Dana the Docs Writer** | Responsible for creating and maintaining Starknet and Cairo documentation. | 1. **As Dana**, I want to see the topic breakdown for the last month, so I can prioritize updates for the most frequently asked-about subjects.<br/>2. **As Dana**, I want to view the raw queries for the "Starknet/Prover" topic, so I can understand what is unclear about it. |
| **Charlie the Core Dev** | Works on the Cairo compiler and Starknet tooling. | 1. **As Charlie**, I want to filter queries by `agent_id='cairo-coder'`, so I can see how developers are using (or misusing) specific language features.<br/>2. **As Charlie**, I want to trigger a new analysis on last quarter's queries, so I can see whether a recent library update has reduced confusion about a feature. |
| **Alex the Analyst** | An ecosystem manager. | 1. **As Alex**, I want to list all historical analyses, so I can find the Q3 2025 summary report.<br/>2. **As Alex**, I want to retrieve a specific, persisted analysis by its ID, so I can use its structured JSON data to build visualizations in my own BI tool. |

### 4. Proposed Solution & Features

> Note: the following sections describe internal technical details.

#### Phase 1: Robust Data Persistence Layer

> Note: the database schema will be revisited upon implementation.

1. **Database Schema:**
   * We will add two new tables to the existing Postgres database.

     * **Table 1: `user_interactions`** (To store every interaction)
       ```sql
       CREATE TABLE user_interactions (
           id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
           created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
           agent_id VARCHAR(50) NOT NULL,
           mcp_mode BOOLEAN NOT NULL DEFAULT FALSE,
           query_history JSONB,
           final_user_query TEXT NOT NULL,
           generated_answer TEXT,
           retrieved_sources JSONB,
           llm_usage JSONB
       );
       -- Indexes for efficient filtering
       CREATE INDEX idx_interactions_created_at ON user_interactions(created_at);
       CREATE INDEX idx_interactions_agent_id ON user_interactions(agent_id);
       ```
     * **Table 2: `query_analyses`** (To store the results of analysis jobs)
       ```sql
       CREATE TABLE query_analyses (
           id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
           created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
           status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending, completed, failed
           analysis_parameters JSONB, -- { start_date, end_date, agent_id }
           analysis_result JSONB, -- The full JSON output from the analysis script
           error_message TEXT -- To log errors if the job fails
       );
       CREATE INDEX idx_analyses_created_at ON query_analyses(created_at);
       ```
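     * **Illustrative read query** *(a sketch only; `psycopg` and the `DATABASE_URL` variable are assumptions made for this example, not design decisions)*: the kind of filtered aggregate the indexes above are meant to support, e.g. interactions per agent per day over a date window.
       ```python
       import os
       from datetime import datetime, timezone

       import psycopg  # any Postgres client works; psycopg is assumed here for brevity

       window = (
           datetime(2025, 10, 1, tzinfo=timezone.utc),
           datetime(2025, 11, 1, tzinfo=timezone.utc),
       )

       # DATABASE_URL is a hypothetical connection string for this sketch.
       with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
           rows = conn.execute(
               """
               SELECT agent_id, date_trunc('day', created_at) AS day, COUNT(*) AS queries
               FROM user_interactions
               WHERE created_at >= %s AND created_at < %s
               GROUP BY agent_id, day
               ORDER BY day, agent_id
               """,
               window,
           ).fetchall()

       for agent_id, day, queries in rows:
           print(agent_id, day.date(), queries)
       ```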

2. **Data Ingestion Logic:**
   * The `_handle_chat_completion` function in `cairo_coder/server/app.py` will be modified.
   * After a response is successfully generated, it will spawn a non-blocking background task that writes the full interaction details into the `user_interactions` table, so that logging adds no latency to user requests (sketched below).
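   * **Sketch** *(illustrative only; the `asyncpg` pool and the `persist_interaction` helper are assumed names, not final implementation details)*:
      ```python
      import json
      import logging

      import asyncpg  # assumed driver for this sketch

      logger = logging.getLogger(__name__)


      async def persist_interaction(pool: asyncpg.Pool, interaction: dict) -> None:
          """Write one interaction to user_interactions without raising into the request path."""
          try:
              await pool.execute(
                  """
                  INSERT INTO user_interactions
                      (agent_id, mcp_mode, query_history, final_user_query,
                       generated_answer, retrieved_sources, llm_usage)
                  VALUES ($1, $2, $3, $4, $5, $6, $7)
                  """,
                  interaction["agent_id"],
                  interaction["mcp_mode"],
                  json.dumps(interaction.get("query_history", [])),
                  interaction["final_user_query"],
                  interaction.get("generated_answer"),
                  json.dumps(interaction.get("retrieved_sources", [])),
                  json.dumps(interaction.get("llm_usage", {})),
              )
          except Exception:
              # Persistence failures are logged and swallowed so they never affect users.
              logger.exception("Failed to persist user interaction")


      # Inside _handle_chat_completion, after the answer has been produced:
      #     asyncio.create_task(persist_interaction(app.state.pool, interaction))
      # schedules the write without delaying the response to the user.
      ```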

#### Phase 2: Insights API Endpoints

A new API router will be created at `/v1/insights`.

1. **Get Raw Queries:** `GET /v1/insights/queries`
   * **Description:** Provides paginated access to the raw interaction data.
   * **Query Parameters:**
     * `start_date` (ISO 8601, required)
     * `end_date` (ISO 8601, required)
     * `agent_id` (string, optional): Filters by a specific agent. If omitted, returns data for all agents.
     * `limit` (integer, default: 100)
     * `offset` (integer, default: 0)
   * **Response:** A JSON object with a list of interaction records and pagination metadata.
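   * **Handler sketch** *(illustrative; the `asyncpg` pool on `app.state` and the exact response fields are assumptions, not final API decisions)*:
      ```python
      from datetime import datetime

      import asyncpg
      from fastapi import APIRouter, Query, Request

      router = APIRouter(prefix="/v1/insights")


      @router.get("/queries")
      async def list_queries(
          request: Request,
          start_date: datetime,
          end_date: datetime,
          agent_id: str | None = None,
          limit: int = Query(default=100, ge=1, le=1000),
          offset: int = Query(default=0, ge=0),
      ) -> dict:
          # The connection pool is assumed to be created at application startup.
          pool: asyncpg.Pool = request.app.state.pool
          rows = await pool.fetch(
              """
              SELECT id, created_at, agent_id, mcp_mode, final_user_query, generated_answer
              FROM user_interactions
              WHERE created_at >= $1 AND created_at < $2
                AND ($3::varchar IS NULL OR agent_id = $3)
              ORDER BY created_at DESC
              LIMIT $4 OFFSET $5
              """,
              start_date, end_date, agent_id, limit, offset,
          )
          return {"items": [dict(r) for r in rows], "limit": limit, "offset": offset, "count": len(rows)}
      ```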

2. **Trigger a New Analysis:** `POST /v1/insights/analyze`
   * **Description:** Kicks off an asynchronous LLM-based analysis job on a specified time range.
   * **Request Body:**
      ```json
      {
        "start_date": "2025-10-01T00:00:00Z",
        "end_date": "2025-11-01T00:00:00Z",
        "agent_id": "starknet-agent" // Optional
      }
      ```
   * **Logic** (sketched below):
     1. The endpoint immediately creates a new record in the `query_analyses` table with `status='pending'`.
     2. It returns a `202 Accepted` response with the `analysis_id`.
     3. A background worker fetches the corresponding interactions from `user_interactions` and runs them through the analysis logic from `cairo_coder_tools/datasets/analysis.py`.
     4. Upon completion, the worker updates the `query_analyses` record with `status='completed'` and saves the resulting JSON in `analysis_result`. If it fails, the status is set to `failed` and the error is logged in `error_message`.
   * **Response:**
      ```json
      {
        "analysis_id": "uuid-for-the-new-analysis-job",
        "status": "pending"
      }
      ```

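   * **Flow sketch** *(illustrative; assumes FastAPI with Pydantic v2 and an `asyncpg` pool on `app.state`; `analyze_queries` is a hypothetical stand-in for the real entry point in `cairo_coder_tools/datasets/analysis.py`, and an in-process background task is used here for brevity where a separate worker would also fit)*:
      ```python
      import json
      import logging
      from datetime import datetime
      from uuid import UUID

      import asyncpg
      from fastapi import APIRouter, BackgroundTasks, Request
      from pydantic import BaseModel

      # Hypothetical import path; the actual analysis entry point may differ.
      from cairo_coder_tools.datasets.analysis import analyze_queries

      logger = logging.getLogger(__name__)
      router = APIRouter(prefix="/v1/insights")


      class AnalyzeRequest(BaseModel):
          start_date: datetime
          end_date: datetime
          agent_id: str | None = None


      @router.post("/analyze", status_code=202)
      async def trigger_analysis(
          body: AnalyzeRequest, request: Request, background_tasks: BackgroundTasks
      ) -> dict:
          pool: asyncpg.Pool = request.app.state.pool  # assumed startup-initialized pool
          # Step 1: persist a pending job so it is immediately visible in listings.
          analysis_id = await pool.fetchval(
              "INSERT INTO query_analyses (status, analysis_parameters) "
              "VALUES ('pending', $1) RETURNING id",
              body.model_dump_json(),
          )
          # Step 2: schedule the worker and return 202 with the job id right away.
          background_tasks.add_task(run_analysis, pool, analysis_id, body)
          return {"analysis_id": str(analysis_id), "status": "pending"}


      async def run_analysis(pool: asyncpg.Pool, analysis_id: UUID, params: AnalyzeRequest) -> None:
          """Background worker: analyze the selected interactions and persist the outcome."""
          try:
              # Step 3: fetch the interactions in scope for this analysis.
              rows = await pool.fetch(
                  "SELECT final_user_query FROM user_interactions "
                  "WHERE created_at >= $1 AND created_at < $2 "
                  "AND ($3::varchar IS NULL OR agent_id = $3)",
                  params.start_date, params.end_date, params.agent_id,
              )
              result = analyze_queries([r["final_user_query"] for r in rows])
              # Step 4: store the result and mark the job as completed.
              await pool.execute(
                  "UPDATE query_analyses SET status = 'completed', analysis_result = $1 WHERE id = $2",
                  json.dumps(result), analysis_id,
              )
          except Exception as exc:
              logger.exception("Analysis %s failed", analysis_id)
              await pool.execute(
                  "UPDATE query_analyses SET status = 'failed', error_message = $1 WHERE id = $2",
                  str(exc), analysis_id,
              )
      ```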
3. **List and Retrieve Analyses:**
   * `GET /v1/insights/analyses`
     * **Description:** Returns a list of all historical analysis jobs, most recent first.
     * **Response:** A JSON list containing metadata for each analysis (`id`, `created_at`, `status`, `analysis_parameters`).
   * `GET /v1/insights/analyses/{analysis_id}`
     * **Description:** Retrieves the detailed result of a specific analysis job.
     * **Response:** The full record from the `query_analyses` table, including the `analysis_result` JSON if the job is complete.

### 5. Non-Functional Requirements

* **Asynchronicity:** The analysis endpoint (`POST /analyze`) must be fully asynchronous to handle potentially long-running LLM jobs without blocking the server.

### 6. Out of Scope for This Version

The following items are important but are explicitly deferred to future iterations to manage scope:

* **API Authentication:** The initial version of the insights API will be deployed internally without an authentication layer. An authentication layer will be added on top of this API in a later iteration.
* **PII Anonymization:** We will operate under the assumption that user queries do not contain PII. A formal PII scrubbing process is a future requirement.
* **Data Retention Policy:** No automated data retention or cleanup will be implemented in this version.
* **Frontend Dashboard:** A dedicated UI for visualizing these insights is a logical next step but is not part of this project.