@enitrat enitrat commented Oct 27, 2025

Summary

This PR introduces a comprehensive query insights infrastructure for Cairo Coder, enabling persistent logging of user interactions and providing API endpoints for analytics. The implementation includes a new database layer with asyncpg, migration tools for historical data from LangSmith, and extensive test coverage.

Key Features:

  • Persistent user interaction logging with PostgreSQL
  • RESTful insights API with filtering and pagination
  • LangSmith data migration tooling
  • Comprehensive test suite with database integration tests

Architecture

Database Layer (python/src/cairo_coder/db/)

New modules:

  • models.py - Pydantic model for UserInteraction with fields for agent_id, query, chat_history, generated_answer, retrieved_sources, and LLM usage metrics
  • repository.py - Data access layer with functions for creating and querying interactions, including upsert support for migrations
  • session.py - Asyncpg connection pool management with per-event-loop pooling to handle FastAPI TestClient and AnyIO edge cases

Database schema:

CREATE TABLE user_interactions (
    id UUID PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL,
    agent_id VARCHAR(50) NOT NULL,
    mcp_mode BOOLEAN NOT NULL DEFAULT FALSE,
    chat_history JSONB,
    query TEXT NOT NULL,
    generated_answer TEXT,
    retrieved_sources JSONB,
    llm_usage JSONB
);
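
For reference, an upsert against this table could look roughly like the sketch below. This is an assumption about how the repository's upsert might be written, not a copy of it: `UPSERT_SQL` and `upsert_args` are hypothetical names, and the choice to overwrite every column on conflict is illustrative.

```python
import json
from typing import Any

# Hypothetical statement: insert a row, or overwrite the existing row
# when the same id is seen again (e.g. on a re-run of the migration).
UPSERT_SQL = """
INSERT INTO user_interactions (
    id, created_at, agent_id, mcp_mode, chat_history,
    query, generated_answer, retrieved_sources, llm_usage
) VALUES ($1, $2, $3, $4, $5::jsonb, $6, $7, $8::jsonb, $9::jsonb)
ON CONFLICT (id) DO UPDATE SET
    created_at = EXCLUDED.created_at,
    agent_id = EXCLUDED.agent_id,
    mcp_mode = EXCLUDED.mcp_mode,
    chat_history = EXCLUDED.chat_history,
    query = EXCLUDED.query,
    generated_answer = EXCLUDED.generated_answer,
    retrieved_sources = EXCLUDED.retrieved_sources,
    llm_usage = EXCLUDED.llm_usage
"""


def upsert_args(row: dict[str, Any]) -> tuple[Any, ...]:
    """Order a row dict to match $1..$9, JSON-encoding the JSONB columns."""

    def to_json(value: Any) -> Any:
        # Keep SQL NULL for missing values instead of storing the JSON 'null'.
        return None if value is None else json.dumps(value)

    return (
        row["id"],
        row["created_at"],
        row["agent_id"],
        row.get("mcp_mode", False),
        to_json(row.get("chat_history")),
        row["query"],
        row.get("generated_answer"),
        to_json(row.get("retrieved_sources")),
        to_json(row.get("llm_usage")),
    )
```

With an asyncpg connection `conn`, this would be executed as `await conn.execute(UPSERT_SQL, *upsert_args(row))`.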

API Endpoints (python/src/cairo_coder/server/insights_api.py)

New endpoint:

  • GET /v1/insights/queries - Paginated query retrieval with filters:
    • start_date, end_date - Time range filtering (ISO 8601)
    • agent_id - Filter by specific agent
    • query_text - Text search (case-insensitive)
    • limit, offset - Pagination controls

Returns JSON with structure:

{
  "items": [{"id": "...", "created_at": "...", "agent_id": "...", "query": "...", "chat_history": [], "output": "..."}],
  "total": 123,
  "limit": 100,
  "offset": 0
}
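
To make the filter semantics concrete, the sketch below shows one way the query parameters could be turned into a parameterized SQL statement with asyncpg-style `$n` placeholders. `build_query` is a hypothetical helper, not the function in the PR; the inclusive date bounds, the `ILIKE` substring match, and the newest-first ordering are assumptions.

```python
from datetime import datetime
from typing import Any, Optional


def build_query(
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    agent_id: Optional[str] = None,
    query_text: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Build a SELECT with only the filters that were actually provided."""
    clauses: list[str] = []
    params: list[Any] = []

    def bind(value: Any) -> str:
        # Register a value and return its positional placeholder ($1, $2, ...).
        params.append(value)
        return f"${len(params)}"

    if start_date is not None:
        clauses.append(f"created_at >= {bind(start_date)}")
    if end_date is not None:
        clauses.append(f"created_at <= {bind(end_date)}")
    if agent_id is not None:
        clauses.append(f"agent_id = {bind(agent_id)}")
    if query_text:
        # Case-insensitive substring match; % and _ in the input act as wildcards.
        clauses.append(f"query ILIKE {bind('%' + query_text + '%')}")

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT id, created_at, agent_id, query, chat_history, generated_answer"
        f" FROM user_interactions{where}"
        f" ORDER BY created_at DESC LIMIT {bind(limit)} OFFSET {bind(offset)}"
    )
    return sql, params
```

The `total` field in the response would need a separate `COUNT(*)` query over the same `WHERE` clause, without `LIMIT` and `OFFSET`.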

Server Integration (python/src/cairo_coder/server/app.py)

  • Background task logging - Non-blocking interaction persistence using FastAPI's BackgroundTasks
  • Lifecycle hooks - Database pool initialization on startup, cleanup on shutdown
  • Dual logging paths - Handles both streaming and non-streaming responses
  • Last retrieved documents - New property on RagPipeline to access retrieved sources for logging
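
The streaming path is the less obvious of the two, because the full answer only exists once the last chunk has been sent. A minimal sketch of one way to handle it follows, assuming the answer is streamed as text chunks; `stream_and_log` and `log_interaction` are hypothetical names and this is not the PR's implementation.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_and_log(
    chunks: AsyncIterator[str],
    log_interaction: Callable[[str], Awaitable[None]],
) -> AsyncIterator[str]:
    """Pass chunks through unchanged, then log the assembled answer.

    Logging runs only after the final chunk has been yielded, so it does not
    delay anything the client is waiting for. If the consumer stops reading
    early, this sketch never reaches the logging call.
    """
    parts: list[str] = []
    async for chunk in chunks:
        parts.append(chunk)
        yield chunk
    await log_interaction("".join(parts))
```

For the non-streaming path, the complete answer is available before the response is returned, so the same `log_interaction` callable can simply be scheduled as a background task.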

Migration Tools (python/src/cairo_coder_tools/datasets/migrate_langsmith.py)

Comprehensive LangSmith migration:

  • Fetches historical runs from LangSmith API
  • Transforms LangSmith run format to UserInteraction model
  • Upsert behavior allows re-running migrations to update data
  • Progress reporting with statistics (inserted/updated/failed counts)
  • Supports date range filtering and dry-run mode

Key features:

  • Extracts chat history, queries, and generated answers from LangSmith runs
  • Maps retrieved documents from run outputs
  • Preserves original LangSmith run IDs for traceability
  • Handles malformed data gracefully with error logging
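
The overall flow (transform each run, upsert it, count outcomes, skip malformed runs) can be sketched as below. The run layout used here (`inputs["query"]`, `outputs["answer"]`, `start_time`) is a made-up shape for illustration only and may not match real LangSmith runs; `run_to_interaction`, `migrate`, the `upsert` callback, and the `skipped` counter are likewise hypothetical.

```python
import logging
from collections import Counter
from typing import Any, Callable, Iterable

logger = logging.getLogger("migrate_langsmith_sketch")


def run_to_interaction(run: dict[str, Any]) -> dict[str, Any]:
    """Map one run (hypothetical shape) to a user_interactions row dict."""
    inputs = run["inputs"]
    outputs = run.get("outputs") or {}
    return {
        "id": run["id"],  # reuse the run ID so a re-run hits the same row
        "created_at": run["start_time"],
        "query": inputs["query"],
        "chat_history": inputs.get("chat_history", []),
        "generated_answer": outputs.get("answer"),
    }


def migrate(
    runs: Iterable[dict[str, Any]],
    upsert: Callable[[dict[str, Any]], str],
    dry_run: bool = False,
) -> Counter:
    """Migrate runs; `upsert` must return either "inserted" or "updated"."""
    stats: Counter = Counter()
    for run in runs:
        try:
            row = run_to_interaction(run)
        except (KeyError, TypeError) as exc:
            # Malformed run: record the failure and keep going.
            logger.error("Skipping malformed run: %r", exc)
            stats["failed"] += 1
            continue
        if dry_run:
            stats["skipped"] += 1  # transformed but deliberately not written
            continue
        stats[upsert(row)] += 1
    return stats
```

Because the row ID is taken from the run ID, running the migration twice over the same window reports the second pass as updates rather than inserts.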

Dataset Analysis (python/src/cairo_coder_tools/datasets/analysis.py)

  • Updated extractors to work with new database schema
  • CSV export functionality for downstream analysis
  • Query statistics and aggregation helpers
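
A rough sketch of what the CSV export and a simple aggregation could look like, using only the standard library. `export_csv` and `queries_per_agent` are hypothetical helpers and the default column list is an assumption; the actual analysis module may differ.

```python
import csv
import io
from collections import Counter
from typing import Any, Iterable, Sequence


def export_csv(
    rows: Iterable[dict[str, Any]],
    fields: Sequence[str] = ("id", "created_at", "agent_id", "query"),
) -> str:
    """Serialize the selected columns of each row to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" drops columns that are not listed in `fields`.
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def queries_per_agent(rows: Iterable[dict[str, Any]]) -> Counter:
    """Count how many logged queries each agent received."""
    return Counter(row["agent_id"] for row in rows)
```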

Testing

  • Integration Tests (python/tests/integration/test_insights_api.py)
  • Migration Tests (python/tests/unit/test_migrate_langsmith.py)
  • Unit Tests (python/tests/unit/db/test_repository.py)

Migration Guide

To migrate historical data from LangSmith:

cd python
uv run dataset migrate langsmith --days 14

@enitrat enitrat changed the title Feat/add insights infra feat: add insights infra Oct 27, 2025
@enitrat enitrat force-pushed the feat/add-insights-infra branch 2 times, most recently from 1dd06f7 to 30f1657 Compare November 11, 2025 10:46
@enitrat enitrat force-pushed the feat/add-insights-infra branch from 30f1657 to a702ec0 Compare November 11, 2025 15:25
@enitrat enitrat changed the title feat: add insights infra feat: add insights API & DB Nov 11, 2025
@enitrat enitrat marked this pull request as ready for review November 16, 2025 22:00
@enitrat enitrat changed the title feat: add insights API & DB feat: Add Query Insights Infrastructure & Database Layer Nov 18, 2025
@ijusttookadnatest ijusttookadnatest merged commit a50ba81 into main Nov 20, 2025
3 checks passed