
Commit df21098

refactor(tests): setup proper test architecture (#35)

1 parent 0a22a37 commit df21098

15 files changed: +951 -1300 lines

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+For information on how to work in the Python part of this project, see `python/CLAUDE.md`.
+
 ## Project Overview
 
 Cairo Coder is an open-source Cairo language code generation service using Retrieval-Augmented Generation (RAG) to transform natural language requests into functional Cairo smart contracts and programs. It was adapted from the Starknet Agent project.

python/CLAUDE.md

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Cairo Coder is an open-source Cairo language code generation service using Retrieval-Augmented Generation (RAG) with the DSPy framework. It transforms natural language requests into functional Cairo smart contracts and programs.

## Essential Commands

### Installation and Setup

- `curl -LsSf https://astral.sh/uv/install.sh | sh` - Install uv package manager
- `uv sync` - Install all dependencies
- `cp sample.config.toml config.toml` - Create configuration file
- `cp .env.example .env` - Set up environment variables (if .env.example exists)

### Development

- `uv run cairo-coder` - Start the FastAPI server
- `uv run pytest` - Run all tests
- `uv run pytest tests/unit/test_query_processor.py::test_specific` - Run a specific test
- `uv run pytest -k "test_name"` - Run tests matching a pattern
- `uv run pytest --cov=src/cairo_coder` - Run tests with coverage
- `trunk check --fix` - Run linting and auto-fix issues
- `uv run ty check` - Run type checking

### Docker Operations

- `docker compose up postgres` - Start PostgreSQL database
- `docker compose up backend` - Start the API server
- `docker compose run ingester` - Run documentation ingestion

### Optimization and Evaluation

- `marimo run optimizers/generation_optimizer.py` - Run generation optimizer notebook
- `marimo run optimizers/rag_pipeline_optimizer.py` - Run full pipeline optimizer
- `uv run starklings_evaluate` - Evaluate against the Starklings dataset
- `uv run cairo-coder-summarize <repo-url>` - Summarize documentation

## High-Level Architecture

### DSPy-Based RAG Pipeline

Cairo Coder uses a three-stage RAG pipeline implemented with DSPy modules (a minimal sketch follows this list):

1. **Query Processing** (`src/cairo_coder/dspy/query_processor.py`):

   - Uses `CairoQueryAnalysis` signature with ChainOfThought
   - Extracts search terms and identifies relevant documentation sources
   - Detects if the query is contract/test related

2. **Document Retrieval** (`src/cairo_coder/dspy/document_retriever.py`):

   - Custom `SourceFilteredPgVectorRM` extends DSPy's retriever
   - Queries PostgreSQL with pgvector for similarity search
   - Supports source filtering and metadata extraction

3. **Answer Generation** (`src/cairo_coder/dspy/generation_program.py`):
   - `CairoCodeGeneration` signature for code synthesis
   - Streaming support via async generators
   - MCP mode for raw documentation retrieval
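
A minimal sketch of the first stage, assuming a recent DSPy version; the class name comes from the list above, but the field names and model are illustrative assumptions rather than the project's actual code:

```python
import dspy

# Configure any LM supported by DSPy; the model name here is an assumption.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Signature name taken from the pipeline description above; fields are illustrative.
class CairoQueryAnalysis(dspy.Signature):
    """Analyze a Cairo programming question and plan documentation retrieval."""

    query: str = dspy.InputField(desc="Natural-language Cairo question")
    search_terms: list[str] = dspy.OutputField(desc="Terms to use for similarity search")
    resources: list[str] = dspy.OutputField(desc="Documentation sources likely to help")

# Stage 1: ChainOfThought adds a reasoning step before producing the outputs.
query_processor = dspy.ChainOfThought(CairoQueryAnalysis)
result = query_processor(query="How do I declare a storage variable in a Cairo contract?")
print(result.search_terms)
```
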
### Agent-Based Architecture

- **Agent Factory** (`src/cairo_coder/core/agent_factory.py`): Creates specialized agents from TOML configs
- **Agents**: General, Scarb-specific, or custom agents with filtered sources
- **Pipeline Factory**: Creates optimized RAG pipelines loaded from `optimizers/results/`

### FastAPI Server

- **OpenAI-Compatible API** (`src/cairo_coder/server/app.py`):
  - `/v1/chat/completions` - Legacy endpoint
  - `/v1/agents/{agent_id}/chat/completions` - Agent-specific endpoint
  - Supports streaming (SSE) and MCP mode via headers
- **Lifecycle Management**: Connection pooling, resource cleanup
- **Error Handling**: OpenAI-compatible error responses
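
Since the API is OpenAI-compatible, any OpenAI client can talk to it. A minimal sketch, assuming the server runs locally on port 8000; the port, API key, and model name are assumptions:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Cairo Coder server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="cairo-coder",  # illustrative model name
    messages=[{"role": "user", "content": "Write a Cairo contract with a counter."}],
    stream=True,  # the server supports SSE streaming
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
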
### Optimization Framework

- **DSPy Optimizers**: MIPROv2 for prompt tuning
- **Datasets**: Generated from Starklings exercises
- **Metrics**: Code compilation success, relevance scores
- **Marimo Notebooks**: Reactive optimization workflows with MLflow tracking

## Development Guidelines

### Code Organization

- Follow DSPy patterns: Signatures → Modules → Programs
- Use dependency injection for testability (e.g., the `vector_db` parameter)
- Prefer async/await for I/O operations
- Type hints required (enforced by mypy)

### Adding New Features

1. **New Agent**: Add configuration to `config.toml`, extend `AgentConfiguration`
2. **New DSPy Module**: Create a signature, implement `forward`/`aforward` methods (see the sketch below)
3. **New Optimizer**: Create a Marimo notebook, define metrics, use MIPROv2
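
A minimal sketch of item 2's pattern, assuming a DSPy version with async support; `ScarbGeneration` and `ScarbProgram` are hypothetical names used only for illustration:

```python
import dspy

# Hypothetical signature for a new module; field names are illustrative.
class ScarbGeneration(dspy.Signature):
    """Generate Scarb-specific guidance from a user query and retrieved context."""

    query: str = dspy.InputField()
    context: str = dspy.InputField(desc="Retrieved documentation")
    answer: str = dspy.OutputField(desc="Scarb-specific guidance or code")

class ScarbProgram(dspy.Module):
    """New DSPy module: wrap the signature and expose sync and async entry points."""

    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(ScarbGeneration)

    def forward(self, query: str, context: str) -> dspy.Prediction:
        return self.generate(query=query, context=context)

    async def aforward(self, query: str, context: str) -> dspy.Prediction:
        # Async path, mirroring the forward/aforward convention noted above
        # (assumes DSPy's async `acall` is available).
        return await self.generate.acall(query=query, context=context)
```
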
### Configuration Management

- `ConfigManager` loads from `config.toml` and environment
- Vector store config in `[VECTOR_DB]` section
- LLM providers in `[PROVIDERS]` section
- Agent definitions in `[[AGENTS]]` array

## Important Notes

- Always load optimized programs from `optimizers/results/` in production
- Use `uv` for all dependency management (not pip/poetry)
- Structlog for JSON logging (`get_logger(__name__)`)
- DSPy tracks token usage via `lm.get_usage()`
- MLflow experiments logged to `mlruns/` directory

## Working with the test suite

This section provides guidelines for interacting with the Python test suite. Adhering to these patterns is crucial for maintaining a clean, efficient, and scalable testing environment.

### 1. Running Tests

All test commands should be run from the `python/` directory.

- **Run all tests:**

  ```bash
  uv run pytest
  ```

- **Run tests in a specific file:**

  ```bash
  uv run pytest tests/unit/test_rag_pipeline.py
  ```

- **Run a specific test by name (using `-k`):**

  ```bash
  uv run pytest -k "test_mcp_mode_pipeline_execution"
  ```

### 2. Test Architecture

The test suite is divided into two main categories (a sketch of an integration test follows this list):

- `tests/unit/`: For testing individual classes or functions in isolation. These tests should be fast and rely heavily on mocks to avoid external dependencies (such as databases or APIs).
- `tests/integration/`: For testing how multiple components work together. This is primarily for testing the FastAPI server endpoints using `fastapi.testclient.TestClient`. These tests are slower and verify the contracts between different parts of the application.
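
A minimal sketch of an integration test, using the `client` fixture described in section 4; the request shape and assertions are illustrative assumptions, not actual project code:

```python
# tests/integration/test_openai_server.py (illustrative)

def test_chat_completions_returns_openai_shape(client):
    """The OpenAI-compatible endpoint should answer a simple chat request."""
    response = client.post(
        "/v1/chat/completions",
        json={
            "model": "cairo-coder",  # illustrative model name
            "messages": [{"role": "user", "content": "What is a felt252?"}],
        },
    )
    assert response.status_code == 200
    body = response.json()
    # OpenAI-compatible responses carry a list of choices with message content.
    assert body["choices"][0]["message"]["content"]
```
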
### 3. The Golden Rule: `conftest.py` is King

**`python/tests/conftest.py` is the single source of truth for all shared fixtures, mocks, and test data.**

- **Before adding any new mock or helper, check `conftest.py` first.** It is highly likely that a suitable fixture already exists.
- **NEVER define a reusable fixture in an individual test file.** All shared fixtures **must** reside in `conftest.py`. This is non-negotiable for maintainability.

### 4. Key Fixtures to Leverage

Familiarize yourself with these core fixtures defined in `conftest.py` and use them whenever possible (a sketch of a unit test built on them follows this list).

- `client`: An instance of `TestClient` for making requests to the FastAPI app in **integration tests**.
- `mock_agent`: A powerful, pre-configured mock of a RAG pipeline agent, with mock implementations of `forward`, `aforward`, and `forward_streaming`.
- `mock_agent_factory`: A mock of the `AgentFactory`, used in server tests to control which agent is created.
- `mock_vector_db`: A mock of `SourceFilteredPgVectorRM` for testing the document retrieval layer without a real database.
- `mock_lm`: A mock of a `dspy` language model for testing DSPy programs (`QueryProcessorProgram`, `GenerationProgram`) without making real API calls.
- `sample_documents`, `sample_agent_configs`, `sample_processed_query`: Consistent, reusable data fixtures for your tests.
- `sample_config_file`: A temporary, valid `config.toml` file for integration testing of the configuration manager.
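
A minimal sketch of a unit test built on these fixtures, assuming pytest-asyncio for async tests; `DocumentRetrieverProgram` is a hypothetical class name used only for illustration:

```python
import pytest

# tests/unit/test_document_retriever.py (illustrative)
# mock_vector_db and sample_processed_query come from conftest.py.

@pytest.mark.asyncio  # assumes pytest-asyncio is configured for the suite
async def test_retriever_uses_injected_vector_db(mock_vector_db, sample_processed_query):
    """The retriever should return whatever the (mocked) vector store yields."""
    # Hypothetical import; use the real retrieval program class from the project.
    from cairo_coder.dspy.document_retriever import DocumentRetrieverProgram

    retriever = DocumentRetrieverProgram(vector_db=mock_vector_db)  # dependency injection
    documents = await retriever.aforward(sample_processed_query)
    assert documents is not None
```
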
### 5. Guidelines for Adding & Modifying Tests

- **Adding a New Test File:**

  - If you are testing a single class's methods or a utility function, create a new file in `tests/unit/`.
  - If you are testing a new API endpoint or a flow that involves multiple components, add it to the appropriate file in `tests/integration/`.

- **Avoiding Code Duplication (DRY):**

  - If you find yourself writing several tests that differ only by their input values, you **must** use parametrization.
  - **Pattern:** Use `@pytest.mark.parametrize`. See `tests/unit/test_document_retriever.py` for a canonical example of how this is done effectively, and the sketch after this list.

- **Adding New Mocks or Test Data:**

  - If the mock or data will be used in more than one test function, add it to `conftest.py` as a new fixture.
  - If it's truly single-use, you may define it within the test function itself, but be certain it won't be needed elsewhere.

- **Things to Be Careful About:**
  - **Fixture Dependencies:** Understand that some fixtures depend on others (e.g., `client` depends on `mock_agent_factory`). Modifying a base fixture can have cascading effects on tests that use dependent fixtures.
  - **Unit vs. Integration Mocks:** Do not use `TestClient` (the `client` fixture) in unit tests. Unit tests should mock the direct dependencies of the class they are testing, not the entire application.
  - **Removing Tests:** Only remove tests for code that has been removed. If you are refactoring, ensure that the new tests provide equivalent or better coverage than the ones being replaced. The recent refactoring that merged `test_server.py` into `test_openai_server.py` and `test_server_integration.py` is a key example of this pattern.
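
A short sketch of the parametrization pattern from the DRY guideline above; `extract_search_terms` is a hypothetical helper defined inline purely for illustration:

```python
import pytest

# Illustrative only: extract_search_terms stands in for real project code.
def extract_search_terms(query: str) -> list[str]:
    return [word.strip("?.,").lower() for word in query.split() if len(word) > 3]

@pytest.mark.parametrize(
    ("query", "expected_term"),
    [
        ("How do I write a Cairo contract?", "contract"),
        ("What is felt252 arithmetic?", "felt252"),
        ("Testing storage variables", "storage"),
    ],
)
def test_extract_search_terms(query, expected_term):
    """One parametrized test body covers several input/expectation pairs."""
    assert expected_term in extract_search_terms(query)
```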

python/src/cairo_coder/dspy/document_retriever.py

Lines changed: 5 additions & 5 deletions
@@ -574,7 +574,7 @@ async def aforward(
             return []
 
         # Step 2: Enrich context with appropriate templates based on query type.
-        return self._enhance_context(processed_query.original, documents)
+        return self._enhance_context(processed_query, documents)
 
     def forward(
         self, processed_query: ProcessedQuery, sources: list[DocumentSource] | None = None
@@ -670,7 +670,7 @@ async def _afetch_documents(
             logger.error(f"Error fetching documents: {traceback.format_exc()}")
             raise e
 
-    def _enhance_context(self, query: str, context: list[Document]) -> list[Document]:
+    def _enhance_context(self, processed_query: ProcessedQuery, context: list[Document]) -> list[Document]:
         """
         Enhance context with appropriate templates based on query type.
 
@@ -681,12 +681,12 @@ def _enhance_context(self, query: str, context: list[Document]) -> list[Document]:
         Returns:
             Enhanced context with relevant templates
         """
-        query_lower = query.lower()
+        query_lower = processed_query.original.lower()
 
         # Add contract template for contract-related queries
         if any(
             keyword in query_lower for keyword in ["contract", "storage", "external", "interface"]
-        ):
+        ) or processed_query.is_contract_related:
            context.append(
                 Document(
                     page_content=CONTRACT_TEMPLATE,
@@ -695,7 +695,7 @@ def _enhance_context(self, query: str, context: list[Document]) -> list[Document]:
             )
 
         # Add test template for test-related queries
-        if any(keyword in query_lower for keyword in ["test", "testing", "assert", "mock"]):
+        if any(keyword in query_lower for keyword in ["test", "testing", "assert", "mock"]) or processed_query.is_test_related:
             context.append(
                 Document(
                     page_content=TEST_TEMPLATE,

python/src/cairo_coder/server/app.py

Lines changed: 0 additions & 29 deletions
@@ -166,9 +166,6 @@ def __init__(
             allow_headers=["*"],
         )
 
-        # Token tracking for usage statistics
-        self.token_tracker = TokenTracker()
-
         # Setup routes
         self._setup_routes()
 
@@ -490,32 +487,6 @@ async def _generate_chat_completion(
         )
 
 
-class TokenTracker:
-    """Simple token tracker for usage statistics."""
-
-    def __init__(self):
-        self.sessions = {}
-
-    def track_tokens(self, session_id: str, prompt_tokens: int, completion_tokens: int):
-        """Track token usage for a session."""
-        if session_id not in self.sessions:
-            self.sessions[session_id] = {
-                "prompt_tokens": 0,
-                "completion_tokens": 0,
-                "total_tokens": 0,
-            }
-
-        self.sessions[session_id]["prompt_tokens"] += prompt_tokens
-        self.sessions[session_id]["completion_tokens"] += completion_tokens
-        self.sessions[session_id]["total_tokens"] += prompt_tokens + completion_tokens
-
-    def get_session_usage(self, session_id: str) -> dict[str, int]:
-        """Get session token usage."""
-        return self.sessions.get(
-            session_id, {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
-        )
-
-
 def create_app(
     vector_store_config: VectorStoreConfig, config_manager: ConfigManager | None = None
 ) -> FastAPI:
