Add multi-file attachment support for tests, traces, and playground #1441
harry-rhesis merged 48 commits into main
Add polymorphic File model (BYTEA storage) for binary file attachments
on Tests (inputs) and TestResults (outputs). Files flow through the
execution pipeline as base64-encoded JSON and are stored/retrieved via
dedicated API endpoints.
- File model with deferred content, polymorphic entity pattern
- Upload/download/delete endpoints with size and MIME validation
- Nested routes: GET /tests/{id}/files, GET /test-results/{id}/files
- Execution pipeline: inject input files, capture output files
- Soft-delete cascade with entity_type isolation
- SDK serializer: bytes-to-base64 dump strategy
- Alembic migration for file table
- 75 tests (route, cascade, execution, integration, SDK)
The ManualTestWriter was sending only `name` when creating a test set,
but the backend requires `test_set_type_id`. Now resolves the correct
TestSetType lookup based on the user's single/multi-turn selection.
Also fixes TestSetDrawer querying wrong type_name ('TestType' instead
of 'TestSetType') and replaces magic strings with constants.
Add FilesClient API client, useFiles hook, MultiFileUpload and FileAttachmentList components to support attaching images, PDFs, and audio files to tests. Integrate file upload into the create test drawer, test detail page, and manual test writer with per-row attachments. Includes 52 new tests covering the client, hook, and components.
Replace all 'Single-Turn', 'Multi-Turn', 'TestType', and 'TestSetType' magic strings across 19 files with TEST_TYPES and TYPE_NAMES constants from constants/test-types.ts. Also update TypeScript type annotations to use TestTypeValue and MetricScope types instead of inline unions.
- Remove empty prompt_id that caused UUID validation error
- Fix test set association for individual test creation path by calling associateTestsWithTestSet after creating tests
- Remove unused test_set_id from buildTestPayload since the single create endpoint doesn't support it
- Redirect to test set detail page after saving when a test set name is provided
Multi-turn test files attach to spans during execution, not to the test entity itself. Hide the Files column and attachment button when creating multi-turn tests in the manual test writer.
Expose the database row UUID on SpanNode so the frontend can fetch files attached to individual trace spans. Add a read-only Attachments card in the Span Details panel that shows files when they exist.
This reverts commit 6a44c4a.
…ils"" This reverts commit dc888a3.
This reverts commit ded15a8.
…ocations Add Jinja2 filters (to_anthropic, to_openai, to_gemini) that transform input files into provider-specific content formats in request mappings. Switch TemplateRenderer to use a Jinja2 Environment with registered filters and auto-parse JSON filter output. Store input files as File records linked to Trace entities when endpoints are invoked with file attachments. REST/WebSocket invokers create files synchronously after span storage. SDK invokers use deferred linking: files are parked in the Redis/memory cache at invocation time and created when SDK spans arrive at telemetry ingest. Update endpoint and test documentation to describe file support, file format filters, and provider-specific mapping examples.
Accept base64-encoded files in the JSON request body and extract their text content using the SDK's DocumentExtractor (MarkItDown). File contents are injected into the LLM prompt between the system prompt and conversation history. A field_validator coerces empty strings to None for backward compatibility with callers that send files: "". Updated all use case prompts to permit file operations.
…ield name Increase WebSocket max message size from 64KB to 10MB to accommodate base64-encoded file attachments. Pass files from WebSocket chat payload to endpoint input_data and return output_files in the response. Rename file data field from content_base64 to data for consistency across input file loading and output file storage.
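As a rough illustration of the payload shape after the rename, the sketch below encodes a file into a JSON-safe dict and decodes it again. The `filename`, `content_type`, and `data` keys mirror the fixtures quoted in the review threads; the helper names `encode_file` and `decode_file` are hypothetical and not taken from the codebase.

```python
import base64
import json


def encode_file(filename: str, content_type: str, content: bytes) -> dict:
    """Build a JSON-serializable file dict with base64 content under "data"."""
    return {
        "filename": filename,
        "content_type": content_type,
        "data": base64.b64encode(content).decode("ascii"),
    }


def decode_file(file_data: dict) -> bytes:
    """Recover the raw bytes from a file dict produced by encode_file."""
    return base64.b64decode(file_data["data"])


# Round-trip through JSON, as a WebSocket or REST payload would.
payload = json.dumps({"files": [encode_file("test.png", "image/png", b"\x89PNG")]})
restored = decode_file(json.loads(payload)["files"][0])
```

Base64 expands binary content by roughly a third, which is one reason a message size limit sized for plain chat text is too small once files are embedded in the payload.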
Add file upload button and drag-and-drop support to PlaygroundChat. Render file attachment previews (images, PDFs) in MessageBubble. Extend WebSocket types to include file metadata for chat messages.
Make FileAttachmentList items clickable to trigger authenticated file
downloads via the /files/{id}/content endpoint. Each row now has a
ListItemButton for click-to-download and an explicit download icon
button in the secondary action area.
In MessageBubble, make user-attached file chips and output image
previews clickable for download. Replace AttachFileIcon with
DownloadIcon on user file chips to signal downloadability. Replace
hardcoded image maxHeight with theme spacing.
Move the file attachment button from a standalone button beside the input to a startAdornment inside the TextField, aligning with common chat UI patterns. Also size the reset button to match the send button.
Enable Penelope to include test-attached files (images, PDFs, audio) when sending messages to target endpoints. The LLM decides per-message whether to include files via include_files parameter, controlled by test instructions. Data flow: backend loads files → TestContext.files → system prompt informs agent → LLM sets include_files=True → executor injects files → TargetInteractionTool → Target.send_message(files=...) → endpoint.
Add File entity with upload (from paths and base64), download, and delete support. Extend Test with add_files(), get_files(), delete_file() and inline files via push(). Add get_files() to TestResult. Includes unit and integration tests.
Thread mapped metadata from endpoint responses through to metric evaluation, allowing evaluation criteria to reference response metadata in their prompts.
Introduce a "mode" parameter ("text" or "json") across the chatbot
response chain. When mode is "json", uses Pydantic schema-based
generation via the SDK model provider to return structured output.
Ensure dict/list outputs from JSON mode are serialized with json.dumps() instead of str() across the invocation pipeline, tracing, response extraction, and conversation storage.
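A minimal sketch of why `str()` is the wrong tool for structured outputs: it produces a Python repr with single quotes, which downstream JSON parsers reject. The `serialize_output` helper is hypothetical and only illustrates the rule described in this commit.

```python
import json

output = {"answer": "yes", "scores": [1, 2]}

as_str = str(output)          # Python repr with single quotes, not valid JSON
as_json = json.dumps(output)  # valid JSON with double quotes


def serialize_output(value):
    """Serialize dict/list outputs as JSON; convert everything else with str()."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)
```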
Extend the file input accept attribute to include JSON, Excel (.xlsx, .xls), and CSV files alongside existing image and PDF support.
- Add metadata and context as collapsible sections in overview tab
- Auto-detect and pretty-print JSON content in all text fields
- Add fontFamilyCode theme token for monospace rendering
- Show test result files via FileAttachmentList component
- Add "Go to Test" button linking to test detail page
- Show N/Total progress in Tests Executed card
- Add refresh button while test run is in progress
Edit page: replace per-keystroke state updates with ref-based dirty tracking and blur-triggered re-renders; cache stable ref callbacks for dynamic evaluation step TextFields to prevent React remounting. New metric page: use stable index-based keys instead of content-derived keys that changed on every keystroke causing React to remount elements.
Wire conversation tab responses to open trace drawer on click, mapping all turns to the shared multi-turn trace. Split files into collapsible "Files" and "Output Files" sections for both single-turn and multi-turn views.
Add inline error highlighting on Next click for required fields (name, evaluation prompt, metric scope, and score-type-conditional fields). Replace magic score type strings with SCORE_TYPES constants and add backend model_validator for numeric/categorical conditional validation.
After delete_item() commits, the RLS session variable (app.current_organization) may no longer be set on the DB connection, causing lazy-load queries during response serialization to fail with ProgrammingError. This resulted in 500 errors on delete even though the deletion itself succeeded. Add safe_relationship decorator that catches SQLAlchemy errors on relationship property access and returns safe defaults instead of propagating the error to the response.
Replace raw db.query() in update_test_set_attributes with crud.get_test_set which applies proper RLS filtering, organization scoping, and soft-delete exclusion. Gracefully skip updates when the test set has been soft-deleted instead of raising ValueError. Pass organization_id and user_id through all call sites.
Update down_revision to chain after the litellm/azure provider migration introduced on main.
- Replace hardcoded borderRadius values with theme.shape.borderRadius in MessageBubble and FileAttachmentList components
- Pass missing sessionToken prop to TestDetailConversationTab in TestResultDrawer
Critical
- File format Jinja filters return JSON strings and will double-encode with `|tojson`, producing wrong request bodies.
- Backend file execution tests use `content_base64` but implementation uses `data`.
Improvements
- File upload ordering (`position`) resets each request.
- Telemetry span files endpoint lacks explicit auth dependency.
Nit
- Alembic migration docstring `Revises` doesn't match `down_revision`.
Found 6 issues (2 critical, 2 improvements, 1 nit, 1 question-ish auth concern).
    },
    }
    )
    return json.dumps(content)
Critical: these filters return json.dumps(...) strings, but the docs and templating flow expect provider filters to return Python objects and then use |tojson.
With the current implementation, templates like {{ files | to_anthropic | tojson }} will double-encode and TemplateRenderer will json.loads back to a string, so the request body will contain a JSON string instead of an array/object.
Fix: have `to_anthropic`/`to_openai`/`to_gemini` return `list[dict]` (or `dict`) directly (no `json.dumps`), letting `|tojson` handle JSON serialization.
Fixed in 60da1ab — filters now return list[dict] instead of json.dumps() strings, so they compose correctly with |tojson.
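The double-encoding problem can be reproduced without Jinja2. In the sketch below, `json.dumps` stands in for the `|tojson` filter and `json.loads` for the renderer's parse step; the two filter functions are simplified, hypothetical stand-ins for the real provider filters.

```python
import json

files = [{"filename": "test.png", "content_type": "image/png", "data": "iVBORw=="}]


def filter_returning_string(items: list[dict]) -> str:
    """Old behavior (simplified): the filter serializes its own result."""
    return json.dumps(items)


def filter_returning_objects(items: list[dict]) -> list[dict]:
    """Fixed behavior (simplified): the filter returns plain Python objects."""
    return items


# Serialize once more (as |tojson would), then parse once (as the renderer would).
double_encoded = json.loads(json.dumps(filter_returning_string(files)))
single_encoded = json.loads(json.dumps(filter_returning_objects(files)))
```

With the string-returning filter, the parsed result is still a JSON string rather than a list, which is the wrong request body the review describes. Returning objects leaves exactly one serialization step in the chain.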
    {
        "filename": "test.png",
        "content_type": "image/png",
        "content_base64": base64.b64encode(file_content).decode("ascii"),
Critical: test uses content_base64, but the implementation uses data as the base64 key (SingleTurnOutput._load_input_files emits data, and _store_output_files reads file_data.get('data')).
Fix: update tests to use/expect `data` (or add backward-compat in backend to accept both keys).
Fixed in 60da1ab — updated all tests to use data key to match the implementation.
    {
        "filename": "input.png",
        "content_type": "image/png",
        "content_base64": base64.b64encode(png_bytes).decode("ascii"),
Same issue as test_file_execution.py: this uses content_base64 but the pipeline uses data.
Fix: switch to `data` everywhere in these tests (and the output_files fixture).
    size_bytes=file_size,
    content=file_bytes,
    entity_id=entity_id,
    entity_type=entity_type,
Improvement: position=idx will reset to 0..N for each upload request. If users upload more files later, ordering may collide/interleave unexpectedly.
Fix: set `position = existing_count + idx` (or `max(position)+1`) for append semantics, or document that clients must reorder via update.
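A minimal sketch of the append semantics suggested here, assuming positions are plain integers stored per file. The function name `next_positions` is hypothetical and this is not the router's actual code.

```python
def next_positions(existing_positions: list[int], new_count: int) -> list[int]:
    """Return positions for new_count files appended after the existing ones."""
    # Start after the current maximum, or at 0 when the entity has no files yet.
    start = max(existing_positions) + 1 if existing_positions else 0
    return [start + idx for idx in range(new_count)]


first_upload = next_positions([], 3)
second_upload = next_positions(first_upload, 2)
```

Using the maximum rather than the count keeps new files at the end even when earlier deletions have left gaps in the sequence.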
    a polymorphic entity_id + entity_type pattern.

    Revision ID: b3f7a9c2d1e4
    Revises: aef6c47a8faa
Nit: migration docstring says Revises: aef6c47a8faa, but down_revision is a7b8c9d0e1f2.
Fix: update the docstring header to match to avoid confusion during ops/debugging.
Fixed in 60da1ab — docstring now matches down_revision.
    )


    @router.get(
Improvement: GET /telemetry/spans/{span_db_id}/files doesn't require current_user/token (unlike most other routers). If telemetry endpoints are expected to be protected, this could expose file metadata cross-tenant depending on RLS configuration.
Fix: add `current_user: User = Depends(require_current_user_or_token)` (or whatever auth pattern telemetry routes use), or confirm telemetry router is intentionally public/auth'd elsewhere.
No change needed — auth is enforced transitively via the get_tenant_context dependency, which internally depends on require_current_user_or_token. This is consistent with all other telemetry endpoints in the same router (none have an explicit current_user param).
The user_id field in TestRunBase lacked a default value, making it required by Pydantic. After the SDK switched to exclude_none=True, user_id was omitted from requests, causing 422 validation errors. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…creation
- Replace unavailable SiMistralai and SiOpenrouter icons with MUI fallback icons (not exported in installed react-simple-icons version)
- Make prompt_id optional in TestCreate for inline prompt creation
- Add explicit TestCreate return type to buildTestPayload
Auto-format line length, indentation, and arrow function parentheses across frontend source and test files.
Revert SiMistralai and SiOpenrouter icon fallbacks and update @icons-pack/react-simple-icons to v13.12.0 which exports them.
Multi-turn tests and inline prompt creation don't require prompt_id upfront. All existing usages already guard with null checks.
Main issues found:
Found 3 issues (1 critical, 2 improvements).
Requesting changes due to a contract mismatch that will break file execution/output capture.
- Fix `data` vs `content_base64` mismatch (tests + any decoding paths). Ideally standardize on `data` everywhere; if backwards compatibility is needed, accept both keys when decoding.
- Update migration header `Revises:` to match `down_revision`.
- Sanitize `Content-Disposition` filename to avoid header issues.
    {
        "filename": "test.png",
        "content_type": "image/png",
        "content_base64": base64.b64encode(file_content).decode("ascii"),
Critical: The execution pipeline uses files[*].data (base64) as the canonical key (see SingleTurnOutput._load_input_files() and templating filters), but these tests use content_base64. As written, this will fail and it also diverges from the API/docs/front-end.
Fix: update test fixtures to use `data` (or, if we need backwards-compat, accept both `data` and `content_base64` in the backend when decoding).
Fixed — see reply above.
    {
        "filename": "input.png",
        "content_type": "image/png",
        "content_base64": base64.b64encode(png_bytes).decode("ascii"),
Critical: Same issue as test_file_execution.py: these mocked input files use content_base64, but the runtime path expects data.
Fix: rename to `data` in these tests (and in any endpoint contract examples) or add backwards-compat decoding for `content_base64`.
Fixed — see reply above.
    if not isinstance(file_data, dict):
        continue

    content_b64 = file_data.get("data")
Critical: _store_output_files() reads base64 from file_data.get("data"), but the tests (and likely some clients) are using content_base64. This mismatch will silently skip storing output files.
Fix: align contract. Easiest: accept both keys: `content_b64 = file_data.get("data") or file_data.get("content_base64")` (and update tests/docs to use data).
The implementation correctly uses data as the key. The tests were wrong — fixed in 60da1ab to use data consistently.
    a polymorphic entity_id + entity_type pattern.

    Revision ID: b3f7a9c2d1e4
    Revises: aef6c47a8faa
Improvement: Migration header says Revises: aef6c47a8faa but down_revision is a7b8c9d0e1f2. That inconsistency is confusing when debugging migrations.
Fix: update the docstring `Revises:` line to match `down_revision` (or vice versa if the chain is wrong).
Fixed — see reply above.
    BytesIO(db_file.content),
    media_type=db_file.content_type,
    headers={
        "Content-Disposition": f'attachment; filename="{db_file.filename}"',
Improvement: Content-Disposition is built from the raw filename. If filenames can contain quotes/newlines, this can break headers or enable response splitting.
Fix: sanitize/quote per RFC 5987/6266 (e.g. `filename*=UTF-8''...`) or at least strip CR/LF and quotes before interpolation.
Good point — will address filename sanitization in a follow-up.
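One possible shape for that follow-up, sketched with the standard library only. The function name `content_disposition` is hypothetical, and the sketch handles CR/LF, quotes, backslashes, and non-ASCII characters but not every other control character.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value that tolerates awkward filenames."""
    # Drop CR/LF so the value cannot inject extra header lines.
    cleaned = filename.replace("\r", "").replace("\n", "")
    # ASCII fallback for older clients: non-ASCII becomes "?", and characters
    # that would break the quoted-string (quotes, backslashes) become "_".
    fallback = cleaned.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # Extended parameter in the RFC 5987 style: percent-encoded UTF-8.
    encoded = quote(cleaned, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


header = content_disposition('a"b\r\n.png')
```

Sending both `filename` and `filename*` lets clients that understand the extended form recover the original name, while others fall back to the sanitized ASCII version.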
- Return Python objects from Jinja file filters instead of JSON strings to prevent double-encoding with |tojson
- Fix test key mismatch: use 'data' instead of 'content_base64' to match the actual implementation in output_providers and results
- Use append semantics for file upload position (existing max + idx) to avoid ordering collisions across multiple upload requests
- Fix migration docstring Revises to match down_revision
Guard against undefined prompt_id in TestDetailData and UpdateTest after making it optional in TestBase.
For new users, polyphemus_access is explicitly null in the JSONB
column, so dict.get() returns None instead of the default {}.
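The pitfall is easy to reproduce: the default passed to `dict.get()` is used only when the key is missing, not when it is present with a `None` value. In this sketch the `settings` dict is a hypothetical stand-in for the deserialized JSONB column, and the `or {}` idiom is one common guard, not necessarily the fix applied in this commit.

```python
settings = {"polyphemus_access": None}  # key present, value explicitly null

# The default is ignored because the key exists.
naive = settings.get("polyphemus_access", {})

# Falling back with "or" covers both a missing key and an explicit null.
safe = settings.get("polyphemus_access") or {}
```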
Follow existing RLS patterns by adding get_entity_files_max_position to the CRUD layer with organization_id filtering, replacing the inline query in the file router.
The websocket size limit tests were written for a 64KB limit but the router uses 10MB. Messages under 10MB passed the check, handle_message returned nothing, and receive_json() blocked forever, hanging the CI pipeline. Co-Authored-By: Claude Opus 4.6 <[email protected]>
MetricDataFactory was generating categorical metrics without categories and numeric metrics without min_score/max_score/threshold, causing 422 validation errors. Also fixed TopicDataFactory long_name edge case generating names shorter than the 100-char test assertion. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When endpoint responses contain JSON objects, the Markdown component crashes because it expects a string. Coerce non-string content to a fenced JSON code block for proper rendering.
- Add SDK docs for the File entity (upload, download, delete)
- Document metadata as a data source for custom metric evaluation
- Update endpoint docs to reflect that metadata is available to metrics
- Add metadata evaluation example to SDK single-turn metrics docs
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Purpose
Add comprehensive file attachment capabilities across the platform, enabling users to attach files (images, PDFs, audio, Excel, JSON) to tests as inputs, view file outputs in test results and traces, and upload files in the playground chat.
What Changed
Backend
- `File` table with polymorphic `entity_id`/`entity_type` pattern for associating files with Tests, TestResults, and other entities
- `FileAttachmentMixin` for models that support file attachments; improved lazy-load failure handling in relationship properties
- … `test_set_type_id` on creation
Frontend
- `MultiFileUpload` component: Drag-and-drop and click-to-upload with preview, validation, and removal
- `FileAttachmentList` component: Display file chips with download support
- `useFiles` hook: React hook for file CRUD operations against the files API client
- … `api-client/` for file upload, download, list, and delete
- … `MessageBubble`
- … `test-types.ts` and `score-types.ts` for eliminating magic strings
- … `[DEBUG]` prefix from API error logs
SDK
- `File` entity: New SDK entity for file CRUD with upload/download helpers
- `Test` entity: File attachment methods (`attach_file`, `get_files`, `remove_file`)
- `TestResult` entity: File attachment support
Penelope (Multi-turn agent)
Chatbot
- `/chat` endpoint accepts file uploads via multipart form data
Docs
Tests
- `FileAttachmentList`, `MultiFileUpload`, `useFiles` hook, files client tests
Testing
cd apps/backend && uv run pytest ../../tests/backend/routes/test_file.py ../../tests/backend/services/test_file_cascade.py ../../tests/backend/services/test_file_execution.py ../../tests/backend/services/test_file_integration.py
cd sdk && uv run pytest ../tests/sdk/entities/test_file.py ../tests/sdk/integration/test_file.py
cd apps/frontend && npm test -- --testPathPattern="(FileAttachmentList|MultiFileUpload|useFiles|files-client)"