
Commit 4f0ad7b

Enhanced Market Research Integration with Multi-Source Analysis (#146)
1 parent 78a01e0 commit 4f0ad7b

61 files changed (+3391 additions, -1500 deletions)


.cursor/rules/general-cursor-project-rule.mdc

Lines changed: 8 additions & 21 deletions
```diff
@@ -159,6 +159,13 @@ Uses Prisma's official migration workflow. Cleaner approach - lets Prisma handle
 Use `pnpm run db:migrate:` commands where possible.
 Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
 
+### Database Migrations
+- Use Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
+- **CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
+- Use `pnpm run db:migrate:` commands where possible.
+- Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- If you run into Database timeout or advisory lock issues, just pause 30s, in order for the lock to clear, then continue.
+
 **Shadow Database Requirement**: The project uses a schema-based shadow database (`betterai_shadow` schema) for migration validation. This ensures:
 - Safe migration validation before applying to main database
 - Proper schema drift detection
@@ -206,30 +213,10 @@ Privy: User authentication
 ### Internal Services
 `generate-batch-predictions.ts`: Bulk prediction generation
 `generate-single-prediction.ts`: Individual market predictions
-`market-research-service.ts`: Web research for predictions
+`research-service-v2.ts`: Multi-source market research (Exa.ai + Grok)
 `prediction-checker.ts`: Validation and accuracy tracking
 `updatePolymarketEventsAndMarketData.ts`: Data synchronization
 
-### Important API Endpoints
-
-#### tRPC Endpoints (Primary)
-- `trpc.markets.list` - Unified market search/filtering with event context
-- `trpc.markets.getById` - Single market queries
-- `trpc.markets.trending` - Trending markets with event data
-- `trpc.events.list` - Event listings with optional market inclusion
-- `trpc.predictions.recent` - Recent predictions with pagination
-- `trpc.search.searchAll` - Unified search across markets, events, and tags
-
-#### Legacy REST Endpoints (Maintained)
-`POST /api/predict` Generate AI prediction (authenticated)
-`POST /api/run-data-pipeline` Manual data pipeline trigger (authenticated)
-
-### Cron Job Endpoints (Authenticated)
-All cron endpoints require `CRON_SECRET` authentication via `Authorization: Bearer` header:
-`GET /api/cron/daily-update-polymarket-data` Sync Polymarket events and markets (max 100 per request)
-`GET /api/cron/daily-generate-batch-predictions` Generate AI predictions for trending markets
-`GET /api/cron/prediction-check` Validate and score existing predictions
-`GET /api/cron/update-ai-models` Refresh available AI model list
 
 Security Requirements:
 All cron endpoints are secured with `CRON_SECRET` environment variable
```

.env.example

Lines changed: 4 additions & 5 deletions
```diff
@@ -20,6 +20,9 @@ OPENROUTER_API_KEY=sk-or-your-openrouter-api-key-here
 OPENROUTER_SITE_URL=https://betterai.tools
 OPENROUTER_SITE_NAME=BetterAI
 
+# Research Sources - API Keys
+EXA_API_KEY=your-exa-api-key-here
+
 # Vercel KV (Redis) Configuration - Rate Limiting
 # These are automatically injected by Vercel when you provision a KV database
 # For local development, run: vercel env pull .env.local
@@ -82,11 +85,7 @@ PREDICTION_CHECK_LOOKBACK_DAYS=45
 BATCH_PREDICTIONS_END_RANGE_HOURS=48
 BATCH_PREDICTIONS_MODEL=google/gemini-2.5-flash-lite
 
-# Web Search Configuration
-# IMPORTANT: Web search costs ~54x more per prediction due to search API fees
-#
-# Single/manual predictions web search control (set to 'true' to enable, anything else disables)
-SINGLE_PREDICTIONS_WEB_SEARCH=false
+
 #
 # Batch predictions web search control (set to 'true' to enable, anything else disables)
 BATCH_PREDICTIONS_WEB_SEARCH=false
```

CLAUDE.md

Lines changed: 5 additions & 32 deletions
```diff
@@ -147,11 +147,11 @@ Validate all inputs and implement proper authentication
 
 
 ### Database Migrations
-Uses Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
-**CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
-
-Use `pnpm run db:migrate:` commands where possible.
-Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- Use Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
+- **CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
+- Use `pnpm run db:migrate:` commands where possible.
+- Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- If you run into Database timeout or advisory lock issues, just pause 30s, in order for the lock to clear, then continue.
 
 **Shadow Database Requirement**: The project uses a schema-based shadow database (`betterai_shadow` schema) for migration validation. This ensures:
 - Safe migration validation before applying to main database
@@ -197,33 +197,6 @@ Polymarket API: Market and event data (via `polymarket-client.ts`)
 OpenRouter API: AI model access (via `openrouter-client.ts`)
 Privy: User authentication
 
-### Internal Services
-`generate-batch-predictions.ts`: Bulk prediction generation
-`generate-single-prediction.ts`: Individual market predictions
-`market-research-service.ts`: Web research for predictions
-`prediction-checker.ts`: Validation and accuracy tracking
-`updatePolymarketEventsAndMarketData.ts`: Data synchronization
-
-### Important API Endpoints
-
-#### tRPC Endpoints (Primary)
-- `trpc.markets.list` - Unified market search/filtering with event context
-- `trpc.markets.getById` - Single market queries
-- `trpc.markets.trending` - Trending markets with event data
-- `trpc.events.list` - Event listings with optional market inclusion
-- `trpc.predictions.recent` - Recent predictions with pagination
-- `trpc.search.searchAll` - Unified search across markets, events, and tags
-
-#### Legacy REST Endpoints (Maintained)
-`POST /api/predict` Generate AI prediction (authenticated)
-`POST /api/run-data-pipeline` Manual data pipeline trigger (authenticated)
-
-### Cron Job Endpoints (Authenticated)
-All cron endpoints require `CRON_SECRET` authentication via `Authorization: Bearer` header:
-`GET /api/cron/daily-update-polymarket-data` Sync Polymarket events and markets (max 100 per request)
-`GET /api/cron/daily-generate-batch-predictions` Generate AI predictions for trending markets
-`GET /api/cron/prediction-check` Validate and score existing predictions
-`GET /api/cron/update-ai-models` Refresh available AI model list
 
 Security Requirements:
 All cron endpoints are secured with `CRON_SECRET` environment variable
```

PLAN_EXA_ENHANCED.md

Lines changed: 88 additions & 0 deletions

# Exa Research Enhancement Plan

## Overview
Task plan for implementing enhanced Exa.ai content retrieval with a robust two-step approach that goes beyond the basic drop-in fix.

## Option 2: Two-Step Approach for Production Reliability

### Task Breakdown

#### Phase 2A: Core Two-Step Implementation
- [ ] Create new function `performExaResearchTwoStep()` in research-service-v2.ts (sketched after this list)
- [ ] Implement Step 1: Search-only API call to `/search` endpoint
  - Remove `text` and `highlights` parameters
  - Focus on getting high-quality URLs only
  - Limit to top 10 URLs for performance
- [ ] Implement Step 2: Content retrieval via `/contents` endpoint
  - Add `maxCharacters: 8000` for content size control
  - Include `text`, `highlights`, and optional `summary` parameters
  - Add `livecrawlTimeout: 8000` for fresh content when needed
- [ ] **Response JSON Structure Optimization:**
  - Save `relevant_information` (content) first in response
  - Limit `links` array to maximum 10 URLs
  - Position `links` array at bottom of JSON structure
  - Ensure content takes priority over metadata in response ordering
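
As a rough sketch of the two-step flow these tasks describe, using plain `fetch` against the `/search` and `/contents` endpoints named above. The `x-api-key` header, the `numResults` parameter, and the `results[].url` response shape are assumptions to confirm against the current Exa API docs:

```typescript
// Sketch only: request/response field names should be verified against the Exa API docs.
const EXA_BASE = 'https://api.exa.ai'

async function performExaResearchTwoStep(query: string, apiKey: string) {
  // Step 1: search-only call, no text/highlights, just high-quality URLs (top 10).
  const searchRes = await fetch(`${EXA_BASE}/search`, {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-api-key': apiKey }, // header name assumed
    body: JSON.stringify({ query, numResults: 10 }),
  })
  const search = await searchRes.json()
  const urls: string[] = search.results.map((r: { url: string }) => r.url).slice(0, 10)

  // Step 2: retrieve page content for those URLs with size control and live-crawl timeout.
  const contentsRes = await fetch(`${EXA_BASE}/contents`, {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-api-key': apiKey },
    body: JSON.stringify({
      urls,
      text: { maxCharacters: 8000 }, // content size control
      highlights: true,
      summary: true,                 // optional summary
      livecrawlTimeout: 8000,        // fresh content when needed
    }),
  })
  return contentsRes.json()
}
```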

#### Phase 2B: Error Handling & Resilience
- [ ] Parse `statuses[]` array from `/contents` response
- [ ] Implement per-URL error tracking and logging
- [ ] Create graceful fallback when individual URLs fail (see the sketch after this list):
  - Skip failed URLs rather than failing entire request
  - Log failed URLs with error codes for debugging
  - Ensure minimum content threshold (e.g., at least 3/10 URLs succeed)
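
A minimal sketch of that fallback behavior. The shape of the `statuses[]` entries (`id`, `status`, `error`) is assumed for illustration rather than taken from a confirmed response schema:

```typescript
// Sketch only: the statuses[] entry shape is an assumption, not the documented schema.
interface ContentStatus {
  id: string     // URL (or document id) the status refers to
  status: string // e.g. 'success' | 'error'
  error?: { tag?: string; httpStatusCode?: number }
}

const MIN_SUCCESSFUL_URLS = 3 // minimum content threshold from Phase 2B

function filterUsableResults<T extends { url: string }>(
  results: T[],
  statuses: ContentStatus[],
): T[] {
  // Track every URL that did not come back successfully.
  const failed = new Set(statuses.filter((s) => s.status !== 'success').map((s) => s.id))
  for (const id of failed) {
    console.warn(`[exa] content retrieval failed for ${id}`) // per-URL error logging
  }
  // Skip failed URLs instead of failing the whole request.
  const usable = results.filter((r) => !failed.has(r.url))
  if (usable.length < MIN_SUCCESSFUL_URLS) {
    throw new Error(`Only ${usable.length} URLs returned content; expected at least ${MIN_SUCCESSFUL_URLS}`)
  }
  return usable
}
```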

#### Phase 2C: Performance Optimization
- [ ] Add timeout handling for both API calls (see the retry sketch after this list)
- [ ] Implement retry logic for failed URLs (max 2 retries)
- [ ] Add content quality validation (minimum character length, relevance check)
- [ ] Cache failed URLs temporarily to avoid re-fetching
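
One possible shape for the timeout and retry items above; a generic wrapper rather than code from the service, with an illustrative 10-second timeout:

```typescript
// Sketch only: retry/timeout wrapper usable around either Exa call; values are illustrative.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 2, // "max 2 retries" from Phase 2C
  timeoutMs = 10_000,
): Promise<Response> {
  let lastError: unknown
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), timeoutMs)
    try {
      const res = await fetch(url, { ...init, signal: controller.signal })
      if (res.ok) return res
      lastError = new Error(`HTTP ${res.status}`) // retry non-2xx responses
    } catch (err) {
      lastError = err // timeout (abort) or network failure
    } finally {
      clearTimeout(timer)
    }
  }
  throw lastError
}
```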

### Implementation Notes
- Maintain backward compatibility with existing `performExaResearch()`
- Add comprehensive error logging for production debugging
- Consider adding metrics/telemetry for success rates

---

## Implementation Priority
1. **Option 2** (Two-Step) - More robust, production-ready approach

## Data Structure Requirements

### ResearchResult JSON Format (Optimized)
```typescript
interface ResearchResult {
  // CONTENT FIRST - Primary research data
  relevant_information: string  // Main content - appears first

  // METADATA - Secondary information
  source: string
  timestamp: Date
  confidence_score?: number
  sentiment_analysis?: string   // Grok only
  key_accounts?: string[]       // Grok only

  // LINKS LAST - Maximum 10 URLs at bottom of structure
  links: string[]               // Limited to 10, positioned last
}
```

### Implementation Requirements
- **Content Priority**: `relevant_information` must be first property
- **Link Limitation**: Maximum 10 URLs in `links` array
- **JSON Ordering**: Links positioned as last property in response (see the sketch after this list)
- **Performance**: Smaller JSON payloads improve caching and network transfer
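
A small illustration of these requirements. `buildResearchResult` is a hypothetical helper; the property order survives serialization because `JSON.stringify` emits string keys in insertion order, and `slice(0, 10)` enforces the link cap:

```typescript
// Sketch only: enforces the "content first, max 10 links last" shape described above.
function buildResearchResult(
  relevantInformation: string,
  source: string,
  links: string[],
): ResearchResult {
  return {
    relevant_information: relevantInformation, // content first
    source,
    timestamp: new Date(),
    links: links.slice(0, 10),                 // capped at 10, positioned last
  }
}
```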

## Success Metrics
- Content quality: Average character count per result
- Reliability: Success rate of content retrieval
- Performance: Average response time per mode
- **Data efficiency**: Response size reduction with 10-link limit
- User satisfaction: Prediction accuracy improvements

## Risk Considerations
- API rate limiting with two-step approach (2x API calls)
- Increased complexity in error handling and debugging
- Potential performance regression for simple markets
- **Link truncation**: May lose valuable sources beyond top 10
- Need for comprehensive testing across market types
