
Commit 4f0ad7b

Enhanced Market Research Integration with Multi-Source Analysis (#146)
1 parent 78a01e0 commit 4f0ad7b

61 files changed (+3391 additions, -1500 deletions)


.cursor/rules/general-cursor-project-rule.mdc

Lines changed: 8 additions & 21 deletions
```diff
@@ -159,6 +159,13 @@ Uses Prisma's official migration workflow. Cleaner approach - lets Prisma handle
 Use `pnpm run db:migrate:` commands where possible.
 Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
 
+### Database Migrations
+- Use Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
+- **CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
+- Use `pnpm run db:migrate:` commands where possible.
+- Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- If you run into Database timeout or advisory lock issues, just pause 30s, in order for the lock to clear, then continue.
+
 **Shadow Database Requirement**: The project uses a schema-based shadow database (`betterai_shadow` schema) for migration validation. This ensures:
 - Safe migration validation before applying to main database
 - Proper schema drift detection
@@ -206,30 +213,10 @@ Privy: User authentication
 ### Internal Services
 `generate-batch-predictions.ts`: Bulk prediction generation
 `generate-single-prediction.ts`: Individual market predictions
-`market-research-service.ts`: Web research for predictions
+`research-service-v2.ts`: Multi-source market research (Exa.ai + Grok)
 `prediction-checker.ts`: Validation and accuracy tracking
 `updatePolymarketEventsAndMarketData.ts`: Data synchronization
 
-### Important API Endpoints
-
-#### tRPC Endpoints (Primary)
-- `trpc.markets.list` - Unified market search/filtering with event context
-- `trpc.markets.getById` - Single market queries
-- `trpc.markets.trending` - Trending markets with event data
-- `trpc.events.list` - Event listings with optional market inclusion
-- `trpc.predictions.recent` - Recent predictions with pagination
-- `trpc.search.searchAll` - Unified search across markets, events, and tags
-
-#### Legacy REST Endpoints (Maintained)
-`POST /api/predict` Generate AI prediction (authenticated)
-`POST /api/run-data-pipeline` Manual data pipeline trigger (authenticated)
-
-### Cron Job Endpoints (Authenticated)
-All cron endpoints require `CRON_SECRET` authentication via `Authorization: Bearer` header:
-`GET /api/cron/daily-update-polymarket-data` Sync Polymarket events and markets (max 100 per request)
-`GET /api/cron/daily-generate-batch-predictions` Generate AI predictions for trending markets
-`GET /api/cron/prediction-check` Validate and score existing predictions
-`GET /api/cron/update-ai-models` Refresh available AI model list
 
 Security Requirements:
 All cron endpoints are secured with `CRON_SECRET` environment variable
```

.env.example

Lines changed: 4 additions & 5 deletions
```diff
@@ -20,6 +20,9 @@ OPENROUTER_API_KEY=sk-or-your-openrouter-api-key-here
 OPENROUTER_SITE_URL=https://betterai.tools
 OPENROUTER_SITE_NAME=BetterAI
 
+# Research Sources - API Keys
+EXA_API_KEY=your-exa-api-key-here
+
 # Vercel KV (Redis) Configuration - Rate Limiting
 # These are automatically injected by Vercel when you provision a KV database
 # For local development, run: vercel env pull .env.local
@@ -82,11 +85,7 @@ PREDICTION_CHECK_LOOKBACK_DAYS=45
 BATCH_PREDICTIONS_END_RANGE_HOURS=48
 BATCH_PREDICTIONS_MODEL=google/gemini-2.5-flash-lite
 
-# Web Search Configuration
-# IMPORTANT: Web search costs ~54x more per prediction due to search API fees
-#
-# Single/manual predictions web search control (set to 'true' to enable, anything else disables)
-SINGLE_PREDICTIONS_WEB_SEARCH=false
+
 #
 # Batch predictions web search control (set to 'true' to enable, anything else disables)
 BATCH_PREDICTIONS_WEB_SEARCH=false
```

CLAUDE.md

Lines changed: 5 additions & 32 deletions
```diff
@@ -147,11 +147,11 @@ Validate all inputs and implement proper authentication
 
 
 ### Database Migrations
-Uses Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
-**CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
-
-Use `pnpm run db:migrate:` commands where possible.
-Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- Use Prisma's official migration workflow. Cleaner approach - lets Prisma handle everything
+- **CRITICAL**: Do NOT use `prisma db push` unless explicitly requested by the user. Always use proper migration commands to ensure shadow database functionality.
+- Use `pnpm run db:migrate:` commands where possible.
+- Migration naming: Provide `--name descriptive_name` to avoid interactive prompts. Example `pnpm run db:migrate:dev --name add_user_table`
+- If you run into Database timeout or advisory lock issues, just pause 30s, in order for the lock to clear, then continue.
 
 **Shadow Database Requirement**: The project uses a schema-based shadow database (`betterai_shadow` schema) for migration validation. This ensures:
 - Safe migration validation before applying to main database
@@ -197,33 +197,6 @@ Polymarket API: Market and event data (via `polymarket-client.ts`)
 OpenRouter API: AI model access (via `openrouter-client.ts`)
 Privy: User authentication
 
-### Internal Services
-`generate-batch-predictions.ts`: Bulk prediction generation
-`generate-single-prediction.ts`: Individual market predictions
-`market-research-service.ts`: Web research for predictions
-`prediction-checker.ts`: Validation and accuracy tracking
-`updatePolymarketEventsAndMarketData.ts`: Data synchronization
-
-### Important API Endpoints
-
-#### tRPC Endpoints (Primary)
-- `trpc.markets.list` - Unified market search/filtering with event context
-- `trpc.markets.getById` - Single market queries
-- `trpc.markets.trending` - Trending markets with event data
-- `trpc.events.list` - Event listings with optional market inclusion
-- `trpc.predictions.recent` - Recent predictions with pagination
-- `trpc.search.searchAll` - Unified search across markets, events, and tags
-
-#### Legacy REST Endpoints (Maintained)
-`POST /api/predict` Generate AI prediction (authenticated)
-`POST /api/run-data-pipeline` Manual data pipeline trigger (authenticated)
-
-### Cron Job Endpoints (Authenticated)
-All cron endpoints require `CRON_SECRET` authentication via `Authorization: Bearer` header:
-`GET /api/cron/daily-update-polymarket-data` Sync Polymarket events and markets (max 100 per request)
-`GET /api/cron/daily-generate-batch-predictions` Generate AI predictions for trending markets
-`GET /api/cron/prediction-check` Validate and score existing predictions
-`GET /api/cron/update-ai-models` Refresh available AI model list
 
 Security Requirements:
 All cron endpoints are secured with `CRON_SECRET` environment variable
```

PLAN_EXA_ENHANCED.md

Lines changed: 88 additions & 0 deletions

# Exa Research Enhancement Plan

## Overview
Task plan for implementing enhanced Exa.ai content retrieval with a robust two-step approach that goes beyond the basic drop-in fix.

## Option 2: Two-Step Approach for Production Reliability

### Task Breakdown

#### Phase 2A: Core Two-Step Implementation
- [ ] Create new function `performExaResearchTwoStep()` in research-service-v2.ts (sketched after this list)
- [ ] Implement Step 1: Search-only API call to `/search` endpoint
  - Remove `text` and `highlights` parameters
  - Focus on getting high-quality URLs only
  - Limit to top 10 URLs for performance
- [ ] Implement Step 2: Content retrieval via `/contents` endpoint
  - Add `maxCharacters: 8000` for content size control
  - Include `text`, `highlights`, and optional `summary` parameters
  - Add `livecrawlTimeout: 8000` for fresh content when needed
- [ ] **Response JSON Structure Optimization:**
  - Save `relevant_information` (content) first in response
  - Limit `links` array to maximum 10 URLs
  - Position `links` array at bottom of JSON structure
  - Ensure content takes priority over metadata in response ordering
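
As a rough sketch of the two-step flow these tasks describe, using plain `fetch` against the `/search` and `/contents` endpoints named above. The `x-api-key` header, the `numResults` parameter, and the `results[].url` response shape are assumptions to confirm against the current Exa API docs:

```typescript
// Sketch only: request/response field names should be verified against the Exa API docs.
const EXA_BASE = 'https://api.exa.ai'

async function performExaResearchTwoStep(query: string, apiKey: string) {
  // Step 1: search-only call, no text/highlights, just high-quality URLs (top 10).
  const searchRes = await fetch(`${EXA_BASE}/search`, {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-api-key': apiKey }, // header name assumed
    body: JSON.stringify({ query, numResults: 10 }),
  })
  const search = await searchRes.json()
  const urls: string[] = search.results.map((r: { url: string }) => r.url).slice(0, 10)

  // Step 2: retrieve page content for those URLs with size control and live-crawl timeout.
  const contentsRes = await fetch(`${EXA_BASE}/contents`, {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-api-key': apiKey },
    body: JSON.stringify({
      urls,
      text: { maxCharacters: 8000 }, // content size control
      highlights: true,
      summary: true,                 // optional summary
      livecrawlTimeout: 8000,        // fresh content when needed
    }),
  })
  return contentsRes.json()
}
```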

#### Phase 2B: Error Handling & Resilience
- [ ] Parse `statuses[]` array from `/contents` response
- [ ] Implement per-URL error tracking and logging
- [ ] Create graceful fallback when individual URLs fail (see the sketch after this list):
  - Skip failed URLs rather than failing entire request
  - Log failed URLs with error codes for debugging
  - Ensure minimum content threshold (e.g., at least 3/10 URLs succeed)
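
A minimal sketch of that fallback behavior. The shape of the `statuses[]` entries (`id`, `status`, `error`) is assumed for illustration rather than taken from a confirmed response schema:

```typescript
// Sketch only: the statuses[] entry shape is an assumption, not the documented schema.
interface ContentStatus {
  id: string     // URL (or document id) the status refers to
  status: string // e.g. 'success' | 'error'
  error?: { tag?: string; httpStatusCode?: number }
}

const MIN_SUCCESSFUL_URLS = 3 // minimum content threshold from Phase 2B

function filterUsableResults<T extends { url: string }>(
  results: T[],
  statuses: ContentStatus[],
): T[] {
  // Track every URL that did not come back successfully.
  const failed = new Set(statuses.filter((s) => s.status !== 'success').map((s) => s.id))
  for (const id of failed) {
    console.warn(`[exa] content retrieval failed for ${id}`) // per-URL error logging
  }
  // Skip failed URLs instead of failing the whole request.
  const usable = results.filter((r) => !failed.has(r.url))
  if (usable.length < MIN_SUCCESSFUL_URLS) {
    throw new Error(`Only ${usable.length} URLs returned content; expected at least ${MIN_SUCCESSFUL_URLS}`)
  }
  return usable
}
```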

#### Phase 2C: Performance Optimization
- [ ] Add timeout handling for both API calls (see the retry sketch after this list)
- [ ] Implement retry logic for failed URLs (max 2 retries)
- [ ] Add content quality validation (minimum character length, relevance check)
- [ ] Cache failed URLs temporarily to avoid re-fetching
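
One possible shape for the timeout and retry items above; a generic wrapper rather than code from the service, with an illustrative 10-second timeout:

```typescript
// Sketch only: retry/timeout wrapper usable around either Exa call; values are illustrative.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 2, // "max 2 retries" from Phase 2C
  timeoutMs = 10_000,
): Promise<Response> {
  let lastError: unknown
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), timeoutMs)
    try {
      const res = await fetch(url, { ...init, signal: controller.signal })
      if (res.ok) return res
      lastError = new Error(`HTTP ${res.status}`) // retry non-2xx responses
    } catch (err) {
      lastError = err // timeout (abort) or network failure
    } finally {
      clearTimeout(timer)
    }
  }
  throw lastError
}
```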

### Implementation Notes
- Maintain backward compatibility with existing `performExaResearch()`
- Add comprehensive error logging for production debugging
- Consider adding metrics/telemetry for success rates

---

## Implementation Priority
1. **Option 2** (Two-Step) - More robust, production-ready approach

## Data Structure Requirements

### ResearchResult JSON Format (Optimized)
```typescript
interface ResearchResult {
  // CONTENT FIRST - Primary research data
  relevant_information: string  // Main content - appears first

  // METADATA - Secondary information
  source: string
  timestamp: Date
  confidence_score?: number
  sentiment_analysis?: string   // Grok only
  key_accounts?: string[]       // Grok only

  // LINKS LAST - Maximum 10 URLs at bottom of structure
  links: string[]               // Limited to 10, positioned last
}
```

### Implementation Requirements
- **Content Priority**: `relevant_information` must be first property
- **Link Limitation**: Maximum 10 URLs in `links` array
- **JSON Ordering**: Links positioned as last property in response (see the sketch after this list)
- **Performance**: Smaller JSON payloads improve caching and network transfer
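
A small illustration of these requirements. `buildResearchResult` is a hypothetical helper; the property order survives serialization because `JSON.stringify` emits string keys in insertion order, and `slice(0, 10)` enforces the link cap:

```typescript
// Sketch only: enforces the "content first, max 10 links last" shape described above.
function buildResearchResult(
  relevantInformation: string,
  source: string,
  links: string[],
): ResearchResult {
  return {
    relevant_information: relevantInformation, // content first
    source,
    timestamp: new Date(),
    links: links.slice(0, 10),                 // capped at 10, positioned last
  }
}
```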

## Success Metrics
- Content quality: Average character count per result
- Reliability: Success rate of content retrieval
- Performance: Average response time per mode
- **Data efficiency**: Response size reduction with 10-link limit
- User satisfaction: Prediction accuracy improvements

## Risk Considerations
- API rate limiting with two-step approach (2x API calls)
- Increased complexity in error handling and debugging
- Potential performance regression for simple markets
- **Link truncation**: May lose valuable sources beyond top 10
- Need for comprehensive testing across market types
