Fully Open Source Personalized Content Streamer
- Scaffold monorepo (backend, frontend, connectors)
- Docker Compose environment
- Postgres + Redis + MinIO
- Implement 3 connectors (RSS, YouTube, newsletters via Mailgun webhook or IMAP parsing)
- Parse → clean → store core-schema records + raw content
- Add transcript extraction for YouTube/podcasts
- Hook in embedding model, store vectors in pgvector
- Build simple React UI showing daily digest from last 24h
- Add Obsidian export endpoint
- Add explicit feedback API (save, highlight + reason)
- Basic ranking: recency + source weight + embedding similarity
- Add Celery tasks
- Scheduled digest generation
- Analytics dashboard
- Begin gathering feedback for Phase 1 ML
- Adopting RLHF too early → wait until high-signal feedback accumulates (≥ 500 explicit examples)
- Over-indexing content → prune old vectors, compress them, or store reduced-dimension embeddings
- Connector maintenance → build adapters + monitoring/tests per connector
- Initialize repo + Docker Compose (Postgres, Redis, MinIO)
- Define DB schema (users, sources, items, interactions, feedback)
- Implement RSS & YouTube fetchers + HTML → Markdown parser
- Add sentence-transformers embedding job and pgvector integration
- Build FastAPI endpoints for digest and feedback
- Minimal React UI to view digest, mark likes/highlights, export to Obsidian
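
The "basic ranking" item above (recency + source weight + embedding similarity) could be sketched roughly as follows. This is a minimal illustration, not a settled design: the `Item` fields, the exponential half-life decay, the 24-hour half-life, and the 0.3 / 0.2 / 0.5 weights are all assumptions chosen for the example, and in practice the similarity term would more likely be computed by pgvector inside the query than in Python.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Sequence


@dataclass
class Item:
    # Hypothetical shape of a stored item; field names are illustrative only.
    title: str
    published_at: datetime
    source_weight: float          # assumed to be normalized to [0, 1]
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recency_score(published_at: datetime, now: datetime,
                  half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new item, 0.5 after one half-life."""
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def score(item: Item, profile: Sequence[float], now: datetime,
          w_recency: float = 0.3, w_source: float = 0.2,
          w_similarity: float = 0.5) -> float:
    """Weighted sum of the three signals; the weights are assumed placeholders."""
    return (w_recency * recency_score(item.published_at, now)
            + w_source * item.source_weight
            + w_similarity * cosine_similarity(item.embedding, profile))


def rank(items: List[Item], profile: Sequence[float], now: datetime) -> List[Item]:
    """Return items sorted from highest to lowest score."""
    return sorted(items, key=lambda it: score(it, profile, now), reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, tzinfo=timezone.utc)
    profile = [1.0, 0.0]  # stand-in for a user-interest vector
    items = [
        Item("older, off-topic", now - timedelta(hours=24), 0.5, [0.0, 1.0]),
        Item("fresh, on-topic", now, 0.5, [1.0, 0.0]),
    ]
    for it in rank(items, profile, now):
        print(f"{score(it, profile, now):.2f}  {it.title}")
```

Keeping the three signals as separate functions means the weights can later be tuned, or replaced by a learned model, once enough explicit feedback has been collected.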
