An AI-enabled karaoke analyzer and recommendation engine that helps singers understand their vocal performance and discover songs that match their voice.
DuckJams analyzes recorded singing performances to measure pitch accuracy, vocal range, and performance quality. By comparing a singer's performance to reference tracks, DuckJams provides actionable insights and personalized song recommendations.
The initial release focuses on core analysis capabilities:
- Pitch Analysis: Measure pitch accuracy and stability in recorded performances
- Key Detection: Identify the musical key of reference songs and compare to performances
- Direct Comparison: Compare a karaoke cover to the original reference song
- Basic Scoring: Calculate accuracy metrics showing how well a performance matches the reference
- Simple Web Interface: Browser-based audio analysis with URL input for both reference and cover tracks
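The accuracy-scoring idea above can be sketched as a frame-wise comparison of two pitch tracks measured in cents (the function name, NaN/zero handling, and the 50-cent tolerance are illustrative assumptions, not the project's actual scoring rules):

```python
import numpy as np

def pitch_accuracy_cents(ref_f0, cover_f0, tolerance_cents=50.0):
    """Fraction of frames where the cover is within `tolerance_cents`
    of the reference pitch. Unvoiced frames (NaN or 0 Hz) are skipped."""
    ref = np.asarray(ref_f0, dtype=float)
    cov = np.asarray(cover_f0, dtype=float)
    voiced = (ref > 0) & (cov > 0) & np.isfinite(ref) & np.isfinite(cov)
    if not voiced.any():
        return 0.0
    # 1200 cents per octave; deviation is the log-ratio of the two pitches
    cents = 1200.0 * np.log2(cov[voiced] / ref[voiced])
    return float(np.mean(np.abs(cents) <= tolerance_cents))

# An identical track scores 1.0; one a half step sharp (100 cents) scores 0.0
ref = np.array([220.0, 220.0, 440.0, 0.0])
print(pitch_accuracy_cents(ref, ref))                      # 1.0
print(pitch_accuracy_cents(ref, ref * 2 ** (100 / 1200)))  # 0.0
```

In practice the two f0 tracks would come from a pitch tracker such as librosa's `pyin`, run on the downloaded reference and cover audio.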
Future enhancements will expand into a comprehensive vocal analysis and recommendation platform:
- Vocal Range Detection: Determine a singer's comfortable tessitura and full vocal range
- Artist Comparison: Compare performances to specific reference artists and styles
- Song Fit Scoring: Score how well specific songs fit a user's voice characteristics
- Personalized Recommendations: Recommend songs and keys that match the singer's vocal profile
- Key Frame Analysis: Analyze song sections and identify challenging passages
- Vector Database Integration: Store and compare favorite lists to find similar vocal profiles
- User Profiles: Track progress over time and build a personal vocal profile
- Song Library: Integrated library of reference tracks for comparison
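As a rough illustration of the vocal range and tessitura ideas above, a pitch track can be reduced to percentile summaries in MIDI note space (a minimal sketch; the function names and the "middle 50%" definition of tessitura are hypothetical choices):

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_note(m):
    """Convert a MIDI number to a note name, e.g. 69 -> 'A4'."""
    m = int(round(m))
    return f"{NOTE_NAMES[m % 12]}{m // 12 - 1}"

def vocal_profile(f0):
    """Summarize a pitch track (Hz, NaN = unvoiced) as a full range plus
    a rough tessitura taken as the middle 50% of sung pitches."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    midi = 69 + 12 * np.log2(voiced / 440.0)  # Hz -> MIDI note numbers
    lo, q1, q3, hi = np.percentile(midi, [0, 25, 75, 100])
    return {
        "range": (midi_to_note(lo), midi_to_note(hi)),
        "tessitura": (midi_to_note(q1), midi_to_note(q3)),
    }

# Example: a track spanning A2 (110 Hz) to A4 (440 Hz), mostly around A3
f0 = np.array([110.0, 220.0, 220.0, 220.0, 440.0, np.nan])
print(vocal_profile(f0))  # {'range': ('A2', 'A4'), 'tessitura': ('A3', 'A3')}
```

A profile like this could then be matched against the key and melodic range of candidate songs to drive the fit-scoring and recommendation features.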
Backend Core Framework:
- FastAPI: Modern Python web framework for building the API
- Python 3.9+: Primary backend language
Audio Analysis:
- librosa: Audio analysis, key detection, and pitch tracking
- numpy: Numerical operations and array processing
- scipy: Advanced signal processing (optional)
Data & Validation:
- pydantic: Data validation and settings management
- httpx or requests: HTTP client for downloading audio files from URLs
Future ML/AI:
- Vector database (e.g., Pinecone, Weaviate, or Qdrant) for recommendation engine
- Potential ML models for advanced vocal analysis (to be determined based on MVP learnings)
Frontend Core Framework:
- React: UI framework for the web application
- JavaScript/TypeScript: Frontend language
HTTP & API:
- axios: HTTP client for API requests to FastAPI backend
Visualization:
- recharts or chart.js: Data visualization for pitch comparisons and analysis results
Build Tools:
- Vite or Create React App: Build tooling and development server
- AWS S3: Object storage for audio files and analysis results
- AWS Lambda: Serverless functions for event-driven processing (future)
- Docker: Containerization for deployment
- PostgreSQL or MongoDB: Database for user profiles and song metadata (future)
DuckJams/
├── backend/ # Python FastAPI backend
│ ├── src/
│ │ ├── main.py # FastAPI application
│ │ ├── audio_analyzer.py # Core audio analysis logic
│ │ ├── key_detector.py # Key detection using librosa
│ │ ├── pitch_analyzer.py # Pitch analysis using librosa
│ │ ├── scorer.py # Comparison and scoring logic
│ │ └── models.py # Data models (Pydantic)
│ ├── requirements.txt
│ └── .env.example
├── frontend/ # React frontend
│ ├── src/
│ │ ├── App.jsx
│ │ ├── components/
│ │ │ ├── AudioInput.jsx # URL input form
│ │ │ └── Results.jsx # Analysis results display
│ │ └── services/
│ │ └── apiService.js # API client for FastAPI
│ ├── package.json
│ └── vite.config.js
└── docs/ # Documentation
├── PLAN.md # Original project plan
└── IMPLEMENTATION_PLAN.md # Detailed implementation roadmap
DuckJams is being built with a focus on:
- MVP-First: Start simple, iterate based on real usage
- Explainable Metrics: Prioritize understandable analysis over black-box ML
- Incremental Complexity: Simple scoring models first, advanced ML later
- Fast Feedback Loops: Quick iterations and clear milestones
- Cloud-Agnostic Early: Avoid vendor lock-in at first; adopt AWS-friendly services later
- Python 3.9 or higher
- Node.js 16+ and npm/yarn
- FFmpeg (for audio processing with librosa)
cd backend
pip install -r requirements.txt

cd frontend
npm install

See docs/IMPLEMENTATION_PLAN.md for detailed implementation steps.
🚧 In Development - Currently in planning and initial setup phase. Starting with librosa tutorials to build foundational audio analysis capabilities.
See LICENSE file for details.
This project is being developed live on stream. Contributions and feedback are welcome!