# VMEvalKit Web Dashboard

A modern web application for visualizing and exploring video generation results from VMEvalKit experiments.

## Quick Start

```bash
cd web
./start.sh
```

Then open **http://localhost:5000** in your browser.

## Overview

The web dashboard provides an intuitive interface to:

- 📊 View aggregate statistics across all experiments
- 🤖 Analyze performance by model
- 🧠 Explore results by reasoning domain
- 📝 Compare how different models tackle the same task
- ⚖️ Browse a side-by-side comparison matrix
- 🎬 Watch generated videos directly in the browser

## Features

### Dashboard Home (`/`)

The main dashboard displays:
- **Overview Statistics**: Total inferences, models tested, success rates
- **Model Performance Table**: Success rates, average duration, domains covered
- **Domain Cards**: Statistics for each reasoning domain (Chess, Maze, Raven, Rotation, Sudoku)
- **Recent Results**: Grid of latest generated videos

### Model View (`/model/<model_name>`)

Detailed view for a specific model showing:
- Performance breakdown by domain
- All generated videos
- Success/failure statistics
- Generation duration metrics

### Domain View (`/domain/<domain_name>`)

View all results for a reasoning domain:
- Performance breakdown by model
- All task results
- Domain-specific statistics

### Task View (`/task/<task_id>`)

Compare how different models performed on the same task:
- Side-by-side video comparison
- Input image (first frame)
- Generated video (model output)
- Target image (final frame)
- Prompt and metadata
- Generation time and status

### Comparison Matrix (`/compare`)

Grid view showing all tasks × all models:
- Interactive video grid
- Play/pause controls
- Quick visual comparison
- Duration overlays
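
Conceptually, the matrix pivots flat run records into rows keyed by task and columns keyed by model. Below is a minimal sketch of that grouping. It assumes records shaped like the `/api/results` entries, with `task_id`, `model`, and `duration_seconds` keys. `build_comparison_matrix` is a hypothetical helper, not the dashboard's actual code, and the sample records are illustrative.

```python
from collections import defaultdict


def build_comparison_matrix(results):
    """Pivot flat result records into {task_id: {model: record}}.

    Hypothetical helper: assumes each record has 'task_id' and 'model' keys,
    as in the /api/results JSON. If a model has several runs for the same
    task, the last record seen wins.
    """
    matrix = defaultdict(dict)
    for record in results:
        matrix[record["task_id"]][record["model"]] = record
    # Sorted axes give a stable row/column order for rendering the grid.
    tasks = sorted(matrix)
    models = sorted({record["model"] for record in results})
    return tasks, models, dict(matrix)


# Illustrative records, not real experiment data.
results = [
    {"task_id": "chess_0001", "model": "luma-ray-2", "success": True, "duration_seconds": 42.3},
    {"task_id": "chess_0001", "model": "veo-3.0-generate", "success": False, "duration_seconds": 12.0},
    {"task_id": "maze_0001", "model": "luma-ray-2", "success": True, "duration_seconds": 35.1},
]
tasks, models, matrix = build_comparison_matrix(results)
for task in tasks:
    # A missing cell means that model has no run for this task.
    cells = [matrix[task].get(model) for model in models]
    print(task, ["-" if cell is None else cell["duration_seconds"] for cell in cells])
```

Keeping the last run per cell is a simplification. If a model has several runs for one task, you would probably choose one by `timestamp` instead.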

## Architecture

### Backend (Flask)

```
web/
├── app.py                  # Main Flask application
│                           # Routes: /, /model, /domain, /task, /compare
│                           # API: /api/results, /api/statistics
│                           # Media: /video, /image
└── utils/
    └── data_loader.py      # Scans output folders and loads metadata
```

### Frontend

```
web/
├── templates/              # Jinja2 HTML templates
│   ├── base.html           # Base layout with navbar
│   ├── index.html          # Dashboard overview
│   ├── model.html          # Model-specific view
│   ├── domain.html         # Domain-specific view
│   ├── task.html           # Task comparison
│   ├── compare.html        # Comparison matrix
│   └── error.html          # Error page
└── static/
    ├── css/
    │   └── style.css       # Modern dark theme with gradients
    └── js/
        └── main.js         # Interactive features
```

## Data Source

The dashboard automatically scans the `data/outputs/` directory, which is expected to follow this layout:

```
data/outputs/
└── {model}/
    └── {domain}_task/
        └── {task_id}/
            └── {run_id}/
                ├── video/
                │   └── generated_video.mp4
                ├── question/
                │   ├── first_frame.png
                │   ├── final_frame.png
                │   ├── prompt.txt
                │   └── question_metadata.json
                └── metadata.json
```

Each run folder (one inference) is parsed to extract:
- Model name and parameters
- Success/failure status
- Generation duration
- Video path
- Input/output images
- Prompt text
- Task metadata
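
Here is a minimal sketch of such a scan that relies only on the layout above. `scan_outputs` and the keys of the returned dicts are hypothetical. The schema of `metadata.json` is not documented here, so its contents (where status, duration, and model parameters would live) are passed through unparsed.

```python
import json
from pathlib import Path


def _existing(path):
    """Return the path as a string if the file exists, else None."""
    return str(path) if path.is_file() else None


def scan_outputs(output_dir):
    """Collect one record per run folder under output_dir (hypothetical sketch)."""
    records = []
    # Glob levels mirror the layout: {model}/{domain}_task/{task_id}/{run_id}
    for run_dir in sorted(Path(output_dir).glob("*/*_task/*/*")):
        if not run_dir.is_dir():
            continue
        question_dir = run_dir / "question"
        prompt_file = question_dir / "prompt.txt"
        metadata_file = run_dir / "metadata.json"
        records.append({
            "model": run_dir.parents[2].name,
            "domain": run_dir.parents[1].name[: -len("_task")],  # "chess_task" -> "chess"
            "task_id": run_dir.parent.name,
            "run_id": run_dir.name,
            "video_path": _existing(run_dir / "video" / "generated_video.mp4"),
            "first_frame": _existing(question_dir / "first_frame.png"),
            "final_frame": _existing(question_dir / "final_frame.png"),
            "prompt": prompt_file.read_text().strip() if prompt_file.is_file() else None,
            # Status, duration, and model parameters are expected in metadata.json,
            # but its schema is not documented here, so it is returned as-is.
            "metadata": json.loads(metadata_file.read_text()) if metadata_file.is_file() else {},
        })
    return records
```

Missing files become `None` instead of raising an error, so a partially written run still appears in the results.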

## API Endpoints

### GET `/api/results`

Returns all inference results as JSON.

**Query Parameters** (optional):
- `model` - Filter by model name
- `domain` - Filter by domain name
- `task_id` - Filter by task ID

**Example:**
```bash
curl "http://localhost:5000/api/results?model=luma-ray-2&domain=chess"
```

Quote the URL. Without quotes, the shell treats `&` as the background operator, and the `domain` filter is never sent.

**Response:**
```json
{
  "total": 15,
  "results": [
    {
      "run_id": "luma-ray-2_chess_0001_...",
      "model": "luma-ray-2",
      "domain": "chess",
      "task_id": "chess_0001",
      "success": true,
      "duration_seconds": 42.3,
      "video_path": "...",
      "timestamp": "2025-10-18T..."
    }
  ]
}
```
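
From Python, the same query can be made with the standard library alone. Below is a minimal client sketch. `build_results_url` and `fetch_results` are hypothetical helper names. Only the endpoint path, its `model`/`domain`/`task_id` parameters, and the `total`/`results` keys come from this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # default address from the Quick Start


def build_results_url(base_url=BASE_URL, **filters):
    """Build an /api/results URL, skipping filters whose value is None."""
    params = {name: value for name, value in filters.items() if value is not None}
    query = urlencode(params)  # escapes values containing spaces, '&', etc.
    return f"{base_url}/api/results" + (f"?{query}" if query else "")


def fetch_results(base_url=BASE_URL, **filters):
    """GET /api/results and decode the JSON body (the dashboard must be running)."""
    with urlopen(build_results_url(base_url, **filters), timeout=10) as response:
        return json.load(response)
```

For example, with the dashboard running, `fetch_results(model="luma-ray-2", domain="chess")["total"]` returns the number of matching runs.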

### GET `/api/statistics`

Get aggregate statistics.

**Response:**
```json
{
  "models": {
    "luma-ray-2": {
      "total": 75,
      "success": 68,
      "failed": 7,
      "success_rate": 90.7,
      "avg_duration": 38.5,
      "domains": ["chess", "maze", "raven", "rotation", "sudoku"]
    }
  },
  "domains": {
    "chess": {
      "total": 90,
      "success": 82,
      "failed": 8,
      "success_rate": 91.1,
      "models": ["luma-ray-2", "veo-3.0-generate", ...]
    }
  },
  "total_inferences": 450
}
```
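
The example numbers match `success_rate` being a percentage rounded to one decimal place (68 / 75 = 90.7%, 82 / 90 = 91.1%). Below is a sketch of how such a response could be computed from result records. `compute_statistics` is hypothetical. Averaging `avg_duration` over every run that reports a duration, rather than only over successful runs, is an assumption.

```python
from collections import defaultdict


def _summarize(results, key, other):
    """Group records by `key` and summarize each group, listing distinct `other` values."""
    groups = defaultdict(list)
    for record in results:
        groups[record[key]].append(record)
    summary = {}
    for name, records in groups.items():
        success = sum(1 for r in records if r["success"])
        summary[name] = {
            "total": len(records),
            "success": success,
            "failed": len(records) - success,
            # Percentage rounded to one decimal place, e.g. 68/75 -> 90.7
            "success_rate": round(100 * success / len(records), 1),
            other + "s": sorted({r[other] for r in records}),
        }
    return summary


def compute_statistics(results):
    """Build a dict shaped like the /api/statistics response (hypothetical sketch)."""
    models = _summarize(results, "model", "domain")
    for name, entry in models.items():
        # Assumption: average over every run of this model that reports a duration.
        durations = [r["duration_seconds"] for r in results
                     if r["model"] == name and r.get("duration_seconds") is not None]
        entry["avg_duration"] = round(sum(durations) / len(durations), 1) if durations else None
    return {
        "models": models,
        "domains": _summarize(results, "domain", "model"),
        "total_inferences": len(results),
    }
```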

## Installation

### Option 1: Use the Main VMEvalKit venv (Recommended)

```bash
cd web
source ../venv/bin/activate
pip install -r requirements.txt
python app.py
```

### Option 2: Separate venv

```bash
cd web
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
```

## Deployment

### Development

```bash
python app.py
# Runs on http://localhost:5000
```

### Production (Gunicorn)

Run this from the `web/` directory so that `app:app` (the `app` object in `app.py`) can be imported:

```bash
gunicorn --bind 0.0.0.0:5000 --workers 4 app:app
```

If `gunicorn` is not already installed in your environment, install it with `pip install gunicorn`.

### Environment Variables

No environment variables are required. The dashboard locates the output directory through relative paths.

## Design

### Modern Dark Theme

- **Color Palette**:
  - Primary: Blue (#2563eb)
  - Secondary: Purple (#7c3aed)
  - Success: Green (#10b981)
  - Warning: Orange (#f59e0b)
  - Danger: Red (#ef4444)

- **Layout**: Responsive grid system
- **Typography**: System fonts for fast loading
- **Icons**: Emoji for universal support
- **Animations**: Smooth transitions and hover effects

### Responsive Design

- Desktop: Multi-column grids
- Tablet: Adaptive layouts
- Mobile: Single-column stacks

## Browser Support

- ✅ Chrome/Edge (full support)
- ✅ Firefox (full support)
- ✅ Safari (full support)
- ✅ Mobile browsers (responsive)

## Performance

- Lazy loading for videos
- Metadata caching
- Efficient directory scanning
- Progressive loading
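
Metadata caching and automatically reflecting new experiments pull in opposite directions. One common way to reconcile them is to key each cached parse on the file's modification time. A rewritten `metadata.json` is then re-read, while unchanged files are not. This is an illustrative sketch, not the dashboard's documented strategy, and `load_metadata` is a hypothetical name.

```python
import json
from pathlib import Path

# Hypothetical module-level cache: path -> (mtime in ns, parsed JSON).
_metadata_cache = {}


def load_metadata(path):
    """Return parsed JSON for `path`, re-reading only when the file's mtime changes."""
    path = Path(path)
    mtime = path.stat().st_mtime_ns
    cached = _metadata_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # cache hit: file unchanged since the last parse
    data = json.loads(path.read_text())
    _metadata_cache[path] = (mtime, data)
    return data
```

Each `stat` call is far cheaper than re-parsing JSON. Note that with Gunicorn's `--workers 4`, every worker process would hold its own copy of an in-memory cache like this one.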

## Troubleshooting

### Videos Not Loading

1. Check that the output directory (`data/outputs/`) exists where the dashboard expects it
2. Verify that each run folder contains `video/generated_video.mp4`
3. Check that the dashboard process has read permission on the video files
4. Ensure the videos are MP4 files
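
These checks can be scripted. The sketch below walks the `data/outputs/` layout described under Data Source. It treats an `ftyp` box at byte offset 4 as the sign of an MP4 file. That holds for typical ISO base media files, but it is only a heuristic. `check_video` and `check_all_videos` are hypothetical helpers.

```python
from pathlib import Path

MP4_FTYP = b"ftyp"  # box type normally found at byte offset 4 of an MP4 file


def check_video(path):
    """Return a list of problems with one video file (empty list if it looks fine)."""
    path = Path(path)
    if not path.exists():
        return ["missing"]
    if not path.is_file():
        return ["not a regular file"]
    try:
        with path.open("rb") as f:
            header = f.read(12)
    except PermissionError:
        return ["not readable (check file permissions)"]
    if header[4:8] != MP4_FTYP:
        return ["does not look like MP4 (no 'ftyp' box at the start)"]
    return []


def check_all_videos(output_dir):
    """Map each problematic generated_video.mp4 path under output_dir to its problems."""
    root = Path(output_dir)
    if not root.is_dir():
        return {str(root): ["output directory not found"]}
    report = {}
    # Layout: {model}/{domain}_task/{task_id}/{run_id}/video/generated_video.mp4
    for run_dir in sorted(root.glob("*/*_task/*/*")):
        if not run_dir.is_dir():
            continue
        video = run_dir / "video" / "generated_video.mp4"
        problems = check_video(video)
        if problems:
            report[str(video)] = problems
    return report
```

For example, `check_all_videos("data/outputs")` run from the repository root returns an empty dict when every run has a readable MP4 file.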
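
### Port Already in Use

Change the port in the `app.run(...)` call in `app.py`, then open the new port (for example http://localhost:5001) in your browser. For Gunicorn, change the `--bind` address instead.
```python
# debug=True enables Flask's interactive debugger; with host='0.0.0.0' it is
# reachable from other machines, so only run this way on a trusted network.
app.run(debug=True, host='0.0.0.0', port=5001)
```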
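
To choose a replacement port, you can probe for a free one. Below is a minimal sketch using the standard `socket` module. `find_free_port` is a hypothetical helper. Another process could take the port between this check and the server starting, so treat the result as a suggestion.

```python
import socket

MAX_PORT = 65535


def find_free_port(start=5000, attempts=20, host="0.0.0.0"):
    """Return the first port in [start, start + attempts) that can be bound on host."""
    stop = min(start + attempts, MAX_PORT + 1)  # never probe past the valid port range
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use (or otherwise unavailable)
        # The probe socket is closed here, so the port is free again for the server.
        return port
    raise RuntimeError(f"no free port between {start} and {stop - 1}")
```

You could then pass `port=find_free_port()` to `app.run(...)`, or print the result and edit `app.py` by hand.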

### No Data Displayed

1. Run some inferences first (from the repository root):
   ```bash
   python examples/experiment_2025-10-14.py
   ```
2. Verify that run folders exist under `data/outputs/` in the layout shown in [Data Source](#data-source)

## Development

### Adding New Views

1. Create a route in `app.py`, where the Flask `app` object is defined:
   ```python
   from flask import render_template  # add if app.py does not already import it

   @app.route('/myview')
   def my_view():
       # Renders templates/myview.html
       return render_template('myview.html')
   ```

2. Create the template in `templates/myview.html`
3. Add a navigation link in `templates/base.html`

### Styling

All styles live in `static/css/style.css` and use CSS custom properties (variables), so the theme can be adjusted by editing the variable values.

### JavaScript

Interactive features in `static/js/main.js`:
- Video player enhancements
- Lazy loading
- Search/filter
- Keyboard shortcuts

## Future Enhancements

- [ ] Real-time updates via WebSocket
- [ ] Advanced filtering and search
- [ ] Export to CSV/JSON
- [ ] Video quality metrics
- [ ] User authentication
- [ ] Docker containerization
- [ ] Caching layer for performance

## Integration with VMEvalKit

The dashboard is a standalone app, but it is tightly integrated with VMEvalKit:

1. **Data Flow**: Reads from VMEvalKit's structured output folders
2. **No Modification**: Never modifies experiment data
3. **Real-time**: Reflects the latest experiments automatically
4. **Metadata**: Uses VMEvalKit's metadata format

## Contributing

To contribute to the web dashboard:

1. Follow VMEvalKit's contribution guidelines
2. Test on multiple browsers
3. Ensure the design stays responsive
4. Update this documentation

## License

Same as the main VMEvalKit project (Apache 2.0).

---

For more information, see the [main README](../README.md) or `web/README.md`.
