Commit e6d79e1

Merge pull request #12 from hokindeng/dev
Dev
2 parents 5c3d243 + e609260 commit e6d79e1

18 files changed: +2504 -0 lines changed

README.md

Lines changed: 18 additions & 0 deletions
@@ -141,6 +141,24 @@ output/<model>_<question_id>_<timestamp>/
This structure ensures reproducibility and makes batch analysis easy.

## Web Dashboard 🎨

Visualize your results with the built-in web dashboard:

```bash
cd web
./start.sh
# Open http://localhost:5000
```

Features:
- 📊 Overview statistics and model performance
- 🎬 Video playback and comparison
- 🧠 Domain and task analysis
- ⚖️ Side-by-side model comparison

See [docs/WEB_DASHBOARD.md](docs/WEB_DASHBOARD.md) for details.

## Examples

See `examples/experiment_2025-10-14.py` for sequential inference across multiple models.

docs/WEB_DASHBOARD.md

Lines changed: 356 additions & 0 deletions
@@ -0,0 +1,356 @@
# VMEvalKit Web Dashboard

A modern web application for visualizing and exploring video generation results from VMEvalKit experiments.

## Quick Start

```bash
cd web
./start.sh
```

Then open your browser to: **http://localhost:5000**

## Overview

The web dashboard provides an intuitive interface to:

- 📊 View aggregate statistics across all experiments
- 🤖 Analyze performance by model
- 🧠 Explore results by reasoning domain
- 📝 Compare how different models tackle the same task
- ⚖️ Browse a side-by-side comparison matrix
- 🎬 Watch generated videos directly in the browser

## Features

### Dashboard Home (`/`)

The main dashboard displays:
- **Overview Statistics**: Total inferences, models tested, success rates
- **Model Performance Table**: Success rates, average duration, domains covered
- **Domain Cards**: Statistics for each reasoning domain (Chess, Maze, Raven, Rotation, Sudoku)
- **Recent Results**: Grid of latest generated videos

### Model View (`/model/<model_name>`)

Detailed view for a specific model showing:
- Performance breakdown by domain
- All generated videos
- Success/failure statistics
- Generation duration metrics

### Domain View (`/domain/<domain_name>`)

View all results for a reasoning domain:
- Performance breakdown by model
- All task results
- Domain-specific statistics

### Task View (`/task/<task_id>`)

Compare how different models performed on the same task:
- Side-by-side video comparison
- Input image (first frame)
- Generated video (model output)
- Target image (final frame)
- Prompt and metadata
- Generation time and status

### Comparison Matrix (`/compare`)

Grid view showing all tasks × all models:
- Interactive video grid
- Play/pause controls
- Quick visual comparison
- Duration overlays

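The tasks × models grid can be pictured as a nested mapping from task to per-model result. The sketch below is only an illustration of that layout, not the dashboard's actual code: `build_matrix` is a hypothetical helper, the `maze_0001` task ID is made up for the sample, and the records are assumed to carry the `task_id` and `model` fields shown in the `/api/results` response.

```python
from collections import defaultdict


def build_matrix(results):
    """Arrange result records into a tasks x models grid (hypothetical helper)."""
    models = sorted({r["model"] for r in results})
    by_task = defaultdict(dict)
    for r in results:
        # If a model has several runs for one task, the last record seen wins.
        by_task[r["task_id"]][r["model"]] = r
    # One row per task; a cell is None when a model has no run for that task.
    rows = {
        task_id: [cells.get(m) for m in models]
        for task_id, cells in sorted(by_task.items())
    }
    return models, rows


# Sample records; "maze_0001" is an invented task ID for illustration.
sample = [
    {"task_id": "chess_0001", "model": "luma-ray-2", "success": True},
    {"task_id": "chess_0001", "model": "veo-3.0-generate", "success": False},
    {"task_id": "maze_0001", "model": "luma-ray-2", "success": True},
]
models, rows = build_matrix(sample)
```

Keeping empty cells as `None` lets a template render a placeholder where a model has not been run on a task.
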
## Architecture

### Backend (Flask)

```
web/
├── app.py                  # Main Flask application
│                           # Routes: /, /model, /domain, /task, /compare
│                           # API: /api/results, /api/statistics
│                           # Media: /video, /image
└── utils/
    └── data_loader.py      # Scans output folders and loads metadata
```

### Frontend

```
web/
├── templates/              # Jinja2 HTML templates
│   ├── base.html           # Base layout with navbar
│   ├── index.html          # Dashboard overview
│   ├── model.html          # Model-specific view
│   ├── domain.html         # Domain-specific view
│   ├── task.html           # Task comparison
│   ├── compare.html        # Comparison matrix
│   └── error.html          # Error page
└── static/
    ├── css/
    │   └── style.css       # Modern dark theme with gradients
    └── js/
        └── main.js         # Interactive features
```

## Data Source

The dashboard automatically scans the `data/outputs/` directory structure:

```
data/outputs/
└── {model}/
    └── {domain}_task/
        └── {task_id}/
            └── {run_id}/
                ├── video/
                │   └── generated_video.mp4
                ├── question/
                │   ├── first_frame.png
                │   ├── final_frame.png
                │   ├── prompt.txt
                │   └── question_metadata.json
                └── metadata.json
```

Each inference folder is parsed to extract:
- Model name and parameters
- Success/failure status
- Generation duration
- Video path
- Input/output images
- Prompt text
- Task metadata

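A scan over this layout could look roughly like the following. This is a minimal sketch under stated assumptions, not the contents of `data_loader.py`: `scan_outputs` is a hypothetical name, the keys inside `metadata.json` are not specified here (the `success` key in the demo is invented), and `run_0` is a placeholder run ID.

```python
import json
import tempfile
from pathlib import Path


def scan_outputs(root):
    """Yield one record per run folder under root (illustrative only)."""
    root = Path(root)
    # Pattern mirrors {model}/{domain}_task/{task_id}/{run_id}/metadata.json
    for meta_path in sorted(root.glob("*/*_task/*/*/metadata.json")):
        run_dir = meta_path.parent
        model, domain_dir, task_id, run_id = run_dir.relative_to(root).parts
        video = run_dir / "video" / "generated_video.mp4"
        prompt = run_dir / "question" / "prompt.txt"
        yield {
            "model": model,
            "domain": domain_dir[: -len("_task")],  # "chess_task" -> "chess"
            "task_id": task_id,
            "run_id": run_id,
            "metadata": json.loads(meta_path.read_text(encoding="utf-8")),
            "video_path": str(video) if video.is_file() else None,
            "prompt": prompt.read_text(encoding="utf-8") if prompt.is_file() else None,
        }


# Demo on a throwaway directory; the metadata content is made up.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp, "luma-ray-2", "chess_task", "chess_0001", "run_0")
    run_dir.mkdir(parents=True)
    (run_dir / "metadata.json").write_text('{"success": true}', encoding="utf-8")
    records = list(scan_outputs(tmp))
```

Missing videos or prompts are reported as `None` rather than raising, so a partially written run folder does not abort the whole scan.
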
## API Endpoints

### GET `/api/results`

Get all inference results as JSON.

**Query Parameters:**
- `model` - Filter by model name
- `domain` - Filter by domain name
- `task_id` - Filter by task ID

**Example:**
```bash
# Quote the URL so the shell does not treat "&" as a background operator
curl "http://localhost:5000/api/results?model=luma-ray-2&domain=chess"
```

**Response:**
```json
{
  "total": 15,
  "results": [
    {
      "run_id": "luma-ray-2_chess_0001_...",
      "model": "luma-ray-2",
      "domain": "chess",
      "task_id": "chess_0001",
      "success": true,
      "duration_seconds": 42.3,
      "video_path": "...",
      "timestamp": "2025-10-18T..."
    }
  ]
}
```

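The same query can be issued from Python with the standard library. The helper names below (`results_url`, `fetch_results`) are hypothetical, and the base URL assumes the default address from Quick Start; only the three documented query parameters are used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumed default address from Quick Start


def results_url(model=None, domain=None, task_id=None, base_url=BASE_URL):
    """Build the /api/results URL, skipping filters that are not set."""
    filters = {"model": model, "domain": domain, "task_id": task_id}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base_url}/api/results" + (f"?{query}" if query else "")


def fetch_results(**filters):
    """Fetch and decode the JSON response; needs the dashboard to be running."""
    with urlopen(results_url(**filters)) as response:
        return json.load(response)


url = results_url(model="luma-ray-2", domain="chess")
```

With the dashboard running, `fetch_results(model="luma-ray-2", domain="chess")` would return the decoded response dictionary.
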
### GET `/api/statistics`

Get aggregate statistics.

**Response:**
```json
{
  "models": {
    "luma-ray-2": {
      "total": 75,
      "success": 68,
      "failed": 7,
      "success_rate": 90.7,
      "avg_duration": 38.5,
      "domains": ["chess", "maze", "raven", "rotation", "sudoku"]
    }
  },
  "domains": {
    "chess": {
      "total": 90,
      "success": 82,
      "failed": 8,
      "success_rate": 91.1,
      "models": ["luma-ray-2", "veo-3.0-generate", ...]
    }
  },
  "total_inferences": 450
}
```

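The sample numbers are consistent with `success_rate` being a percentage rounded to one decimal place (68 of 75 is 90.7, 82 of 90 is 91.1). The sketch below reproduces that arithmetic; `summarize` is a hypothetical helper, and the rounding rule is an inference from the example response rather than something the dashboard documents.

```python
from collections import defaultdict


def summarize(results, key):
    """Group result records by `key` and compute counts and success rate."""
    groups = defaultdict(list)
    for r in results:
        groups[r[key]].append(r)
    summary = {}
    for name, runs in groups.items():
        success = sum(1 for r in runs if r["success"])
        summary[name] = {
            "total": len(runs),
            "success": success,
            "failed": len(runs) - success,
            # Assumed rule: percentage rounded to one decimal place.
            "success_rate": round(100 * success / len(runs), 1),
        }
    return summary


# Synthetic records matching the counts in the example response above.
runs = [{"model": "luma-ray-2", "success": i < 68} for i in range(75)]
stats = summarize(runs, "model")
```

Calling `summarize(results, "domain")` on the same kind of records would give the per-domain block.
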
## Installation

### Option 1: Use Main venv (Recommended)

```bash
cd web
source ../venv/bin/activate
pip install -r requirements.txt
python app.py
```

### Option 2: Separate venv

```bash
cd web
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
```

## Deployment

### Development

```bash
python app.py
# Runs on http://localhost:5000
```

### Production (Gunicorn)

```bash
gunicorn --bind 0.0.0.0:5000 --workers 4 app:app
```

### Environment Variables

No environment variables are required; the dashboard uses relative paths to find the output directory.

## Design

### Modern Dark Theme

- **Color Palette**:
  - Primary: Blue (#2563eb)
  - Secondary: Purple (#7c3aed)
  - Success: Green (#10b981)
  - Warning: Orange (#f59e0b)
  - Danger: Red (#ef4444)

- **Layout**: Responsive grid system
- **Typography**: System fonts for fast loading
- **Icons**: Emoji for universal support
- **Animations**: Smooth transitions and hover effects

### Responsive Design

- Desktop: Multi-column grids
- Tablet: Adaptive layouts
- Mobile: Single-column stacks

## Browser Support

- ✅ Chrome/Edge (full support)
- ✅ Firefox (full support)
- ✅ Safari (full support)
- ✅ Mobile browsers (responsive)

## Performance

- Lazy loading for videos
- Metadata caching
- Efficient directory scanning
- Progressive loading

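How the dashboard caches metadata is not described here. One common approach, shown purely as an illustration, is to key the cache on each file's modification time so a file is re-parsed only after it changes. `load_metadata` and `_cache` are hypothetical names, and the `success` key in the demo file is invented.

```python
import json
import os
import tempfile

_cache = {}  # path -> (mtime_ns, parsed metadata)


def load_metadata(path):
    """Return parsed JSON for path, re-reading only when the file changes."""
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    _cache[path] = (mtime, data)
    return data


# Demo: the second call returns the cached object without re-parsing.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"success": True}, f)
    first = load_metadata(path)
    second = load_metadata(path)
```
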
## Troubleshooting

### Videos Not Loading

1. Check output directory path
2. Verify video files exist
3. Check file permissions
4. Ensure MP4 format

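Steps 2 to 4 can be scripted for a single file. The following is a rough sketch: `check_video` is a hypothetical helper, and the format test only looks for the `ftyp` tag that MP4 files normally carry near the start, so it is a heuristic rather than a full validation.

```python
import tempfile
from pathlib import Path


def check_video(path):
    """Return a list of problems found for one video file (empty if none)."""
    path = Path(path)
    problems = []
    if not path.is_file():
        problems.append("file does not exist")
        return problems
    if path.suffix.lower() != ".mp4":
        problems.append("extension is not .mp4")
    try:
        with path.open("rb") as f:
            header = f.read(12)
    except PermissionError:
        problems.append("file is not readable")
        return problems
    # Heuristic: MP4 files normally have the ASCII tag "ftyp" at bytes 4-8.
    if header[4:8] != b"ftyp":
        problems.append("missing MP4 'ftyp' signature")
    return problems


# Demo with synthetic files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "generated_video.mp4")
    good.write_bytes(b"\x00\x00\x00\x18ftypmp42" + b"\x00" * 12)
    bad = Path(tmp, "notes.txt")
    bad.write_bytes(b"not a video")
    good_report = check_video(good)
    bad_report = check_video(bad)
    missing_report = check_video(Path(tmp, "absent.mp4"))
```
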
### Port Already in Use

Change port in `app.py`:
```python
app.run(debug=True, host='0.0.0.0', port=5001)
```

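Before editing `app.py`, it can help to confirm which port is actually taken. The sketch below uses only the standard `socket` module; `port_in_use` and `first_free_port` are hypothetical helpers and not part of the dashboard.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0


def first_free_port(start=5000, attempts=20):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError("no free port found")


# Demo: open a listener on an OS-chosen port, then detect it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
busy_port = listener.getsockname()[1]
was_busy = port_in_use(busy_port)
listener.close()
```

The number returned by `first_free_port()` could then be passed as the `port` argument to `app.run`.
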
### No Data Displayed

1. Run some inferences first:
   ```bash
   python examples/experiment_2025-10-14.py
   ```
2. Verify outputs exist in `data/outputs/`

## Development

### Adding New Views

1. Create route in `app.py`:
   ```python
   @app.route('/myview')
   def my_view():
       return render_template('myview.html')
   ```

2. Create template in `templates/myview.html`
3. Add navigation link in `templates/base.html`

### Styling

All styles are in `static/css/style.css` using CSS variables for easy theming.

### JavaScript

Interactive features in `static/js/main.js`:
- Video player enhancements
- Lazy loading
- Search/filter
- Keyboard shortcuts

## Future Enhancements

- [ ] Real-time updates via WebSocket
- [ ] Advanced filtering and search
- [ ] Export to CSV/JSON
- [ ] Video quality metrics
- [ ] User authentication
- [ ] Docker containerization
- [ ] Caching layer for performance

## Integration with VMEvalKit

The dashboard is a standalone app that integrates tightly with VMEvalKit:

1. **Data Flow**: Reads from VMEvalKit's structured output folders
2. **No Modification**: Doesn't modify any experiment data
3. **Real-time**: Reflects latest experiments automatically
4. **Metadata**: Uses VMEvalKit's metadata format

## Contributing

To contribute to the web dashboard:

1. Follow VMEvalKit's contribution guidelines
2. Test on multiple browsers
3. Ensure responsive design
4. Update documentation

## License

Same as VMEvalKit main project (Apache 2.0).

---

For more information, see the [main README](../README.md) or `web/README.md`.
