This gradio demo is the simplest starting point for you to play with our project.
You can either visit it at our huggingface space here, or run it locally yourself with:

```bash
python demo_gr.py
```
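For reference, gradio apps are served through `launch()`; below is a minimal, hypothetical sketch of how such a demo is typically wired up. This is not the actual contents of `demo_gr.py`, and the stub `generate_video` merely stands in for the real two-pass sampling pipeline.

```python
import gradio as gr

def generate_video(image):
    # Hypothetical stand-in for the real two-pass sampling pipeline in demo_gr.py.
    raise NotImplementedError

with gr.Blocks() as demo:
    image = gr.Image(label="Input image")
    video = gr.Video(label="Generated video")
    button = gr.Button("Generate")
    button.click(generate_video, inputs=image, outputs=video)

# launch() accepts standard options, e.g. server_name/server_port for LAN access.
demo.launch(server_name="0.0.0.0", server_port=7860)
```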
We provide two ways to use our demo:

- `Basic` mode, where the user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
- `Advanced` mode, where the user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport (powered by viser). This is suitable for power users and academic researchers.
### `Basic` mode

This is the default mode when entering our demo (given its simplicity). The user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
Here is a video walkthrough:
basic-walkthrough.mp4
You can choose from 13 preset trajectories that are common for NVS (`move-forward/backward` are omitted for visualization purposes):
basic-llff-room-collage.mp4
More formally:

- `orbit`/`spiral`/`lemniscate` are good for showing the "3D-ness" of the scene (see the sketch after this list).
- `zoom-in`/`zoom-out` keep the camera position the same while increasing/decreasing the focal length.
- `dolly zoom-in`/`dolly zoom-out` move the camera position backward/forward while increasing/decreasing the focal length.
- `move-forward`/`backward`/`up`/`down`/`left`/`right` move the camera position in different directions.
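To make the presets concrete, here is a rough numpy sketch of how an `orbit` pose sequence and a `dolly zoom` schedule can be parameterized. This is our illustration, not the demo's actual implementation; the helper names (`look_at`, `orbit_poses`, `dolly_zoom`) and the OpenGL-style camera convention are assumptions.

```python
import numpy as np

def look_at(position, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Camera-to-world matrix with the camera at `position` looking at `target`."""
    forward = target - position
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, true_up, -forward  # OpenGL: -z is forward
    c2w[:3, 3] = position
    return c2w

def orbit_poses(num_frames=80, radius=2.0, height=0.5):
    """`orbit`: circle the scene center at a fixed radius, always looking at it."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return [
        look_at(np.array([radius * np.cos(t), height, radius * np.sin(t)]))
        for t in thetas
    ]

def dolly_zoom(num_frames=80, f_start=500.0, f_end=1000.0, z_start=2.0):
    """`dolly zoom-in`: pull the camera back while increasing the focal length.

    Keeping the subject the same apparent size requires focal / distance = const,
    so the camera distance scales linearly with the focal length.
    """
    focals = np.linspace(f_start, f_end, num_frames)
    distances = z_start * focals / f_start
    return focals, distances
```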
Notes:

- For an 80-frame video at 768x576 resolution, it takes around 20 seconds for the first pass generation and around 2 minutes for the second pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
### `Advanced` mode

This is the power mode where you can have very fine-grained control over camera trajectories. The user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport. This is suitable for power users and academic researchers.

Here is a video walkthrough:
advanced-walkthru_small.mp4
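Under the hood, the 3D viewport is a viser server. The following minimal sketch shows the general idea of placing a camera frustum in such a viewport; it is not the demo's code, and viser's module layout has shifted between releases (e.g. `server.scene.add_camera_frustum` vs. the older `server.add_camera_frustum`), so treat the exact calls as assumptions for your installed version.

```python
import time
import numpy as np
import viser

server = viser.ViserServer()  # serves an interactive 3D viewport in the browser

# Add a camera frustum, roughly like the trajectory keyframes shown in the demo.
server.scene.add_camera_frustum(
    "/target_cam",
    fov=np.deg2rad(60.0),   # vertical field of view in radians
    aspect=576 / 576,       # width / height of the target render
    scale=0.2,
    position=(0.0, 0.0, 2.0),
)

while True:  # keep the server alive so the viewport stays reachable
    time.sleep(1.0)
```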
Notes:

- For a 134-frame video at 576x576 resolution, it takes around 16 seconds for the first pass generation and around 4 minutes for the second pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
- If the first pass sampling result is bad, click the "Abort rendering" button in the GUI to avoid getting stuck in the second pass sampling, so that you can try something else.
We have tested our gradio demo in both a local environment and the HF space environment, across different modes and compilation settings. Here are our results:
| Total time (s) | Basic first pass | Basic second pass | Advanced first pass | Advanced second pass |
| --- | --- | --- | --- | --- |
| HF (L40S, w/o comp.) | 68 | 484 | 48 | 780 |
| HF (L40S, w/ comp.) | 51 | 362 | 36 | 587 |
| Local (H100, w/o comp.) | 35 | 204 | 20 | 313 |
| Local (H100, w/ comp.) | 21 | 144 | 16 | 234 |
Notes:
- The HF space uses an L40S GPU, and our local environment uses an H100 GPU.
- We opt in to compilation via `torch.compile`.
- `Basic` mode is tested by generating 80 frames at 768x576 resolution; `Advanced` mode is tested by generating 134 frames at 576x576 resolution.
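For reference, opting in to compilation is the standard one-line `torch.compile` wrap; here is a minimal, self-contained sketch (where exactly the demo wraps its model is an assumption on our part, and the tiny `Sequential` below is just a stand-in):

```python
import torch

model = torch.nn.Sequential(  # stand-in for the actual diffusion model
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
)
compiled = torch.compile(model)  # first call pays a one-time compilation cost

x = torch.randn(1, 64)
y = compiled(x)  # later calls reuse the optimized graph, hence the speedups above
```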