This gradio demo is the simplest starting point for you to play with our project.
You can either visit it at our huggingface space here, or run it locally yourself with:

```bash
python demo_gr.py
```
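For reference, gradio apps are served through `launch()`; below is a minimal, hypothetical sketch of how such a demo is typically wired up. This is not the actual contents of `demo_gr.py`, and the stub `generate_video` merely stands in for the real two-pass sampling pipeline.

```python
import gradio as gr

def generate_video(image):
    # Hypothetical stand-in for the real two-pass sampling pipeline in demo_gr.py.
    raise NotImplementedError

with gr.Blocks() as demo:
    image = gr.Image(label="Input image")
    video = gr.Video(label="Generated video")
    button = gr.Button("Generate")
    button.click(generate_video, inputs=image, outputs=video)

# launch() accepts standard options, e.g. server_name/server_port for LAN access.
demo.launch(server_name="0.0.0.0", server_port=7860)
```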
We provide two ways to use our demo:

- `Basic` mode, where the user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
- `Advanced` mode, where the user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport (powered by viser). This is suitable for power users and academic researchers.
### `Basic` mode

This is the default mode when entering our demo (given its simplicity). The user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
Here is a video walkthrough:
basic-walkthrough.mp4
You can choose from 13 preset trajectories that are common for NVS (`move-forward/backward` are omitted for visualization purposes):
basic-llff-room-collage.mp4
More formally:

- `orbit`/`spiral`/`lemniscate` are good for showing the "3D-ness" of the scene (see the sketch after this list).
- `zoom-in`/`zoom-out` keep the camera position the same while increasing/decreasing the focal length.
- `dolly zoom-in`/`dolly zoom-out` move the camera position backward/forward while increasing/decreasing the focal length.
- `move-forward`/`backward`/`up`/`down`/`left`/`right` move the camera position in different directions.
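To make the presets concrete, here is a rough numpy sketch of how an `orbit` pose sequence and a `dolly zoom` schedule can be parameterized. This is our illustration, not the demo's actual implementation; the helper names (`look_at`, `orbit_poses`, `dolly_zoom`) and the OpenGL-style camera convention are assumptions.

```python
import numpy as np

def look_at(position, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Camera-to-world matrix with the camera at `position` looking at `target`."""
    forward = target - position
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, true_up, -forward  # OpenGL: -z is forward
    c2w[:3, 3] = position
    return c2w

def orbit_poses(num_frames=80, radius=2.0, height=0.5):
    """`orbit`: circle the scene center at a fixed radius, always looking at it."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return [
        look_at(np.array([radius * np.cos(t), height, radius * np.sin(t)]))
        for t in thetas
    ]

def dolly_zoom(num_frames=80, f_start=500.0, f_end=1000.0, z_start=2.0):
    """`dolly zoom-in`: pull the camera back while increasing the focal length.

    Keeping the subject the same apparent size requires focal / distance = const,
    so the camera distance scales linearly with the focal length.
    """
    focals = np.linspace(f_start, f_end, num_frames)
    distances = z_start * focals / f_start
    return focals, distances
```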
Notes:

- For an 80-frame video at 768x576 resolution, it takes around 20 seconds for the first pass generation and around 2 minutes for the second pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
### `Advanced` mode

This is the power mode where you can have very fine-grained control over camera trajectories. The user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport. This is suitable for power users and academic researchers.

Here is a video walkthrough:
advanced-walkthru_small.mp4
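Under the hood, the 3D viewport is a viser server. The following minimal sketch shows the general idea of placing a camera frustum in such a viewport; it is not the demo's code, and viser's module layout has shifted between releases (e.g. `server.scene.add_camera_frustum` vs. the older `server.add_camera_frustum`), so treat the exact calls as assumptions for your installed version.

```python
import time
import numpy as np
import viser

server = viser.ViserServer()  # serves an interactive 3D viewport in the browser

# Add a camera frustum, roughly like the trajectory keyframes shown in the demo.
server.scene.add_camera_frustum(
    "/target_cam",
    fov=np.deg2rad(60.0),   # vertical field of view in radians
    aspect=576 / 576,       # width / height of the target render
    scale=0.2,
    position=(0.0, 0.0, 2.0),
)

while True:  # keep the server alive so the viewport stays reachable
    time.sleep(1.0)
```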
Notes:

- For a 134-frame video at 576x576 resolution, it takes around 16 seconds for the first pass generation and around 4 minutes for the second pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
- If the first pass sampling result is bad, click the "Abort rendering" button in the GUI to avoid getting stuck in the second pass sampling, so that you can try something else.
We have tested our gradio demo in both a local environment and the HF space environment, across different modes and compilation settings. Here are our results:
| Total time (s) | Basic first pass | Basic second pass | Advanced first pass | Advanced second pass |
| --- | --- | --- | --- | --- |
| HF (L40S, w/o comp.) | 68 | 484 | 48 | 780 |
| HF (L40S, w/ comp.) | 51 | 362 | 36 | 587 |
| Local (H100, w/o comp.) | 35 | 204 | 20 | 313 |
| Local (H100, w/ comp.) | 21 | 144 | 16 | 234 |
Notes:
- The HF space uses an L40S GPU, and our local environment uses an H100 GPU.
- We opt in to compilation via `torch.compile`.
- `Basic` mode is tested by generating 80 frames at 768x576 resolution; `Advanced` mode is tested by generating 134 frames at 576x576 resolution.
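For reference, opting in to compilation is the standard one-line `torch.compile` wrap; here is a minimal, self-contained sketch (where exactly the demo wraps its model is an assumption on our part, and the tiny `Sequential` below is just a stand-in):

```python
import torch

model = torch.nn.Sequential(  # stand-in for the actual diffusion model
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
)
compiled = torch.compile(model)  # first call pays a one-time compilation cost

x = torch.randn(1, 64)
y = compiled(x)  # later calls reuse the optimized graph, hence the speedups above
```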