🚀 Gradio Demo

This Gradio demo is the simplest starting point for playing with our project.

You can either visit it at our Hugging Face space here or run it locally yourself with:

python demo_gr.py

We provide two ways to use our demo:

  1. Basic mode, where the user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model and is suitable for most users.
  2. Advanced mode, where the user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport (powered by viser). This is suitable for power users and academic researchers.

Basic

This is the default mode when entering our demo (given its simplicity).

The user can upload a single image and set a target camera trajectory from our preset options. This is the most straightforward way to use our model and is suitable for most users.

Here is a video walkthrough:

[Video: basic-walkthrough.mp4]

You can choose from 13 preset trajectories that are common for novel view synthesis (NVS); move-forward/backward are omitted for visualization purposes:

[Video: basic-llff-room-collage.mp4]

More formally:

  • orbit/spiral/lemniscate are good for showing the "3D-ness" of the scene.
  • zoom-in/out keeps the camera position fixed while increasing/decreasing the focal length.
  • dolly zoom-in/out moves the camera backward/forward while increasing/decreasing the focal length (see the sketch after this list).
  • move-forward/backward/up/down/left/right moves the camera in the corresponding direction.
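
To make the focal-length presets concrete, here is a minimal sketch of the dolly-zoom math. This is our own illustration under simple pinhole-camera assumptions, not the demo's actual code, and `dolly_zoom_focals` is a hypothetical helper name:

```python
# Sketch of the dolly-zoom relationship (illustrative, not the demo's code):
# image size ~ focal_length / distance, so keeping their ratio constant
# while the camera moves gives f = f0 * d / d0.
import numpy as np

def dolly_zoom_focals(f0: float, d0: float, distances: np.ndarray) -> np.ndarray:
    """Per-frame focal lengths that keep a subject at initial distance
    `d0` (seen with focal length `f0`) at a constant image size."""
    return f0 * distances / d0

# Dolly zoom-in over 80 frames: the camera moves backward (2.0 -> 4.0
# scene units) while the focal length doubles to compensate.
distances = np.linspace(2.0, 4.0, 80)
focals = dolly_zoom_focals(f0=50.0, d0=2.0, distances=distances)
print(focals[0], focals[-1])  # 50.0 100.0
```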

Notes:

  • For an 80-frame video at 768x576 resolution, it takes around 20 seconds for the first-pass generation and around 2 minutes for the second-pass generation, tested on a single H100 GPU.
  • Expect around 2-3x longer on the HF space.

Advanced

This is the power mode where you can have very fine-grained control over camera trajectories.

The user can upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport. This is suitable for power users and academic researchers.

Here is a video walkthrough:

[Video: advanced-walkthru_small.mp4]
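
For intuition, here is a minimal sketch of the kind of viser scene the 3D viewport is built on. This is an assumed setup for a recent viser release, not the demo's actual code, and the "/keyframes/000" naming is hypothetical:

```python
# Minimal viser viewport sketch (assumed setup, not the demo's actual code).
import time

import numpy as np
import viser

server = viser.ViserServer()  # serves the 3D viewport in the browser

# Visualize one target-camera keyframe as a frustum in the scene.
server.scene.add_camera_frustum(
    "/keyframes/000",           # hypothetical scene-node name
    fov=np.deg2rad(60.0),       # vertical field of view, in radians
    aspect=576 / 576,           # width / height of the target video
    scale=0.1,                  # display size of the frustum
    wxyz=(1.0, 0.0, 0.0, 0.0),  # orientation as a wxyz quaternion (identity)
    position=(0.0, 0.0, 1.0),   # camera position in world coordinates
)

while True:  # keep the server (and viewport) alive
    time.sleep(10.0)
```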

Notes:

  • For a 134-frame video at 576x576 resolution, it takes around 16 seconds for the first-pass generation and around 4 minutes for the second-pass generation, tested on a single H100 GPU.
  • Expect around 2-3x longer on the HF space.

Pro tips

  • If the first-pass sampling result is bad, click the "Abort rendering" button in the GUI to avoid getting stuck in the second-pass sampling, so that you can try something else.

Performance benchmark

We have tested our Gradio demo in both a local environment and the HF space environment, across different modes and compilation settings. Here are our results:

| Total time (s) | Basic first pass | Basic second pass | Advanced first pass | Advanced second pass |
| --- | --- | --- | --- | --- |
| HF (L40S, w/o comp.) | 68 | 484 | 48 | 780 |
| HF (L40S, w/ comp.) | 51 | 362 | 36 | 587 |
| Local (H100, w/o comp.) | 35 | 204 | 20 | 313 |
| Local (H100, w/ comp.) | 21 | 144 | 16 | 234 |

Notes:

  • HF space uses an L40S GPU, and our local environment uses an H100 GPU.
  • We opt in to compilation via torch.compile (see the sketch below).
  • Basic mode is tested by generating 80 frames at 768x576 resolution.
  • Advanced mode is tested by generating 134 frames at 576x576 resolution.
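
For reference, "w/ comp." means wrapping the model with torch.compile. Here is a minimal sketch of the pattern, using a stand-in module rather than our actual model:

```python
# torch.compile pattern behind the "w/ comp." rows (stand-in model, not ours).
import torch

model = torch.nn.Sequential(  # stand-in for the actual diffusion model
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
    torch.nn.Linear(64, 64),
)

compiled = torch.compile(model)  # first call compiles (slow warm-up);
                                 # subsequent calls reuse compiled kernels
x = torch.randn(8, 64)
y = compiled(x)
```

The warm-up cost is why compilation pays off only across repeated sampling calls, as reflected in the table above.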