nv-tlabs/Cosmos-Drive-Dreams

Cosmos-Drive-Dreams

External Links: Paper | Arxiv Paper | Paper Website

On This Page: Models | Dataset | Toolkits | SDG Pipeline

This is the official code repository of Cosmos-Drive-Dreams, a Synthetic Data Generation (SDG) pipeline built on Cosmos World Foundation Models for generating diverse and challenging scenarios for Autonomous Vehicle (AV) use cases.

We open-source our model weights, pipeline toolkits, and a dataset of 81,802 clips, including Cosmos-generated videos with paired HDMap and LiDAR.

(Teaser video: demo_medium.webm)

News

  • 2025-06-10: Model, toolkits, and dataset (including Cosmos-generated videos, HDMap, and LiDAR) are released! Stay tuned for the paired ground-truth RGB videos.

Cosmos-Drive Open-source Summary

| Name | Type | Link |
|---|---|---|
| Cosmos-7B-AV-Sample (Paper Sec. 2.1) | model | base_model.pt |
| Cosmos-7B-Multiview-AV-Sample (Paper Sec. 2.1) | model | Huggingface Link |
| Cosmos-Transfer1-7B-Sample-AV (Paper Sec. 2.2) | model | Huggingface Link |
| Cosmos-7B-Single2Multiview-Sample-AV (Paper Sec. 2.3) | model | Huggingface Link |
| Cosmos-7B-Annotate-Sample-AV (Paper Sec. 2.4) | model | To be released soon |
| Cosmos-7B-LiDAR-GEN-Sample-AV (Paper Sec. 3) | model | To be released soon |

Cosmos-Drive-Dreams Dataset

The Cosmos-Drive-Dreams Dataset contains labels (HDMap, BBox, and LiDAR) for 5,843 10-second clips collected by NVIDIA, along with 81,802 synthetic video samples generated by Cosmos-Drive-Dreams from these labels. Each synthetically generated video is 121 frames long and captures a wide variety of challenging conditions, such as rain, snow, and fog, that may not be as easily available in real-world driving datasets. The dataset is ready for both commercial and non-commercial use.

Detailed information can be found on the Huggingface page.
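As a back-of-envelope check on the dataset's scale (assuming, per the description above, that every one of the 81,802 synthetic clips contains the full 121 frames):

```python
# Back-of-envelope dataset scale, assuming every synthetic clip
# contains the full 121 frames.
clips = 81_802
frames_per_clip = 121
total_frames = clips * frames_per_clip
print(f"{total_frames:,} synthetic frames")  # 9,898,042 synthetic frames
```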

Download

usage: scripts/download.py [-h] --odir ODIR
                           [--file_types {hdmap,lidar,synthetic}[,…]]
                           [--workers N] [--clean_cache]

required arguments:
  --odir ODIR            Output directory where files are stored.

optional arguments:
  -h, --help             Show this help message and exit.
  --file_types {hdmap,lidar,synthetic}[,…]
                  Comma-separated list of data groups to fetch.
                  • hdmap     → common folders + 3d_* HD-map layers  
                  • lidar     → common folders + lidar_raw  
                  • synthetic → common folders + cosmos_synthetic  
                  Default: hdmap,lidar,synthetic (all groups).
  --workers N            Parallel download threads (default: 1).
                         Increase on fast networks; reduce if you hit
                         rate limits or disk bottlenecks.
  --clean_cache          Delete the temporary HuggingFace cache after
                         each run to reclaim disk space.

common folders (always downloaded, regardless of --file_types):
  all_object_info, captions, car_mask_coarse, ftheta_intrinsic,
  pinhole_intrinsic, pose, vehicle_pose

Here are some examples:

# download all (about 3TB)
python scripts/download.py --odir YOUR_DATASET_PATH --workers YOUR_WORKER_NUMBER

# download hdmap only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types hdmap --workers YOUR_WORKER_NUMBER

# download lidar only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types lidar --workers YOUR_WORKER_NUMBER

# download synthetic video only (about 700GB)
python scripts/download.py --odir YOUR_DATASET_PATH --file_types synthetic --workers YOUR_WORKER_NUMBER
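The `--file_types` grouping described in the help text can be sketched as follows. This is a hypothetical reimplementation of the grouping logic for illustration, not the actual `scripts/download.py`; folder names are taken from the help text above, and `3d_*` stands for a glob over the HD-map layer folders.

```python
# Hypothetical sketch of how --file_types expands into folder groups.
COMMON_FOLDERS = [
    "all_object_info", "captions", "car_mask_coarse",
    "ftheta_intrinsic", "pinhole_intrinsic", "pose", "vehicle_pose",
]
GROUP_FOLDERS = {
    "hdmap": ["3d_*"],            # HD-map layers
    "lidar": ["lidar_raw"],
    "synthetic": ["cosmos_synthetic"],
}

def folders_to_fetch(file_types: str = "hdmap,lidar,synthetic") -> list[str]:
    """Common folders are always included; each group adds its extras."""
    extras = []
    for group in file_types.split(","):
        extras.extend(GROUP_FOLDERS[group.strip()])
    return COMMON_FOLDERS + extras
```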

Tutorial

  • Visualizing the structured labels here

Cosmos-Drive-Dreams Toolkits

  • Visualizing the structured labels here

  • Editing ego trajectory interactively to produce novel scenarios here

  • Converting Waymo Open Dataset to our format here

  • Rectifying f-theta camera images to more common pinhole camera images here
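For intuition on the rectification step: an f-theta (equidistant fisheye) camera maps a ray at angle θ from the optical axis to image radius r = f·θ, whereas a pinhole camera maps it to r = f·tan(θ). A minimal per-radius sketch of that relationship follows; the actual toolkit resamples full images and accounts for the camera's distortion coefficients.

```python
import math

def ftheta_to_pinhole_radius(r_ftheta: float, f: float) -> float:
    """Map a radial pixel distance under the f-theta model (r = f * theta)
    to the radius of the same ray under a pinhole model (r = f * tan(theta)).
    Only valid for theta below 90 degrees; an f-theta lens can see beyond
    that, which is one reason rectification crops the field of view."""
    theta = r_ftheta / f
    return f * math.tan(theta)
```

Near the image center the two models agree closely; toward the edges the pinhole radius grows much faster, which is why rectified images trade field of view for straight lines.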

(Toolkit demo video: toolkit_demo_small.webm)

Cosmos-Drive-Dreams SDG Pipeline

We provide a simple walkthrough including all stages of our SDG pipeline through example data available in the assets folder; no additional data download is necessary. For large-scale sampling, please download the above Cosmos-Drive-Dreams Dataset.

0. Installation and Model Downloading

We recommend using conda for managing your environment. Detailed instructions for setting up Cosmos-Drive-Dreams can be found in INSTALL.md.

1. Preprocessing Condition Videos

cosmos-drive-dreams-toolkits/render_from_rds_hq.py renders the HD map + bounding box / LiDAR condition videos from the RDS-HQ dataset. In this example, we will only render the HD map + bounding box condition videos. Note that a GPU is required for rendering LiDAR.

cd cosmos-drive-dreams-toolkits

# generate multi-view condition videos.
# If you just want to generate front-view videos, replace `-d rds_hq_mv` with `-d rds_hq`
python render_from_rds_hq.py -i ../assets/example -o ../outputs -d rds_hq_mv --skip lidar
cd ..

This will automatically launch multiple jobs via Ray for data parallelism; since we are only processing one clip here, it will use a single worker. The script should finish in under a minute and produce a new directory at outputs/hdmap:

outputs/
└── hdmap/
    ├── ftheta_camera_cross_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_cross_right_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_front_wide_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_rear_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ...

The suffix _0 means it is the first chunk of the video; each chunk is 121 frames long.
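Assuming consecutive, non-overlapping 121-frame chunks (the exact chunking scheme is defined by the rendering toolkit), the chunk suffix for a global frame index can be sketched as:

```python
def chunk_suffix(frame_idx: int, chunk_len: int = 121) -> int:
    """Chunk index (_0, _1, ...) for a global frame index, assuming
    consecutive, non-overlapping chunks of chunk_len frames."""
    return frame_idx // chunk_len
```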

2. Prompt Rewriting

A prompt describing a possible manifestation of the example scene can be found in assets/example/captions/2d23*.txt. We can use a VLM (specifically, Qwen3) to augment this single prompt into many variations as follows:

python scripts/rewrite_caption.py -i assets/example/captions -o outputs/captions

The output will be saved to outputs/captions/2d23*.json.
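Conceptually, the rewriting step fans one caption out into many variants. Here is a toy rule-based stand-in that illustrates the idea; the real script prompts Qwen3, and the function name and condition list below are hypothetical.

```python
def augment_caption(base: str,
                    conditions=("rainy", "snowy", "foggy", "at night")) -> list[str]:
    """Toy stand-in for VLM prompt rewriting: append a weather /
    time-of-day clause to a base caption."""
    base = base.rstrip(". ")
    return [f"{base}. The scene is {cond}." for cond in conditions]
```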

3. Front-view Video Generation

Next, we use Cosmos-Transfer1-7B-Sample-AV to generate a 121-frame RGB video from the HD map condition video and the text prompt.

PYTHONPATH="cosmos-transfer1" python scripts/generate_video_single_view.py --caption_path outputs/captions --input_path outputs --video_save_folder outputs/single_view --checkpoint_dir checkpoints/ --is_av_sample --controlnet_specs assets/sample_av_hdmap_spec.json

For a detailed description of how to run this model and how to adjust inference parameters, see this readme.

4. Multiview Video Generation

After the single-view videos have been generated, we use Cosmos-Transfer1-7B-Sample-AV-Single2MultiView to extend them into multi-view videos.

CUDA_HOME=$CONDA_PREFIX PYTHONPATH="cosmos-transfer1" python scripts/generate_video_multi_view.py --caption_path outputs/captions --input_path outputs --input_view_path outputs/single_view --video_save_folder outputs/multi_view --checkpoint_dir checkpoints --is_av_sample --controlnet_specs assets/sample_av_hdmap_multiview_spec.json

For a detailed description of how to run this model and how to adjust inference parameters, see this readme.

5. Filtering via VLM

Coming soon

Citation

@misc{nvidia2025cosmosdrivedreams,
  title  = {Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models},
  author = {Ren, Xuanchi and Lu, Yifan and Cao, Tianshi and Gao, Ruiyuan and
            Huang, Shengyu and Sabour, Amirmojtaba and Shen, Tianchang and
            Pfaff, Tobias and Wu, Jay Zhangjie and Chen, Runjian and
            Kim, Seung Wook and Gao, Jun and Leal-Taixe, Laura and
            Chen, Mike and Fidler, Sanja and Ling, Huan},
  year   = {2025},
  url    = {https://arxiv.org/abs/2506.09042}
}
@misc{nvidia2025cosmostransfer1,
  title     = {Cosmos Transfer1: World Generation with Adaptive Multimodal Control},
  author    = {NVIDIA}, 
  year      = {2025},
  url       = {https://arxiv.org/abs/2503.14492}
}
