External Links: Paper | Arxiv Paper | Paper Website
On This Page: Models | Dataset | Toolkits | SDG Pipeline
This is the official code repository of Cosmos-Drive-Dreams - a Synthetic Data Generation (SDG) pipeline built on Cosmos World Foundation Models for generating diverse and challenging scenarios for Autonomous Vehicle use-cases.
We open-source our model weights, pipeline toolkits, and a dataset (including cosmos-generated videos, paired HDMap and LiDAR), which consists of 81,802 clips.
(Demo video: demo_medium.webm)
- 2025-06-10: Models, toolkits, and the dataset (including Cosmos-generated videos, HDMap, and LiDAR) are released! Stay tuned for the paired ground-truth RGB videos.
Name | Type | Link |
---|---|---|
Cosmos-7B-AV-Sample (Paper Sec. [2.1]) | model | base_model.pt |
Cosmos-7B-Multiview-AV-Sample (Paper Sec. [2.1]) | model | Huggingface Link |
Cosmos-Transfer1-7B-Sample-AV (Paper Sec. [2.2]) | model | Huggingface Link |
Cosmos-7B-Single2Multiview-Sample-AV (Paper Sec. [2.3]) | model | Huggingface Link |
Cosmos-7B-Annotate-Sample-AV (Paper Sec. [2.4]) | model | To be released soon |
Cosmos-7B-LiDAR-GEN-Sample-AV (Paper Sec. [3]) | model | To be released soon |
The Cosmos-Drive-Dreams Dataset contains labels (HDMap, bounding boxes, and LiDAR) for 5,843 10-second clips collected by NVIDIA, along with 81,802 synthetic video samples generated by Cosmos-Drive-Dreams from these labels. Each synthetic video is 121 frames long and captures a wide variety of challenging scenarios, such as rain, snow, and fog, that are not as easily available in real-world driving datasets. This dataset is ready for commercial and non-commercial use.
Detailed information can be found on the Huggingface page.
usage: scripts/download.py [-h] --odir ODIR
[--file_types {hdmap,lidar,synthetic}[,…]]
[--workers N] [--clean_cache]
required arguments:
--odir ODIR Output directory where files are stored.
optional arguments:
-h, --help Show this help message and exit.
--file_types {hdmap,lidar,synthetic}[,…]
Comma-separated list of data groups to fetch.
• hdmap → common folders + 3d_* HD-map layers
• lidar → common folders + lidar_raw
• synthetic → common folders + cosmos_synthetic
Default: hdmap,lidar,synthetic (all groups).
--workers N Parallel download threads (default: 1).
Increase on fast networks; reduce if you hit
rate limits or disk bottlenecks.
--clean_cache Delete the temporary HuggingFace cache after
each run to reclaim disk space.
common folders (always downloaded, regardless of --file_types):
all_object_info, captions, car_mask_coarse, ftheta_intrinsic,
pinhole_intrinsic, pose, vehicle_pose
Here are some examples:
# download all (about 3TB)
python scripts/download.py --odir YOUR_DATASET_PATH --workers YOUR_WORKER_NUMBER
# download hdmap only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types hdmap --workers YOUR_WORKER_NUMBER
# download lidar only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types lidar --workers YOUR_WORKER_NUMBER
# download synthetic video only (about 700GB)
python scripts/download.py --odir YOUR_DATASET_PATH --file_types synthetic --workers YOUR_WORKER_NUMBER
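After a download completes, you can sanity-check the output directory before running the toolkits. The sketch below simply verifies that the common folders listed above exist and counts the files in each; `YOUR_DATASET_PATH` is the same placeholder used in the commands above.

```python
# sanity_check_download.py -- minimal sketch; assumes the layout described in the help text above.
from pathlib import Path

DATASET_ROOT = Path("YOUR_DATASET_PATH")  # placeholder, same as in the download commands above

# Folders that scripts/download.py always fetches, per the help text above.
COMMON_FOLDERS = [
    "all_object_info", "captions", "car_mask_coarse",
    "ftheta_intrinsic", "pinhole_intrinsic", "pose", "vehicle_pose",
]

for name in COMMON_FOLDERS:
    folder = DATASET_ROOT / name
    n_files = sum(1 for _ in folder.rglob("*")) if folder.exists() else 0
    status = "ok" if folder.exists() else "MISSING"
    print(f"{name:20s} {status:8s} ({n_files} files)")
```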
- Visualizing the structured labels here
- Editing ego trajectory interactively to produce novel scenarios here
- Converting Waymo Open Dataset to our format here
- Rectifying f-theta camera images to more common pinhole camera images here (a simplified sketch of the idea follows this list)
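The rectification toolkit linked above is the supported way to convert f-theta images. Purely as an illustration of the idea, here is a minimal inverse-mapping sketch that assumes a simple equidistant model r = f·θ; the actual f-theta intrinsics shipped with the dataset are more general, so treat the function and its parameters below as hypothetical.

```python
# ftheta_to_pinhole_sketch.py -- illustrative only; use the repo toolkit for real data.
# Assumes a simple equidistant projection r = f * theta (an assumption, not the repo's model).
import cv2
import numpy as np

def rectify_equidistant(img, f_theta, cx, cy, out_size=(960, 540), out_fov_deg=90.0):
    W, H = out_size
    # Pinhole intrinsics for the rectified image, chosen from the desired FOV.
    fx = W / (2.0 * np.tan(np.radians(out_fov_deg) / 2.0))
    fy = fx
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Ray direction for every pinhole output pixel (camera looks along +z).
    x = (u - W / 2.0) / fx
    y = (v - H / 2.0) / fy
    theta = np.arctan2(np.sqrt(x**2 + y**2), np.ones_like(x))  # angle from optical axis
    phi = np.arctan2(y, x)                                      # azimuth in the image plane
    r = f_theta * theta                                         # equidistant model (assumed)
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    # Sample the f-theta image at the computed locations.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```

For real data, prefer the toolkit above together with the ftheta_intrinsic and pinhole_intrinsic folders shipped with the dataset.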
(Toolkit demo video: toolkit_demo_small.webm)
We provide a simple walkthrough of all stages of our SDG pipeline using the example data available in the assets folder; no additional data download is necessary. For large-scale sampling, please download the Cosmos-Drive-Dreams Dataset above.
We recommend using conda for managing your environment. Detailed instructions for setting up Cosmos-Drive-Dreams can be found in INSTALL.md.
cosmos-drive-dreams-toolkits/render_from_rds_hq.py is used to render the HD map + bounding box and LiDAR condition videos from the RDS-HQ dataset.
In this example, we will only render the HD map + bounding box condition videos.
Note that a GPU is required for rendering LiDAR.
cd cosmos-drive-dreams-toolkits
# generate multi-view condition videos.
# If you just want to generate front-view videos, replace `-d rds_hq_mv` with `-d rds_hq`
python render_from_rds_hq.py -i ../assets/example -o ../outputs -d rds_hq_mv --skip lidar
cd ..
This will automatically launch multiple jobs via Ray for data parallelism, but since we are only processing one clip here, it will use a single worker. The script should finish in under a minute and produce a new directory at outputs/hdmap:
outputs/
└── hdmap/
    ├── ftheta_camera_cross_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_cross_right_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_front_wide_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_rear_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    └── ...
The suffix _0 means it is the first chunk of the video; each chunk is 121 frames long.
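To confirm that a chunk rendered correctly, you can check its frame count and resolution. A minimal sketch using OpenCV, pointed at the front-wide condition video from the tree above:

```python
# inspect_condition_video.py -- minimal sketch for checking a rendered chunk.
import cv2

path = ("outputs/hdmap/ftheta_camera_front_wide_120fov/"
        "2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4")

cap = cv2.VideoCapture(path)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"{n_frames} frames at {width}x{height}")  # expect 121 frames per chunk
```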
A prompt describing a possible manifestation of the example can be found in assets/example/captions/2d23*.txt. We can use a VLM (Qwen3, to be exact) to augment this single prompt into many variations as follows:
python scripts/rewrite_caption.py -i assets/example/captions -o outputs/captions
The output will be saved to outputs/captions/2d23*.json.
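scripts/rewrite_caption.py is the supported way to do this augmentation. Purely for illustration, the sketch below shows the kind of rewrite prompt involved, assuming a chat-style Qwen3 checkpoint from Hugging Face; the exact checkpoint and prompt used by the script may differ.

```python
# rewrite_caption_sketch.py -- illustrative only; use scripts/rewrite_caption.py for real runs.
from glob import glob
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption; not necessarily the checkpoint the script uses
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Load the example prompt shipped with the repo (filename matched by the glob above).
caption = open(sorted(glob("assets/example/captions/2d23*.txt"))[0]).read()

messages = [
    {"role": "user",
     "content": "Rewrite the following driving-scene description so that it takes place "
                "in heavy rain at night. Keep the scene layout unchanged.\n\n" + caption},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```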
Next, we use Cosmos-Transfer1-7b-Sample-AV to generate a 121-frame RGB video from the HD Map condition video and text prompt.
PYTHONPATH="cosmos-transfer1" python scripts/generate_video_single_view.py --caption_path outputs/captions --input_path outputs --video_save_folder outputs/single_view --checkpoint_dir checkpoints/ --is_av_sample --controlnet_specs assets/sample_av_hdmap_spec.json
For a detailed description of how to run this model and how to adjust its inference parameters, see this readme.
After the single-view videos have been generated, we use Cosmos-Transfer1-7b-Sample-AV-Single2MultiView to extend them into multi-view videos.
CUDA_HOME=$CONDA_PREFIX PYTHONPATH="cosmos-transfer1" python scripts/generate_video_multi_view.py --caption_path outputs/captions --input_path outputs --input_view_path outputs/single_view --video_save_folder outputs/multi_view --checkpoint_dir checkpoints --is_av_sample --controlnet_specs assets/sample_av_hdmap_multiview_spec.json
For a detailed description of how to run this model and how to adjust its inference parameters, see this readme.
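Once the multi-view videos are written, a quick way to eyeball them is to stitch the first frame of each view into a single contact sheet. A minimal sketch; it assumes outputs/multi_view contains one mp4 per camera view, which may not match the exact layout the script writes.

```python
# preview_multi_view.py -- minimal sketch; assumes one mp4 per camera view under outputs/multi_view.
from glob import glob
import cv2
import numpy as np

views = sorted(glob("outputs/multi_view/**/*.mp4", recursive=True))
thumbs = []
for path in views:
    cap = cv2.VideoCapture(path)
    ok, frame = cap.read()                    # grab the first frame of each view
    cap.release()
    if ok:
        thumbs.append(cv2.resize(frame, (480, 270)))

if thumbs:
    cv2.imwrite("outputs/multi_view_preview.png", np.hstack(thumbs))
    print(f"Wrote preview of {len(thumbs)} views")
```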
Coming soon
@misc{nvidia2025cosmosdrivedreams,
title = {Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models},
author = {Ren, Xuanchi and Lu, Yifan and Cao, Tianshi and Gao, Ruiyuan and
Huang, Shengyu and Sabour, Amirmojtaba and Shen, Tianchang and
Pfaff, Tobias and Wu, Jay Zhangjie and Chen, Runjian and
Kim, Seung Wook and Gao, Jun and Leal-Taixe, Laura and
Chen, Mike and Fidler, Sanja and Ling, Huan},
year = {2025},
url = {https://arxiv.org/abs/2506.09042}
}
@misc{nvidia2025cosmostransfer1,
title = {Cosmos Transfer1: World Generation with Adaptive Multimodal Control},
author = {NVIDIA},
year = {2025},
url = {https://arxiv.org/abs/2503.14492}
}