This repository collects research, implementations, and resources related to Foundation Models for Scenario Generation and Analysis in autonomous driving. It is maintained by TUM-AVS (Professorship of Autonomous Vehicle Systems at the Technical University of Munich) and is continuously updated to track the latest work in the community.
- Nov 2025 – Added 2 new papers on scenario analysis and a new section: Useful Resources and Links. Uploaded a new version to arXiv. The repository now categorizes 348 papers:
  - 93 on scenario generation
  - 56 on scenario analysis
  - 58 on datasets
  - 21 on simulators
  - 25 on benchmark challenges
  - 95 on other related topics (e.g., implementations of foundation models)
- Oct 2025 – Added 17 new papers on scenario generation and 2 on scenario analysis.
- Sep 2025 – Added 3 new papers on scenario generation and 14 on scenario analysis.
- Aug 2025 – Added 4 new papers on scenario generation and 4 on scenario analysis.
- Jul 2025 – Added 9 new papers on scenario generation and 8 on scenario analysis.
- Jun 2025 – Released our paper on arXiv. The repository now categorizes 342 papers:
  - 93 on scenario generation
  - 54 on scenario analysis
  - 55 on datasets
  - 21 on simulators
  - 25 on benchmark challenges
  - 94 on other related topics
- May 2025 – Repository initialized.
Please see our paper, Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis, for comprehensive details. If you find our paper and repository helpful, please consider citing it as follows:
```bibtex
@misc{gao2025foundation,
  title={Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis},
  author={Yuan Gao and Mattia Piccinini and Yuchen Zhang and Dingrui Wang and Korbinian Moller and Roberto Brusnicki and Baha Zarrouki and Alessio Gambi and Jan Frederik Totz and Kai Storms and Steven Peters and Andrea Stocco and Bassam Alrifaee and Marco Pavone and Johannes Betz},
  year={2025},
  eprint={2506.11526},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.11526},
}
```

Foundation models are large-scale, pre-trained models that can be adapted to a wide range of downstream tasks. In autonomous driving, they offer a powerful approach to scenario generation and analysis, enabling more comprehensive and realistic testing, validation, and verification of autonomous driving systems. This repository collects and organizes research, tools, and resources in this field.
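As a minimal illustration of this idea, the sketch below prompts an instruction-tuned LLM to draft a structured scenario description. The model name, prompt wording, and JSON schema are illustrative assumptions, not the method of any specific paper in this repository:

```python
# Hypothetical sketch: prompting an LLM to draft a structured driving scenario.
from transformers import pipeline

# Any instruction-tuned chat model works here; this small model is just an example.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

prompt = (
    "Describe a safety-critical driving scenario as JSON with keys "
    "'road_type', 'actors', 'trigger_event', and 'expected_ego_behavior'. "
    "Scenario: a pedestrian suddenly crosses at an unsignalized intersection."
)

# Greedy decoding keeps the output deterministic for this sketch.
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```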
The following figure shows the evolution of foundation model research in autonomous driving scenario generation and analysis over time:
The following keywords were used to search for this survey's papers in the Google Scholar database, entered either individually or in combination with other keywords in the list. The search covers publications up to May 2025. A sketch of how such keyword combinations can be enumerated follows the list below.
Keywords:
- Foundation Model Types: Foundation Models, Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), Diffusion Models (DMs), World Models (WMs), Generative Models (GMs)
- Scenario Generation & Analysis: Scenario Generation, Scenario Simulation, Traffic Simulation, Scenario Testing, Scenario Understanding, Driving Scene Generation, Scene Reasoning, Risk Assessment, Safety-Critical Scenarios, Accident Prediction
- Application Context: Autonomous Driving, Self-Driving Vehicles, AV Simulation, Driving Video Generation, Traffic Datasets, Closed-Loop Simulation, Safety Assurance
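For illustration, here is a hedged sketch of how candidate search queries could be enumerated from the keyword groups above; the exact pairing strategy used for the survey is an assumption:

```python
# Sketch: enumerate candidate Google Scholar queries from the keyword groups above.
from itertools import product

fm_types = ["Foundation Models", "Large Language Models", "Vision-Language Models",
            "Multimodal Large Language Models", "Diffusion Models", "World Models"]
tasks = ["Scenario Generation", "Scenario Testing", "Scenario Understanding",
         "Risk Assessment", "Safety-Critical Scenarios"]
contexts = ["Autonomous Driving", "Self-Driving Vehicles", "AV Simulation"]

# One query per (model type, task, application context) triple.
queries = [" ".join(combo) for combo in product(fm_types, tasks, contexts)]
print(f"{len(queries)} candidate queries, e.g.: {queries[0]!r}")
```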
Scenario Generation (LLM)
Scenario Analysis (LLM)
Scenario Generation (VLM)
Scenario Analysis (VLM)
Scenario Generation (MLLM)
| Paper | Date | Venue | Code | Citation |
|---|---|---|---|---|
| Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model | 2024-11 | arXiv | - | 8 |
| LMM-enhanced Safety-Critical Scenario Generation for Autonomous Driving System Testing From Non-Accident Traffic Videos | 2025-01 | arXiv | GitHub | 6 |
| Multi-modal Traffic Scenario Generation for Autonomous Driving System Testing | 2025-06 | FSE 2025 | - | 0 |
| Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model | 2025-06 | CVPR 2025 WDFM-AD | GitHub | 3 |
Scenario Analysis (MLLM)
Scenario Generation (Diffusion Models)
Scenario Analysis (Diffusion Models)
| Paper | Date | Venue | Code | Citation |
|---|---|---|---|---|
| AVD2: Accident Video Diffusion for Accident Video Description | 2025-03 | ICRA 2025 | GitHub | 16 |
World Models for Autonomous Driving
The following figure shows the usage distribution of different foundation model types across autonomous driving datasets:
Dataset Comparison
| Dataset | Year | Img | View | Real | Lidar | Radar | Traj | 3D | 2D | Lane | Weather | Time | Region | Company |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CamVid | 2009 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D | U | - |
| KITTI | 2013 | RGB/S | FPV | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✔ | ✔ | D | U/R/H | - |
| Cyclists | 2016 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | U | - |
| Cityscapes | 2016 | RGB/S | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✖️ | D | U | - |
| SYNTHIA | 2016 | RGB | FPV | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| Campus | 2016 | RGB | BEV | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | C | - |
| RobotCar | 2016 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D/N | U | - |
| Mapillary | 2017 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D/N | U | - |
| P.F.B. | 2017 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D/N | U | - |
| BDD100K | 2018 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✔ | D | U/H | - |
| HighD | 2018 | RGB | BEV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✖️ | D | H | - |
| Udacity | 2018 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | U | - |
| KAIST | 2018 | RGB/S | FPV | ✔ | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D/N | U | - |
| Argoverse | 2019 | RGB/S | FPV | ✔ | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D/N | U | - |
| TRAF | 2019 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D | U | - |
| ApolloScape | 2019 | RGB/S | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✔ | D | U | - |
| ACFR | 2019 | RGB | BEV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | RA | - |
| H3D | 2019 | RGB | FPV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✔ | D | U | - |
| INTERACTION | 2019 | RGB | BEV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | I/RA | - |
| Comma2k19 | 2019 | RGB | FPV | ✔ | ✖️ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | ✖️ | D/N | U/S/R/H | - |
| InD | 2020 | RGB | BEV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | I | - |
| RounD | 2020 | RGB | BEV | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | D | RA | - |
| nuScenes | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| Lyft Level 5 | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✔ | D/N | U/S | - |
| Waymo Open | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| A*3D | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| RobotCar Radar | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| Toronto3D | 2020 | RGB | BEV | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✔ | ✖️ | D/N | U | University of Waterloo |
| A2D2 | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | D | U/H/S/R | Audi |
| WADS | 2020 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | ✔ | D/N | U/S/R | Michigan Technological University |
| Argoverse 2 | 2021 | RGB/S | FPV | ✔ | ✔ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| PandaSet | 2021 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| ONCE | 2021 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U | - |
| Leddar PixSet | 2021 | RGB | FPV | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✖️ | ✔ | D/N | U/S/R | Leddar |
| ZOD | 2022 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U/R/S/H | Zenseact |
| IDD-3D | 2022 | RGB | FPV | ✔ | ✔ | ✖️ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | - | R | INAI |
| CODA | 2022 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U/S/R | Huawei |
| SHIFT | 2022 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U/S/R/H | ETH Zürich |
| DeepAccident | 2023 | RGB/S | FPV/BEV | ✖️ | ✔ | ✖️ | ✖️ | ✔ | ✔ | ✔ | ✔ | D/N | U/S/R/H | HKU, Huawei, CARLA |
| Dual_Radar | 2023 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | D/N | U | Tsinghua University |
| V2V4Real | 2023 | RGB | FPV | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✔ | ✖️ | - | U/H/S | UCLA Mobility Lab |
| SCaRL | 2024 | RGB/S | FPV/BEV | ✖️ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U/S/R/H | Fraunhofer, CARLA |
| MARS | 2024 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | D/N | U/S/H | NYU, MAY Mobility |
| Scenes101 | 2024 | RGB | FPV | ✔ | ✖️ | ✖️ | ✔ | ✖️ | ✖️ | ✔ | ✔ | D/N | U/S/R/H | Wayve |
| TruckScenes | 2025 | RGB | FPV | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | D/N | H/U | MAN |
Notes: View: FPV = first-person view, BEV = bird's-eye view; Time: D = day, N = night; Region: U = urban, R = rural, H = highway, S = suburban, C = campus, I = intersection, RA = roundabout; Img: RGB/S = RGB + stereo.
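Since the comparison table is dense, a small helper like the one below can make it queryable. Only a few rows are transcribed as examples, and the boolean encoding is an assumption for illustration:

```python
# Sketch: query the dataset comparison table above (only a few example rows shown).
datasets = [
    {"name": "nuScenes",   "year": 2020, "lidar": True,  "radar": True,  "night": True},
    {"name": "Waymo Open", "year": 2020, "lidar": True,  "radar": True,  "night": True},
    {"name": "HighD",      "year": 2018, "lidar": False, "radar": False, "night": False},
]

# Example query: multimodal datasets (lidar + radar) that include night-time data.
multimodal_night = [d["name"] for d in datasets
                    if d["lidar"] and d["radar"] and d["night"]]
print(multimodal_night)  # ['nuScenes', 'Waymo Open']
```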
The following figure shows the usage distribution of different foundation model types across autonomous driving simulators:
Simulators
| Simulator | Year | Back-end | Open Source | Realistic Perception | Custom Scenario | Real World Map | Human Design Map | Python API | C++ API | ROS API | Company |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TORCS | 2000 | None | ✔ | ✔ | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | - |
| Webots | 2004 | ODE | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | - |
| CarRacing | 2017 | None | ✔ | ✖️ | ✖️ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | - |
| CARLA | 2017 | UE4 | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✔ | - |
| SimMobilityST | 2017 | None | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | - |
| GTA-V | 2017 | RAGE | ✖️ | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | - |
| highway-env | 2018 | None | ✔ | ✖️ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | - |
| Deepdrive | 2018 | UE4 | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✖️ | - |
| esmini | 2018 | Unity | ✔ | ✖️ | ✖️ | ✖️ | ✖️ | ✔ | ✖️ | ✖️ | - |
| AutonoViSim | 2018 | PhysX | ✖️ | ✔ | ✔ | ✖️ | ✖️ | ✔ | ✖️ | ✖️ | - |
| AirSim | 2018 | UE4 | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✖️ | - |
| SUMO | 2018 | None | ✔ | ✖️ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✖️ | - |
| Apollo | 2018 | Unity | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | - |
| Sim4CV | 2018 | UE4 | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | - |
| MATLAB | 2018 | MATLAB | ✖️ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Mathworks |
| Scenic | 2019 | None | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | Toyota Research Institute, UC Berkeley |
| SUMMIT | 2020 | UE4 | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✔ | ✖️ | - |
| MultiCarRacing | 2020 | None | ✔ | ✖️ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | - |
| SMARTS | 2020 | None | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | - |
| LGSVL | 2020 | Unity | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | - |
| CausalCity | 2020 | UE4 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | - |
| Vista | 2020 | None | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✖️ | ✖️ | MIT |
| MetaDrive | 2021 | Panda3D | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | - |
| L2R | 2021 | UE4 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | - |
| AutoDRIVE | 2021 | Unity | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | - |
| Nuplan | 2021 | None | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | Motional |
| AWSIM | 2021 | Unity | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | ✔ | Autoware |
| InterSim | 2022 | None | ✔ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✖️ | ✖️ | Tsinghua |
| Nocturne | 2022 | None | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | - |
| BeamNG.tech | 2022 | Soft-body physics | ✖️ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✔ | BeamNG GmbH |
| Waymax | 2023 | JAX | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | ✖️ | Waymo |
| UNISim | 2023 | None | ✖️ | ✔ | ✔ | ✔ | ✖️ | ✖️ | ✔ | ✖️ | Waabi |
| TBSim | 2023 | None | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖️ | ✖️ | NVIDIA |
| Nvidia DriveWorks | 2024 | Nvidia GPU | ✖️ | ✔ | ✔ | ✔ | ✖️ | ✔ | ✔ | ✖️ | NVIDIA |
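Most of the listed simulators expose scripting interfaces for scenario work. As a concrete taste, the sketch below spawns an ego vehicle through CARLA's Python API; it assumes a CARLA server is already running on localhost:2000 and uses only standard CARLA calls:

```python
# Sketch: spawn an autopilot ego vehicle in a running CARLA server.
import random
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Pick a vehicle blueprint and a free spawn point from the current map.
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = random.choice(world.get_map().get_spawn_points())

ego = world.spawn_actor(blueprint, spawn_point)
ego.set_autopilot(True)  # hand control to the built-in traffic manager
```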
Benchmark Challenges
Autonomous driving challenges:
| Name | Host |
|---|---|
| CARLA AD Challenge | CARLA |
| DRL4Real | ICCV |
| Waymo Open Dataset Challenge | Waymo / CVPR WAD |
| Argoverse 2: Scenario Mining | ArgoAI |
| Roboflow-20VL | Roboflow-VL / CVPR |
| AVA Challenge | AVA Challenge Team |
General foundation model benchmarks and challenges:

| Name | Host |
|---|---|
| IGLU Challenge | NeurIPS / IGLU Team |
| LLM Efficiency Challenge | NeurIPS |
| Trojan Detection | NeurIPS / CAIS |
| SMART-101 | CVPR |
| NICE Challenge | CVPR / LG Research |
| SyntaGen | CVPR |
| Habitat Challenge | CVPR / FAIR |
| BIG-bench | Google Research |
| BIG-bench Hard (BBH) | Google Research |
| HELM | Stanford CRFM |
| MMBench | OpenCompass |
| MMMU | CVPR / U-Waterloo / OSU |
| Open LLM Leaderboard | VILA-Lab |
| Text-to-Image Leaderboard | Artificial Analysis |
| Ego4D | FAIR |
| VizWiz Grand Challenge | CVPR VizWiz Workshop |
| MedFM | NeurIPS / Shanghai AI Laboratory |
| 3D Scene Understanding | CVPR |
Common Tools and Frameworks
This section links to commonly used tools, frameworks, and resources for working with foundation models in autonomous driving; a minimal inference sketch follows the list below.
- Hugging Face Models - Large collection of pre-trained foundation models
- Hugging Face Vision-Language Models Leaderboard - Benchmark and leaderboard for VLMs
- Open LLM Leaderboard - Comprehensive LLM benchmarking
- Awesome LLM - Curated list of Large Language Model resources
- vLLM - High-throughput and memory-efficient inference engine for LLMs
- LMDeploy - Toolkit for compressing, deploying, and serving LLMs
- Ollama - Run large language models locally
- Text Generation Inference - Production-ready inference container by Hugging Face
- TensorRT-LLM - High-performance inference library by NVIDIA
- Hugging Face Transformers - State-of-the-art ML for PyTorch, TensorFlow, and JAX
- LLaMA Factory - Unified framework for fine-tuning LLMs
- Axolotl - Tool for fine-tuning large language models
- PEFT - Parameter-Efficient Fine-Tuning methods
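As a starting point with these tools, the sketch below runs offline batched inference with vLLM; the model name is an illustrative assumption, and any suitably sized Hugging Face causal LM can be substituted:

```python
# Sketch: offline batched inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # example model, swap as needed
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["List three edge cases an autonomous vehicle may face at a roundabout."]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```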
We welcome contributions from the community! If you have research papers, tools, or resources to add, please create a pull request or open an issue.
This repository is released under the Apache 2.0 license.




