Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis 🚗

This repository will collect research, implementations, and resources related to Foundation Models for Scenario Generation and Analysis in autonomous driving. The repository will be maintained by TUM-AVS (Professorship of Autonomous Vehicle Systems at Technical University of Munich) and will be continuously updated to track the latest work in the community.

🔥 Updates

Nov 2025 – Added 2 new papers on scenario analysis. Added new section: Useful Resources and Links.
Uploaded new version to arXiv. Repository now categorizes 348 papers:
- 93 on scenario generation
- 56 on scenario analysis
- 58 on datasets
- 21 on simulators
- 25 on benchmark challenges
- 95 on other related topics (e.g., FMs' implementation)
Oct 2025 – Added 17 new papers on scenario generation and 2 on scenario analysis.
Sep 2025 – Added 3 new papers on scenario generation and 14 on scenario analysis.
Aug 2025 – Added 4 new papers on scenario generation and 4 on scenario analysis.
Jul 2025 – Added 9 new papers on scenario generation and 8 on scenario analysis.
Jun 2025 – Released our paper on arXiv. Repository now categorizes 342 papers:
- 93 on scenario generation
- 54 on scenario analysis
- 55 on datasets
- 21 on simulators
- 25 on benchmark challenges
- 94 on other related topics
May 2025 – Repository initialized.

🤝 Citation

Please visit Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis for more details and comprehensive information. If you find our paper and repo helpful, please consider citing it as follows:

@misc{gao2025foundation,
  title={Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis},
  author={Yuan Gao, Mattia Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, Roberto Brusnicki, Baha Zarrouki, Alessio Gambi, Jan Frederik Totz, Kai Storms, Steven Peters, Andrea Stocco, Bassam Alrifaee, Marco Pavone and Johannes Betz,
  journal={TBD},
  year={2025},
  eprint={2506.11526},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.11526}, 
}

📃 Introduction

Foundation models are large-scale, pre-trained models that can be adapted to a wide range of downstream tasks. In the context of autonomous driving, foundation models offer a powerful approach to scenario generation and analysis, enabling more comprehensive and realistic testing, validation, and verification of autonomous driving systems. This repository aims to collect and organize research, tools, and resources in this important field.

📈 Publication Timeline

The following figure shows the evolution of foundation model research in autonomous driving scenario generation and analysis over time:

🔍 Search Methodology

The following list of keywords was used to search this survey's papers in the Google Scholar database. The keywords were entered either individually or in combination with other keywords in the list. The search was conducted until May 2025.

Keywords:

Foundation Model Types: Foundation Models, Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), Diffusion Models (DMs), World Models (WMs), Generative Models (GMs)
Scenario Generation & Analysis: Scenario Generation, Scenario Simulation, Traffic Simulation, Scenario Testing, Scenario Understanding, Driving Scene Generation, Scene Reasoning, Risk Assessment, Safety-Critical Scenarios, Accident Prediction
Application Context: Autonomous Driving, Self-Driving Vehicles, AV Simulation, Driving Video Generation, Traffic Datasets, Closed-Loop Simulation, Safety Assurance

🌟 Large Language Models for Autonomous Driving

Scenario Generation (LLM)

Paper	Date	Venue	Code	Citation
TARGET: Automated Scenario Generation from Traffic Rules for Testing Autonomous Vehicles	2023-05	arXiv	-	9
Language Conditioned Traffic Generation	2023-07	CoRL 2023	GitHub	79
A Generative AI-driven Application: Use of Large Language Models for Traffic Scenario Generation	2023-11	ELECO 2023	-	6
ChatGPT-Based Scenario Engineer: A New Framework on Scenario Generation for Trajectory Prediction	2024-02	IEEE Transactions on Intelligent Vehicles	-	25
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation	2024-04	arXiv	GitHub	22
LLMScenario: Large Language Model Driven Scenario Generation	2024-05	IEEE Transactions on Systems, Man, and Cybernetics: Systems	-	37
Automatic Generation Method for Autonomous Driving Simulation Scenarios Based on Large Language Model	2024-05	AIAT 2024	-	2
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles	2024-05	CVPR 2024	GitHub	99
Editable scene simulation for autonomous driving via collaborative llm-agents	2024-06	CVPR 2024	GitHub	123
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model	2024-06	IV 2024	GitHub	12
SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing	2024-09	ASE 2024	-	15
LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models	2024-09	ASE 2024	GitHub	14
Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model	2024-09	arXiv	GitHub	14
Promptable Closed-loop Traffic Simulation	2024-09	CoRL 2024	GitHub	15
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	2024-09	arXiv	-	20
LLM-Driven Testing for Autonomous Driving Scenarios	2024-11	FLLM 2024	-	9
ChatSUMO: Large Language Model for Automating Traffic Scenario Generation in Simulation of Urban MObility	2024-11	IEEE Transactions on Intelligent Vehicles	-	29
Generating Out-Of-Distribution Scenarios Using Language Models	2024-11	arXiv	-	8
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner	2024-12	AAAI 2025 Oral	GitHub	3
LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models	2025-01	TITS 2025	-	11
ML-SceGen: A Multi-level Scenario Generation Framework	2025-01	arXiv	-	0
Risk-Aware Driving Scenario Analysis with Large Language Models	2025-02	ITSC 2025	GitHub	1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models	2025-02	arXiv	GitHub	8
Text2Scenario: Text-Driven Scenario Generation for Autonomous Driving Test	2025-03	arXiv	GitHub	8
Enhancing Autonomous Driving Safety with Collision Scenario Integration	2025-03	arXiv	-	6
Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models	2025-05	arXiv	-	3
From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving	2025-05	arXiv	-	0
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework	2025-07	arXiv	-	0
LLM-based Realistic Safety-Critical Driving Video Generation	2025-07	arXiv	-	2
Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles	2025-08	arXiv	GitHub	0
LLM-based Human-like Traffic Simulation for Self-driving Tests	2025-08	arXiv	-	0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles	2025-09	GeCoIn 2025	-	3
Txt2Sce: Scenario Generation for Autonomous Driving System Testing Based on Textual Reports	2025-09	arXiv	-	0
LLM‑Based Semantic Modeling & Cooperative Evolutionary Fuzzing	2025-09	APSEC 2025	-	-
LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation via Natural Language Instruction Based on Large Language Models	2025-10	arXiv	-	0

Scenario Analysis (LLM)

Paper	Date	Venue	Code	Citation
Semantic Anomaly Detection with Large Language Models	2023-09	Autonomous Robots	-	95
LLM Multimodal Traffic Accident Forecasting	2023-11	Sensors 2023 MDPI	-	59
Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models	2024-03	IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge)	GitHub	20
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving	2024-05	ICRA 2024	GitHub	275
Generating Out-Of-Distribution Scenarios Using Language Models	2024-11	arXiv	-	8
SenseRAG: Constructing Environmental Knowledge Bases with Proactive Querying for LLM-Based Autonomous Driving	2025-01	arXiv	-	9
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios	2025-02	ITSC 2025	GitHub	1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models	2025-02	arXiv	GitHub	8
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation	2025-03	arXiv	-	7
Collision risk prediction and takeover requirements assessment based on radar-video integrated sensors data: A system framework based on LLM	2025-08	arXiv	-	3

🌟 Vision-Language Models for Autonomous Driving

Scenario Generation (VLM)

Paper	Date	Venue	Code	Citation
WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models	2023-05	CVPR workshop 2023	-	40
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	2024-08	IAVVC 2024	-	9
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	2024-09	arXiv	-	20
From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events	2024-11	arXiv	-	7
Generating Out-Of-Distribution Scenarios Using Language Models	2024-11	arXiv	-	8
From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing	2025-02	arXiv	-	1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models	2025-02	arXiv	GitHub	8
CrashAgent: Crash Scenario Generation via Multi-modal Reasoning	2025-05	arXiv	-	2
BENCH2ADVLM: A Closed-Loop Benchmark for Vision-language Models in Autonomous Driving	2025-08	arXiv	-	1
Vision Language Model-based Testing of Industrial Autonomous Mobile Robots	2025-08	arXiv	-	3

Scenario Analysis (VLM)

Paper	Date	Venue	Code	Citation
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving	2023-09	ICCV 2023	-	42
OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data	2023-10	ICRA 2024	-	22
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving	2023-11	ICIL 2024 Workshop on Large Language Models for Agents	GitHub	97
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving	2023-11	ICRA 2024	GitHub	96
LLM Multimodal Traffic Accident Forecasting	2023-11	Sensors 2023 MDPI	-	59
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations	2024-01	WACVW LLVM-AD 2024	GitHub	32
Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing	2024-02	UR 2024	-	16
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving	2024-03	VLADR 2024	GitHub	39
LATTE: A Real-time Lightweight Attention-based Traffic Accident Anticipation Engine	2024-04	arXiv	-	4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning	2024-05	CVPR 2025	GitHub	62
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving	2024-06	ECCV 2024	GitHub	96
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving	2024-07	arXiv	-	12
Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving	2024-07	IROS 2024	GitHub	15
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	2024-08	IAVVC 2024	-	9
V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models	2024-08	arXiv	-	44
Think-Driver: From Driving-Scene Understanding to Decision-Making with Vision Language Models	2024-09	ECCV 2024 Workshop	-	3
VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes	2024-10	FLLM 2024	GitHub	14
Generating Out-Of-Distribution Scenarios Using Language Models	2024-11	arXiv	-	8
Visual Adversarial Attack on Vision-Language Models for Autonomous Driving	2024-11	arXiv	-	17
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases	2024-12	WACV 2025	GitHub	53
SFF Rendering-Based Uncertainty Prediction using VisionLLM	2024-12	AAAI 2025 Workshop LM4Plan	-	8
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives	2025-01	arXiv	GitHub	53
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory	2025-01	arXiv	-	6
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding	2025-01	IAVVC 2024	-	9
DriveLM: Driving with Graph Visual Question Answering	2025-01	ECCV 2024	GitHub	310
Scenario Understanding of Traffic Scenes Through Large Visual Language Models	2025-01	WACV 2025	-	6
INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation	2025-02	arXiv	-	8
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models	2025-02	arXiv	GitHub	8
Evaluating Multimodal Vision-Language Model Prompting Strategies for Visual Question Answering in Road Scene Understanding	2025-02	WACV workshop 2025	-	11
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving	2025-03	arXiv	-	6
AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models	2025-03	arXiv	GitHub	2
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding	2025-03	arXiv	GitHub	17
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation	2025-03	arXiv	GitHub	34
ChatBEV: A Visual Language Model that Understands BEV Maps	2025-03	arXiv	-	2
Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios	2025-04	arXiv	GitHub	6
Vision Foundation Model Embedding-Based Semantic Anomaly Detection	2025-05	ICRA 2025 Workshop	-	3
OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions	2025-05	arXiv	GitHub	2
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models	2025-05	arXiv	GitHub	10
Bridging Human Oversight and Black-box Driver Assistance: Vision-Language Models for Predictive Alerting in Lane Keeping Assist systems	2025-05	arXiv	-	2
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving	2025-06	arXiv	-	46
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	2025-06	arXiv	-	1
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving	2025-06	arXiv	-	0
DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction	2025-07	arXiv	GitHub	0
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation	2025-07	ACMMM 2025	GitHub	0
DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving	2025-08	arXiv	GitHub	2
NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving	2025-09	arXiv	-	1
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking	2025-09	arXiv	-	2
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning	2025-09	IROS 2025	GitHub	0
More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models	2025-10	arXiv	-	0

🌟 Multimodal Large Language Models for Autonomous Driving

Scenario Generation (MLLM)

Paper	Date	Venue	Code	Citation
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model	2024-11	arXiv	-	8
LMM-enhanced Safety-Critical Scenario Generation for Autonomous Driving System Testing From Non-Accident Traffic Videos	2025-01	arXiv	GitHub	6
Multi-modal Traffic Scenario Generation for Autonomous Driving System Testing	2025-06	FSE 2025	-	0
Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model	2025-06	CVPR 2025 WDFM-AD	GitHub	3

Scenario Analysis (MLLM)

Paper	Date	Venue	Code	Citation
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model	2023-10	IEEE Robotics and Automation Letters 2024	GitHub	439
Dolphins: Multimodal Language Model for Driving	2023-12	ECCV 2024	GitHub	109
AccidentGPT: Accident analysis and prevention from V2X Environmental Perception with Multi-modal Large Model	2023-12	IV 2024	GitHub	31
Lidar-llm: Exploring the potential of large language models for 3d lidar understanding	2023-12	AAAI 2025	GitHub	98
LingoQA: Visual Question Answering for Autonomous Driving	2023-12	ECCV 2024	GitHub	82
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models	2024-01	CVPR 2024	GitHub	73
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding	2024-01	CVPR 2024	GitHub	51
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-Grained Spatial-Temporal Understanding	2024-06	ECCV 2024	GitHub	18
Semantic Understanding of Traffic Scenes with Large Vision Language Models	2024-06	IV 2024	GitHub	27
VLAAD: Vision and Language Assistant for Autonomous Driving	2024-06	WACVW 2024	GitHub	49
InternDrive: A Multimodal Large Language Model for Autonomous Driving Scenario Understanding	2024-07	AIAHPC 2024	-	5
LingoQA: Visual Question Answering for Autonomous Driving	2024-09	ECCV 2024	GitHub	82
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events	2024-09	Vehicles 2024 MDPI	-	6
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	2024-12	arXiv	GitHub	7
Distilling Multi-modal Large Language Models for Autonomous Driving	2025-01	CVPR 2025	-	19
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes	2025-02	ICML 2025	GitHub	10
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding	2025-02	WACV Workshop 2025	GitHub	9
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning	2025-02	arXiv	-	15
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving	2025-03	arXiv	-	1
HiLM-D: Enhancing MLLMs with Multi-Scale High-Resolution Details for Autonomous Driving	2025-03	International Journal of Computer Vision	-	53
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models	2025-03	arXiv	-	10
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving	2025-03	arXiv	-	1
Tracking Meets Large Multimodal Models for Driving Scenario Understanding	2025-03	arXiv	GitHub	2
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment	2025-03	CVPR 2025	GitHub	25
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving	2025-04	arXiv	-	2
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	2025-04	arXiv	GitHub	4
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving	2025-05	arXiv	-	0
X-Driver: Explainable Autonomous Driving with Vision-Language Models	2025-06	arXiv	-	2
EMMA: End-to-End Multimodal Model for Autonomous Driving	2025-07	arXiv	-	96
SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding	2025-08	arXiv	-	1
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving	2025-08	arXiv	-	9
Investigating Traffic Accident Detection Using Multimodal Large Language Models	2025-09	IAVVC 2025	-	1

🌟 Diffusion Models for Autonomous Driving

Scenario Generation (Diffusion Models)

Paper	Date	Venue	Code	Citation
Guided Conditional Diffusion for Controllable Traffic Simulation	2022-10	ICRA 2023	GitHub	218
Generating Driving Scenes with Diffusion	2023-05	arXiv	-	21
DiffScene: Guided Diffusion Models for Safety-Critical Scenario Generation	2023-06	AdvML-Frontiers 2023	-	68
BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout	2023-09	arXiv	-	81
DriveSceneGen: Generating Diverse and Realistic Driving Scenarios From Scratch	2023-09	IEEE Robotics and Automation Letters 2024	-	22
MagicDrive: Street View Generation with Diverse 3D Geometry Control	2023-10	ICLR 2024	GitHub	185
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model	2023-10	ECCV 2024	-	82
Language-guided traffic simulation via scene-level diffusion	2023-11	CoRL 2023	-	121
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion	2023-11	NeurIPS 2023	-	62
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving	2023-11	CVPR 2024	GitHub	105
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries	2023-12	ECCV 2024	GitHub	24
Text2Street: Controllable Text-to-image Generation for Street Views	2024-02	ICPR 2024	-	12
ChatTraffic: Text-to-Traffic Generation via Diffusion Model	2024-02	arXiv	-	11
GEODIFFUSION: Text-Prompted Geometric Control for Object Detection Data Generation	2024-02	LCLR 2024	GitHub	46
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model	2024-04	ITSC 2024	-	5
Versatile Behavior Diffusion for Generalized Traffic Agent Simulation	2024-04	RSS 2024	GitHub	15
SceneControl: Diffusion for Controllable Traffic Scene Generation	2024-05	ICRA 2024	-	26
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic	2024-07	ECCV 2024	GitHub	16
DrivingGen: Efficient Safety-Critical Driving Video Generation with Latent Diffusion Models	2024-07	ICME 2024	-	9
Controllable Traffic Simulation through LLM-Guided Hierarchical Chain-of-Thought Reasoning	2024-09	arXiv	-	0
AdvDiffuser: Generating Adversarial Safety-Critical Driving Scenarios via Guided Diffusion	2024-10	IROS 2023	-	20
Data-driven Diffusion Models for Enhancing Safety in Autonomous Vehicle Traffic Simulations	2024-10	arXiv	-	3
DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing	2024-11	arXiv	-	3
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout	2024-12	NeurIPS 2024	-	33
Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation	2025-02	arXiv	-	1
Causal Composition Diffusion Model for Closed-loop Traffic Generation	2025-02	arXiv	-	7
Rolling Ahead Diffusion for Traffic Scene Simulation	2025-02	AAAI 2025 Workshop	-	1
AVD2: Accident Video Diffusion for Accident Video Description	2025-03	ICRA 2025	GitHub	16
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	2025-03	arXiv	-	4
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments	2025-03	arXiv	-	7
DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models	2025-03	arXiv	-	4
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer	2025-04	arXiv	-	4
Decoupled Diffusion Sparks Adaptive Scene Generation	2025-04	ICCV 2025	GitHub	5
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion	2025-05	arXiv	-	3
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios	2025-05	arXiv	-	5
Dual-Conditioned Temporal Diffusion Modeling for Driving Scene Generation	2025-05	ICAR 2025	GitHub	1
Diffusion Models for Safety Validation of Autonomous Driving Systems	2025-06	arXiv	-	1
Diffusion-Based Generation and Imputation of Driving Scenarios from Limited Vehicle CAN Data	2025-09	arXiv	-	0
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion	2025-10	NeurIPS 2025 Workshop	-	0

Scenario Analysis (Diffusion Models)

Paper	Date	Venue	Code	Citation
AVD2: Accident Video Diffusion for Accident Video Description	2025-03	ICRA 2025	GitHub	16

🌟 World Models for Autonomous Driving

World Models for Autonomous Driving

Paper	Date	Venue	Code	Application	Citation
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving	2023-09	ECCV 2024	GitHub	Scenario Generation	219
GAIA-1: A Generative World Model for Autonomous Driving	2023-09	arXiv Wayve	-	Scenario Generation	369
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction	2023-09	ICRA 2023	GitHub	Scenario Generation	55
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion	2023-11	ICLR 2024	-	Scenario Generation	81
MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations	2023-11	IV 2025	-	Scenario Generation	28
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving	2023-11	CVPR 2024	GitHub	Scenario Generation	221
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	2024-03	NeurIPS 2024	-	Scenario Generation	181
MagicDrive: Street View Generation with Diverse 3D Geometry Control	2024-05	arXiv	-	Scenario Generation	185
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	2024-05	AAAI 2025	GitHub	Scenario Generation	118
UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving	2024-08	RAL 2024	-	Scenario Generation	24
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation	2024-08	ECCV 2024	-	Scenario Generation	64
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	2024-08	CVPR 2024	GitHub	Scenario Generation	105
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving	2024-08	arXiv	GitHub	Scenario Generation	45
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment	2024-08	arXiv	-	Scenario Generation	3
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation	2024-11	CVPR 2025	GitHub	Scenario Generation	64
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration	2024-11	arXiv	-	Scenario Generation	39
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes	2024-11	arXiv	GitHub	Scenario Generation	52
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control	2024-11	arXiv	GitHub	Scenario Generation	35
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving	2024-12	arXiv	-	Scenario Generation	5
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control	2024-12	CVPR 2025	GitHub	Scenario Generation	27
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model	2024-12	NeurIPS 2024	-	Scenario Generation	53
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT	2024-12	arXiv	GitHub	Scenario Generation	38
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving	2025-01	AAAI 2025	GitHub	Scenario Generation	18
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	2025-03	ICRA 2025	GitHub	Scenario Generation	3
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning	2025-03	arXiv	GitHub	Scenario Generation	12
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	2025-03	arXiv	-	Scenario Generation	31
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space	2025-03	arXiv	-	Scenario Generation	5
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception	2025-03	arXiv	GitHub	Scenario Generation	7
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control	2025-04	arXiv	GitHub	Scenario Generation	3
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment	2025-04	ACMMM 2025	GitHub	Scenario Generation	2
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	2025-05	arXiv	-	Scenario Generation	88
PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth	2025-05	arXiv	-	Scenario Generation	2
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos	2025-05	arXiv	-	Scenario Generation	15
Epona: Autoregressive Diffusion World Model for Autonomous Driving	2025-06	ICCV 2025	GitHub	Scenario Generation	29
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation	2025-06	IROS 2025	-	Scenario Generation	3
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction	2025-06	CVPR 2025	GitHub	Scenario Generation	18
DeepVerse: 4D Autoregressive Video Generation as a World Model	2025-06	arXiv	GitHub	Scenario Generation	7
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving	2025-07	Commun Eng 4	GitHub	Scenario Generation	1
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	2025-08	ICCV 2025	GitHub	Scenario Generation	17

📊 Datasets Comparison

The following figure shows the usage distribution of different foundation model types across autonomous driving datasets:

Datasets Comparison

Dataset	Year	Img	View	Real	Lidar	Radar	Traj	3D	2D	Lane	Weather	Time	Region	Company
CamVid	2009	RGB	FPV	✔	✖️	✖️	✖️	✖️	✔	✔	✔	D	U	-
KITTI	2013	RGB/S	FPV	✔	✔	✖️	✔	✔	✔	✔	✔	D	U/R/H	-
Cyclists	2016	RGB	FPV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	U	-
Cityscapes	2016	RGB/S	FPV	✔	✖️	✖️	✖️	✔	✔	✔	✖️	D	U	-
SYNTHIA	2016	RGB	FPV	✖️	✖️	✖️	✖️	✔	✔	✔	✔	D/N	U	-
Campus	2016	RGB	BEV	✖️	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	C	-
RobotCar	2016	RGB	FPV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D/N	U	-
Mapillary	2017	RGB	FPV	✔	✖️	✖️	✖️	✖️	✔	✔	✔	D/N	U	-
P.F.B.	2017	RGB	FPV	✔	✖️	✖️	✖️	✖️	✔	✔	✔	D/N	U	-
BDD100K	2018	RGB	FPV	✔	✖️	✖️	✖️	✔	✔	✔	✔	D	U/H	-
HighD	2018	RGB	BEV	✔	✖️	✖️	✖️	✖️	✔	✔	✖️	D	H	-
Udacity	2018	RGB	FPV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	U	-
KAIST	2018	RGB/S	FPV	✔	✔	✖️	✖️	✖️	✔	✔	✔	D/N	U	-
Argoverse	2019	RGB/S	FPV	✔	✔	✖️	✖️	✖️	✔	✔	✔	D/N	U	-
TRAF	2019	RGB	FPV	✔	✖️	✖️	✖️	✖️	✔	✔	✔	D	U	-
ApolloScape	2019	RGB/S	FPV	✔	✖️	✖️	✖️	✔	✔	✔	✔	D	U	-
ACFR	2019	RGB	BEV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	RA	-
H3D	2019	RGB	FPV	✔	✖️	✖️	✖️	✖️	✔	✔	✔	D	U	-
INTERACTION	2019	RGB	BEV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	I/RA	-
Comma2k19	2019	RGB	FPV	✔	✖️	✖️	✔	✔	✖️	✖️	✖️	D/N	U/S/R/H	-
InD	2020	RGB	BEV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	I	-
RounD	2020	RGB	BEV	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	D	RA	-
nuScenes	2020	RGB	FPV	✔	✔	✔	✖️	✔	✔	✔	✔	D/N	U	-
Lyft Level 5	2020	RGB	FPV	✔	✔	✔	✖️	✔	✔	✔	✔	D/N	U/S	-
Waymo Open	2020	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U	-
A*3D	2020	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U	-
RobotCar Radar	2020	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U	-
Toronto3D	2020	RGB	BEV	✔	✔	✖️	✔	✔	✖️	✔	✖️	D/N	U	University of Waterloo
A2D2	2020	RGB	FPV	✔	✔	✔	✔	✔	✔	✖️	✔	✔	D	U/H/S/R
WADS	2020	RGB	FPV	✔	✔	✔	✔	✔	✖️	✖️	✔	D/N	U/S/R	Michigan Technological University
Argoverse 2	2021	RGB/S	FPV	✔	✔	✖️	✖️	✔	✔	✔	✔	D/N	U	-
PandaSet	2021	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U	-
ONCE	2021	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U	-
Leddar PixSet	2021	RGB	FPV	✔	✔	✖️	✔	✔	✔	✖️	✔	D/N	U/S/R	Leddar
ZOD	2022	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U/R/S/H	Zenseact
IDD-3D	2022	RGB	FPV	✔	✔	✖️	✖️	✔	✔	✖️	✖️	-	R	INAI
CODA	2022	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U/S/R	Huawei
SHIFT	2022	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U/S/R/H	ETH Zürich
DeepAccident	2023	RGB/S	FPV/BEV	✖️	✔	✖️	✖️	✔	✔	✔	✔	D/N	U/S/R/H	HKU, Huawei, CARLA
Dual_Radar	2023	RGB	FPV	✔	✔	✔	✔	✔	✖️	✔	✔	D/N	U	Tsinghua University
V2V4Real	2023	RGB	FPV	✔	✔	✖️	✔	✔	✖️	✔	✖️	-	U/H/S	UCLA Mobility Lab
SCaRL	2024	RGB/S	FPV/BEV	✖️	✔	✔	✔	✔	✔	✔	✔	D/N	U/S/R/H	Fraunhofer CARLA
MARS	2024	RGB	FPV	✔	✔	✔	✔	✔	✔	✔	✔	D/N	U/S/H	NYU, MAY Mobility
Scenes101	2024	RGB	FPV	✔	✖️	✖️	✔	✖️	✖️	✔	✔	D/N	U/S/R/H	Wayve
TruckScenes	2025	RGB	FPV	✔	✔	✔	✔	✔	✖️	✔	✔	D/N	H/U	MAN

Notes: View: FPV=First-Person, BEV=Bird's-Eye; Time: D=Day, N=Night; Region: U=Urban, R=Rural, H=Highway, S=Suburban, C=Campus, I=Intersection, RA=Road Area; Img: RGB/S=RGB+Stereo

🎮 Simulators

The following figure shows the usage distribution of different foundation model types across autonomous driving simulators:

Simulators

Simulator	Year	Back-end	Open Source	Realistic Perception	Custom Scenario	Real World Map	Human Design Map	Python API	C++ API	ROS API	Company
TORCS	2000	None	✔	✔	✔	✖️	✖️	✖️	✖️	✖️	-
Webots	2004	ODE	✔	✔	✔	✔	✖️	✔	✔	✖️	-
CarRacing	2017	None	✔	✖️	✖️	✖️	✔	✔	✖️	✖️	-
CARLA	2017	UE4	✔	✔	✔	✖️	✔	✔	✔	✔	-
SimMobilityST	2017	None	✔	✖️	✖️	✖️	✖️	✖️	✖️	✖️	-
GTA-V	2017	RAGE	✖️	✔	✖️	✖️	✖️	✖️	✖️	✖️	-
highway-env	2018	None	✔	✖️	✔	✖️	✔	✔	✖️	✖️	-
Deepdrive	2018	UE4	✔	✔	✔	✖️	✔	✔	✔	✖️	-
esmini	2018	Unity	✔	✖️	✖️	✖️	✖️	✔	✖️	✖️	-
AutonoViSim	2018	PhysX	✖️	✔	✔	✖️	✖️	✔	✖️	✖️	-
AirSim	2018	UE4	✔	✔	✔	✖️	✔	✔	✔	✖️	-
SUMO	2018	None	✔	✖️	✔	✔	✔	✖️	✔	✖️	-
Apollo	2018	Unity	✔	✔	✔	✔	✔	✔	✔	✖️	-
Sim4CV	2018	UE4	✔	✔	✔	✖️	✔	✔	✖️	✖️	-
MATLAB	2018	MATLAB	✖️	✔	✔	✔	✔	✔	✔	✔	Mathworks
Scenic	2019	None	✔	✔	✔	✔	✔	✔	✖️	✖️	Toyota Research Institute, UC Berkeley
SUMMIT	2020	UE4	✔	✔	✔	✖️	✔	✔	✔	✖️	-
MultiCarRacing	2020	None	✔	✖️	✔	✖️	✔	✔	✖️	✖️	-
SMARTS	2020	None	✔	✔	✔	✔	✔	✔	✖️	✖️	-
LGSVL	2020	Unity	✔	✔	✔	✔	✔	✔	✔	✔	-
CausalCity	2020	UE4	✔	✔	✔	✔	✔	✔	✖️	✖️	-
Vista	2020	None	✔	✔	✔	✔	✖️	✔	✖️	✖️	MIT
MetaDrive	2021	Panda3D	✔	✔	✔	✔	✔	✔	✔	✖️	-
L2R	2021	UE4	✔	✔	✔	✔	✔	✔	✔	✖️	-
AutoDRIVE	2021	Unity	✔	✔	✔	✔	✔	✔	✔	✔	-
Nuplan	2021	None	✔	✔	✔	✔	✔	✔	✖️	✖️	Motional
AWSIM	2021	Unity	✔	✔	✔	✔	✔	✖️	✖️	✔	Autoware
InterSim	2022	None	✔	✔	✔	✔	✖️	✔	✖️	✖️	Tsinghua
Nocturne	2022	None	✔	✔	✔	✔	✔	✔	✔	✖️	Facebook
BeamNG.tech	2022	Soft-body physics	✖️	✔	✔	✖️	✔	✔	✖️	✔	BeamNG GmbH
Waymax	2023	JAX	✔	✔	✔	✖️	✔	✔	✖️	✖️	Waymo
UNISim	2023	None	✖️	✔	✔	✔	✖️	✖️	✔	✖️	Waabi
TBSim	2023	None	✔	✔	✔	✔	✔	✔	✖️	✖️	NVIDIA
Nvidia DriveWorks	2024	Nvidia GPU	✖️	✔	✔	✔	✖️	✔	✔	✖️	NVIDIA

🏆 Foundation Model Benchmark Challenges (2022–2025)

Benchmark Challenges

Autonomous Driving

Name	Host
CARLA AD Challenge	CARLA
DRL4Real	ICCV
Waymo Open Dataset Challenge	Waymo / CVPR WAD
Argoverse 2: Scenario Mining	ArgoAI
Roboflow-20VL	Roboflow-VL / CVPR
AVA Challenge	AVA Challenge Team

Other Fields Related to Generation and Analysis

Name	Host
IGLU Challenge	NeurIPS / IGLU Team
LLM Efficiency Challenge	NeurIPS
Trojan Detection	NeurIPS / CAIS
SMART-101	CVPR
NICE Challenge	CVPR / LG Research
SyntaGen	CVPR
Habitat Challenge	CVPR / FAIR
BIG-bench	Google Research
BIG-bench Hard (BBH)	Google Research
HELM	Stanford CRFM
MMBench	OpenCompass
MMMU	CVPR / U-Waterloo / OSU
Open LLM Leaderboard	VILA-Lab
Text-to-Image Leaderboard	Artificial Analysis
Ego4D	FAIR
VizWiz Grand Challenge	CVPR VizWiz Workshop
MedFM	NeurIPS / Shanghai AI Laboratory
3D Scene Understanding	CVPR

🔗 Useful Resources and Links

Common Tools and Frameworks

This section provides links to commonly used tools, frameworks, and resources for working with foundation models in autonomous driving.

Model Repositories and Leaderboards

Hugging Face Models - Large collection of pre-trained foundation models
Hugging Face Vision-Language Models Leaderboard - Benchmark and leaderboard for VLMs
Open LLM Leaderboard - Comprehensive LLM benchmarking
Awesome LLM - Curated list of Large Language Models resources

Model Inference Frameworks

vLLM - High-throughput and memory-efficient inference engine for LLMs
LMDeploy - Toolkit for compressing, deploying, and serving LLMs
Ollama - Run large language models locally
Text Generation Inference - Production-ready inference container by Hugging Face
TensorRT-LLM - High-performance inference library by NVIDIA

Training and Fine-tuning

Hugging Face Transformers - State-of-the-art ML for PyTorch, TensorFlow, and JAX
LLaMA Factory - Unified framework for fine-tuning LLMs
Axolotl - Tool for fine-tuning large language models
PEFT - Parameter-Efficient Fine-Tuning methods

Contributing

We welcome contributions from the community! If you have research papers, tools, or resources to add, please create a pull request or open an issue.

License

This repository is released under the Apache 2.0 license.

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
Assets		Assets
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis 🚗

🔥 Updates

🤝 Citation

📃 Introduction

📈 Publication Timeline

🔍 Search Methodology

🌟 Large Language Models for Autonomous Driving

🌟 Vision-Language Models for Autonomous Driving

🌟 Multimodal Large Language Models for Autonomous Driving

🌟 Diffusion Models for Autonomous Driving

🌟 World Models for Autonomous Driving

📊 Datasets Comparison

🎮 Simulators

🏆 Foundation Model Benchmark Challenges (2022–2025)

Autonomous Driving

Other Fields Related to Generation and Analysis

🔗 Useful Resources and Links

Model Repositories and Leaderboards

Model Inference Frameworks

Training and Fine-tuning

Contributing

License

About

Uh oh!

Uh oh!

Contributors 2

Uh oh!

License

TUM-AVS/FM-AD-Survey

Folders and files

Latest commit

History

Repository files navigation

Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis 🚗

🔥 Updates

🤝 Citation

📃 Introduction

📈 Publication Timeline

🔍 Search Methodology

🌟 Large Language Models for Autonomous Driving

🌟 Vision-Language Models for Autonomous Driving

🌟 Multimodal Large Language Models for Autonomous Driving

🌟 Diffusion Models for Autonomous Driving

🌟 World Models for Autonomous Driving

📊 Datasets Comparison

🎮 Simulators

🏆 Foundation Model Benchmark Challenges (2022–2025)

Autonomous Driving

Other Fields Related to Generation and Analysis

🔗 Useful Resources and Links

Model Repositories and Leaderboards

Model Inference Frameworks

Training and Fine-tuning

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors 2

Uh oh!