Skip to content

This repository collects research papers of large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. The repository will be continuously updated to track the latest update.

License

Notifications You must be signed in to change notification settings

TUM-AVS/FM-AD-Survey

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 

Repository files navigation

Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis 🚗

Paper Badge Stars Badge Forks Badge Pull Requests Badge Issues Badge License Badge

This repository will collect research, implementations, and resources related to Foundation Models for Scenario Generation and Analysis in autonomous driving. The repository will be maintained by TUM-AVS (Professorship of Autonomous Vehicle Systems at Technical University of Munich) and will be continuously updated to track the latest work in the community.

🔥 Updates

  • Nov 2025 – Added 2 new papers on scenario analysis. Added new section: Useful Resources and Links.
  • Uploaded new version to arXiv. Repository now categorizes 348 papers:
    • 93 on scenario generation
    • 56 on scenario analysis
    • 58 on datasets
    • 21 on simulators
    • 25 on benchmark challenges
    • 95 on other related topics (e.g., FMs' implementation)
  • Oct 2025 – Added 17 new papers on scenario generation and 2 on scenario analysis.
  • Sep 2025 – Added 3 new papers on scenario generation and 14 on scenario analysis.
  • Aug 2025 – Added 4 new papers on scenario generation and 4 on scenario analysis.
  • Jul 2025 – Added 9 new papers on scenario generation and 8 on scenario analysis.
  • Jun 2025 – Released our paper on arXiv. Repository now categorizes 342 papers:
    • 93 on scenario generation
    • 54 on scenario analysis
    • 55 on datasets
    • 21 on simulators
    • 25 on benchmark challenges
    • 94 on other related topics
  • May 2025 – Repository initialized.

🤝   Citation

Please visit Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis for more details and comprehensive information. If you find our paper and repo helpful, please consider citing it as follows:

@misc{gao2025foundation,
  title={Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis},
  author={Yuan Gao, Mattia Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, Roberto Brusnicki, Baha Zarrouki, Alessio Gambi, Jan Frederik Totz, Kai Storms, Steven Peters, Andrea Stocco, Bassam Alrifaee, Marco Pavone and Johannes Betz,
  journal={TBD},
  year={2025},
  eprint={2506.11526},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.11526}, 
}

📃 Introduction

Foundation models are large-scale, pre-trained models that can be adapted to a wide range of downstream tasks. In the context of autonomous driving, foundation models offer a powerful approach to scenario generation and analysis, enabling more comprehensive and realistic testing, validation, and verification of autonomous driving systems. This repository aims to collect and organize research, tools, and resources in this important field.

📈 Publication Timeline

The following figure shows the evolution of foundation model research in autonomous driving scenario generation and analysis over time:

🔍 Search Methodology

The following list of keywords was used to search this survey's papers in the Google Scholar database. The keywords were entered either individually or in combination with other keywords in the list. The search was conducted until May 2025.

Keywords:

  • Foundation Model Types: Foundation Models, Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), Diffusion Models (DMs), World Models (WMs), Generative Models (GMs)
  • Scenario Generation & Analysis: Scenario Generation, Scenario Simulation, Traffic Simulation, Scenario Testing, Scenario Understanding, Driving Scene Generation, Scene Reasoning, Risk Assessment, Safety-Critical Scenarios, Accident Prediction
  • Application Context: Autonomous Driving, Self-Driving Vehicles, AV Simulation, Driving Video Generation, Traffic Datasets, Closed-Loop Simulation, Safety Assurance

🌟 Large Language Models for Autonomous Driving

Scenario Generation (LLM)
Paper Date Venue Code Citation
TARGET: Automated Scenario Generation from Traffic Rules for Testing Autonomous Vehicles 2023-05 arXiv - 9
Language Conditioned Traffic Generation 2023-07 CoRL 2023 GitHub 79
A Generative AI-driven Application: Use of Large Language Models for Traffic Scenario Generation 2023-11 ELECO 2023 - 6
ChatGPT-Based Scenario Engineer: A New Framework on Scenario Generation for Trajectory Prediction 2024-02 IEEE Transactions on Intelligent Vehicles - 25
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation 2024-04 arXiv GitHub 22
LLMScenario: Large Language Model Driven Scenario Generation 2024-05 IEEE Transactions on Systems, Man, and Cybernetics: Systems - 37
Automatic Generation Method for Autonomous Driving Simulation Scenarios Based on Large Language Model 2024-05 AIAT 2024 - 2
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles 2024-05 CVPR 2024 GitHub 99
Editable scene simulation for autonomous driving via collaborative llm-agents 2024-06 CVPR 2024 GitHub 123
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model 2024-06 IV 2024 GitHub 12
SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing 2024-09 ASE 2024 - 15
LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models 2024-09 ASE 2024 GitHub 14
Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model 2024-09 arXiv GitHub 14
Promptable Closed-loop Traffic Simulation 2024-09 CoRL 2024 GitHub 15
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles 2024-09 arXiv - 20
LLM-Driven Testing for Autonomous Driving Scenarios 2024-11 FLLM 2024 - 9
ChatSUMO: Large Language Model for Automating Traffic Scenario Generation in Simulation of Urban MObility 2024-11 IEEE Transactions on Intelligent Vehicles - 29
Generating Out-Of-Distribution Scenarios Using Language Models 2024-11 arXiv - 8
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner 2024-12 AAAI 2025 Oral GitHub 3
LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models 2025-01 TITS 2025 - 11
ML-SceGen: A Multi-level Scenario Generation Framework 2025-01 arXiv - 0
Risk-Aware Driving Scenario Analysis with Large Language Models 2025-02 ITSC 2025 GitHub 1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models 2025-02 arXiv GitHub 8
Text2Scenario: Text-Driven Scenario Generation for Autonomous Driving Test 2025-03 arXiv GitHub 8
Enhancing Autonomous Driving Safety with Collision Scenario Integration 2025-03 arXiv - 6
Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models 2025-05 arXiv - 3
From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving 2025-05 arXiv - 0
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework 2025-07 arXiv - 0
LLM-based Realistic Safety-Critical Driving Video Generation 2025-07 arXiv - 2
Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles 2025-08 arXiv GitHub 0
LLM-based Human-like Traffic Simulation for Self-driving Tests 2025-08 arXiv - 0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles 2025-09 GeCoIn 2025 - 3
Txt2Sce: Scenario Generation for Autonomous Driving System Testing Based on Textual Reports 2025-09 arXiv - 0
LLM‑Based Semantic Modeling & Cooperative Evolutionary Fuzzing 2025-09 APSEC 2025 - -
LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation via Natural Language Instruction Based on Large Language Models 2025-10 arXiv - 0
Scenario Analysis (LLM)
Paper Date Venue Code Citation
Semantic Anomaly Detection with Large Language Models 2023-09 Autonomous Robots - 95
LLM Multimodal Traffic Accident Forecasting 2023-11 Sensors 2023 MDPI - 59
Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models 2024-03 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge) GitHub 20
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving 2024-05 ICRA 2024 GitHub 275
Generating Out-Of-Distribution Scenarios Using Language Models 2024-11 arXiv - 8
SenseRAG: Constructing Environmental Knowledge Bases with Proactive Querying for LLM-Based Autonomous Driving 2025-01 arXiv - 9
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios 2025-02 ITSC 2025 GitHub 1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models 2025-02 arXiv GitHub 8
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation 2025-03 arXiv - 7
Collision risk prediction and takeover requirements assessment based on radar-video integrated sensors data: A system framework based on LLM 2025-08 arXiv - 3

🌟 Vision-Language Models for Autonomous Driving

Scenario Generation (VLM)
Paper Date Venue Code Citation
WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models 2023-05 CVPR workshop 2023 - 40
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving 2024-08 IAVVC 2024 - 9
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles 2024-09 arXiv - 20
From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events 2024-11 arXiv - 7
Generating Out-Of-Distribution Scenarios Using Language Models 2024-11 arXiv - 8
From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing 2025-02 arXiv - 1
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models 2025-02 arXiv GitHub 8
CrashAgent: Crash Scenario Generation via Multi-modal Reasoning 2025-05 arXiv - 2
BENCH2ADVLM: A Closed-Loop Benchmark for Vision-language Models in Autonomous Driving 2025-08 arXiv - 1
Vision Language Model-based Testing of Industrial Autonomous Mobile Robots 2025-08 arXiv - 3
Scenario Analysis (VLM)
Paper Date Venue Code Citation
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving 2023-09 ICCV 2023 - 42
OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data 2023-10 ICRA 2024 - 22
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving 2023-11 ICIL 2024 Workshop on Large Language Models for Agents GitHub 97
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving 2023-11 ICRA 2024 GitHub 96
LLM Multimodal Traffic Accident Forecasting 2023-11 Sensors 2023 MDPI - 59
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations 2024-01 WACVW LLVM-AD 2024 GitHub 32
Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing 2024-02 UR 2024 - 16
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving 2024-03 VLADR 2024 GitHub 39
LATTE: A Real-time Lightweight Attention-based Traffic Accident Anticipation Engine 2024-04 arXiv - 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning 2024-05 CVPR 2025 GitHub 62
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving 2024-06 ECCV 2024 GitHub 96
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving 2024-07 arXiv - 12
Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving 2024-07 IROS 2024 GitHub 15
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving 2024-08 IAVVC 2024 - 9
V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models 2024-08 arXiv - 44
Think-Driver: From Driving-Scene Understanding to Decision-Making with Vision Language Models 2024-09 ECCV 2024 Workshop - 3
VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes 2024-10 FLLM 2024 GitHub 14
Generating Out-Of-Distribution Scenarios Using Language Models 2024-11 arXiv - 8
Visual Adversarial Attack on Vision-Language Models for Autonomous Driving 2024-11 arXiv - 17
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases 2024-12 WACV 2025 GitHub 53
SFF Rendering-Based Uncertainty Prediction using VisionLLM 2024-12 AAAI 2025 Workshop LM4Plan - 8
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives 2025-01 arXiv GitHub 53
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory 2025-01 arXiv - 6
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding 2025-01 IAVVC 2024 - 9
DriveLM: Driving with Graph Visual Question Answering 2025-01 ECCV 2024 GitHub 310
Scenario Understanding of Traffic Scenes Through Large Visual Language Models 2025-01 WACV 2025 - 6
INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation 2025-02 arXiv - 8
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models 2025-02 arXiv GitHub 8
Evaluating Multimodal Vision-Language Model Prompting Strategies for Visual Question Answering in Road Scene Understanding 2025-02 WACV workshop 2025 - 11
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving 2025-03 arXiv - 6
AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models 2025-03 arXiv GitHub 2
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding 2025-03 arXiv GitHub 17
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation 2025-03 arXiv GitHub 34
ChatBEV: A Visual Language Model that Understands BEV Maps 2025-03 arXiv - 2
Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios 2025-04 arXiv GitHub 6
Vision Foundation Model Embedding-Based Semantic Anomaly Detection 2025-05 ICRA 2025 Workshop - 3
OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions 2025-05 arXiv GitHub 2
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models 2025-05 arXiv GitHub 10
Bridging Human Oversight and Black-box Driver Assistance: Vision-Language Models for Predictive Alerting in Lane Keeping Assist systems 2025-05 arXiv - 2
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving 2025-06 arXiv - 46
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios 2025-06 arXiv - 1
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving 2025-06 arXiv - 0
DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction 2025-07 arXiv GitHub 0
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation 2025-07 ACMMM 2025 GitHub 0
DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving 2025-08 arXiv GitHub 2
NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving 2025-09 arXiv - 1
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking 2025-09 arXiv - 2
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning 2025-09 IROS 2025 GitHub 0
More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models 2025-10 arXiv - 0

🌟 Multimodal Large Language Models for Autonomous Driving

Scenario Generation (MLLM)
Paper Date Venue Code Citation
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model 2024-11 arXiv - 8
LMM-enhanced Safety-Critical Scenario Generation for Autonomous Driving System Testing From Non-Accident Traffic Videos 2025-01 arXiv GitHub 6
Multi-modal Traffic Scenario Generation for Autonomous Driving System Testing 2025-06 FSE 2025 - 0
Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model 2025-06 CVPR 2025 WDFM-AD GitHub 3
Scenario Analysis (MLLM)
Paper Date Venue Code Citation
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model 2023-10 IEEE Robotics and Automation Letters 2024 GitHub 439
Dolphins: Multimodal Language Model for Driving 2023-12 ECCV 2024 GitHub 109
AccidentGPT: Accident analysis and prevention from V2X Environmental Perception with Multi-modal Large Model 2023-12 IV 2024 GitHub 31
Lidar-llm: Exploring the potential of large language models for 3d lidar understanding 2023-12 AAAI 2025 GitHub 98
LingoQA: Visual Question Answering for Autonomous Driving 2023-12 ECCV 2024 GitHub 82
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models 2024-01 CVPR 2024 GitHub 73
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding 2024-01 CVPR 2024 GitHub 51
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-Grained Spatial-Temporal Understanding 2024-06 ECCV 2024 GitHub 18
Semantic Understanding of Traffic Scenes with Large Vision Language Models 2024-06 IV 2024 GitHub 27
VLAAD: Vision and Language Assistant for Autonomous Driving 2024-06 WACVW 2024 GitHub 49
InternDrive: A Multimodal Large Language Model for Autonomous Driving Scenario Understanding 2024-07 AIAHPC 2024 - 5
LingoQA: Visual Question Answering for Autonomous Driving 2024-09 ECCV 2024 GitHub 82
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events 2024-09 Vehicles 2024 MDPI - 6
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios 2024-12 arXiv GitHub 7
Distilling Multi-modal Large Language Models for Autonomous Driving 2025-01 CVPR 2025 - 19
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes 2025-02 ICML 2025 GitHub 10
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding 2025-02 WACV Workshop 2025 GitHub 9
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning 2025-02 arXiv - 15
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving 2025-03 arXiv - 1
HiLM-D: Enhancing MLLMs with Multi-Scale High-Resolution Details for Autonomous Driving 2025-03 International Journal of Computer Vision - 53
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models 2025-03 arXiv - 10
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving 2025-03 arXiv - 1
Tracking Meets Large Multimodal Models for Driving Scenario Understanding 2025-03 arXiv GitHub 2
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment 2025-03 CVPR 2025 GitHub 25
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving 2025-04 arXiv - 2
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding 2025-04 arXiv GitHub 4
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving 2025-05 arXiv - 0
X-Driver: Explainable Autonomous Driving with Vision-Language Models 2025-06 arXiv - 2
EMMA: End-to-End Multimodal Model for Autonomous Driving 2025-07 arXiv - 96
SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding 2025-08 arXiv - 1
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving 2025-08 arXiv - 9
Investigating Traffic Accident Detection Using Multimodal Large Language Models 2025-09 IAVVC 2025 - 1

🌟 Diffusion Models for Autonomous Driving

Scenario Generation (Diffusion Models)
Paper Date Venue Code Citation
Guided Conditional Diffusion for Controllable Traffic Simulation 2022-10 ICRA 2023 GitHub 218
Generating Driving Scenes with Diffusion 2023-05 arXiv - 21
DiffScene: Guided Diffusion Models for Safety-Critical Scenario Generation 2023-06 AdvML-Frontiers 2023 - 68
BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout 2023-09 arXiv - 81
DriveSceneGen: Generating Diverse and Realistic Driving Scenarios From Scratch 2023-09 IEEE Robotics and Automation Letters 2024 - 22
MagicDrive: Street View Generation with Diverse 3D Geometry Control 2023-10 ICLR 2024 GitHub 185
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model 2023-10 ECCV 2024 - 82
Language-guided traffic simulation via scene-level diffusion 2023-11 CoRL 2023 - 121
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion 2023-11 NeurIPS 2023 - 62
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving 2023-11 CVPR 2024 GitHub 105
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries 2023-12 ECCV 2024 GitHub 24
Text2Street: Controllable Text-to-image Generation for Street Views 2024-02 ICPR 2024 - 12
ChatTraffic: Text-to-Traffic Generation via Diffusion Model 2024-02 arXiv - 11
GEODIFFUSION: Text-Prompted Geometric Control for Object Detection Data Generation 2024-02 LCLR 2024 GitHub 46
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model 2024-04 ITSC 2024 - 5
Versatile Behavior Diffusion for Generalized Traffic Agent Simulation 2024-04 RSS 2024 GitHub 15
SceneControl: Diffusion for Controllable Traffic Scene Generation 2024-05 ICRA 2024 - 26
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic 2024-07 ECCV 2024 GitHub 16
DrivingGen: Efficient Safety-Critical Driving Video Generation with Latent Diffusion Models 2024-07 ICME 2024 - 9
Controllable Traffic Simulation through LLM-Guided Hierarchical Chain-of-Thought Reasoning 2024-09 arXiv - 0
AdvDiffuser: Generating Adversarial Safety-Critical Driving Scenarios via Guided Diffusion 2024-10 IROS 2023 - 20
Data-driven Diffusion Models for Enhancing Safety in Autonomous Vehicle Traffic Simulations 2024-10 arXiv - 3
DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing 2024-11 arXiv - 3
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout 2024-12 NeurIPS 2024 - 33
Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation 2025-02 arXiv - 1
Causal Composition Diffusion Model for Closed-loop Traffic Generation 2025-02 arXiv - 7
Rolling Ahead Diffusion for Traffic Scene Simulation 2025-02 AAAI 2025 Workshop - 1
AVD2: Accident Video Diffusion for Accident Video Description 2025-03 ICRA 2025 GitHub 16
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance 2025-03 arXiv - 4
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments 2025-03 arXiv - 7
DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models 2025-03 arXiv - 4
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer 2025-04 arXiv - 4
Decoupled Diffusion Sparks Adaptive Scene Generation 2025-04 ICCV 2025 GitHub 5
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion 2025-05 arXiv - 3
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios 2025-05 arXiv - 5
Dual-Conditioned Temporal Diffusion Modeling for Driving Scene Generation 2025-05 ICAR 2025 GitHub 1
Diffusion Models for Safety Validation of Autonomous Driving Systems 2025-06 arXiv - 1
Diffusion-Based Generation and Imputation of Driving Scenarios from Limited Vehicle CAN Data 2025-09 arXiv - 0
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion 2025-10 NeurIPS 2025 Workshop - 0
Scenario Analysis (Diffusion Models)
Paper Date Venue Code Citation
AVD2: Accident Video Diffusion for Accident Video Description 2025-03 ICRA 2025 GitHub 16

🌟 World Models for Autonomous Driving

World Models for Autonomous Driving
Paper Date Venue Code Application Citation
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving 2023-09 ECCV 2024 GitHub Scenario Generation 219
GAIA-1: A Generative World Model for Autonomous Driving 2023-09 arXiv Wayve - Scenario Generation 369
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction 2023-09 ICRA 2023 GitHub Scenario Generation 55
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion 2023-11 ICLR 2024 - Scenario Generation 81
MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations 2023-11 IV 2025 - Scenario Generation 28
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving 2023-11 CVPR 2024 GitHub Scenario Generation 221
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability 2024-03 NeurIPS 2024 - Scenario Generation 181
MagicDrive: Street View Generation with Diverse 3D Geometry Control 2024-05 arXiv - Scenario Generation 185
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation 2024-05 AAAI 2025 GitHub Scenario Generation 118
UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving 2024-08 RAL 2024 - Scenario Generation 24
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation 2024-08 ECCV 2024 - Scenario Generation 64
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving 2024-08 CVPR 2024 GitHub Scenario Generation 105
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving 2024-08 arXiv GitHub Scenario Generation 45
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment 2024-08 arXiv - Scenario Generation 3
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation 2024-11 CVPR 2025 GitHub Scenario Generation 64
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration 2024-11 arXiv - Scenario Generation 39
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes 2024-11 arXiv GitHub Scenario Generation 52
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control 2024-11 arXiv GitHub Scenario Generation 35
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving 2024-12 arXiv - Scenario Generation 5
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control 2024-12 CVPR 2025 GitHub Scenario Generation 27
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model 2024-12 NeurIPS 2024 - Scenario Generation 53
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT 2024-12 arXiv GitHub Scenario Generation 38
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving 2025-01 AAAI 2025 GitHub Scenario Generation 18
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance 2025-03 ICRA 2025 GitHub Scenario Generation 3
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning 2025-03 arXiv GitHub Scenario Generation 12
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving 2025-03 arXiv - Scenario Generation 31
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space 2025-03 arXiv - Scenario Generation 5
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception 2025-03 arXiv GitHub Scenario Generation 7
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control 2025-04 arXiv GitHub Scenario Generation 3
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment 2025-04 ACMMM 2025 GitHub Scenario Generation 2
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving 2025-05 arXiv - Scenario Generation 88
PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth 2025-05 arXiv - Scenario Generation 2
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos 2025-05 arXiv - Scenario Generation 15
Epona: Autoregressive Diffusion World Model for Autonomous Driving 2025-06 ICCV 2025 GitHub Scenario Generation 29
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation 2025-06 IROS 2025 - Scenario Generation 3
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction 2025-06 CVPR 2025 GitHub Scenario Generation 18
DeepVerse: 4D Autoregressive Video Generation as a World Model 2025-06 arXiv GitHub Scenario Generation 7
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving 2025-07 Commun Eng 4 GitHub Scenario Generation 1
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation 2025-08 ICCV 2025 GitHub Scenario Generation 17

📊 Datasets Comparison

The following figure shows the usage distribution of different foundation model types across autonomous driving datasets:

Datasets Comparison
Dataset Year Img View Real Lidar Radar Traj 3D 2D Lane Weather Time Region Company
CamVid 2009 RGB FPV ✖️ ✖️ ✖️ ✖️ D U -
KITTI 2013 RGB/S FPV ✖️ D U/R/H -
Cyclists 2016 RGB FPV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D U -
Cityscapes 2016 RGB/S FPV ✖️ ✖️ ✖️ ✖️ D U -
SYNTHIA 2016 RGB FPV ✖️ ✖️ ✖️ ✖️ D/N U -
Campus 2016 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D C -
RobotCar 2016 RGB FPV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D/N U -
Mapillary 2017 RGB FPV ✖️ ✖️ ✖️ ✖️ D/N U -
P.F.B. 2017 RGB FPV ✖️ ✖️ ✖️ ✖️ D/N U -
BDD100K 2018 RGB FPV ✖️ ✖️ ✖️ D U/H -
HighD 2018 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ D H -
Udacity 2018 RGB FPV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D U -
KAIST 2018 RGB/S FPV ✖️ ✖️ ✖️ D/N U -
Argoverse 2019 RGB/S FPV ✖️ ✖️ ✖️ D/N U -
TRAF 2019 RGB FPV ✖️ ✖️ ✖️ ✖️ D U -
ApolloScape 2019 RGB/S FPV ✖️ ✖️ ✖️ D U -
ACFR 2019 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D RA -
H3D 2019 RGB FPV ✖️ ✖️ ✖️ ✖️ D U -
INTERACTION 2019 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D I/RA -
Comma2k19 2019 RGB FPV ✖️ ✖️ ✖️ ✖️ ✖️ D/N U/S/R/H -
InD 2020 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D I -
RounD 2020 RGB BEV ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ D RA -
nuScenes 2020 RGB FPV ✖️ D/N U -
Lyft Level 5 2020 RGB FPV ✖️ D/N U/S -
Waymo Open 2020 RGB FPV D/N U -
A*3D 2020 RGB FPV D/N U -
RobotCar Radar 2020 RGB FPV D/N U -
Toronto3D 2020 RGB BEV ✖️ ✖️ ✖️ D/N U University of Waterloo
A2D2 2020 RGB FPV ✖️ D U/H/S/R
WADS 2020 RGB FPV ✖️ ✖️ D/N U/S/R Michigan Technological University
Argoverse 2 2021 RGB/S FPV ✖️ ✖️ D/N U -
PandaSet 2021 RGB FPV D/N U -
ONCE 2021 RGB FPV D/N U -
Leddar PixSet 2021 RGB FPV ✖️ ✖️ D/N U/S/R Leddar
ZOD 2022 RGB FPV D/N U/R/S/H Zenseact
IDD-3D 2022 RGB FPV ✖️ ✖️ ✖️ ✖️ - R INAI
CODA 2022 RGB FPV D/N U/S/R Huawei
SHIFT 2022 RGB FPV D/N U/S/R/H ETH Zürich
DeepAccident 2023 RGB/S FPV/BEV ✖️ ✖️ ✖️ D/N U/S/R/H HKU, Huawei, CARLA
Dual_Radar 2023 RGB FPV ✖️ D/N U Tsinghua University
V2V4Real 2023 RGB FPV ✖️ ✖️ ✖️ - U/H/S UCLA Mobility Lab
SCaRL 2024 RGB/S FPV/BEV ✖️ D/N U/S/R/H Fraunhofer CARLA
MARS 2024 RGB FPV D/N U/S/H NYU, MAY Mobility
Scenes101 2024 RGB FPV ✖️ ✖️ ✖️ ✖️ D/N U/S/R/H Wayve
TruckScenes 2025 RGB FPV ✖️ D/N H/U MAN

Notes: View: FPV=First-Person, BEV=Bird's-Eye; Time: D=Day, N=Night; Region: U=Urban, R=Rural, H=Highway, S=Suburban, C=Campus, I=Intersection, RA=Road Area; Img: RGB/S=RGB+Stereo

🎮 Simulators

The following figure shows the usage distribution of different foundation model types across autonomous driving simulators:

Simulators
Simulator Year Back-end Open Source Realistic Perception Custom Scenario Real World Map Human Design Map Python API C++ API ROS API Company
TORCS 2000 None ✖️ ✖️ ✖️ ✖️ ✖️ -
Webots 2004 ODE ✖️ ✖️ -
CarRacing 2017 None ✖️ ✖️ ✖️ ✖️ ✖️ -
CARLA 2017 UE4 ✖️ -
SimMobilityST 2017 None ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ -
GTA-V 2017 RAGE ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ -
highway-env 2018 None ✖️ ✖️ ✖️ ✖️ -
Deepdrive 2018 UE4 ✖️ ✖️ -
esmini 2018 Unity ✖️ ✖️ ✖️ ✖️ ✖️ ✖️ -
AutonoViSim 2018 PhysX ✖️ ✖️ ✖️ ✖️ ✖️ -
AirSim 2018 UE4 ✖️ ✖️ -
SUMO 2018 None ✖️ ✖️ ✖️ -
Apollo 2018 Unity ✖️ -
Sim4CV 2018 UE4 ✖️ ✖️ ✖️ -
MATLAB 2018 MATLAB ✖️ Mathworks
Scenic 2019 None ✖️ ✖️ Toyota Research Institute, UC Berkeley
SUMMIT 2020 UE4 ✖️ ✖️ -
MultiCarRacing 2020 None ✖️ ✖️ ✖️ ✖️ -
SMARTS 2020 None ✖️ ✖️ -
LGSVL 2020 Unity -
CausalCity 2020 UE4 ✖️ ✖️ -
Vista 2020 None ✖️ ✖️ ✖️ MIT
MetaDrive 2021 Panda3D ✖️ -
L2R 2021 UE4 ✖️ -
AutoDRIVE 2021 Unity -
Nuplan 2021 None ✖️ ✖️ Motional
AWSIM 2021 Unity ✖️ ✖️ Autoware
InterSim 2022 None ✖️ ✖️ ✖️ Tsinghua
Nocturne 2022 None ✖️ Facebook
BeamNG.tech 2022 Soft-body physics ✖️ ✖️ ✖️ BeamNG GmbH
Waymax 2023 JAX ✖️ ✖️ ✖️ Waymo
UNISim 2023 None ✖️ ✖️ ✖️ ✖️ Waabi
TBSim 2023 None ✖️ ✖️ NVIDIA
Nvidia DriveWorks 2024 Nvidia GPU ✖️ ✖️ ✖️ NVIDIA

🏆 Foundation Model Benchmark Challenges (2022–2025)

Benchmark Challenges

Autonomous Driving

Name Host
CARLA AD Challenge CARLA
DRL4Real ICCV
Waymo Open Dataset Challenge Waymo / CVPR WAD
Argoverse 2: Scenario Mining ArgoAI
Roboflow-20VL Roboflow-VL / CVPR
AVA Challenge AVA Challenge Team

Other Fields Related to Generation and Analysis

Name Host
IGLU Challenge NeurIPS / IGLU Team
LLM Efficiency Challenge NeurIPS
Trojan Detection NeurIPS / CAIS
SMART-101 CVPR
NICE Challenge CVPR / LG Research
SyntaGen CVPR
Habitat Challenge CVPR / FAIR
BIG-bench Google Research
BIG-bench Hard (BBH) Google Research
HELM Stanford CRFM
MMBench OpenCompass
MMMU CVPR / U-Waterloo / OSU
Open LLM Leaderboard VILA-Lab
Text-to-Image Leaderboard Artificial Analysis
Ego4D FAIR
VizWiz Grand Challenge CVPR VizWiz Workshop
MedFM NeurIPS / Shanghai AI Laboratory
3D Scene Understanding CVPR

🔗 Useful Resources and Links

Common Tools and Frameworks

This section provides links to commonly used tools, frameworks, and resources for working with foundation models in autonomous driving.

Model Repositories and Leaderboards

Model Inference Frameworks

  • vLLM - High-throughput and memory-efficient inference engine for LLMs
  • LMDeploy - Toolkit for compressing, deploying, and serving LLMs
  • Ollama - Run large language models locally
  • Text Generation Inference - Production-ready inference container by Hugging Face
  • TensorRT-LLM - High-performance inference library by NVIDIA

Training and Fine-tuning

Contributing

We welcome contributions from the community! If you have research papers, tools, or resources to add, please create a pull request or open an issue.

License

This repository is released under the Apache 2.0 license.

About

This repository collects research papers of large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. The repository will be continuously updated to track the latest update.

Topics

Resources

License

Stars

Watchers

Forks

Contributors 2

  •  
  •