A curated list of papers on reinforcement learning for video generation

wendell0218/Awesome-RL-for-Video-Generation

Awesome-RL-for-Video-Generation

🤖 Introduction

Welcome to the GitHub repository for Awesome-RL-for-Video-Generation! This repository serves as a curated collection of research, resources, and tools related to Reinforcement Learning (RL) for Video Generation. Our goal is to provide an up-to-date and comprehensive overview of RL techniques used in video generation, focusing on the latest advancements. We aim to bridge the gap between RL theory and real-world applications in video generation tasks, offering a solid foundation for future research and development in this field. We hope this repository will serve as a valuable resource for anyone interested in exploring RL applications in video generation!

🔥 News

  • [February 14, 2025] We have developed an agent that automatically collects and analyzes the latest papers in the RL-based Video Generation field. It updates the Related Papers list daily at 1:00 AM (UTC+8).

🔍 Related Papers

We are committed to offering researchers the latest advancements in the field. By regularly reviewing and evaluating recent research studies, we keep the list of papers up to date.

⚠️ The paper analysis may not be accurate and is for reference only!

Date Paper Contribution Available Link
Feb 2026 Unified Personalized Reward Model for Vision Generation


• Affiliation: Fudan University
• Method Name: UnifiedReward-Flex, Base Model: Wan2.1-T2V-14B, Strategy: GRPO
Feb 2026 FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space


• Affiliation: ByteDance
• Method Name: FSVideo, Base Model: Wan2.1-14B-I2V, Strategy: GRPO
• Method Name: FSVideo, Base Model: Wan2.1-14B-I2V, Strategy: ReFL
Feb 2026 PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

• Affiliation: Microsoft
• Method Name: PISCES, Base Model: HunyuanVideo, Strategy: GRPO
• Method Name: PISCES, Base Model: VideoCrafter2, Strategy: GRPO
Jan 2026 SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation




• Affiliation: Zhejiang University
• Method Name: RL-Video-Gen, Base Model: Qwen2-VL-7B, Strategy: GRPO
• Benchmark Name: VideoGenBench, Data Number: 5000, Evaluation Metric: FID
Jan 2026 The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation



• Affiliation: Tencent Hunyuan Multimodal Department
• Method Name: ScripterAgent, Base Model: Qwen-Omni-7B, Strategy: GRPO
• Benchmark Name: ScriptBench, Data Number: 1750, Evaluation Metric: Visual-Script Alignment (VSA)
Jan 2026 SkyReels-V3 Technique Report




• Affiliation: Zhejiang University
• Method Name: RL-Video-Gen, Base Model: Qwen2-VL-7B, Strategy: GRPO
• Benchmark Name: VideoGenBench, Data Number: 5000, Evaluation Metric: FID
Jan 2026 A Mechanistic View on Video Generation as World Models: State and Dynamics

• Affiliation: Hong Kong University of Science and Technology (Guangzhou)
• Paper Number: 188
Jan 2026 From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models

• Affiliation: University of Oxford
• Paper Number: 49
Jan 2026 MVGD-Net: A Novel Motion-aware Video Glass Surface Detection Network




• Affiliation: Zhejiang University
• Method Name: RL-Video-Gen, Base Model: Qwen2-VL-7B, Strategy: GRPO
• Benchmark Name: VideoGenBench, Data Number: 5000, Evaluation Metric: FID
Jan 2026 CroBIM-V: Memory-Quality Controlled Remote Sensing Referring Video Object Segmentation




• Affiliation: Zhejiang University
• Method Name: RL-Video-Gen, Base Model: Qwen2-VL-7B, Strategy: GRPO
• Benchmark Name: VideoGenBench, Data Number: 5000, Evaluation Metric: FID
Jan 2026 PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models



• Affiliation: Zhejiang University
• Method Name: PhysRVG, Base Model: Wan2.2 5B, Strategy: GRPO
• Benchmark Name: PhysRVGBench, Data Number: 700, Evaluation Metric: Intersection over Union (IoU), Trajectory Offset (TO)
Jan 2026 TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment


• Affiliation: The University of Hong Kong
• Method Name: TAGRPO, Base Model: Wan 2.2, Strategy: GRPO
• Method Name: TAGRPO, Base Model: HunyuanVideo-1.5, Strategy: GRPO
• Benchmark Name: TAGRPO-Bench, Data Number: 200, Evaluation Metric: Q-Save (Visual Quality, Dynamic Quality, Image Alignment), HPSv3
Jan 2026 Diffusion-DRF: Differentiable Reward Flow for Video Diffusion Fine-Tuning

• Affiliation: Northeastern University
• Method Name: Diffusion-DRF, Base Model: Wan2.1-1.3B-T2V, Strategy: Differentiable Reward Fine-tuning
Jan 2026 Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

• Affiliation: Harbin Institute of Technology
• Method Name: LocalDPO, Base Model: Wan2.1-1.3B, Strategy: DPO
• Method Name: LocalDPO, Base Model: CogVideoX-2B, Strategy: DPO
• Method Name: LocalDPO, Base Model: CogVideoX-5B, Strategy: DPO
Jan 2026 Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model


• Affiliation: University of Science and Technology of China
• Method Name: REACT, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: REACT-Bench, Data Number: 2600, Evaluation Metric: Accuracy, F1-score, Precision, Recall
Jan 2026 A Versatile Multimodal Agent for Multimedia Content Generation


• Affiliation: University of Rochester
• Method Name: MultiMedia-Agent, Base Model: MiniCPM-V2, Strategy: DPO
• Benchmark Name: 18 real world task types, Data Number: 1260, Evaluation Metric: Dover Score, Pick Score, Human Alignment, Aesthetic Score, Psychological Appealing, Audio Video Alignment
Dec 2025 PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation



• Affiliation: Meta Superintelligence Labs
• Method Name: PhyGDPO, Base Model: Wan2.1-T2V-14B, Strategy: DPO
• Benchmark Name: PhyVidGen-135K, Data Number: 135K, Evaluation Metric:
Dec 2025 SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models


• Affiliation: Huazhong University of Science and Technology
• Method Name: SoliReward, Base Model: HunyuanVideo, Strategy: GRPO
• Method Name: SoliReward, Base Model: HunyuanVideo, Strategy: DPO
• Benchmark Name: subject deformity and physical plausibility benchmark, Data Number: 50000, Evaluation Metric: RM Accuracy
Dec 2025 DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation


• Affiliation: ByteDance
• Method Name: DreaMontage, Base Model: Seedance 1.0, Strategy: DPO
Dec 2025 VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization


• Affiliation: Brown University
• Method Name: VIVA, Base Model: HunyuanVideo-T2V-13B, Strategy: GRPO
Dec 2025 Kling-Omni Technical Report


• Affiliation: Kuaishou Technology
• Method Name: Kling-Omni, Base Model: , Strategy: DPO
Dec 2025 What Happens Next? Next Scene Prediction with a Unified Video Model



• Affiliation: Pennsylvania State University
• Method Name: unified video model, Base Model: Qwen-VL, LTX, Strategy: GRPO
• Benchmark Name: NSP dataset, Data Number: 0.97M samples for SFT, 8K samples for RL, 1K samples for test, Evaluation Metric: causal consistency
Dec 2025 OmniPerson: Unified Identity-Preserving Pedestrian Generation




• Affiliation: Zhejiang University
• Method Name: RL-Video-Gen, Base Model: Qwen2-VL-7B, Strategy: GRPO
• Benchmark Name: VideoGenBench, Data Number: 5000, Evaluation Metric: FID
Dec 2025 PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models



• Affiliation: Sun Yat-sen University
• Method Name: Physical-Aware DPO, Base Model: WanX2.1 1.3B, Strategy: DPO
• Benchmark Name: PID (Physical Implausibility Detection) dataset, Data Number: 3088, Evaluation Metric: F1 Score
Nov 2025 McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning


• Affiliation: Tongyi Lab, Alibaba Group
• Method Name: McSc, Base Model: Qwen2-VL-7B-Instruct, Strategy: GRPO
• Method Name: McDPO, Base Model: Wan2.1-T2V-1.3B, Strategy: DPO
Nov 2025 Diverse Video Generation with Determinantal Point Process-Guided Policy Optimization



• Affiliation: Virginia Tech
• Method Name: DPP-GRPO, Base Model: Qwen2-7b-Instruct, Strategy: GRPO
• Benchmark Name: diverse video-prompt dataset, Data Number: 30,000, Evaluation Metric: TIE, TCE, CLIP
Nov 2025 Growing with the Generator: Self-paced GRPO for Video Generation

• Affiliation: University of Science and Technology of China
• Method Name: Self-Paced GRPO, Base Model: Wan2.1-T2V, Strategy: GRPO
• Method Name: Self-Paced GRPO, Base Model: HunyuanVideo, Strategy: GRPO
Nov 2025 PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection

• Affiliation: Beijing Institute of Technology
• Method Name: PhysCorr, Base Model: , Strategy: DPO
• Method Name: PhysicsRM, Base Model: LLaVA-Video-Qwen2-7B, Strategy: supervised learning with Huber loss
• Method Name: PhyDPO, Base Model: , Strategy: reweighted DPO
Nov 2025 Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation

• Affiliation: ByteDance
• Method Name: Reg-DPO, Base Model: Wan2.1-I2V-14B-720P, Strategy: DPO
Nov 2025 CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World



• Affiliation: Northwestern Polytechnical University, Xi’an, Shaanxi, China
• Method Name: CUE-R1, Base Model: Qwen2.5-VL-3B, Strategy: GRPO
• Benchmark Name: CUEBENCH, Data Number: 2950, Evaluation Metric: hierarchy score
Nov 2025 ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation


• Affiliation: Peking University
• Method Name: ID-COMPOSER, Base Model: Wan-Video-1.3B, Strategy: Flow-GRPO
• Benchmark Name: OpenS2V-Nexus, Data Number: 218230, Evaluation Metric: NexusScore
Nov 2025 World Simulation with Video Foundation Models for Physical AI


• Affiliation: NVIDIA
• Method Name: Cosmos-Predict2.5, Base Model: Cosmos-Reason1, Strategy: GRPO
Oct 2025 Emu3.5: Native Multimodal Models are World Learners



• Affiliation: BAAI
• Method Name: Discrete Diffusion Adaptation, Base Model: Qwen3, Strategy: GRPO
Oct 2025 Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences



• Affiliation: School of Artificial Intelligence, University of Chinese Academy of Sciences
• Method Name: Omni-RewardModel-BT, Base Model: MiniCPM-o-2.6, Strategy: Bradley-Terry
• Method Name: Omni-RewardModel-R1, Base Model: Qwen2.5-VL-7B-Instruct, Strategy: GRPO-based reinforcement learning
• Benchmark Name: Omni-RewardBench, Data Number: 3725, Evaluation Metric: accuracy
Oct 2025 LongCat-Video Technical Report


• Affiliation: Meituan
• Method Name: LongCat-Video, Base Model: WAN2.1 VAE, Strategy: GRPO
Oct 2025 Epipolar Geometry Improves Video Generation Models



• Affiliation: University of Oxford
• Method Name: Epipolar-DPO, Base Model: Wan-2.1, Strategy: DPO
• Benchmark Name: large dataset of over 162,000 generated videos annotated with 3D scene consistency metrics, Data Number: 162000, Evaluation Metric: Sampson epipolar error
Oct 2025 RealDPO: Real or Not Real, that is the Preference



• Affiliation: University of Electronic Science and Technology of China
• Method Name: RealDPO, Base Model: CogVideoX-5B, Strategy: DPO
• Benchmark Name: RealAction-5K, Data Number: 5000, Evaluation Metric: Visual Alignment, Text Alignment, Motion Quality, Human Quality
Oct 2025 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints


• Affiliation: UCAS
• Method Name: ImagerySearch, Base Model: Wan2.1, Strategy: adaptive test-time search strategy
• Benchmark Name: LDT-Bench, Data Number: 2839, Evaluation Metric: ImageryQA
Oct 2025 Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning



• Affiliation: Alibaba Group
• Method Name: Identity-GRPO, Base Model: Qwen2.5-VL-3B, Strategy: GRPO
• Benchmark Name: multi-human identity-preserving preference benchmark, Data Number: 500, Evaluation Metric: Accuracy
Oct 2025 Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization


• Affiliation: Taobao & Tmall Group of Alibaba
• Method Name: IPRO, Base Model: Wan 2.2 I2V, Strategy: reward-guided optimization with KL-divergence regularization and facial scoring mechanism
Oct 2025 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning


• Affiliation: The University of Hong Kong
• Method Name: PhysMaster, Base Model: , Strategy: DPO
Oct 2025 VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator


• Affiliation: ETH Zurich
• Method Name: VIST3A, Base Model: , Strategy: direct reward finetuning
Oct 2025 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback


• Affiliation: Guangzhou Quwan Network Technology
• Method Name: Mask-CFG, Base Model: Wan2.1, Strategy: DPO
Oct 2025 VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning


• Affiliation: CUHK MMLab
• Method Name: VR-Thinker, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
Oct 2025 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration


• Affiliation: Kling Team, Kuaishou Technology
• Method Name: AVoCaDO GRPO, Base Model: Qwen2.5-Omni-7B, Strategy: GRPO
Oct 2025 iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation


• Affiliation: Nanyang Technological University, Singapore
• Method Name: iMoWM, Base Model: , Strategy: model-based RL with DrQ-v2
Oct 2025 Real-Time Motion-Controllable Autoregressive Video Diffusion



• Affiliation: Nanyang Technological University
• Method Name: AR-Drag, Base Model: Wan2.1-1.3B, Strategy: GRPO
• Benchmark Name: motion controllability benchmark, Data Number: 206, Evaluation Metric: Motion Consistency
Oct 2025 Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations



• Affiliation: University of California, Santa Barbara
• Method Name: PresAesth, Base Model: Qwen-2.5-VL-7B, Strategy: GRPO
• Benchmark Name: EvoPresent Benchmark, Data Number: 650 papers, 2000 slide pairs, Evaluation Metric: Perplexity, ROUGE-L, Layout Balance, Aesthetic Scores, MAE, F1-score, Accuracy
Oct 2025 OpusAnimation: Code-Based Dynamic Chart Generation


• Affiliation: Opus AI Research, Brown University
• Method Name: Joint-Code-Visual Reward based Group Relative Policy Optimization (JCVR-GRPO), Base Model: Qwen2.5-VL-3B, Strategy: GRPO
• Benchmark Name: DCG-Bench, Data Number: 700, Evaluation Metric: Execution Pass Rate, QA-based Scores
Oct 2025 MultiModal Action Conditioned Video Generation


• Affiliation: MIT CSAIL
• Method Name: MultiModal Action Conditioned Video Generation, Base Model: I2VGen, Strategy: Video diffusion model with multimodal action conditioning and feature regularization
Oct 2025 MultiModal Action Conditioned Video Generation


• Affiliation: MIT CSAIL
• Method Name: MultiModal Action Conditioned Video Generation, Base Model: , Strategy: Latent space projection and regularization with diffusion-based video generation
Oct 2025 Self-Forcing++: Towards Minute-Scale High-Quality Video Generation


• Affiliation: UCLA
• Method Name: Self-Forcing++, Base Model: Wan2.1-T2V-1.3B, Strategy: GRPO (Group Relative Policy Optimization)
Oct 2025 VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL



• Affiliation: Department of Computer Science, University of Texas at Austin, Austin, TX, USA
• Method Name: VidGuard-R1, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: VidGuard-R1-CoT-30k, Data Number: 30000, Evaluation Metric: Top-1 accuracy
• Benchmark Name: VidGuard-R1-RL-100k, Data Number: 100000, Evaluation Metric: Top-1 accuracy
Oct 2025 InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents


• Affiliation: Shanghai Jiao Tong University
• Benchmark Name: InfoMosaic-Bench, Data Number: 621, Evaluation Metric: Accuracy, Pass Rate
Oct 2025 Poolformer: Recurrent Networks with Pooling for Long-Sequence Modeling

• Affiliation:
• Method Name: Poolformer, Base Model: , Strategy: Recurrent neural networks with pooling operations for long-sequence modeling
Oct 2025 EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels


• Affiliation: University of Bristol
• Benchmark Name: EvoStruggle, Data Number: 2793 videos, 5385 annotated temporal struggle segments, Evaluation Metric: mAP at different IoU thresholds (0.3, 0.5, 0.7)
Oct 2025 LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration

• Affiliation: Laboratoire MAP5, UMR 8145, Université Paris Cité, CNRS
• Method Name: LATINO, Base Model: , Strategy: Bayesian Langevin posterior sampling with Video Consistency Models (VCMs) and Image Consistency Models (ICMs)
Oct 2025 LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration

• Affiliation: Laboratoire MAP5, UMR 8145, Université Paris Cité, CNRS
• Method Name: LATINO, Base Model: , Strategy: Langevin posterior sampling with stochastic auto-encoder steps
Oct 2025 EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory


• Affiliation: Johns Hopkins University
• Benchmark Name: Spatial360, Data Number: 58000+, Evaluation Metric: FVD, LMSE, LPIPS, PSNR, SSIM, MEt3R, AUC@30
Sep 2025 Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation


• Affiliation: Stability AI
• Benchmark Name: StableCinemetrics, Data Number: 20K videos, Evaluation Metric: human evaluation (1-5 scale)
Sep 2025 How Far Do Time Series Foundation Models Paint the Landscape of Real-World Benchmarks?


• Affiliation: University of Luxembourg
• Benchmark Name: REAL-V-TSFM, Data Number: 6130, Evaluation Metric: MAPE, sMAPE, Agg. Relative WQL, Agg. Relative MASE
Sep 2025 V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs

• Affiliation: Shanghai Jiao Tong University
• Benchmark Name: v-HUB, Data Number: 960, Evaluation Metric: BERTScore, SentBERT, METEOR
Sep 2025 Visual Jigsaw Post-Training Improves MLLMs


• Affiliation: S-Lab, Nanyang Technological University
• Method Name: Visual Jigsaw, Base Model: Qwen2.5-VL-7B-Instruct, Strategy: GRPO
Sep 2025 FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation


• Affiliation: Peking University, Shenzhen Graduate School
• Method Name: FlashI2V, Base Model: , Strategy: Flow Matching (FM) with Fourier-Guided Latent Shifting
Sep 2025 World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training

• Affiliation: School of Computer Science and Engineering, Sun Yat-sen University, China
• Method Name: World-Env, Base Model: OpenVLA-OFT, Strategy: PPO
Sep 2025 Fidelity-Aware Data Composition for Robust Robot Generalization

• Affiliation: UCAS-Terminus AI Lab, University of Chinese Academy of Sciences
• Method Name: Coherent Information Fidelity Tuning (CIFT), Base Model: Cosmos-Predict2-2B-Video2World, Strategy: Feature-Space Signal-to-Noise Ratio optimization for data composition
• Method Name: Multi-View Video Augmentation (MV Aug), Base Model: Cosmos-Predict2-2B-Video2World, Strategy: Latent diffusion transformer with periodic cross-view attention for video-to-video synthesis
Sep 2025 IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?


• Affiliation: Shanghai AI Lab, Zhejiang University
• Benchmark Name: IWR-Bench, Data Number: 113, Evaluation Metric: Interactive Functionality Score (IFS) and Visual Fidelity Score (VFS)
Sep 2025 Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs


• Affiliation: Institute of Cognitive Science, Osnabrück University, Osnabrück, Germany
• Benchmark Name: SPLICE, Data Number: 3381, Evaluation Metric: Binary Accuracy, Hamming Accuracy, Longest Common Subsequence, Edit Distance
Sep 2025 PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control


• Affiliation: The University of Manchester
• Method Name: PoseDiff, Base Model: , Strategy: DDPM (Denoising Diffusion Probabilistic Model)
Sep 2025 NeMo: Needle in a Montage for Video-Language Understanding


• Affiliation: The Chinese University of Hong Kong
• Benchmark Name: NeMoBench, Data Number: 31,378, Evaluation Metric: Recall@1x (tIoU=0.7), Recall@1x (tIoU=0.5), Average mAP
Sep 2025 Training Agents Inside of Scalable World Models


• Affiliation: Google DeepMind
• Method Name: Dreamer 4, Base Model: , Strategy: PMPO (Preference optimization as probabilistic inference) with task-conditioned policy and reward modeling
Sep 2025 Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers

• Affiliation: Apple
• Method Name: SALT (Static-teacher Asymmetric Latent Training), Base Model: , Strategy: Two-stage self-supervised learning with frozen teacher for video representation learning
Sep 2025 Reinforcement Learning with Inverse Rewards for World Model Post-training

• Affiliation: Microsoft Research
• Method Name: Reinforcement Learning with Inverse Rewards (RLIR), Base Model: , Strategy: Group Relative Policy Optimization (GRPO)
Sep 2025 AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities

• Affiliation: The University of Tokyo, Tokyo, Japan
• Benchmark Name: AssemblyHands-X, Data Number: , Evaluation Metric:
Sep 2025 ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis

• Affiliation: Alibaba Group
• Method Name: ReWatch-R1, Base Model: Qwen2.5-VL-7B, Strategy: GRPO (Group Relative Policy Optimization)
Sep 2025 WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving


• Affiliation: Nankai University
Sep 2025 VideoScore2: Think before You Score in Generative Video Evaluation



• Affiliation: University of Illinois Urbana-Champaign
• Method Name: VIDEOSCORE2, Base Model: Qwen2.5-VL-7B-Instruct, Strategy: Group Relative Policy Optimization (GRPO)
• Benchmark Name: VIDEOSCORE-BENCH-V2, Data Number: 500, Evaluation Metric: Accuracy, Relaxed Accuracy, PLCC
Sep 2025 Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs


• Affiliation: Princeton University
• Benchmark Name: DEEPTRACEREWARD, Data Number: 4334, Evaluation Metric: Accuracy, Explanation score, BBox IoU, BBox Distance, Time Distance
Sep 2025 WoW: Towards a World omniscient World model Through Embodied Interaction



• Affiliation: Beijing Innovation Center of Humanoid Robotics
• Method Name: WoW, Base Model: Cosmos2, Strategy: GRPO
• Method Name: SOPHIA, Base Model: , Strategy: Self-optimizing framework with critic-refiner loop
• Benchmark Name: WoWBench, Data Number: 606, Evaluation Metric: FVD, SSIM, PSNR, DINO, Dreamsim, Mask-guided Regional Consistency, Instruction Understanding, Physical common sense, Planning and Task Decomposition
Sep 2025 Drag4D: Align Your Motion with Text-Driven 3D Scene Generation


• Affiliation: KAIST
• Method Name: Local-Global DragAnything, Base Model: , Strategy: Motion-conditioned video diffusion with part-augmented trajectory guidance
• Benchmark Name: Drag4D-30, Data Number: 30, Evaluation Metric: CLIP-Score, Sharp, Colorful, Quality, PSNR, SSIM
Sep 2025 StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing


• Affiliation:
• Method Name: StableDub, Base Model: , Strategy: Diffusion-based visual dubbing with lip-habit-modulated mechanism and occlusion-aware training strategy
Sep 2025 DiTraj: training-free trajectory control for video diffusion transformer


• Affiliation: Beijing University of Posts and Telecommunications
• Method Name: DiTraj, Base Model: Wan2.1, CogVideoX, Strategy: Foreground-background separation guidance and STD-RoPE position embedding modification
Sep 2025 Can AI Perceive Physical Danger and Intervene?


• Affiliation: Google DeepMind Robotics
• Benchmark Name: ASIMOV-2.0, Data Number: 319, Evaluation Metric: Latent risk accuracy, Latent risk severity accuracy, Action effect accuracy, Activated risk accuracy
• Benchmark Name: ASIMOV-2.0-Video, Data Number: 287, Evaluation Metric: Injury risk accuracy, Latent risk and severity accuracy, Last intervention timestamp MAE, Intervention rate
• Benchmark Name: ASIMOV-2.0-Constraints, Data Number: 164, Evaluation Metric: Constraint violation rate
Sep 2025 VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding


• Affiliation: Carnegie Mellon University
• Method Name: VideoJudge, Base Model: Qwen2.5-VL, Strategy: Generator-evaluator bootstrapping with iterative refinement and feedback
• Benchmark Name: VideoJudgeLLaVA-MetaEval, Data Number: , Evaluation Metric: RMSE, MAE, Spearman, Pearson, ECE, PSup, Delta(C-D)
• Benchmark Name: VideoJudgeVCG-MetaEval, Data Number: , Evaluation Metric: RMSE, MAE, Spearman, Pearson, ECE, PSup, Delta(C-D)
• Benchmark Name: VideoJudge-Pairwise, Data Number: , Evaluation Metric: Accuracy
• Benchmark Name: VideoJudge-Pairwise-H, Data Number: 200, Evaluation Metric: Accuracy
Sep 2025 MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning


• Affiliation: HKUST (GZ)
• Method Name: MOSS-ChatV, Base Model: Qwen2.5-7B, Strategy: GRPO
• Benchmark Name: MOSS-Video, Data Number: 11654, Evaluation Metric: accuracy
Sep 2025 VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception



• Affiliation: Zhejiang University
• Method Name: VTTS, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: VTTS-80K, Data Number: 80000, Evaluation Metric:
Sep 2025 KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models


• Affiliation: Department of Electronic Engineering, BNRist, Tsinghua University
• Method Name: KeyWorld, Base Model: CogVideoX1.5-5B-I2V, Strategy: Diffusion Transformer fine-tuning with motion-aware key frame generation and interpolation
Sep 2025 LLM Trainer: Automated Robotic Data Generating via Demonstration Augmentation using LLMs


• Affiliation: Carnegie Mellon University
• Method Name: LLM Trainer, Base Model: , Strategy: Thompson Sampling for multi-armed bandit optimization of demonstration annotations
Sep 2025 SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding


• Affiliation: IIT Ropar, India
• Method Name: SynchroRaMa, Base Model: Stable Diffusion 1.5, Strategy: Diffusion-based generation with multi-modal emotion embedding and audio-to-motion alignment
Sep 2025 When Words Can't Capture It All: Towards Video-Based User Complaint Text Generation with Multimodal Video Complaint Dataset


• Affiliation: Indian Institute of Technology Patna
• Benchmark Name: ComVID, Data Number: 1175, Evaluation Metric: CR score, BLEU, ROUGE, BERTScore, MoverScore, METEOR, Perplexity, Flesch Reading Ease, Coleman-Liau Index
Sep 2025 Talking Head Generation via AU-Guided Landmark Prediction

• Affiliation: Stony Brook University
• Method Name: Variational Motion Generator (VMG), Base Model: , Strategy: Conditional Variational Autoencoder with flow-based prior and dilated convolutional architecture
Sep 2025 From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition



• Affiliation: National Yang Ming Chiao Tung University
• Benchmark Name: Controlled-Attribute-Transition Benchmark (CAT-Bench), Data Number: 120, Evaluation Metric: Wholistic Transition Score, Frame-wise Transition Score
Sep 2025 EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data


• Affiliation: Georgia Institute of Technology
• Method Name: EgoBridge, Base Model: , Strategy: Optimal Transport (OT) with Dynamic Time Warping (DTW) cost function for domain adaptation between human and robot data
Sep 2025 VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction


• Affiliation: Waseda University
• Benchmark Name: VIR-Bench, Data Number: 200, Evaluation Metric: F1 score
Sep 2025 VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation


• Affiliation: University of Texas at Austin
• Method Name: VLN-Zero, Base Model: , Strategy: vision-language model guided exploration with neurosymbolic navigation, hierarchical caching, and constraint-satisfying action generation
Sep 2025 ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion



• Affiliation: Department of Computer Science, The University of Texas at Austin
• Method Name: ComposableNav, Base Model: , Strategy: Denoising Diffusion Policy Optimization (DDPO) and PPO
Sep 2025 $\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation

• Affiliation: Santa Clara University
• Benchmark Name: M3VIR, Data Number: 43200, Evaluation Metric: PSNR, SSIM, LPIPS, FID, DISTS
Sep 2025 Video-to-BT: Generating Reactive Behavior Trees from Human Demonstration Videos for Robotic Assembly


• Affiliation: Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany
• Method Name: Video-to-BT, Base Model: , Strategy: Behavior Tree-based execution with recovery mechanism
Sep 2025 Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization


• Affiliation: Korea University
• Method Name: CaRe-DPO, Base Model: VideoChat-Flash-7B, Strategy: DG-DPO (Dual-Group Direct Preference Optimization)
Sep 2025 RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

• Affiliation: SKL-IOTSC, Computer and Information Science, University of Macau
• Method Name: RLGF, Base Model: , Strategy: Reinforcement Learning with Geometric Feedback (specifically using LoRA-based optimization with latent-space windowing and hierarchical geometric rewards)
Sep 2025 Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech


• Affiliation: Australian National University
Sep 2025 PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models

• Affiliation: Intelligent Robotics Laboratory, Skolkovo Institute of Science and Technology (Skoltech), Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russia
Sep 2025 RewardDance: Reward Scaling in Visual Generation

• Affiliation: ByteDance Seed
• Method Name: RewardDance, Base Model: InternVL, Strategy: ReFL
Sep 2025 GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts

• Affiliation: New York University
• Benchmark Name: GeneVA, Data Number: 16356, Evaluation Metric: Average Precision (AP) scores at various IoU thresholds
Sep 2025 BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

• Affiliation: Peking University
• Method Name: BranchGRPO, Base Model: Wan2.1-1.3B, Strategy: GRPO
Sep 2025 Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching

• Affiliation: CreateAI (https://www.iamcreate.ai/)
• Method Name: Coefficients-Preserving Sampling (CPS), Base Model: SD3.5-M, FLUX.1-schnell, FLUX.1-dev, Strategy: GRPO
Sep 2025 ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory

• Affiliation: State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
• Method Name: ManipDreamer3D, Base Model: , Strategy:
Sep 2025 PixFoundation 2.0: Do Video Multi-Modal LLMs Use Motion in Visual Grounding?


• Affiliation: Not specified
• Benchmark Name: MoCentric-Bench, Data Number: , Evaluation Metric: J (Region similarity), F (Contour accuracy), J&F (Average)
Sep 2025 FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework



• Affiliation: AMAP, Alibaba Group; Tsinghua University
• Method Name: FantasyHSI, Base Model: Wan2.1-I2V-14B, Strategy: DPO
• Benchmark Name: SceneBench, Data Number: 120, Evaluation Metric: Penetration Obstacle Score (POS), Reaction Divergence Score (RDS)
Sep 2025 InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos



• Affiliation: Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
• Benchmark Name: InterPose, Data Number: 73,814, Evaluation Metric:
Aug 2025 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control




• Affiliation: Shanghai AI Laboratory
• Method Name: EO-1, Base Model: Qwen2.5-VL, Strategy: flow matching denoising with auto-regressive decoding
• Benchmark Name: EO-Bench, Data Number: 648, Evaluation Metric: completion score, accuracy
Aug 2025 Dress&Dance: Dress up and Dance as You Like It - Technical Preview



• Affiliation: University of Illinois Urbana-Champaign
• Method Name: Dress&Dance, Base Model: , Strategy: Diffusion-based video generation with CondNet conditioning network, multi-stage progressive training, and curriculum learning
• Benchmark Name: Internet video dataset, Data Number: 80000, Evaluation Metric: PSNR, SSIM, LPIPS VGG, LPIPS AlexNet
• Benchmark Name: Captured video dataset, Data Number: 18300, Evaluation Metric: PSNR, SSIM, LPIPS VGG, LPIPS AlexNet
Aug 2025 Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation



• Affiliation: IEIT System Co., Ltd.
• Method Name: GRPO, Base Model: , Strategy: Group Relative Policy Optimization (GRPO)
• Benchmark Name: Droplet3D-4M, Data Number: 4 million, Evaluation Metric: PSNR, SSIM, LPIPS, MSE, CLIP-S
Aug 2025 InfinityHuman: Towards Long-Term Audio-Driven Human Animation


• Affiliation: ByteDance
• Method Name: InfinityHuman, Base Model: , Strategy: reward feedback learning
Aug 2025 Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling


• Affiliation: Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh
• Method Name: Context-Aware Zero-Shot Anomaly Detection, Base Model: , Strategy: Contrastive and Predictive Spatiotemporal Modeling with InfoNCE and CPC losses
Aug 2025 Text-Driven 3D Hand Motion Generation from Sign Language Data



• Affiliation: LIGM, École des Ponts, IP Paris, Univ Gustave Eiffel, CNRS
• Method Name: HandMDM, Base Model: , Strategy: Diffusion models (not RL-based)
• Benchmark Name: BOBSL3DT, Data Number: 1312339, Evaluation Metric: R@1, R@3, FID
Aug 2025 Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors


• Affiliation: Beihang University
• Method Name: GroupSketch, Base Model: , Strategy: Score Distillation Sampling (SDS)
Aug 2025 TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification


• Affiliation: Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
• Method Name: Temporal Prompt Alignment (TPA), Base Model: , Strategy: Contrastive Learning with Margin-Hinge Loss
• Method Name: Conditional Variational Autoencoder Style Modulation (CVAESM), Base Model: , Strategy: KL Divergence Regularization
Aug 2025 Beyond Simple Edits: Composed Video Retrieval with Dense Modifications


• Affiliation: Mohamed bin Zayed University of AI
• Benchmark Name: Dense-WebVid-CoVR, Data Number: 1.6 million, Evaluation Metric: Recall@K
Aug 2025 PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis



• Affiliation: Beijing Institute of Technology
• Method Name: PhysGM, Base Model: , Strategy: DPO
• Benchmark Name: PhysAssets Dataset, Data Number: 24000+, Evaluation Metric:
Aug 2025 MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents


• Affiliation: ByteDance
• Benchmark Name: MM-BrowseComp, Data Number: 224, Evaluation Metric: Overall Accuracy (OA), Strict Accuracy (SA), Average Checklist Score (AVG CS)
Aug 2025 Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey


• Affiliation: Harbin Institute of Technology (Shenzhen)
• Paper Number: 244
Aug 2025 Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark


• Affiliation: Tel Aviv University
• Benchmark Name: Express4D, Data Number: 1205, Evaluation Metric: FID, R-precision, Diversity, Multimodal Distance
Aug 2025 VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models


• Affiliation: Harbin Institute of Technology (Shenzhen)
• Method Name: McDPO, Base Model: Phi3-3.8B, Strategy: DPO
Aug 2025 CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models



• Affiliation: Fudan University
• Method Name: CineTrans, Base Model: , Strategy: Masked Diffusion with Attention Mechanism
• Benchmark Name: Cine250K, Data Number: 250K, Evaluation Metric: Transition Control Score, Inter-shot Consistency, Intra-shot Consistency, Aesthetic Quality, Semantic Consistency
Aug 2025 FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation



• Affiliation: AMAP, Alibaba Group
• Method Name: Timestep-Layer adaptive multi-expert Preference Optimization (TLPO), Base Model: Wan2.1, Strategy: DPO
• Benchmark Name: Talking-NSQ, Data Number: 410K, Evaluation Metric: Preference Accuracy
Aug 2025 Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation

• Affiliation: The Hong Kong University of Science and Technology
• Method Name: PhysHPO, Base Model: CogVideoX-2B, CogVideoX-5B, HunyuanVideo-540p, Strategy: DPO (Direct Preference Optimization)
Aug 2025 Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances

• Affiliation: Institute of Artificial Intelligence (TeleAI), China Telecom.
• Paper Number: 164
Aug 2025 ViMoNet: A Multimodal Vision-Language Framework for Human Behavior Understanding from Motion and Video

• Affiliation: Department of Computer Science, AIUB, Dhaka, Bangladesh
• Benchmark Name: ViMoNet-Bench, Data Number: , Evaluation Metric: GPT-3.5-turbo scoring (0-5)
Aug 2025 Animate-X++: Universal Character Image Animation with Dynamic Backgrounds



• Affiliation: School of Computing and Data Science, The University of Hong Kong
• Method Name: Animate-X++, Base Model: WanX2.1, Strategy: Multi-task training with partial parameter training and pose transformation simulation
• Benchmark Name: A2Bench, Data Number: 500, Evaluation Metric: PSNR, SSIM, L1, LPIPS, FID, FID-VID, FVD, CLIP Score, Background Consistency, Motion Smoothness, Aesthetic Quality, Image Quality
Aug 2025 Yan: Foundational Interactive Video Generation

• Affiliation: Tencent
• Method Name: Yan-Sim, Base Model: , Strategy: PPO
Aug 2025 Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization


• Affiliation: Alibaba Digital Media and Entertainment Group
• Method Name: SSPO (Segment Supervised Preference Optimization), Base Model: Llama3.1-8B-Chinese-Chat, GLM-4-9B-Chat, Qwen2.5-14B-Instruct, Strategy: DPO (Direct Preference Optimization)
Aug 2025 BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

• Affiliation: USC Information Sciences Institute
• Benchmark Name: BigTok, Data Number: 2210, Evaluation Metric: Accuracy, Precision, Recall, F1-score
Aug 2025 SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment

• Affiliation: Fudan University
• Method Name: SwiftVideo, Base Model: Wan2.1-FUN-inp-480p-1.3B, Strategy: DPO
Aug 2025 V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models


• Affiliation: Yonsei University
• Method Name: ReDPO, Base Model: , Strategy: DPO
• Method Name: V.I.P., Base Model: , Strategy: DPO
Jul 2025 Controllable Video Generation: A Survey


• Affiliation: The Hong Kong University of Science and Technology
• Paper Number: 416
Jul 2025 Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration



• Affiliation: Zhejiang University
• Method Name: IP-FVR, Base Model: , Strategy: identity-preserving feedback learning
• Benchmark Name: YouRef, Data Number: , Evaluation Metric: PSNR, SSIM, LPIPS, CLIP-IQA, MUSIQ, LIQE, IDS, $E_{warp}$, $\sigma_{IDS}$
Jul 2025 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation


• Affiliation: Terminal Technology Department, Alipay, Ant Group
• Method Name: EchoMimicV3, Base Model: Wan2.1-FUN-inp-480p-1.3B, Strategy: DPO
Jul 2025 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

• Affiliation: University of Science and Technology of China
• Method Name: LongAnimation, Base Model: CogVideoX-1.5-5B, Strategy: NGR
Jun 2025 Video Perception Models for 3D Scene Synthesis


• Affiliation: Tsinghua University
Jun 2025 RDPO: Real Data Preference Optimization for Physics Consistency Video Generation

• Affiliation: Fudan University
• Method Name: Real Data Preference Optimization (RDPO), Base Model: LTX-Video-2B, Strategy: DPO
Jun 2025 VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning

• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: VQ-Insight, Base Model: Qwen-2.5-VL-7B-Instruct, Strategy: GRPO
Jun 2025 Toward Rich Video Human-Motion2D Generation



• Affiliation: Tongji University
• Method Name: RVHM2D, Base Model: None, Strategy: Fine-tuning with an FID-based reward
• Benchmark Name: Motion2D-Video-150K, Data Number: 150000, Evaluation Metric: R-Precision, FID, MM Dist, Diversity
Jun 2025 AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation


• Affiliation: ByteDance
• Method Name: AlignHuman, Base Model: , Strategy: Timestep-Segment Preference Optimization (TPO)
Jun 2025 Multimodal Large Language Models: A Survey


• Affiliation: School of Architecture, Technology and Engineering, University of Brighton
• Method Name: Video Diffusion Alignment via Reward Gradients, Base Model: , Strategy: Reward Gradients
• Method Name: Diffusion Model Alignment Using Direct Preference Optimization, Base Model: , Strategy: Direct Preference Optimization
• Method Name: VADER, Base Model: , Strategy: Backpropagating Reward Gradients
• Benchmark Name: MJ-VIDEO, Data Number: , Evaluation Metric: Fine-Grained Benchmarking and Rewarding Video Preferences
• Benchmark Name: VideoScore, Data Number: , Evaluation Metric: Simulating Fine-grained Human Feedback for Video Generation
Jun 2025 Seedance 1.0: Exploring the Boundaries of Video Generation Models


• Affiliation: ByteDance
• Method Name: Human Feedback Alignment (RLHF), Base Model: , Strategy: Reward feedback learning with multiple reward models (Foundational Reward Model, Motion Reward Model, Aesthetic Reward Model)
Jun 2025 ContentV: Efficient Training of Video Generation Models with Limited Compute


• Affiliation: ByteDance Douyin Content Group
• Method Name: Reinforcement Learning from Human Feedback (RLHF), Base Model: Stable Diffusion 3.5 Large (SD3.5L), Strategy: Optimizing conditional distribution pθ(x1|c) with reward model r(c, x1) and KL-divergence regularization
May 2025 Photography Perspective Composition: Towards Aesthetic Perspective Recommendation


• Affiliation: East China University of Science and Technology
• Method Name: Photography Perspective Composition (PPC), Base Model: , Strategy: DPO
May 2025 Scaling Image and Video Generation via Test-Time Evolutionary Search


• Affiliation: Hong Kong University of Science and Technology
• Method Name: EvoSearch, Base Model: , Strategy:
May 2025 InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO



• Affiliation: MAPLE Lab, Westlake University
• Method Name: InfLVG, Base Model: , Strategy: GRPO
• Benchmark Name: CsVBench, Data Number: 1000, Evaluation Metric: HPSv2, Aesthetic Score, CLIP-Flan, ViCLIP, ArcFace-42M, ArcFace-360K, QWen
May 2025 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection



• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: AvatarShield, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: FakeHumanVid, Data Number: 15000, Evaluation Metric: AUC
May 2025 RLVR-World: Training World Models with Reinforcement Learning


• Affiliation: School of Software, BNRist, Tsinghua University
• Method Name: RLVR-World, Base Model: , Strategy: GRPO
May 2025 Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models

• Affiliation: University of Michigan
• Benchmark Name: Temporally-Grounded Language Generation (TGLG), Data Number: 16487, Evaluation Metric: TRACE
May 2025 Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models


• Affiliation: MMLab, CUHK, Hong Kong
• Method Name: Negative Preference Optimization (NPO), Base Model: , Strategy: Diffusion-NPO
May 2025 DanceGRPO: Unleashing GRPO on Visual Generation


• Affiliation: ByteDance Seed
• Method Name: DanceGRPO, Base Model: , Strategy: GRPO
May 2025 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

• Affiliation: University of Maryland, College Park
• Benchmark Name: VideoHallu, Data Number: 3000, Evaluation Metric:
Apr 2025 TesserAct: Learning 4D Embodied World Models


• Affiliation: UMass Amherst
• Method Name: TesserAct, Base Model: , Strategy:
Apr 2025 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning

• Affiliation: Zhejiang University
• Method Name: Phys-AR, Base Model: Llama3.1-8B, Strategy: GRPO
Apr 2025 SkyReels-V2: Infinite-length Film Generative Model


• Affiliation: Skywork AI
• Method Name: SkyReels-V2, Base Model: Qwen2-VL-7B, Strategy: DPO
Apr 2025 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos


• Affiliation: AMAP, Alibaba Group
• Method Name: FingER, Base Model: Qwen2.5-VL, Strategy: GRPO
• Benchmark Name: FingER-Instruct-60k, Data Number: 60000, Evaluation Metric:
Apr 2025 Aligning Anime Video Generation with Human Feedback


• Affiliation: Fudan University
• Method Name: Gap-Aware Preference Optimization (GAPO), Base Model: , Strategy: Direct Preference Optimization (DPO)
• Benchmark Name: AnimeReward, Data Number: 30000, Evaluation Metric: multi-dimensional reward scores
Apr 2025 Discriminator-Free Direct Preference Optimization for Video Diffusion

• Affiliation: Zhejiang University
• Method Name: Discriminator-Free Video Preference Optimization (DF-VPO), Base Model: , Strategy: DPO
Apr 2025 Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments


• Affiliation: University of Trento, Italy
• Benchmark Name: Morpheus, Data Number: 80, Evaluation Metric: Dynamical Score
Apr 2025 OmniCam: Unified Multimodal Video Generation via Camera Control


• Affiliation: Zhejiang University
• Method Name: OmniCam, Base Model: Llama3.1, Strategy: PPO
• Benchmark Name: OmniTr, Data Number: 1000 trajectories, 10,000 descriptions, 30,000 videos, Evaluation Metric: $M_{starttime}$, $M_{endtime}$, $M_{speed}$, $M_{rotate}$, $M_{direction}$
Mar 2025 VPO: Aligning Text-to-Video Generation Models with Prompt Optimization


• Affiliation: The Conversational Artificial Intelligence (CoAI) Group, Tsinghua University
• Method Name: VPO, Base Model: LLaMA3-8B-Instruct, Strategy: DPO
Mar 2025 Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors


• Affiliation: The University of Hong Kong
• Method Name: Physics-based HOI Refinement, Base Model: , Strategy: Actor-Critic with Gaussian Policy
Mar 2025 Judge Anything: MLLM as a Judge Across Any Modality


• Affiliation: Huazhong University of Science and Technology
• Benchmark Name: TASKANYTHING, Data Number: 1500, Evaluation Metric:
• Benchmark Name: JUDGE ANYTHING, Data Number: 9000, Evaluation Metric: Agreement, Pearson correlation, Spearman correlation, MAE, Accuracy
Mar 2025 MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization


• Affiliation: Zhejiang University
• Method Name: MagicID, Base Model: , Strategy: DPO
Mar 2025 Unified Reward Model for Multimodal Understanding and Generation



• Affiliation: Fudan University
• Method Name: UnifiedReward, Base Model: LLaVA-OneVision-7B, Strategy: DPO
Feb 2025 Pre-Trained Video Generative Models as World Simulators

• Affiliation: Hong Kong University of Science and Technology
• Method Name: Dynamic World Simulation (DWS), Base Model: , Strategy: PPO
Feb 2025 Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models

• Affiliation: Carnegie Mellon University
• Method Name: HALO, Base Model: , Strategy: DPO
Feb 2025 IPO: Iterative Preference Optimization for Text-to-Video Generation

• Affiliation: Shanghai Academy of Artificial Intelligence for Science
• Method Name: Iterative Preference Optimization (IPO), Base Model: , Strategy: DPO
Feb 2025 MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation



• Affiliation: UNC-Chapel Hill
• Benchmark Name: MJ-BENCH-VIDEO, Data Number: 5421, Evaluation Metric:
Feb 2025 HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment


• Affiliation: Zhejiang University
• Method Name: HuViDPO, Base Model: , Strategy: DPO
Feb 2025 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer


• Affiliation: Guanghua School of Management, Peking University
• Method Name: Recursive Likelihood Ratio (RLR) optimizer, Base Model: , Strategy:
Jan 2025 Improving Video Generation with Human Feedback



• Affiliation: The Chinese University of Hong Kong
• Method Name: Flow-DPO, Base Model: , Strategy: DPO
• Method Name: Flow-RWR, Base Model: , Strategy: RWR
• Method Name: Flow-NRG, Base Model: , Strategy: Reward Guidance
• Benchmark Name: VideoGen-RewardBench, Data Number: 26500, Evaluation Metric:
Dec 2024 VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation


• Affiliation: Tsinghua University
• Method Name: Multi-Objective Preference Optimization (MPO), Base Model: , Strategy: DPO
Dec 2024 OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization


• Affiliation: The University of Hong Kong
• Method Name: OnlineVPO, Base Model: , Strategy: DPO
Dec 2024 VideoDPO: Omni-Preference Alignment for Video Diffusion Generation


• Affiliation: HKUST
• Method Name: VideoDPO, Base Model: , Strategy: DPO
Dec 2024 FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks


• Affiliation: National University of Singapore
• Method Name: FLIP, Base Model: , Strategy: model-based planning
Dec 2024 The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control


• Affiliation: Tongyi Lab
• Method Name: The Matrix, Base Model: , Strategy: Shift-Window Denoising Process Model (Swin-DPM)
Dec 2024 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback


• Affiliation: The University of Tokyo
• Method Name: RL-Finetuning for Text-to-Video Models, Base Model: , Strategy: RWR, DPO
Nov 2024 Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models


• Affiliation: Kim Jaechul Graduate School of AI, KAIST
• Method Name: Free2Guide, Base Model: , Strategy: Path Integral Control
Nov 2024 A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

• Affiliation: SSE, The Chinese University of Hong Kong, Shenzhen
• Method Name: RL-based editing framework, Base Model: , Strategy: actor-critic
Oct 2024 Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient

• Affiliation: Northwestern University
• Method Name: RL-V2V-GAN, Base Model: , Strategy: Policy Gradient
Oct 2024 WorldSimBench: Towards Video Generation Models as World Simulators


• Affiliation: The Chinese University of Hong Kong, Shenzhen
• Benchmark Name: WorldSimBench, Data Number: 35701, Evaluation Metric: Human Preference Evaluator
Oct 2024 Animating the Past: Reconstruct Trilobite via Video Generation

• Affiliation: AI Lab, Yishi Inc.
• Method Name: Automatic T2V Prompt Learning Method, Base Model: , Strategy: KTO
Oct 2024 VideoAgent: Self-Improving Video Generation


• Affiliation: University of Waterloo
• Method Name: VideoAgent, Base Model: , Strategy: self-improvement through online finetuning
Oct 2024 E-Motion: Future Motion Simulation via Event Sequence Diffusion


• Affiliation: Xidian University
• Method Name: Event-Sequence Diffusion Network, Base Model: , Strategy: PPO
Oct 2024 DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control


• Affiliation: ETH Zürich
• Method Name: DART, Base Model: , Strategy: PPO
Oct 2024 SePPO: Semi-Policy Preference Optimization for Diffusion Alignment


• Affiliation: University of Rochester
• Method Name: SePPO, Base Model: , Strategy: DPO
Jul 2024 Video Diffusion Alignment via Reward Gradients


• Affiliation: Carnegie Mellon University
• Method Name: VADER, Base Model: , Strategy: Reward Gradients
Dec 2023 InstructVideo: Instructing Video Diffusion Models with Human Feedback


• Affiliation: Zhejiang University
• Method Name: InstructVideo, Base Model: , Strategy: reward fine-tuning
Nov 2023 AdaDiff: Adaptive Step Selection for Fast Diffusion Models

• Affiliation: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
• Method Name: AdaDiff, Base Model: , Strategy: policy gradient

💪 How to Contribute

If you have written a paper, or know of relevant research that should be included, please contribute via a pull request, an issue, or email.
