A curated list of awesome reinforcement learning resources in the context of LLMs and multimodal models. It doesn't cover robotics or other domains, which are equally cool.
- DeepSeek-R1 by The WHALE!
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
- Maybe https://arxiv.org/pdf/2412.09413 (or is this distillation?)
- Teaching Large Language Models to Reason with Reinforcement Learning
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
- 🐱 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
- SimpleRL-reason
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
- verl
- GRPO from Unsloth
- EasyR1 from @hiyouga of LLaMA Factory
- Hugging Face TRL
- Verifiers (github repo)
- Search-R1
- open-thoughts/OpenThoughts2-1M
- open-thoughts/OpenThoughts-114k
- nvidia/OpenCodeReasoning
- open-r1/OpenR1-Math-220k
- GeneralReasoning/GeneralThought-195K
- PrimeIntellect/SYNTHETIC-1
- facebook/Natural-reasoning
- SynthLabsAI/Big-Math-RL-Verified
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- FreedomIntelligence/medical-o1-reasoning-SFT
- FreedomIntelligence/Medical-R1-Distill-Data-Chinese
- reinforcement-learning resources (GitHub repo)
- The classic book on RL: Reinforcement Learning: An Introduction
- Playing Atari with Deep Reinforcement Learning
- RL Course by David Silver