
0xPabloxx/llm-rl-papers-and-notes

LLM Reinforcement Learning Papers and Notes

This is a curated collection of papers on large language models (LLMs), reinforcement learning (RL), and agent systems.

📚 Contents


Foundation Models and Architectures

Transformer Architecture

  1. Attention Is All You Need (2017)

Early Pre-trained Models

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)

  2. Improving Language Understanding by Generative Pre-Training (GPT-1) (2018)

  3. Language Models are Unsupervised Multitask Learners (GPT-2) (2019)

  4. Language Models are Few-Shot Learners (GPT-3) (2020)

  5. PaLM: Scaling Language Modeling with Pathways (2022)

  6. UL2: Unifying Language Learning Paradigms (2022)

Modern Large Models

  1. Qwen3 Technical Report (2025)

  2. DeepSeek-V3 Technical Report (2024)

  3. DeepSeek-OCR: Contexts Optical Compression (2025)

Positional Encoding

  1. RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)

  2. Kimi Linear: An Expressive, Efficient Attention Architecture (2025)


Reinforcement Learning Foundations

Policy Optimization Algorithms

  1. Proximal Policy Optimization Algorithms (PPO) (2017)

  2. Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO) (2023)

  3. Group Sequence Policy Optimization (GSPO) (2025)

  4. Group-in-Group Policy Optimization for LLM Agent Training (GiGPO) (2025)

RLHF

  1. Learning to summarize from human feedback (2020)

  2. Training language models to follow instructions with human feedback (InstructGPT) (2022)

  3. HybridFlow: A Flexible and Efficient RLHF Framework (2024)

  4. DAPO: An Open-Source LLM Reinforcement Learning System at Scale (2025)

RL Theory and Surveys

  1. A Survey of Reinforcement Learning for Large Reasoning Models (2025)

  2. The Landscape of Agentic Reinforcement Learning for LLMs: A Survey (2025)


Reasoning Enhancement

Chain-of-Thought

  1. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)

  2. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning (2023)

  3. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (2024)

Reasoning Models

  1. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025)

  2. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (2024)


Agent Systems

Agent Surveys

  1. A Survey on Large Language Model based Autonomous Agents (2023)

  2. Cognitive Architectures for Language Agents (2023)

  3. A Comprehensive Survey of Self-Evolving AI Agents (2025)

Agent Design and Implementation

  1. Executable Code Actions Elicit Better LLM Agents (CodeAct) (2024)

  2. MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines (2025)

  3. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation (2025)

  4. Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling (2025)

  5. Alita: Generalist Agent Enabling Scalable Agentic Reasoning (2025)

Practical Guides

  1. Building Effective AI Agents (2024)

  2. How we built our multi-agent research system (OpenAI AgentKit) (2024-2025)


Tool Use and Code Generation

Tool-Integrated RL

  1. ToolRL: Reward is All Tool Learning Needs (2025)

  2. ToRL: Scaling Tool-Integrated RL (2025)

  3. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (2025)

  4. VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use (2025)

Mathematical Reasoning and Code

  1. ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (2024)

  2. PAL: Program-aided Language Models (2022)

Domain-Specific Code Generation

  1. A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages (2024)

  2. A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized RAG (2024)

  3. GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing (2023)

  4. Aligning Requirement for Large Language Model's Code Generation (2025)

  5. Guiding LLM-based Smart Contract Generation with Finite State Machine (2025)


RAG and Context Engineering

RAG Surveys and Innovations

  1. Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems (2025)

  2. ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization (2025)

Context Engineering

  1. A Survey of Context Engineering for Large Language Models (2025)

  2. Context Engineering 2.0: The Context of Context Engineering (2025)


Deep Research Systems

Deep Research Surveys

  1. Deep Research: A Survey of Autonomous Research Agents (2025)

  2. A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications (2025)

  3. Reinforcement Learning Foundations for Deep Research Systems: A Survey (2025)

Search and Research Agents

  1. A Comprehensive Survey on Reinforcement Learning-based Agentic Search (2025)

  2. R1-Searcher: Incentivizing the Search Capability in LLMs via RL (2025)

  3. Search-R1: Training LLMs to Reason and Leverage Search Engines with RL (2025)

Deep Research System Implementations

  1. WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines (2025)

  2. WebSailor-V2: Bridging the Chasm to Proprietary Agents (2025)

  3. Tongyi DeepResearch: A New Era of Open-Source AI Researchers (2025)


Domain-Specific Applications

Databases and SQL

  1. A Survey of Text-to-SQL in the Era of LLMs (2024)

  2. Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines (2025)

Workflows and State Machines

  1. StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows (2024)

  2. AutoFlow: Automated Workflow Generation for LLM Agents (2024)

  3. An Agentic Flow for Finite State Machine Extraction using Prompt Chaining (2025)

Evaluation Benchmarks

  1. GAIA: A Benchmark for General AI Assistants (2023)

Optimization and Training Techniques

Efficient Fine-Tuning

  1. QLoRA: Efficient Finetuning of Quantized LLMs (2023)

  2. Muon is Scalable for LLM Training (2025)

Other Techniques

  1. Self-supervised Learning of Point Clouds via Orientation Estimation (2020)

📊 Statistics

  • Total papers: 70
  • Time span: 2017-2025
  • arXiv papers: 66
  • Conference/journal papers: 4

🔍 Distribution by Year

  • 2025: 32 papers (46%)
  • 2024: 16 papers (23%)
  • 2023: 9 papers (13%)
  • 2022 and earlier: 13 papers (18%)
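
Each share in the list above is its count divided by the 70-paper total. The following is a minimal sketch for recomputing the shares when papers are added; the dictionary `counts` and its labels are illustrative names, not something defined in this repository, and the counts are copied from the list above.

```python
# Per-year paper counts, copied from the list above (labels are illustrative).
counts = {
    "2025": 32,
    "2024": 16,
    "2023": 9,
    "2022 and earlier": 13,
}

# The total should match the "total papers" figure in the statistics section.
total = sum(counts.values())

# Share of each bucket as a percentage of the total, unrounded.
shares = {label: 100 * n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} papers ({shares[label]:.1f}%)")
```

Printing one decimal place keeps rounding explicit, since whole-number percentages do not always sum to exactly 100.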

📌 Recommended Reading Paths

Getting-started path

  1. Attention Is All You Need → BERT → GPT series → Chain-of-Thought
  2. PPO → RLHF → DPO → DeepSeek-R1

Agent development path

  1. ReAct → CodeAct → Building Effective AI Agents
  2. Cognitive Architectures → MetaAgent → Chain-of-Agents

Tool-use path

  1. PAL → ToRA → ToolRL → ToRL → ReTool

Deep research path

  1. Deep Research Survey → WebWeaver → Tongyi DeepResearch → Search-R1

📝 Paper Notes

See the notes directory for detailed paper reading notes.

🤝 Contributing

Pull requests that add new papers or improve existing notes are welcome!

📄 License

MIT License


Last updated: 2025-11-15

About

Papers and reading notes on LLMs and RL.
