A curated collection of papers on large language models (LLMs), reinforcement learning (RL), and agent systems.
- Attention Is All You Need (2017)
- Link: https://arxiv.org/abs/1706.03762
- Description: The foundational Transformer paper; introduced the self-attention mechanism and transformed NLP.
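The paper's core operation, scaled dot-product attention, can be sketched in a few lines of NumPy. This is an illustrative toy (single head, no masking or batching), not the paper's implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the paper's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# toy example: 3 tokens, head dimension 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```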
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Link: https://arxiv.org/abs/1810.04805
- Description: Google's bidirectional pre-trained model; achieved breakthrough results across many NLP tasks.
- Improving Language Understanding by Generative Pre-Training (GPT-1) (2018)
- Link: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Description: The first paper in OpenAI's GPT series; introduced the generative pre-training paradigm.
- Language Models are Unsupervised Multitask Learners (GPT-2) (2019)
- Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Description: A 1.5B-parameter model demonstrating the zero-shot abilities of large language models.
- Language Models are Few-Shot Learners (GPT-3) (2020)
- Link: https://arxiv.org/abs/2005.14165
- Description: The 175B-parameter milestone model that opened the era of large models.
- PaLM: Scaling Language Modeling with Pathways (2022)
- Link: https://arxiv.org/abs/2204.02311
- Description: Google's 540B-parameter model; achieved breakthroughs on reasoning tasks.
- UL2: Unifying Language Learning Paradigms (2022)
- Link: https://arxiv.org/abs/2205.05131
- Description: A mixture-of-denoisers approach that unifies multiple pre-training paradigms.
- Qwen3 Technical Report (2025)
- Link: https://arxiv.org/abs/2505.09388
- Description: Alibaba Cloud's latest Qwen3 series; supports 119 languages at 0.6B-235B parameter scales.
- DeepSeek-V3 Technical Report (2024)
- Link: https://arxiv.org/abs/2412.19437
- Description: A MoE model with 671B total parameters and 37B activated parameters; performance rivals top closed-source models.
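The sparse-activation economics behind those figures are easy to check (illustrative arithmetic only, using the two numbers quoted above):

```python
# Illustrative arithmetic for the MoE sparsity figures quoted above.
total_params = 671e9    # total parameters in the MoE model
active_params = 37e9    # parameters activated per token
frac = active_params / total_params
print(f"active fraction per token: {frac:.1%}")  # 5.5%
```

Only about one parameter in eighteen participates in any given forward pass, which is why MoE models can grow total capacity far faster than per-token compute.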
- DeepSeek-OCR: Contexts Optical Compression (2025)
- Link: https://arxiv.org/abs/2510.18234
- Description: Achieves 10x text compression via 2D optical mapping, with 97% OCR precision.
- RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)
- Link: https://arxiv.org/abs/2104.09864
- Description: Introduced rotary position embedding (RoPE), now widely adopted in modern LLMs.
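The rotary idea can be sketched in NumPy: each pair of embedding dimensions is rotated by a position-dependent angle, so relative offsets show up as rotation differences inside dot products. Function and variable names here are our own, not RoFormer's code:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, d), d even.

    Dim pair (2i, 2i+1) is rotated by angle pos * base**(-2i/d)."""
    seq, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)       # (d/2,) frequencies
    angles = positions[:, None] * inv_freq[None, :]    # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((4, 8))
y = rope(x, np.arange(4))
print(y.shape)  # (4, 8)
```

Because each pair is only rotated, vector norms are preserved, and position 0 (angle 0) leaves the input unchanged.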
- Kimi Linear: An Expressive, Efficient Attention Architecture (2025)
- Link: https://arxiv.org/abs/2510.26692
- Description: A hybrid linear attention architecture that shrinks the KV cache and speeds up decoding.
- Proximal Policy Optimization Algorithms (PPO) (2017)
- Link: https://arxiv.org/abs/1707.06347
- Description: One of the most successful deep RL algorithms; widely used in LLM training.
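PPO's clipped surrogate objective fits in a few lines. This sketches the policy loss only, omitting the value function and entropy terms of the full algorithm:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A)),
    where r = exp(logp_new - logp_old) is the probability ratio."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# with a large ratio and a positive advantage, clipping caps the update at 1 + eps
loss = ppo_clip_loss(np.array([0.0]), np.array([-1.0]), np.array([1.0]))
print(loss)  # -1.2
```

The clip keeps the policy from moving too far from the one that collected the data, which is what makes PPO stable enough for RLHF-style training.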
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO) (2023)
- Link: https://arxiv.org/abs/2305.18290
- Description: An alignment method that removes the RL training loop, simplifying the RLHF pipeline.
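DPO's per-pair loss is a single logistic term over log-probability margins, which is why no reward model or RL loop is needed. A minimal sketch (scalar log-probs for one preference pair):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """DPO for one pair: -log sigmoid(beta * margin), where the margin
    compares policy vs. reference log-probs of the chosen (w) and
    rejected (l) responses."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -np.log(sigmoid(margin))

# identical policy and reference -> margin 0 -> loss = log 2
loss = dpo_loss(pi_w=-1.0, pi_l=-1.0, ref_w=-1.0, ref_l=-1.0)
print(round(float(loss), 4))  # 0.6931
```

Raising the chosen response's log-prob relative to the reference increases the margin and drives the loss below log 2, directly implementing the preference.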
- Group Sequence Policy Optimization (GSPO) (2025)
- Link: https://arxiv.org/abs/2507.18071
- Description: A stable, efficient RL algorithm with sequence-level importance sampling; used in Qwen3 training.
- Group-in-Group Policy Optimization for LLM Agent Training (GiGPO) (2025)
- Link: https://arxiv.org/abs/2505.10978
- Description: Enables fine-grained credit assignment, beating GRPO by 9-12% on agent tasks.
- Learning to summarize from human feedback (2020)
- Link: https://arxiv.org/abs/2009.01325
- Description: Foundational RLHF work; trains a reward model from human preferences.
- Training language models to follow instructions with human feedback (InstructGPT) (2022)
- Link: https://arxiv.org/abs/2203.02155
- Description: OpenAI's InstructGPT; demonstrated the power of RLHF for instruction alignment.
- HybridFlow: A Flexible and Efficient RLHF Framework (2024)
- Link: https://arxiv.org/abs/2409.19256
- Description: A hybrid RLHF framework delivering a 1.53x throughput improvement.
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale (2025)
- Link: https://arxiv.org/abs/2503.14476
- Description: A fully open-source large-scale RL system with strong results on AIME 2024.
- A Survey of Reinforcement Learning for Large Reasoning Models (2025)
- Link: https://arxiv.org/abs/2509.08827
- Description: A comprehensive survey of RL applied to large reasoning models.
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey (2025)
- Link: https://arxiv.org/abs/2509.02547
- Description: A panoramic survey of agentic RL synthesizing 500+ papers.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)
- Link: https://arxiv.org/abs/2201.11903
- Description: The pioneering CoT paper; intermediate reasoning steps markedly improve LLM reasoning.
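The mechanics of few-shot CoT prompting are simple to illustrate: the exemplar demonstrates intermediate steps, nudging the model to reason before answering. The questions below are our own toy examples, not taken from the paper:

```python
# A toy few-shot CoT prompt (illustrative; not the paper's exemplars).
exemplar = (
    "Q: A jug holds 4 cups and is half full. How many cups of water are in it?\n"
    "A: The jug holds 4 cups. Half of 4 is 2. The answer is 2.\n"
)
question = "Q: There are 3 boxes with 5 pens each. How many pens in total?\nA:"
prompt = exemplar + "\n" + question
print(prompt)
```

A plain few-shot prompt would show only "The answer is 2."; including the worked steps is the entire intervention the paper studies.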
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning (2023)
- Link: https://arxiv.org/abs/2305.04091
- Description: Improves zero-shot CoT reasoning by planning first, then executing.
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (2024)
- Link: https://arxiv.org/abs/2402.12875
- Description: A theoretical account showing CoT grants Transformers serial computation power.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025)
- Link: https://arxiv.org/abs/2501.12948
- Description: Elicits reasoning capability via RL; performance rivals OpenAI o1.
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (2024)
- Link: https://arxiv.org/abs/2402.03300
- Description: Reaches 51.7% on the MATH benchmark, approaching GPT-4's mathematical reasoning.
- A Survey on Large Language Model based Autonomous Agents (2023)
- Link: https://arxiv.org/abs/2308.11432
- Description: A comprehensive survey of LLM-based autonomous agents; the first agent survey published in FCS.
- Cognitive Architectures for Language Agents (2023)
- Link: https://arxiv.org/abs/2309.02427
- Description: Proposes the CoALA framework, systematizing the design of language agents.
- A Comprehensive Survey of Self-Evolving AI Agents (2025)
- Link: https://arxiv.org/abs/2508.07407
- Description: A systematic treatment of self-evolving AI agents; defines the lifelong-learning agent paradigm.
- Executable Code Actions Elicit Better LLM Agents (CodeAct) (2024)
- Link: https://arxiv.org/abs/2402.01030
- Description: Unifies the action space as Python code, yielding a 20% performance gain.
- MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines (2025)
- Link: https://arxiv.org/abs/2507.22606
- Description: An FSM-based framework for automated multi-agent system design.
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation (2025)
- Link: https://arxiv.org/abs/2508.13167
- Description: Distills a multi-agent system into a single model, cutting inference cost by 84.6%.
- Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling (2025)
- Link: https://arxiv.org/abs/2507.23370
- Description: A software-engineering agent reaching 70.6% accuracy on SWE-bench.
- Alita: Generalist Agent Enabling Scalable Agentic Reasoning (2025)
- Link: https://arxiv.org/abs/2505.20286
- Description: A generalist agent framework; 87.27% pass@3 on the GAIA benchmark.
- Building Effective AI Agents (2024)
- Link: https://www.anthropic.com/engineering/building-effective-agents
- Description: Anthropic's official agent design guide, emphasizing simple, composable patterns.
- How we built our multi-agent research system (OpenAI AgentKit) (2024-2025)
- Link: https://openai.com/index/introducing-agentkit/
- Description: An introduction to OpenAI's multi-agent systems and the AgentKit toolkit.
- ToolRL: Reward is All Tool Learning Needs (2025)
- Link: https://arxiv.org/abs/2504.13958
- Description: A comprehensive study of reward design for tool selection and use; 17% gain over SFT.
- ToRL: Scaling Tool-Integrated RL (2025)
- Link: https://arxiv.org/abs/2503.23383
- Description: Trains LLMs to use computational tools autonomously; 43.3% on AIME 2024.
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (2025)
- Link: https://arxiv.org/abs/2504.11536
- Description: Dynamically interleaves code execution and reasoning; 72.5% on AIME 2024.
- VeriTool: Towards Holistic Agentic Reinforcement Learning with Tool Use (2025)
- Link: https://arxiv.org/abs/2509.01055
- Description: A unified framework supporting multimodal tools (code, search, SQL, vision).
- ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (2024)
- Link: https://arxiv.org/abs/2309.17452
- Description: ICLR 2024; beats WizardMath-70B by 22 points on MATH.
- PAL: Program-aided Language Models (2022)
- Link: https://arxiv.org/abs/2211.10435
- Description: Uses Python code as the reasoning steps, beating CoT by 40% on GSM-hard.
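The PAL pattern is easy to sketch: the model writes a program as its "reasoning", and the final answer comes from executing that program rather than from generated arithmetic. The `generated` string below stands in for a hypothetical model completion; in practice it should be sandboxed before execution:

```python
# PAL-style pattern: execute the model's program instead of trusting its arithmetic.
# `generated` is a hypothetical model completion, not output from the paper.
generated = """
balls = 5            # Roger starts with 5 tennis balls
balls += 2 * 3       # buys 2 cans of 3 balls each
answer = balls
"""

scope = {}
exec(generated, scope)      # run the program to obtain the answer
print(scope["answer"])      # 11
```

Offloading the arithmetic to an interpreter removes the class of errors where the chain of thought is right but the final calculation is wrong.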
- A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages (2024)
- Link: https://arxiv.org/abs/2410.03981
- Description: A systematic review covering 40+ programming languages; identifies six families of improvement methods.
- A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized RAG (2024)
- Link: https://arxiv.org/abs/2407.02742
- Description: Finds that optimized RAG can match fine-tuning.
- GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing (2023)
- Link: https://dl.acm.org/doi/10.1145/3597926.3598105
- Description: ISSTA 2023; used for TVM compiler testing, uncovering 16 bugs.
- Aligning Requirement for Large Language Model's Code Generation (2025)
- Link: https://arxiv.org/abs/2509.01313
- Description: The Specine method improves Pass@1 by 29.6% on competitive programming.
- Guiding LLM-based Smart Contract Generation with Finite State Machine (2025)
- Link: https://arxiv.org/abs/2505.08542
- Description: IJCAI 2025; compilation success up 48%, security risk down 68%.
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems (2025)
- Link: https://arxiv.org/abs/2507.09477
- Description: Surveys the synergy between reasoning-augmented RAG and RAG-augmented reasoning.
- ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization (2025)
- Link: https://arxiv.org/abs/2509.13313
- Description: Overcomes context-window limits via periodic summarization, improving results by 4.5-8.2%.
- A Survey of Context Engineering for Large Language Models (2025)
- Link: https://arxiv.org/abs/2507.13334
- Description: A 165-page survey of context engineering, covering retrieval, processing, management, and applications.
- Context Engineering 2.0: The Context of Context Engineering (2025)
- Link: https://arxiv.org/abs/2510.26493
- Description: Frames context engineering as a 20-year disciplinary evolution and defines a four-stage model.
- Deep Research: A Survey of Autonomous Research Agents (2025)
- Link: https://arxiv.org/abs/2508.12752
- Description: A systematic survey of the four core stages of deep-research pipelines.
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications (2025)
- Link: https://arxiv.org/abs/2506.12594
- Description: Examines 80+ deep-research systems and proposes a four-dimensional taxonomy.
- Reinforcement Learning Foundations for Deep Research Systems: A Survey (2025)
- Link: https://arxiv.org/abs/2509.06733
- Description: The first survey focused on the RL foundations of deep-research systems.
- A Comprehensive Survey on Reinforcement Learning-based Agentic Search (2025)
- Link: https://arxiv.org/abs/2510.16724
- Description: The first comprehensive survey of RL-based agentic search.
- R1-Searcher: Incentivizing the Search Capability in LLMs via RL (2025)
- Link: https://arxiv.org/abs/2503.05592
- Description: A two-stage RL method; 48.2% improvement on HotpotQA.
- Search-R1: Training LLMs to Reason and Leverage Search Engines with RL (2025)
- Link: https://arxiv.org/abs/2503.09516
- Description: Extends the DeepSeek-R1 framework; 41% improvement over RAG baselines.
- WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines (2025)
- Link: https://arxiv.org/abs/2509.13312
- Description: A dual-agent framework; state of the art on DeepResearch Bench.
- WebSailor-V2: Bridging the Chasm to Proprietary Agents (2025)
- Link: https://arxiv.org/abs/2509.13305
- Description: A complete open-source training pipeline for web agents; matches proprietary agents via the DUPO algorithm.
- Tongyi DeepResearch: A New Era of Open-Source AI Researchers (2025)
- Link: https://arxiv.org/abs/2510.24701
- Description: The first open-source web agent with performance comparable to OpenAI DeepResearch.
- A Survey of Text-to-SQL in the Era of LLMs (2024)
- Link: https://arxiv.org/abs/2408.05109
- Description: TKDE'25; covers models, data, evaluation, and error analysis.
- Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines (2025)
- Link: https://arxiv.org/abs/2505.15874
- Description: Introduces the PARROT benchmark of 18,000 pipelines.
- StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows (2024)
- Link: https://arxiv.org/abs/2403.11322
- Description: Models tasks as state machines; success rates up 13-28% at 3-5x lower cost.
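The state-machine framing can be sketched as a transition table routing a task between solving, verifying, and done. States and events here are our own toy choices, not StateFlow's actual API:

```python
# A toy state-driven workflow (hypothetical states/events, not StateFlow's API).
transitions = {
    ("init", "plan_ready"): "solve",
    ("solve", "ok"): "verify",
    ("solve", "error"): "solve",     # retry on execution error
    ("verify", "pass"): "done",
    ("verify", "fail"): "solve",     # verification failed: solve again
}

def step(state, event):
    # unknown (state, event) pairs leave the state unchanged
    return transitions.get((state, event), state)

state = "init"
for event in ["plan_ready", "error", "ok", "fail", "ok", "pass"]:
    state = step(state, event)
print(state)  # done
```

Constraining which actions are legal in each state is what reportedly cuts both error rates and wasted LLM calls.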
- AutoFlow: Automated Workflow Generation for LLM Agents (2024)
- Link: https://arxiv.org/abs/2407.12821
- Description: Automatically generates workflows that outperform manually designed ones.
- An Agentic Flow for Finite State Machine Extraction using Prompt Chaining (2025)
- Link: https://arxiv.org/abs/2507.11222
- Description: The FlowFSM framework; extracts FSMs from protocol documents.
- GAIA: A Benchmark for General AI Assistants (2023)
- Link: https://arxiv.org/abs/2311.12983
- Description: 466 real-world questions; humans score 92% vs. 15% for GPT-4 with plugins.
- QLoRA: Efficient Finetuning of Quantized LLMs (2023)
- Link: https://arxiv.org/abs/2305.14314
- Description: 4-bit quantization plus LoRA; finetunes a 65B model on a single 48GB GPU.
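The memory arithmetic behind that claim is straightforward (weights only; this deliberately ignores activations, gradients, optimizer state, and NF4 quantization constants):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory to store the weights alone; ignores activations, gradients,
    optimizer state, and quantization constants."""
    return n_params * bits_per_weight / 8 / 1e9

bf16 = weight_memory_gb(65e9, 16)   # 65B weights in 16-bit
nf4 = weight_memory_gb(65e9, 4)     # 65B weights in 4-bit
print(f"{bf16:.0f} GB vs {nf4:.1f} GB")  # 130 GB vs 32.5 GB
```

Going from 16-bit to 4-bit shrinks the frozen base model enough to fit under 48GB, leaving headroom for the small LoRA adapters that are actually trained.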
- Muon is Scalable for LLM Training (2025)
- Link: https://arxiv.org/abs/2502.16982
- Description: Roughly 2x the computational efficiency of AdamW.
- Self-supervised Learning of Point Clouds via Orientation Estimation (2020)
- Link: https://arxiv.org/abs/2008.00305
- Description: Self-supervised learning on 3D point clouds that exploits their rotation structure.
- Total papers: 70
- Time span: 2017-2025
- arXiv papers: 66
- Conference/journal papers: 4
- 2025: 32 papers (46%)
- 2024: 16 papers (23%)
- 2023: 9 papers (13%)
- 2022 and earlier: 13 papers (18%)
- Attention Is All You Need → BERT → GPT series → Chain-of-Thought
- PPO → RLHF → DPO → DeepSeek-R1
- ReAct → CodeAct → Building Effective AI Agents
- Cognitive Architectures → MetaAgent → Chain-of-Agents
- PAL → ToRA → ToolRL → ToRL → ReTool
- Deep Research Survey → WebWeaver → Tongyi DeepResearch → Search-R1
Detailed reading notes are in the notes directory.
PRs adding new papers or improving existing notes are welcome!
MIT License
Last updated: 2025-11-15