This repository contains a regularly updated paper list for Efficient Reasoning.
- Keywords Convention
- Papers
  - Survey
  - Efficient Training
  - Latent Chain-of-Thought
  - Long-to-Short Chain-of-Thought
  - Balanced Chain-of-Thought
  - Adaptive Thinking
  - Reasoning Shortcuts
  - Reasoning Step Decomposition
  - Small Reasoning Models & CoT Distillation
  - Small & Large Reasoning Model Collaboration
  - Speculative Decoding for CoT Efficiency
  - Sparse Attention & KV Cache
  - Optimal Test-Time Scaling
  - Efficient Sampling
  - Efficient Self-Consistency
  - Long-Context Reasoning Efficiency
  - Multimodal Reasoning Efficiency
  - Other Work
- Benchmarks
- Analysis
- Blogs
- Talks
- Resources
- Contribution
## Survey

- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen (Henry) Zhong, Hanjie Chen, Xia Hu. [pdf], [paper list], 2025.03.
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
  Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng. [pdf], [paper list], 2025.03.
- Efficient Inference for Large Reasoning Models: A Survey
  Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi. [pdf], [paper list], 2025.03.
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
  Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong. [pdf], [paper list], 2025.03.
- Efficient Reasoning Models: A Survey
  Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang. [pdf], [paper list], 2025.04.
- Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
  Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen. [pdf], [paper list], 2025.05.
## Efficient Training

- s1: Simple test-time scaling
  Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto. [pdf], [code], 2025.01.
- LIMO: Less is More for Reasoning
  Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu. [pdf], [code], 2025.02.
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
  Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang. [pdf], [code], 2025.03.
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang. [pdf], [code], [homepage], 2025.03.
- FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
  Mingyang Song, Mao Zheng, Zheng Li, Wenjie Yang, Xuan Luo, Yue Pan, Feng Zhang. [pdf], [code], 2025.03.
- Understanding R1-Zero-Like Training: A Critical Perspective
  Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin. [pdf], [code], 2025.03.
- Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
  Brian R. Bartoldson, Siddarth Venkatraman, James Diffenderfer, Moksh Jain, Tal Ben-Nun, Seanie Lee, Minsu Kim, Johan Obando-Ceron, Yoshua Bengio, Bhavya Kailkhura. [pdf], 2025.03.
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
  Zhihang Lin, Mingbao Lin, Yuan Xie, Rongrong Ji. [pdf], [code], 2025.03.
- Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
  Taiwei Shi, Yiyang Wu, Linxin Song, Tianyi Zhou, Jieyu Zhao. [pdf], [code], 2025.04.
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
  Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang, Yonghui Wu, Lin Yan. [pdf], 2025.04.
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression
  Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang. [pdf], [code], 2025.05.
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin. [pdf], [homepage], 2025.06.
- EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
  Jinghan Jia, Hadi Reisizadeh, Chongyu Fan, Nathalie Baracaldo, Mingyi Hong, Sijia Liu. [pdf], [code], 2025.06.
- SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning
  Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette. [pdf], 2025.05.
## Latent Chain-of-Thought

- Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
  Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen. [pdf], [paper list], 2025.05.
- Think before you speak: Training Language Models With Pause Tokens
  Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan. [pdf], 2023.10.
- Guiding Language Model Reasoning with Planning Tokens
  Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni. [pdf], 2023.10.
- Implicit Chain of Thought Reasoning via Knowledge Distillation
  Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber. [pdf], 2023.11.
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
  Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong. [pdf], 2024.02.
- Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
  Jacob Pfau, William Merrill, Samuel R. Bowman. [pdf], 2024.04.
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
  Yuntian Deng, Yejin Choi, Stuart Shieber. [pdf], 2024.05.
- Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
  Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo. [pdf], 2024.09.
- Do LLMs Really Think Step-by-step In Implicit Reasoning?
  Yijiong Yu. [pdf], 2024.11.
- Disentangling Memory and Reasoning Ability in Large Language Models
  Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang. [pdf], [code], 2024.11.
- Training Large Language Models to Reason in a Continuous Latent Space
  Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian. [pdf], [code], 2024.12.
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
  Jeffrey Cheng, Benjamin Van Durme. [pdf], 2024.12.
- Efficient Reasoning with Hidden Thinking
  Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu. [pdf], 2025.01.
- Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking
  Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang. [pdf], 2025.02.
- LightThinker: Thinking Step-by-Step Compression
  Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang. [pdf], [code], 2025.02.
- Reasoning with Latent Thoughts: On the Power of Looped Transformers
  Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi. [pdf], 2025.02.
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
  Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He. [pdf], 2025.02.
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein. [pdf], [code], 2025.02.
- LLM Pretraining with Continuous Concepts
  Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li. [pdf], [code], 2025.02.
- Scalable Language Models with Posterior Inference of Latent Thought Vectors
  Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu. [pdf], 2025.02.
- Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
  Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He. [pdf], [code], 2025.02.
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
  DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng. [pdf], 2025.02.
- Implicit Reasoning in Transformers is Reasoning through Shortcuts
  Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang. [pdf], 2025.03.
- Reasoning to Learn from Latent Thoughts
  Yangjun Ruan, Neil Band, Chris J. Maddison, Tatsunori Hashimoto. [pdf], 2025.03.
- Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
  Jiakai Tang, Sunhao Dai, Teng Shi, Jun Xu, Xu Chen, Wen Chen, Wu Jian, Yuning Jiang. [pdf], 2025.03.
- Efficient Pretraining Length Scaling
  Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou. [pdf], 2025.04.
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
  Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang. [pdf], [code], 2025.05.
- Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
  Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Ruihua Song. [pdf], [homepage], 2025.05.
- Efficient Post-Training Refinement of Latent Reasoning in Large Language Models
  Xinyuan Wang, Dongjie Wang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Sixun Dong, Kunpeng Liu, Yanjie Fu. [pdf], 2025.06.
## Long-to-Short Chain-of-Thought

- Chain-of-Symbol Prompting Elicits Planning in Large Language Models
  Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang. [pdf], [code], 2023.05.
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
  Matthew Renze, Erhan Guven. [pdf], [code], 2024.01.
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
  Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou. [pdf], 2024.12.
- Token-Budget-Aware LLM Reasoning
  Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen. [pdf], [code], 2024.12.
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning
  Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. [pdf], [code], 2025.02.
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs
  Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. [pdf], [code], 2025.02.
- Self-Training Elicits Concise Reasoning in Large Language Models
  Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun. [pdf], [code], 2025.02.
- Chain of Draft: Thinking Faster by Writing Less
  Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He. [pdf], [code], 2025.02.
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
  Pranjal Aggarwal, Sean Welleck. [pdf], [code], [homepage], 2025.03.
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
  Ayeong Lee, Ethan Che, Tianyi Peng. [pdf], [code], 2025.03.
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Simon A. Aytes, Jinheon Baek, Sung Ju Hwang. [pdf], [code], 2025.03.
- Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
  Chen Li, Nazhou Liu, Kai Yang. [pdf], 2025.03.
- Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
  Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, Xing Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan. [pdf], [code], 2025.03.
- Think When You Need: Self-Adaptive Chain-of-Thought Learning
  Junjie Yang, Ke Lin, Xing Yu. [pdf], 2025.04.
- ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
  Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang. [pdf], [code], 2025.04.
- Reasoning Models Can Be Effective Without Thinking
  Wenjie Ma, Jingxuan He, Charlie Snell, Tyler Griggs, Sewon Min, Matei Zaharia. [pdf], 2025.04.
- ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
  Jingyang Yi, Jiazheng Wang. [pdf], 2025.04.
- Dynamic Early Exit in Reasoning Models
  Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Zheng Lin, Li Cao, Weiping Wang. [pdf], 2025.04.
- AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
  Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen. [pdf], [code], 2025.04.
- Concise Reasoning via Reinforcement Learning
  Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula. [pdf], 2025.04.
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
  Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05.
- ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
  Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang. [pdf], 2025.05.
- Scalable Chain of Thoughts via Elastic Reasoning
  Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong. [pdf], 2025.05.
- S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
  Muzhi Dai, Chenxu Yang, Qingyi Si. [pdf], 2025.05.
- Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
  Xuechen Zhang, Zijian Huang, Chenchun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak. [pdf], 2025.05.
- Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
  Ren Zhuang, Ben Wang, Shuifa Sun. [pdf], 2025.05.
- SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
  Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui. [pdf], 2025.05.
- Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
  Yansong Ning, Wei Li, Jun Fang, Naiqiang Tan, Hao Liu. [pdf], [code], 2025.05.
- Fractured Chain-of-Thought Reasoning
  Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong. [pdf], 2025.05.
- Efficient RL Training for Reasoning Models via Length-Aware Optimization
  Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao. [pdf], 2025.05.
- Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning
  Shangziqi Zhao, Jiahao Yuan, Guisong Yang, Usman Naseem. [pdf], 2025.05.
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
  Yuxuan Jiang, Dawei Li, Frank Ferraro. [pdf], 2025.05.
- FlashThink: An Early Exit Method For Efficient Reasoning
  Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, Zheng Hu. [pdf], 2025.05.
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization
  Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin. [pdf], [code], 2025.05.
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient
  Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang. [pdf], [code], 2025.05.
- Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
  Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim. [pdf], [code], 2025.05.
- ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy
  Gengyang Li, Yifeng Gao, Yuming Li, Yunfang Wu. [pdf], 2025.05.
- Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
  Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He. [pdf], [code], 2025.05.
- R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
  Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao. [pdf], [code], 2025.05.
- Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
  Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang. [pdf], 2025.05.
- Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
  Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou. [pdf], 2025.05.
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
  Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, Liangming Pan. [pdf], [code], 2025.05.
- TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling
  Weizhe Lin, Xing Li, Zhiyuan Yang, Xiaojin Fu, Hui-Ling Zhen, Yaoyuan Wang, Xianzhi Yu, Wulong Liu, Xiaosong Li, Mingxuan Yuan. [pdf], 2025.05.
- Not All Tokens Are What You Need In Thinking
  Hang Yuan, Bin Yu, Haotian Li, Shijun Yang, Christina Dan Wang, Zhou Yu, Xueyin Xu, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05.
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
  Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu. [pdf], [code], 2025.05.
- Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
  Mingyang Song, Mao Zheng. [pdf], [code], 2025.05.
- CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
  Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, Aixin Sun. [pdf], 2025.05.
- Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
  Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh. [pdf], 2025.05.
- A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
  Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He. [pdf], [code], 2025.05.
- TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
  Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Ying Nian Wu, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu. [pdf], [code], 2025.06.
- Answer Convergence as a Signal for Early Stopping in Reasoning
  Xin Liu, Lu Wang. [pdf], 2025.06.
- How Far Are We from Optimal Reasoning Efficiency?
  Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu. [pdf], [code], 2025.06.
- Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
  Roy Eisenstadt, Itamar Zimerman, Lior Wolf. [pdf], [homepage], [code], 2025.06.
- Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
  Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang. [pdf], 2025.06.
- Brevity is the soul of sustainability: Characterizing LLM response lengths
  Soham Poddar, Paramita Koley, Janardan Misra, Sanjay Podder, Navveen Balani, Niloy Ganguly, Saptarshi Ghosh. [pdf], 2025.06.
- Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
  Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou. [pdf], 2025.06.
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
  Xiangning Yu, Zhuohan Wang, Linyi Yang, Haoxuan Li, Anjie Liu, Xiao Xue, Jun Wang, Mengyue Yang. [pdf], 2025.06.
- PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models
  Ye Yu, Yaoning Yu, Haohan Wang. [pdf], 2025.06.
- ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
  Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu. [pdf], [code], 2025.06.
- Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty
  Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Yuan Cheng. [pdf], 2025.06.
## Balanced Chain-of-Thought

Balanced CoT allocates more compute to hard questions and reduces compute for simpler ones.
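The idea can be sketched as a toy token-budget allocator: map an estimated difficulty score to a per-question reasoning-token budget. This is a minimal illustration only; the linear mapping, the budget bounds, and the `allocate_budgets` helper are assumptions for exposition, not taken from any paper listed here (real methods typically learn or estimate difficulty and the budget jointly).

```python
# Illustrative sketch of balanced-CoT budgeting (not from any listed paper):
# harder questions receive larger reasoning-token budgets.

def allocate_budgets(difficulties, min_tokens=128, max_tokens=4096):
    """Map difficulty scores in [0, 1] to reasoning-token budgets linearly."""
    return [
        int(min_tokens + d * (max_tokens - min_tokens))
        for d in difficulties
    ]

# Easy, medium, and hard questions get increasing budgets.
print(allocate_budgets([0.0, 0.5, 1.0]))  # [128, 2112, 4096]
```

The papers below differ mainly in how difficulty is estimated (verifier confidence, reward signals, length penalties) and in whether the budget is enforced at training or inference time.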
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang. [pdf], 2024.12.
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
  Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao. [pdf], [code], 2025.01.
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Kimi Team. [pdf], 2025.01.
- Training Language Models to Reason Efficiently
  Daman Arora, Andrea Zanette. [pdf], [code], [homepage], 2025.02.
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
  Yuan Sui, Yufei He, Tri Cao, Simeng Han, Bryan Hooi. [pdf], 2025.02.
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
  Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Shiguo Lian. [pdf], 2025.03.
- Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
  Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber. [pdf], 2025.06.
## Adaptive Thinking

- Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
  Songjun Tu, Jiahao Lin, Qichao Zhang, Xiangyu Tian, Linjing Li, Xiangyuan Lan, Dongbin Zhao. [pdf], [code], 2025.05.
- AdaptThink: Reasoning Models Can Learn When to Think
  Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li. [pdf], [code], 2025.05.
- Thinkless: LLM Learns When to Think
  Gongfan Fang, Xinyin Ma, Xinchao Wang. [pdf], [code], 2025.05.
- Think Only When You Need with Large Hybrid-Reasoning Models
  Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei. [pdf], 2025.05.
- ThinkSwitcher: When to Think Hard, When to Think Fast
  Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan. [pdf], 2025.05.
- Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
  Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang. [pdf], 2025.05.
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
  Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Haodong Zhao, Hao Li, Jiansong Chen, Ke Zeng, Xunliang Cai. [pdf], 2025.05.
- AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
  Chenwei Lou, Zewei Sun, Xinnian Liang, Meng Qu, Wei Shen, Wenqi Wang, Yuntao Li, Qingping Yang, Shuangzhi Wu. [pdf], 2025.05.
- AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models
  Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu. [pdf], 2025.05.
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang. [pdf], [homepage], [code], 2025.05.
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Shengjia Zhang, Junjie Wu, Jiawei Chen, Changwang Zhang, Xingyu Lou, Wangchunshu Zhou, Sheng Zhou, Can Wang, Jun Wang. [pdf], [code], 2025.05.
- Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models
  Ruiqi Zhang, Changyi Xiao, Yixin Cao. [pdf], 2025.06.
- Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
  Peijie Liu, Fengli Xu, Yong Li. [pdf], [code], 2025.06.
## Reasoning Shortcuts

- Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts
  Emanuele Marconato, Stefano Teso, Antonio Vergari, Andrea Passerini. [pdf], 2023.05.
- Break the Chain: Large Language Models Can be Shortcut Reasoners
  Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang. [pdf], 2024.06.
- Can Language Models Learn to Skip Steps?
  Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang. [pdf], [code], 2024.11.
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs
  Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. [pdf], [code], 2025.02.
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
  Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Yue Xing, Jiliang Tang, Qi He. [pdf], 2025.02.
- Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
  Ren Zhuang, Ben Wang, Shuifa Sun. [pdf], 2025.05.
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
  Yuxuan Jiang, Dawei Li, Frank Ferraro. [pdf], 2025.05.
- R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
  Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao. [pdf], [code], 2025.05.
- Not All Tokens Are What You Need In Thinking
  Hang Yuan, Bin Yu, Haotian Li, Shijun Yang, Christina Dan Wang, Zhou Yu, Xueyin Xu, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05.
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
  Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu. [pdf], [code], 2025.05.
## Reasoning Step Decomposition

- Markov Chain of Thought for Efficient Mathematical Reasoning
  Wen Yang, Minpeng Liao, Kai Fan. [pdf], [code], 2024.10.
- Atom of Thoughts for Markov LLM Test-Time Scaling
  Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo. [pdf], [code], 2025.02.
- DISC: Dynamic Decomposition Improves LLM Inference Scaling
  Jonathan Light, Wei Cheng, Wu Yue, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen. [pdf], 2025.02.
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
  Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang. [pdf], [code], 2025.03.
- From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
  Jinyi Liu, Yan Zheng, Rong Cheng, Qiyu Wu, Wei Guo, Fei Ni, Hebin Liang, Yifu Yuan, Hangyu Mao, Fuzheng Zhang, Jianye Hao. [pdf], 2025.03.
## Small Reasoning Models & CoT Distillation

- Teaching Small Language Models to Reason
  Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn. [pdf], 2022.12.
- Mixed Distillation Helps Smaller Language Model Better Reasoning
  Chenglin Li, Qianglong Chen, Liangyue Li, Caiyu Wang, Yicheng Li, Zulong Chen, Yin Zhang. [pdf], 2023.12.
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning
  Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang. [pdf], [code], 2024.04.
- Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
  Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu. [pdf], 2024.04.
- Teaching Small Language Models Reasoning through Counterfactual Distillation
  Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang. [pdf], 2024.11.
- Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
  Xunyu Zhu, Jian Li, Can Ma, Weiping Wang. [pdf], 2024.11.
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
  Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao. [pdf], 2025.02.
- Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
  Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen. [pdf], [code], 2025.02.
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou. [pdf], [code], [homepage], 2025.02.
- Small Models Struggle to Learn from Strong Reasoners
  Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran. [pdf], [code], [homepage], 2025.02.
- Towards Reasoning Ability of Small Language Models
  Gaurav Srivastava, Shuxiang Cao, Xuan Wang. [pdf], 2025.02.
- Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
  Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng. [pdf], [code], 2025.03.
- SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
  Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He. [pdf], [code], 2025.03.
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
  Quy-Anh Dang, Chris Ngo. [pdf], [code], 2025.03.
- TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance
  Jingxian Xu, Mengyu Zhou, Weichang Liu, Hanbing Liu, Shi Han, Dongmei Zhang. [pdf], 2025.03.
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
  Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang. [pdf], 2025.04.
- A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
  Chengyu Wang, Taolin Zhang, Richang Hong, Jun Huang. [pdf], 2025.04.
- Tina: Tiny Reasoning Models via LoRA
  Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger. [pdf], [code], 2025.04.
## Small & Large Reasoning Model Collaboration

- Hawkeye: Efficient Reasoning with Model Collaboration
  Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho. [pdf], 2025.04.
- Guiding Reasoning in Small Language Models with LLM Assistance
  Yujin Kim, Euiin Yi, Minu Kim, Se-Young Yun, Taehyeon Kim. [pdf], [code], 2025.04.
- Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
  Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He. [pdf], 2025.04.
- SplitReason: Learning To Offload Reasoning
  Yash Akhauri, Anthony Fei, Chi-Chih Chang, Ahmed F. AbouElhamayed, Yueying Li, Mohamed S. Abdelfattah. [pdf], 2025.04.
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
  Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez. [pdf], [code], 2025.05.
- What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
  Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Kevin Lin, Tianyi Zhou, Lijuan Wang. [pdf], 2025.06.
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong. [pdf], [code], 2025.01. - SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali. [pdf], [code], 2025.04. - Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang, Xiang Yue, Vipin Chaudhary, Xiaotian Han. [pdf], [code], 2025.04. - Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang, Juntao Li, Lijun Wu, Min Zhang. [pdf], [code], 2025.04. - Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu. [pdf], 2025.05. - Accelerated Test-Time Scaling with Model-Free Speculative Sampling
Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati. [pdf], 2025.06.
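The entries above share one mechanism: a cheap draft model proposes several chain-of-thought tokens, and the expensive target model verifies them, keeping the longest agreeing prefix so that each target pass can advance by more than one token. A minimal greedy-verification sketch of that loop (`draft_next` and `target_next` are toy stand-ins for the two models, not any paper's actual implementation):

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One round of draft-then-verify speculative decoding (greedy variant).

    draft_next / target_next: callables mapping a token list to the next token.
    Returns the extended prefix: every draft token the target agrees with,
    plus one token from the target itself, so progress is always >= 1 token.
    """
    # 1. The draft model cheaply proposes k tokens autoregressively.
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # 2. The target model checks each proposal; keep the agreeing prefix.
    accepted = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # target's correction ends the round
            break
    else:
        accepted.append(target_next(prefix + accepted))  # bonus token
    return prefix + accepted
```

With a draft model that agrees on the first three tokens, one round yields four tokens for a single sequential pass over the target; when the draft is perfect, it yields k + 1.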
- R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration
Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, Junjie Hu. [pdf], 2025.06. - SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang. [pdf], 2025.06.
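Training-free KV-cache compression methods such as those above generally score cached entries by importance (e.g. accumulated attention mass) and evict the lowest-scoring ones once a memory budget is exceeded. A generic sketch of that eviction step, not any specific paper's method:

```python
def compress_kv(cache, scores, budget):
    """Evict the lowest-importance KV entries when the cache exceeds `budget`.

    cache:  list of cached entries (stand-ins for per-token key/value pairs).
    scores: importance score per entry, e.g. accumulated attention mass.
    Survivors are kept in their original positional order.
    """
    if len(cache) <= budget:
        return cache
    # Rank entries by score, keep the top `budget`, then restore order.
    ranked = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [cache[i] for i in keep]
```

Real systems apply this per attention head and batch the gather on-device; the sketch only shows the selection logic.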
- Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. [pdf], 2024.08. - Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang. [pdf], [code], [homepage], 2024.08. - Scaling Test-Time Compute Without Verification or RL is Suboptimal
Amrith Setlur, Nived Rajaraman, Sergey Levine, Aviral Kumar. [pdf], 2025.02. - Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu. [pdf], [code], 2025.02. - Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei. [pdf], 2025.02. - Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar. [pdf], [code], [homepage], 2025.03. - Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Dylan J. Foster, Akshay Krishnamurthy. [pdf], 2025.03. - What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Irwin King, Xue Liu, Chen Ma. [pdf], 2025.03. - When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach. [pdf], [code], 2025.04. - Z1: Efficient Test-time Scaling with Code
Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang. [pdf], [code], 2025.04. - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou. [pdf], 2025.04. - Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini. [pdf], 2025.05. - Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu. [pdf], [code], 2025.05. - Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun. [pdf], [code], 2025.05. - Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz. [pdf], 2025.05. - First Finish Search: Efficient Test-Time Scaling in Large Language Models
Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty. [pdf], 2025.05. - LLM-First Search: Self-Guided Exploration of the Solution Space
Nathan Herr, Tim Rocktäschel, Roberta Raileanu. [pdf], [code], 2025.06.
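A recurring baseline in this section is verifier-guided best-of-N and its weighted variant: rather than trusting the single top-scored sample, sum verifier scores over samples that reach the same final answer and return the answer with the highest total. A minimal sketch of that aggregation (the samples and scores are assumed to come from an LLM sampler and a reward/verifier model, both outside this snippet):

```python
from collections import defaultdict

def weighted_best_of_n(samples, scores):
    """Weighted best-of-N: aggregate verifier scores per distinct final
    answer and return the answer with the highest total score."""
    totals = defaultdict(float)
    for answer, score in zip(samples, scores):
        totals[answer] += score
    return max(totals, key=totals.get)
```

Plain best-of-N is the special case of picking `max(zip(scores, samples))`; the weighted form is more robust when many medium-scored chains converge on the same answer.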
- Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette. [pdf], [code], 2024.10. - Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong. [pdf], [code], 2024.10. - FastMCTS: A Simple Sampling Strategy for Data Synthesis
Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo. [pdf], 2025.02. - Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao. [pdf], 2025.02. - Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu. [pdf], 2025.02. - Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang. [pdf], 2025.03. - Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes, Alan Ritter. [pdf], 2025.03. - φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu. [pdf], [code], 2025.03.
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li. [pdf], [code], 2024.01. - Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li. [pdf], [code], 2024.08. - Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou. [pdf], 2024.08. - Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li. [pdf], [code], 2024.08. - Efficient Test-Time Scaling via Self-Calibration
Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang. [pdf], [code], 2025.02. - Confidence Improves Self-Consistency in LLMs
Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, Gal Yona. [pdf], 2025.02. - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
Zhi Zhou, Tan Yuhao, Zenan Li, Yuan Yao, Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li. [pdf], 2025.02.
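The common thread in this section is spending samples adaptively: draw answers in small windows and stop as soon as agreement is high, instead of always paying for the full self-consistency budget. A minimal sketch in the spirit of the early-stopping entries above (`sample_answer` is a toy stand-in for running one chain-of-thought and extracting its final answer):

```python
from collections import Counter

def early_stop_self_consistency(sample_answer, max_samples=20, window=5):
    """Self-consistency with window-based early stopping.

    Draw answers in windows of `window` samples; stop as soon as one window
    is unanimous (a cheap confidence signal), otherwise continue up to
    `max_samples`. Returns (majority answer, samples actually used).
    """
    answers = []
    for _ in range(0, max_samples, window):
        batch = [sample_answer() for _ in range(window)]
        answers.extend(batch)
        if len(set(batch)) == 1:  # unanimous window: stop early
            break
    return Counter(answers).most_common(1)[0][0], len(answers)
```

On easy questions where the model is consistent, this uses one window instead of the full budget; on hard questions it degrades gracefully to plain self-consistency.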
- OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, Sheng Guo. [pdf], 2024.10. - InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang. [pdf], 2025.03.
- PixelThink: Towards Efficient Chain-of-Pixel Reasoning
Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang. [pdf], [code], [homepage], 2025.05.
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang. [pdf], [code], [homepage], 2023.06. - Adaptive Skeleton Graph Decoding
Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo. [pdf], 2024.02. - PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li. [pdf], 2025.03. - Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
Yijiong Yu. [pdf], [code], 2025.03. - Fast-Slow-Thinking: Complex Task Solving with Large Language Models
Yiliu Sun, Yanfang Zhang, Zicheng Zhao, Sheng Wan, Dacheng Tao, Chen Gong. [pdf], 2025.04. - Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr. [pdf], [code], 2025.04. - Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu. [pdf], 2025.05. - Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu. [pdf], 2025.05. - SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Emil Biju, Shayan Talaei, Zhemin Huang, Mohammadreza Pourreza, Azalia Mirhoseini, Amin Saberi. [pdf], 2025.06. - Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
Chenyang Shao, Xinyang Liu, Yutang Lin, Fengli Xu, Yong Li. [pdf], 2025.06. - Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen. [pdf], [homepage], [code], 2025.05.
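Several entries above (Skeleton-of-Thought, adaptive parallel reasoning, Multiverse) exploit the same observation: many answers decompose into points that can be expanded independently, so generation need not be strictly sequential. A minimal sketch of the skeleton-then-parallel-expand pattern, where `outline` and `expand` are toy stand-ins for two LLM calls:

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, outline, expand):
    """Skeleton-of-Thought-style decoding sketch.

    First obtain a short outline (the skeleton) sequentially, then expand
    every point concurrently instead of generating the whole answer token
    by token. `outline(question)` returns a list of point titles;
    `expand(question, point)` returns the body text for one point.
    """
    points = outline(question)                   # sequential: cheap skeleton
    with ThreadPoolExecutor() as pool:           # parallel: point expansion
        bodies = list(pool.map(lambda p: expand(question, p), points))
    return "\n".join(f"{p}: {b}" for p, b in zip(points, bodies))
```

In a real system the parallel calls would be batched requests to the same model; `pool.map` preserves point order, so the assembled answer stays coherent.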
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
Masoud Hashemi, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudhan, Jishnu Sethumadhavan Nair, Aman Tiwari, Vikas Yadav. [pdf], 2025.03. - S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu. [pdf], [code], 2025.04. - THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua, William Yang Wang. [pdf], 2025.04. - THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li, Yi Chang, Yuan Wu. [pdf], [homepage], [code], 2025.04.
- The Impact of Reasoning Step Length on Large Language Models
Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du. [pdf], [code], 2024.01. - Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che. [pdf], [code], 2024.10. - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. [pdf], 2024.12. - When More is Less: Understanding Chain-of-Thought Length in LLMs
Yuyang Wu, Yifei Wang, Tianqi Du, Stefanie Jegelka, Yisen Wang. [pdf], 2025.02. - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez. [pdf], [code], 2025.02. - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. [pdf], 2025.02. - The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon, Andres Algaba, Vincent Ginis. [pdf], 2025.02. - Long Is More Important Than Difficult for Training Reasoning Models
Si Shen, Fei Huang, Zhixiao Zhao, Chang Liu, Tiansheng Zheng, Danhao Zhu. [pdf], [code], 2025.03. - Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge, Shenghua Liu, Yiwei Wang, Lingrui Mei, Lizhe Chen, Baolong Bi, Xueqi Cheng. [pdf], 2025.03. - Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He. [pdf], 2025.04. - Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou. [pdf], [code], 2025.04. - Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint
Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu. [pdf], 2025.04. - Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie. [pdf], 2025.04. - When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning
Rongzhi Zhu, Yi Liu, Zequn Sun, Yiwei Wang, Wei Hu. [pdf], 2025.05. - On Reasoning Strength Planning in Large Reasoning Models
Leheng Sheng, An Zhang, Zijian Wu, Weixiang Zhao, Changshuo Shen, Yi Zhang, Xiang Wang, Tat-Seng Chua. [pdf], [code], 2025.06.
Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem. CMU, University of Toronto. [blog], 2025.01.
Understanding R1-Zero-Like Training: A Critical Perspective. Sea AI Lab. [paper], [code], 2025.03.
The Key Ingredients for Scaling Test-Time Compute. Aviral Kumar. Carnegie Mellon University. [homepage], [video], 2025.03.
Reading lists related to Efficient Reasoning
- Eclipsess/Awesome-Efficient-Reasoning-LLMs
- XiaoYee/Awesome_Efficient_LRM_Reasoning
- Blueyee/Efficient-CoT-LRMs
- yueliu1999/Awesome-Efficient-Inference-for-LRMs
- DevoAllen/Awesome-Reasoning-Economy-Papers
- Hongcheng-Gao/Awesome-Long2short-on-LRMs
- EIT-NLP/Awesome-Latent-CoT
- yzhangchuck/awesome-llm-reasoning-long2short-papers
- We may have missed important works in this field. Please feel free to contribute and promote your awesome work or other related works here! Thanks in advance for your efforts.