Awesome-Agent-Training

PRs Welcome · License: MIT · Awesome

We are witnessing an exciting era where LLM capabilities have rapidly advanced in just a few years, enabling lower costs and stronger performance for real-world applications.

The next key step is to enhance Language Agents' ability to handle diverse tasks, which is crucial for deployment. We also focus on optimizing their structure and training methods to improve task completion rates.

Training Language Agents is an essential yet still emerging technology. This repository is dedicated to pushing the boundaries and exploring new possibilities in this field.

The Second Half

Papers

1.1 Behavior Cloning (Learning from Good Behavior)

1.2 Behavior Cloning (Learning from Both Good and Bad Behaviors, Utilizing Mistakes Selectively/Comparatively)

2.1 Alignment with the Real World (Considering Previous Trajectories, Multi-turn)

2.2 Alignment with the Real World (Considering Unexpected Cases)

2.3 Alignment with the Real World (Considering Long-Horizon Tasks)

2.4 Alignment with the Real World (Considering Multi-Agent Collaboration)

3.1 Agent Trajectories Construction

4 Backtracking

Supplementary Classical Papers

0 Agent Framework Design

Cover image

Basic Understanding

Difference between Multi-turn RL and Single-turn RL
Single-turn RL: like scratching a lottery ticket. You see the ticket, scratch it, and find out whether you won; the round ends there. For an LLM, the user enters a prompt, the model returns an answer, the answer is scored, and the episode ends.
Multi-turn RL: like playing Super Mario. You keep seeing new screens, taking new actions, and collecting rewards (gold coins or deaths). For an LLM, the input to each new turn is not just the current prompt but also the previous prompts, answers, and rewards (i.e., the trajectory so far). See the sketch below for how the two rollout loops differ.
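
To make the contrast concrete, here is a minimal Python sketch of the two rollout loops. `llm_generate`, `reward_fn`, and the `env` interface are hypothetical placeholders for illustration, not any particular library's API.

```python
def single_turn_rollout(llm_generate, reward_fn, prompt):
    """Single-turn RL: one prompt in, one answer out, one reward; the episode ends."""
    answer = llm_generate(prompt)
    reward = reward_fn(prompt, answer)
    return [(prompt, answer, reward)]  # a single (state, action, reward) transition


def multi_turn_rollout(llm_generate, env, max_turns=10):
    """Multi-turn RL: the model keeps acting on a growing context of past
    prompts, answers, and rewards until the environment terminates."""
    trajectory = []
    context = env.reset()  # initial observation / first prompt
    for _ in range(max_turns):
        action = llm_generate(context)             # condition on the full history
        next_obs, reward, done = env.step(action)  # environment returns new observation + reward
        trajectory.append((context, action, reward))
        # the next input carries the previous prompt, answer, and reward forward
        context = context + action + f"\n[reward={reward}]\n" + next_obs
        if done:
            break
    return trajectory
```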

To be Classified

  • [2504] Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation [Behavior cloning teaches the LLM correct trajectories during the training phase; this method instead trains a process reward model and uses it to correct the LLM's trajectory at test time. But real environments are complex and changing, so doesn't the reward model itself still have to keep learning? See the sketch below for the general idea.]
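
As a rough illustration of inference-time guidance with a process reward model (not the paper's actual method or released code), the sketch below samples several candidate GUI actions and keeps the one the PRM scores highest. `policy.sample_actions` and `prm.score` are assumed interfaces introduced only for this example.

```python
def prm_guided_step(policy, prm, screenshot, history, num_candidates=8):
    """Pick the candidate GUI action that the process reward model scores highest.

    `policy` proposes candidate actions and `prm` scores each one in context;
    both are hypothetical interfaces used only to illustrate the idea.
    """
    candidates = policy.sample_actions(screenshot, history, n=num_candidates)
    scores = [prm.score(screenshot, history, action) for action in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```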

Tool Call

RL + Search

Meta-Thinking

Open-Source Projects

  • RAGEN (Training agent)
  • Search-R1 (Train your LLMs to reason and call a search engine with reinforcement learning)
  • OpenManus-RL (A live-stream development of RL tuning for LLM agents)
  • MetaSpatial (Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse)

Contributing

  • Feel free to contribute more papers or any other resources!
