Commit fa7f5fa
Support branching rollouts via trajectories; refactor state handling (#549)
* big chungus refactor for branching rollouts + cleaner state handling
* tests passing
* 3.11 fix; ruff
* vllm logprob args
* dict indexing for messages
* remove generateinputs
* optional truncation in trajectorystep for tokens
* small tweaks
* optional decorator rank for sorting order
* minor tweak
* change rank -> priority
* add cleanup to is_completed
* tool_env error handling, sandbox command timeout
* handle updated context length msg
* duplicate is_truncated field
* add model/sampling to state
* client/model/sampling in init_state
* updated config
* add kimi overlong prompt message
* add kimi overlong prompt message
* set_max_seq_len
* Add numpy, sympy, and scipy to PythonEnv
* pin prime-rl to will/trajectories branch
* update prime-rl wiki-search config
* ruff, ty
* fix init_state tests
* version, release notes
* env version bumps
* ty fixes
* opt deps for ty CI
* pin trajectories for configs
* use verifiers commit for configs
* ty for verifiers only
* bump vllm version
* process overlong prompt into trajectory
* skip steps with None tokens
---------
Co-authored-by: Sebastian <[email protected]>1 parent 3f20f0f commit fa7f5fa
File tree
57 files changed
+4365
-3428
lines changed- .github/workflows
- configs
- prime-rl
- vf-rl
- environments
- math_group
- math_python
- sentence_repeater
- wiki_search
- wordle
- outputs/evals/wordle--gpt-4.1-mini/816ec3b0
- notes
- tests
- verifiers
- envs
- rl/trainer
- rubrics
- scripts
- utils
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
57 files changed
+4365
-3428
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | | - | |
| 38 | + | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
2 | | - | |
| 1 | + | |
| 2 | + | |
3 | 3 | | |
4 | 4 | | |
| 5 | + | |
5 | 6 | | |
6 | 7 | | |
7 | | - | |
| 8 | + | |
8 | 9 | | |
9 | 10 | | |
10 | | - | |
| 11 | + | |
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
| |||
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
34 | | - | |
| 35 | + | |
35 | 36 | | |
36 | 37 | | |
| 38 | + | |
| 39 | + | |
37 | 40 | | |
38 | 41 | | |
39 | | - | |
| 42 | + | |
40 | 43 | | |
41 | 44 | | |
42 | | - | |
43 | | - | |
| 45 | + | |
44 | 46 | | |
45 | 47 | | |
46 | 48 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
14 | | - | |
15 | | - | |
| 13 | + | |
16 | 14 | | |
17 | 15 | | |
18 | 16 | | |
19 | | - | |
| 17 | + | |
20 | 18 | | |
21 | 19 | | |
22 | 20 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
4 | | - | |
| 4 | + | |
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
| 23 | + | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
4 | | - | |
| 4 | + | |
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | 9 | | |
13 | 10 | | |
14 | 11 | | |
15 | 12 | | |
16 | 13 | | |
17 | 14 | | |
18 | | - | |
| 15 | + | |
19 | 16 | | |
20 | 17 | | |
21 | 18 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
4 | | - | |
| 4 | + | |
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| |||
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
78 | | - | |
79 | | - | |
80 | | - | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
81 | 81 | | |
82 | 82 | | |
83 | 83 | | |
84 | | - | |
| 84 | + | |
85 | 85 | | |
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
90 | | - | |
| 90 | + | |
91 | 91 | | |
92 | 92 | | |
93 | 93 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| |||
0 commit comments