
Conversation

hallerite commented Nov 7, 2025

Description

Adds GymEnv, an optional Environment subclass that runs classic reset/step simulator loops. This lets you plug Gym-style environments directly into verifiers (including custom simulators defined inside this repo) without first converting them into a static dataset.
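To make the reset/step contract concrete, here is a minimal toy simulator in that shape. The env class and its episode logic are purely illustrative (not part of this PR); the only thing GymEnv assumes is the classic `reset(**info)` / `step(action)` loop:

```python
# Toy Gym-style simulator: the agent must guess a hidden target number.
# Illustrative only -- GymEnv just needs the reset/step shape shown here.

class GuessNumberEnv:
    """Single-step guessing game with the classic reset/step contract."""

    def __init__(self, target: int = 7):
        self.target = target
        self.done = False

    def reset(self, **info):
        # Per-episode info (e.g. a dataset row) can override configuration.
        self.target = info.get("target", self.target)
        self.done = False
        return "Guess a number between 0 and 9."  # initial observation

    def step(self, action: int):
        # Return (observation, reward, done, extras), Gym-style.
        reward = 1.0 if action == self.target else 0.0
        self.done = True  # one-step episodes keep the example short
        return f"You guessed {action}.", reward, self.done, {}


env = GuessNumberEnv()
obs = env.reset(target=3)
obs, reward, done, _ = env.step(3)
```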

GymEnv supports:

  • Homogeneous mode (env_cls): one env class with optional env_kwargs; dataset rows feed directly into reset(**info).
  • Heterogeneous mode (env_registry): a registry of env classes; each dataset row specifies info.env_type and optional info.env_kwargs to select/configure the env per rollout.
  • Custom mode (subclass + _make_env): full control over environment construction.
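A hedged sketch of how these three modes might resolve an env class per rollout. The names (`env_cls`, `env_kwargs`, `env_registry`, `env_type`, `_make_env`) follow the PR description, but this stand-alone `make_env` function and the two toy simulator classes are hypothetical; the actual GymEnv implementation may differ:

```python
# Hypothetical stand-ins for user-defined simulator classes.
class FrozenLakeSim:
    def __init__(self, slippery: bool = True):
        self.slippery = slippery

class CartPoleSim:
    def __init__(self, gravity: float = 9.8):
        self.gravity = gravity

def make_env(row_info, env_cls=None, env_kwargs=None, env_registry=None):
    """Resolve one env instance per dataset row (sketch of _make_env)."""
    if env_cls is not None:
        # Homogeneous mode: one class, shared kwargs for every rollout.
        return env_cls(**(env_kwargs or {}))
    if env_registry is not None:
        # Heterogeneous mode: the row picks the class via info.env_type
        # and optionally configures it via info.env_kwargs.
        cls = env_registry[row_info["env_type"]]
        return cls(**row_info.get("env_kwargs", {}))
    # Custom mode: a GymEnv subclass would override _make_env instead.
    raise ValueError("no env_cls or env_registry; override _make_env")

registry = {"frozen_lake": FrozenLakeSim, "cartpole": CartPoleSim}
env = make_env(
    {"env_type": "cartpole", "env_kwargs": {"gravity": 3.7}},
    env_registry=registry,
)
```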

Additional features:

  • automatic dataset generation when none is provided (so RL training “just works”),
  • optional evaluation override via a user-supplied eval_runner.
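The eval_runner override could look something like the sketch below: a plain callable that takes an env factory and returns metrics, replacing the default evaluation loop. The `greedy_eval_runner` signature and the `ConstantEnv` class are assumptions for illustration, not the PR's actual interface:

```python
# Trivial env whose every episode ends immediately with reward 1.0.
class ConstantEnv:
    def reset(self, **info):
        return "obs"

    def step(self, action):
        return "obs", 1.0, True, {}

def greedy_eval_runner(make_env, num_episodes: int = 4) -> dict:
    """Hypothetical eval_runner: roll out episodes, report mean reward."""
    total = 0.0
    for _ in range(num_episodes):
        env = make_env()
        env.reset()
        _, reward, done, _ = env.step(0)  # single-step episodes here
        total += reward
    return {"mean_reward": total / num_episodes}

metrics = greedy_eval_runner(ConstantEnv)
```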

GymEnv is fully opt-in and does not affect existing environments.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

  • Dataset & trainer integration
    • Environment requires at least one of dataset or eval_dataset.
      GymEnv now handles this automatically:
      • If you provide a dataset or eval dataset explicitly, they are used as-is.
      • If you provide no datasets and keep auto_dummy_eval=True:
        • We auto-build a training dataset of length num_train_episodes via _build_auto_dataset(...).
        • In homogeneous mode (env_cls set):
          • dataset = this auto dataset.
          • eval_dataset = a 1-row dummy eval set, preserving “episodes mode” behavior where num_examples maps cleanly to rollouts_per_example.
        • In heterogeneous mode (env_registry set):
          • We auto-generate rows where each info contains a valid env_type (round-robin over registry keys) and default env_kwargs={}.
          • This auto dataset is used for both dataset and eval_dataset, ensuring all rows map cleanly into actual env instances.

TBD

  • Demonstrate an end-to-end RL training run using RLTrainer + GymEnv (homogeneous and heterogeneous cases) to validate.

CLAassistant commented Nov 7, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ hallerite
✅ willccbb
❌ snimu

willccbb (Member) commented Nov 9, 2025

nice! would you wanna do the PR on top of the trajectories branch? #549

there's a bunch of new things which should make native gym-style rollouts much easier, especially for training.

also would be ideal if we still had a notion of a dataset, as this is how trainers typically expect to work with verifiers envs

can be generated at init with a given number of hidden state samples (as in textarena_env)

hallerite (Author) commented:

yes! I'll rebase this on top of that branch; just waiting for the trajectories PR to be finished first.

ronaldnetawat pushed a commit to ronaldnetawat/verifiers that referenced this pull request Nov 13, 2025
* verifiers integration v0.0

* verifiers integration v0.0

* training finishes on reversetext

* reverse_text trains 0.13 -> 0.70

* pin git branch in pyproject

* removed ac in config

* remove DataConfig

* removed print debug

* configurable masks

* rm dataconfig instance

* pinned vf commit, removed extra deps

* registry cleanup, reworked gsm8k, removed default env

* bumped verifiers commit, fixed training divergence vs refactor branch

* vf simple-math

* simple-math edits

* debug

* simple_math train matching reference

* rename math tasks, port to verifiers

* Update configs

* Fix imports

* Add back envs that were accidentically deleted during rebase

* Fix sampling.n missing and unused vars

* Remove redundant log

* Update README with new task names

* Fix hendrycks math config path in README

* Update W&B project names

* Fix wrong config key

* Add missing import

* Fix sampling.n not defined

* Fix missing config key

* Do not tokenize in eval

* Fix typos

* Add 1B and 7B hendryck's and intellect math

* Remove comment

* fix tests and configs (PrimeIntellect-ai#548)

* add tests orch

* fix configs

* fix configs

* fix tests

* fix pydantic ofnig

* fix tetstp

* Dispatch subconfigs via tmp toml file

* Add correct GPU placement for int math run

* More consistent var names

* Set project and model also in orchestrator

* Update readme (PrimeIntellect-ai#550)

* fix readme

* fix scripts

* Parse single-turn prompt and completions tokens/ logprobs from vLLM directly via mock process_env_results function

* Update verifiers rev

* Fix style

* Do not filter if field is missing

---------

Co-authored-by: Mika Senghaas <[email protected]>
Co-authored-by: samsja <[email protected]>