add GymEnv #548
nice! would you wanna do the PR on top of the trajectories branch (#549)? there's a bunch of new things which should make native gym-style rollouts much easier, especially for training.
also would be ideal if we still had a notion of a dataset, as this is how trainers typically expect to work with verifiers envs. can be generated at init with a given number of hidden state samples (as in textarena_env)
yes! will do it on top of it. I'll just wait for the other to be finished
* verifiers integration v0.0 * verifiers integration v0.0 * training finishes on reversetext * reverse_text trains 0.13 -> 0.70 * pin git branch in pyproject * removed ac in config * remove DataConfig * removed print debug * configurable masks * rm dataconfig instance * pinned vf commit, removed extra deps * registry cleanup, reworked gsm8k, removed default env * bumped verifiers commit, fixed training divergence vs refactor branch * vf simple-math * simple-math edits * debug * simple_math train matching reference * rename math tasks, port to verifiers * Update configs * Fix imports * Add back envs that were accidentically deleted during rebase * Fix sampling.n missing and unused vars * Remove redundant log * Update README with new task names * Fix hendrycks math config path in README * Update W&B project names * Fix wrong config key * Add missing import * Fix sampling.n not defined * Fix missing config key * Do not tokenize in eval * Fix typos * Add 1B and 7B hendryck's and intellect math * Remove comment * fix tests and configs (PrimeIntellect-ai#548) * add tests orch * fix configs * fix configs * fix tests * fix pydantic ofnig * fix tetstp * Dispatch subconfigs via tmp toml file * Add correct GPU placement for int math run * More consistent var names * Set project and model also in orchestrator * Update readme (PrimeIntellect-ai#550) * fix readme * fix scripts * Parse single-turn prompt and completions tokens/ logprobs from vLLM directly via mock process_env_results function * Update verifiers rev * Fix style * Do not filter if field is missing --------- Co-authored-by: Mika Senghaas <[email protected]> Co-authored-by: samsja <[email protected]>
Description
Adds `GymEnv`, an optional `Environment` subclass that runs classic `reset`/`step` simulator loops. This lets you plug Gym-style environments directly into `verifiers` (including custom simulators defined inside this repo) without first converting them into a static dataset.
`GymEnv` supports:
* `env_cls`: one env class with optional `env_kwargs`; dataset rows feed directly into `reset(**info)`.
* `env_registry`: a registry of env classes; each dataset row specifies `info.env_type` and optional `info.env_kwargs` to select/configure the env per rollout.
* `_make_env`: full control over environment construction.
Additional features:
* `eval_runner`.
`GymEnv` is fully opt-in and does not affect existing environments.
Type of Change
Testing
`uv run pytest` locally.
Checklist
Additional Notes
`Environment` requires at least one of `dataset` or `eval_dataset`. `GymEnv` now handles this automatically:
* `auto_dummy_eval=True`: `num_train_episodes` via `_build_auto_dataset(...)`.
* `env_cls` set: `dataset =` this auto dataset. `eval_dataset =` a 1-row dummy eval set, preserving “episodes mode” behavior where `num_examples` maps cleanly to `rollouts_per_example`.
* `env_registry` set: `info` contains a valid `env_type` (round-robin over registry keys) and default `env_kwargs={}`. `dataset` and `eval_dataset`, ensuring all rows map cleanly into actual env instances.
TBD
`RLTrainer` + `GymEnv` (homogeneous and heterogeneous cases) to validate.
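For readers who want a concrete picture of the `env_cls` / `env_registry` selection and the round-robin auto dataset described above, here is a minimal standalone sketch. It does not import `verifiers` and is not the code in this PR: `CountdownEnv`, `EchoEnv`, `build_auto_rows`, `make_env` and `run_episode` are hypothetical names invented for illustration, and the `(obs, reward, done)` return shape of `step` is an assumption.

```python
from __future__ import annotations

from typing import Any, Callable


class CountdownEnv:
    """Toy Gym-style simulator (hypothetical, for illustration only)."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.remaining = start

    def reset(self, **info: Any) -> str:
        # Dataset-row fields arrive as keyword arguments, as in reset(**info).
        self.remaining = int(info.get("start", self.start))
        return f"remaining={self.remaining}"

    def step(self, action: str) -> tuple[str, float, bool]:
        self.remaining -= 1
        done = self.remaining <= 0
        return f"remaining={self.remaining}", (1.0 if done else 0.0), done


class EchoEnv:
    """Second toy env so the registry has more than one entry."""

    def reset(self, **info: Any) -> str:
        return "say something"

    def step(self, action: str) -> tuple[str, float, bool]:
        return action, 0.0, True


def build_auto_rows(num_episodes: int, registry: dict | None = None) -> list[dict]:
    """Build placeholder rows; with a registry, assign env_type round-robin."""
    keys = list(registry) if registry else []
    rows = []
    for i in range(num_episodes):
        info: dict = {}
        if keys:
            info = {"env_type": keys[i % len(keys)], "env_kwargs": {}}
        rows.append({"info": info})
    return rows


def make_env(info: dict, env_cls=None, env_kwargs=None, registry=None):
    """Pick the env for one rollout: registry lookup, else the single class."""
    if registry is not None:
        return registry[info["env_type"]](**info.get("env_kwargs", {}))
    if env_cls is None:
        raise ValueError("provide env_cls or registry")
    return env_cls(**(env_kwargs or {}))


def run_episode(env, info: dict, policy: Callable[[str], str], max_steps: int = 10) -> float:
    """Classic reset/step loop; returns the summed reward."""
    # Assumption: routing keys are stripped before reset; the PR may differ.
    reset_kwargs = {k: v for k, v in info.items() if k not in ("env_type", "env_kwargs")}
    obs = env.reset(**reset_kwargs)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

With `registry = {"countdown": CountdownEnv, "echo": EchoEnv}`, `build_auto_rows(4, registry)` alternates the two env types, and each row's `info` can be handed to `make_env` and then `run_episode`. In the single-class case, `make_env({}, env_cls=CountdownEnv)` ignores `env_type` entirely.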