Refactor configs #383

mikasenghaas · 2025-06-10T22:36:19Z

This PR is quite a major refactor of the configs. Most importantly, it switches from pydantic_config to Pydantic’s official solution, pydantic-settings, and renames and restructures large parts of the inference configs.

Nice things

Natural nesting and more descriptive names: The configs are renamed and restructured so that they are easier to understand and maximally modular. This lets us pass only the relevant config chunk to functions/classes during initialization (e.g. PipelineConfig is passed to setup_pipeline).
CLI help message: We can now display a help message via uv run src/zeroband/infer.py -h (or --help) that shows each argument’s type, default value and description. This should make it much easier for a) new people to use the project and b) us (who are forgetful) to remember what the arguments we defined months ago are for.
Nested TOML configs: We can now load multiple config files. This is super useful when a general config (e.g. @configs/inference/synthetic-2/default.toml) is shared across multiple specific configs (e.g. here the model-specific config @configs/inference/synthetic-2/qwen3-4b.toml):

    uv run src/zeroband/infer.py @configs/inference/synthetic-2/default.toml @configs/inference/synthetic-2/qwen3-4b.toml

Multiple sources: We can load configs from TOML files, CLI arguments and environment variables, and we can easily define the precedence of sources. The hierarchy, from highest to lowest priority, is: 1) CLI arguments, 2) config files, 3) environment variables, 4) defaults.

    PRIME_MODEL__NAME=Qwen/Qwen3-4B uv run src/zeroband/infer.py @qwen8b.toml @qwen14b.toml --model.name Qwen/Qwen3-32B

In this example, the CLI argument --model.name Qwen/Qwen3-32B takes precedence, so the script uses Qwen/Qwen3-32B as the model name. If the CLI argument weren't set, the second config file would take precedence and the script would use Qwen/Qwen3-14B. If the second config file weren't passed, the first config file would take precedence and the script would use Qwen/Qwen3-8B. If neither config file were passed, the environment variable would take precedence and the script would use Qwen/Qwen3-4B. If the environment variable weren't set either, the default value Qwen/Qwen3-0.6B would be used.
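To make the precedence rules concrete, here is a minimal standard-library sketch of how a single field could be resolved. resolve_model_name and DEFAULT_MODEL_NAME are hypothetical names, and the parsed contents of qwen8b.toml and qwen14b.toml are assumed to set model.name as in the example; pydantic-settings applies this kind of source ordering to every field for us.

```python
import argparse

DEFAULT_MODEL_NAME = "Qwen/Qwen3-0.6B"  # default value from the example above


def resolve_model_name(argv: list[str], config_files: list[dict], env: dict[str, str]) -> str:
    """Resolve model.name: CLI > later config file > earlier config file > env var > default."""
    parser = argparse.ArgumentParser()
    # default=None lets us distinguish "flag not passed" from an explicit value.
    parser.add_argument("--model.name", dest="model_name", default=None, help="Model name or path")
    args = parser.parse_args(argv)
    if args.model_name is not None:
        return args.model_name
    # Walk the config files from last to first so that later files win.
    for cfg in reversed(config_files):
        name = cfg.get("model", {}).get("name")
        if name is not None:
            return name
    if "PRIME_MODEL__NAME" in env:
        return env["PRIME_MODEL__NAME"]
    return DEFAULT_MODEL_NAME


# Assumed parsed contents of qwen8b.toml and qwen14b.toml.
qwen8b = {"model": {"name": "Qwen/Qwen3-8B"}}
qwen14b = {"model": {"name": "Qwen/Qwen3-14B"}}
env = {"PRIME_MODEL__NAME": "Qwen/Qwen3-4B"}

print(resolve_model_name(["--model.name", "Qwen/Qwen3-32B"], [qwen8b, qwen14b], env))  # Qwen/Qwen3-32B
print(resolve_model_name([], [qwen8b, qwen14b], env))  # Qwen/Qwen3-14B
print(resolve_model_name([], [qwen8b], env))  # Qwen/Qwen3-8B
print(resolve_model_name([], [], env))  # Qwen/Qwen3-4B
print(resolve_model_name([], [], {}))  # Qwen/Qwen3-0.6B
```

In a real entry point one would pass sys.argv[1:] and dict(os.environ) instead of the literal values used here.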

Easy logging: It's a lot easier to log (nested) configs, both to stdout and to external sources like W&B. For example, this single line

    logger.info(f"Initializing model and tokenizer ({config.model})")

prints the full model config:

    06-11 16:55:14 [INFER] [INFO] Initializing model and tokenizer (name='Qwen/Qwen3-4B' dtype='auto' kv_cache_dtype='auto' max_model_len=16384 quantization=None enforce_eager=False device='auto' enable_thinking=True)

We can also easily define which values of a nested config should be printed, using the repr argument of the Field class.

No maintenance & more features: We do not need to maintain anything ourselves, yet we get loads of nice (optional) features for free, like setting configs via a JSON string, from a .env file, etc.

“Breaking”/annoying things

Quite a few argument names have changed, so people will get slightly annoyed at me when they e.g. try to type --model-name, which is now --model.name.
We define a new schema for setting configs via environment variables. All arguments are prefixed with PRIME_ and use __ to denote nested models. For example, --model.name is nested, so the corresponding environment variable is PRIME_MODEL__NAME. This affects how we set the socket path in production from the protocol worker: the env variable PRIME_SOCKET_PATH no longer works. Instead, we have to use PRIME_MONITOR__SOCKET__PATH, or simply pass the path via CLI as --monitor.socket.path (preferred).
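The environment-variable schema can be sketched as two small mapping functions. env_name_for_flag and env_to_nested are hypothetical helpers for illustration, assuming case-insensitive field names (the pydantic-settings default); the library does this parsing itself once env_prefix and env_nested_delimiter are configured. The sketch also shows why PRIME_SOCKET_PATH stops working: without the __ delimiter it becomes a single top-level socket_path key instead of monitor.socket.path.

```python
def env_name_for_flag(flag: str, prefix: str = "PRIME_", delimiter: str = "__") -> str:
    """Map a CLI flag such as --monitor.socket.path to PRIME_MONITOR__SOCKET__PATH."""
    return prefix + delimiter.join(flag.removeprefix("--").split(".")).upper()


def env_to_nested(env: dict[str, str], prefix: str = "PRIME_", delimiter: str = "__") -> dict:
    """Collect PRIME_A__B=value variables into a nested dict {"a": {"b": "value"}}."""
    nested: dict = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        # Field names are matched case-insensitively, so lowercase each part.
        parts = key[len(prefix):].lower().split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


print(env_name_for_flag("--model.name"))  # PRIME_MODEL__NAME
print(env_name_for_flag("--monitor.socket.path"))  # PRIME_MONITOR__SOCKET__PATH
print(env_to_nested({
    "PRIME_MONITOR__SOCKET__PATH": "/tmp/prime.sock",
    "PRIME_SOCKET_PATH": "/tmp/old.sock",  # old name: lands on a top-level "socket_path" key
    "PATH": "/usr/bin",
}))
```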

mikasenghaas · 2025-06-11T20:33:41Z

CI e2e run works. Also tested that all of the commands in the README for distributed inference are updated and work.

Co-authored-by: samsja

mikasenghaas · 2025-06-12T01:37:19Z

Jackmin801 · 2025-06-12T22:28:32Z

samsja reviewed Jun 10, 2025

src/zeroband/inference/config.py (outdated, resolved)

samsja reviewed Jun 10, 2025

src/zeroband/inference/config.py (outdated, resolved)

samsja reviewed Jun 10, 2025

src/zeroband/inference/pipeline.py (outdated, resolved)

mikasenghaas force-pushed the mika/refactor/config branch from 8d3d82d to 331b0f0 on June 11, 2025 17:01

mikasenghaas marked this pull request as ready for review June 11, 2025 18:47

mikasenghaas added 25 commits June 11, 2025 19:53

Add docs to sampling config and add missing sampling parameters

25326c3

Extract parallel config

1f08baf

Extract model config

ed469b6

Extract data config

deec789

Correct var naming from batch_size to max_batch_size

3000944

Extract RL config

091d1f5

Delete deprecated configs

0b9189d

Increment data offset by problems per batch (excluding sampling.n)

107f05e

Align ckpt/rollout path definition

abda132

Ignore ckpt/rollout dir

0771f10

Adapt configs

aa0d528

Fix inference integration tests

a265e1b

Switch to using annotated fields for configs

9550931

Improved config logs

4dc727e

Migrate inference configs to pydantic-settings

5d56cbd

Fix broken model validation

56ef37b

Fix integration test

721f05b

Skip ge/le checks on non-numeric type

b9e5221

Update README

62423fc

Move comments into field description

9bf46e3

Use path type

459c91f

Fix tab tab error

c0672f6

Use implicit boolean flags

3346484

Fix funny legacy import error

69c91e6

Fix unit tests

7906083

mikasenghaas added 3 commits June 11, 2025 19:53

Use clean argv fixture in inference config test

9584be8

Explain configs in README

022023a

Fix bug from rebase

715e3df

mikasenghaas force-pushed the mika/refactor/config branch from 84838c6 to 715e3df on June 11, 2025 19:54

mikasenghaas changed the title ~~Refactor inference configs~~ [PRI2-591] Refactor inference configs Jun 11, 2025

mikasenghaas changed the title ~~[PRI2-591] Refactor inference configs~~ Refactor inference configs Jun 11, 2025

mikasenghaas added 3 commits June 11, 2025 22:06

Skip doing hex string validation

18ff57a

Optionally parse from CLI and move toml extraction away from module def

1d43e2d

Move training configs to pydantic-settings

1a75ab0

mikasenghaas changed the title ~~Refactor inference configs~~ Refactor configs Jun 12, 2025

mikasenghaas added 2 commits June 12, 2025 01:06

Update README

fc6e3c4

Remove pydanctic_config from deps

07e07b2