Conversation

@khatwanimohit khatwanimohit commented Jan 8, 2026

Description

This PR introduces support for Distributed Low-Communication (DiLoCo) training in MaxText. It implements standard DiLoCo, enabling efficient model training across disjoint clusters ("islands") by synchronizing gradients infrequently via an outer optimizer.
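
For orientation, here is a minimal sketch of the inner/outer optimization pattern DiLoCo relies on, written with plain optax over toy parameters. The function names, optimizer choices, and loss are illustrative assumptions; the actual MaxText implementation (DiLoCoTrainState and the drjax-based synchronization in src/MaxText/diloco.py) is structured differently.

# Illustrative sketch of the DiLoCo pattern, not MaxText's actual API.
# Each replica runs several local ("inner") steps; the averaged parameter
# delta is then applied as a pseudo-gradient by an infrequent "outer" update.
import jax
import jax.numpy as jnp
import optax

def loss_fn(params, batch):
  # Toy quadratic loss, for illustration only.
  pred = batch["x"] @ params["w"]
  return jnp.mean((pred - batch["y"]) ** 2)

inner_opt = optax.adamw(1e-3)                            # runs every step, per replica
outer_opt = optax.sgd(0.7, momentum=0.9, nesterov=True)  # runs once per synchronization

def inner_steps(global_params, inner_state, batches):
  """Take several local steps starting from the globally synchronized params."""
  params = global_params
  for batch in batches:
    grads = jax.grad(loss_fn)(params, batch)
    updates, inner_state = inner_opt.update(grads, inner_state, params)
    params = optax.apply_updates(params, updates)
  return params, inner_state

def outer_step(global_params, per_replica_params, outer_state):
  """Average the replicas' parameter deltas and apply them with the outer optimizer."""
  mean_params = jax.tree_util.tree_map(
      lambda *p: jnp.mean(jnp.stack(p), axis=0), *per_replica_params)
  outer_grad = jax.tree_util.tree_map(lambda g, m: g - m, global_params, mean_params)
  updates, outer_state = outer_opt.update(outer_grad, outer_state, global_params)
  return optax.apply_updates(global_params, updates), outer_state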

Key Changes

  • Core Logic: Added src/MaxText/diloco.py, which implements the DiLoCoTrainState, inner/outer optimization steps,
    and communication synchronization using drjax.
  • Training Loop Integration: Modified src/MaxText/train.py to initialize the DiLoCo state and adapt the training
    step when enable_diloco is active. This includes handling data reshaping for multiple replicas (see the reshape
    sketch after this list).
  • Sharding & Configuration:
    • Updated src/MaxText/sharding.py to support a hierarchical "diloco" sharding axis.
    • Added new flags (e.g., enable_diloco, num_diloco_replicas, diloco_outer_optimizer) to base.yml and types.py.
  • Dependencies: Added drjax to the project requirements.
  • Testing: Added comprehensive unit tests in tests/diloco_test.py.
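
To make the data-reshaping point above concrete, here is a hedged sketch of what reshape_first_axis_with_diloco could do: split the leading (global-batch) axis of every array in the batch into a num_diloco_replicas axis plus a per-replica batch axis. The function name and argument order match the call site in train.py, but the body and the example shapes are illustrative assumptions, not the code in src/MaxText/diloco.py.

# Illustrative only: give each DiLoCo replica ("island") its own shard of the
# global batch by splitting the leading axis of every array in the batch.
import jax
import jax.numpy as jnp

def reshape_first_axis_with_diloco(num_diloco_replicas, batch):
  def _split_leading_axis(x):
    global_batch = x.shape[0]
    assert global_batch % num_diloco_replicas == 0, (
        "global batch size must be divisible by num_diloco_replicas")
    per_replica = global_batch // num_diloco_replicas
    return x.reshape((num_diloco_replicas, per_replica) + x.shape[1:])
  return jax.tree_util.tree_map(_split_leading_axis, batch)

# Example: a global batch of 8 sequences split across 2 replicas.
batch = {"inputs": jnp.zeros((8, 1024), dtype=jnp.int32)}
sharded = reshape_first_axis_with_diloco(2, batch)
print(sharded["inputs"].shape)  # (2, 4, 1024)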

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added necessary comments to my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@khatwanimohit khatwanimohit changed the title from Mohit/diloco trainer to [Diloco] Diloco trainer on Jan 8, 2026

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 0% with 104 lines in your changes missing coverage. Please review.

Files with missing lines        Patch %   Lines
src/MaxText/diloco.py           0.00%     68 Missing ⚠️
src/MaxText/train.py            0.00%     22 Missing ⚠️
src/MaxText/train_utils.py      0.00%     7 Missing ⚠️
src/MaxText/sharding.py         0.00%     4 Missing ⚠️
src/MaxText/maxtext_utils.py    0.00%     3 Missing ⚠️


_BASE_CONFIG_PATH = os.path.join(MAXTEXT_REPO_ROOT, "src", "MaxText", "configs", "base.yml")


class SimpleNNXModel(nnx.Module):
Collaborator Author

use SimpleLayer

@khatwanimohit khatwanimohit left a comment

Add train_compile tests for Diloco

eval_step,
eval_data_iterator,
params_shardings,
)
Collaborator

maybe move this logic out of train.py

with jax.profiler.StepTraceAnnotation("train", step_num=step):
  example_batch = data_loader.load_next_batch(rampup_manager=rampup_manager)
  if config.enable_diloco:
    example_batch = diloco.reshape_first_axis_with_diloco(config.num_diloco_replicas, example_batch)
Collaborator

Actually, you could consider moving this logic, along with the config.input_data_sharding_logical_axes change in sharding.py, into MaxText.data_loader, e.g.

def load_next_batch(**args):
  if enable_diloco:
    ...
  else:
    ...  # original logic

Collaborator

In this case, along with the previously suggested change, you don't need to change anything in train.py.

@khatwanimohit khatwanimohit Jan 9, 2026

While making this change, I realized we are calling sharding.maybe_shard_with_name twice: first inside data_loader.load_next_batch, and then again after data_loader.load_next_batch is called in train.py.

@NuojCheng can you double-check whether this is true? If so, I can remove one of them along with this change.

Collaborator

Yes, please remove the one in train.py. Thanks!

Collaborator

I have made this change in #2926
