Ajax is a deep learning library built on Jax and GSPMD, intended to support large-scale training of foundation models.
- Simplicity: explicitness over magic
- Flexibility: provide building blocks instead of frameworks
(For people familiar with Lingvo's configuration system, Ajax's config library is very similar except for the terminology change of "parameter" to "config".)
We choose to represent all computation via global sharding instead of supporting multiple modes (data parallel vs. global sharding).
Specifically, all tensors and computation expressed in Jax describe the global tensors and computation, which are sharded on a device mesh along the data and model axes. To represent a pure data-parallel computation, one simply sets the model dimension of the device mesh to 1.
TODO(ruoming): add an example here.
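For illustration, here is a minimal sketch using plain Jax sharding APIs rather than Ajax-specific code; the mesh shapes assume eight devices and the axis names simply mirror the text above:

```python
# A minimal, Ajax-independent sketch (assumes 8 devices): express global
# computation on a 2D device mesh with "data" and "model" axes.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices()).reshape(4, 2)  # 4-way data, 2-way model
mesh = Mesh(devices, axis_names=("data", "model"))

# x describes the *global* activation tensor; it is sharded so that the batch
# dimension is split over the "data" axis and the feature dimension over "model".
x = jnp.ones((128, 512))
x = jax.device_put(x, NamedSharding(mesh, P("data", "model")))

# Pure data parallelism is just the special case where the "model" mesh
# dimension has size 1.
dp_mesh = Mesh(np.array(jax.devices()).reshape(8, 1), axis_names=("data", "model"))
```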
A notable exception is input processing, which is not expressed in Jax and describes what happens on each host rather than the global computation.
One of Jax's appeals is the functional programming paradigm---users are encouraged to write pure functions without side inputs and outputs.
In practice, we found that certain inputs and outputs need to be propagated back and forth between every pair of parent and child layers. These include the model parameters, whether the computation is in training or evaluation mode (is_training), and a pseudo-random number generator key as parts of the layer inputs, and summaries and auxiliary parameter updates (e.g., moving averages) as parts of the layer outputs.
Written explicitly, the code would look like:

```python
def forward(self, is_training, prng_key, parameters, x):
    ...
    summaries = {}
    parameter_updates = {}
    # For each child...
    prng_key, child_key = jax.random.split(prng_key)
    x, child_aux_outputs = self.child.forward(
        is_training, child_key, parameters["child"], x)
    summaries["child"] = child_aux_outputs["summaries"]
    parameter_updates["child"] = child_aux_outputs["parameter_updates"]
    ...
    return x, dict(summaries=summaries, parameter_updates=parameter_updates)
```

This seems unnecessarily verbose. So we took a page from Flax's and Lingvo's approach and use a thread-local stack of InvocationContext objects to represent the implicit inputs and outputs (a simplified sketch of this mechanism follows the example below). This allows us to write code in a concise style, as in PyTorch/Flax:
```python
def forward(self, x):
    ...
    # Implicit inputs can be accessed via self.{is_training, prng_key, parameters}.
    x = self.child(x)
    # Implicit outputs can be added via self.add_{summary, parameter_update}().
    # Users can override implicit inputs, e.g., to invoke the teacher module in
    # evaluation mode.
    with set_current_context(current_context().clone(is_training=False)):
        y = self.teacher(x)
    ...
    return x
```
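To make the mechanism concrete, here is a hypothetical, heavily simplified sketch of such a thread-local context stack. It is not Ajax's actual implementation; the real InvocationContext also carries the parameters, the PRNG key, and the collected outputs (summaries and parameter updates).

```python
# Hypothetical, heavily simplified sketch of a thread-local context stack.
# Ajax's actual InvocationContext carries more state and a richer API.
import contextlib
import dataclasses
import threading

@dataclasses.dataclass(frozen=True)
class InvocationContext:
    is_training: bool

    def clone(self, **overrides) -> "InvocationContext":
        # Returns a copy of the context with the given fields overridden.
        return dataclasses.replace(self, **overrides)

_thread_local = threading.local()

def current_context() -> InvocationContext:
    # Top of the per-thread stack of contexts.
    return _thread_local.stack[-1]

@contextlib.contextmanager
def set_current_context(context: InvocationContext):
    # Pushes `context` for the duration of the `with` block, then pops it.
    stack = getattr(_thread_local, "stack", None)
    if stack is None:
        stack = _thread_local.stack = []
    stack.append(context)
    try:
        yield context
    finally:
        stack.pop()
```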
One exception is at the root level, where we need pure functions for jit, pjit, and differentiation.
We provide a functional() method,
which converts any module method invocation into a functional API
with explicit inputs and outputs.
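To make "a functional API with explicit inputs and outputs" concrete, here is an Ajax-independent sketch of the kind of pure root-level function such a conversion produces; the toy model and names are illustrative and do not reflect functional()'s actual signature or output:

```python
# An Ajax-independent sketch of a pure root-level function: all state enters
# as explicit arguments and all results (including auxiliary outputs) leave
# as explicit return values, so jax can jit/pjit and differentiate it.
import jax
import jax.numpy as jnp

def loss_fn(parameters, batch):
    # Toy linear model; conceptually, a real loss_fn would wrap a module
    # method via functional() instead.
    logits = batch["x"] @ parameters["w"]
    loss = jnp.mean((logits - batch["y"]) ** 2)
    aux = dict(summaries=dict(loss=loss))  # explicit auxiliary outputs
    return loss, aux

params = {"w": jnp.ones((4, 1))}
batch = {"x": jnp.ones((8, 4)), "y": jnp.zeros((8, 1))}
(loss, aux), grads = jax.value_and_grad(loss_fn, has_aux=True)(params, batch)
```

Everything inside such a function can still use the concise, context-based style shown above; only the root-level boundary needs to be pure.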