Add meshes and config for TRN2/1 for Fuji models #885
apoorvtintin wants to merge 1 commit into apple:main
Conversation
    mesh_rules=(
        (
            "neuron-(trn2|trn2n).48xlarge-64",
            mesh_shape_from_axes(fsdp=-1, model=4),
Could you add a comment on why we set model=4 for Neuron?
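For context, a minimal sketch (hypothetical helper, not axlearn code) of how a mesh rule like `mesh_shape_from_axes(fsdp=-1, model=4)` resolves on a 64-device trn2.48xlarge node group: `-1` means "use all remaining devices", so the fsdp axis becomes 64 / 4 = 16.

```python
# Illustrative resolver for -1 wildcard mesh axes; resolve_mesh_shape is a
# hypothetical name, assuming 64 Neuron devices per the "...-64" rule above.
def resolve_mesh_shape(num_devices, **axes):
    """Resolve -1 wildcard axes given a total device count (illustrative only)."""
    fixed = 1
    for size in axes.values():
        if size != -1:
            fixed *= size
    assert num_devices % fixed == 0, "device count must be divisible by fixed axes"
    return {
        name: (num_devices // fixed if size == -1 else size)
        for name, size in axes.items()
    }

print(resolve_mesh_shape(64, fsdp=-1, model=4))  # {'fsdp': 16, 'model': 4}
```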
    if num_kv_heads:
        atten_cfg = GroupedQueryAttention.default_config()
        atten_input_linear = FusedGroupedQKVLinear.default_config().set(num_kv_heads=num_kv_heads)
    backend = jax.default_backend()
The fuji config should not depend on jax.default_backend(), otherwise the golden configs will not reflect the actual config being used.
Instead, we can create separate configs for a backend that requires different settings.
+1, please follow this example instead. If you really need to overwrite some configs, you can add another custom LayerConfigModifier like this one: https://github.com/apple/axlearn/blob/main/axlearn/common/trainer_config_modifier.py#L69
Thanks for the review, will update the PR with a custom LayerConfigModifier.
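For readers following along, a simplified sketch of the config-modifier pattern being suggested (all names here are hypothetical stand-ins; see axlearn's trainer_config_modifier.py for the real API). The point is that the base fuji config stays backend-agnostic, and Neuron-specific overrides live in a separately applied modifier rather than a `jax.default_backend()` branch.

```python
# Hypothetical, simplified config-modifier sketch; LayerCfg, TrainerCfg, and
# neuron_layer_modifier are illustrative names, not axlearn classes.
from dataclasses import dataclass, field

@dataclass
class LayerCfg:
    stack_type: str = "repeated"                  # backend-agnostic default
    qkv_linear: str = "FusedGroupedQKVLinear"

@dataclass
class TrainerCfg:
    layer: LayerCfg = field(default_factory=LayerCfg)

def neuron_layer_modifier(cfg: TrainerCfg) -> TrainerCfg:
    """Overrides applied only when building the Neuron variant of the config."""
    cfg.layer.stack_type = "stacked"              # StackedTransformerLayer
    cfg.layer.qkv_linear = "GroupedQKVLinear"     # unfused GQA projections
    return cfg

base = TrainerCfg()
neuron = neuron_layer_modifier(TrainerCfg())
print(base.layer.stack_type, neuron.layer.stack_type)  # repeated stacked
```

This keeps the golden config for the default backend unchanged, since the modifier is applied explicitly per named config rather than via runtime backend detection.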
        raise NotImplementedError(f"Unknown model size {model_size}.")
    model_kwargs = trainer_kwargs.pop("model_kwargs")
    model_kwargs.setdefault("vocab_size", vocab_size)
    model_kwargs.setdefault("stack_cfg", None if backend != "neuron" else StackedTransformerLayer.default_config())
Will the use of StackedTransformerLayer (vs. RepeatedTransformerLayer) lead to large XLA programs and long compilation time?
We are in the middle of optimizing RepeatedTransformer to use a new hardware feature in TRN2 to make dynamic memory operations faster. In the meantime, please continue to use StackedTransformer. Neuron compiler has a module to detect repeating blocks, compile once and reuse. So, compile time won't grow with the number of layers.
> We are in the middle of optimizing RepeatedTransformer to use a new hardware feature in TRN2 to make dynamic memory operations faster. In the meantime, please continue to use StackedTransformer. Neuron compiler has a module to detect repeating blocks, compile once and reuse. So, compile time won't grow with the number of layers.
Nice! Could you add this as a comment?
Opened a new PR from my fork of Axlearn (#916). All comments in this discussion have been addressed there.
Closing since there is a new version of this PR (#916).
This PR adds meshes for TRN2/TRN1 for Fuji models, along with a transformer-layer configuration favorable to Neuron: Neuron supports StackedTransformerLayer, and GroupedQKVLinear instead of FusedGroupedQKVLinear, for Grouped Query Attention (GQA).
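To illustrate why GQA uses a grouped (or unfused) QKV projection, here is a minimal NumPy shape sketch (illustrative only, not the axlearn layers): with 8 query heads but only 2 KV heads, Q and K/V need different projection shapes, and each group of 4 query heads shares one KV head.

```python
import numpy as np

# Toy GQA shapes; all sizes here are made up for illustration.
num_heads, num_kv_heads, head_dim, model_dim, seq = 8, 2, 16, 128, 10
x = np.ones((seq, model_dim))

# Separate (grouped) projections: Q has num_heads heads, K/V have num_kv_heads.
w_q = np.zeros((model_dim, num_heads * head_dim))
w_k = np.zeros((model_dim, num_kv_heads * head_dim))
w_v = np.zeros((model_dim, num_kv_heads * head_dim))

q = (x @ w_q).reshape(seq, num_heads, head_dim)
k = (x @ w_k).reshape(seq, num_kv_heads, head_dim)
# Broadcast K across query-head groups: each KV head serves 4 query heads.
k_expanded = np.repeat(k, num_heads // num_kv_heads, axis=1)
print(q.shape, k_expanded.shape)  # (10, 8, 16) (10, 8, 16)
```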