
Conversation

@H-Huang (Member) commented on Mar 7, 2025:

Allow the input batch to be split on the sequence dimension in pipeline parallelism, removing the requirement that batch_size >= num stages.

Depends on pytorch/pytorch#148458.

The new config to set this is pipeline_parallel_batch_split_dim = 1.
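
For context, here is a minimal sketch (not from this PR) of the difference between splitting microbatches on the batch dimension and on the sequence dimension; the tensor shapes, variable names, and the use of torch.tensor_split are illustrative assumptions, not the PR's implementation:

```python
import torch

# Illustrative shapes only: a tiny batch of token ids.
batch_size, seq_len = 2, 8
inputs = torch.arange(batch_size * seq_len).reshape(batch_size, seq_len)

num_microbatches = 4

# Splitting on dim 0 (batch) would require batch_size >= num_microbatches,
# which fails here (2 < 4). Splitting on dim 1 (sequence) only requires
# seq_len >= num_microbatches.
microbatches = torch.tensor_split(inputs, num_microbatches, dim=1)
for mb in microbatches:
    print(mb.shape)  # each microbatch is (2, 2): full batch, a slice of the sequence
```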

A contributor commented on the new argument definition:

```python
of stages. Stages per rank are inferred from split points degree, and schedule.""",
)
self.parser.add_argument(
    "--experimental.pipeline_parallel_batch_split_dim",
```
It seems PP has been in experimental for a while. Do you think it's time to extract pipeline_parallel into a standalone section and move all its configs there?
This doesn't have to happen in this PR.

f"of stages ({num_total_stages}) which may result in a bubble in the pipeline."
)

# validate that the batch size is divisible by the number of microbatches otherwise we'll hang or error during training
I have several questions here:

  1. If pipeline_parallel_batch_split_dim == 0, what would happen if job_config.training.batch_size % num_total_stages != 0? (A sketch of this kind of check follows below.)
  2. If pipeline_parallel_batch_split_dim is the sequence dim or another dim, don't we need similar checks for the extreme cases, e.g. seq_len < num_stages?
  3. By the way, this divisibility requirement doesn't seem to be exactly the same as the "batch_size >= num stages" requirement mentioned in the PR summary.
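
A hypothetical sketch of the kind of validation these questions refer to; the function name, parameter names, and error message below are assumptions, not the PR's actual check:

```python
# Hypothetical validation sketch; not taken from the PR.
def validate_split(batch_size: int, seq_len: int, split_dim: int, num_microbatches: int) -> None:
    # Pick the size of whichever dimension the microbatches are split on.
    size = batch_size if split_dim == 0 else seq_len
    if size % num_microbatches != 0:
        raise ValueError(
            f"Dimension {split_dim} of size {size} is not divisible by "
            f"num_microbatches={num_microbatches}; training would hang or error."
        )
```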

@H-Huang marked this pull request as draft on March 25, 2025 at 14:47.