[PP] microbatch split config #947

Open · wants to merge 1 commit into main
Conversation

H-Huang (Member) commented Mar 7, 2025

Allow the input batch to be split along the sequence dimension in pipeline parallelism (removing the requirement that batch_size >= num stages).

depends on pytorch/pytorch#148458

The new config to set this is pipeline_parallel_batch_split_dim = 1
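
For context, splitting on the sequence dimension means microbatches are carved out of each sample's tokens rather than out of the batch, so even batch_size = 1 can feed many microbatches. A minimal sketch of the idea (the shapes and the use of torch.tensor_split are illustrative, not taken from this PR):

import torch

batch = torch.randn(1, 4096, 512)  # (batch=1, seq_len=4096, hidden=512), hypothetical shapes
n_microbatches = 4

# Splitting on dim 0 would require batch_size >= n_microbatches; dim 1 does not.
microbatches = torch.tensor_split(batch, n_microbatches, dim=1)
print([tuple(mb.shape) for mb in microbatches])  # [(1, 1024, 512), (1, 1024, 512), ...]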

@facebook-github-bot added the CLA Signed label Mar 7, 2025
@@ -359,6 +359,14 @@ def __init__(self):
If using looped schedules, this still specifies the number of physical ranks, not the number
of stages. Stages per rank are inferred from split points degree, and schedule.""",
)
self.parser.add_argument(
"--experimental.pipeline_parallel_batch_split_dim",
Contributor commented:
It seems PP has been in experimental for a while. Do you think it's time we extract pipeline_parallel into a standalone section and put all configs over there?
It doesn't have to happen in this PR.

@@ -119,17 +121,18 @@ def build_pipeline_schedule(
f"of stages ({num_total_stages}) which may result in a bubble in the pipeline."
)

# validate that the batch size is divisible by the number of microbatches otherwise we'll hang or error during training
Contributor commented:
I have several questions here:

  1. If pipeline_parallel_batch_split_dim == 0, what would happen if job_config.training.batch_size % num_total_stages != 0?
  2. If pipeline_parallel_batch_split_dim is the sequence dim or another dim, don't we need similar checks for the extreme cases, e.g. seq_len < num_stages?
  3. Btw, this divisibility requirement doesn't seem to be exactly the same as the "batch_size >= num stages" requirement mentioned in the PR summary.
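
A concrete illustration of question 2 (the helper below is hypothetical, not from this PR): the divisibility guard presumably needs to apply to whichever dimension is actually being split, not just the batch dimension.

def validate_microbatch_split(input_shape, n_microbatches, split_dim=0):
    # Generic form of the check: the size of the split dim must divide evenly
    # into the requested number of microbatches.
    dim_size = input_shape[split_dim]
    if dim_size % n_microbatches != 0:
        raise ValueError(
            f"dim {split_dim} has size {dim_size}, which is not divisible by "
            f"{n_microbatches} microbatches"
        )

validate_microbatch_split((8, 4096), n_microbatches=4, split_dim=0)  # ok: 8 % 4 == 0
validate_microbatch_split((1, 4096), n_microbatches=4, split_dim=1)  # ok: 4096 % 4 == 0
# validate_microbatch_split((1, 4096), n_microbatches=3, split_dim=1) would raise: 4096 % 3 == 1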

Labels: CLA Signed · 3 participants