
feat: support cpu and xpu devices in llama4 #326


Open · wants to merge 1 commit into main

Conversation

@dvrogozh (Contributor) commented Apr 9, 2025

The change is similar to the one previously made for Llama3.

CPU support was tried on Intel Xeon.

XPU support was tried on Intel Data Center GPU Max series. Note that a fix in fairscale for facebookresearch/fairscale#1195 is required to make XPU work (CPU is not affected). I believe this fix might be needed for CUDA as well, since the issue seems to be device agnostic and I see it with a simplified reproducer on an NVIDIA A10.

Verified on the platforms named above with the Llama4 sample completion and chat completion scripts.

Requires: facebookresearch/fairscale#1196
CC: @ashwinb, @raghotham
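
For context, device-agnostic selection of this kind usually boils down to probing the available backends in order of preference. The sketch below is illustrative only, assuming current PyTorch APIs (`torch.cuda`, `torch.xpu`); the function names are not from the actual patch.

```python
# Illustrative sketch of device-agnostic selection; not the actual patch.
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then XPU (Intel GPU), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu exists in PyTorch builds with Intel GPU (XPU) support
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")


def pick_distributed_backend(device: torch.device) -> str:
    """nccl for CUDA, ccl (oneCCL bindings) for XPU, gloo for CPU."""
    return {"cuda": "nccl", "xpu": "ccl", "cpu": "gloo"}[device.type]
```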

Signed-off-by: Dmitry Rogozhkin <[email protected]>
@facebook-github-bot added the CLA Signed label Apr 9, 2025
@dvrogozh (Contributor, Author) commented:

@ashwinb: could you please help review this PR, and possibly also the fairscale one (facebookresearch/fairscale#1196)?


@ashwinb (Contributor) commented Apr 28, 2025

Thank you @dvrogozh, especially for fixing the fairscale issue. I had observed it myself and it caused me much pain. I am not sure about the release cadence for fairscale packages, so I will figure that out. Do we need that fix to land (on PyPI) before this PR can work?

@dvrogozh (Contributor, Author) replied:

> <...>the fairscale issue. <...> Do we need that fix to land (in pypi) before this PR can work?

API-wise: no. That is a fairscale-internal issue, and we did not change the fairscale API.

Functionally:

  • We don't need the fix for CPU.
  • We need the fix for XPU; otherwise there is a runtime error (with the latest fairscale release, v0.4.13, which I tried).
  • I suppose we need the fix for CUDA as well, but I can't verify that since I don't have a system with multiple CUDA devices and llama4 doesn't fit on one. The CUDA part of the story leaves me somewhat confused: I assume llama4 was verified on CUDA, yet I don't see why the fairscale issue would not show up there. I wonder, was llama4 verified on some internal/specific/patched version of fairscale where the issue did not exist?
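
Until a fixed fairscale release lands on PyPI, a guard along these lines could surface the problem early. This is a hedged sketch, not part of the PR: the version bound reflects the v0.4.13 observation above, and `check_fairscale_version` is an illustrative name.

```python
# Hypothetical guard, not part of this PR: warn when the installed fairscale
# predates the fix tracked in facebookresearch/fairscale#1196.
# Assumption: releases up to v0.4.13 hit the runtime error on XPU.
import warnings

import fairscale


def check_fairscale_version() -> None:
    installed = tuple(int(x) for x in fairscale.__version__.split(".")[:3])
    if installed <= (0, 4, 13):
        warnings.warn(
            f"fairscale {fairscale.__version__} may fail at runtime on XPU; "
            "use a build that includes facebookresearch/fairscale#1196"
        )
```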

@dvrogozh (Contributor, Author) commented Jun 2, 2025

@ashwinb: do we have a way forward for making a fairscale release and merging this PR?
