
feat: support cpu and xpu devices in llama4 #326


Open · wants to merge 1 commit into main

Conversation

@dvrogozh (Contributor) commented Apr 9, 2025

The change is similar to the one previously made for Llama3.

CPU support was tried on Intel Xeon.

XPU support was tried on Intel Data Center GPU Max series. Note that a fix in fairscale for facebookresearch/fairscale#1195 is required to make XPU work (CPU is not affected). I believe this fix might be needed for CUDA as well, since the issue seems to be device agnostic and I see it with a simplified reproducer on an NVIDIA A10.

Verified on the platforms named above with the Llama4 sample completion and chat completion scripts.

Requires: facebookresearch/fairscale#1196
CC: @ashwinb, @raghotham
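
For context, device-agnostic selection of this kind usually boils down to probing the available backends in order of preference. The sketch below is illustrative only, assuming current PyTorch APIs (`torch.cuda`, `torch.xpu`); the function names are not from the actual patch.

```python
# Illustrative sketch of device-agnostic selection; not the actual patch.
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then XPU (Intel GPU), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu exists in PyTorch builds with Intel GPU (XPU) support
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")


def pick_distributed_backend(device: torch.device) -> str:
    """nccl for CUDA, ccl (oneCCL bindings) for XPU, gloo for CPU."""
    return {"cuda": "nccl", "xpu": "ccl", "cpu": "gloo"}[device.type]
```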

Signed-off-by: Dmitry Rogozhkin <[email protected]>
@facebook-github-bot added the CLA Signed label Apr 9, 2025
@dvrogozh (Contributor, Author) commented:

@ashwinb: could you please help review this PR, and possibly also the fairscale one (facebookresearch/fairscale#1196)?


@ashwinb (Contributor) commented Apr 28, 2025

Thank you @dvrogozh, especially for fixing the fairscale issue. I had observed it myself and it caused me much pain. I am not sure about the release cadence for fairscale packages, so I will figure that out. Do we need that fix to land (on PyPI) before this PR can work?

@dvrogozh (Contributor, Author) replied:

> <...>the fairscale issue. <...> Do we need that fix to land (in pypi) before this PR can work?

API-wise: no. That is a fairscale-internal issue, and we did not change the fairscale API.

Functionally:

  • We don't need the fix for CPU.
  • We need the fix for XPU; otherwise there is a runtime error (with the latest fairscale release, v0.4.13, which I tried).
  • I suppose we need the fix for CUDA as well, but I can't verify that since I don't have a system with multiple CUDA devices and llama4 doesn't fit on one. The CUDA part of the story leaves me somewhat confused: I assume llama4 was verified on CUDA, yet I don't see why the fairscale issue would not show up there. I wonder, was llama4 verified on some internal/specific/patched version of fairscale where the issue did not exist?
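
Until a fixed fairscale release lands on PyPI, a guard along these lines could surface the problem early. This is a hedged sketch, not part of the PR: the version bound reflects the v0.4.13 observation above, and `check_fairscale_version` is an illustrative name.

```python
# Hypothetical guard, not part of this PR: warn when the installed fairscale
# predates the fix tracked in facebookresearch/fairscale#1196.
# Assumption: releases up to v0.4.13 hit the runtime error on XPU.
import warnings

import fairscale


def check_fairscale_version() -> None:
    installed = tuple(int(x) for x in fairscale.__version__.split(".")[:3])
    if installed <= (0, 4, 13):
        warnings.warn(
            f"fairscale {fairscale.__version__} may fail at runtime on XPU; "
            "use a build that includes facebookresearch/fairscale#1196"
        )
```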

@dvrogozh (Contributor, Author) commented Jun 2, 2025

@ashwinb: do we have a way forward for making a fairscale release and merging this PR?
