
Missing torch-xla-gpu-plugin #8876


Open
tengyifei opened this issue Mar 24, 2025 · 11 comments

@tengyifei
Collaborator

A user reported the following issue:

we have been trying to use torch-xla nightly builds to get around some of the slowness issues seen in torch-xla 2.5. We found torch-xla nightly builds for GPU under gs://pytorch-xla-releases/wheels/cuda/12.6, but these don’t contain torch-xla-gpu-plugin (it was present for older torch-xla versions, e.g. gs://pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl). Is there any location that contains the CUDA plugin nightly builds for torch-xla 2.8.0?
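
For reference, installing one of those wheels directly is straightforward, assuming the bucket is publicly readable: the gs:// bucket is also served over HTTPS at storage.googleapis.com. A sketch using the 2.6.0 plugin wheel named above (nightly filenames vary by Python version and date):

```bash
# Sketch: a public gs:// bucket is mirrored at storage.googleapis.com,
# so pip can install a wheel from it by direct URL.
pip install "https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl"
```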

@tengyifei
Collaborator Author

@ysiraichi do you know the answer to this? Would it be possible to document it somewhere?

@ysiraichi
Collaborator

I don't actually know how the nightly builds are built/stored/released.

@tengyifei
Collaborator Author

Gotcha. Do you know, conceptually, what a "torch-xla-gpu-plugin" is? How is that different from the "torch-xla" package built with CUDA support? I'm quite unfamiliar with these, unfortunately.

Presumably our CI knows where to find these things, since we have GPU tests in the GitHub Actions checks?

@ysiraichi
Collaborator

Do you know, conceptually, what a "torch-xla-gpu-plugin" is?

As far as I understand, it's a library provided by OpenXLA with the device-specific implementation of PJRT.

How is that different from the "torch-xla" package built with CUDA support?

I don't think there's a difference...

@tengyifei
Collaborator Author

Okay, I did some more digging.

As far as I understand, it's a library provided by OpenXLA with the device-specific implementation of PJRT.

It turns out this is incorrect. In fact, we build another wheel! The project folder of the wheel is https://github.com/pytorch/xla/tree/master/plugins/cuda.

IIUC, using PyTorch/XLA on GPU requires two wheels: torch-xla itself, plus torch-xla-gpu-plugin, just as using PyTorch/XLA on TPU requires torch-xla and libtpu.

I don't think there's a difference...

There are two separate wheels.

Now the issue is that we have somehow stopped uploading newer torch-xla-gpu-plugin wheels. I've confirmed this internally.
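
For anyone hitting this, a quick sanity check of which of the two wheels an environment actually has might look like the sketch below; note that torch_xla_cuda_plugin as the import name is an assumption based on the wheel name, not a verified API:

```bash
# Sketch: check for both packages. The import name torch_xla_cuda_plugin
# is inferred from the plugins/cuda wheel name; treat it as an assumption.
python -c "import torch_xla; print('torch_xla', torch_xla.__version__)"
python -c "import torch_xla_cuda_plugin" 2>/dev/null \
  && echo "CUDA plugin wheel present" \
  || echo "CUDA plugin wheel missing"
```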

@amjames
Collaborator

amjames commented Apr 1, 2025

@ysiraichi recently re-enabled the CUDA build jobs in CI. It's possible that the nightly wheel build was missed and was simply never turned back on. Is that a separate job, or do we just have a nightly trigger that adds an upload workflow?

@tengyifei
Collaborator Author

Those are good questions; I don't know the answers to any of them. If Yukio doesn't have internal access, maybe @zpcore could help check the triggers.

@zpcore
Collaborator

zpcore commented Apr 1, 2025

The CUDA nightly build trigger is there. We should have nightly-3-10-cuda-12-1 and nightly-3-11-cuda-12-1 available; the others are missing due to build failures.
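
To double-check what actually got uploaded, one could list the buckets directly (a sketch; requires gsutil from the Google Cloud SDK, and the path layout is taken from the URLs earlier in this thread):

```bash
# Sketch: list everything uploaded for each CUDA version mentioned above.
gsutil ls gs://pytorch-xla-releases/wheels/cuda/12.1/
gsutil ls gs://pytorch-xla-releases/wheels/cuda/12.6/
```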

@ysiraichi
Collaborator

It turns out this is incorrect. In fact, we build another wheel!

You are right. However, that wheel contains only the pjrt_c_api_gpu_plugin.so library, which is provided by OpenXLA.

IIUC, using PyTorch/XLA on GPU requires two wheels: torch-xla itself, plus torch-xla-gpu-plugin

I don't think so. I have been using XLA:CUDA without that plugin.

There are two separate wheels.

Yes. If you compile PyTorch/XLA without CUDA support (i.e., XLA_CUDA=0), you do need the separate CUDA plugin wheel. Otherwise, you don't (that's how I've been using it).
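
To spell out the two configurations described above (a sketch; the exact build invocation may differ from the current build scripts, and installing the plugin from the release bucket rather than PyPI is an assumption):

```bash
# Option A (sketch): compile torch_xla with CUDA support baked in;
# no separate plugin wheel is needed in this configuration.
XLA_CUDA=1 python setup.py bdist_wheel

# Option B (sketch): compile torch_xla without CUDA support, then install
# the separate plugin wheel (which bundles pjrt_c_api_gpu_plugin.so).
XLA_CUDA=0 python setup.py bdist_wheel
pip install "https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl"
```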

@tengyifei
Collaborator Author

@ysiraichi I see. But do the installation instructions at https://github.com/pytorch/xla/blob/master/README.md?plain=1#L94-L107 use a PyTorch/XLA build with or without CUDA support? I assume the final stable versions of PyTorch/XLA uploaded to PyPI are not built with CUDA support, so we would need the plugin, IIUC?

Do we publish any wheel built with XLA_CUDA=1?

@ysiraichi
Collaborator

I assume the final stable versions of PyTorch/XLA uploaded to PyPI are not built with CUDA support, so we would need the plugin, IIUC?

That makes sense to me. But I don't actually know what gets published to PyPI.
