
Missing torch-xla-gpu-plugin #8876


Open
tengyifei opened this issue Mar 24, 2025 · 11 comments

@tengyifei
Collaborator

A user reported the following issue:

we have been trying to use torch-xla nightly builds to get around some of the slowness issues seen in torch-xla 2.5. We found torch-xla nightly builds for GPU under gs://pytorch-xla-releases/wheels/cuda/12.6, but these don’t contain torch-xla-gpu-plugin (it was present for older torch-xla versions, e.g. gs://pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl). Is there any location that contains the CUDA plugin nightly builds for torch-xla 2.8.0?
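
For reference, installing one of those wheels directly is straightforward, assuming the bucket is publicly readable: the gs:// bucket is also served over HTTPS at storage.googleapis.com. A sketch using the 2.6.0 plugin wheel named above (nightly filenames vary by Python version and date):

```bash
# Sketch: a public gs:// bucket is mirrored at storage.googleapis.com,
# so pip can install a wheel from it by direct URL.
pip install "https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl"
```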

@tengyifei
Collaborator Author

@ysiraichi do you know the answer to this? Would it be possible to document it somewhere?

@ysiraichi
Collaborator

I don't actually know how the nightly builds are built/stored/released.

@tengyifei
Collaborator Author

Gotcha. Do you know, conceptually, what a "torch-xla-gpu-plugin" is? How is that different from the "torch-xla" package built with CUDA support? I'm quite unfamiliar with these, unfortunately.

Presumably our CI knows where to find these things, since we have GPU tests in the GitHub Actions checks?

@ysiraichi
Collaborator

Do you know, conceptually, what a "torch-xla-gpu-plugin" is?

As far as I understand, it's a library provided by OpenXLA with the device-specific implementation of PJRT.

How is that different from the "torch-xla" package built with CUDA support?

I don't think there's a difference...

@tengyifei
Collaborator Author

Okay, I did some more digging.

As far as I understand, it's a library provided by OpenXLA with the device-specific implementation of PJRT.

It turns out this is incorrect. In fact, we build another wheel! The project folder of the wheel is https://github.com/pytorch/xla/tree/master/plugins/cuda.

IIUC, using PyTorch/XLA on GPU requires two wheels: torch-xla itself, plus torch-xla-gpu-plugin, just as using PyTorch/XLA on TPU requires torch-xla and libtpu.

I don't think there's a difference...

There are two separate wheels.

Now the issue is that we have somehow stopped uploading newer torch-xla-gpu-plugin wheels. I've confirmed this internally.
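
For anyone hitting this, a quick sanity check of which of the two wheels an environment actually has might look like the sketch below; note that torch_xla_cuda_plugin as the import name is an assumption based on the wheel name, not a verified API:

```bash
# Sketch: check for both packages. The import name torch_xla_cuda_plugin
# is inferred from the plugins/cuda wheel name; treat it as an assumption.
python -c "import torch_xla; print('torch_xla', torch_xla.__version__)"
python -c "import torch_xla_cuda_plugin" 2>/dev/null \
  && echo "CUDA plugin wheel present" \
  || echo "CUDA plugin wheel missing"
```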

@amjames
Collaborator

amjames commented Apr 1, 2025

@ysiraichi recently re-enabled the CUDA build jobs in CI. It's possible that the nightly wheel build was missed and was simply never turned back on. Is that a separate job, or do we just have a nightly trigger that adds an upload workflow?

@tengyifei
Collaborator Author

Those are good questions; I don't know the answers to any of them. If Yukio doesn't have internal access, maybe @zpcore could help check the triggers.

@zpcore
Collaborator

zpcore commented Apr 1, 2025

The CUDA nightly build trigger is there. We should have nightly-3-10-cuda-12-1 and nightly-3-11-cuda-12-1 available; the others are missing due to build failures.
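
To double-check what actually got uploaded, one could list the buckets directly (a sketch; requires gsutil from the Google Cloud SDK, and the path layout is taken from the URLs earlier in this thread):

```bash
# Sketch: list everything uploaded for each CUDA version mentioned above.
gsutil ls gs://pytorch-xla-releases/wheels/cuda/12.1/
gsutil ls gs://pytorch-xla-releases/wheels/cuda/12.6/
```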

@ysiraichi
Collaborator

It turns out this is incorrect. In fact, we build another wheel!

You are right. However, that wheel contains only the pjrt_c_api_gpu_plugin.so library, which is provided by OpenXLA.

IIUC, using PyTorch/XLA on GPU requires two wheels: torch-xla itself, plus torch-xla-gpu-plugin

I don't think so. I have been using XLA:CUDA without that plugin.

There are two separate wheels.

Yes. If you compile PyTorch/XLA without CUDA support (i.e., XLA_CUDA=0), you do need the separate CUDA plugin wheel. Otherwise, you don't (that's how I've been using it).
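
To spell out the two configurations described above (a sketch; the exact build invocation may differ from the current build scripts, and installing the plugin from the release bucket rather than PyPI is an assumption):

```bash
# Option A (sketch): compile torch_xla with CUDA support baked in;
# no separate plugin wheel is needed in this configuration.
XLA_CUDA=1 python setup.py bdist_wheel

# Option B (sketch): compile torch_xla without CUDA support, then install
# the separate plugin wheel (which bundles pjrt_c_api_gpu_plugin.so).
XLA_CUDA=0 python setup.py bdist_wheel
pip install "https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.6.0-py3-none-any.whl"
```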

@tengyifei
Collaborator Author

@ysiraichi I see. But do the installation instructions at https://github.com/pytorch/xla/blob/master/README.md?plain=1#L94-L107 use a PyTorch/XLA build with or without CUDA support? I assume the final stable versions of PyTorch/XLA uploaded to PyPI are not built with CUDA support, so we would need the plugin, IIUC?

Do we publish any wheel built with XLA_CUDA=1?

@ysiraichi
Collaborator

I assume the final stable versions of PyTorch/XLA uploaded to PyPI are not built with CUDA support, so we would need the plugin, IIUC?

That makes sense to me. But I don't actually know what gets published to PyPI.
