Extensions to support copy! for CuStridedView (and friends!) #40

Merged
kshyatt merged 11 commits into QuantumKitHub:main from kshyatt:ksh/cuda
Feb 26, 2026

Conversation

@kshyatt
Member

@kshyatt kshyatt commented Feb 25, 2026

This PR creates some logic to allow StridedViews backed by CuArrays (and, soon, any generic GPU array) to hook into the existing GPUArrays.jl infrastructure for broadcasted copies without having to replicate all the indexing logic or create arrays of indices.

I've created two extensions:

  • one generic GPUArrays.jl one so that we can reuse logic for CUDA and AMD
  • one CUDA.jl-specific one, which is needed because the KernelAdaptor type for adapting CuArrays to be usable inside kernels is not a subtype of anything in GPUArrays.jl. A similar extension would be needed for AMDGPU.jl.

For now this only supports copy!, but I think other uses are already covered (for CUDA.jl at least) by TensorOperations.jl.

@kshyatt
Member Author

kshyatt commented Feb 25, 2026

I'll also add a BuildKite pipeline for this

@kshyatt kshyatt requested review from Jutho and lkdvos and removed request for Jutho February 25, 2026 13:57
@codecov

codecov bot commented Feb 25, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@github-actions

github-actions bot commented Feb 25, 2026

Your PR no longer requires formatting changes. Thank you for your contribution!

@kshyatt
Member Author

kshyatt commented Feb 25, 2026 via email

@kshyatt kshyatt changed the title Extensions to support copy! for CuStridedView (and friends!) [WIP] Extensions to support copy! for CuStridedView (and friends!) Feb 26, 2026
@kshyatt kshyatt marked this pull request as draft February 26, 2026 06:27
@kshyatt
Member Author

kshyatt commented Feb 26, 2026

I've added more tests, including for size-0 arrays and square ones, and things still seem to be working! Maybe I've written the test wrong? Otherwise, I think the getindex call is applying the conj as appropriate?

@kshyatt
Member Author

kshyatt commented Feb 26, 2026

This probably needs QuantumKitHub/StridedViews.jl#26 and a tag first

@Jutho
Member

Jutho commented Feb 26, 2026

I was assuming that you would unpack the parent array from the StridedView, but you are just passing the StridedView objects to GPUArrays._copyto!. In that case, if this ends up calling the StridedViews getindex implementation, it will indeed automatically apply the necessary function like conj. But can this just work within a GPU kernel? I don't know anything about this, nor even how GPUArrays._copyto! works.

@kshyatt
Member Author

kshyatt commented Feb 26, 2026

> I was assuming that you would unpack the parent array from the StridedView, but you are just passing the StridedView objects to GPUArrays._copyto!. In that case, if this ends up calling the StridedViews getindex implementation, it will indeed automatically apply the necessary function like conj. But can this just work within a GPU kernel? I don't know anything about this, nor even how GPUArrays._copyto! works.

So what happens behind the scenes is the backing CuArray (or WLOG ROCArray) gets converted into the "kernel compatible" bits type device array -- but very crucially, this new kernel-friendly array has the same pointer and offset. Then all the indexing logic for a StridedView is used in the copying kernel -- all GPUArrays needs is that you have implemented such logic correctly (as it seems you have).
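
To make the point about indexing logic concrete, here is a minimal CPU-only sketch in plain Julia. `ToyStridedView` and `toy_copy!` are hypothetical names invented for this illustration. They are not the StridedViews.jl types or the GPUArrays.jl kernel, and the real implementations differ in detail. The sketch only assumes that a view stores a flat parent, a size, strides, an offset and an elementwise function, and that the copy loop does nothing except index into source and destination.

```julia
# Toy stand-in for a strided view (hypothetical, NOT the StridedViews.jl type).
struct ToyStridedView{T,N,A<:AbstractVector{T},F} <: AbstractArray{T,N}
    parent::A                  # flat backing storage
    size::NTuple{N,Int}        # logical size of the view
    strides::NTuple{N,Int}     # step in the parent for each dimension
    offset::Int                # number of parent elements skipped before the view starts
    op::F                      # function applied on element access (identity or conj here)
end

Base.size(v::ToyStridedView) = v.size

# Position in the flat parent for a 1-based Cartesian index.
toy_position(v::ToyStridedView, I) = v.offset + 1 + sum((I .- 1) .* v.strides)

# Reading applies `op`, so a conjugated view returns conjugated values.
function Base.getindex(v::ToyStridedView{T,N}, I::Vararg{Int,N}) where {T,N}
    return v.op(v.parent[toy_position(v, I)])
end

# Writing also applies `op`; this is only consistent because identity and conj
# are their own inverses.
function Base.setindex!(v::ToyStridedView{T,N}, x, I::Vararg{Int,N}) where {T,N}
    v.parent[toy_position(v, I)] = v.op(x)
    return v
end

# Elementwise copy that relies only on getindex/setindex!, standing in for a copy kernel.
function toy_copy!(dst, src)
    size(dst) == size(src) || throw(DimensionMismatch("sizes differ"))
    for I in CartesianIndices(src)
        dst[I] = src[I]
    end
    return dst
end

# A 2x2 complex matrix stored column-major in a flat vector.
data = ComplexF64[1 + 2im, 3 + 4im, 5 + 6im, 7 + 8im]

# An adjoint-like view: swapped strides plus conj on access.
src = ToyStridedView(data, (2, 2), (2, 1), 0, conj)

# A plain column-major destination.
dst = ToyStridedView(zeros(ComplexF64, 4), (2, 2), (1, 2), 0, identity)

toy_copy!(dst, src)
```

After the copy, `reshape(dst.parent, 2, 2)` holds the adjoint of `reshape(data, 2, 2)`, even though `toy_copy!` knows nothing about strides or conjugation. In this toy model the copy routine needs nothing from the view beyond correct indexing.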

@Jutho
Member

Jutho commented Feb 26, 2026

That's great. Looking at the GPUArrays._copyto! implementation, quite a lot (if not all) of the Strided.jl functionality could probably be made to work on a GPU, even if it is not the most efficient implementation possible.

@kshyatt
Member Author

kshyatt commented Feb 26, 2026

> That's great. Looking at the GPUArrays._copyto! implementation, quite a lot (if not all) of the Strided.jl functionality could probably be made to work on a GPU, even if it is not the most efficient implementation possible.

Absolutely. Contrary to my usual tendency, I didn't add this here, to keep this PR more or less "skinny" and reviewable, but I would be happy to add more things as we need them.

@kshyatt kshyatt requested a review from Jutho February 26, 2026 14:56
@kshyatt kshyatt marked this pull request as ready for review February 26, 2026 14:56
@kshyatt kshyatt changed the title [WIP] Extensions to support copy! for CuStridedView (and friends!) Extensions to support copy! for CuStridedView (and friends!) Feb 26, 2026
@@ -0,0 +1,18 @@
for T in (Float32, Float64, Complex{Float32}, Complex{Float64})
@testset "Copy with ROCStridedView: $T, $f1, $f2" for f2 in (identity, conj, adjoint, transpose), f1 in (identity, conj, transpose, adjoint)
Member

I don't think transpose and adjoint really make sense for scalar element types.

Member

Ah wait, it is applied to the matrix. Never mind, then it does make sense.

Member Author

Is there another element type we want to test with? Maybe a StridedView of a BlockArray or something?

@kshyatt kshyatt merged commit b8af140 into QuantumKitHub:main Feb 26, 2026
10 of 13 checks passed
@kshyatt kshyatt deleted the ksh/cuda branch February 26, 2026 15:56
3 participants