copy! for CuStridedView (and friends!) #40
kshyatt merged 11 commits into QuantumKitHub:main
I'll also add a BuildKite pipeline for this.
I thought so too, but then it seems to work for transpose and adjoint (because the axes are then the same), which surprised me. I agree about restricting the element types. I think I need more extensive tests here, to be honest.
On Wed, Feb 25, 2026 at 10:23 PM, Jutho commented on this pull request, in ext/StridedCUDAExt.jl (#40 (comment)):
```diff
@@ -0,0 +1,17 @@
+module StridedCUDAExt
+
+using Strided, CUDA
+using Strided: StridedViews
+using CUDA: Adapt, KernelAdaptor
+using CUDA: GPUArrays
+
+const ALL_FS = Union{typeof(adjoint), typeof(conj), typeof(identity), typeof(transpose)}
+
+function Base.copy!(dst::StridedView{TD, ND, TAD, FD}, src::StridedView{TS, NS, TAS, FS}) where {TD, ND, TAD <: CuArray{TD}, FD <: ALL_FS, TS, NS, TAS <: CuArray{TS}, FS <: ALL_FS}
+    bc_style = Base.Broadcast.BroadcastStyle(TAS)
+    bc = Base.Broadcast.Broadcasted(bc_style, identity, (src,), axes(dst))
+    GPUArrays._copyto!(dst, bc)
```
This is probably not generally correct, for example if `FS` and `FD` are different, say one is `identity` and the other is `conj`. For GPUArrays, we should probably only allow `TD` and `TS` to be `<:Number`, and `FS` and `FD` to be only `typeof(identity)` or `typeof(conj)`?
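The reason transpose and adjoint can appear to work is that a lazy wrapper's `getindex` applies the function element by element, so materializing an `identity` broadcast over the wrapper picks it up automatically, while a mismatch between the wrappers on the two sides silently drops the function. A minimal CPU sketch of the same mechanism, using plain Base broadcasting and Base's lazy `adjoint` wrapper (an analogue, not the `StridedView` flag itself):

```julia
# CPU sketch of the broadcast-based copy used in the extension:
# wrap the source in a lazy `Broadcasted` with `identity` and
# materialize it into the destination.
src = Complex{Float64}[1+2im 3-1im; 0+1im 2+0im]
dst = zeros(Complex{Float64}, 2, 2)

# `adjoint(src)` is a lazy wrapper; its getindex conjugates and
# transposes, so the identity broadcast copies the adjoint correctly.
bc = Base.Broadcast.broadcasted(identity, adjoint(src))
Base.Broadcast.materialize!(dst, bc)
@assert dst == adjoint(src)

# But if the wrapper is dropped and only the raw parent is broadcast,
# the conjugation is silently lost -- the mismatch the review warns about.
Base.Broadcast.materialize!(dst, Base.Broadcast.broadcasted(identity, src))
@assert dst != adjoint(src)
```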
I've added more tests, including for size-0 arrays and square ones, and things still seem to be working! Maybe I've written the test wrong? Otherwise I think the …
This probably needs QuantumKitHub/StridedViews.jl#26 and a tag first.
I was assuming that you would unpack the parent array from the StridedView, but you are just passing the StridedView objects to GPUArrays._copyto!. In that case, if this ends up calling the StridedViews …

So what happens behind the scenes is that the backing …
That's great. Looking at the …
Absolutely. Contrary to my usual tendency, I didn't add this here, to keep this PR more or less "skinny" and reviewable, but I would be happy to add more things as we need them.
```diff
@@ -0,0 +1,18 @@
for T in (Float32, Float64, Complex{Float32}, Complex{Float64})
    @testset "Copy with ROCStridedView: $T, $f1, $f2" for f2 in (identity, conj, adjoint, transpose), f1 in (identity, conj, transpose, adjoint)
```
I don't think transpose and adjoint really make sense for scalar element types.
Ah wait, it is applied to the matrix. Never mind, then it does make sense.
Is there another element type we want to test with? Maybe a StridedView of a BlockArray or something?
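For reference, a hedged CPU analogue of that `(f1, f2)` wrapper-pair test pattern, using plain `Array`s instead of GPU-backed `StridedView`s. Note one difference: `conj` on a plain `Array` is eager (it allocates a copy rather than writing through), so it only appears on the source side here, whereas `StridedView` carries conjugation as a lazy flag:

```julia
# CPU analogue of the (f1, f2) testset loop, using plain Arrays.
# `conj(::Array)` is eager, so it cannot serve as a mutable destination
# wrapper here; StridedView stores conj as a lazy flag instead.
src = Complex{Float64}[1+2im 3-1im; 0+1im 2+0im]
for f1 in (identity, transpose, adjoint), f2 in (identity, conj, transpose, adjoint)
    dst = zeros(Complex{Float64}, 2, 2)
    copyto!(f1(dst), f2(src))      # writes through the lazy wrapper f1
    @assert f1(dst) == f2(src)     # round-trips for every wrapper pair
end
```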
This PR creates some logic to allow `StridedView`s backed by `CuArray`s (and, soon, any generic GPU array) to hook into the existing `GPUArrays.jl` infrastructure for broadcasted copies without having to replicate all the indexing logic or create arrays of indices.

I've created two extensions:
- a `GPUArrays.jl` one, so that we can reuse logic for CUDA and AMD;
- a `CUDA.jl`-specific one, which is needed because the `KernelAdaptor` type for adapting `CuArray`s to be usable inside kernels is not a subtype of anything in `GPUArrays.jl`. A similar extension would be needed for `AMDGPU.jl`.

For now this only supports `copy!`, but I think other uses are already covered (for `CUDA.jl` at least) by `TensorOperations.jl`.
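For context, a weak-dependency package extension such as `StridedCUDAExt` is declared in the package's `Project.toml` roughly as follows. This is a sketch: the `CUDA.jl` and `GPUArrays.jl` UUIDs are their registered ones to the best of my knowledge, and the `StridedGPUArraysExt` name is assumed by analogy (only `StridedCUDAExt` appears in the diff above):

```toml
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"

[extensions]
StridedCUDAExt = "CUDA"
StridedGPUArraysExt = "GPUArrays"   # hypothetical name for the generic extension
```

Julia loads each extension module automatically once the package and the corresponding weak dependency are both present in the active environment.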