copy! for CuStridedView (and friends!) #40
kshyatt merged 11 commits into QuantumKitHub:main
I'll also add a BuildKite pipeline for this.
I thought so too, but then it seems to work for transpose and adjoint (because the axes are then the same), which surprised me. I agree about restricting the element types. I think I need more extensive tests here, to be honest.
On Wed, Feb 25, 2026 at 10:23 PM, Jutho commented on this pull request, in ext/StridedCUDAExt.jl (#40 (comment)):
```diff
@@ -0,0 +1,17 @@
+module StridedCUDAExt
+
+using Strided, CUDA
+using Strided: StridedViews
+using CUDA: Adapt, KernelAdaptor
+using CUDA: GPUArrays
+
+const ALL_FS = Union{typeof(adjoint), typeof(conj), typeof(identity), typeof(transpose)}
+
+function Base.copy!(dst::StridedView{TD, ND, TAD, FD}, src::StridedView{TS, NS, TAS, FS}) where {TD, ND, TAD <: CuArray{TD}, FD <: ALL_FS, TS, NS, TAS <: CuArray{TS}, FS <: ALL_FS}
+    bc_style = Base.Broadcast.BroadcastStyle(TAS)
+    bc = Base.Broadcast.Broadcasted(bc_style, identity, (src,), axes(dst))
+    GPUArrays._copyto!(dst, bc)
```
This is probably not generally correct, for example if `FS` and `FD` are different, say one is `identity` and the other is `conj`. For GPUArrays, we should probably only allow `TD` and `TS` to be `<:Number`, and `FS` and `FD` to be only `typeof(identity)` or `typeof(conj)`?
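The reason transpose and adjoint can appear to work is that a lazy wrapper's `getindex` applies the function element by element, so materializing an `identity` broadcast over the wrapper picks it up automatically, while a mismatch between the wrappers on the two sides silently drops the function. A minimal CPU sketch of the same mechanism, using plain Base broadcasting and Base's lazy `adjoint` wrapper (an analogue, not the `StridedView` flag itself):

```julia
# CPU sketch of the broadcast-based copy used in the extension:
# wrap the source in a lazy `Broadcasted` with `identity` and
# materialize it into the destination.
src = Complex{Float64}[1+2im 3-1im; 0+1im 2+0im]
dst = zeros(Complex{Float64}, 2, 2)

# `adjoint(src)` is a lazy wrapper; its getindex conjugates and
# transposes, so the identity broadcast copies the adjoint correctly.
bc = Base.Broadcast.broadcasted(identity, adjoint(src))
Base.Broadcast.materialize!(dst, bc)
@assert dst == adjoint(src)

# But if the wrapper is dropped and only the raw parent is broadcast,
# the conjugation is silently lost -- the mismatch the review warns about.
Base.Broadcast.materialize!(dst, Base.Broadcast.broadcasted(identity, src))
@assert dst != adjoint(src)
```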
I've added more tests, including for size-0 arrays and square ones, and things still seem to be working! Maybe I've written the test wrong? Otherwise I think the …
This probably needs QuantumKitHub/StridedViews.jl#26 and a tag first.
I was assuming that you would unpack the parent array from the StridedView, but you are just passing the StridedView objects to GPUArrays._copyto!. In that case, if this ends up calling the StridedViews …

So what happens behind the scenes is that the backing …
That's great. Looking at the …
Absolutely. Contrary to my usual tendency, I didn't add this here, to keep this PR more or less "skinny" and reviewable, but I would be happy to add more things as we need them.
```diff
@@ -0,0 +1,18 @@
for T in (Float32, Float64, Complex{Float32}, Complex{Float64})
    @testset "Copy with ROCStridedView: $T, $f1, $f2" for f2 in (identity, conj, adjoint, transpose), f1 in (identity, conj, transpose, adjoint)
```
I don't think transpose and adjoint really make sense for scalar element types.
Ah wait, it is applied to the matrix. Never mind, then it does make sense.
Is there another element type we want to test with? Maybe a StridedView of a BlockArray or something?
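For reference, a hedged CPU analogue of that `(f1, f2)` wrapper-pair test pattern, using plain `Array`s instead of GPU-backed `StridedView`s. Note one difference: `conj` on a plain `Array` is eager (it allocates a copy rather than writing through), so it only appears on the source side here, whereas `StridedView` carries conjugation as a lazy flag:

```julia
# CPU analogue of the (f1, f2) testset loop, using plain Arrays.
# `conj(::Array)` is eager, so it cannot serve as a mutable destination
# wrapper here; StridedView stores conj as a lazy flag instead.
src = Complex{Float64}[1+2im 3-1im; 0+1im 2+0im]
for f1 in (identity, transpose, adjoint), f2 in (identity, conj, transpose, adjoint)
    dst = zeros(Complex{Float64}, 2, 2)
    copyto!(f1(dst), f2(src))      # writes through the lazy wrapper f1
    @assert f1(dst) == f2(src)     # round-trips for every wrapper pair
end
```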
This PR creates some logic to allow `StridedView`s backed by `CuArray`s (and, soon, any generic GPU array) to hook into the existing `GPUArrays.jl` infrastructure for broadcasted copies without having to replicate all the indexing logic or create arrays of indices.

I've created two extensions:
- a `GPUArrays.jl` one, so that we can reuse logic for CUDA and AMD;
- a `CUDA.jl`-specific one, which is needed because the `KernelAdaptor` type for adapting `CuArray`s to be usable inside kernels is not a subtype of anything in `GPUArrays.jl`. A similar extension would be needed for `AMDGPU.jl`.

For now this only supports `copy!`, but I think other uses are already covered (for `CUDA.jl` at least) by `TensorOperations.jl`.
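For context, a weak-dependency package extension such as `StridedCUDAExt` is declared in the package's `Project.toml` roughly as follows. This is a sketch: the `CUDA.jl` and `GPUArrays.jl` UUIDs are their registered ones to the best of my knowledge, and the `StridedGPUArraysExt` name is assumed by analogy (only `StridedCUDAExt` appears in the diff above):

```toml
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"

[extensions]
StridedCUDAExt = "CUDA"
StridedGPUArraysExt = "GPUArrays"   # hypothetical name for the generic extension
```

Julia loads each extension module automatically once the package and the corresponding weak dependency are both present in the active environment.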