Lazy Arrays with CUDA.jl (feature request) #214
Comments
I don’t have easy access to an Nvidia chip but would love to accept a PR. I don’t quite see what you are proposing for sum. Do you want sum(a .+ b) to lower to sum(a) + sum(b)? That’s pretty easy to add.
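To make the proposed lowering concrete, here is a minimal CPU-only sketch using plain Base arrays (no LazyArrays or CUDA involved). The variable names and shapes are illustrative assumptions, not code from the package. It suggests that the simple identity holds for same-shaped operands, but that once the operands broadcast against each other, each partial sum has to be scaled by how often its array is repeated.

```julia
# Same-shaped operands: the sum distributes directly over the broadcast.
a = collect(1.0:10.0)
b = collect(11.0:20.0)
same_shape_ok = sum(a .+ b) ≈ sum(a) + sum(b)

# Broadcasting operands (1×10 against 8×1): every element of `row` appears
# 8 times and every element of `col` 10 times in the 8×10 result.
row = reshape(collect(1.0:10.0), 1, 10)
col = reshape(collect(1.0:8.0), 8, 1)
lowered = size(col, 1) * sum(row) + size(row, 2) * sum(col)
broadcast_ok = sum(row .+ col) ≈ lowered

# Reduction along dims=1 of the same expression, computed without
# forming the full 8×10 intermediate array.
lowered_dims1 = size(col, 1) .* row .+ sum(col)
dims_ok = sum(row .+ col; dims=1) ≈ lowered_dims1
```

This only covers `+`; a general lazy reduction cannot rely on such algebraic rewrites and would still need a kernel that evaluates the broadcast element by element.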
Sorry, that's a bad example. More generally, I'm interested in reductions and accumulation over broadcasted functions, i.e.

using CUDA, LazyArrays
a, b = CUDA.rand(1, 10), CUDA.rand(8, 1);
sum(@~ a .+ b; dims=1) # works
a, b = Array(a), Array(b)
sum(@~ a .+ b; dims=1) # doesn't work

and the opposite is true for

a, b = CUDA.rand(1, 10), CUDA.rand(8, 1);
sum(LazyArray(@~ a .+ b; dims=1)) # uses scalar indexing
a, b = Array(a), Array(b)
sum(LazyArray(@~ a .+ b; dims=1)) # works

With all that said -- I think my specific ask is to get
I think you meant sum(LazyArray(@~ a .+ b); dims=1).
I'm a bit confused about what you want to happen. We have
reduce(*, a .+ b) == (a[1] + b[1]) * … * (a[end] + b[end])
But what do you want to do when
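As a small illustration of the reduce identity quoted above, here is a CPU-only sketch that folds over a lazy broadcast object built with Base.Broadcast alone. The assumption, labeled in the code, is that this object is roughly what `@~ a .+ b` produces in LazyArrays; the three-element vectors are made up for the example.

```julia
using Base.Broadcast: broadcasted, instantiate

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Lazy broadcast object: no intermediate array for a .+ b is allocated.
# (Assumption: roughly what `@~ a .+ b` yields in LazyArrays.)
bc = instantiate(broadcasted(+, a, b))

lazy_prod = reduce(*, bc)        # reduces over bc element by element
eager_prod = reduce(*, a .+ b)   # materializes a .+ b first, then reduces
by_hand = (a[1] + b[1]) * (a[2] + b[2]) * (a[3] + b[3])
```

All three values should agree. The sketch deliberately leaves out the `dims` keyword, since that is the case the thread reports as not working on the CPU.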
The LazyArray defers the evaluation to the point of access in the reduce (e.g. sum) or
┌ Warning: Performing scalar indexing on task Task (runnable) @0x000000000a2107d0.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore C:\Users\pi96doc\.julia\packages\GPUArraysCore\ZBmfM\src\GPUArraysCore.jl:90
Can the execution of the broadcast somehow be deferred to the standard broadcasting mechanism? If so, I would expect the package to also work with
... digging into the code, it seems that the issue with the loop is a minor display issue. However, when calling |
This is a feature request to get LazyArrays to work with CuArrays (and potentially other GPUArray implementations).
Some of the features of LazyArrays work with CuArrays out of the box, which is awesome. For example:
Getting 125 μs and 156 μs respectively on my GTX 1080 Ti.
So using LazyArrays.jl is faster and uses less memory!
However, reductions on BroadcastArray (via the LazyArray constructor) revert to scalar indexing on the GPU:
ERROR: Scalar indexing is disallowed.
Also:
BroadcastArray, ApplyArray, and Base.Broadcast.Broadcasted, but didn't seem to help with the above problems.
I think this would be a really powerful feature, one that would make writing fast and efficient non-trivial GPU functions much easier.