This is a feature request to get LazyArrays to work with CuArrays (and potentially other GPUArray implementations).
Some features of LazyArrays work with CuArrays out of the box, which is awesome. For example:
using CUDA
using LazyArrays
using BenchmarkTools
a, b = CUDA.rand(1, 1000), CUDA.rand(1000, 1);
bench1(a, b) = CUDA.@sync sum(@~ a .+ b; dims=1)
@btime bench1(a, b)
bench2(a, b) = CUDA.@sync sum(a .+ b; dims=1)
@btime bench2(a, b)
I get 125 μs and 156 μs respectively on my GTX 1080 Ti.
So using LazyArrays.jl is faster and uses less memory!
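As far as I understand, the saving comes from the lazy expression never materializing the 1000×1000 intermediate: the reduction reads each broadcast element on the fly. Below is a minimal CPU-only sketch of the same idea using only Base (no CUDA or LazyArrays needed). The helper names lazy_sum and eager_sum are made up for illustration, and building a Broadcasted by hand is my approximation of what @~ produces, not its actual implementation.

```julia
using Base.Broadcast: broadcasted, instantiate

a, b = rand(1, 1000), rand(1000, 1)

# Lazy: build an unmaterialized broadcast object and reduce over it directly.
# No 1000x1000 temporary is created; elements are computed during the reduction.
lazy_sum(a, b) = sum(instantiate(broadcasted(+, a, b)); dims=1)

# Eager: a .+ b allocates the full 1000x1000 matrix before the reduction runs.
eager_sum(a, b) = sum(a .+ b; dims=1)

# Warm up both functions so compilation is not counted below.
lazy_sum(a, b); eager_sum(a, b)

lazy_bytes = @allocated lazy_sum(a, b)
eager_bytes = @allocated eager_sum(a, b)

println("lazy:  ", lazy_bytes, " bytes allocated")
println("eager: ", eager_bytes, " bytes allocated")
```

Both functions should return the same 1×1000 result (up to floating-point rounding); only the allocation of the intermediate differs.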
However, reductions on a BroadcastArray (created via the LazyArray constructor) revert to scalar indexing on the GPU:
CUDA.allowscalar(false)
sum(LazyArray(@~ a .+ b); dims=1)
ERROR: Scalar indexing is disallowed.
Also:
- Displaying a LazyArray of broadcasted CuArrays doesn't work.
- I tried using Adapt.jl on BroadcastArray, ApplyArray, and Base.Broadcast.Broadcasted, but it didn't seem to help with the above problems.
I think this would be a really powerful feature, one that would make fast and efficient non-trivial GPU functions much easier to write.