Lazy Arrays with CUDA.jl (feature request)

This is a feature request to get LazyArrays to work with CuArrays (and potentially other GPUArray implementations).

Some of the features of LazyArrays work with CuArrays out of the box, which is awesome. For example:
```julia
using CUDA
using LazyArrays
using BenchmarkTools

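# Row and column vectors; broadcasting them together gives a 1000×1000 result
a, b = CUDA.rand(1, 1000), CUDA.rand(1000, 1);

# Lazy: reduce over the unmaterialized broadcast
bench1(a, b) = CUDA.@sync sum(@~ a .+ b; dims=1)
@btime bench1($a, $b)

# Eager: materialize `a .+ b` first, then reduce
bench2(a, b) = CUDA.@sync sum(a .+ b; dims=1)
@btime bench2($a, $b)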
```

I get 125 μs and 156 μs respectively on my GTX 1080 Ti.
So the LazyArrays.jl version is faster and uses less memory!
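
To illustrate where the memory saving comes from, here is a minimal CPU-only sketch using only `Base`. It assumes that `@~ a .+ b` builds roughly the same kind of `Base.Broadcast.Broadcasted` object as the explicit call below. That is my understanding, not something checked against the LazyArrays source. It also reduces over the whole array rather than with `dims=1`, and uses `Float64` rather than the `Float32` that `CUDA.rand` produces, to keep the comparison simple.

```julia
using Base.Broadcast: broadcasted, instantiate

# Plain CPU arrays with the same shapes as in the benchmark above
a, b = rand(1, 1000), rand(1000, 1)

# Eager: `a .+ b` allocates the full 1000×1000 intermediate before reducing
eager = sum(a .+ b)

# Lazy: describe the broadcast without materializing it, then reduce over it
bc = instantiate(broadcasted(+, a, b))
lazy = sum(bc)
```

Both paths should give the same total up to floating-point rounding; only the eager one has to store the intermediate array.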

However, reductions on a `BroadcastArray` (created via the `LazyArray` constructor) fall back to scalar indexing on the GPU:
```julia
CUDA.allowscalar(false)
sum(LazyArray(@~ a .+ b); dims=1)
```
`ERROR: Scalar indexing is disallowed.`

Also:
* Displaying a `LazyArray` of broadcasted CuArrays doesn't work.
* I tried using Adapt.jl on `BroadcastArray`, `ApplyArray`, and `Base.Broadcast.Broadcasted`, but it didn't seem to help with the above problems.

I think this would be a really powerful feature: it would make writing fast, efficient, non-trivial GPU functions much easier.
