Lazy Arrays with CUDA.jl (feature request) #214

@jenkspt

This is a feature request to get LazyArrays to work with CuArrays (and potentially other GPUArray implementations).

Some of the features of LazyArrays work with CuArrays out of the box, which is awesome. For example:

using CUDA
using LazyArrays
using BenchmarkTools

a, b = CUDA.rand(1, 1000), CUDA.rand(1000, 1);

bench1(a, b) = CUDA.@sync sum(@~ a .+ b; dims=1)
@btime bench1(a, b)

bench2(a, b) = CUDA.@sync sum(a .+ b; dims=1)
@btime bench2(a, b)

I get 125 μs and 156 μs respectively on my GTX 1080 Ti.
So using LazyArrays.jl is faster and uses less memory!
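
For readers wondering where the savings come from, here is a minimal CPU-only sketch using just Base. It assumes that `@~ a .+ b` yields an instantiated `Base.Broadcast.Broadcasted` object (which is my understanding of LazyArrays, not something verified here), and the names `eager_total` and `lazy_total` are made up for illustration. It reduces over all elements rather than with `dims=1` to keep the example simple.

```julia
using Base.Broadcast: broadcasted, instantiate

# Small CPU stand-ins for the 1×1000 and 1000×1 CuArrays in the benchmark.
a = reshape([1.0, 2.0, 3.0], 1, 3)
b = reshape([1.0, 2.0, 3.0], 3, 1)

# Eager: `a .+ b` first allocates the full 3×3 result, then sums it.
eager_total(a, b) = sum(a .+ b)

# Lazy: the Broadcasted object only describes `a .+ b`; `sum` computes
# each element on the fly, so no intermediate 3×3 array is allocated.
lazy_total(a, b) = sum(instantiate(broadcasted(+, a, b)))

println(eager_total(a, b))
println(lazy_total(a, b))
```

Both functions return the same value; the lazy version simply skips the temporary array, which is presumably what makes `bench1` cheaper than `bench2` above.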

However, reductions on a BroadcastArray (created via the LazyArray constructor) revert to scalar indexing on the GPU:

CUDA.allowscalar(false)
sum(LazyArray(@~ a .+ b); dims=1)

ERROR: Scalar indexing is disallowed.
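
One plausible reading of this error (an assumption on my part, not confirmed against the LazyArrays source) is that the wrapper is an `AbstractArray`, so `sum` takes Base's generic path and reads elements one at a time through `getindex`. The CPU-only sketch below shows that pattern with a hypothetical wrapper type, `CountingArray`, that counts element reads; it is not part of LazyArrays or CUDA.jl.

```julia
# Hypothetical wrapper type, for illustration only: counts element-wise reads.
struct CountingArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    reads::Base.RefValue{Int}
end
CountingArray(x::AbstractArray) = CountingArray(x, Ref(0))

Base.size(c::CountingArray) = size(c.parent)

# Every element access goes through this method, one scalar at a time.
function Base.getindex(c::CountingArray{T,N}, I::Vararg{Int,N}) where {T,N}
    c.reads[] += 1
    return c.parent[I...]
end

x = CountingArray(rand(4, 5))
s = sum(x; dims=1)

# The generic reduction touched each of the 20 elements individually.
println("scalar reads: ", x.reads[])
```

On a GPU array, each of those reads would be a scalar index into device memory, which is exactly what `CUDA.allowscalar(false)` forbids.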

Also:

  • Displaying a LazyArray of broadcasted CuArrays doesn't work.
  • I tried using Adapt.jl on BroadcastArray, ApplyArray, and Base.Broadcast.Broadcasted, but it didn't seem to help with the above problems.

I think this would be a really powerful feature, one that would make fast and efficient non-trivial GPU functions much easier to write.
