scalar indexing error when using block_gmres on GPU #913

tobbls · 2024-10-26T20:11:27Z

Hi!
I am trying to setup a minimal example using the block_gmres(A, B) solver on a GPU. This should be possible with Julia 1.11 if I understand the documentation correctly. However, I get the error: LoadError: Scalar indexing is disallowed. I am able to run the bilq(A,b) solver without any problems. I am using Julia Version 1.11.1 and v0.9.7 of the Krylov package on a system with multiple NVIDIA GPUs.
This is the code producing the error:

using Krylov, CUDA

#Setup matrices:
n = 100
A = CUDA.rand(Float32, n, n)
B = CUDA.rand(Float32, n, n)

#does NOT work:
X, stats = block_gmres(A, B)

This is the full error message:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:50 [inlined]
  [6] scalar_getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:36 [inlined]
  [7] _getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:19 [inlined]
  [8] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:17 [inlined]
  [9] getindex
    @ ./subarray.jl:320 [inlined]
 [10] copytrito!(B::CuArray{Float32, 2, CUDA.DeviceMemory}, A::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.DeviceMemory}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, uplo::Char)
    @ LinearAlgebra ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1945
 [11] copy_triangle
    @ ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:172 [inlined]
 [12] householder!(Q::CuArray{Float32, 2, CUDA.DeviceMemory}, R::CuArray{Float32, 2, CUDA.DeviceMemory}, τ::CuArray{Float32, 1, CUDA.DeviceMemory}; compact::Bool)
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:199
 [13] householder!
    @ ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:195 [inlined]
 [14] block_gmres!(solver::BlockGmresSolver{Float32, Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 2, CUDA.DeviceMemory}}, A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory}; M::LinearAlgebra.UniformScaling{Bool}, N::LinearAlgebra.UniformScaling{Bool}, ldiv::Bool, restart::Bool, reorthogonalization::Bool, atol::Float32, rtol::Float32, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#1104#1118", iostream::Core.CoreSTDOUT)
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/block_gmres.jl:254
 [15] block_gmres!
    @ ~/.julia/packages/Krylov/M9Tib/src/block_gmres.jl:94 [inlined]
 [16] #block_gmres#1103
    @ ~/.julia/packages/Krylov/M9Tib/src/krylov_solve.jl:169 [inlined]
 [17] block_gmres(A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/krylov_solve.jl:164
 [18] top-level scope
    @ ~/github/clusterexperiments/src/gpu_minimal_test.jl:9

I assume that allowing scalar iteration is not advisable here. Am I missing anything obvious, e.g. is it still necessary to overload the copy_triangle() function like described here: https://jso.dev/Krylov.jl/stable/block_krylov/ or is it a code issue?
Thank you very much already!

The text was updated successfully, but these errors were encountered:

amontoison · 2024-10-26T22:28:02Z

Hi @tobbls! Thanks for reporting the issue.
I fixed the problem in #914 and tagged a new release, 0.9.8.
Please try it and let me know if it works on your end.

The culprit was a view that led to an incorrect dispatch of copytrito!.

tobbls · 2024-10-27T08:34:43Z

Wow that was fast, thank you @amontoison !
That solves the issue and now the code above works on the GPU for me.
Thanks!

tobbls · 2024-10-27T08:37:05Z

I'm a bit new to this.. should I close the issue?

amontoison · 2024-10-27T13:31:43Z

You can leave it open. I need to finalize a PR related to copytrito! in GPUArrays.jl and this issue will help me to not forget:
JuliaGPU/GPUArrays.jl#538

amontoison mentioned this issue Oct 26, 2024

Fix block_gmres on GPUs #914

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scalar indexing error when using block_gmres on GPU #913

scalar indexing error when using block_gmres on GPU #913

tobbls commented Oct 26, 2024

amontoison commented Oct 26, 2024

tobbls commented Oct 27, 2024

tobbls commented Oct 27, 2024

amontoison commented Oct 27, 2024

scalar indexing error when using block_gmres on GPU #913

scalar indexing error when using block_gmres on GPU #913

Comments

tobbls commented Oct 26, 2024

amontoison commented Oct 26, 2024

tobbls commented Oct 27, 2024

tobbls commented Oct 27, 2024

amontoison commented Oct 27, 2024