Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scalar indexing error when using block_gmres on GPU #913

Open
tobbls opened this issue Oct 26, 2024 · 4 comments
Open

scalar indexing error when using block_gmres on GPU #913

tobbls opened this issue Oct 26, 2024 · 4 comments

Comments

@tobbls
Copy link

tobbls commented Oct 26, 2024

Hi!
I am trying to setup a minimal example using the block_gmres(A, B) solver on a GPU. This should be possible with Julia 1.11 if I understand the documentation correctly. However, I get the error: LoadError: Scalar indexing is disallowed. I am able to run the bilq(A,b) solver without any problems. I am using Julia Version 1.11.1 and v0.9.7 of the Krylov package on a system with multiple NVIDIA GPUs.
This is the code producing the error:

using Krylov, CUDA

#Setup matrices:
n = 100
A = CUDA.rand(Float32, n, n)
B = CUDA.rand(Float32, n, n)

#does NOT work:
X, stats = block_gmres(A, B)

This is the full error message:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:50 [inlined]
  [6] scalar_getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:36 [inlined]
  [7] _getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:19 [inlined]
  [8] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:17 [inlined]
  [9] getindex
    @ ./subarray.jl:320 [inlined]
 [10] copytrito!(B::CuArray{Float32, 2, CUDA.DeviceMemory}, A::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.DeviceMemory}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, uplo::Char)
    @ LinearAlgebra ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1945
 [11] copy_triangle
    @ ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:172 [inlined]
 [12] householder!(Q::CuArray{Float32, 2, CUDA.DeviceMemory}, R::CuArray{Float32, 2, CUDA.DeviceMemory}, τ::CuArray{Float32, 1, CUDA.DeviceMemory}; compact::Bool)
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:199
 [13] householder!
    @ ~/.julia/packages/Krylov/M9Tib/src/block_krylov_utils.jl:195 [inlined]
 [14] block_gmres!(solver::BlockGmresSolver{Float32, Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 2, CUDA.DeviceMemory}}, A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory}; M::LinearAlgebra.UniformScaling{Bool}, N::LinearAlgebra.UniformScaling{Bool}, ldiv::Bool, restart::Bool, reorthogonalization::Bool, atol::Float32, rtol::Float32, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#1104#1118", iostream::Core.CoreSTDOUT)
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/block_gmres.jl:254
 [15] block_gmres!
    @ ~/.julia/packages/Krylov/M9Tib/src/block_gmres.jl:94 [inlined]
 [16] #block_gmres#1103
    @ ~/.julia/packages/Krylov/M9Tib/src/krylov_solve.jl:169 [inlined]
 [17] block_gmres(A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ Krylov ~/.julia/packages/Krylov/M9Tib/src/krylov_solve.jl:164
 [18] top-level scope
    @ ~/github/clusterexperiments/src/gpu_minimal_test.jl:9

I assume that allowing scalar iteration is not advisable here. Am I missing anything obvious, e.g. is it still necessary to overload the copy_triangle() function like described here: https://jso.dev/Krylov.jl/stable/block_krylov/ or is it a code issue?
Thank you very much already!

@amontoison
Copy link
Member

Hi @tobbls! Thanks for reporting the issue.
I fixed the problem in #914 and tagged a new release, 0.9.8.
Please try it and let me know if it works on your end.

The culprit was a view that led to an incorrect dispatch of copytrito!.

@tobbls
Copy link
Author

tobbls commented Oct 27, 2024

Wow that was fast, thank you @amontoison !
That solves the issue and now the code above works on the GPU for me.
Thanks!

@tobbls
Copy link
Author

tobbls commented Oct 27, 2024

I'm a bit new to this.. should I close the issue?

@amontoison
Copy link
Member

You can leave it open. I need to finalize a PR related to copytrito! in GPUArrays.jl and this issue will help me to not forget:
JuliaGPU/GPUArrays.jl#538

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants