[NDTensorsMetalExt] Fix more scalar indexing issues on Metal #1264

Merged
merged 80 commits
Dec 1, 2023
Commits
80 commits
ab39bed
Same issue on Metal and CUDA
kmp5VT Nov 16, 2023
5b63ddd
Replace _gemm! with mul!!
kmp5VT Nov 17, 2023
478131f
Add comment
kmp5VT Nov 17, 2023
1324868
Fix some issues with permutedims in Metal
kmp5VT Nov 17, 2023
526bf4a
Change the default_eltype of NDTensors to Float32 if using Mtl
kmp5VT Nov 17, 2023
4b9be1d
Blocksparse shouldn't use default of Float64. Use NDTensors.default_e…
kmp5VT Nov 17, 2023
5e34f9d
Make sure to revert to original Float64
kmp5VT Nov 17, 2023
3e93f9b
If in Float32 need to adjust atol. Also turn off and on Float32 appro…
kmp5VT Nov 17, 2023
8f9f46c
There is no check when multiplying numbers this might lead to an issu…
kmp5VT Nov 17, 2023
911542d
Format
kmp5VT Nov 17, 2023
fdbdef6
Merge branch 'main' into kmp5/debug/scalar_metal
kmp5VT Nov 17, 2023
cb08934
Merge branch 'main' into kmp5/debug/scalar_metal
kmp5VT Nov 17, 2023
0a2f1ca
Merge branch 'main' into kmp5/debug/scalar_metal
kmp5VT Nov 20, 2023
6780956
Create NDTensorsTestUtils
kmp5VT Nov 20, 2023
1ab3418
Move functions to NDTensorsTestUtils
kmp5VT Nov 20, 2023
2ff61ff
Use NDTensorsTestUtils
kmp5VT Nov 20, 2023
09e95cf
Remove global modification of default_eltype
kmp5VT Nov 20, 2023
cb2bd51
format
kmp5VT Nov 20, 2023
ed10f71
Rename test files test_FILENAME.jl
kmp5VT Nov 20, 2023
c61bc5b
Fix spelling mistake
kmp5VT Nov 21, 2023
6530969
Update test files to create random modules and run in the modules. Al…
kmp5VT Nov 21, 2023
a792341
format
kmp5VT Nov 21, 2023
bb424d7
Create a couple necessary BlockSparseTensor constructors
kmp5VT Nov 21, 2023
b393700
fix dmrg tests
kmp5VT Nov 21, 2023
49b0f7b
Fix metal tests
kmp5VT Nov 21, 2023
c50f431
format
kmp5VT Nov 21, 2023
06a9e47
This is fixed in this pr
kmp5VT Nov 21, 2023
d39f765
fix elt issue in combiner
kmp5VT Nov 21, 2023
cb652b7
format
kmp5VT Nov 21, 2023
c4e7332
use tuple when different types
kmp5VT Nov 21, 2023
dc8b13e
Fix typo
kmp5VT Nov 21, 2023
ef3b4d5
Alphabetize and consistently use LinearAlgebra.
kmp5VT Nov 21, 2023
b22c430
Remove commented code
kmp5VT Nov 21, 2023
e6baedc
Remove comment
kmp5VT Nov 21, 2023
1473da7
remove comma
kmp5VT Nov 21, 2023
40be5aa
Split into two lines
kmp5VT Nov 21, 2023
fc7561f
Need to have module `NDTensors` in path
kmp5VT Nov 22, 2023
25f6d04
Reorder for documentation
kmp5VT Nov 22, 2023
0fbe7b8
Update dense test code
kmp5VT Nov 22, 2023
488aefd
Add better comment
kmp5VT Nov 22, 2023
e7ed450
Fix remaining issues
kmp5VT Nov 22, 2023
87c50e3
format
kmp5VT Nov 22, 2023
4880655
Metal testing now fully functional remove it from extra
kmp5VT Nov 22, 2023
87eca37
Merge remote-tracking branch 'origin/main' into kmp5/debug/scalar_metal
kmp5VT Nov 26, 2023
16b9c74
Move metal back to extras because it cannot be installed properly for…
kmp5VT Nov 26, 2023
7e4839a
use is_supported_eltype function
kmp5VT Nov 26, 2023
2ef4f16
Update NDTensorsCUDAExt example
kmp5VT Nov 27, 2023
e1748be
Remove LinearAlgebra.
kmp5VT Nov 28, 2023
37c0ced
Changes to NDTensorsCUDA example
kmp5VT Nov 28, 2023
663b56e
Remove NDTensors.
kmp5VT Nov 28, 2023
8de246b
Update NDTensors/test/runtests.jl [no-ci]
kmp5VT Nov 28, 2023
e6a68e5
Just bring name into namespace
kmp5VT Nov 28, 2023
edcb6c8
Just use reshape
kmp5VT Nov 28, 2023
246ecd4
Updates to NDTensorsMetal example
kmp5VT Nov 29, 2023
ab237aa
Remove unwrap_type
kmp5VT Nov 29, 2023
0abeb5a
format
kmp5VT Nov 29, 2023
4700b39
calling permutedims(expose(A) ...) can cause a stack overflow.
kmp5VT Nov 29, 2023
ac4f9c1
format
kmp5VT Nov 29, 2023
771403b
Update mul.jl [no-ci]
kmp5VT Nov 29, 2023
c2b39a1
Update mul.jl [no-ci]
kmp5VT Nov 29, 2023
919b041
Update documentation
kmp5VT Nov 29, 2023
d4621f5
Add permutedims for CuArray failing
kmp5VT Nov 29, 2023
98719ed
Merge branch 'kmp5/debug/scalar_metal' of github.com:kmp5VT/ITensors.…
kmp5VT Nov 29, 2023
da62a14
format
kmp5VT Nov 29, 2023
c03accd
Merge branch 'main' into kmp5/debug/scalar_metal
kmp5VT Nov 30, 2023
cda274f
simplifications to NDTensorsCUDA example
kmp5VT Nov 30, 2023
5f29bf4
Allowscalar shouldn't be necessary
kmp5VT Nov 30, 2023
7ce440b
Simplifications to NDTensorsMetal tests
kmp5VT Nov 30, 2023
6dd95d4
spelling
kmp5VT Nov 30, 2023
2498930
formatting
kmp5VT Nov 30, 2023
61562ef
Merge branch 'main' into kmp5/debug/scalar_metal
kmp5VT Nov 30, 2023
8a5a3d7
format
kmp5VT Nov 30, 2023
b56f8a4
grab NDTensors.cpu
kmp5VT Nov 30, 2023
66045ae
Simplify code
kmp5VT Nov 30, 2023
a2d569d
add random module to files its missing
kmp5VT Nov 30, 2023
d85ac43
Move readwrite to `NDTensors/tes/broken`
kmp5VT Dec 1, 2023
ce0caba
Remove @eval module...
kmp5VT Dec 1, 2023
ffbb01c
remove extra end
kmp5VT Dec 1, 2023
bc7c400
Use TupleTools.sort
kmp5VT Dec 1, 2023
3eadb7b
use `TupleTools.sort`
kmp5VT Dec 1, 2023
13 changes: 13 additions & 0 deletions NDTensors/ext/NDTensorsCUDAExt/permutedims.jl
Original file line number Diff line number Diff line change
@@ -5,3 +5,16 @@ function Base.permutedims!(
copyto!(expose(parent(Edest)), expose(Aperm))
return unexpose(Edest)
end

## Found an issue in CUDA where, if Edest is a ReshapedArray wrapping an Adjoint,
## `.=` can fail. Instead, force Esrc into the shape of parent(Edest).
function Base.permutedims!(
Edest::Exposed{<:CuArray,<:Base.ReshapedArray{<:Any,<:Any,<:Adjoint}},
Esrc::Exposed{<:CuArray},
perm,
f,
)
Aperm = reshape(permutedims(Esrc, perm), size(parent(Edest)))
parent(Edest) .= f.(parent(Edest), Aperm)
return unexpose(Edest)
end
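The workaround above boils down to: permute the source, reshape the permuted result to the destination parent's shape, then broadcast-combine with `f`. A minimal numpy sketch of the same idea (numpy stands in for `CuArray`, and `permutedims_into_reshaped` is an illustrative name, not an NDTensors function):

```python
import numpy as np

def permutedims_into_reshaped(dest_parent, src, perm, f):
    # Permute src, reshape to the destination parent's shape, then
    # broadcast-combine with f (mirrors `parent(Edest) .= f.(parent(Edest), Aperm)`).
    aperm = np.transpose(src, perm).reshape(dest_parent.shape)
    dest_parent[...] = f(dest_parent, aperm)
    return dest_parent

# Accumulate a permuted 2x3 array into a flat length-6 destination.
dest = np.zeros(6)
src = np.arange(6.0).reshape(2, 3)
permutedims_into_reshaped(dest, src, (1, 0), lambda d, s: d + s)
# dest is now [0, 3, 1, 4, 2, 5]
```

Because the permuted array is materialized before the broadcast, no kernel ever indexes through the reshaped-adjoint wrapper, which is what triggered the failure.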
19 changes: 19 additions & 0 deletions NDTensors/ext/NDTensorsMetalExt/mul.jl
@@ -19,3 +19,22 @@ function LinearAlgebra.mul!(
mul!(CM', BM', AM', α, β)
return unexpose(CM)
end

## Fix an issue in Metal.jl where it cannot recognize Transpose{ReshapedArray{Adjoint{MtlArray}}}
## as an MtlArray and falls back to generic matmul
function LinearAlgebra.mul!(
CM::Exposed{<:MtlArray},
AM::Exposed{<:MtlArray},
BM::Exposed{
<:MtlArray,
<:LinearAlgebra.Transpose{
<:Any,<:Base.ReshapedArray{<:Any,<:Any,<:LinearAlgebra.Adjoint}
},
},
α,
β,
)
B = copy(expose(parent(BM)))
mul!(CM, AM, expose(transpose(B)), α, β)
return unexpose(CM)
end
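The fix materializes the deeply wrapped argument: `parent(BM)` is copied to a plain device array, and the multiplication is redone with an explicit transpose, so dispatch lands on the fast matmul path instead of the generic fallback. A hedged numpy sketch of the same reasoning (numpy in place of `MtlArray`; `mul_with_materialized_transpose` is an illustrative name):

```python
import numpy as np

def mul_with_materialized_transpose(C, A, B_parent):
    # Copy the wrapped parent to a contiguous buffer (the
    # `copy(expose(parent(BM)))` step), then multiply by its transpose,
    # mirroring mul!(CM, AM, expose(transpose(B)), α, β).
    B = np.ascontiguousarray(B_parent)
    C[...] = A @ B.T
    return C

A = np.arange(6.0).reshape(2, 3)
B_parent = np.arange(12.0).reshape(4, 3)  # plays the role of parent(BM)
C = np.zeros((2, 4))
mul_with_materialized_transpose(C, A, B_parent)
```

The copy costs one extra buffer, but it is cheap next to the generic matmul it avoids.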
30 changes: 30 additions & 0 deletions NDTensors/ext/NDTensorsMetalExt/permutedims.jl
@@ -1,7 +1,37 @@
## There's an issue in Metal where `ReshapedArray`-wrapped arrays cannot be permuted using
## permutedims (Metal falls back to scalar indexing).
## These functions address the problem for the different permutedims methods.
function Base.permutedims(E::Exposed{<:MtlArray,<:Base.ReshapedArray}, perm)
A = copy(E)
return permutedims(A, perm)
end

function Base.permutedims!(
Edest::Exposed{<:MtlArray,<:Base.ReshapedArray}, Esrc::Exposed{<:MtlArray}, perm
)
Aperm = permutedims(Esrc, perm)
copyto!(expose(parent(Edest)), expose(Aperm))
return unexpose(Edest)
end

function Base.permutedims!(
Edest::Exposed{<:MtlArray}, Esrc::Exposed{<:MtlArray,<:Base.ReshapedArray}, perm
)
Aperm = permutedims(Esrc, perm)
copyto!(Edest, expose(Aperm))
return unexpose(Edest)
end

## To get around the Metal issue we copy and permute Esrc,
## then reshape the result to the size of Edest's parent
## and broadcast into the parent.
function Base.permutedims!(
Edest::Exposed{<:MtlArray,<:Base.ReshapedArray},
Esrc::Exposed{<:MtlArray,<:Base.ReshapedArray},
perm,
f,
)
Aperm = reshape(permutedims(Esrc, perm), size(parent(Edest)))
parent(Edest) .= f.(parent(Edest), Aperm)
return unexpose(Edest)
end
60 changes: 27 additions & 33 deletions NDTensors/ext/examples/NDTensorCUDA.jl
@@ -1,13 +1,14 @@
using CUDA
using NDTensors

using ITensors
using Test

using Zygote
using CUDA: CUDA, CuVector, cu, reshape
using ITensors:
Index, ITensor, randomMPO, randomMPS, inner, orthogonalize, qr, siteinds, svd
using Test: @test
using Zygote: gradient

function main()
# using ITensorGPU
cpu = NDTensors.cpu
gpu = NDTensors.cu
# Here is an example of how to utilize NDTensors based tensors with CUDA datatypes
i = Index(2)
j = Index(5)
@@ -18,10 +19,9 @@ function main()
dim2 = (j, k)

# Create 2 ITensors with CUDA backends (these will be made simpler by randomITensor(CuVector) soon)
A = ITensor(NDTensors.generic_randn(CuVector, dim(dim1)), dim1)
B = ITensor(NDTensors.generic_randn(CuVector, dim(dim2)), dim2)
A = ITensor(randomTensor(CuVector, dim1))
B = ITensor(randomTensor(CuVector, dim2))
# Contract the two tensors
cpu = NDTensors.cpu
C = A * B
A = cpu(A)
B = cpu(B)
@@ -36,8 +36,8 @@ function main()
fill!(B, randn())

# Convert the ITensors to GPU
cA = NDTensors.cu(A)
cB = NDTensors.cu(B)
cA = gpu(A)
cB = gpu(B)

#Check that backend of contraction is GPU
@test A * A ≈ cpu(cA * cA)
@@ -47,11 +47,8 @@ function main()

dim3 = (l, k)
dim4 = (i,)
cC = ITensor(
NDTensors.generic_randn(CuVector{Float64,CUDA.Mem.DeviceBuffer}, dim(dim3)), dim3
)
cC = NDTensors.cu(ITensor(NDTensors.generic_randn(Vector{Float64}, dim(dim3)), dim3))
cD = ITensor(Tensor(CuVector, dim4))
cC = ITensor(randomTensor(CuVector{Float64,CUDA.Mem.DeviceBuffer}, dim3))
cD = ITensor(Tensor(CuVector{Float32}, dim4))
fill!(cD, randn())

# Create a function of 4 tensors on GPU
@@ -61,20 +58,17 @@ function main()
#Currently this code fails with CUDA.allowscalar(false)
# Because of outer calling the _gemm! function which calls a
# generic implementation
@allowscalar grad = gradient(f, cA, cB, cC, cD)
@allowscalar @test NDTensors.cpu(cB * cC * cD) ≈ NDTensors.cpu(grad[1])
@allowscalar @test (cB * cC * cD) ≈ grad[1]
grad = gradient(f, cA, cB, cC, cD)
@test cpu(cB * cC * cD) ≈ cpu(grad[1])
@test (cB * cC * cD) ≈ grad[1]
# Create a tuple of indices
decomp = (
dim(NDTensors.ind(grad[1], 1)),
dim(NDTensors.ind(grad[1], 2)) * dim(NDTensors.ind(grad[1], 3)),
)
decomp = (dim(ind(grad[1], 1)), dim(ind(grad[1], 2)) * dim(ind(grad[1], 3)))
# Reshape the CuVector of data into a matrix
cuTensor_data = CUDA.reshape(NDTensors.data(storage(grad[1])), decomp)
cuTensor_data = reshape(array(grad[1]), decomp)
# Use cuBLAS to compute SVD of data
U, S, V = svd(cuTensor_data)
decomp = (dim(NDTensors.ind(grad[2], 1)), dim(NDTensors.ind(grad[2], 2)))
cuTensor_data = CUDA.reshape(NDTensors.data(storage(grad[2])), decomp)
decomp = size(array(grad[2]))
cuTensor_data = reshape(array(grad[2]), decomp)
U, S, V = svd(cuTensor_data)

# These things can take up lots of memory, look at memory usage here
@@ -87,33 +81,33 @@ function main()
CUDA.memory_status()

# Its possible to compute QR of GPU tensor
cq = ITensors.qr(cA, (i,), (j, l))
q = ITensors.qr(A, (i,), (j, l))
cq = qr(cA, (i,), (j, l))
A ≈ cpu(cq[1]) * cpu(cq[2])

## SVD does not yet work with CUDA backend, see above on
## Converting ITensors to vectors and calling CUDA svd function
## CuVectors...
#ITensors.svd(A, (i,), (j, l))

s = ITensors.siteinds("S=1/2", 8)
s = siteinds("S=1/2", 8)
m = randomMPS(s; linkdims=4)
cm = NDTensors.cu(m)
cm = gpu(m)

@test inner(cm', cm) ≈ inner(m', m)

H = randomMPO(s)
cH = NDTensors.cu(H)
cH = gpu(H)
@test inner(cm', cH, cm) ≈ inner(m', H, m)

m = orthogonalize(m, 1)
cm = NDTensors.cu(orthogonalize(cm, 1))
cm = gpu(orthogonalize(cm, 1))
@test inner(m', m) ≈ inner(cm', cm)

H = orthogonalize(H, 1)
cH = NDTensors.cu(cH)
cH = gpu(cH)

@test inner(cm', cH, cm) ≈ inner(m', H, m)
end

## running the main function with Float64
main()
25 changes: 13 additions & 12 deletions NDTensors/ext/examples/NDTensorMetal.jl
@@ -1,11 +1,13 @@
using Metal
using Metal: MtlVector, mtl
using NDTensors

using ITensors
using Test
using Zygote
using ITensors: ITensor, Index, randomITensor
using Test: @test
using Zygote: gradient

function main()
cpu = NDTensors.cpu
gpu = NDTensors.mtl
# Here is an example of how to utilize NDTensors based tensors with Metal datatypes
i = Index(20)
j = Index(5)
@@ -15,27 +17,26 @@ function main()
dim1 = (i, j, l)
dim2 = (j, k)

cA = ITensor(NDTensors.generic_randn(MtlVector{Float32}, dim(dim1)), dim1)
cB = ITensor(NDTensors.generic_randn(MtlVector{Float32}, dim(dim2)), dim2)
## MtlArrays only support Float32 arithmetic
cA = ITensor(randomTensor(MtlVector{Float32}, dim1))
cB = ITensor(randomTensor(MtlVector{Float32}, dim2))
cC = cA * cB

cpu = NDTensors.cpu
A = cpu(cA)
B = cpu(cB)

@test A * B ≈ cpu(cC)

#C = A * B

dim3 = (l, k)
dim4 = (i,)

cC = mtl(randomITensor(Float32, dim3))
cD = mtl(randomITensor(Float32, dim4))
cC = gpu(randomITensor(Float32, dim3))
cD = gpu(randomITensor(Float32, dim4))

f(A, B, C, D) = (A * B * C * D)[]

return grad = gradient(f, cA, cB, cC, cD)
grad = gradient(f, cA, cB, cC, cD)
@test grad[2] ≈ cA * cC * cD
end

main()
16 changes: 11 additions & 5 deletions NDTensors/src/blocksparse/blocksparsetensor.jl
@@ -49,7 +49,7 @@ Construct a block sparse tensor with uninitialized memory
from indices and locations of non-zero blocks.
"""
function BlockSparseTensor(::UndefInitializer, blockoffsets, inds)
return BlockSparseTensor(Float64, undef, blockoffsets, inds)
return BlockSparseTensor(default_eltype(), undef, blockoffsets, inds)
end

function BlockSparseTensor(
@@ -65,15 +65,15 @@ function BlockSparseTensor(eltype::Type{<:Number}, blockoffsets::BlockOffsets, i
end

function BlockSparseTensor(blockoffsets::BlockOffsets, inds)
return BlockSparseTensor(Float64, blockoffsets, inds)
return BlockSparseTensor(default_eltype(), blockoffsets, inds)
end

"""
BlockSparseTensor(inds)

Construct a block sparse tensor with no blocks.
"""
BlockSparseTensor(inds) = BlockSparseTensor(Float64, inds)
BlockSparseTensor(inds) = BlockSparseTensor(default_eltype(), inds)

function BlockSparseTensor(datatype::Type{<:AbstractArray}, inds)
return BlockSparseTensor(datatype, BlockOffsets{length(inds)}(), inds)
@@ -99,7 +99,7 @@ Construct a block sparse tensor with the specified blocks.
Defaults to setting structurally non-zero blocks to zero.
"""
function BlockSparseTensor(blocks::Vector{BlockT}, inds) where {BlockT<:Union{Block,NTuple}}
return BlockSparseTensor(Float64, blocks, inds)
return BlockSparseTensor(default_eltype(), blocks, inds)
end

function BlockSparseTensor(
@@ -160,7 +160,7 @@ function randomBlockSparseTensor(blocks::Vector, inds)
end

function randomBlockSparseTensor(rng::AbstractRNG, blocks::Vector, inds)
return randomBlockSparseTensor(rng, Float64, blocks, inds)
return randomBlockSparseTensor(rng, default_eltype(), blocks, inds)
end

"""
@@ -176,6 +176,12 @@ function BlockSparseTensor(
return BlockSparseTensor(blocks, inds)
end

function BlockSparseTensor{ElT}(
blocks::Vector{BlockT}, inds::Vararg{BlockDim,N}
) where {ElT<:Number,BlockT<:Union{Block{N},NTuple{N,<:Integer}}} where {N}
return BlockSparseTensor(ElT, blocks, inds)
end
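The pattern in this hunk is one change repeated: every constructor that hard-coded `Float64` now asks `default_eltype()`, so a backend that only supports `Float32` (Metal) can change one function instead of every call site. A small Python sketch of the idea (all names here are illustrative, not NDTensors API):

```python
import numpy as np

DEFAULT_ELTYPE = np.float64  # a Metal-like backend would set this to float32

def default_eltype():
    return DEFAULT_ELTYPE

def block_sparse_zeros(shape, eltype=None):
    # Constructors take the element type from default_eltype() instead of
    # hard-coding Float64, so one switch retargets every call site.
    if eltype is None:
        eltype = default_eltype()
    return np.zeros(shape, dtype=eltype)

t64 = block_sparse_zeros((2, 2))              # uses the default
t32 = block_sparse_zeros((2, 2), np.float32)  # explicit override still works
```

Centralizing the default also makes the tests' temporary Float32 runs possible without the global modification that an earlier commit in this PR removed.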

function zeros(
tensor::BlockSparseTensor{ElT,N}, blockoffsets::BlockOffsets{N}, inds
) where {ElT,N}
9 changes: 3 additions & 6 deletions NDTensors/src/dense/tensoralgebra/outer.jl
@@ -21,12 +21,9 @@ function outer!(
v1 = data(T1)
v2 = data(T2)
RM = reshape(R, length(v1), length(v2))
## Potential fix is call reshape on array
#RM = reshape(array(R), length(v1), length(v2))
#RM .= v1 .* transpose(v2)
#mul!(RM, v1, transpose(v2))
_gemm!('N', 'T', one(ElR), v1, v2, zero(ElR), RM)
#mul!!(RM, v1, transpose(v2), one(ElR), zero(ElR))
## There is no _gemm! defined for CUDA or Metal, so this falls back to
## generic matmul. Replace with mul!! to call the correct mul! (ger)
mul!!(array(RM), v1, transpose(v2), one(ElR), zero(ElR))
return R
end
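The new `outer!` writes the result as a rank-1 update: the destination storage is viewed as a `(length(v1), length(v2))` matrix and filled with `v1 * transpose(v2)`, the operation `mul!!` can dispatch to a proper backend kernel (BLAS `ger` on CPU). A numpy sketch under those assumptions (`outer_into` is an illustrative name):

```python
import numpy as np

def outer_into(R, v1, v2):
    # View flat storage R as a (len(v1), len(v2)) matrix and fill it with
    # the rank-1 product v1 * v2^T, as the mul!! call in outer! does.
    RM = R.reshape(len(v1), len(v2))  # view into R's storage
    RM[...] = np.outer(v1, v2)
    return R

v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, 4.0, 5.0])
R = np.zeros(6)
outer_into(R, v1, v2)
# R is now [3, 4, 5, 6, 8, 10]
```

Writing through a reshaped view keeps the result in the tensor's own flat storage, so no copy back is needed.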
