Describe the bug
Temporary arrays created inside a ranged GPU map get incorrect memlet sizes, so validation fails with "Memlet subset out-of-bounds".
Message:
```
"dace/dace/sdfg/validation.py", line 824, in validate_state
    raise InvalidSDFGEdgeError("Memlet subset out-of-bounds", sdfg, state_id, eid)
dace.sdfg.validation.InvalidSDFGEdgeError: <exception str() failed>
```
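For context, the check that fires here can be sketched roughly as follows. This is a simplified illustration of the idea behind the error, not DaCe's actual validation code: a memlet whose subset range exceeds the shape of the array it accesses is rejected.

```python
# Simplified illustration (NOT DaCe's implementation) of the bounds check
# behind "Memlet subset out-of-bounds".
def subset_in_bounds(subset, shape):
    """subset: per-dimension (begin, end) inclusive index ranges."""
    return all(0 <= begin and end < dim_size
               for (begin, end), dim_size in zip(subset, shape))

# A [32, 32] transient accessed with a 0:32 x 0:32 subset passes...
assert subset_in_bounds([(0, 31), (0, 31)], (32, 32))
# ...but a subset larger than the array, as produced by this bug, fails.
assert not subset_in_bounds([(0, 63), (0, 31)], (32, 32))
```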
To Reproduce
The following code results in the error:
```python
import dace
import torch
from dace.transformation.interstate import GPUTransformSDFG


def _get_strides_squared(size):
    return (1, size), (size, 1), (size, 1)


N = 32
n = 32
a_desc = dace.data.Array(
    dace.float32,
    [N, N],
    storage=dace.StorageType.GPU_Global,
    strides=[N, 1],
)
b_desc = dace.data.Array(
    dace.float32,
    [N, N],
    storage=dace.StorageType.GPU_Global,
    strides=[N, 1],
)
c_desc = dace.data.Array(
    dace.float32,
    [N, N],
    storage=dace.StorageType.GPU_Global,
    strides=[N, 1],
)
op_size = 32
a_storage, b_storage, c_storage = dace.StorageType.GPU_Shared, dace.StorageType.GPU_Shared, dace.StorageType.GPU_Shared
a_strides, b_strides, c_strides = _get_strides_squared(op_size)


@dace.program
def global_matmul(
    A: a_desc @ dace.StorageType.GPU_Global,
    B: b_desc @ dace.StorageType.GPU_Global,
    C: c_desc @ dace.StorageType.GPU_Global,
):
    for i, j in dace.map[0:N:op_size, 0:N:op_size] @ dace.ScheduleType.GPU_Device:
        for l in dace.map[0:64] @ dace.ScheduleType.GPU_ThreadBlock:
            c = dace.ndarray(
                [op_size, op_size],
                dtype=dace.float32,
                storage=c_storage,
                strides=c_strides,
            )
            c.fill(0.0)
            for k in range(1):
                a = dace.ndarray(
                    [32, 32],
                    dtype=dace.float32,
                    storage=a_storage,
                    strides=a_strides,
                )
                b = dace.ndarray(
                    [32, 32],
                    dtype=dace.float32,
                    storage=b_storage,
                    strides=b_strides,
                )
                a[:, :] = A[i:i + 32, k:k + 32]
                b[:, :] = B[k:k + 32, j:j + 32]
                dace.libraries.blas.gemm(a, b, c, alpha=1.0, beta=1.0)
            C[i:i + op_size, j:j + op_size] = c[:, :]


A = torch.rand((n, n), dtype=torch.float32, device="cuda")
B = torch.rand((n, n), dtype=torch.float32, device="cuda")
C = torch.zeros((n, n), dtype=torch.float32, device="cuda")
sdfg = global_matmul.to_sdfg()
sdfg.apply_transformations(
    GPUTransformSDFG, options=dict(sequential_innermaps=False, register_trans=False)
)
sdfg(A=A, B=B, C=C)
```
Expected behavior
The temporary arrays `__tmp*` should all have shape `[32, 32]`.