Description
Problem
I have found an issue with CUDA 11.1: creating an FFT plan, using it, performing another operation (a simple sum reduction), then deleting the plan, re-creating another one and repeating the sequence ends with a `cuFuncSetBlockShape failed: invalid resource handle` error.
The following minimal example reproduces the issue (it must be run in a fresh session for reproducibility):
```python
import numpy as np
import pycuda.gpuarray as cua
import pycuda.autoinit
import skcuda.fft as cu_fft

fft_shape = (128, 128)
plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
a = cua.to_gpu(np.random.uniform(0, 1, fft_shape).astype(np.complex64))
cu_fft.fft(a, a, plan)  # first in-place FFT works
tmp = cua.sum(a)        # first reduction works

del plan                # delete the plan...
plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
cu_fft.fft(a, a, plan)  # ...and repeat with a new one
tmp = cua.sum(a)        # -> LogicError: cuFuncSetBlockShape failed
```
Running the above code in a fresh Python session always fails with the following error:
```
---> 17 tmp = cua.sum(a)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/gpuarray.py in sum(a, dtype, stream, allocator)
   1639     from pycuda.reduction import get_sum_kernel
   1640     krnl = get_sum_kernel(dtype, a.dtype)
-> 1641     return krnl(a, stream=stream, allocator=allocator)
   1642
   1643

~/dev/py38-env/lib/python3.8/site-packages/pycuda/reduction.py in __call__(self, *args, **kwargs)
    283
    284         # print block_count, seq_count, self.block_size, sz
--> 285         f((block_count, 1), (self.block_size, 1, 1), stream,
    286             *([result.gpudata]+invocation_args+[seq_count, sz]),
    287             **kwargs)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/driver.py in function_prepared_async_call(func, grid, block, stream, *args, **kwargs)
    547 def function_prepared_async_call(func, grid, block, stream, *args, **kwargs):
    548     if isinstance(block, tuple):
--> 549         func._set_block_shape(*block)
    550     else:
    551         from warnings import warn

LogicError: cuFuncSetBlockShape failed: invalid resource handle
```
The error occurs during the PyCUDA sum reduction, but it seems to be triggered by deleting the plan and creating a new one, so cuFFT may be involved.
I noted that the CUDA 11.1 release notes state: "After successfully creating a plan, cuFFT now enforces a lock on the cufftHandle. Subsequent calls to any planning function with the same cufftHandle will fail", but I have no idea whether that is related.
Environment
- OS platform: Linux (tested on power64/debian10, and also on fresh x86_64 cloud machines (from vast.ai) based on https://hub.docker.com/r/nvidia/cuda/ , e.g. the nvidia/cuda:11.1-devel and nvidia/cuda:11.0-devel images)
- Python version: 3.8 (probably not version-dependent)
- CUDA version: 11.0 (with driver 455.45.01), 11.1 (with driver 450.80.02, 455.23.05 or 455.38)
- PyCUDA version: pycuda.VERSION = (2020, 1)
- scikit-cuda version: latest git 806ee27 (the pip-installed 0.5.3 also has the issue)