[Bug report test 3] OSError: (External) CUDNN error(7), CUDNN_STATUS_MAPPING_ERROR. #6

@Ligoml

Description

Bug reproduction environment

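Title: CUDNN error reproduces consistently under specific CUDA versions

Version and environment information:
1) PaddlePaddle version: 2.2.1
2) CPU: ---
3) GPU: V100 16G/32G
4) System environment: Ubuntu 16.04, Python 3.7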

Bug reproduction steps and minimal code set

Every operator in the code directly calls the convolution blocks that Paddle provides; the main modules are Conv3D, BN3D, and Conv3DTranspose.
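No minimal code was attached, so here is a rough sketch of what a reproduction script built from those three module types might look like. The layer sizes, input shape, and loss are all assumptions and are not taken from `train_mgpu.py`; `BN3D` is assumed to mean `paddle.nn.BatchNorm3D`. Whether this small network actually triggers the error on CUDA 10.1/10.2 has not been verified.

```python
# Hypothetical stand-in for the reporter's network: layer sizes, input shape,
# and loss are assumptions, not taken from train_mgpu.py.
try:
    import paddle
    import paddle.nn as nn
except ImportError:  # lets the script exit cleanly where Paddle is absent
    paddle = None


def run_one_step(batch=2, channels=1, size=16):
    """Run one forward/backward pass through a tiny 3D encoder-decoder.

    Returns the scalar loss as a float, or None when Paddle is unavailable.
    """
    if paddle is None:
        return None

    net = nn.Sequential(
        nn.Conv3D(channels, 8, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm3D(8),
        nn.ReLU(),
        nn.Conv3DTranspose(8, channels, kernel_size=2, stride=2),
    )
    # Input layout is NCDHW, the default for Paddle's 3D layers.
    x = paddle.randn([batch, channels, size, size, size])
    loss = net(x).mean()
    loss.backward()  # the reported traceback ends in this call
    return float(loss)


if __name__ == "__main__":
    print("loss:", run_one_step())
```

Running the same script inside a CUDA 10.2 image and a CUDA 11.x image would show whether the failure depends only on these operators or on something specific to the full model.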

Desired result

No error; training runs normally.

Actual result

With CUDA 10.1/10.2, the error below is raised every time. With CUDA 11.0 or later, training runs to completion normally.
  File "train_mgpu.py", line 284, in train
    loss.backward()
  File "", line 2, in backward
  File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
    return wrapped_func(*args, **kwargs)
  File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 229, in impl
    return func(*args, **kwargs)
  File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 249, in backward
    framework._dygraph_tracer())
OSError: (External) CUDNN error(7), CUDNN_STATUS_MAPPING_ERROR.
  [Hint: 'CUDNN_STATUS_MAPPING_ERROR'. An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct, prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library. ] (at /paddle/paddle/fluid/operators/conv_cudnn_op.cu:758)
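Since the failure tracks the CUDA version, a training script could classify the CUDA version up front and warn before starting. The sketch below only encodes the versions named in this report (10.1/10.2 failing, 11.0 and later working); the function names and status labels are hypothetical, and any other version is treated as not covered.

```python
def major_minor(version_text):
    """Reduce a dotted version string such as '10.2.89' to (major, minor)."""
    parts = [int(p) for p in version_text.strip().split(".")[:2]]
    while len(parts) < 2:  # treat a bare '11' as 11.0
        parts.append(0)
    return tuple(parts)


def cuda_status(version_text):
    """Classify a CUDA version against the behaviour described in this issue."""
    version = major_minor(version_text)
    if version in {(10, 1), (10, 2)}:
        return "reported-failing"
    if version >= (11, 0):
        return "reported-working"
    return "not-covered"  # the report says nothing about other versions


if __name__ == "__main__":
    for candidate in ("10.1", "10.2", "11.0", "11.2"):
        print(candidate, cuda_status(candidate))
```

The version string has to come from somewhere. On a GPU build, `paddle.version.cuda()` is assumed here to return it in dotted form, which is worth confirming against the installed Paddle release.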

Additional information

No response
