Description
Describe the bug
When using e.g. warp_perspective or remap with the REPLICATE border mode, certain transformations can result in an illegal CUDA memory access.
Steps/Code to reproduce bug
import cupy as cp
import numpy as np
import cvcuda
img = cp.ones((1208, 1928, 3), dtype=cp.uint8)
M = np.array([
[ 8.08776838e-02, 2.36326631e+00, -4.08795000e+02],
[-1.28514739e-02, 2.55201343e-01, -8.45896673e+01],
[-2.68404432e-04, -6.57235630e-04, 1.00000000e+00]
])
cvcuda.warp_perspective(cvcuda.as_tensor(img, "HWC"), M, flags=cvcuda.Interp.LINEAR, border_mode=cvcuda.Border.REPLICATE, border_value=np.array([0]))
I was able to reproduce the illegal memory access on RTX 3090 and RTX 4090 cards, on both Intel and AMD cpus.
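To see why this particular matrix is problematic, the source coordinates it produces can be estimated on the CPU. The following is a minimal NumPy sketch, not CV-CUDA code. It assumes the OpenCV-style convention that the matrix is inverted when the inverse-map flag is not set, that the output has the same size as the input, and it works in float64 rather than the kernel's own precision. The names M_inv, src_x, src_y and out_of_range are illustrative only.

```python
import numpy as np

# Same matrix and image size as in the repro above.
M = np.array([
    [ 8.08776838e-02,  2.36326631e+00, -4.08795000e+02],
    [-1.28514739e-02,  2.55201343e-01, -8.45896673e+01],
    [-2.68404432e-04, -6.57235630e-04,  1.00000000e+00],
])
height, width = 1208, 1928

# Assumption: destination pixels are mapped back to the source with the inverse matrix.
M_inv = np.linalg.inv(M)

xs = np.arange(width, dtype=np.float64)             # shape (W,)
ys = np.arange(height, dtype=np.float64)[:, None]   # shape (H, 1)

# Homogeneous source coordinates for every destination pixel (broadcast to (H, W)).
w = M_inv[2, 0] * xs + M_inv[2, 1] * ys + M_inv[2, 2]
with np.errstate(divide="ignore", invalid="ignore"):
    src_x = (M_inv[0, 0] * xs + M_inv[0, 1] * ys + M_inv[0, 2]) / w
    src_y = (M_inv[1, 0] * xs + M_inv[1, 1] * ys + M_inv[1, 2]) / w

# Flag pixels whose source coordinate does not fit into a signed 32-bit integer.
int32_max = np.iinfo(np.int32).max
out_of_range = (np.abs(src_x) > int32_max) | (np.abs(src_y) > int32_max)

print("largest finite |src_x|:", np.nanmax(np.abs(src_x[np.isfinite(src_x)])))
print("pixels outside int32 range:", int(out_of_range.sum()))
```

Pixels close to the line where w crosses zero get very large coordinates, so whether any of them actually exceed the int32 range depends on how closely the pixel grid approaches that line.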
When compiling cvcuda with -DCMAKE_CUDA_FLAGS_RELEASE=-O0, the following assert triggers instead of an illegal memory access:
Expected behavior
It would be great if these transformations didn't trigger an illegal memory access.
Environment overview
- Environment location: Bare-metal
- Method of CV-CUDA install: pip, from source
Environment details
**git*** commit 56a4d2a9285d650934a72bf327d901491127042e (HEAD -> main, origin/main, origin/HEAD) Author: dlesage-nvidia <[email protected]> Date: Mon Mar 3 12:48:00 2025 -0800 docs: update doc to highlight Pypi wheels (#235) **git submodules*** ca4d00ad3e2e0f410eeab3264d21b8a39397f362 3rdparty/dlpack (v0.8-1-gca4d00a) 5ab508a01f9eb089207ee87fd547d290da39d015 3rdparty/googletest (release-1.8.0-3127-g5ab508a0) 75212298727e8f6e1df9215f2fcb47c8c721ffc9 3rdparty/nvbench (old-cmake-164-g7521229) 941f45bcb51457884fa1afd6e24a67377d70f75c 3rdparty/pybind11 (v2.11.0-134-g941f45bc) ***OS Information*** DISTRIB_ID=Ubuntu DISTRIB_RELEASE=24.04 DISTRIB_CODENAME=noble DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS" PRETTY_NAME="Ubuntu 24.04.2 LTS" NAME="Ubuntu" VERSION_ID="24.04" VERSION="24.04.2 LTS (Noble Numbat)" VERSION_CODENAME=noble ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=noble LOGO=ubuntu-logo Linux workstation-mitchell 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux ***GPU Information*** Tue May 6 16:01:25 2025 +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 | |-----------------------------------------+------------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+========================+======================| | 0 NVIDIA GeForce RTX 3090 On | 00000000:61:00.0 On | N/A | | 30% 30C P8 37W / 350W | 3248MiB / 24576MiB | 18% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | 0 N/A N/A 5069 G /usr/lib/xorg/Xorg 338MiB | | 0 N/A N/A 9467 G /usr/bin/gnome-shell 64MiB | | 0 N/A N/A 11159 G /usr/libexec/xdg-desktop-portal-gnome 47MiB | | 0 N/A N/A 186573 G ...onEnabled --variations-seed-version 72MiB | | 0 N/A N/A 929421 C ...s/python/triton_python_backend_stub 668MiB | | 0 N/A N/A 938266 G ...erProcess --variations-seed-version 57MiB | | 0 N/A N/A 4020085 C tritonserver 1888MiB | +-----------------------------------------------------------------------------------------+ ***CPU*** Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 24 On-line CPU(s) list: 0-23 Vendor ID: AuthenticAMD Model name: AMD Ryzen Threadripper PRO 5945WX 12-Cores CPU family: 25 Model: 8 Thread(s) per core: 2 Core(s) per socket: 12 Socket(s): 1 Stepping: 2 Frequency boost: enabled CPU(s) scaling MHz: 42% CPU max MHz: 7014.8428 CPU min MHz: 1800.0000 BogoMIPS: 8184.86 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core 
perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap Virtualization: AMD-V L1d cache: 384 KiB (12 instances) L1i cache: 384 KiB (12 instances) L2 cache: 6 MiB (12 instances) L3 cache: 64 MiB (2 instances) NUMA node(s): 1 NUMA node0 CPU(s): 0-23 Vulnerability Gather data sampling: Not affected Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Mmio stale data: Not affected Vulnerability Reg file data sampling: Not affected Vulnerability Retbleed: Not affected Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected ***CMake*** /usr/bin/cmake cmake version 3.28.3 CMake suite maintained and supported by Kitware (kitware.com/cmake). ***g++*** /usr/bin/g++ g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
***nvcc*** /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2024 NVIDIA Corporation Built on Tue_Oct_29_23:50:19_PDT_2024 Cuda compilation tools, release 12.6, V12.6.85 Build cuda_12.6.r12.6/compiler.35059454_0 ***Python*** /home/batman/.pyenv/shims/python Python 3.11.4 ***Environment Variables*** PATH : /home/batman/.pyenv/shims:/home/batman/.pyenv/bin:/home/batman/.local/bin:/home/batman/.nvm/versions/node/v22.14.0/bin:/home/batman/.pyenv/bin:/home/batman/.local/bin:/home/batman/.cargo/bin:/home/batman/.pyenv/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/usr/local/go/bin:/home/batman/.go/bin/:/usr/local/go/bin:/home/batman/.go/bin/ LD_LIBRARY_PATH : /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/home/batman/xx/external/lib: NUMBAPRO_NVVM : NUMBAPRO_LIBDEVICE : CONDA_PREFIX : PYTHON_PATH : conda not found ***pip packages*** /home/batman/.pyenv/shims/pip Package Version Editable project location ------------------------- ----------------- ------------------------- aenum 3.1.15 aiohappyeyeballs 2.4.0 aiohttp 3.10.5 aioice 0.9.0 aiortc 1.9.0 aiosignal 1.3.1 albumentations 1.3.1 ale-py 0.8.1 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 ask 0.1.0 /home/batman/ask asttokens 2.4.1 async-lru 2.0.4 atomicwrites 1.4.1 attrs 24.2.0 auditwheel 6.3.0 av 12.3.0 azure-core 1.31.0 azure-identity 1.17.1 azure-storage-blob 12.23.0 babel 2.16.0 beautifulsoup4 4.12.3 bidict 0.23.1 bleach 6.1.0 blinker 1.8.2 blosc 1.11.2 bodyjim 1.0.3 box2d-py 2.3.5 Brotli 1.1.0 casadi 3.6.6 certifi 2024.8.30 cffi 1.17.1 cfgv 3.4.0 charset-normalizer 3.3.2 click 8.1.7 click-plugins 1.1.1 cligj 0.7.2 cloudpickle 3.0.0 codespell 2.3.0 coloredlogs 15.0.1 comm 0.2.2 ConfigArgParse 1.7 contourpy 1.3.0 coverage 7.6.1 crcmod 1.7 cryptography 43.0.1 cupy-cuda12x 13.3.0 cvcuda-cu12 0.14.0 cycler 0.12.1 
Cython 3.0.11 datadog 0.50.0 dbus-next 0.2.3 debugpy 1.8.5 decorator 5.1.1 defusedxml 0.7.1 Deprecated 1.2.14 dictdiffer 0.9.0 diffusers 0.30.3 distlib 0.3.8 dm-tree 0.1.8 dnspython 2.6.1 dotmap 1.3.30 einops 0.8.0 elastic-transport 8.15.0 elasticsearch 8.15.1 EWMHlib 0.2 execnet 2.1.1 executing 2.1.0 Farama-Notifications 0.0.4 fastjsonschema 2.20.0 fastrlock 0.8.2 filelock 3.16.1 fiona 1.10.1 Flask 3.0.3 Flask-Cors 5.0.0 Flask-SocketIO 5.3.7 flatbuffers 24.3.25 fonttools 4.53.1 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2024.9.0 future 1.0.0 future-fstrings 1.2.0 GeoAlchemy2 0.15.2 geopandas 0.14.4 gevent 24.2.1 geventhttpclient 2.0.2 ghp-import 2.1.0 google-crc32c 1.6.0 greenlet 3.1.0 gymnasium 0.29.1 h11 0.14.0 hatanaka 2.8.1 httpcore 1.0.5 httpx 0.27.2 huggingface-hub 0.25.0 humanfriendly 10.0 hypothesis 6.47.5 identify 2.6.1 idna 3.10 ifaddr 0.2.0 imageio 2.35.1 importlib_metadata 8.5.0 importlib_resources 6.4.5 influxdb-client 1.46.0 iniconfig 2.0.0 inputs 0.5 ipykernel 6.29.5 ipython 8.27.0 ipywidgets 8.1.5 isodate 0.6.1 isoduration 20.11.0 itsdangerous 2.2.0 jedi 0.19.1 Jinja2 3.1.4 joblib 1.4.2 json-logging-py 0.2 json-rpc 1.15.0 json5 0.9.25 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 jupyter 1.1.1 jupyter_client 8.6.3 jupyter-console 6.6.3 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.2 jupyter_server_terminals 0.5.3 jupyterlab 4.2.5 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 jupyterlab-vim 4.1.4 jupyterlab_widgets 3.0.13 kaleido 0.2.1 kiwisolver 1.4.7 laika 0.0.1 /home/batman/xx/laika lazy_loader 0.4 libusb1 3.1.0 llvmlite 0.44.0 lru-dict 1.3.0 lxml 5.3.0 Mako 1.3.5 Markdown 3.7 MarkupSafe 2.1.5 matplotlib 3.9.2 matplotlib-inline 0.1.7 mergedeep 1.3.4 metadrive-simulator 0.4.2.3 mistune 3.0.2 mkdocs 1.6.1 mkdocs-get-deps 0.2.0 MouseInfo 0.1.3 mpmath 1.3.0 msal 1.31.0 msal-extensions 1.2.0 msgpack 1.1.0 msgpack-numpy 0.4.8 msgpack-python 0.5.6 multidict 6.1.0 mypy 1.11.2 mypy-extensions 1.0.0 
natsort 8.4.0 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 ncompress 1.0.2 nest-asyncio 1.6.0 netron 7.9.9 networkx 2.8.8 nodeenv 1.9.1 notebook 7.2.2 notebook_shim 0.2.4 numpy 2.1.3 nvidia-cublas-cu12 12.6.4.1 nvidia-cuda-cupti-cu12 12.6.80 nvidia-cuda-nvrtc-cu12 12.6.77 nvidia-cuda-runtime-cu12 12.6.77 nvidia-cudnn-cu12 9.5.1.17 nvidia-cufft-cu12 11.3.0.4 nvidia-cufile-cu12 1.11.1.6 nvidia-curand-cu12 10.3.7.77 nvidia-cusolver-cu12 11.7.1.2 nvidia-cusparse-cu12 12.5.4.2 nvidia-cusparselt-cu12 0.6.3 nvidia-ml-py 12.560.30 nvidia-nccl-cu12 2.26.2 nvidia-nvjitlink-cu12 12.6.85 nvidia-nvtx-cu12 12.6.77 omegaconf 2.3.0 onnx 1.16.2 onnxoptimizer 0.3.13 onnxruntime-gpu 1.19.2 opencv-python-headless 4.10.0.84 openpilot 0.1.0 /home/batman/xx/openpilot osmnx 1.2.2 overrides 7.7.0 packaging 24.0 Panda3D 1.10.13 panda3d-gltf 0.13 panda3d-simplepbr 0.12.0 pandas 2.2.2 pandas-stubs 2.0.3.230814 pandocfilters 1.5.1 parameterized 0.8.1 parso 0.8.4 pathspec 0.12.1 pexpect 4.9.0 pillow 10.4.0 pillow-avif-plugin 1.4.6 pip 25.1.1 pipenv 2024.0.2 platformdirs 4.3.6 plotly 5.24.1 pluggy 1.5.0 polyline 2.0.2 portalocker 2.10.1 pre-commit 3.8.0 pre-commit-hooks 4.6.0 progressbar 2.5 prometheus_client 0.20.0 prompt_toolkit 3.0.47 protobuf 5.27.0 psutil 6.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 PyAudio 0.2.14 PyAutoGUI 0.9.54 pycapnp 2.0.0 pycparser 2.22 pycryptodome 3.20.0 pycuda 2025.1 pycurl 7.45.3 pydantic 2.9.2 pydantic_core 2.23.4 pyee 12.0.0 pyelftools 0.32 pygame 2.6.0 PyGetWindow 0.0.9 PyGithub 2.4.0 Pygments 2.18.0 PyJWT 2.9.0 pylibsrtp 0.10.0 PyMonCtl 0.92 PyMsgBox 1.0.9 PyMySQL 1.1.1 PyNaCl 1.5.0 PyNvCodec 2.0 PyNvVideoCodec 1.0.2 pyopencl 2024.2.7 pyOpenSSL 24.2.1 pyparsing 3.1.4 pyperclip 1.9.0 pyprof2calltree 1.4.5 pyproj 3.6.1 PyRect 0.2.0 PyScreeze 1.0.1 pyserial 3.5 pytest 8.3.3 pytest-asyncio 0.24.0 pytest-cov 5.0.0 pytest-cpp 2.6.0 pytest-mock 3.14.0 pytest-randomly 3.15.0 pytest-repeat 0.9.3 pytest-subtests 0.13.1 pytest-timeout 2.3.1 pytest-xdist 3.6.1 
python-dateutil 2.9.0.post0 python-engineio 4.9.1 python-json-logger 2.0.7 python-logstash 0.4.8 python-rapidjson 1.20 python-socketio 5.11.4 python-xlib 0.33 python3-xlib 0.15 pytools 2024.1.10 pytweening 1.2.0 pytz 2024.2 PyWinBox 0.7 PyWinCtl 0.4 PyYAML 6.0.2 pyyaml_env_tag 0.1 pyzmq 26.2.0 qudida 0.0.4 raylib 5.5.0.2 reactivex 4.0.4 redis 5.0.8 referencing 0.35.1 regex 2024.9.11 requests 2.32.3 reverse_geocoder 1.5.1 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rpds-py 0.20.0 Rtree 1.3.0 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 ruff 0.6.5 s2sphere 0.2.5 safetensors 0.4.5 scikit-image 0.24.0 scikit-learn 1.5.2 scipy 1.14.1 SCons 4.8.1 seaborn 0.13.2 Send2Trash 1.8.3 sentry-sdk 2.14.0 setproctitle 1.3.3 setuptools 75.1.0 Shapely 1.8.5.post1 Shimmy 0.2.1 simple-websocket 1.0.0 simplejson 3.19.3 siphash24 1.6 six 1.16.0 smbus2 0.4.3 sniffio 1.3.1 sortedcontainers 2.4.0 sounddevice 0.5.0 soupsieve 2.6 spidev 3.6 SQLAlchemy 2.0.35 sqlalchemy-stubs 0.4 stack-data 0.6.3 statsd 4.0.1 swig 4.2.1 sympy 1.13.3 tabulate 0.9.0 teleoprtc 1.0.3 tenacity 9.0.0 terminado 0.18.1 threadpoolctl 3.5.0 tifffile 2024.8.30 timm 1.0.9 tinycss2 1.3.0 tomli 2.0.1 torch 2.7.0+cu126 torchvision 0.22.0+cu126 tornado 6.4.1 tqdm 4.66.5 traitlets 5.14.3 triton 3.3.0 tritonclient 2.33.0 types-aiofiles 24.1.0.20240626 types-beautifulsoup4 4.12.0.20250204 types-cffi 1.16.0.20240331 types-docutils 0.21.0.20240907 types-html5lib 1.1.11.20241018 types-influxdb-client 1.45.0.20240915 types-Markdown 3.7.0.20240822 types-Pillow 10.2.0.20240822 types-psutil 6.0.0.20240901 types-Pygments 2.18.0.20240506 types-pyOpenSSL 24.1.0.20240722 types-python-dateutil 2.9.0.20240906 types-pytz 2024.2.0.20240913 types-PyYAML 6.0.12.20240917 types-redis 4.6.0.20240903 types-requests 2.32.0.20240914 types-setuptools 75.1.0.20240917 types-simplejson 3.19.0.20240801 types-tabulate 0.9.0.20240106 types-tqdm 4.66.0.20240417 typing_extensions 4.12.2 tzdata 2024.1 uri-template 1.3.0 urllib3 2.2.3 virtualenv 20.26.5 
watchdog 5.0.2 wcwidth 0.2.13 webcolors 24.8.0 webencodings 0.5.1 websocket-client 1.8.0 Werkzeug 3.0.4 wheel 0.45.1 widgetsnbextension 4.0.13 wrapt 1.16.0 wsproto 1.2.0 xx 0.1.0 /home/batman/xx yapf 0.40.2 yarl 1.11.1 zerorpc 0.6.3 zipp 3.20.2 zope.event 5.0 zope.interface 7.0.3 zstandard 0.23.0
Additional context
After spending a bit of time debugging, it appears to me that the crash is caused by an integer overflow in InterpolationWrap.hpp. When the given transformation is quite extreme, as in the repro above, some of the computed source coordinates may fall outside the range of a signed 32-bit int. When those coordinates are cast to int32_t, they appear to be clamped to INT32_MIN/INT32_MAX, e.g. at
Calling InterpolationWrap::operator[] with INT32_MAX can result in an integer overflow when it attempts to compute the coordinates of the neighboring pixels, e.g. at
I believe the reason this triggers an illegal memory access when using the REPLICATE border mode is that the c < 0 check in GetIndexWithBorder is skipped when called with x1 + 1, since the function is inlined and has already been called with x1 and therefore the compiler treats the check as unreachable code for x1 + 1:
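The chain of events described above can be emulated on the CPU. This is a hedged NumPy sketch and not the CV-CUDA source: saturating_cast, replicate_index and the check_negative switch are hypothetical stand-ins for the cast, for GetIndexWithBorder, and for the compiler dropping the c < 0 branch. NumPy wraps fixed-width integers in a defined way, whereas signed overflow in C++ is undefined behavior, which is what would permit the compiler to assume x1 + 1 cannot be negative.

```python
import numpy as np

INT32 = np.iinfo(np.int32)

def saturating_cast(coord):
    """Emulate a float-to-int32 conversion that clamps instead of wrapping."""
    clipped = np.clip(np.asarray(coord, dtype=np.float64), INT32.min, INT32.max)
    return clipped.astype(np.int32)

def replicate_index(c, size, check_negative=True):
    """Hypothetical REPLICATE border lookup along one axis.

    check_negative=False models the c < 0 branch being optimized away.
    """
    if check_negative and c < 0:
        return 0
    if c >= size:
        return size - 1
    return int(c)

width = 1928
x1 = saturating_cast([1e12])   # far outside the int32 range, clamps to INT32_MAX
x2 = x1 + np.int32(1)          # neighbor coordinate wraps around to INT32_MIN

print(x1[0], x2[0])
print(replicate_index(x1[0], width))                        # clamped to width - 1
print(replicate_index(x2[0], width))                        # clamped to 0 when the check runs
print(replicate_index(x2[0], width, check_negative=False))  # negative index escapes
```

With the check in place, both neighbors map to valid columns. Without it, the wrapped coordinate passes through unchanged and would index far outside the image.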