Conda package build returning internal cuFile error. #508

Open
fstrug opened this issue Oct 23, 2024 · 7 comments

fstrug commented Oct 23, 2024

Hi, I think I'm running into a similar error. I installed kvikio with Conda and can see that libcufile is installed in my environment. When I try to run the benchmark:

KVIKIO_COMPAT_MODE=off python python/kvikio/kvikio/benchmarks/single_node_io.py 

Roundtrip benchmark
----------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   WARNING - cuFile compat mode   
         GDS not enabled          
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU               | Unknown (install nvidia-ml-py)
GPU Memory Total  | Unknown (install nvidia-ml-py)
BAR1 Memory Total | Unknown (install nvidia-ml-py)
GDS driver        | N/A (Compatibility Mode)
GDS config.json   | /etc/cufile.json
----------------------------------
nbytes            | 10485760 bytes (10.00 MiB)
4K aligned        | True
pre-reg-buf       | False
directory         | /tmp/tmp4__41g_o
nthreads          | 1
nruns             | 1
==================================
Traceback (most recent call last):
  File "/home/fstrug/kvikio/python/kvikio/kvikio/benchmarks/single_node_io.py", line 375, in <module>
    main(args)
  File "/home/fstrug/kvikio/python/kvikio/kvikio/benchmarks/single_node_io.py", line 283, in main
    read, write = API[api](args)
                  ^^^^^^^^^^^^^^
  File "/home/fstrug/kvikio/python/kvikio/kvikio/benchmarks/single_node_io.py", line 45, in run_cufile
    f = kvikio.CuFile(file_path, flags="w")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fstrug/.conda/envs/img_cuda12.2-kvikio/lib/python3.12/site-packages/kvikio/cufile.py", line 88, in __init__
    self._handle = file_handle.CuFile(file, flags)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "file_handle.pyx", line 97, in kvikio._lib.file_handle.CuFile.__init__
RuntimeError: cuFile error at: /home/fstrug/.conda/envs/img_cuda12.2-kvikio/include/kvikio/file_handle.hpp:196: internal error

Originally posted by @fstrug in #378
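For what it's worth, the two ingredients in this failure can be sanity-checked without running the whole benchmark: whether the dynamic loader can find libcufile at all, and whether compat mode was actually requested off. The sketch below is illustrative; `compat_mode_requested` is a hypothetical helper that mirrors common on/off flag parsing, not kvikio's actual parser.

```python
import ctypes.util
import os


def libcufile_visible():
    """Return the resolved name of libcufile if the dynamic loader can find it, else None."""
    return ctypes.util.find_library("cufile")


def compat_mode_requested(env=os.environ):
    """Interpret KVIKIO_COMPAT_MODE the way an on/off flag is commonly parsed.

    Hypothetical helper -- kvikio's own parsing may accept a different set of values.
    """
    val = env.get("KVIKIO_COMPAT_MODE", "").strip().lower()
    return val in ("on", "true", "yes", "1")


if __name__ == "__main__":
    print("libcufile:", libcufile_visible() or "not found by the loader")
    print("compat mode requested:", compat_mode_requested())
```

If `libcufile` resolves but `CuFile.__init__` still raises an internal error, the problem is more likely in the driver stack or `/etc/cufile.json` than in the conda environment itself.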


fstrug commented Oct 23, 2024

Output from conda list and conda info.

conda info

     active environment : img_cuda12.2-kvikio
    active env location : /home/fstrug/.conda/envs/img_cuda12.2-kvikio
            shell level : 2
       user config file : /home/fstrug/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 24.7.1
    conda-build version : not installed
         python version : 3.10.15.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen3
                          __conda=24.7.1=0
                          __cuda=12.2=0
                          __glibc=2.34=0
                          __linux=6.3.12=0
                          __unix=0=0
       base environment : /opt/conda  (read only)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/conda/pkgs
                          /home/fstrug/.conda/pkgs
       envs directories : /home/fstrug/.conda/envs
                          /opt/conda/envs
               platform : linux-64
             user-agent : conda/24.7.1 requests/2.32.3 CPython/3.10.15 Linux/6.3.12-200.fc38.x86_64 almalinux/9.4 glibc/2.34 solver/libmamba conda-libmamba-solver/24.7.0 libmambapy/1.5.9
                UID:GID : 57561:5063
             netrc file : None
           offline mode : False

conda list

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
annotated-types           0.7.0              pyhd8ed1ab_0    conda-forge
asciitree                 0.3.3                      py_2    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
awkward                   2.6.6                    pypi_0    pypi
awkward-cpp               35                       pypi_0    pypi
aws-c-auth                0.7.22               h96bc93b_2    conda-forge
aws-c-cal                 0.6.14               h88a6e22_1    conda-forge
aws-c-common              0.9.19               h4ab18f5_0    conda-forge
aws-c-compression         0.2.18               h83b837d_6    conda-forge
aws-c-event-stream        0.4.2               ha47c788_12    conda-forge
aws-c-http                0.8.1               h29d6fba_17    conda-forge
aws-c-io                  0.14.8               h21d4f22_5    conda-forge
aws-c-mqtt                0.10.4               h759edc4_4    conda-forge
aws-c-s3                  0.5.9                h594631b_3    conda-forge
aws-c-sdkutils            0.1.16               h83b837d_2    conda-forge
aws-checksums             0.1.18               h83b837d_6    conda-forge
aws-crt-cpp               0.26.9               he3a8b3b_0    conda-forge
aws-sdk-cpp               1.11.329             hba8bd5f_3    conda-forge
binutils_impl_linux-64    2.40                 ha1999f0_2    conda-forge
bokeh                     3.4.1              pyhd8ed1ab_0    conda-forge
boost-histogram           1.4.1           py311h9547e67_0    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
cachetools                5.3.3              pyhd8ed1ab_0    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
click-default-group       1.2.4              pyhd8ed1ab_0    conda-forge
cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
coffea                    2024.3.0           pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.1           py311h9547e67_0    conda-forge
correctionlib             2.5.0           py311h9e0f504_1    conda-forge
cramjam                   2.8.3           py311h46250e7_0    conda-forge
cuda-cccl_linux-64        12.2.140             ha770c72_0    conda-forge
cuda-crt-dev_linux-64     12.2.140             ha770c72_1    conda-forge
cuda-crt-tools            12.2.140             ha770c72_1    conda-forge
cuda-cudart               12.2.140             hd3aeb46_0    conda-forge
cuda-cudart-dev           12.2.140             hd3aeb46_0    conda-forge
cuda-cudart-dev_linux-64  12.2.140             h59595ed_0    conda-forge
cuda-cudart-static        12.2.140             hd3aeb46_0    conda-forge
cuda-cudart-static_linux-64 12.2.140             h59595ed_0    conda-forge
cuda-cudart_linux-64      12.2.140             h59595ed_0    conda-forge
cuda-libraries            12.5.0                        0    nvidia
cuda-nsight-compute       12.2.2                        0    nvidia/label/cuda-12.2.2
cuda-nvcc                 12.4.131                      0    nvidia
cuda-nvcc-dev_linux-64    12.2.140             ha770c72_1    conda-forge
cuda-nvcc-impl            12.2.140             hd3aeb46_1    conda-forge
cuda-nvcc-tools           12.2.140             hd3aeb46_1    conda-forge
cuda-nvprof               12.4.127                      0    nvidia
cuda-nvrtc                12.2.140             hd3aeb46_0    conda-forge
cuda-nvvm-dev_linux-64    12.2.140             ha770c72_1    conda-forge
cuda-nvvm-impl            12.2.140             h59595ed_1    conda-forge
cuda-nvvm-tools           12.2.140             h59595ed_1    conda-forge
cuda-opencl               12.4.127                      0    nvidia
cuda-python               12.5.0          py311h817de4b_0    conda-forge
cuda-version              12.2                 he2b69de_3    conda-forge
cudf                      24.06.00        cuda12_py311_240605_g7c706cc400_0    rapidsai
cupy                      13.1.0          py311hf829483_4    conda-forge
cupy-core                 13.1.0          py311he1e6e68_4    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
cytoolz                   0.12.3          py311h459d7ec_0    conda-forge
dask                      2024.5.2           pyhd8ed1ab_0    conda-forge
dask-awkward              2024.3.0           pyhd8ed1ab_0    conda-forge
dask-core                 2024.5.2           pyhd8ed1ab_0    conda-forge
dask-expr                 1.1.2              pyhd8ed1ab_0    conda-forge
dask-histogram            2024.3.0           pyhd8ed1ab_0    conda-forge
debugpy                   1.8.1           py311hb755f60_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
distributed               2024.5.2           pyhd8ed1ab_0    conda-forge
dlpack                    0.8                  h59595ed_3    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fasteners                 0.17.3             pyhd8ed1ab_0    conda-forge
fastparquet               2024.5.0        py311h18e1886_0    conda-forge
fastrlock                 0.8.2           py311hb755f60_2    conda-forge
fmt                       10.2.1               h00ab1b0_0    conda-forge
fonttools                 4.53.0          py311h331c9d8_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2024.6.0           pyhff2d567_0    conda-forge
gcc                       12.4.0               h236703b_1    conda-forge
gcc_impl_linux-64         12.4.0               hb2e57f8_1    conda-forge
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.7.0                hed5481d_0    conda-forge
hepconvert                1.3.4              pyhd8ed1ab_0    conda-forge
hist                      2.7.3                ha770c72_0    conda-forge
hist-base                 2.7.3              pyhd8ed1ab_0    conda-forge
histoprint                2.4.0              pyhd8ed1ab_0    conda-forge
iminuit                   2.25.2          py311hb755f60_0    conda-forge
importlib-metadata        7.1.0              pyha770c72_0    conda-forge
importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
ipykernel                 6.29.3             pyhd33586a_0    conda-forge
ipython                   8.25.0             pyh707e725_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
jit                       0.2.7                    pypi_0    pypi
jupyter_client            8.6.2              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2           py311h38be061_0    conda-forge
kernel-headers_linux-64   3.10.0              he073ed8_17    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py311h9547e67_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
kvikio                    24.06.00        cuda12_py311_240605_gd3f15ec_0    rapidsai
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240116.2      cxx17_h59595ed_0    conda-forge
libarrow                  16.1.0           hcb6531f_6_cpu    conda-forge
libarrow-acero            16.1.0           hac33072_6_cpu    conda-forge
libarrow-dataset          16.1.0           hac33072_6_cpu    conda-forge
libarrow-substrait        16.1.0           h7e0c224_6_cpu    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libblas                   3.9.0           22_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcap                    2.69                 h0f662aa_0    conda-forge
libcblas                  3.9.0           22_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcublas                 12.2.5.6             hd3aeb46_0    conda-forge
libcudf                   24.06.00        cuda12_240605_g7c706cc400_0    rapidsai
libcufft                  11.0.8.103           hd3aeb46_0    conda-forge
libcufile                 1.7.2.10             hd3aeb46_0    conda-forge
libcufile-dev             1.7.2.10             hd3aeb46_0    conda-forge
libcurand                 10.3.3.141           hd3aeb46_0    conda-forge
libcurl                   8.8.0                hca28451_0    conda-forge
libcusolver               11.5.2.141           hd3aeb46_0    conda-forge
libcusparse               12.1.2.141           hd3aeb46_0    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libdrm                    2.4.120              hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.1.0               h77fa898_1    conda-forge
libgcc-devel_linux-64     12.4.0             ha4f9413_101    conda-forge
libgcc-ng                 14.1.0               h69a702a_1    conda-forge
libgcrypt                 1.10.3               hd590300_0    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libgomp                   14.1.0               h77fa898_1    conda-forge
libgoogle-cloud           2.24.0               h2736e30_0    conda-forge
libgoogle-cloud-storage   2.24.0               h3d9a0c8_0    conda-forge
libgpg-error              1.49                 h4f305b6_0    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
libkvikio                 24.06.00        cuda12_240605_gd3f15ec_0    rapidsai
liblapack                 3.9.0           22_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnpp                    12.2.5.30                     0    nvidia
libnsl                    2.0.1                hd590300_0    conda-forge
libnvfatbin               12.4.127                      0    nvidia
libnvjitlink              12.2.140             hd3aeb46_0    conda-forge
libnvjpeg                 12.3.1.117                    0    nvidia
libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
libparquet                16.1.0           h6a7eafb_6_cpu    conda-forge
libpciaccess              0.18                 hd590300_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libprotobuf               4.25.3               h08a7969_0    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
librmm                    24.06.00        cuda12_240605_gd889275f_0    rapidsai
libsanitizer              12.4.0               h46f95d5_1    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.1.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libsystemd0               255                  h3516f8a_1    conda-forge
libthrift                 0.19.0               hb90f79a_1    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libunwind                 1.6.2                h9c3ff4c_0    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
llvmlite                  0.43.0          py311hbde99c3_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lz4                       4.3.3           py311h38e4bf4_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5           py311h459d7ec_0    conda-forge
matplotlib-base           3.8.4           py311ha4ca890_2    conda-forge
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
mplhep                    0.3.48             pyhd8ed1ab_0    conda-forge
mplhep_data               0.0.3              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.8           py311h52f7536_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
nsight-compute            2023.2.2.3                    0    nvidia/label/cuda-12.2.2
numba                     0.60.0          py311h4bc866e_0    conda-forge
numcodecs                 0.11.0          py311hcafe171_1    conda-forge
numpy                     1.26.4          py311h64a7726_0    conda-forge
nvcomp                    3.0.6                h10b603f_0    conda-forge
nvtop                     3.1.0                hefaacde_0    conda-forge
nvtx                      0.2.10          py311h459d7ec_0    conda-forge
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openssl                   3.3.2                hb9d3cd8_0    conda-forge
orc                       2.0.1                h17fec99_1    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pandas                    2.2.2           py311h14de704_1    conda-forge
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
partd                     1.4.2              pyhd8ed1ab_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    10.3.0          py311h18e6fac_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.46             pyha770c72_0    conda-forge
psutil                    5.9.8           py311h459d7ec_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
py-spy                    0.3.14               h87a5ac0_0    conda-forge
pyarrow                   16.1.0          py311h781c19f_1    conda-forge
pyarrow-core              16.1.0          py311h8e2c35d_1_cpu    conda-forge
pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pydantic                  2.7.3              pyhd8ed1ab_0    conda-forge
pydantic-core             2.18.4          py311h5ecf98a_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pynvjitlink               0.2.3           py311hdaa3023_0    rapidsai
pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.9          hb806964_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python-xxhash             3.4.1           py311h459d7ec_0    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py311h459d7ec_1    conda-forge
pyzmq                     26.0.3          py311h08a0b41_0    conda-forge
re2                       2023.09.01           h7f4b329_2    conda-forge
readline                  8.2                  h8228510_1    conda-forge
rich                      13.7.1             pyhd8ed1ab_0    conda-forge
rmm                       24.06.00        cuda12_py311_240605_gd889275f_0    rapidsai
s2n                       1.4.15               he19d79f_0    conda-forge
scipy                     1.13.1          py311h517d4fd_0    conda-forge
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.2.0                hdb0a2a9_1    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.12.0               hd2e6256_2    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
sysroot_linux-64          2.17                h4a8ded7_17    conda-forge
tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
toolz                     0.12.1             pyhd8ed1ab_0    conda-forge
tornado                   6.4             py311h459d7ec_0    conda-forge
tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
typing-extensions         4.12.1               hd8ed1ab_0    conda-forge
typing_extensions         4.12.1             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
uhi                       0.4.0              pyhd8ed1ab_0    conda-forge
uproot                    5.3.7                ha770c72_0    conda-forge
uproot-base               5.3.7              pyhd8ed1ab_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xxhash                    0.8.2                hd590300_0    conda-forge
xyzservices               2024.4.0           pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zarr                      2.18.2             pyhd8ed1ab_0    conda-forge
zeromq                    4.3.5                h75354e8_4    conda-forge
zict                      3.0.0              pyhd8ed1ab_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zstandard                 0.22.0          py311hb6f056b_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge


fstrug commented Oct 23, 2024

This is on a DGX-1, @fstrug what system are you running on?

Originally posted by @madsbk in #378

Some system information:

OS: AlmaLinux 9 (kernel 6.3.12-200.fc38.x86_64)

CPU: AMD EPYC 7543 32-Core Processor
Architecture - x86_64

GPU: NVIDIA A100 80GB PCIe
Driver Version - 535.129.03
CUDA Version - 12.2

KvikIO environment created with:
mamba create -n img_cuda12.2-kvikio -c rapidsai -c conda-forge python=3.12 cuda-version=12.2 kvikio
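Since GDS also depends on the driver stack outside the conda environment, it can help to compare the reported driver version against a known-good baseline numerically rather than lexically. The driver string 535.129.03 is from the report above; the minimum used below is purely illustrative, not an official support matrix.

```python
def parse_version(v):
    """Turn a dotted version string like '535.129.03' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))


def at_least(installed, minimum):
    """Numeric comparison of dotted versions (string comparison would mis-order them)."""
    return parse_version(installed) >= parse_version(minimum)


# Driver from the report above; '520.0' is an illustrative floor, not a documented requirement.
print(at_least("535.129.03", "520.0"))
```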


madsbk commented Oct 24, 2024

Hmm, running on an A100 also works for me:

mamba create -n img_cuda12.2-kvikio -c rapidsai -c conda-forge python=3.12 cuda-version=12.2 kvikio
conda activate img_cuda12.2-kvikio
pip install distributed nvidia-ml-py

(img_cuda12.2-kvikio) mkristensen@dgx-02:/lustre/mkristensen/repos/kvikio$ KVIKIO_COMPAT_MODE=OFF python python/kvikio/kvikio/benchmarks/single_node_io.py --dir /lustre/mkristensen/tmp/
Roundtrip benchmark
----------------------------------
GPU               | NVIDIA A100-SXM4-80GB (dev #0)
GPU Memory Total  | 80.00 GiB
BAR1 Memory Total | 128.00 GiB
GDS driver        | v2.23
GDS config.json   | /etc/cufile.json
----------------------------------
nbytes            | 10485760 bytes (10.00 MiB)
4K aligned        | True
pre-reg-buf       | False
directory         | /lustre/mkristensen/tmp
nthreads          | 1
nruns             | 1
==================================
cufile read       | 901.62 MiB/s
cufile write      | 634.03 MiB/s
posix read        | 799.14 MiB/s
posix write       | 852.70 MiB/s

Maybe it is something in /etc/cufile.json? This is my config:

{
    // NOTE : Application can override custom configuration via export CUFILE_ENV_PATH_JSON=<filepath>
    // e.g : export CUFILE_ENV_PATH_JSON="/home/<xxx>/cufile.json"

            "logging": {
                            // log directory, if not enabled will create log file under current working directory
                            //"dir": "/home/<xxxx>",

                            // NOTICE|ERROR|WARN|INFO|DEBUG|TRACE (in decreasing order of severity)
                            "level": "ERROR"
            },

            "profile": {
                            // nvtx profiling on/off
                            "nvtx": false,
                            // cufile stats level(0-3)
                            "cufile_stats": 0
            },

            "execution" : {
                    // max number of workitems in the queue;
                    "max_io_queue_depth": 128,
                    // max number of host threads per gpu to spawn for parallel IO
                    "max_io_threads" : 4,
                    // enable support for parallel IO
                    "parallel_io" : true,
                    // minimum IO threshold before splitting the IO
                    "min_io_threshold_size_kb" : 8192,
		    // maximum parallelism for a single request
		    "max_request_parallelism" : 4
            },

            "properties": {
                            // max IO chunk size (parameter should be multiples of 64K) used by cuFileRead/Write internally per IO request
                            "max_direct_io_size_kb" : 16384,
                            // device memory size (parameter should be 4K aligned) for reserving bounce buffers for the entire GPU
                            "max_device_cache_size_kb" : 131072,
                            // limit on maximum device memory size (parameter should be 4K aligned) that can be pinned for a given process
                            "max_device_pinned_mem_size_kb" : 33554432,
                            // true or false (true will enable asynchronous io submission to nvidia-fs driver)
                            // Note : currently the overall IO will still be synchronous
                            "use_poll_mode" : false,
                            // maximum IO request size (parameter should be 4K aligned) within or equal to which library will use polling for IO completion
                            "poll_mode_max_size_kb": 4,
                            // allow p2pdma, this will enable use of cuFile without nvme patches 
                            "use_pci_p2pdma": false,
                            // allow compat mode, this will enable use of cuFile posix read/writes
                            "allow_compat_mode": true,
                            // enable GDS write support for RDMA based storage
                            "gds_rdma_write_support": true,
                            // GDS batch size
                            "io_batchsize": 128,
                            // enable io priority w.r.t compute streams
                            // valid options are "default", "low", "med", "high"
                            "io_priority": "default",
                            // client-side rdma addr list for user-space file-systems(e.g ["10.0.1.0", "10.0.2.0"])
                            "rdma_dev_addr_list": [ ],
                            // load balancing policy for RDMA memory registration (MR): RoundRobin or RoundRobinMaxMin
                            // In RoundRobin, MRs will be distributed uniformly across NICs closest to a GPU
                            // In RoundRobinMaxMin, MRs will be distributed across NICs closest to a GPU
                            // with minimal sharing of NICs across GPUs
                            "rdma_load_balancing_policy": "RoundRobin",
			    //32-bit dc key value in hex
			    //"rdma_dc_key": "0xffeeddcc", 
			    //To enable/disable different rdma OPs use the below bit map
			    //Bit 0 - If set enables Local RDMA WRITE
			    //Bit 1 - If set enables Remote RDMA WRITE
			    //Bit 2 - If set enables Remote RDMA READ
			    //Bit 3 - If set enables REMOTE RDMA Atomics
			    //Bit 4 - If set enables Relaxed ordering.
			    //"rdma_access_mask": "0x1f",
                            
                            // In platforms where IO transfer to a GPU will cause cross RootPort PCie transfers, enabling this feature
                            // might help improve overall BW provided there exists a GPU(s) with Root Port common to that of the storage NIC(s).
                            // If this feature is enabled, please provide the ip addresses used by the mount either in file-system specific
                            // section for mount_table or in the rdma_dev_addr_list property in properties section
                            "rdma_dynamic_routing": false,
                            // The order describes the sequence in which a policy is selected for dynamic routing of cross-Root Port transfers.
                            // If the first policy is not applicable, it falls back to the next, and so on.
                            // policy GPU_MEM_NVLINKS: use GPU memory with NVLink to transfer data between GPUs
                            // policy GPU_MEM: use GPU memory with PCIe to transfer data between GPUs
                            // policy SYS_MEM: use system memory with PCIe to transfer data to the GPU
                            // policy P2P: use P2P PCIe to transfer data between NIC and GPU
                            "rdma_dynamic_routing_order": [ "GPU_MEM_NVLINKS", "GPU_MEM", "SYS_MEM", "P2P" ]
            },

            "fs": {
                    "generic": {

                            // for unaligned writes, setting this to true makes cuFileWrite use POSIX write internally instead of a regular GDS write
                            "posix_unaligned_writes" : false
                    },

		    "beegfs" : {
                            // IO threshold for read/write (param should be 4K aligned)) equal to or below which cuFile will use posix read/write
                            "posix_gds_min_kb" : 0

                            // To restrict the IO to a selected IP list when dynamic routing is enabled
                            // if using a single BeeGFS mount, provide the IP addresses here
                            //"rdma_dev_addr_list" : []

                            // if using multiple BeeGFS mounts, provide the IP addresses used by the respective mount here
                            //"mount_table" : {
                            //                    "/beegfs/client1" : {
                            //                                    "rdma_dev_addr_list" : ["172.172.1.40", "172.172.1.42"]
                            //                    },

                            //                    "/beegfs/client2" : {
                            //                                    "rdma_dev_addr_list" : ["172.172.2.40", "172.172.2.42"]
                            //                    }
                            //}

                    },
                    "lustre": {

                            // IO threshold for read/write (param should be 4K aligned) equal to or below which cuFile will use POSIX read/write
                            "posix_gds_min_kb" : 16

                            // To restrict the IO to a selected IP list when dynamic routing is enabled
                            // if using a single Lustre mount, provide the IP addresses here (use: sudo lnetctl net show)
                            //"rdma_dev_addr_list" : []

                            // if using multiple Lustre mounts, provide the IP addresses used by the respective mount here
                            //"mount_table" : {
                            //                    "/lustre/ai200_01/client" : {
                            //                                    "rdma_dev_addr_list" : ["172.172.1.40", "172.172.1.42"]
                            //                    },

                            //                    "/lustre/ai200_02/client" : {
                            //                                    "rdma_dev_addr_list" : ["172.172.2.40", "172.172.2.42"]
                            //                    }
                            //}
                    },

                    "nfs": {

                           // To restrict the IO to a selected IP list when dynamic routing is enabled
                           //"rdma_dev_addr_list" : []

                           //"mount_table" : {
                           //                     "/mnt/nfsrdma_01/" : {
                           //                                     "rdma_dev_addr_list" : []
                           //                     },

                           //                     "/mnt/nfsrdma_02/" : {
                           //                                     "rdma_dev_addr_list" : []
                           //                     }
                           //}
                    },
                    
		    "gpfs": {
                           //allow GDS writes with GPFS
                           "gds_write_support": false,

                           //allow Async support
                           "gds_async_support": true

                           //"rdma_dev_addr_list" : []

                           //"mount_table" : {
                           //                     "/mnt/gpfs_01" : {
                           //                                     "rdma_dev_addr_list" : []
                           //                     },

                           //                     "/mnt/gpfs_02/" : {
                           //                                     "rdma_dev_addr_list" : []
                           //                     }
                           //}
                    },

                    "weka": {

                            // enable/disable RDMA write
                            "rdma_write_support" : false
                    }
            },

            "denylist": {
                            // specify list of vendor driver modules to deny for nvidia-fs (e.g. ["nvme", "nvme_rdma"])
                            "drivers":  [ ],

                            // specify list of block devices to prevent IO using cuFile (e.g. [ "/dev/nvme0n1" ])
                            "devices": [ ],

                            // specify list of mount points to prevent IO using cuFile (e.g. ["/mnt/test"])
                            "mounts": [ ],

                            // specify list of file-systems to prevent IO using cuFile (e.g. ["lustre", "wekafs"])
                            "filesystems": [ ]
            },
        
            "miscellaneous": {
                            // enable only for enforcing strict checks at API level for debugging
                            "api_check_aggressive": false
            }
}
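
One thing to watch when adapting this config: cufile.json allows //-style comments, so it is not strict JSON and a plain `json.loads` will reject it as-is. As a quick sanity check after editing, the comments can be stripped before parsing — a minimal sketch (the `load_cufile_json` helper is illustrative, not part of cuFile):

```python
import json
import os
import re
import sys

def load_cufile_json(path: str) -> dict:
    """Parse a cufile.json that may contain //-style comments."""
    with open(path) as f:
        text = f.read()
    # Remove everything from '//' to end of line, but only when the '//'
    # sits outside a double-quoted string (sufficient for this file,
    # which has no URLs or escaped quotes inside values).
    stripped = re.sub(r'^((?:[^"\n]|"[^"\n]*")*?)//.*$', r'\1', text,
                      flags=re.MULTILINE)
    return json.loads(stripped)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/cufile.json"
    if os.path.exists(path):
        cfg = load_cufile_json(path)
        print("parsed OK; top-level sections:", sorted(cfg))
```

If this raises `json.JSONDecodeError`, the config has a syntax problem (e.g. a missing comma) that cuFile may silently fall back from.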

@fstrug
Author

fstrug commented Oct 24, 2024

The file /etc/cufile.json does not exist on our system. If it's safe to copy and paste your config, I'll see about getting it configured for our systems. Or is there a way to point to a different path?

@madsbk
Member

madsbk commented Oct 25, 2024

Yes, it should be safe to use my config. You can use CUFILE_ENV_PATH_JSON to specify a custom config path.
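
For example (the path below is just an illustration; use wherever you saved the config):

```shell
# Point cuFile (and therefore KvikIO) at a custom config file
# instead of the default /etc/cufile.json.
export CUFILE_ENV_PATH_JSON="$HOME/cufile.json"

# Then re-run the benchmark as before, e.g.:
#   KVIKIO_COMPAT_MODE=off python python/kvikio/kvikio/benchmarks/single_node_io.py
```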

@fstrug
Author

fstrug commented Oct 28, 2024

Using the cufile.json above didn't resolve the issue. I should also share the cufile.log from this setup with your cufile.json:

 28-10-2024 17:38:08:134 [pid=8196 tid=8196] ERROR  0:140 unable to load,  liburcu-bp.so.6 

 28-10-2024 17:38:08:134 [pid=8196 tid=8196] ERROR  0:140 unable to load,  liburcu-bp.so.1 

 28-10-2024 17:38:08:134 [pid=8196 tid=8196] WARN   0:168 failed to open /proc/driver/nvidia-fs/devcount  error: No such file or directory
 28-10-2024 17:38:08:134 [pid=8196 tid=8196] NOTICE  cufio-drv:720 running in compatible mode
 28-10-2024 17:38:08:423 [pid=8196 tid=8196] ERROR  0:140 unable to load,  libnuma.so.1.0.0 

 28-10-2024 17:38:08:423 [pid=8196 tid=8196] ERROR  0:91 dlopen error libnuma.so.1.0.0: cannot open shared object file: No such file or directory
 28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR  cufio-fs:322 error creating udev_device for block device dev_no: 0:526
 28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR  cufio-fs:742 error getting volume attributes error for device: dev_no: 0:526
 28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR  cufio-obj:215 unable to get volume attributes for fd 36
 28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR  cufio:310 cuFileHandleRegister error, failed to allocate file object
 28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR  cufio:338 cuFileHandleRegister error: internal error

@KiranModukuri

28-10-2024 17:38:08:134 [pid=8196 tid=8196] WARN 0:168 failed to open /proc/driver/nvidia-fs/devcount error: No such file or directory
Looks like the nvidia-fs driver is not loaded.

$ sudo lsmod | grep nvidia_fs

28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR cufio-fs:322 error creating udev_device for block device dev_no: 0:526

28-10-2024 17:38:08:427 [pid=8196 tid=8196] ERROR cufio-obj:215 unable to get volume attributes for fd 36

This error indicates that the library does not understand the block device type for the filesystem and cannot get the volume attributes.

Where is the dataset located?
Can you share details about the block device and mount information for the root_folder of your dataset?

$ lsblk
$ mount
