-
Notifications
You must be signed in to change notification settings - Fork 123
Guard calls to nvtx directly #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
... rather than trying to macroize names popsift does not own. This avoids a compile error [1] due to the macro breaking the declaration *inside NVTX* : ```c++ NVTX_DECLSPEC int NVTX_API nvtxRangePop(void); ``` note that the `(void)` means it tries to pass an argument to the **macro** nvtxRangePop because popsift is `#define`ing that macro prior to `#include`ing other CUDA headers. Related: alicevision#162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If alicevision#162 is accepted, this PR should be closed as moot. [1]: ```console [1/31] cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake FAILED: src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). In file included from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657, from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56, from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50, from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31, from /usr/local/cuda-12.9/include/thrust/swap.h:147, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50, from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37, from /usr/local/cuda-12.9/include/thrust/memory.h:35, from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31, from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50, from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29, from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72, from /usr/local/cuda-12.9/include/thrust/copy.h:505, from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21: /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1063:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0 1063 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void); | ^ /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here 16 | #define nvtxRangePop() | In file included from /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImpl.h:367, from /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1619, from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657, from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56, from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50, from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31, from /usr/local/cuda-12.9/include/thrust/swap.h:147, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50, from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37, from /usr/local/cuda-12.9/include/thrust/memory.h:35, from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31, from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50, from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29, from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72, from /usr/local/cuda-12.9/include/thrust/copy.h:505, from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21: /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImplCore.h:133:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0 133 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void) | ^ /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here 16 | #define nvtxRangePop() | CMake Error at popsift_generated_s_filtergrid.cu.o.Debug.cmake:220 (message): Error generating /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o ```
Hi @BillyONeal , thanks for the PR. |
The reason for PR#162 was the continuous conflict between NVTX versions across several versions of the CUDA SDK, along with difference between Windows/Intel, Linux/Intel and Linux/ARM. One had only NVTX3, one only NVTX2, and one had both. We do attempt to support several of the recent CUDA versions; for example, the SDK for Tegra tends to lag behind the SDKs for Intel. This PR is very welcome by adding NVTX3 only and drop NVTX2 support completely. Still, since NVidia has now changed the NVTX API and implementation completely 3 times, for readability, I would prefer a reintroduced NVTX that is supported through PopSift-specific macros that use only a single line in the code for every NVTX call made (instead of #ifdef / #endif in many places). Those macros could be in a new header file in popsift/common, and it would be nice to have an option in CMake to (a) choose expliclity to disable it, and (b) provide a location where it can be easily disabled by CMake checks for unexpected platform/SDK combinations. |
#162 has been merged so this PR is now moot. Thanks for your consideration and happy that we'll have a path forward on modern CUDA :) |
... rather than trying to macroize names popsift does not own.
This avoids a compile error [1] due to the macro breaking the declaration inside NVTX :
note that the
(void)
means it tries to pass an argument to the macro nvtxRangePop because popsift is#define
ing that macro prior to#include
ing other CUDA headers.Related: #162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If #162 is accepted, this PR should be closed as moot.
[1]: