Skip to content

Conversation

BillyONeal
Copy link
Contributor

@BillyONeal BillyONeal commented Aug 15, 2025

... rather than trying to macroize names popsift does not own.

This avoids a compile error [1] due to the macro breaking the declaration inside NVTX :

NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);

note that the (void) means it tries to pass an argument to the macro nvtxRangePop because popsift is #defineing that macro prior to #includeing other CUDA headers.

Related: #162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If #162 is accepted, this PR should be closed as moot.

[1]:

[1/31] cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
FAILED: src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o
cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1063:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
 1063 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImpl.h:367,
                 from /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1619,
                 from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImplCore.h:133:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
  133 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void)
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
CMake Error at popsift_generated_s_filtergrid.cu.o.Debug.cmake:220 (message):
  Error generating
  /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o

... rather than trying to macroize names popsift does not own.

This avoids a compile error [1] due to the macro breaking the declaration *inside NVTX* :

```c++
NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);
```

note that the `(void)` means it tries to pass an argument to the **macro** nvtxRangePop because popsift is `#define`ing that macro prior to `#include`ing other CUDA headers.

Related: alicevision#162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If alicevision#162 is accepted, this PR should be closed as moot.

[1]:
```console
[1/31] cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
FAILED: src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o
cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1063:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
 1063 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImpl.h:367,
                 from /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1619,
                 from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImplCore.h:133:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
  133 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void)
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
CMake Error at popsift_generated_s_filtergrid.cu.o.Debug.cmake:220 (message):
  Error generating
  /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o
```
@simogasp simogasp added this to the v0.10 milestone Aug 16, 2025
@simogasp simogasp added the cuda issues related to cuda versions label Aug 16, 2025
@NathanMOlson NathanMOlson mentioned this pull request Aug 24, 2025
3 tasks
@simogasp simogasp mentioned this pull request Aug 25, 2025
@simogasp
Copy link
Member

Hi @BillyONeal , thanks for the PR.
We'll try to merge the original PR that removes nvtx altogether and make a new minor release so that the port on vcpkg can also be updated.

@griwodz
Copy link
Member

griwodz commented Aug 26, 2025

The reason for PR#162 was the continuous conflict between NVTX versions across several versions of the CUDA SDK, along with difference between Windows/Intel, Linux/Intel and Linux/ARM. One had only NVTX3, one only NVTX2, and one had both. We do attempt to support several of the recent CUDA versions; for example, the SDK for Tegra tends to lag behind the SDKs for Intel.

This PR is very welcome by adding NVTX3 only and drop NVTX2 support completely.
And I agree that redefining the original definition of the CUDA functions/macros was a bad idea.
I agree with @simogasp that this PR should be applied instead of (or after) #162.

Still, since NVidia has now changed the NVTX API and implementation completely 3 times, for readability, I would prefer a reintroduced NVTX that is supported through PopSift-specific macros that use only a single line in the code for every NVTX call made (instead of #ifdef / #endif in many places).

Those macros could be in a new header file in popsift/common, and it would be nice to have an option in CMake to (a) choose expliclity to disable it, and (b) provide a location where it can be easily disabled by CMake checks for unexpected platform/SDK combinations.

@BillyONeal
Copy link
Contributor Author

#162 has been merged so this PR is now moot. Thanks for your consideration and happy that we'll have a path forward on modern CUDA :)

@BillyONeal BillyONeal closed this Aug 26, 2025
@BillyONeal BillyONeal deleted the avoid-macroizing-nvtx branch August 26, 2025 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda issues related to cuda versions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants