Skip to content

Conversation

griwodz
Copy link
Member

@griwodz griwodz commented Aug 12, 2024

Description

Remove use of NVTX from the develop branch. This is a profiling feature that is not needed for everybody out there. People who know about it can add the relevant lines themselves.

It is currently very troublesome because the transition from the NVTX library to header-only NVTX3 is complete on Windows, where NVTX has been removed, while NVTX3 does not even exist on Tegra, Jetson and maybe other Arm-based platforms. The removal is OK because NVTX is only relevant for very fine-grained debugging of the interplay between CPU and GPU. Its timing features can also be achieved by cudaEvents.

@griwodz griwodz added type:bug in progress cuda issues related to cuda versions bugfix labels Aug 12, 2024
@griwodz griwodz self-assigned this Aug 12, 2024
@griwodz griwodz changed the title First-order CUDA follow-up fixes First-order CUDA follow-up fix: troublesome transition to nvtx3 Aug 12, 2024
@griwodz griwodz force-pushed the dev/cmake-lang-cuda branch 3 times, most recently from ac217ae to 5c37b81 Compare August 12, 2024 10:08
@griwodz griwodz added ready and removed in progress labels Aug 12, 2024
@griwodz
Copy link
Member Author

griwodz commented Aug 12, 2024

Seems to be confirmed in #161 that the PR fixes the problem. #161 is also proposing a few updates in the vcpkg portfile.

@griwodz griwodz changed the title First-order CUDA follow-up fix: troublesome transition to nvtx3 First-order CUDA follow-up fix: do not use NVXT Aug 14, 2024
@griwodz griwodz changed the title First-order CUDA follow-up fix: do not use NVXT First-order CUDA follow-up fix: do not use NVTX Aug 14, 2024
This make trouble for continuous integration and is apparently not supported on all platforms.
Since it is a debug function, it's just as well to remove it from the mainstream tree.
@griwodz griwodz force-pushed the dev/cmake-lang-cuda branch from 7435c0f to abef1d4 Compare August 15, 2024 06:30
@griwodz
Copy link
Member Author

griwodz commented Aug 15, 2024

CUDA is always able to provide the timing of CUDA kernels running on the GPU to tools that observe this, either debuggers or performance analyzers. It is usually not able to register the timing and occupancy of threads on the CPU. NVTX is a mechanism that allows CPU code to add timestamps of its running threads into the same system.

That makes it possible to discover whether the CPU is working on something while the GPU is working on something else. It is not foolproof because NVTX knows nothing about the CPU's "occupancy" (whether it actually does something or is sleeping). But it can help to understand the communication between CPU and GPU better.

NVidia promises that it costs "nearly nothing", and I don't know how it actually compares to something like prof. It does create output for very nice performance analysers.

But it is most certainly not necessary to have it in the develop branch for everybody.

So while the transition from NVTX (version 2) to NVTX3 is ongoing and leads to cross-platform problems, I'd prefer to remove it entirely from the develop branch. There's probably nobody else than I who uses it anyway.

@griwodz
Copy link
Member Author

griwodz commented Jan 13, 2025

Without this PR, PopSift does not build on Linux with CUDA 12.6 any more.

BillyONeal added a commit to BillyONeal/popsift that referenced this pull request Aug 15, 2025
... rather than trying to macroize names popsift does not own.

This avoids a compile error [1] due to the macro breaking the declaration *inside NVTX* :

```c++
NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);
```

note that the `(void)` means it tries to pass an argument to the **macro** nvtxRangePop because popsift is `#define`ing that macro prior to `#include`ing other CUDA headers.

Related: alicevision#162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If alicevision#162 is accepted, this PR should be closed as moot.

[1]:
```console
[1/31] cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
FAILED: src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o
cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1063:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
 1063 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void);
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
In file included from /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImpl.h:367,
                 from /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1619,
                 from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657,
                 from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56,
                 from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31,
                 from /usr/local/cuda-12.9/include/thrust/swap.h:147,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37,
                 from /usr/local/cuda-12.9/include/thrust/memory.h:35,
                 from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31,
                 from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47,
                 from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79,
                 from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29,
                 from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72,
                 from /usr/local/cuda-12.9/include/thrust/copy.h:505,
                 from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21:
/usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImplCore.h:133:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0
  133 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void)
      |                                             ^
/vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here
   16 | #define nvtxRangePop()
      |
CMake Error at popsift_generated_s_filtergrid.cu.o.Debug.cmake:220 (message):
  Error generating
  /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o
```
@simogasp simogasp added this to the v0.10 milestone Aug 16, 2025
@NathanMOlson NathanMOlson mentioned this pull request Aug 24, 2025
3 tasks
@simogasp simogasp mentioned this pull request Aug 25, 2025
@simogasp simogasp requested a review from Copilot August 26, 2025 08:33
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Remove NVTX profiling support from the PopSift codebase to resolve cross-platform compatibility issues. The NVTX library has transitioned to header-only NVTX3 on Windows while being unavailable on ARM-based platforms like Tegra and Jetson.

  • Removed all NVTX-related preprocessor directives and function calls across CUDA source files
  • Cleaned up build configuration by removing NVTX-related CMake options and dependencies
  • Updated CI configuration to remove NVTX package downloads and build flags

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/popsift/sift_pyramid.cu Removed NVTX header includes and profiling calls around descriptor operations
src/popsift/sift_desc.cu Removed NVTX header includes and profiling calls around orientation count reading
src/popsift/s_orientation.cu Removed NVTX header includes and macro definitions
src/popsift/s_image.cu Removed NVTX profiling calls around memory allocation operations
src/popsift/s_filtergrid.cu Removed NVTX profiling calls around counter write operations
src/popsift/popsift.h Removed NVTX header includes and range ID member variable
src/popsift/popsift.cu Removed NVTX profiling calls around image processing operations
src/application/CMakeLists.txt Added CMake policy setting for Boost compatibility
src/CMakeLists.txt Removed NVTX library linking configuration
cudaInstallAppveyor.cmd Removed NVTX package download and installation steps
cmake/sift_config.h.in Removed NVTX configuration macro definition
appveyor.yml Removed NVTX profiling build flag from cmake command
CMakeLists.txt Removed all NVTX-related build options and configuration

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@simogasp
Copy link
Member

@griwodz Do you think we can merge it?

@griwodz
Copy link
Member Author

griwodz commented Aug 26, 2025 via email

NathanMOlson added a commit to NathanMOlson/popsift that referenced this pull request Aug 26, 2025
@simogasp simogasp merged commit 7d59a27 into develop Aug 26, 2025
6 checks passed
@simogasp simogasp deleted the dev/cmake-lang-cuda branch August 26, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] Cannot built in vcpkg - MSVS2022 + Cuda 12.6

2 participants