-
Notifications
You must be signed in to change notification settings - Fork 123
First-order CUDA follow-up fix: do not use NVTX #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ac217ae
to
5c37b81
Compare
This make trouble for continuous integration and is apparently not supported on all platforms. Since it is a debug function, it's just as well to remove it from the mainstream tree.
7435c0f
to
abef1d4
Compare
CUDA is always able to provide the timing of CUDA kernels running on the GPU to tools that observe this, either debuggers or performance analyzers. It is usually not able to register the timing and occupancy of threads on the CPU. NVTX is a mechanism that allows CPU code to add timestamps of its running threads into the same system. That makes it possible to discover whether the CPU is working on something while the GPU is working on something else. It is not foolproof because NVTX knows nothing about the CPU's "occupancy" (whether it actually does something or is sleeping). But it can help to understand the communication between CPU and GPU better. NVidia promises that it costs "nearly nothing", and I don't know how it actually compares to something like prof. It does create output for very nice performance analysers. But it is most certainly not necessary to have it in the develop branch for everybody. So while the transition from NVTX (version 2) to NVTX3 is ongoing and leads to cross-platform problems, I'd prefer to remove it entirely from the develop branch. There's probably nobody else than I who uses it anyway. |
Without this PR, PopSift does not build on Linux with CUDA 12.6 any more. |
... rather than trying to macroize names popsift does not own. This avoids a compile error [1] due to the macro breaking the declaration *inside NVTX* : ```c++ NVTX_DECLSPEC int NVTX_API nvtxRangePop(void); ``` note that the `(void)` means it tries to pass an argument to the **macro** nvtxRangePop because popsift is `#define`ing that macro prior to `#include`ing other CUDA headers. Related: alicevision#162 is probably a better fix than this but as it outright removes a feature we are hesitant to add that as a patch in vcpkg. If alicevision#162 is accepted, this PR should be closed as moot. [1]: ```console [1/31] cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake FAILED: src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o cd /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -E make_directory /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/. && /vcpkg/downloads/tools/cmake-3.30.1-linux/cmake-3.30.1-linux-x86_64/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o -D generated_cubin_file:STRING=/vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o.cubin.txt -P /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/popsift_generated_s_filtergrid.cu.o.Debug.cmake nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). In file included from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657, from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56, from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50, from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31, from /usr/local/cuda-12.9/include/thrust/swap.h:147, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50, from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37, from /usr/local/cuda-12.9/include/thrust/memory.h:35, from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31, from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50, from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29, from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72, from /usr/local/cuda-12.9/include/thrust/copy.h:505, from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21: /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1063:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0 1063 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void); | ^ /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here 16 | #define nvtxRangePop() | In file included from /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImpl.h:367, from /usr/local/cuda-12.9/include/nvtx3/nvToolsExt.h:1619, from /usr/local/cuda-12.9/include/nvtx3/nvtx3.hpp:657, from /usr/local/cuda-12.9/include/cub/detail/nvtx.cuh:56, from /usr/local/cuda-12.9/include/cub/device/device_for.cuh:40, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/parallel_for.h:43, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/swap_ranges.h:43, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/swap_ranges.h:50, from /usr/local/cuda-12.9/include/thrust/detail/swap_ranges.inl:31, from /usr/local/cuda-12.9/include/thrust/swap.h:147, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/iter_swap.h:34, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/iter_swap.h:50, from /usr/local/cuda-12.9/include/thrust/detail/reference.h:37, from /usr/local/cuda-12.9/include/thrust/memory.h:35, from /usr/local/cuda-12.9/include/thrust/detail/allocator/temporary_allocator.h:31, from /usr/local/cuda-12.9/include/thrust/detail/temporary_array.h:46, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/internal/copy_cross_system.h:47, from /usr/local/cuda-12.9/include/thrust/system/cuda/detail/copy.h:79, from /usr/local/cuda-12.9/include/thrust/system/detail/adl/copy.h:50, from /usr/local/cuda-12.9/include/thrust/detail/copy.inl:29, from /usr/local/cuda-12.9/include/thrust/detail/copy.h:72, from /usr/local/cuda-12.9/include/thrust/copy.h:505, from /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:21: /usr/local/cuda-12.9/include/nvtx3/nvtxDetail/nvtxImplCore.h:133:45: error: macro "nvtxRangePop" passed 1 arguments, but takes just 0 133 | NVTX_DECLSPEC int NVTX_API nvtxRangePop(void) | ^ /vcpkg/buildtrees/popsift/src/v0.9-ba1d7e1254.clean/src/popsift/s_filtergrid.cu:16: note: macro "nvtxRangePop" defined here 16 | #define nvtxRangePop() | CMake Error at popsift_generated_s_filtergrid.cu.o.Debug.cmake:220 (message): Error generating /vcpkg/buildtrees/popsift/x64-linux-dbg/src/CMakeFiles/popsift.dir/popsift/./popsift_generated_s_filtergrid.cu.o ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Remove NVTX profiling support from the PopSift codebase to resolve cross-platform compatibility issues. The NVTX library has transitioned to header-only NVTX3 on Windows while being unavailable on ARM-based platforms like Tegra and Jetson.
- Removed all NVTX-related preprocessor directives and function calls across CUDA source files
- Cleaned up build configuration by removing NVTX-related CMake options and dependencies
- Updated CI configuration to remove NVTX package downloads and build flags
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
src/popsift/sift_pyramid.cu | Removed NVTX header includes and profiling calls around descriptor operations |
src/popsift/sift_desc.cu | Removed NVTX header includes and profiling calls around orientation count reading |
src/popsift/s_orientation.cu | Removed NVTX header includes and macro definitions |
src/popsift/s_image.cu | Removed NVTX profiling calls around memory allocation operations |
src/popsift/s_filtergrid.cu | Removed NVTX profiling calls around counter write operations |
src/popsift/popsift.h | Removed NVTX header includes and range ID member variable |
src/popsift/popsift.cu | Removed NVTX profiling calls around image processing operations |
src/application/CMakeLists.txt | Added CMake policy setting for Boost compatibility |
src/CMakeLists.txt | Removed NVTX library linking configuration |
cudaInstallAppveyor.cmd | Removed NVTX package download and installation steps |
cmake/sift_config.h.in | Removed NVTX configuration macro definition |
appveyor.yml | Removed NVTX profiling build flag from cmake command |
CMakeLists.txt | Removed all NVTX-related build options and configuration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]>
@griwodz Do you think we can merge it? |
Yes, I believe that this one is safe to merge.
On 26 Aug 2025, at 11:42, Simone Gasparini ***@***.***> wrote:
[https://avatars.githubusercontent.com/u/1331744?s=20&v=4]simogasp left a comment (alicevision/popsift#162)<#162 (comment)>
@griwodz<https://github.com/griwodz> Do you think we can merge it?
—
Reply to this email directly, view it on GitHub<#162 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAQCZAUBY2VHGVUKGIO23KD3PQTYRAVCNFSM6AAAAACE2HC6RCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEMRTGQZTEMJUG4>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Description
Remove use of NVTX from the develop branch. This is a profiling feature that is not needed for everybody out there. People who know about it can add the relevant lines themselves.
It is currently very troublesome because the transition from the NVTX library to header-only NVTX3 is complete on Windows, where NVTX has been removed, while NVTX3 does not even exist on Tegra, Jetson and maybe other Arm-based platforms. The removal is OK because NVTX is only relevant for very fine-grained debugging of the interplay between CPU and GPU. Its timing features can also be achieved by cudaEvents.