Add Nsight profiling support (nsys/ncu) #244
Open
- Introduced `host_to_comm_path()` in `CommunicationLayer` and implemented Docker-specific logic in `DockerCommLayer` to resolve host/container path mapping (e.g., for file outputs inside mounted volumes).
- Added new example: `campaign_nsys_ncu.py`, showcasing how to run GPU benchmarks with `nsys` and `ncu` wrappers, including post-run hooks for metrics extraction.
- Created `NsysWrap` and `NcuWrap`.
  - `NsysWrap` runs `nsys profile`, then extracts memory usage stats from `report_cuda_gpu_mem_size_sum.csv`.
  - `NcuWrap` runs `ncu` in CSV mode and parses per-kernel metrics from the log.
- Refactored `AddVecBench` into its own reusable file under `examples/gpus/kit/addvec.py`.
- Updated `gpus.py` to install Nsight Systems and libsmctrl by default in the Docker image.
- Added a realistic CUDA benchmark `simplesleep.cu` to simulate kernel workloads with artificial delay and multiple phases.

These changes improve support for automated GPU profiling and set the foundation for deeper performance analysis using NVIDIA's Nsight tooling inside Docker-based benchkit platforms.
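
For reviewers unfamiliar with the host/container path problem mentioned above, here is a minimal standalone sketch of the idea. It is not the code from this PR: the function name `map_host_path_to_container` and the `mounts` dictionary (host directory to container mount point) are hypothetical and chosen only for illustration. The actual `host_to_comm_path()` in `DockerCommLayer` may work differently.

```python
from pathlib import PurePosixPath


def map_host_path_to_container(host_path, mounts):
    """Translate a host path into the path seen inside the container.

    `mounts` is a hypothetical dict mapping host directories to container
    mount points. Returns None when no mounted volume covers the path.
    """
    path = PurePosixPath(host_path)
    # Try the most specific (deepest) host directory first, so that nested
    # mounts resolve to the closest enclosing volume.
    candidates = sorted(
        mounts,
        key=lambda d: len(PurePosixPath(d).parts),
        reverse=True,
    )
    for host_dir in candidates:
        try:
            relative = path.relative_to(PurePosixPath(host_dir))
        except ValueError:
            # The path is not located under this host directory.
            continue
        return str(PurePosixPath(mounts[host_dir]) / relative)
    return None


# Illustrative mount table (assumed values, not taken from the PR).
example_mounts = {
    "/home/user/results": "/results",
    "/home/user": "/host",
}
container_path = map_host_path_to_container(
    "/home/user/results/run1/report.csv", example_mounts
)
print(container_path)
```

Matching the deepest mount first matters when volumes are nested: a profiler report written under `/home/user/results` should resolve through the `/results` mount rather than the broader `/host` one.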