Add Nsight profiling support (nsys/ncu) #244

Open
wants to merge 3 commits into
base: main

apaolillo
Collaborator

- Introduced `host_to_comm_path()` in `CommunicationLayer` and
  implemented Docker-specific logic in `DockerCommLayer` to resolve
  host/container path mapping (e.g., for file outputs inside mounted
  volumes).

- Added new example: `campaign_nsys_ncu.py` showcasing how to run GPU
  benchmarks with `nsys` and `ncu` wrappers, including post-run hooks
  for metrics extraction.

- Created `NsysWrap` and `NcuWrap`. `NsysWrap` runs `nsys profile`, then
  extracts memory usage stats from `report_cuda_gpu_mem_size_sum.csv`.
  `NcuWrap` runs `ncu` in CSV mode and parses per-kernel metrics from
  the log.

- Refactored `AddVecBench` into its own reusable file under
  `examples/gpus/kit/addvec.py`.

- Updated `gpus.py` to install Nsight Systems and libsmctrl by default
  in the Docker image.

- Added a realistic CUDA benchmark `simplesleep.cu` to simulate kernel
  workloads with artificial delay and multiple phases.
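The host/container path mapping mentioned in the first item can be illustrated with a minimal sketch. This is not the code from this PR: the function name `map_host_path` and the `mounts` dictionary (host directory to container directory) are hypothetical, and the real `host_to_comm_path()` may have a different signature and may discover mounts differently.

```python
from pathlib import PurePosixPath
from typing import Dict


def map_host_path(host_path: str, mounts: Dict[str, str]) -> str:
    """Translate a host path into the matching path inside a container.

    `mounts` maps host directories to container directories (hypothetical
    input; a real implementation would likely read it from the container
    configuration).
    """
    path = PurePosixPath(host_path)
    # Try the deepest host directories first so that nested mounts win.
    candidates = sorted(
        mounts,
        key=lambda d: len(PurePosixPath(d).parts),
        reverse=True,
    )
    for host_dir in candidates:
        try:
            relative = path.relative_to(host_dir)
        except ValueError:
            continue  # host_path is not under this mount point
        return str(PurePosixPath(mounts[host_dir]) / relative)
    raise ValueError(f"{host_path} is not inside any mounted volume")
```

With `mounts = {"/home/user/results": "/results"}`, a profiler output written to `/home/user/results/run1/out.csv` on the host would be addressed as `/results/run1/out.csv` inside the container.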

These changes improve support for automated GPU profiling and lay the
foundation for deeper performance analysis using NVIDIA's Nsight tooling
inside Docker-based benchkit platforms.
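As a rough illustration of the per-kernel metric parsing that `NcuWrap` is described as doing, here is a minimal sketch. It is not the PR's implementation: the function name `parse_ncu_csv` is hypothetical, and it assumes the log contains a CSV header line starting with `"ID"` and the columns `Kernel Name`, `Metric Name`, and `Metric Value`, preceded by non-CSV lines such as `==PROF==` messages.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def parse_ncu_csv(log_text: str) -> Dict[str, Dict[str, str]]:
    """Group metric values by kernel name from an ncu CSV-mode log.

    Assumes (not verified against this PR) that the CSV header starts with
    "ID" and includes "Kernel Name", "Metric Name" and "Metric Value".
    """
    lines = log_text.splitlines()
    # Skip any non-CSV preamble (e.g. "==PROF==" lines) before the header.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith('"ID"')),
        None,
    )
    if start is None:
        return {}  # no CSV section found in the log

    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    metrics: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in reader:
        # Later launches of the same kernel overwrite earlier values here;
        # a fuller version would also key on the "ID" column.
        metrics[row["Kernel Name"]][row["Metric Name"]] = row["Metric Value"]
    return dict(metrics)
```

Values are kept as strings because the units and number formatting vary per metric; converting them is left to the caller.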

Signed-off-by: Antonio Paolillo <[email protected]>
@apaolillo apaolillo requested a review from aaronbog June 8, 2025 17:43
@apaolillo apaolillo self-assigned this Jun 8, 2025
apaolillo added 2 commits June 8, 2025 19:50
Signed-off-by: Antonio Paolillo <[email protected]>