
Bugfix - nvbandwidth benchmark needs to handle N/A values #675

Open · wants to merge 14 commits into main
Conversation

@polarG (Contributor) commented on Dec 2, 2024

Description

  1. Fixed the bug where the nvbandwidth benchmark failed to handle 'N/A' values in the nvbandwidth command output.
  2. Changed the test case input format to a list.
  3. Added an nvbandwidth configuration example to the default config files.
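
For context, here is a minimal sketch of the kind of N/A-tolerant matrix parsing this fix implies. The function name, metric naming scheme, and sample output are illustrative assumptions, not the PR's actual code:

```python
def parse_bandwidth_matrix(lines, metric_prefix):
    """Parse an nvbandwidth-style bandwidth matrix, skipping 'N/A' cells.

    Returns a dict mapping '<prefix>_<row>_<col>_bw' to bandwidth in GB/s.
    """
    results = {}
    for line in lines:
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # banner, blank, or label line
        if all(t.isdigit() for t in tokens):
            continue  # column-index header row
        row = tokens[0]
        for col, value in enumerate(tokens[1:]):
            if value == 'N/A':
                continue  # unmeasured cell, e.g. the diagonal of a device-to-device matrix
            results[f'{metric_prefix}_{row}_{col}_bw'] = float(value)
    return results


# Trimmed, nvbandwidth-like sample:
sample = [
    'memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)',
    '           0          1',
    ' 0       N/A     242.51',
]
print(parse_bandwidth_matrix(sample, 'host_to_device'))  # {'host_to_device_0_1_bw': 242.51}
```

Skipping the cell, rather than coercing it to 0 or NaN, keeps the reported metrics limited to pairs that were actually measured.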

@polarG added the "bug" (Something isn't working) and "configuration" (Benchmark configurations) labels on Dec 2, 2024
@polarG requested a review from a team as a code owner on December 2, 2024 06:21
codecov bot commented on Dec 2, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 18 lines in your changes missing coverage. Please review.

Project coverage is 85.45%. Comparing base (249e21c) to head (bd6aab2).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...erbench/benchmarks/micro_benchmarks/nvbandwidth.py | 70.00% | 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #675      +/-   ##
==========================================
- Coverage   85.61%   85.45%   -0.16%     
==========================================
  Files          99       99              
  Lines        7165     7210      +45     
==========================================
+ Hits         6134     6161      +27     
- Misses       1031     1049      +18     
Flag | Coverage Δ
cpu-python3.10-unit-test | 71.14% <70.00%> (-0.70%) ⬇️
cpu-python3.7-unit-test | 71.11% <69.49%> (-0.70%) ⬇️
cpu-python3.8-unit-test | 71.15% <70.00%> (-0.68%) ⬇️
cuda-unit-test | 83.27% <70.00%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown.


@dpower4 (Contributor) commented on Dec 5, 2024

@polarG, do we also handle the case where we run tests that are not valid for the underlying system? Such tests produce a "Waived" output.
For example, running device_to_device_memcpy_read_sm on a single-GPU machine results in:

```
nvidia@localhost:/home/nvidia/nvbandwidth$ ./nvbandwidth -t 18
nvbandwidth Version: v0.5
Built from Git version:

NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.

CUDA Runtime Version: 12040
CUDA Driver Version: 12040
Driver Version: 550.54.15

Device 0: NVIDIA GH200 480GB

Waived:
```

@abuccts (Member) left a comment

Please add test cases for the "N/A" and "Waived" cases.
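
A rough, self-contained sketch of the kind of test being requested; `_parse` below is a tiny stand-in for the benchmark's real parser, not the implementation in nvbandwidth.py:

```python
import unittest


def _parse(raw_output):
    """Tiny stand-in parser (illustrative only): collect bandwidths, skip 'N/A', detect 'Waived:'."""
    metrics, waived = {}, False
    for line in raw_output.splitlines():
        if line.startswith('Waived:'):
            waived = True
            continue
        tokens = line.split()
        if not tokens or not tokens[0].isdigit() or all(t.isdigit() for t in tokens):
            continue  # skip banners, blanks, and column-index header rows
        for col, value in enumerate(tokens[1:]):
            if value != 'N/A':
                metrics[f'bw_{tokens[0]}_{col}'] = float(value)
    return metrics, waived


class NvBandwidthOutputTest(unittest.TestCase):
    def test_na_cells_are_skipped(self):
        metrics, waived = _parse(' 0       N/A     242.51\n')
        self.assertEqual(metrics, {'bw_0_1': 242.51})
        self.assertFalse(waived)

    def test_waived_run_is_detected(self):
        metrics, waived = _parse('Device 0: NVIDIA GH200 480GB\n\nWaived:\n')
        self.assertEqual(metrics, {})
        self.assertTrue(waived)


if __name__ == '__main__':
    unittest.main()
```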

Review comments (outdated, resolved) on superbench/benchmarks/micro_benchmarks/nvbandwidth.py and tests/benchmarks/micro_benchmarks/test_nvbandwidth.py.
@polarG (Contributor, Author) commented on Dec 5, 2024

> @polarG, do we also handle the case where we run tests that are not valid for the underlying system? Such tests produce a "Waived" output. [...]

Good point! I will try to catch this in the code.
For the waived test cases, shall we show a negative value in the report, or just add a log containing the test name/index? @abuccts @dpower4

@dpower4 (Contributor) commented on Dec 6, 2024

> Good point! I will try to catch this in the code. For the waived test cases, shall we show a negative value in the report, or just add a log containing the test name/index?

It's better to show the waived tests in the report, in line with how other failed benchmarks are treated.
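
A minimal sketch of one possible shape for that; the helper name and the -1.0 sentinel are illustrative assumptions, not the merged code:

```python
WAIVED_SENTINEL = -1.0  # assumed placeholder so a waived test still appears in the report


def flag_waived(test_name, raw_output, metrics):
    """If nvbandwidth waived the run, record sentinel metrics instead of dropping the test silently."""
    if 'Waived:' in raw_output:
        metrics[f'{test_name}_bw'] = WAIVED_SENTINEL
        metrics[f'{test_name}_waived'] = 1
        return True
    return False
```

A negative sentinel keeps the waived test visible and easy to filter downstream, in line with how other failed benchmarks surface, while the extra `*_waived` metric makes the reason explicit.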

@guoshzhao (Contributor) commented
Looks good to me. One more question about docs: do any documents need to be updated to align with this change?
