RE-Implemented NVIDIA Energy capture via C #1167
4 file(s) reviewed, 3 comment(s)
{"help", no_argument, NULL, 'h'},
{"interval", no_argument, NULL, 'i'},
logic: the interval option is marked as no_argument, but it takes a value, so getopt_long needs required_argument here.
Suggested change:
  {"help", no_argument, NULL, 'h'},
- {"interval", no_argument, NULL, 'i'},
+ {"interval", required_argument, NULL, 'i'},
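For illustration, here is a minimal sketch of how the corrected option table might be consumed. The helper name parse_interval_ms, the short-option string "hi:", the millisecond unit, and the default value are assumptions made for this example and are not taken from source.c. The point it shows is that required_argument in the long-option table (and the trailing colon after i in the short-option string) is what makes getopt_long fill optarg with the value.

```c
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the PR): returns the value passed via
 * -i/--interval, default_ms if the option is absent, or -1 on a bad option. */
static long parse_interval_ms(int argc, char **argv, long default_ms) {
    static const struct option long_options[] = {
        {"help",     no_argument,       NULL, 'h'},
        {"interval", required_argument, NULL, 'i'}, /* takes a value */
        {NULL, 0, NULL, 0}
    };
    long interval_ms = default_ms;
    int c;

    /* Restart scanning so the helper can be called more than once.
     * Assumption: glibc-style getopt; BSD variants also need optreset. */
    optind = 1;

    /* "i:" mirrors required_argument for the short form -i. */
    while ((c = getopt_long(argc, argv, "hi:", long_options, NULL)) != -1) {
        switch (c) {
        case 'i':
            /* optarg points at the option's value, e.g. "50" */
            interval_ms = strtol(optarg, NULL, 10);
            break;
        case 'h':
            printf("usage: %s [-h] [-i msec]\n", argv[0]);
            break;
        default:
            return -1; /* unknown option or missing value */
        }
    }
    return interval_ms;
}
```

A caller would typically invoke this once at the top of main, for example parse_interval_ms(argc, argv, 100). With no_argument in the table, --interval 50 would leave optarg as NULL and treat 50 as a stray positional argument, which is the bug the review comment points at.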
@ribalba Can you please check what the correct install command is for the libraries under Fedora? ChatGPT suggested:
sudo dnf install cuda-nvml-dev
It also mentioned that the package is not in the distribution's repos and that you need to add:
sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/fedora$(rpm -E %fedora)/x86_64/cuda-fedora.repo
Eco CI Output:
🌳 CO2 Data: Total cost of whole PR so far:
We were experiencing sampling-rate issues with the nvidia-smi implementation: the sampling jitter was too high. This is a re-implementation in C. It is still slightly slower than our other providers, but performant enough that we can sample at intervals below 100 ms.

Greptile Summary
Re-implemented NVIDIA GPU power monitoring from Bash to C using NVML library, significantly improving sampling rate consistency and enabling sub-100ms measurement intervals.
- source.c with direct NVML library integration for precise GPU power measurements
- metric-provider-nvidia-smi-wrapper.sh shell script to eliminate sampling jitter issues
- Makefile with -O3 optimization and NVML library linkage for performance
- provider.py to support card model identification and handle the new C-based metrics format
- source.c to prevent thread state persistence