GetComputeRunningProcesses does not work for multiple processes #75
Hi, I could also observe this issue on my V100 with CUDA 12.2, while an A100 with CUDA 12.0 works well. I tried to "see" what I get from the function:

```go
func deviceGetComputeRunningProcesses_v3(Device Device) ([]ProcessInfo, Return) {
	var InfoCount uint32 = 1 // Will be reduced upon returning
	for {
		Infos := make([]ProcessInfo, InfoCount)
		ret := nvmlDeviceGetComputeRunningProcesses_v3(Device, &InfoCount, &Infos[0])
		if ret == SUCCESS {
			fmt.Printf("### Start: deviceGetComputeRunningProcesses_v3: Start length %d\n", InfoCount)
			// Dump each returned entry as raw bytes.
			for i := 0; i < int(InfoCount); i++ {
				ptr := unsafe.Pointer(&Infos[i])
				vLen := int(unsafe.Sizeof(Infos[i]))
				v := unsafe.Slice((*byte)(ptr), vLen)
				fmt.Printf("Info[%d], ptr %x, bytes_ptr %x, bytes_len %d\n", i, ptr, v, vLen)
				fmt.Printf("Info[%d]: %+v\n", i, v[:vLen])
			}
			// Dump the whole buffer as filled by the driver, including bytes
			// beyond the expected InfoCount*sizeof(ProcessInfo).
			ptr := unsafe.Pointer(&Infos[0])
			vLen := int(unsafe.Sizeof(Infos[0]))*int(InfoCount) + 16 // Run for A100 was with +8, but the A100 works anyway
			v := unsafe.Slice((*byte)(ptr), vLen)
			fmt.Printf("All bytes, as is:")
			for i := 0; i < vLen; i++ {
				fmt.Printf("%#x, ", v[i])
			}
			fmt.Println("")
			fmt.Println("### End: deviceGetComputeRunningProcesses_v3")
			return Infos[:InfoCount], ret
		}
		if ret != ERROR_INSUFFICIENT_SIZE {
			return nil, ret
		}
		InfoCount *= 2
	}
}
```

It seems that there is an extra 8 bytes between the returned entries.

Layout (as it is in nvml.h):
```c
typedef struct nvmlProcessInfo_st
{
    unsigned int       pid;               //!< Process ID
                                          //   Offset: 0 bytes, Size: 4 bytes (+4 bytes padding)
    unsigned long long usedGpuMemory;     //!< Amount of used GPU memory in bytes.
                                          //!< Under WDDM, \ref NVML_VALUE_NOT_AVAILABLE is always reported
                                          //!< because Windows KMD manages all the memory and not the NVIDIA driver
                                          //   Offset: 8 bytes, Size: 8 bytes
    unsigned int       gpuInstanceId;     //!< If MIG is enabled, stores a valid GPU instance ID. gpuInstanceId is
                                          //   set to 0xFFFFFFFF otherwise.
                                          //   Offset: 16 bytes, Size: 4 bytes
    unsigned int       computeInstanceId; //!< If MIG is enabled, stores a valid compute instance ID. computeInstanceId
                                          //   is set to 0xFFFFFFFF otherwise.
                                          //   Offset: 20 bytes, Size: 4 bytes
} nvmlProcessInfo_t;
```

So the total size of the structure is supposed to be 24 bytes. Also, if on my x86_64 PC I create a simple file, include nvml.h, and write:

```c
nvmlProcessInfo_t data[2];
int x = sizeof(data);
```

then I can see in the clangd language server (I did not try to compile it, though) that x is 48 bytes.
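To make the effect of those extra 8 bytes concrete, here is a small, self-contained Go sketch (my own illustration, not go-nvml code; the PIDs, the memory value, and the assumption that the padding sits after every entry are made up for the example). It builds a synthetic buffer written with a 32-byte stride and decodes it once with the documented 24-byte stride and once with a 32-byte stride:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// processInfo mirrors the documented 24-byte nvmlProcessInfo_t layout.
type processInfo struct {
	Pid               uint32
	UsedGpuMemory     uint64
	GpuInstanceId     uint32
	ComputeInstanceId uint32
}

// decode reads count entries from buf assuming little-endian x86_64 and the
// documented field offsets (0, 8, 16, 20), but with a caller-chosen stride.
func decode(buf []byte, stride, count int) []processInfo {
	out := make([]processInfo, 0, count)
	for i := 0; i < count; i++ {
		b := buf[i*stride:]
		out = append(out, processInfo{
			Pid:               binary.LittleEndian.Uint32(b[0:4]),
			UsedGpuMemory:     binary.LittleEndian.Uint64(b[8:16]),
			GpuInstanceId:     binary.LittleEndian.Uint32(b[16:20]),
			ComputeInstanceId: binary.LittleEndian.Uint32(b[20:24]),
		})
	}
	return out
}

func main() {
	// Fake driver output: two entries written with a 32-byte stride
	// (24 documented bytes plus 8 undocumented padding bytes per entry).
	buf := make([]byte, 2*32)
	for i, pid := range []uint32{1111, 2222} {
		off := i * 32
		binary.LittleEndian.PutUint32(buf[off+0:], pid)
		binary.LittleEndian.PutUint64(buf[off+8:], 100<<20) // 100 MiB "used"
		binary.LittleEndian.PutUint32(buf[off+16:], 0xFFFFFFFF)
		binary.LittleEndian.PutUint32(buf[off+20:], 0xFFFFFFFF)
	}

	// With the documented 24-byte stride the second entry is garbled:
	// Pid comes out as 0 and the real PID lands in UsedGpuMemory,
	// which matches the symptom reported in this issue.
	fmt.Printf("24-byte stride: %+v\n", decode(buf, 24, 2))

	// With a 32-byte stride both entries decode correctly.
	fmt.Printf("32-byte stride: %+v\n", decode(buf, 32, 2))
}
```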
A100, CUDA 12.0 (working well):

Output of the function above (dump omitted).

Here is a split of the "All bytes" output with my comments: the first process info (24 bytes), then the values of the second process (dumps omitted).

Here is what I get from V100 and CUDA 12.2:

V100, CUDA 12.2:

All bytes (split into chunks, dump omitted).

If we take into account the "undocumented 8 bytes padding" (skip it), the values line up with the documented layout. I don't see any obvious solution for the issue for now.
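If the padding is real, one conceivable workaround would be to read the entries out of the driver-filled buffer at an explicit stride instead of relying on unsafe.Sizeof(ProcessInfo). The sketch below is only an illustration of that idea, not a go-nvml API: it is written as if inside the same package as the debug function above (with "unsafe" imported), the 32-byte stride is an assumption taken from the dumps, and the buffer passed to the driver would also have to be allocated with that larger stride.

```go
// assumedStride is the per-entry size suggested by the dumps above:
// 24 documented bytes plus 8 undocumented padding bytes.
const assumedStride = 32

// entriesFromRawBuffer is a hypothetical helper (not a go-nvml API): buf is
// the raw byte buffer the driver filled with count entries at assumedStride.
func entriesFromRawBuffer(buf []byte, count int) []ProcessInfo {
	out := make([]ProcessInfo, count)
	for i := 0; i < count; i++ {
		// Reinterpret the first 24 bytes of each stride-sized chunk as a ProcessInfo.
		out[i] = *(*ProcessInfo)(unsafe.Pointer(&buf[i*assumedStride]))
	}
	return out
}
```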
An interesting observation: When I commented out
and the processes are detected as expected:
So it looks like there is an issue with
We upgraded the NVIDIA drivers on our servers with V100s, and it seems that this fixed the issue. The upgrade was from version 535.54.03 to 535.154.05, on CUDA 12.2.
Hey NVIDIA team,
When multiple processes are on one GPU, the output of `device.GetComputeRunningProcesses()` is wrong. This is on CUDA 12.2, and this bug seems very similar to the closed bug here, which occurred on an earlier CUDA version. The PID of process [1] is shown as 0, and the actual PID appears under the GPU memory usage field.
I managed to also test this on an A100, and the bug does not happen on that card on CUDA 12.2.
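For reference, the call in question is reached with something like the following minimal program (a sketch: the device index, the error handling, and the idea of starting two CUDA processes on GPU 0 beforehand are illustrative, not part of the report):

```go
package main

import (
	"fmt"
	"log"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	// Initialize NVML and shut it down again on exit.
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		log.Fatalf("nvml.Init failed: %v", nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	// Start at least two CUDA processes on GPU 0 before running this.
	device, ret := nvml.DeviceGetHandleByIndex(0)
	if ret != nvml.SUCCESS {
		log.Fatalf("DeviceGetHandleByIndex failed: %v", nvml.ErrorString(ret))
	}

	procs, ret := device.GetComputeRunningProcesses()
	if ret != nvml.SUCCESS {
		log.Fatalf("GetComputeRunningProcesses failed: %v", nvml.ErrorString(ret))
	}

	// On the affected V100 + driver combination, the second entry comes back
	// with Pid == 0 and the real PID showing up in UsedGpuMemory.
	for i, p := range procs {
		fmt.Printf("process[%d]: pid=%d usedGpuMemory=%d bytes\n", i, p.Pid, p.UsedGpuMemory)
	}
}
```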