On systems with shared memory, the total device memory returned by pynvml is 0. This results in ZeroDivisionError exceptions in places where this value is used.
[E 2025-05-07 14:43:37.537 ServerApp] Exception in callback <bound method GPUResourceWebSocketHandler.send_data of <jupyterlab_nvdashboard.apps.gpu.GPUResourceWebSocketHandler object at 0xf96f511b0170>>
Traceback (most recent call last):
File "/home/jtomlinson/miniforge3/envs/rapids-25.04/lib/python3.12/site-packages/tornado/ioloop.py", line 937, in _run
val = self.callback()
^^^^^^^^^^^^^^^
File "/home/jtomlinson/miniforge3/envs/rapids-25.04/lib/python3.12/site-packages/jupyterlab_nvdashboard/apps/gpu.py", line 113, in send_data
(stats["gpu_memory_total"] / gpu_mem_sum) * 100, 2
~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
ZeroDivisionError: float division by zero
(stats["gpu_memory_total"] / gpu_mem_sum) * 100, 2 |
It would be good to handle this more gracefully. Some graphs just fail to update, while others, like the memory usage graph, show 18 EB of memory.
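As a minimal sketch of the graceful handling (not the actual fix, and the safe_percentage helper below is hypothetical), the percentage calculation in send_data could guard against a zero total before dividing:

def safe_percentage(used: float, total: float) -> float:
    """Return used as a percentage of total, or 0.0 when total is 0."""
    if total <= 0:
        return 0.0
    return round((used / total) * 100, 2)

# In send_data, instead of:
#     round((stats["gpu_memory_total"] / gpu_mem_sum) * 100, 2)
# the handler could call:
#     safe_percentage(stats["gpu_memory_total"], gpu_mem_sum)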

In this case we probably need to query the host memory via psutil and display that data instead.
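A rough sketch of that fallback, assuming psutil is available; device_memory_total is an illustrative helper, not part of the dashboard's existing code:

import psutil
import pynvml

def device_memory_total(handle) -> int:
    """Total memory for an NVML device, falling back to host RAM when NVML reports 0."""
    total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
    if total == 0:
        # Shared-memory system: NVML reports 0, so approximate with host memory.
        total = psutil.virtual_memory().total
    return total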
Unfortunately, this is only reproducible on machines where GPU memory is reported by NVML as 0. But if you have such a system you can run the following script.
# memory_mre.py
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "nvidia-ml-py",
# ]
# ///
import pynvml

pynvml.nvmlInit()
print("Detecting GPU memory")
for gpu_idx in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_idx)
    print(f"GPU {gpu_idx}: {pynvml.nvmlDeviceGetMemoryInfo(handle).total} bytes")