The NVIDIA® Network Operator is a Kubernetes application that manages and optimizes the software components for networking between NVIDIA GPUs in the cloud. It automates many network setup tasks, including the configuration of high-performance networking features such as RDMA (Remote Direct Memory Access) and GPUDirect, which are crucial for applications that require low latency and high throughput. The operator is particularly useful in environments where NVIDIA GPUs run compute-intensive tasks, because it ensures that the network can support the high data transfer demands of such applications.
Your cluster must have a node group attached to a Compute Cloud GPU cluster.
Optimize GPU networking in Kubernetes with NVIDIA® Network Operator on Nebius AI.
Before installing this product:
- Create a GPU cluster in Compute Cloud.
- Create a Kubernetes cluster and a node group in it. When creating the group, select the created GPU cluster for it.
To install the product:
- Click Install.
- Wait for the application to change its status to Deployed.
To check that the NVIDIA Network Operator is working:
- Install kubectl and configure it to work with the created cluster.
- Check that NVIDIA Network Operator pods are running:
  kubectl get pods -n <namespace>
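The pod check above can also be scripted. The following is a minimal sketch, not part of the product: it assumes kubectl is already configured for the cluster, and the helper names `unhealthy_pods` and `check_namespace` are hypothetical. It reads the JSON output of `kubectl get pods` and lists every pod that is not both in the Running phase and reporting all containers ready. Pods that completed successfully are treated as healthy.

```python
import json
import subprocess


def unhealthy_pods(pod_list):
    """Return names of pods that are neither running-and-ready nor completed.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    Hypothetical helper for illustration only.
    """
    bad = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # a pod that ran to completion is not a failure
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            bad.append(name)
    return bad


def check_namespace(namespace):
    """Run kubectl for one namespace and return the names of unhealthy pods.

    Assumes kubectl is installed and points at the created cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return unhealthy_pods(json.loads(result.stdout))
```

Calling `check_namespace` with the namespace the application was installed in returns an empty list when every pod is ready. Otherwise it returns the names of the pods to inspect further, for example with `kubectl describe pod`.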
- Automating management of software components for GPU networking in Kubernetes clusters.
- Building fast infrastructures for high-performance computing (HPC) and AI workloads.
- NVIDIA Network Operator documentation
- NVIDIA Network Operator in NVIDIA NGC Catalog
- NVIDIA Network Operator on GitHub
By using the application, you agree to the terms and conditions of the helm-chart and of NVIDIA® Network Operator.