Nebius package for NVIDIA® Network Operator

Description

The NVIDIA® Network Operator is a Kubernetes operator that manages and optimizes the software components used for networking between NVIDIA GPUs in the cloud. It automates many network setup tasks, including the configuration of high-performance networking features such as RDMA (Remote Direct Memory Access) and GPUDirect, which are crucial for applications that require low latency and high throughput. The operator is particularly useful in environments where NVIDIA GPUs run compute-intensive tasks, because it ensures that the network can sustain the high data transfer demands of such applications.

Your cluster must have a node group attached to a Compute Cloud GPU cluster.

Short description

Optimize GPU networking in Kubernetes with NVIDIA® Network Operator on Nebius AI.

Tutorial

Before installing this product:

  1. Create a GPU cluster in Compute Cloud.
  2. Create a Kubernetes cluster and a node group in it. When creating the node group, select the GPU cluster you created in step 1.

To install the product:

  1. Click Install.
  2. Wait for the application to change its status to Deployed.

Usage

To check that the NVIDIA Network Operator is working:

  1. Install kubectl and configure it to work with the created cluster.

  2. Check that NVIDIA Network Operator pods are running:

    kubectl get pods -n <namespace>
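The check above can also be scripted. The following is a minimal sketch, not part of the package: the function name `pods_not_running` is hypothetical, and the namespace is whatever the application was installed into. It relies only on the standard `kubectl get` options `--field-selector` and `-o name`.

```shell
#!/usr/bin/env bash
# Sketch: list pods in a namespace that are not in the Running or Succeeded phase.
# The function name is illustrative; pass the namespace the operator was installed into.

# Prints the names of pods that are not Running or Succeeded.
# Returns 0 if no such pods were found, 1 if some were found,
# and 2 if kubectl itself failed (for example, no cluster access).
pods_not_running() {
  local namespace="$1"
  local pending
  pending="$(kubectl get pods -n "$namespace" \
    --field-selector=status.phase!=Running,status.phase!=Succeeded \
    -o name)" || return 2
  if [ -n "$pending" ]; then
    printf '%s\n' "$pending"
    return 1
  fi
  return 0
}
```

For example, `pods_not_running <namespace> && echo "no pending or failed pods"`. Note that an empty namespace also yields a zero exit code, so it is still worth running `kubectl get pods -n <namespace>` once to confirm that the operator pods exist at all.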

Use cases

  • Automating management of software components for GPU networking in Kubernetes clusters.
  • Building fast infrastructures for high-performance computing (HPC) and AI workloads.

Links

Terms of service

Legal

By using the application, you agree to the terms and conditions of the Helm chart and NVIDIA® Network Operator.