SNCCL

Optimized primitives for inter-GPU communication.

SNCCL is based on NCCL (NVIDIA Collective Communications Library) and extends it to support collective communication across data centers. It uses Mongoose (https://github.com/cesanta/mongoose/tree/master) to provide the TCP connections between data centers.

Deployment

To establish connections, set the SERVER_ADDR parameter in server.h. The current implementation does not include code for broadcasting SERVER_ADDR in multi-machine scenarios. Once SERVER_ADDR is configured correctly, GPUs in different data centers automatically exchange data over the TCP connections pre-established between the servers.
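Since this README does not say where server.h lives in the source tree, a small helper can locate the setting before it is edited. This is a minimal sketch: the function name show_server_addr is hypothetical, and it only assumes that the identifier SERVER_ADDR appears literally in a file named server.h.

```shell
# Print every line mentioning SERVER_ADDR in any server.h under the given root.
# The search root defaults to the current directory; the search is recursive
# because the location of server.h is not stated in this README.
show_server_addr() {
  local root="${1:-.}"
  find "$root" -type f -name server.h -exec grep -Hn 'SERVER_ADDR' {} +
}
```

Running `show_server_addr .` from the repository root prints the file, line number, and current value, which is the line to edit before rebuilding.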

Introduction

NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.

For more information on NCCL usage, please refer to the NCCL documentation.

Build

Note: the official and tested builds of NCCL can be downloaded from: https://developer.nvidia.com/nccl. You can skip the following build steps if you choose to use the official builds.

To build the library:

$ cd nccl
$ make -j src.build

If CUDA is not installed in the default /usr/local/cuda path, you can define the CUDA path with:

$ make src.build CUDA_HOME=<path to cuda install>

NCCL will be compiled and installed in build/ unless BUILDDIR is set.
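To run programs against the locally built library rather than a system-wide one, the dynamic loader has to find it. The sketch below assumes the shared library ends up in a lib/ subdirectory of the build directory; nccl_lib_dir is a hypothetical helper name, not part of the build system.

```shell
# Directory expected to hold the built shared library. Assumes the library is
# placed in "<build dir>/lib", with the build dir defaulting to ./build.
nccl_lib_dir() {
  printf '%s/lib\n' "${BUILDDIR:-$PWD/build}"
}

# Prepend it to the loader search path without leaving a trailing colon.
export LD_LIBRARY_PATH="$(nccl_lib_dir)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

If BUILDDIR was passed to make, export the same value before sourcing this snippet so both point at the same directory.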

By default, NCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining NVCC_GENCODE (defined in makefiles/common.mk) to only include the architecture of the target platform:

$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"
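The NVCC_GENCODE value follows directly from the GPU's compute capability (7.0 gives compute_70 and sm_70 in the command above). As a sketch, a hypothetical helper gencode_for_cap can build the string for any capability:

```shell
# Convert a compute capability such as "8.0" into an NVCC_GENCODE value.
gencode_for_cap() {
  local cap
  cap="$(printf '%s' "$1" | tr -d '.')"   # "8.0" -> "80"
  printf '%s\n' "-gencode=arch=compute_${cap},code=sm_${cap}"
}

# Usage (assumes an nvidia-smi recent enough to support the compute_cap field):
#   cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
#   make -j src.build NVCC_GENCODE="$(gencode_for_cap "$cap")"
```

This targets a single architecture; machines with mixed GPU generations would need one -gencode entry per architecture.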

Install

To install NCCL on the system, create a package then install it as root.

Debian/Ubuntu:

$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/

RedHat/CentOS:

$ # Install tools to create rpm packages
$ sudo yum install rpm-build rpmdevtools
$ # Build NCCL rpm package
$ make pkg.redhat.build
$ ls build/pkg/rpm/

OS-agnostic tarball:

$ make pkg.txz.build
$ ls build/pkg/txz/

Tests

Tests for NCCL are maintained separately at https://github.com/nvidia/nccl-tests.

$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
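In the command above, -b and -e set the minimum and maximum message size, -f is the multiplication factor between consecutive sizes, and -g is the number of GPUs per thread. To see which sizes such a sweep covers, the loop can be reproduced in a few lines. This is an illustrative sketch only: sweep_sizes is a hypothetical helper, and 256M is taken to mean 256 * 1024 * 1024 bytes.

```shell
# List the message sizes (in bytes) from min to max, multiplying by factor.
# The factor must be greater than 1, otherwise the loop never terminates.
sweep_sizes() {
  local size="$1" max="$2" factor="$3"
  while [ "$size" -le "$max" ]; do
    printf '%s\n' "$size"
    size=$((size * factor))
  done
}

# Sizes visited for -b 8 -e 256M -f 2.
sweep_sizes 8 $((256 * 1024 * 1024)) 2
```

With these values the sweep doubles from 8 bytes up to 268435456 bytes, so each run reports 26 message sizes.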
