- Clone the repository
git clone git@github.com:hipersys-team/checkmate.git
cd checkmate
git submodule update --init --recursive
- Set up the environment: run build.sh WITHOUT a conda environment
conda deactivate
chmod a+x script/*.sh
script/build.sh
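Before invoking the build, it can help to confirm that no conda environment is still active. The following is a minimal sketch, not part of the repository: the helper name `conda_is_active` is hypothetical, and it assumes conda exports `CONDA_PREFIX` while an environment is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 (true) when a conda environment appears active.
# Assumption: conda exports CONDA_PREFIX while an environment is active.
conda_is_active() {
    [ -n "${CONDA_PREFIX:-}" ]
}

if conda_is_active; then
    echo "conda environment still active: run 'conda deactivate' first" >&2
else
    echo "no conda environment detected"
fi
```

If the check reports an active environment, run `conda deactivate` again (nested environments may need more than one call) before running script/build.sh.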
- Add Hugepages for DPDK
cd script/
mkdir -p /tmp/mnt/huge
sudo ./dpdk-hugepages.py --mount --directory /tmp/mnt/huge --user `id -u` --group `id -g` --setup 8G
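To check that the reservation took effect, the hugepage counters in `/proc/meminfo` can be inspected. The sketch below is illustrative only: the function name `hugepages_total_kb` is hypothetical, and it reports only the default hugepage size, since that is the only size `/proc/meminfo` lists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print total hugepage memory in kB from a meminfo-style file.
# Defaults to /proc/meminfo; multiplies HugePages_Total by Hugepagesize (kB).
hugepages_total_kb() {
    awk '/^HugePages_Total:/ {n = $2}
         /^Hugepagesize:/    {sz = $2}
         END {print n * sz}' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "hugepage memory reserved: $(hugepages_total_kb) kB"
fi
```

For the 8G setup above, a successful reservation would correspond to 8388608 kB, assuming all pages use the default hugepage size.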
Information can be found in training.
Note
To specify the number of storage nodes, set the environment variable
NUM_STORAGE to the desired number.
Information can be found in storage.
Note
To specify the number of training nodes, set the environment variable
NUM_TRAINING to the desired number.
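As a brief illustration of the two notes above, both variables can be exported in the shell before launching the corresponding scripts. The values here are placeholders, not recommendations, and how the scripts consume the variables is not shown in this section.

```shell
#!/usr/bin/env bash
# Illustrative values only; choose counts that match your cluster.
export NUM_STORAGE=2    # number of storage nodes
export NUM_TRAINING=4   # number of training nodes

echo "storage nodes: ${NUM_STORAGE}, training nodes: ${NUM_TRAINING}"
```

Exporting the variables (rather than only assigning them) makes them visible to any child process started from the same shell.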
Most steps are automated by the build.sh script. However, if you want to build manually, follow the steps below.
cd third_party/libtpa
make -j
sudo -E make install
Important
For newer NICs such as ConnectX-7, use DPDK v22.11 by setting export DPDK_VERSION=v22.11
cd third_party/nccl
sudo make -j src.install
cd third_party/nccl-plugin/cc
make clean; make -j