Reproducing ML Model Serving Experiments on AWS

(About 30 min)

Setup

(About 15 min)

If you are provided with an AWS IAM account & pre-built binaries

If you are not provided with an AWS account, or if you want to build everything from scratch, see cluster-config.

ML model serving experiments (Figure 11)

After logging in to the configured cluster, change into this directory (ray_serve) in the hoplite repo.

Here is how you run the experiments:

Baseline (2-3 min): python model_ensembling.py ${scale}

Hoplite (1-2 min): python hoplite_model_ensembling.py ${scale}

${scale} controls the cluster size: scale=1 corresponds to the 8-GPU-node setting in the figure, and scale=2 to the 16-GPU-node setting. To run both variants at both scales in one go, see the sketch below.
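The following is a hypothetical convenience runner, not part of the repo; it assumes it is launched from this directory and simply shells out to the two scripts:

```python
import subprocess

# Hypothetical convenience runner (not part of the repo): runs the baseline
# and Hoplite scripts at both scales from this directory.
for scale in (1, 2):  # scale=1 -> 8 GPU nodes, scale=2 -> 16 GPU nodes
    for script in ("model_ensembling.py", "hoplite_model_ensembling.py"):
        print(f"=== {script}, scale={scale} ===")
        subprocess.run(["python", script, str(scale)], check=True)
```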

The script prints the mean and std of throughput (queries/s) at the end.
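The reported numbers are a standard mean/std reduction over per-trial throughput. A minimal illustration of that reduction follows; the function name and sample values are hypothetical, and the repo's scripts do their own measurement:

```python
import numpy as np

def summarize_throughput(queries_per_trial, durations_s):
    """Per-trial throughput (queries/s), reduced to mean and std.

    Hypothetical helper for illustration; not the repo's implementation.
    """
    throughput = np.asarray(queries_per_trial, dtype=float) / np.asarray(durations_s)
    return throughput.mean(), throughput.std()

# Hypothetical sample values, just to show the shape of the output.
mean_qps, std_qps = summarize_throughput([1000, 1000, 1000], [12.1, 11.8, 12.4])
print(f"throughput: {mean_qps:.2f} +/- {std_qps:.2f} queries/s")
```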

ML Model Serving fault tolerance experiments (Figure 12a)

Baseline + fault tolerance test (About 2 min): python model_ensembling_fault_tolerance.py 1

Hoplite + fault tolerance test (About 2 min): python hoplite_model_ensembling_fault_tolerance.py 1

Run python analyze_fault_tolerance.py to compare the failure detection latencies of the two runs (see Section 5.5 in the paper).
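The quantity being compared is the gap between the moment a node fails and the moment the serving system notices. A hedged sketch of that reduction follows; the actual logic lives in analyze_fault_tolerance.py, and every name and timestamp here is hypothetical:

```python
def detection_latencies(failure_ts, detected_ts):
    """Failure-detection latency per trial: detection time minus failure time.

    Hypothetical helper for illustration; analyze_fault_tolerance.py has the
    real comparison. Inputs are parallel lists of timestamps in seconds.
    """
    return [d - f for f, d in zip(failure_ts, detected_ts)]

# Hypothetical timestamps (seconds since experiment start).
latencies = detection_latencies([5.0, 65.0], [5.8, 65.7])
print(f"mean detection latency: {sum(latencies) / len(latencies):.2f} s")
```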

Notes

The first run on AWS will be noticeably slow (about 4 min) because Python generates cache files and other one-time artifacts. This is normal.