Commit 89c5bcc

[FSTORE-690] Setup Online Feature Store Benchmark with Locust (logicalclocks#948)

1 parent 607ed6f commit 89c5bcc

11 files changed: +456 −1 lines

locust_benchmark/Dockerfile

+11
```dockerfile
FROM locustio/locust:2.15.1

USER root

RUN apt-get update -y && apt-get -y install gcc python3-dev librdkafka-dev git

USER locust
WORKDIR /home/locust

COPY requirements.txt .
RUN pip3 install -r requirements.txt
```

locust_benchmark/README.md

+178
## Online Feature Store Benchmark Suite

This directory provides the tools to quickly bootstrap the process of benchmarking the Hopsworks Online Feature Store.

The benchmark suite is built around [locust](https://locust.io/), which lets you quickly spawn so-called user classes to perform requests against a system.

In the feature store context, a user class represents a serving application performing requests against the online feature store, looking up feature vectors.

Locust takes care of the hard part: collecting and aggregating the results.

### Quickstart: Single-process locust

By default, each user class is spawned in its own Python thread. Due to the Python Global Interpreter Lock, this means that when you run locust in a single Python process, all user classes share the same core.

This section illustrates how to quickly run locust in a single process; see the next section for how to scale locust with a master and multiple worker processes.
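The effect of the Global Interpreter Lock is easy to observe: CPU-bound work split across two Python threads takes about as long as running it sequentially. A minimal, self-contained sketch (not part of the benchmark suite):

```python
import threading
import time


def busy(n):
    # CPU-bound work: no I/O, so the GIL serializes it across threads
    total = 0
    for i in range(n):
        total += i * i
    return total


N = 2_000_000

start = time.time()
busy(N)
busy(N)
sequential = time.time() - start

start = time.time()
threads = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

# On CPython the threaded version is no faster than the sequential one,
# because the GIL lets only one thread execute Python bytecode at a time.
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

This is why the distributed setup below uses multiple worker processes rather than more threads.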
#### Prerequisites

1. Hopsworks cluster
    1. You can follow the documentation on how to set up a cluster on [managed.hopsworks.ai](https://docs.hopsworks.ai/3.1/setup_installation/aws/cluster_creation/).
    2. Ideally, create an extra node in the same VPC as the Hopsworks cluster from which to run the benchmark suite. The easiest way to do this is by using a dedicated [API node](https://docs.hopsworks.ai/3.1/setup_installation/common/rondb/#api-nodes).
    3. Once the cluster is created, you need the following information:
        - Hopsworks domain name: **[UUID].cloud.hopsworks.ai**, which you can find in the *Details* tab of your managed.hopsworks.ai control plane.
        - A project and its name on the Hopsworks cluster. For that, log into your cluster and create a project.
        - An API key for authentication. Head to your project and [create an API key for your account](https://docs.hopsworks.ai/3.1/user_guides/projects/api_key/create_api_key/).
2. Python environment

    In order to run the benchmark suite, you need a Python environment. For that we will SSH into the created API node. Make sure that [SSH is enabled](https://docs.hopsworks.ai/3.1/setup_installation/common/services/) on your cluster.

    1. On AWS, all nodes have public IPs, so you can head to your cloud console to find the IP of the API node and SSH directly:

        `ssh ubuntu@[ip-of-api-node] -i your-ssh-key.pem`

        On Azure, you have to proxy through the Hopsworks head node; you can use either its public IP or the Hopsworks domain name noted above.
        On GCP, SSH keys are not attached to the nodes, but you can use the GCP CLI to create an SSH session.
    2. API nodes come with Anaconda pre-installed, which we can use to create a Python environment and install our dependencies:

        ```bash
        conda init bash
        ```

        After this you need to start a new SSH session.

        ```bash
        conda create -n locust-benchmark python=3.10
        conda activate locust-benchmark
        ```

        Clone the feature-store-api repository to retrieve the benchmark suite and install the requirements. Make sure to check out the release branch matching the Hopsworks version of your cluster.

        ```bash
        git clone https://github.com/logicalclocks/feature-store-api.git
        cd feature-store-api
        git checkout master
        cd locust_benchmark
        pip install -r requirements.txt
        ```
#### Creating the feature group for lookups

1. Save the previously created API key in a file named `.api_key`:

    ```bash
    echo "[YOUR KEY]" > .api_key
    ```
2. Now configure the test by modifying the `hopsworks_config.json` template:

    ```json
    {
        "host": "[UUID].cloud.hopsworks.ai",
        "port": 443,
        "project": "test",
        "external": false,
        "rows": 100000,
        "schema_repetitions": 1,
        "recreate_feature_group": true,
        "batch_size": 100
    }
    ```

    - `host`: domain name of your Hopsworks cluster.
    - `port`: optional; for managed.hopsworks.ai it should be `443`.
    - `project`: the project to run the benchmark in.
    - `external`: if you are not running the benchmark suite from the same VPC as Hopsworks, set this to `true`.
    - `rows`: number of rows/unique primary key values in the feature group used for lookup benchmarking.
    - `schema_repetitions`: controls the number of features for the lookup. One schema repetition results in 10 features plus the primary key; five repetitions result in 50 features plus the primary key.
    - `recreate_feature_group`: controls whether the previous feature group should be dropped and recreated. Set this to `true` when rerunning the benchmark with a different number of rows or schema repetitions.
    - `batch_size`: relevant for the actual benchmark; controls how many feature vectors are looked up per request in the batch benchmark.
3. Create the feature group:

    ```bash
    python create_feature_group.py
    ```

    Note: `recreate_feature_group` only matters when you rerun the `create_feature_group.py` script.
#### Run a single-process locust benchmark

You are now ready to run the load test:

```bash
locust -f locustfile.py --headless -u 4 -r 1 -t 30 -s 1 --html=result.html
```

Options:

- `-u`: number of users sharing the Python process.
- `-r`: spawn rate in users started per second, until there are `-u` users.
- `-t`: total time to run the test for.
- `-s`: shutdown timeout; no need to change it.
- `--html`: path for the output report file.
You can also run only single feature vector or batch lookups by running only the respective user class:

```bash
locust -f locustfile.py FeatureVectorLookup --headless -u 4 -r 1 -t 30 -s 1 --html=result.html

locust -f locustfile.py FeatureVectorBatchLookup --headless -u 4 -r 1 -t 30 -s 1 --html=result.html
```
### Distributed Locust Benchmark using Docker Compose

Already with 4 users, the single core tends to be saturated, and locust prints a warning:

```
[2023-03-09 12:02:11,304] ip-10-0-0-187/WARNING/root: CPU usage above 90%! This may constrain your throughput and may even give inconsistent response time measurements! See https://docs.locust.io/en/stable/running-distributed.html for how to distribute the load over multiple CPU cores or machines
```

First we need to install Docker, for which we use [the provided convenience script](https://docs.docker.com/engine/install/ubuntu/#install-using-the-convenience-script); however, you can use any other installation method too:

```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```
#### Load the docker image

```bash
wget https://repo.hops.works/dev/moritz/locust_benchmark/locust-2.15.1-hsfs-3.3.0.dev1-amd64.tgz
sudo docker load < locust-2.15.1-hsfs-3.3.0.dev1-amd64.tgz
```
#### Create feature group and configure test

See the section above to set up `hopsworks_config.json` and create the feature group to perform lookups against.
#### Run multiple locust processes using docker compose

Using docker compose we can now start multiple containers: one dedicated master process, responsible for collecting metrics and orchestrating the benchmark, and a variable number of worker processes.

```bash
sudo docker compose up
```
As with a single locust process, you can configure the test in the `docker-compose.yml` file.

Locust command: modify the locust command in the same way as the parameters described above.

```yml
command: -f /home/locust/locustfile.py --master --headless --expect-workers 4 -u 16 -r 1 -t 30 -s 1 --html=/home/locust/result.html
```

- `--expect-workers`: the only option that is new with this setup; it lets you control how many worker containers locust waits for before starting the load test.
- `-u`: the total number of users, which is distributed evenly among the worker processes. So with 4 workers and 16 users, each worker runs 4 users.
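As a rough illustration of the even distribution described above (a sketch, not locust's actual scheduling code):

```python
def distribute_users(total_users, num_workers):
    # Even split; any remainder is spread one extra user at a time
    # over the first workers.
    base, remainder = divmod(total_users, num_workers)
    return [base + (1 if i < remainder else 0) for i in range(num_workers)]


print(distribute_users(16, 4))  # [4, 4, 4, 4]
print(distribute_users(17, 4))  # [5, 4, 4, 4]
```

This is why `-u 16` with `--expect-workers 4` results in 4 users per worker container.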
By default, docker compose launches 4 worker instances; however, you can scale the number of workers with:

```bash
sudo docker compose up --scale worker=6
```

Note that the number of workers should not be lower than the `--expect-workers` parameter. As a rule of thumb, run one worker per core of your host.
#### Collect the results

By default, the directory with the test configuration is mounted into the containers, and locust creates `result.html` in the mounted directory, so you can access it after the test has concluded and the containers are shut down.

locust_benchmark/common/__init__.py

Whitespace-only changes.
locust_benchmark/common/hopsworks_client.py

+99
1+
import datetime
2+
import random
3+
import string
4+
import json
5+
6+
import numpy as np
7+
import pandas as pd
8+
9+
from locust.runners import MasterRunner, LocalRunner
10+
import hsfs
11+
12+
from hsfs import client
13+
from hsfs.client.exceptions import RestAPIError
14+
15+
16+
class HopsworksClient:
17+
def __init__(self, environment=None):
18+
with open("hopsworks_config.json") as json_file:
19+
self.hopsworks_config = json.load(json_file)
20+
if environment is None or isinstance(
21+
environment.runner, (MasterRunner, LocalRunner)
22+
):
23+
print(self.hopsworks_config)
24+
self.connection = hsfs.connection(
25+
project=self.hopsworks_config.get("project", "test"),
26+
host=self.hopsworks_config.get("host", "localhost"),
27+
port=self.hopsworks_config.get("port", 443),
28+
api_key_file=".api_key",
29+
secrets_store="local",
30+
)
31+
self.fs = self.connection.get_feature_store()
32+
33+
# test settings
34+
self.external = self.hopsworks_config.get("external", False)
35+
self.rows = self.hopsworks_config.get("rows", 1_000_000)
36+
self.schema_repetitions = self.hopsworks_config.get("schema_repetitions", 1)
37+
self.recreate_feature_group = self.hopsworks_config.get(
38+
"recreate_feature_group", False
39+
)
40+
self.batch_size = self.hopsworks_config.get("batch_size", 100)
41+
42+
def get_or_create_fg(self):
43+
locust_fg = self.fs.get_or_create_feature_group(
44+
name="locust_fg",
45+
version=1,
46+
primary_key=["ip"],
47+
online_enabled=True,
48+
stream=True,
49+
)
50+
return locust_fg
51+
52+
def insert_data(self, locust_fg):
53+
if locust_fg.id is not None and self.recreate_feature_group:
54+
locust_fg.delete()
55+
locust_fg = self.get_or_create_fg()
56+
if locust_fg.id is None:
57+
df = self.generate_insert_df(self.rows, self.schema_repetitions)
58+
locust_fg.insert(df, write_options={"internal_kafka": not self.external})
59+
return locust_fg
60+
61+
def get_or_create_fv(self, fg=None):
62+
try:
63+
return self.fs.get_feature_view("locust_fv", version=1)
64+
except RestAPIError:
65+
return self.fs.create_feature_view(
66+
name="locust_fv",
67+
query=fg.select_all(),
68+
version=1,
69+
)
70+
71+
def close(self):
72+
if client._client is not None:
73+
self.connection.close()
74+
75+
def generate_insert_df(self, rows, schema_repetitions):
76+
data = {"ip": range(0, rows)}
77+
df = pd.DataFrame.from_dict(data)
78+
79+
for i in range(0, schema_repetitions):
80+
df["rand_ts_1_" + str(i)] = datetime.datetime.now()
81+
df["rand_ts_2_" + str(i)] = datetime.datetime.now()
82+
df["rand_int_1" + str(i)] = np.random.randint(0, 100000)
83+
df["rand_int_2" + str(i)] = np.random.randint(0, 100000)
84+
df["rand_float_1" + str(i)] = np.random.uniform(low=0.0, high=1.0)
85+
df["rand_float_2" + str(i)] = np.random.uniform(low=0.0, high=1.0)
86+
df["rand_string_1" + str(i)] = "".join(
87+
random.choices(string.ascii_lowercase, k=5)
88+
)
89+
df["rand_string_2" + str(i)] = "".join(
90+
random.choices(string.ascii_lowercase, k=5)
91+
)
92+
df["rand_string_3" + str(i)] = "".join(
93+
random.choices(string.ascii_lowercase, k=5)
94+
)
95+
df["rand_string_4" + str(i)] = "".join(
96+
random.choices(string.ascii_lowercase, k=5)
97+
)
98+
99+
return df
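The generated schema follows directly from `generate_insert_df`: each schema repetition adds ten columns (two timestamps, two ints, two floats, four strings) on top of the `ip` primary key. A standalone sketch of the same column logic, without the hsfs dependency, makes the `rows`/`schema_repetitions` arithmetic easy to verify:

```python
import datetime
import random
import string

import numpy as np
import pandas as pd


def generate_insert_df(rows, schema_repetitions):
    # Mirrors HopsworksClient.generate_insert_df: one "ip" primary key
    # column plus 10 generated feature columns per schema repetition.
    df = pd.DataFrame.from_dict({"ip": range(0, rows)})
    for i in range(0, schema_repetitions):
        df["rand_ts_1_" + str(i)] = datetime.datetime.now()
        df["rand_ts_2_" + str(i)] = datetime.datetime.now()
        df["rand_int_1" + str(i)] = np.random.randint(0, 100000)
        df["rand_int_2" + str(i)] = np.random.randint(0, 100000)
        df["rand_float_1" + str(i)] = np.random.uniform(low=0.0, high=1.0)
        df["rand_float_2" + str(i)] = np.random.uniform(low=0.0, high=1.0)
        for col in ["rand_string_1", "rand_string_2", "rand_string_3", "rand_string_4"]:
            df[col + str(i)] = "".join(random.choices(string.ascii_lowercase, k=5))
    return df


df = generate_insert_df(rows=1000, schema_repetitions=2)
print(df.shape)  # (1000, 21): "ip" plus 2 * 10 feature columns
```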

locust_benchmark/common/stop_watch.py

+34
1+
import inspect
2+
import time
3+
from locust import events
4+
5+
6+
def stopwatch(func):
7+
def wrapper(*args, **kwargs):
8+
# get task's function name
9+
previous_frame = inspect.currentframe().f_back
10+
_, _, task_name, _, _ = inspect.getframeinfo(previous_frame)
11+
12+
start = time.time()
13+
result = None
14+
try:
15+
result = func(*args, **kwargs)
16+
except Exception as e:
17+
total = int((time.time() - start) * 1000)
18+
events.request.fire(
19+
request_type=task_name[:3].upper(),
20+
name=task_name,
21+
response_time=total,
22+
exception=e,
23+
)
24+
else:
25+
total = int((time.time() - start) * 1000)
26+
events.request.fire(
27+
request_type=task_name[:3].upper(),
28+
name=task_name,
29+
response_time=total,
30+
response_length=0,
31+
)
32+
return result
33+
34+
return wrapper
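To see what the decorator reports without a running locust instance, here is a hypothetical stand-in for locust's `events.request` hook that simply records fired events; the timing and success/failure logic matches `stopwatch` above, though for simplicity this sketch uses `func.__name__` rather than inspecting the caller's frame:

```python
import time


class RecordingEvent:
    """Hypothetical stand-in for locust's events.request hook."""

    def __init__(self):
        self.fired = []

    def fire(self, **kwargs):
        self.fired.append(kwargs)


request = RecordingEvent()


def stopwatch(func):
    # Times the wrapped call and reports it to the stand-in hook,
    # attaching the exception on failure.
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            request.fire(
                name=func.__name__,
                response_time=int((time.time() - start) * 1000),
                response_length=0,
                exception=e,
            )
            return None
        request.fire(
            name=func.__name__,
            response_time=int((time.time() - start) * 1000),
            response_length=0,
        )
        return result

    return wrapper


@stopwatch
def get_feature_vector():
    time.sleep(0.01)  # simulate an online lookup
    return "ok"


get_feature_vector()
print(request.fired[0]["name"])  # get_feature_vector
```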
locust_benchmark/create_feature_group.py

+8
1+
from common.hopsworks_client import HopsworksClient
2+
3+
if __name__ == "__main__":
4+
5+
hopsworks_client = HopsworksClient()
6+
fg = hopsworks_client.get_or_create_fg()
7+
hopsworks_client.insert_data(fg)
8+
hopsworks_client.close()

locust_benchmark/docker-compose.yml

+21
```yml
version: '3'

services:
  master:
    extra_hosts:
      - "host.docker.internal:host-gateway"
    image: locust-hsfs:master
    ports:
      - "8089:8089"
    volumes:
      - ./:/home/locust
    command: -f /home/locust/locustfile.py --master --headless --expect-workers 4 -u 16 -r 1 -t 30 -s 1 --html=/home/locust/result.html

  worker:
    extra_hosts:
      - "host.docker.internal:host-gateway"
    image: locust-hsfs:master
    volumes:
      - ./:/home/locust
    command: -f /home/locust/locustfile.py --worker --master-host master
    scale: 4
```
locust_benchmark/hopsworks_config.json

+10
1+
{
2+
"host": "localhost",
3+
"port": 443,
4+
"project": "test",
5+
"external": true,
6+
"rows": 100000,
7+
"schema_repetitions": 1,
8+
"recreate_feature_group": true,
9+
"batch_size": 100
10+
}
