At a high level, the test operations are:

- Provision a DB cluster with the specified number of nodes. The number of nodes can be set in the config file, or the test writer can set a specific number depending on the needs of the test.
- Provision a set of loader nodes. These nodes initiate cassandra-stress and possibly other activities that put the database under stress.
- Provision a set of monitoring nodes. They run Prometheus [3] to store metrics about the database cluster, and Grafana [4] to show real-time dashboards of those metrics while the test is running. This is very useful if you want to run the test suite and keep watching the behavior of each node.
- Wait until the loaders are ready (SSH is up and cassandra-stress is present).
- Wait until the DB nodes are ready (SSH is up, DB services are up, and port 9042 is in use).
- Wait until the monitoring nodes are ready.
- Loader nodes execute cassandra-stress on the DB cluster (optional).
- If configured, a Nemesis class executes periodically, introducing some disruption to the cluster (stop/start a node, destroy data, kill scylla processes on a node). The nemesis starts after an interval, to give the cassandra-stress run from the previous step time to stabilize.

Keep in mind that the suite libraries are flexible and allow you to set up scenarios that differ from this base one.
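
To make two of the steps above more concrete, here is a minimal sketch of a readiness check that polls a TCP port (such as 9042 on a DB node) and of a disruption loop that starts only after an initial delay. It uses only the Python standard library. The names `wait_for_port` and `run_nemesis`, and all of their parameters, are hypothetical and chosen for illustration; they are not the suite's actual API.

```python
import socket
import threading
import time


def wait_for_port(host, port, timeout=60.0, poll_interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something accepts connections
    on the port, not that the DB service behind it is healthy.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(poll_interval)
    return False


def run_nemesis(disrupt, stop_event, initial_delay, interval):
    """Call disrupt() every `interval` seconds, after `initial_delay` seconds.

    Hypothetical loop: the initial delay gives the stress workload time to
    stabilize. Setting `stop_event` ends the loop, including during the delay.
    """
    # Event.wait returns True if the event was set while waiting.
    if stop_event.wait(initial_delay):
        return
    while not stop_event.is_set():
        disrupt()
        stop_event.wait(interval)
```

A test could run `run_nemesis` in a `threading.Thread`, passing a callable that performs one disruption, and call `stop_event.set()` during teardown. Using `threading.Event.wait` instead of `time.sleep` lets the loop stop promptly rather than sleeping through a full interval.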