
Performance Testing Guidelines

Simon Gaeremynck edited this page Dec 13, 2013 · 13 revisions

Environment setup

Setting an entire environment up from scratch can be done with oae-provisioning, slapchop and fabric:

fab ulous:performance

This should create all the machines, run the puppet scripts and give you a working environment.

Once your environment is up and running, it's a good idea to stop the puppet service on all the machines, as you don't want it to run and possibly restart services during a test. From the puppet master machine:

mco service puppet stop

Environment sanity-checks

Before you start loading data and running tests, it's usually a good idea to see if the environment is well balanced. A quick and easy way to check that is to ssh into the machines and run a sysbench test:

apt-get install -y sysbench
sysbench --test=cpu --cpu-max-prime=20000 run

That should result in something like:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          33.4556s
    total number of events:              10000
    total time taken by event execution: 33.4531
    per-request statistics:
         min:                                  2.83ms
         avg:                                  3.35ms
         max:                                 19.62ms
         approx.  95 percentile:               4.65ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   33.4531/0.00

The "important" number here is the avg execution time (33.4531 in the output above). It should be similar across the nodes within each group. For example, a decent number on the app/activity nodes is around 30 seconds. Joyent can't always guarantee an even distribution of nodes, so if a couple of them are off by a factor of 2, it's a good idea to trash those machines and fire up new ones. This can be done with slapchop/fabric like so:

slapchop -d performance destroy -i app0 -i app2 -i activity1
fab "provision_machines:performance,app0;app2;activity1"
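Comparing the avg execution time across many nodes by eye gets tedious, so the check can be scripted. The following is a minimal sketch, not part of oae-provisioning: the function names `sysbench_avg` and `slow_nodes` are made up for illustration, it assumes the sysbench 0.4.12 output format shown above, and it treats "off by a factor of 2" as "at least twice the fastest node in the group", which is one possible reading.

```shell
#!/usr/bin/env bash

# Read sysbench output on stdin and print the average execution time,
# taken from the "execution time (avg/stddev):   33.4531/0.00" line.
sysbench_avg() {
    awk 'index($0, "execution time (avg/stddev):") {
        split($NF, parts, "/")   # $NF looks like "33.4531/0.00"
        print parts[1]
    }'
}

# Read "<node> <avg>" lines on stdin and print every node whose average
# is at least twice the fastest average seen (assumed threshold).
slow_nodes() {
    awk '
        {
            name[NR] = $1
            avg[NR] = $2 + 0
            if (NR == 1 || avg[NR] < best) best = avg[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                if (avg[i] >= 2 * best) print name[i]
        }
    '
}

# Example (assumes ssh access to the nodes and sysbench installed on them):
# for node in app0 app1 app2; do
#     echo "$node $(ssh "$node" sysbench --test=cpu --cpu-max-prime=20000 run | sysbench_avg)"
# done | slow_nodes
```

Run it per group (app, activity, db, ...) rather than across the whole environment, since the groups are not expected to share the same baseline.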

Generating/loading the data

If you're starting from a fresh environment, you need to generate and load data. If you've already done a dataload and are following this guide, you can restore a backup in cassandra and start from there.

Data can be generated with OAE-model-loader. Ensure you have all the user pictures, group pictures, ...

Generating:

nohup node generate.js -b 10 -t oae -u 1000 -g 2000 -c 5000 -d 5000 &

Loading:

  • Create a tenant with the oae alias
  • Disable reCaptcha
  • Disable the activity servers, as the activities that get generated by the model loader would kill the db/activity-cache servers. On the puppet machine you can run: mco service hilary stop -I activity0 -I activity1 -I activity2
  • Start the dataload: nohup node loaddata.js -h http://oae.oae-performance.oaeproject.org -b 10 -s 0 -c 2

It's important that the dataload ends without any errors. If some users, groups or content items failed to be created, you will end up with a bunch of 400s in the tsung tests, which makes the results hard to read and interpret.
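Since the dataload runs under nohup, its output normally ends up in nohup.out in the directory it was started from. A rough way to check that run is to scan that file before moving on. The sketch below is only an illustration: the function name `dataload_clean` is made up, and the default pattern `error` is an assumption about what a failure looks like, not the loader's documented output format. Adjust the pattern to whatever the loader actually prints.

```shell
#!/usr/bin/env bash

# Succeed only if LOGFILE contains no line matching PATTERN (case-insensitive).
# PATTERN defaults to "error"; that default is an assumption.
dataload_clean() {
    local logfile="$1" pattern="${2:-error}"
    if grep -i -q -- "$pattern" "$logfile"; then
        echo "dataload reported problems:"
        grep -i -n -- "$pattern" "$logfile"   # show matching lines with line numbers
        return 1
    fi
    echo "no lines matching '$pattern' in $logfile"
}

# Example:
# dataload_clean nohup.out && echo "safe to take the cassandra backup"
```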

Now that your data is in cassandra, it's a good idea to take a backup of it so it can be restored for the next test.

Restoring a cassandra backup

Generating a tsung test

Once your data has been generated/loaded you can generate a tsung test with node-oae-tsung which should be at /opt/node-oae-tsung.

node main.js -s /opt/OAE-model-loader/scripts -b 10 -o ./name-of-feature-you-are-testing

That should give you a directory at /opt/node-oae-tsung/name-of-feature-you-are-testing with the tsung.xml and the properly formatted csv files that tsung can use.

Running a tsung test

cd /opt/node-oae-tsung/name-of-feature-you-are-testing
nohup tsung -f tsung.xml -l /usr/share/nginx/www/name-of-feature-you-are-testing start &

General tips

Cleaning cassandra data

From the puppet master node:

mco service dse stop -W oaeservice::hilary::dse

On each db node:

rm -rf /data/cassandra/*
rm -rf /var/lib/cassandra/*
rm -rf /var/log/cassandra/*

# Ensure that the cassandra user has r/w access on all those directories
chown cassandra:cassandra /data/cassandra
chown cassandra:cassandra /var/lib/cassandra
chown cassandra:cassandra /var/log/cassandra

Then start them back up one by one:

service dse start
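Starting the nodes one by one can be wrapped in a small loop. This is a sketch under assumptions: the function name `rolling_start`, the `REMOTE` and `PAUSE` variables, and the node names db0, db1, db2 are all hypothetical, and the 60 second pause is an arbitrary default rather than a recommended value. It assumes you can ssh into each db node as a user allowed to run `service dse start`.

```shell
#!/usr/bin/env bash

# Start dse on each given node in turn, pausing between nodes so one node
# is up before the next is started.
# REMOTE: command used to reach a node (default: ssh). Hypothetical knob.
# PAUSE:  seconds to wait after each start (default: 60). Hypothetical knob.
rolling_start() {
    local remote="${REMOTE:-ssh}" pause="${PAUSE:-60}" node
    for node in "$@"; do
        echo "starting dse on $node"
        # Stop at the first node that fails to start.
        $remote "$node" service dse start || return 1
        sleep "$pause"
    done
}

# Example (node names are placeholders):
# rolling_start db0 db1 db2
```

Setting `REMOTE=echo PAUSE=0` prints the commands instead of running them, which is a cheap way to check the node list first.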

It might be best to restart opscenterd afterwards (from the monitor machine):

service opscenterd restart