I/O Markov chain tools
by Joshua Root <[email protected]>
To build a Markov chain, you'll need some blktrace output, which comes in
the form of one file per CPU, normally named {device}.blktrace.{cpu}. First
run "./prep.sh {device} OPFILE", then run "./buildChain.py -i OPFILE
-o CHAINFILE"; a Markov chain built from the trace will be written to
CHAINFILE.
Options -s, -k and -d set the size of the buckets into which the transfer
sizes (in bytes), seek lengths (in sectors) and inter-op delays (in seconds)
will be divided, respectively. Smaller buckets mean longer build times
and a larger chain file, in exchange for a more accurate distribution of
the values.
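The bucketing idea can be sketched in a few lines of Python (this is an
illustration of the concept, not the actual buildChain.py code; the function
name and values are made up):

```python
# Sketch: each observed value is mapped to the index of its fixed-size
# bucket, so nearby values collapse into a single chain state.
def bucket_index(value, bucket_size):
    """Map a raw value (bytes, sectors, or seconds) to a bucket index."""
    return int(value // bucket_size)

# With a bucket size of 4096, transfer sizes of 4096..8191 bytes all land
# in bucket 1, so the chain treats them as the same state.
sizes = [512, 4096, 5000, 8192]
print([bucket_index(s, 4096) for s in sizes])  # [0, 1, 1, 2]
```

A smaller bucket_size produces more distinct states, hence the larger chain
file and longer build time mentioned above.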
To run an I/O workload based on your Markov chain, first build the C module
with "python setup.py build", then move it where Python will find it with
"mv build/lib.*/*.so ." (assuming you're in the directory containing
runChain.py).
Now run "./runChain.py -i CHAINFILE -d TARGET". TARGET will be opened with
O_DIRECT if available, so you really want it to be a device, e.g. /dev/sdb.
The script will keep reading from and writing to TARGET until you stop it,
e.g. with ctrl-c. Alternatively, you can specify a maximum number of ops to
perform and/or a maximum time for which to run, using the -n and -t options
respectively.
Adding "-p OUT" to the command line causes runChain not to perform the I/O
operations itself, but instead to write a description of them to OUT in
btrecord format. The idea is that you can then use OUT as input for btreplay.
Why the C extension exists:
Reads initially always threw exceptions. This turned out to be because
Python doesn't allocate blocksize-aligned buffers when reading from a file
that has been opened with O_DIRECT, so I wrote a C extension module that
does it right.
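The alignment issue can be demonstrated in pure Python: O_DIRECT requires the
userspace buffer to be aligned to the device's logical block size, but a plain
os.read() allocates an ordinary unaligned buffer, which fails with EINVAL. One
workaround (a sketch only; the C extension may well do this differently, e.g.
with posix_memalign) is to use mmap, which always returns page-aligned memory:

```python
import ctypes
import mmap

BLOCKSIZE = 4096  # illustrative; use the device's logical block size

# An anonymous mmap is page-aligned, and the page size is a multiple of
# any common logical block size, so this buffer is safe for O_DIRECT.
buf = mmap.mmap(-1, BLOCKSIZE)
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
print(addr % mmap.PAGESIZE)  # 0: the buffer is page-aligned

# A direct read would then look like this (needs a real device, not run here):
# fd = os.open("/dev/sdb", os.O_RDONLY | os.O_DIRECT)
# nread = os.readv(fd, [buf])
```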
Random implementation-related notes I wrote while developing this stuff:
Use min/max data from the trace to determine appropriate buckets into which
to divide transfer sizes, inter-op delays, and seek distances.
Count up the number of transitions between triplets. Store a weighted
adjacency matrix (sparse, so use hashing).
Replay by generating I/Os matching the current triplet, moving to the next
based on which column in the current matrix row is matched by random().
Store rows as lists of cumulative probabilities, e.g. [0.1, 0.3, 0.8, 1.0]
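The notes above can be sketched as follows (a minimal illustration, not the
actual buildChain.py/runChain.py code; the single-letter states stand in for
the (size bucket, seek bucket, delay bucket) triplets):

```python
import bisect
import random
from collections import defaultdict

# Count transitions between states. The adjacency matrix is sparse, so a
# dict-of-dicts keyed by state serves as the hashed sparse representation.
counts = defaultdict(lambda: defaultdict(int))
trace = ["A", "B", "A", "B", "B", "C", "A", "B"]
for cur, nxt in zip(trace, trace[1:]):
    counts[cur][nxt] += 1

# Convert each row to a successor list plus cumulative probabilities,
# e.g. probabilities [0.1, 0.2, 0.5, 0.2] become [0.1, 0.3, 0.8, 1.0].
chain = {}
for state, row in counts.items():
    total = sum(row.values())
    succs, cum = [], []
    running = 0.0
    for nxt, n in row.items():
        running += n / total
        succs.append(nxt)
        cum.append(running)
    cum[-1] = 1.0  # guard against float rounding
    chain[state] = (succs, cum)

# Replay: find which cumulative-probability column random() lands in
# (binary search), and move to the corresponding successor state.
def next_state(state):
    succs, cum = chain[state]
    return succs[bisect.bisect_left(cum, random.random())]

state = "A"
for _ in range(5):
    state = next_state(state)  # generate an I/O for `state`, then advance
```

Storing rows as cumulative probabilities makes each transition an O(log n)
binary search rather than a linear scan of the row.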