GitHub - jmroot/iomkc: I/O Markov chain tools

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.gitignore		.gitignore
GPL.txt		GPL.txt
IOChain.py		IOChain.py
OZPLB.txt		OZPLB.txt
README		README
btrecord.c		btrecord.c
btrecord.diff		btrecord.diff
btrecord.h		btrecord.h
btrecordmodule.c		btrecordmodule.c
buildChain.py		buildChain.py
directrwmodule.c		directrwmodule.c
dumpChain.py		dumpChain.py
list.h		list.h
prep.sh		prep.sh
runChain.py		runChain.py
setup.py		setup.py
traceStats.py		traceStats.py
zipUtils.py		zipUtils.py

Repository files navigation

I/O Markov chain tools
by Joshua Root <[email protected]>

To build a Markov chain: you'll need some blktrace output, which comes in
the form of one file per CPU, normally named {device}.blktrace.{cpu}. First
do: "./prep.sh {device} OPFILE". Then run "./buildChain.py -i OPFILE
-o CHAINFILE" and a Markov chain built from the trace will be output to
CHAINFILE.

Options -s, -k and -d set the size of the buckets into which the transfer
sizes (in bytes), seek lengths (in sectors) and inter-op delays (in seconds)
will be divided, respectively. Smaller buckets means longer build times
and a larger chain file, in exchange for more accurate distribution of
the values.

To run an I/O workload based on your Markov chain:
first build the C module with "python setup.py build". Then move it
where python will find it with "mv build/lib.*/*.so ." (assuming you're in
the directory containing runChain.py).

Now run: "./runChain.py -i CHAINFILE -d TARGET". TARGET will be opened with
O_DIRECT if available, so you really want it to be a device, e.g. /dev/sdb. The
script will keep reading/writing to/from TARGET until you stop it, e.g. with
ctrl-c. Or, you can specify a maximum number of ops to perform and/or a maximum
time for which to run, using the -n and -t options respectively.

Adding "-p OUT" to the command line will cause runChain to not perform the I/O
operations itself, but instead write a description of them to OUT in btrecord
format. The idea is that you can then use OUT as input for btreplay.

Why the C extension exists:

Reads initially always threw exceptions. Apparently this was because Python
doesn't bother allocating blocksize-aligned buffers when reading from a file
that has been opened with O_DIRECT. I wrote a C extension module that
does it right.

Random implementation-related notes I wrote while developing this stuff:

Use min/max data from trace to determine appropriate buckets into which to
divide transfer sizes, inter-op delays, and seek distance.

Count up number of transitions between each triplet. Store weighted
adjacency matrix (sparse, so use hashing).

Replay by generating I/Os matching current triplet, moving to next based
on column in current matrix row matched by random().

Store rows as lists of cumulative probabilities, e.g. [0.1, 0.3, 0.8, 1.0]