Commit 0ead90c

Add a blurb about testing.
1 parent 54a2d0b commit 0ead90c

3 files changed: +80 -0 lines changed

biblio.bib

Lines changed: 17 additions & 0 deletions
@@ -421,3 +421,20 @@ @inproceedings{DBLP:conf/cocoon/AckermanM09
   publisher = {Springer},
   year = {2009}
 }
+
+
+@article{DBLP:journals/toms/Vitter87,
+  author = {Jeffrey Scott Vitter},
+  title = {An efficient algorithm for sequential random sampling},
+  journal = {{ACM} Trans. Math. Softw.},
+  volume = {13},
+  number = {1},
+  pages = {58--67},
+  year = {1987},
+  url = {http://doi.acm.org/10.1145/23002.23003},
+  doi = {10.1145/23002.23003},
+  timestamp = {Tue, 27 Apr 2010 09:25:58 +0200},
+  biburl = {https://dblp.org/rec/bib/journals/toms/Vitter87},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+

main.tex

Lines changed: 1 addition & 0 deletions
@@ -118,6 +118,7 @@
 \input{improvements}
 \input{ocaml}
 \input{bench}
+\input{testing}
 \input{related}
 \input{conclusions}
 

testing.tex

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+\section{Testing}
+\label{sec:test}
+
+We implemented our algorithms as libraries to create
+test harnesses for regular expression implementations.
+We used the \ocaml version to implement a test harness for the \ocaml \code{Re} library%
+\footnote{\url{https://github.com/ocaml/ocaml-re}},
+which is one of the most widely used \ocaml regular expression implementations.
+%
+We also created a set of test cases for student projects in \haskell,
+which helped the students write better implementations.
+
+Concretely, the library provides a test harness that generates
+regular expressions along with both positive and negative samples. The
+implementation under test can then compile the regular expression and apply it
+efficiently to the samples.
+The library exposes sample generation as a generator in the style of
+property-testing libraries such as QuickCheck~\cite{DBLP:conf/icfp/ClaessenH00}.
+This makes it possible to reuse all the tooling already available in such libraries, for
+example to generate arbitrary regular expressions.
+%
+The simplified API of the \ocaml version is shown below.
+The main function \code{arbitrary n alphabet} returns a generator
+which provides on average \code{n} samples using the given alphabet.
+
+\begin{lstlisting}
+type test = {
+  re : Regex.t ;
+  pos : Word.t list ;
+  neg : Word.t list ;
+}
+val arbitrary:
+  int -> Word.char list -> test QCheck.arbitrary
+\end{lstlisting}
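
To illustrate how a harness might consume such a \code{test} record, here is a minimal, self-contained sketch. It does not use the paper's library: the \code{regex} type, the derivative-based \code{matches} function (standing in for the implementation under test), and \code{check} are all hypothetical names introduced for this example, and words are plain strings rather than \code{Word.t}.

```ocaml
(* Hypothetical stand-in for Regex.t; not the paper's actual datatype. *)
type regex =
  | Empty                       (* matches nothing *)
  | Eps                         (* matches only the empty word *)
  | Chr of char
  | Cat of regex * regex
  | Alt of regex * regex
  | Star of regex

(* Same shape as the record in the listing, with strings as words (assumption). *)
type test = {
  re : regex ;
  pos : string list ;           (* words that must be accepted *)
  neg : string list ;           (* words that must be rejected *)
}

(* Does the regex accept the empty word? *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Cat (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Brzozowski derivative of a regex with respect to one character. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | Cat (a, b) ->
    let left = Cat (deriv c a, b) in
    if nullable a then Alt (left, deriv c b) else left
  | Star a as r -> Cat (deriv c a, r)

(* A deliberately naive matcher playing the role of the implementation under test. *)
let matches re w =
  nullable (Seq.fold_left (fun r c -> deriv c r) re (String.to_seq w))

(* A test passes when every positive sample is accepted
   and every negative sample is rejected. *)
let check matches t =
  List.for_all (matches t.re) t.pos
  && not (List.exists (matches t.re) t.neg)

(* Hand-written example for the expression ab*. *)
let example = {
  re = Cat (Chr 'a', Star (Chr 'b')) ;
  pos = [ "a" ; "ab" ; "abbb" ] ;
  neg = [ "" ; "b" ; "ba" ] ;
}
```

In a real harness, \code{check} would presumably be the property handed to the property-testing library, with the records produced by the generator instead of written by hand.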
+
+Regular expressions are represented as an algebraic datatype, which
+is easy to generate using QuickCheck-like libraries.
+The only constraint we place on the generated regular expressions
+is that their star height be less than 3. While our technique can be used
+for regular expressions with several nested repetitions, it can cause
+occasional slowdowns and large memory consumption, which are inconvenient
+in the context of automated testing.
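
The star-height restriction can be sketched as a simple filter over the datatype. The \code{regex} type below is a hypothetical stand-in for the library's representation, and "star height" is taken here to mean the syntactic nesting depth of \code{Star}, which is an assumption about the intended definition.

```ocaml
(* Hypothetical regular expression datatype, for illustration only. *)
type regex =
  | Empty
  | Eps
  | Chr of char
  | Cat of regex * regex
  | Alt of regex * regex
  | Star of regex

(* Syntactic star height: the maximum nesting depth of Star. *)
let rec star_height = function
  | Empty | Eps | Chr _ -> 0
  | Cat (a, b) | Alt (a, b) -> max (star_height a) (star_height b)
  | Star a -> 1 + star_height a

(* Keep only expressions whose star height is less than 3. *)
let acceptable re = star_height re < 3
```

A generator could discard or regenerate any expression for which \code{acceptable} returns \code{false}; bounding the nesting depth during generation would be an alternative design.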
+
+Our testing library only returns a finite number of samples. However, the
+language can (and often will) be infinite. We want to generate test cases that,
+on average, exercise the implementation under test as much as possible. For this purpose, we
+use a technique similar to the fast approximation
+for reservoir sampling~\citep{DBLP:journals/toms/Vitter87}.
+When considering
+the sequence of words in the language, we skip $k$ elements where
+$k$ follows a power law of mean $n$. We then return the given
+sample, and stop sampling with probability $1/n$.
+%
+Using this technique, we obtain on average $n$ samples that are regularly
+spaced out near the front of the stream, with occasional skips far ahead
+that return very large words. This has proven effective at finding good
+test samples in practice.
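
The skip-and-stop scheme described above can be sketched over a lazy sequence as follows. This is not the library's code: the function names are hypothetical, and the choice of a Pareto distribution with shape 2 (scaled so that its mean is $n$) is an assumption, since the text only says that the skip length follows a power law of mean $n$.

```ocaml
(* Draw a skip length from a Pareto (power-law) distribution with shape 2 and
   scale n/2, whose mean is n. The shape is an assumption of this sketch;
   truncating to an integer lowers the mean slightly. *)
let power_law_skip st n =
  let u = max epsilon_float (Random.State.float st 1.0) in
  int_of_float (float_of_int n /. (2.0 *. sqrt u))

(* Drop up to [k] elements from the front of a sequence. *)
let rec drop k (s : 'a Seq.t) : 'a Seq.t =
  if k <= 0 then s
  else
    match s () with
    | Seq.Nil -> Seq.empty
    | Seq.Cons (_, rest) -> drop (k - 1) rest

(* Sample from a possibly infinite sequence of words: skip a power-law number
   of elements, emit the next one, then stop with probability 1/n.
   Assumes n >= 1. *)
let sample st n (words : 'a Seq.t) : 'a list =
  let rec go acc s =
    match drop (power_law_skip st n) s () with
    | Seq.Nil -> List.rev acc          (* finite language exhausted *)
    | Seq.Cons (w, rest) ->
      let acc = w :: acc in
      if Random.State.float st 1.0 < 1.0 /. float_of_int n
      then List.rev acc
      else go acc rest
  in
  go [] words

(* Infinite sequence of integers, standing in for the enumeration of a language. *)
let rec nats i : int Seq.t = fun () -> Seq.Cons (i, nats (i + 1))
```

For instance, \code{sample (Random.State.make [|42|]) 10 (nats 0)} returns a finite, strictly increasing list of positions in the stream. Because the stopping test is an independent coin flip with probability $1/n$, the number of samples is geometrically distributed with mean $n$ when the stream is infinite.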
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "main"
+%%% End:
