|
| 1 | +\section{Testing} |
| 2 | +\label{sec:test} |
| 3 | + |
| 4 | +We implemented our algorithms as libraries to create |
| 5 | +test harnesses for regular expression implementations. |
| 6 | +We used this library to implement a test harness for the \ocaml \code{Re} |
| 7 | +library\footnote{\url{https://github.com/ocaml/ocaml-re}}, |
| 8 | +which is one of the most widely used \ocaml regular expression implementations. |
| 9 | +% |
| 10 | +We also created a set of test cases for student projects in \haskell, |
| 11 | +which helped the students write better implementations. |
| 12 | + |
| 13 | +Concretely, the library provides a test harness that generates |
| 14 | +regular expressions along with both positive and negative samples. The |
| 15 | +implementation under test can then compile the regular expression and apply it |
| 16 | +efficiently to the samples. |
| 17 | +The library exposes the sample generation as a generator in the style of |
| 18 | +property-testing libraries such as QuickCheck~\cite{DBLP:conf/icfp/ClaessenH00}. |
| 19 | +This lets users reuse all the tooling already available in such libraries, for |
| 20 | +example to generate arbitrary regular expressions. |
| 21 | +% |
| 22 | +The simplified API of the \ocaml version is shown below. |
| 23 | +The main function \code{arbitrary n alphabet} returns a generator |
| 24 | +which provides on average \code{n} samples using the given alphabet. |
| 25 | + |
| 26 | +\begin{lstlisting} |
| 27 | +type test = { |
| 28 | + re : Regex.t ; |
| 29 | + pos : Word.t list ; |
| 30 | + neg : Word.t list ; |
| 31 | +} |
| 32 | +val arbitrary: |
| 33 | + int -> Word.char list -> test QCheck.arbitrary |
| 34 | +\end{lstlisting} |
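| | + |
| | +As an illustration of how an implementation under test might consume one such |
| | +record, the following minimal sketch checks that every positive sample is |
| | +accepted and every negative sample is rejected. The function \code{passes} and |
| | +its \code{compile} and \code{matches} parameters are hypothetical names that are |
| | +not part of the library, and the record is made polymorphic here only so that |
| | +the sketch is self-contained. |
| | + |
| | +\begin{lstlisting} |
| | +(* Polymorphic stand-in for the test record shown above. *) |
| | +type ('re, 'word) test = { |
| | +  re : 're ; |
| | +  pos : 'word list ; |
| | +  neg : 'word list ; |
| | +} |
| | + |
| | +(* Hypothetical check: compile once, then require that all positive |
| | +   samples match and that no negative sample matches. *) |
| | +let passes ~compile ~matches t = |
| | +  let compiled = compile t.re in |
| | +  List.for_all (matches compiled) t.pos |
| | +  && not (List.exists (matches compiled) t.neg) |
| | +\end{lstlisting} |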
| 35 | + |
| 36 | +Regular expressions are represented as an algebraic datatype, which |
| 37 | +is easy to generate using QuickCheck-like libraries. |
| 38 | +The only constraint we place on the generated regular expressions |
| 39 | +is that their star height must be less than 3. While our technique can be used |
| 40 | +for regular expressions with more deeply nested repetitions, it can cause |
| 41 | +occasional slowdowns and large memory consumption, which are inconvenient |
| 42 | +in the context of automated testing. |
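| | + |
| | +To make the star-height restriction concrete, the following sketch computes the |
| | +star height of a regular expression and checks the bound. The datatype |
| | +\code{regex} is a hypothetical, simplified stand-in; the actual \code{Regex.t} |
| | +of the library may have more constructors. |
| | + |
| | +\begin{lstlisting} |
| | +(* Hypothetical, simplified regular expression datatype. *) |
| | +type regex = |
| | +  | Empty                     (* the empty language *) |
| | +  | Epsilon                   (* the empty word *) |
| | +  | Char of char |
| | +  | Concat of regex * regex |
| | +  | Alt of regex * regex |
| | +  | Star of regex |
| | + |
| | +(* Maximal nesting depth of Star constructors. *) |
| | +let rec star_height = function |
| | +  | Empty | Epsilon | Char _ -> 0 |
| | +  | Concat (a, b) | Alt (a, b) -> max (star_height a) (star_height b) |
| | +  | Star a -> 1 + star_height a |
| | + |
| | +(* The constraint described in the text: star height less than 3. *) |
| | +let acceptable re = star_height re < 3 |
| | +\end{lstlisting} |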
| 43 | + |
| 44 | +Our testing library only returns a finite number of samples. However, the |
| 45 | +language can (and often will) be infinite. We want to generate test cases that will, |
| 46 | +on average, exercise the implementation under test as much as possible. For this purpose, we |
| 47 | +use a technique similar to the fast approximation |
| 48 | +for reservoir sampling~\citep{DBLP:journals/toms/Vitter87}. |
| 49 | +When considering |
| 50 | +the sequence of words in the language, we skip $k$ elements where |
| 51 | +$k$ follows a power law of mean $n$. We then return the next word as a |
| 52 | +sample, and stop the sampling with probability $1/n$; otherwise we repeat. |
| 53 | +% |
| 54 | +Using this technique, we obtain on average $n$ samples. Most are regularly |
| 55 | +spaced out near the front of the stream, but the sampler occasionally skips ahead |
| 56 | +and returns very large words. This has proven effective at finding good |
| 57 | +test samples in practice. |
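| | + |
| | +The sampling loop can be sketched as follows over a lazy stream of words. This |
| | +is a minimal illustration under stated assumptions, not the library's code: |
| | +the names \code{draw\_skip}, \code{drop} and \code{sample} are hypothetical, and |
| | +the power law is assumed to be a Pareto distribution with shape 2 and scale |
| | +$n/2$, which has mean $n$. The sketch requires $n \geq 1$. |
| | + |
| | +\begin{lstlisting} |
| | +(* Hypothetical helper: draw a skip length from a Pareto (power-law) |
| | +   distribution with shape 2 and scale n/2, whose mean is n. Rounding |
| | +   down to an integer makes the actual mean slightly smaller. *) |
| | +let draw_skip n = |
| | +  let u = max epsilon_float (Random.float 1.0) in  (* avoid u = 0 *) |
| | +  int_of_float (float_of_int n /. 2.0 /. sqrt u) |
| | + |
| | +(* Discard the first k elements of a stream. *) |
| | +let rec drop k (s : 'a Seq.t) : 'a Seq.t = |
| | +  if k <= 0 then s |
| | +  else match s () with |
| | +    | Seq.Nil -> Seq.empty |
| | +    | Seq.Cons (_, rest) -> drop (k - 1) rest |
| | + |
| | +(* Skip k words, keep the next one, then stop with probability 1/n. *) |
| | +let sample n (words : 'a Seq.t) : 'a list = |
| | +  let rec go acc s = |
| | +    match drop (draw_skip n) s () with |
| | +    | Seq.Nil -> List.rev acc              (* stream exhausted *) |
| | +    | Seq.Cons (w, rest) -> |
| | +      let acc = w :: acc in |
| | +      if Random.int n = 0 then List.rev acc |
| | +      else go acc rest |
| | +  in |
| | +  go [] words |
| | +\end{lstlisting} |
| | + |
| | +Because the stopping test is independent of the skips, the number of returned |
| | +samples on an infinite stream is geometrically distributed with mean $n$. |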
| 58 | + |
| 59 | +%%% Local Variables: |
| 60 | +%%% mode: latex |
| 61 | +%%% TeX-master: "main" |
| 62 | +%%% End: |