Commit 26156eb

Merge branch 'bad-honnef'
2 parents 6bf63fe + d17d1cc commit 26156eb

16 files changed: +3758 -0 lines changed

bad-honnef/YES_JUST_YES.png

10.7 KB

bad-honnef/bad-honnef-2018-slides.tex

Lines changed: 583 additions & 0 deletions
Large diffs are not rendered by default.

bad-honnef/bad-honnef-2018.tex

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
\documentclass{llncs}
% Encoding and lang
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}

% Graphical packages
% \usepackage{graphicx}
\usepackage{xcolor}


% Specialized packages
% \usepackage{syntax} % Grammar definitions
% \usepackage{verbatim}
\usepackage{listings} % Code
% \usepackage{xspace} % Useful for macros

% hyperref must be loaded before cleveref
\usepackage{hyperref}
\usepackage[noabbrev,nameinlink,capitalize]{cleveref}

% Custom macros
\input{../prelude}
\bibliographystyle{plain}

\begin{document}
\title{Generating Tests for Regular Expression Engines}

\author{Gabriel Radanne \and Peter Thiemann}
\institute{University of Freiburg, Germany \\
\email{\{radanne,thiemann\}@informatik.uni-freiburg.de}
}
%


% \author{Peter Thiemann}
% \affiliation{
% \institution{University of Freiburg}
% \country{Germany}
% }


\maketitle

\begin{abstract}
\input{../abstract}
\end{abstract}

\section{Introduction}

Regular languages are everywhere. Due to their apparent simplicity and
their concise representability in the form of regular expressions,
regular languages are used for many text processing
applications, ranging from text editors
\cite{DBLP:journals/cacm/Thompson68} to the extraction of data from web
pages.

Consequently, there are many algorithms and libraries that implement
parsing for regular expressions. Some of them are based on Thompson's
translation from regular expressions to nondeterministic finite
automata and then apply the powerset construction to obtain a
deterministic automaton. Others are based on Brzozowski's derivatives
\cite{Brzozowski1964} and
map a regular expression directly to a deterministic
automaton. Antimirov's partial derivatives \cite{Antimirov96Partial}
yield another transformation into a nondeterministic automaton. An
implementation based on Glushkov automata has been proposed
\cite{DBLP:conf/icfp/FischerHW10} with decent performance.
Russ Cox's webpage gives a good overview
of efficient implementations of regular expression search. It includes
a discussion of his implementation of Google's RE2 \cite{cox10:_regul_expres_match_wild}.
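As a brief illustration of the derivative-based approach (the notation below
is chosen for this sketch and is not taken from the cited works), write
$L(r)$ for the language of a regular expression $r$ over an alphabet $\Sigma$.
The derivative of $r$ with respect to a symbol $a \in \Sigma$ is a regular
expression $D_a(r)$ such that
\[
  L(D_a(r)) = \{\, w \in \Sigma^* \mid a w \in L(r) \,\}.
\]
A word $a_1 \cdots a_n$ matches $r$ if and only if the empty word
$\varepsilon$ belongs to $L(D_{a_n}(\cdots D_{a_1}(r) \cdots))$. For example,
with $r = a b^*$, we have $D_a(r) = b^*$ and $D_b(b^*) = b^*$ up to the usual
simplifications, so $abb$ matches because $\varepsilon \in L(b^*)$.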

Some of the algorithms for regular expression matching are rather
intricate, and the natural question arises of how to test them.
While there are online repositories with reams of real-life regular
expressions \cite{regul_expres_librar}, there are no satisfactory
generators for test inputs. It is not too hard to come up with
generators for strings that match a given regular expression, but that
is only one side of the coin. On the other hand, the algorithm should
reject strings that do not match the regular expression, so it is
equally important to come up with strings that do \textbf{not} match.

This work presents generator algorithms for extended regular expressions that
contain intersection and complement in addition to the regular operators. The
presence of the complement operator enables the algorithms to generate
strings that certainly do not match a given (extended) regular
expression.
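To illustrate under assumed notation (the symbols $\&$ for intersection,
$\neg$ for complement, and $L(r)$ for the language of $r$ over an alphabet
$\Sigma$ are chosen for this sketch), the two additional operators are
interpreted as
\[
  L(r \mathbin{\&} s) = L(r) \cap L(s)
  \qquad\mbox{and}\qquad
  L(\neg r) = \Sigma^* \setminus L(r).
\]
For example, over $\Sigma = \{a, b\}$ the language $L(\neg (a^*))$ consists
of exactly the words containing at least one $b$. A generator for
$\neg (a^*)$ that enumerates words in length-lexicographic order with
$a < b$ (an ordering assumed here) would start with $b, ab, ba, bb$; none of
these words matches $a^*$.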

Our implementations are useful in practice. They are guaranteed to be
productive and produce total outputs. That is, a user can gauge the
string size as well as the number of generated strings without risking
partiality.

Even though the implementations
are not tuned for efficiency, they generate
languages at a rate between $1.3\cdot10^3$ and $1.4\cdot10^6$ strings per
second for Haskell and up to $3.6\cdot10^6$ strings per second for
OCaml. The generation rate depends on the density of the language.

\begin{itemize}
\item Web app available at \url{https://regex-generate.github.io/regenerate/}
\item OCaml code available at \url{https://github.com/regex-generate/regenerate}
\item Haskell code available at \url{https://github.com/peterthiemann/re-generate}
\end{itemize}
\bibliography{../biblio}
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: