Commit 7229078

committed
cut down a bit
1 parent 38bae12 commit 7229078

File tree

1 file changed: +61 -66 lines changed

related.tex

Lines changed: 61 additions & 66 deletions
@@ -4,25 +4,23 @@ \section{Related Work}
\subsubsection*{Regular Language Generation}

\citet{DBLP:journals/actaC/Makinen97} describes a method to enumerate
-the words of a regular language $L$ in length-lexicographic
-ordering. It relies on the regular language being defined by a
-deterministic finite automaton. To generate words up to length $n$,
-this method requires to precompute, for each $i\le n$, the
-lexicographically minimal and maximal word of length $i$ in $L$. This
-precomputation takes time $O(n)$.
-
-The actual enumeration starts with the precomputed minimal word of
-length $n$ and repeatedly computes the lexicographically next word
-until it reaches the maximal word of length $n$. Each such step requires time $O(n)$.
-
-The same approach can be used for enumerating the language of certain
-(prefix-free, length complete) context-free grammars, too.
-
-Compared to our approach, M{\"{a}}kinen requires a deterministic
+the words of a regular language $L$, given by a deterministic finite
+automaton, in length-lexicographic ordering. To generate words up to
+length $n$, this method precomputes in time $O(n)$, for each $i\le n$,
+the lexicographically minimal and maximal word of length $i$ in $L$.
+%
+Enumeration starts with the minimal word of
+length $n$ and repeatedly computes the lexicographically next word in $L$
+until it reaches the maximal word of length $n$. Each step requires time $O(n)$.
+
+% The same approach can be used for enumerating the language of certain
+% (prefix-free, length complete) context-free grammars, too.
+
+In comparison, M{\"{a}}kinen requires a deterministic
finite automaton, which can be obtained from a regular expression in
worst-case exponential time. Complementation is not mentioned, but it
-can obviously be handled. M{\"{a}}kinen would give rise to a
-productive definition by segments because the computation of minimal
+could be handled. M{\"{a}}kinen would give rise to a
+productive definition by cross sections because the computation of minimal
and maximal words could be done incrementally, but this is not mentioned
in the paper.

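For intuition, here is a minimal Haskell sketch of length-lexicographic enumeration from a DFA. It is purely illustrative and assumes a DFA given as a start state, an acceptance predicate, a transition function, and an alphabet listed in lexicographic order; it is not Mäkinen's algorithm (which steps from the minimal to the maximal word of each length) and not code from the paper.

-- Illustrative sketch only: enumerate a DFA's language in
-- length-lexicographic order by listing, for each n, the words of length n
-- in lexicographic order.
data DFA q = DFA
  { start  :: q
  , accept :: q -> Bool
  , step   :: q -> Char -> q
  , sigma  :: [Char]        -- alphabet, listed in lexicographic order
  }

-- All accepted words of length n; lexicographic order follows from
-- traversing the alphabet in order at every position.
crossSection :: DFA q -> Int -> [String]
crossSection dfa = go (start dfa)
  where
    go q 0 = if accept dfa q then [""] else []
    go q k = [ c : w | c <- sigma dfa, w <- go (step dfa q c) (k - 1) ]

-- Length-lexicographic enumeration: concatenate the per-length lists.
enumerate :: DFA q -> [String]
enumerate dfa = concatMap (crossSection dfa) [0 ..]

-- Hypothetical toy example: words over "ab" that end in 'b'.
endsInB :: DFA Bool
endsInB = DFA { start = False, accept = id, step = \_ c -> c == 'b', sigma = "ab" }
-- take 6 (enumerate endsInB) == ["b","ab","bb","aab","abb","bab"]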
@@ -31,44 +29,40 @@ \subsubsection*{Regular Language Generation}

\citet{DBLP:journals/jfp/McIlroy04} implements the enumeration of all
strings of a regular language in Haskell. He develops two approaches,
-one based on interpreting regular expressions, the other (unrelated to
-ours) using a shallow embedding of nondeterministic finite
-automata. The first approach is inspired by an earlier note by Misra
-\cite{misra11:_enumer_strin_regul_expres} and uses operators based on
-a length-lexicographically increasing list representation similar to
-our first proposal.
-
-The implementation of union is identical to ours, but intersection and
-difference operations are not considered and hence complementation is
-not considered, either. The implementation of concatenation is the
-generic multiplication operation for sequences / power series
-\cite{DBLP:journals/jfp/McIlroy99} instantiated for the semiring
-of union and concatenation of languages. Unlike our implementation, the generic
-implementation does not take advantage of the fact that many
-intermediate results can be generated in the correct ordering and hence
-requires many more union operations (one for each output string versus
-one for each length between $0$ and $n$ where $n$ is the length of
-the output string). Moreover, the generation method is reported to
-be very inefficient and thus not suitable for generating test inputs
-at a large scale.
-
-\citet{DBLP:journals/tcs/AckermanS09} improve on M{\"{a}}kinen's
-algorithm by working directly on a nondeterministic finite automaton
-and by proposing more efficient algorithms to compute minimal words of
-a given length and to proceed to the next word of same length in the
-language. An empirical study compares a number of variations of the
-enumeration algorithm.
-
-Their enumeration algorithm iteratively invokes a cross-section
-enumeration, where the $n^{\text{th}}$ cross-section of a language $L$ is
-$L \cap \Sigma^n$, that is, a segment in our terminology.
-
+one based on interpreting regular expressions inspired by
+\citet{misra11:_enumer_strin_regul_expres} and discussed in
+Section~\ref{sec:naive-approach}, the other (unrelated to ours) using
+a shallow embedding of nondeterministic finite automata.
+
+% The implementation of union is identical to ours, but intersection and
+% difference operations are not considered and hence complementation is
+% not considered, either. The implementation of concatenation is the
+% generic multiplication operation for sequences / power series
+% \cite{DBLP:journals/jfp/McIlroy99} instantiated for the semiring
+% of union and concatenation of languages. Unlike our implementation, the generic
+% implementation does not take advantage of the fact that many
+% intermediate results can be generated in the correct ordering and hence
+% requires many more union operations (one for each output string versus
+% one for each length between $0$ and $n$ where $n$ is the length of
+% the output string). Moreover, the generation method is reported to
+% be very inefficient and thus not suitable for generating test inputs
+% at a large scale.
+
+\citet{DBLP:journals/tcs/AckermanS09} improve M{\"{a}}kinen's
+algorithm by working on a nondeterministic finite automaton
+and by proposing faster algorithms to compute minimal words of a given
+length and to proceed to the next word of the same length. An empirical
+study compares a number of variations of the enumeration algorithm.
+%
+% Their enumeration algorithm iteratively invokes a cross-section
+% enumeration, where the $n^{\text{th}}$ cross-section of a language $L$ is
+% $L \cap \Sigma^n$, that is, a segment in our terminology.
+%
\citet{DBLP:conf/cocoon/AckermanM09} present three further
-improvements on their enumeration algorithms that exhibit better
-asymptotic complexity. Their empirical study indicates that the
-improved algorithms perform better in practice.
+improvements on their enumeration algorithms with better asymptotic
+complexity. The improved algorithms perform better in practice, too.

-Compared to our work, Ackerman's approach and its subsequent improvement does not incur an
+Ackerman's approach and its subsequent improvements do not incur an
exponential blowup when converting from a regular expression. As it is based on
nondeterministic finite automata, complementation cannot readily be
supported. Moreover, the approach is not compositional.
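As a rough illustration of the length-lexicographic list representation mentioned above, here is a sketch of union as an order-preserving merge. It is our own sketch with made-up names (llexLE, unionLL), not McIlroy's code: a language is a duplicate-free list of words sorted length-lexicographically, and union merges two such lists.

-- Illustrative sketch, not McIlroy's code: languages as duplicate-free,
-- length-lexicographically sorted lists of words; union is a merge.
llexLE :: String -> String -> Bool
llexLE x y = (length x, x) <= (length y, y)

unionLL :: [String] -> [String] -> [String]
unionLL xs []  = xs
unionLL []  ys = ys
unionLL (x:xs) (y:ys)
  | x == y      = x : unionLL xs ys        -- keep one copy of duplicates
  | llexLE x y  = x : unionLL xs (y:ys)
  | otherwise   = y : unionLL (x:xs) ys

-- unionLL ["", "ab"] ["a", "ab", "bbb"] == ["", "a", "ab", "bbb"]

Because the merge always produces an element before recursing, it remains productive when both operands are infinite enumerations.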
@@ -78,11 +72,11 @@ \subsubsection*{Regular Language Generation}
% account for the size $s$ of the automaton, and obtain $O (s^2n^2)$ for
% the computation of minimal words.

-As one example of a line of unrelated work with deceivingly similar
-titles, \citet{DBLP:conf/wia/LeeS04} discuss enumerating regular
-expressions and their languages. The goal of this work is aims to find
-bounds on the \textbf{number of languages} that can be represented
-with regular expressions and automata of a certain size.
+% As one example of a line of unrelated work with deceptively similar
+% titles, \citet{DBLP:conf/wia/LeeS04} discuss enumerating regular
+% expressions and their languages. The goal of this work is to find
+% bounds on the \textbf{number of languages} that can be represented
+% with regular expressions and automata of a certain size.



@@ -116,7 +110,7 @@ \subsubsection*{Test Data Generation}
contexts. In property testing, input data for the function to test is
described via a set of combinators while the actual generation is
driven by a pseudo-random number generator. One difficulty of this
-approach is to find the correct distribution of inputs that will
+approach is to find a distribution of inputs that will
generate challenging test cases. This problem already arises with
recursive data types, but it is even more pronounced when generating
test inputs for regular expressions because, as explained in
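To make the combinator-plus-PRNG style described in the hunk above concrete, here is a minimal QuickCheck-style sketch. The toy data type, the generator names, and the weights are our own assumptions and appear in none of the cited papers; the point is only that the distribution of generated inputs is governed entirely by the chosen combinators and frequency weights.

-- Hedged sketch (our own toy AST and weights): a sized QuickCheck generator
-- for regular-expression-like values; changing the weights changes the
-- input distribution, which is the difficulty mentioned above.
import Test.QuickCheck

data Re = Eps | Chr Char | Cat Re Re | Alt Re Re | Star Re
  deriving Show

genRe :: Int -> Gen Re
genRe 0 = oneof [pure Eps, Chr <$> elements "ab"]
genRe n = frequency
  [ (2, genRe 0)                                -- stop early
  , (3, Cat  <$> genRe half <*> genRe half)     -- concatenation
  , (3, Alt  <$> genRe half <*> genRe half)     -- alternative
  , (1, Star <$> genRe half)                    -- iteration, kept rare
  ]
  where half = n `div` 2

instance Arbitrary Re where
  arbitrary = sized genRe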
@@ -128,13 +122,14 @@ \subsubsection*{Test Data Generation}
positive and negative input for these randomly generated regular
expressions.

-\citet{DBLP:journals/jfp/NewFFM17} are concerned with the enumeration
-of elements of various data structures. Their approach is
-complementary to test-data generators. They exploit bijections between
-natural numbers and the data domain and develop a quality criterion
-for data generators based on a notion of fairness. It would be
-interesting to investigate the connection between their enumeration
-strategies and a direct representation of formal power series.
+\citet{DBLP:journals/jfp/NewFFM17} enumerate elements of various data
+structures. Their approach is complementary to test-data
+generators. It exploits bijections between natural numbers and the
+data domain and develops a quality criterion for data generators based
+on fairness.
+% It would be interesting to investigate the
+% connection between their enumeration strategies and a direct
+% representation of formal power series.

Crowbar~\cite{crowbar} is a library that combines property testing
with fuzzing. In QuickCheck, the generation is driven by a random
