Commit 1b6f45e (1 parent 253ed56), committed as received

reviews.gpce.txt (342 lines added)
Review #31A
===========================================================================

Overall merit
-------------
A. Accept

Reviewer expertise
------------------
Y. Knowledgeable

Paper summary
-------------
The paper proposes an algorithm that, given a (generalised) regular
expression $r$, generates the words accepted by $r$. By supporting
regex complement, the algorithm can also generate words that are _not_
accepted by $r$. The main use case is the generation of
positive/negative test cases for regular expression parsers,
eliminating the need for an oracle.

The basis of the paper is work by McIlroy (2004), which also
generates the strings matching a regex but has two limitations:
inefficient language concatenation, and lack of productivity for
language intersection and difference. To tackle these issues, the key
idea of the paper is to adopt a _segment representation_ of the
language of a regex: roughly, the language is rendered as a lazy
stream of lists of words of the same length. This allows language
operations (and string generation) to be implemented productively and
efficiently.
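
[Editor's sketch: the segment idea summarised above can be illustrated in a
few lines of Haskell. This is a simplified reconstruction, not the paper's
actual `SegLang` implementation: a language is an infinite lazy list whose
nth element holds the words of length n in lexicographic order, so each
output segment of a union or concatenation is built from finitely many
input segments.]

```haskell
type Seg  = [String]
type Lang = [Seg]  -- the nth element holds all words of length n, sorted

-- Merge two sorted segments, dropping duplicates.
merge :: Seg -> Seg -> Seg
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | x > y     = y : merge (x:xs) ys
  | otherwise = x : merge xs ys

-- Union works segment by segment.
union :: Lang -> Lang -> Lang
union = zipWith merge

-- Concatenation by cross section: segment n of L.M combines
-- L_i with M_(n-i) for i = 0..n, so every output segment is
-- produced from finitely many inputs (hence productively).
concatenate :: Lang -> Lang -> Lang
concatenate l m =
  [ foldr1 merge [ [ u ++ v | u <- l !! i, v <- m !! (n - i) ]
                 | i <- [0 .. n] ]
  | n <- [0 ..] ]

-- Example: a* and b* as segment languages.
aStar, bStar :: Lang
aStar = [ [replicate n 'a'] | n <- [0 ..] ]
bStar = [ [replicate n 'b'] | n <- [0 ..] ]
```

For instance, `take 3 (concatenate aStar bStar)` yields
`[[""],["a","b"],["aa","ab","bb"]]`: the cross sections of a*b* up to
length 2.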

The paper describes two implementations of the algorithm (in Haskell
and OCaml), and provides several benchmarks that show good performance
(from $10^3$ to $10^6$ strings per second, depending on the regex).

Comments for author
-------------------
The paper is a good fit for GPCE, and I believe that it should be
accepted.

**Novelty:**
The paper directly builds upon previous work (McIlroy'04), but to the
best of my knowledge, its key insight (i.e., the "Generation by Cross
Section" in Section 4) and the resulting optimisations are both
novel and interesting.

**Significance:**
The paper provides a nice contribution with clear applications, and
significantly improves the state of the art.

**Evidence:**
The benchmarks show good performance and are discussed in detail,
explaining the advantages and disadvantages of the various
optimisations introduced throughout the paper. The availability of the
implementations as open source software is a nice addition,
and allows readers to replicate the results and build upon them.

**Clarity:**
The paper is clear, polished and generally well written (modulo a few
things discussed below): it nicely guides the reader by first
presenting a basic version of the algorithm (based on McIlroy'04),
highlighting its limitations, and addressing them with different
strategies that lead to more sophisticated implementations.
Related work is clearly presented.

### Improvements

Although I liked the paper, I would recommend improving two things:

#### Power series?

I did not understand how "power series" are used in the treatment:

* "power series" is one of the paper keywords

* there are various references to McIlroy'99 (which is, indeed, a
paper on power series in Haskell) - but all such references are
wrong and should point to McIlroy'04 (on regex language generation)

* line 371 shows a "power series representation" of a language, but I
do not understand it: where are the "coefficients" mentioned a few
lines later? What does $L_n x^n$ mean?

* Similarly, I do not understand the reference to the "stream of
boolean coefficients" discussed (as future work) in lines 1299-1303

However, besides a few unclear references to "power series
coefficients" on page 4, the paper is understandable, even if one
ignores lines 369-374 and just uses the `SegLang` language
representation in Haskell (line 407), which can be intuitively related
to the explanation in lines 362-368.

My recommendation is: please clarify how power series are used in the
treatment - or just remove the references (and lines 369-374) if they
are not necessary.

#### Benchmarks

Figures 13, 14 and 15 show the plot of the number of generated strings
vs. time; but how was the data produced? Do the plots represent a
single execution? Or do they show the average of multiple (how many?)
repetitions?

I recommend providing more details. If the benchmarks are (as they
should be) an average of multiple executions, it would be nice to also
provide some information about the standard deviation: e.g., it would
be interesting to know if some algorithm is jittery and shows a
wide variance in performance across repetitions.

### Minor notes

Line 84/85: missing space after "investigated"

Line 366/367: the spacing of "th" in $n^{{t}{h}}$ is a bit odd

Line 483/484: "loose" should be "lose"


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #31B
===========================================================================

Overall merit
-------------
A. Accept

Reviewer expertise
------------------
Y. Knowledgeable

Paper summary
-------------
The paper investigates a tester for REs that generates positive and
negative examples. Generating negative examples has not been done
before, according to the paper. The submission goes beyond REs in that
it can deal with "extended regular expressions that contain
intersection and complement beyond the standard regular operators".

The theory is buttressed by not one but two implementations (one in
Haskell and one in OCaml). During most of the review (see the end for
the exception) I have not looked at the linked GitHub
repos, as that would deanonymise the authors. I trust that the
implementation is solid.

The paper feels solid too. Clearly the author(s) have thought about
this subject deeply. The deep understanding is also visible in the
clarity of the presentation (e.g. the clarity of the research question).

The problem tackled is important, as REs are one of the most
widely used tools in a programmer's toolbox. I'm not a researcher in
REs and random testing myself, so I cannot comment on the novelty /
difficulty of the approach. I hope another reviewer with more domain
expertise can comment on novelty.

The paper also has a thorough benchmarking section. Quite how reliable
those benchmarks are, I am unable to say, but that's because
benchmarking software performance in a reliable and meaningful way is
an extremely hard (and, in my opinion, unsolved) problem.

My main complaint is that a central idea of the paper is to make
QuickCheck do the work of test-case generation. But the paper
does not say this explicitly anywhere, and I almost
misread the paper because of this. I was looking for the punchline
about how the test cases are generated in OCaml, since I did not
expect that OCaml comes with a suitable QuickCheck as well.
I only saw this when I looked at the OCaml code.
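
[Editor's sketch: the review's point about the generator driving test-case
generation can be illustrated library-free. All names below are
illustrative, not the paper's API: the generator's positive/negative words
are replayed against a matcher under test, and the generator's labels serve
as the expected verdicts - the "no oracle needed" workflow.]

```haskell
-- Illustrative matcher under test: recognises the language (ab)*.
matcherUnderTest :: String -> Bool
matcherUnderTest []             = True
matcherUnderTest ('a':'b':rest) = matcherUnderTest rest
matcherUnderTest _              = False

-- Words a generator would label as inside / outside L((ab)*).
positives, negatives :: [String]
positives = [ concat (replicate n "ab") | n <- [0 .. 3] ]
negatives = ["a", "b", "ba", "abb", "aba"]

-- No oracle needed: the generator's labels are the expected verdicts.
runTests :: Bool
runTests = all matcherUnderTest positives
        && not (any matcherUnderTest negatives)
```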

Comments for author
-------------------
- Use of the term "productive": it's not explained. E.g. on Page 1:

"Our algorithms produce lazy streams, which are guaranteed to
be productive. A user can gauge the string size or the number of
generated strings without risking partiality."

I'm also not sure what the last sentence in the quote means.

- Section 3.2: The paper gives a Haskell definition, but there are now
very general enumeration libraries, in particular [A]. I'm sure REs
as described in Section 3.2 are a special case of the approach in
[A]. Please clarify. In particular, it would be interesting to see a
benchmark comparing the REs in the paper with [A].

- Page 4, "By applying the usual spiel of representing": Interesting
sentence. As an outsider, I am not sure what the "usual spiel"
is. Maybe add a reference?

- Implementation in Haskell and OCaml: would it make sense to
implement both (strict and lazy) in Scala, which offers strict and
lazy evaluation in a unified way? Using just one language would also
make benchmarking (somewhat) easier.

- Page 10, "Languages are roughly ordered by size/density": I wonder
how you order by density. After all, density is an asymptotic
concept.

- Page 11, "We restrict generated regular expressions to star-heights
less than 3": Maybe discuss whether that affects the theory or
not. It seems to me an entirely pragmatic choice. And one that is
fine in practice -- one never needs large star heights in
applications, at least in my (limited) experience.

- Page 11, "we use a technique similar to the fast approximation for
reservoir sampling [...] This approach has proven satisfactory at
finding good testing samples in practice": I'm slightly
surprised. Does this not have a deep influence on the efficiency of
the test generation? Are you using an off-the-shelf
reservoir sampler and hooking in your RE generators as black boxes, or
could the system be improved if both parts were integrated more
tightly? Note that there is quite a bit of literature on (a)
enumerating combinatorial structures, and (b) sampling them
efficiently, primarily coming from P. Flajolet and his students, see
e.g. [B-G].

- Page 12, "Their approach is complementary to test-data generators":
Why is it complementary? Apart from the fairness issue it seems to
be dealing with the same problem.

---------------------- References ----------------------

[A] I. Kuraj, V. Kuncak, D. Jackson, Programming with Enumerable Sets
of Structures.

[B] P. Flajolet, P. Zimmerman, B. Van Cutsem, A Calculus for the
Random Generation of Combinatorial Structures. NB, this work is often
cited as A Calculus for the Random Generation of Labelled
Combinatorial Structures, even by the authors.

[C] P. Flajolet, A Calculus of Random Generation.

[D] P. Duchon, P. Flajolet, G. Louchard, G. Schaeffer, Boltzmann
Samplers for the Random Generation of Combinatorial Structures.

[E] O. Bodini, Y. Ponty, Multi-dimensional Boltzmann Sampling of
Languages.

[F] E. Fusy, Random generation of combinatorial structures using
Boltzmann samplers.

[G] P. Lescanne, Boltzmann samplers for random generation of lambda
terms.


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #31C
===========================================================================

Overall merit
-------------
B. Weak accept

Reviewer expertise
------------------
Y. Knowledgeable

Paper summary
-------------
This paper implements an efficient generator for words matching a given extended regular expression, with the goal of
stress-testing regex matcher implementations. The generator aims to produce all the words in a given language,
efficiently and strictly lexicographically. It extends the classic straightforward McIlroy implementation with a number
of algorithmic improvements, such as a representation segmented by word length and careful control of language
finiteness/eagerness. The authors provide two implementations, in Haskell and OCaml, with different choices and
trade-offs to enable more or less efficient enumeration. They then evaluate the performance of these choices on several
carefully selected representative regexes.
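
[Editor's sketch: for context, the "classic straightforward McIlroy
implementation" mentioned above represents a language as a single stream of
words sorted by length, then lexicographically. The sketch below is my
reconstruction, not the paper's code; it shows the merge-based union in this
flat representation, whose concatenation analogue is the efficiency
bottleneck the segmented representation addresses.]

```haskell
-- McIlroy-style representation: a language is one sorted stream of
-- words, ordered by length first, then lexicographically.
type Lang = [String]

-- Length-then-lexicographic order.
llLeq :: String -> String -> Bool
llLeq u v = (length u, u) <= (length v, v)

-- Union as a duplicate-dropping merge of two sorted streams.
union :: Lang -> Lang -> Lang
union xs [] = xs
union [] ys = ys
union (x:xs) (y:ys)
  | x == y    = x : union xs ys
  | llLeq x y = x : union xs (y:ys)
  | otherwise = y : union (x:xs) ys

-- Example: a* and b* as sorted streams.
aWords, bWords :: Lang
aWords = [ replicate n 'a' | n <- [0 ..] ]
bWords = [ replicate n 'b' | n <- [0 ..] ]
```

For instance, `take 5 (union aWords bWords)` is `["","a","b","aa","bb"]`.
Union stays cheap in this flat representation; concatenation does not,
which motivates the segmented representation evaluated in the paper.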

Comments for author
-------------------
## Strengths
+ Exceptionally clear and well-written paper
+ Simple but effective extension of an existing algorithm with several efficient implementations
+ A comparative study of the performance of different implementation choices on carefully selected regexes
+ The authors mention (but do not describe) practical applications of their enumerator: (a) a tester for a
standard-library regex parser, and (b) a test generator for students studying Haskell.

## Weaknesses & Comments
- The paper makes no study of the effectiveness of the approach for typical real-life regexes. It mentions real-life
regexes in the two applications above, but all the experiments are performed only on seven very simple (albeit
carefully selected as worst-case representative) regexes. A study of the generator's actual effectiveness as a
practical tester would warrant a higher acceptance score. As it stands, the experiments on the generator's
effectiveness are, in effect, ablation studies: they illustrate the impact of different implementation choices on the
generator's performance on textbook worst cases, but not on the generator's real-life usage.
Examples of interesting databases:
  * General-purpose regexes in e.g. RegexLib
  * Regexes for XSS/SQL vulnerability detection in firewall filters, e.g. Mod-Security, PHPIDS, Expose
  * Regexes in real-life parsers, e.g. Chrome
- I did not appreciate the notion of the "power series" that was introduced and then never used in its capacity as a
power series (moreover, emphasized as distinct from the classical notion of formal power series). Simply presenting
the segment representation as a stratified representation of a language as a union of its length-n cross sections is
enough.
- In practice, a regex generator would not be used to enumerate all strings up to a certain bound, but as a source from
which to finitely _sample_ matching strings randomly, as the authors rightfully observe in lines 1142-1155.
However, their implementation based on reservoir sampling does not guarantee any probabilistic
coverage over the branches of the underlying regex. Ideally, a sampler would be either (a) uniform over the branches in
the regex, or (b) proportional [perhaps logarithmically] to the sizes of the respective language classes of the
branches in the regex. I'm not an expert here, so I am unsure whether such a sampler is possible, but if it is, it
would ideally also be composable and inductively defined. The tracking of exhausted lists for Lemma 5.1 would also
have to be adapted.

To clarify, I really like the paper; I think it addresses an important problem with a good insight, is well targeted
to GPCE, and should be accepted. I lowered my score from A to B because of the first weakness (incomplete evaluation);
the others are rather comments/suggestions for the authors.


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #31D
===========================================================================

Overall merit
-------------
C. Weak reject

Reviewer expertise
------------------
X. Expert

Paper summary
-------------
This paper describes how to generate both positive and negative examples of words that match extended regular
expressions. The focus is on creating test cases that can be used to test regular expression engines.

Comments for author
-------------------
Overall the paper was quite clear. The mixture of mathematical definitions and implementations in Haskell and OCaml
was fine for me, but may not be for a reader who is not familiar with these languages.

The need for better test case generation for regular expression engines is well motivated. The main concern I have is
that, having motivated this need, most of the paper is actually about a more general problem: generating words that
are recognised or not recognised in a language specified by regular expressions.

These two problems are related but are not the same. The paper focuses mostly on coming up with words, but not on
their suitability as test cases. On the one hand, getting words that are in the language is a win, since simpler
brute-force methods can't be expected to do so very well. But there is little discussion of evidence in the paper on
the issue of whether these generated words actually are good test cases. Perhaps more problematically, words that
aren't in the language are generated, but it's not clear that their distribution is useful for negative testing.

The exception is in Section 8, where you explain using a method that skips elements in the generated words to pick the
test cases, with the aim to "exercise the implementation under test as much as possible". There is no definition of
"as much as possible" and no real evaluation of this aspect, just a comment that it has "proven satisfactory" "in
practice". The abstract says that the test cases are "more than adequate", which is pretty vague. Given that the main
claim of your paper is that your generated words help to improve testing, I expected to see a much more robust
evaluation.

If the paper is accepted, please adjust the title, abstract and claimed contributions to better match the rest of the
content. Since very little is discussed or evaluated in terms of test cases, I think it would be appropriate to have
testing only as a general motivation, and not a central contribution of the work.

Putting all of that aside, I can see the benefits of your generation approach compared to previous ones, particularly
concerning productivity. It seems to be an improvement over previous approaches, and the implementations in a lazy and
a strict language show wide applicability.

Detailed comments:

170: "reasonable efficiency", how do you define "reasonable"?

463-4: "loose" -> "lose"

Some of the related work (e.g., the paragraph at line 1226) lists other work but doesn't do any comparison with the
current paper.
