Description
As the new `QCheck.Shrink.string` unit tests document, on a failure path (going through all shrinking candidates without luck) the current approach may be a bit too exhaustive:
(see qcheck/test/core/QCheck_unit_tests.ml, lines 93 to 103 in f37621a)
By falling back on `Shrink.list_spine` we get a nice and short candidate sequence when we stay clear of element reduction:
```
# Shrink.string ~shrink:Shrink.nil "zzzzz" (fun xs -> Printf.printf "%s\n" Print.(string xs));;
zzz
zzzz
zzz
zzzz
```
However, exhaustively trying to reduce all O(n) entries, say O(log n) times each, brings this to O(n log n) candidates, which may be a bit much:
```
# Shrink.string ~shrink:Shrink.char "zzzzz" (fun xs -> Printf.printf "%s\n" Print.(string xs));;
zzz
zzzz
zzz
zzzz
nzzzz
tzzzz
wzzzz
yzzzz
znzzz
ztzzz
zwzzz
zyzzz
zznzz
zztzz
zzwzz
zzyzz
zzznz
zzztz
zzzwz
zzzyz
zzzzn
zzzzt
zzzzw
zzzzy
```
We are getting bitten by this in ocaml-multicore/multicoretests, as remarked by @edwintorok here: ocaml-multicore/multicoretests#329 (comment)
As an alternative to avoiding character-shrinks above a certain threshold, one could consider alternative approaches, e.g.:
- reducing only the first non-`'a'` char
- reducing several `char`s simultaneously: `"zzzzz"` -> `"aaazz"` (see the sketch below)
- ...
edwintorok commented on May 1, 2023
FWIW I'm using the following string generator/shrinker in my code; it makes the shrinker work reasonably fast, although it is not a general purpose shrinker.
I tried to make `shrink_all_chars` apply `Shrink.char` to each char just once, but even that is too slow, even if I write a one-step shrinker using just `Char.chr`/`Char.code`/`Bytes.set` and bit shifts (my strings are sometimes in the range of ~64k to reproduce certain bugs). Workarounds there are to artificially introduce smaller limits elsewhere in the code such that bugs can still be hit with much shorter strings.
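For illustration, a one-step variant of that idea might look like the following sketch (a guess at the shape, not the code referenced above; `shrink_chars_one_step` is an invented name):

```ocaml
(* Hypothetical sketch: apply a single shrink step to each char via a
   bit shift and Bytes.set, with no char list round-trip. Emitting one
   candidate per position is still O(n) candidates overall. *)
let shrink_chars_one_step (s : string) (yield : string -> unit) : unit =
  String.iteri
    (fun i c ->
      let code = Char.code c in
      if code > 0 then begin
        let b = Bytes.of_string s in
        Bytes.set b i (Char.chr (code lsr 1)); (* halve the char code *)
        yield (Bytes.to_string b)
      end)
    s
```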
edwintorok commented on May 1, 2023
I think it might be useful to guide the shrinker based on the kind of failure we're shrinking.
E.g. if we got an exception that says it is length-related (e.g. various internal invariants around sizes/quota/etc. broken), then try length-based shrinking only (char-based shrinking would waste a lot of time failing to shrink...), and only once that is exhausted try only 2 more steps:
OTOH if the failure says something invalid about the string itself, then you need to try both methods (and hopefully length-based shrinking would first cut down the length considerably...)
This would need some kind of user-provided exception classifier, which is perhaps too specific and would complicate the API too much.
Perhaps a better way would be for the shrinker to collect stats on what worked and what didn't for a specific counterexample reduction attempt (stats to be reset every time another qcheck test runs): if the char-based shrinking doesn't result in any reductions, then stop enumerating them after a while for a given "type".
It could also try to "optimistically" jump ahead, e.g. if shrinking previously succeeded in reducing the length to N bytes, then for the next parameter try length N directly (if it doesn't work, fall back to `Shrink.int` as usual, perhaps noting that N already failed and stopping there).
Finally, for shrinking lists of strings, "delta debugging" techniques might be useful (these are usually used to shrink source code for compilers after splitting by toplevel forms): https://github.com/mpflanzer/delta.
This paper in particular gives a nice overview: https://www.st.cs.uni-saarland.de/papers/tse2002/, see Figure 6 for an example. The basic idea is to introduce a shrinking "granularity", and e.g. shrink groups of N/2, then N/4, N/8, etc.
I don't think it would work at the char level (it would in fact generate way more possibilities than QCheck.Shrink.string currently would), but it shows that you don't have to immediately split things up into their tiniest components (string -> char), but can try shrinking in larger chunks first.
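A rough sketch of that chunked idea for a single string (illustrative only; as noted above, it emits more candidates than QCheck's current O(log n) spine shrinker):

```ocaml
(* Hypothetical sketch of delta-debugging-style chunked shrinking:
   try removing whole chunks of size n/2, then n/4, n/8, ..., before
   ever considering individual characters. *)
let shrink_by_chunks (s : string) (yield : string -> unit) : unit =
  let n = String.length s in
  let chunk = ref (n / 2) in
  while !chunk > 0 do
    let i = ref 0 in
    while !i + !chunk <= n do
      (* emit s with the window [i, i+chunk) removed *)
      yield (String.sub s 0 !i ^ String.sub s (!i + !chunk) (n - !i - !chunk));
      i := !i + !chunk
    done;
    chunk := !chunk / 2
  done
```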
Finally: why list of chars instead of array of chars? Surely arrays can be indexed more efficiently than lists, or in fact Bytes might be the most efficient form for shrinking.
jmid commented on May 2, 2023
Thanks for sharing tips, feedback, and pointers!
At the moment, in QCheck(1) we are using the interface in which shrinkers are `'a -> 'a Iter.t` functions that are restarted after each successful shrink step. This is a fairly common setup within QuickCheck frameworks, I think, all derived from the original Haskell implementation. With the FP roots, this interface is not necessarily compatible with delta debugging, which in the above linked presentation seems to require some state to continue reducing in a particular span of "deltas" between invocations.
The repeated restarting also explains why an O(n log n) iterator function is too eager/exhaustive: if it is restarted, say, O(n) times, we end up spending O(n^2 log n)... 😬
We did have a more exhaustive list shrinker at some point, but its performance was terrible on larger lists once we started measuring in #177. The shrinker improvements in #242 (slightly tweaked in yesterday's #277) thus stick to a sub-linear O(log n) iterator function for reducing the length of lists and (transitively) strings.
In that light, it is a bit funny in hindsight to then ruin the resulting `Shrink.string` performance by spending O(n log n) to reduce each character with `Shrink.char` in the 'element reduction' phase following the fast 'size reduction'... 😅
Good point, I completely agree that the `string` shrinker may very likely be improved by not reducing it to `char list`s - perhaps in particular in the 'element reduction' phase...
edwintorok commented on May 2, 2023
Indeed there is no convenient place to store the "granularity" (unless we do something really "unfunctional" and use an ephemeron to store the last granularity for a particular generated string, or other kinds of globals...), and without it shrinking would perform a lot of unnecessary additional steps (as mentioned in the paper).
Here is the best I could get so far trying to emulate delta debugging, but it is in fact in this form worse than QCheck's current shrinker:
However, even if we don't emulate delta debugging, it might be worthwhile to reduce a set of intervals first (as shown above, those intervals don't need to be stored: they can be an `(int*int) Iter.t` or `(int*int) Seq.t`), and then build the strings just at the end, which for large strings can take advantage of fast blits of substrings.
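For illustration, building one candidate from the kept intervals with a single allocation and fast blits could look like this sketch (`build_from_intervals` is an invented helper, not from the thread):

```ocaml
(* Hypothetical sketch: given (offset, length) intervals of s to keep,
   assemble the shrunk candidate with one allocation and substring
   blits, avoiding a char list round-trip. *)
let build_from_intervals (s : string) (keep : (int * int) list) : string =
  let total = List.fold_left (fun acc (_, len) -> acc + len) 0 keep in
  let b = Bytes.create total in
  let pos = ref 0 in
  List.iter
    (fun (off, len) ->
      Bytes.blit_string s off b !pos len;
      pos := !pos + len)
    keep;
  Bytes.to_string b
```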
jmid commented on May 10, 2023
Thanks for sharing!
As far as I can tell, this (a) doesn't attempt shrinking of the individual characters and (b) is not sub-linear.
Here's a quick ASCII chart plotting the increasing number of shrinking candidates as the input size goes up:
The latter hurts (as you pointed out earlier) when repeatedly restarting the iterator, as QCheck(1)'s shrinker does.
We actually had something closer to this for lists initially added in e7b44ac as a response to #32 and later improved slightly in #63 (and described in #64).
The performance was measured in the shrinker benchmark suite in #177, which demonstrated performance problems on larger input lists.
The current sub-linear list shrinker from #242 was added in response to this (and slightly tweaked recently in #277).
Here's a plot for comparison (if we disregard the element shrinking):
As I see it, this nice logarithmic ASCII curve is killed by bolting on a linear-time `char`-level shrinker that runs afterwards:
Earlier today, I was trying out a strategy to "zero-out" blocks of sqrt size with `'a'`s:
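The snippet itself is not shown above; a guess at the shape of such a strategy (hypothetical code, block size and name illustrative):

```ocaml
(* Hypothetical sketch: walk over the string in blocks of ~sqrt(n)
   and emit one candidate per block with that block overwritten by
   'a's, i.e. O(sqrt n) candidates instead of O(n). *)
let shrink_zero_out_blocks (s : string) (yield : string -> unit) : unit =
  let n = String.length s in
  if n > 0 then begin
    let block = max 1 (int_of_float (sqrt (float_of_int n))) in
    let i = ref 0 in
    while !i < n do
      let len = min block (n - !i) in
      let b = Bytes.of_string s in
      Bytes.fill b !i len 'a';
      let candidate = Bytes.to_string b in
      if candidate <> s then yield candidate; (* skip no-op candidates *)
      i := !i + block
    done
  end
```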
This scales better as the size goes up AFAICS:
Caveats:
- only run a `char` shrinking pass if the input contains sufficiently few non-`'a'`s, say O(sqrt |s|)
edwintorok commented on May 10, 2023
Perhaps the choice of char shrinker (or its depth) could be given to the end-user of the API (with a good default...), e.g. the ability to turn off char shrinking completely, which I don't think is currently possible (if you leave it as `None` you get a default shrinker instead of no shrinker), although I'm not sure how to do that in a compatible way with the current API.
Unless there is a special `Shrinker.none` that the string shrinker detects, so it doesn't even need to explode the string into characters only to find out there is nothing to shrink...
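A minimal sketch of the sentinel idea, assuming an iterator-style shrinker and stand-in phases (none of these names are QCheck API):

```ocaml
(* Hypothetical sketch: a distinguished 'no element shrinking' sentinel
   that the string shrinker can detect by physical equality, skipping
   the char-exploding phase entirely. *)
type 'a iter = ('a -> unit) -> unit

let none : 'a -> 'a iter = fun _ _ -> ()

let shrink_string ?(shrink : char -> char iter = none) (s : string) : string iter =
 fun yield ->
  (* size-reduction phase (stub: just drop the last char) *)
  if String.length s > 0 then yield (String.sub s 0 (String.length s - 1));
  (* element-reduction phase: skipped entirely when shrink is the sentinel *)
  if shrink != none then
    String.iteri
      (fun i c ->
        shrink c (fun c' ->
          let b = Bytes.of_string s in
          Bytes.set b i c';
          yield (Bytes.to_string b)))
      s
```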
Then perhaps some heuristics based on the length of the string could decide what kind of char shrinker to use, e.g. we could give the shrinker a certain amount of "fuel", which it can spend as it best sees fit: either for shrinking the length of the string or, once that is exhausted, for shrinking characters.
A `Sys.max_string_length` of 2^57 means that if length shrinking is logarithmic, a fuel of 57 (or some constant multiple of it) can shrink even the theoretically largest string you can ever create.
And whatever "fuel" is left over from shrinking the length can be spent on shrinking characters, which would allow a bit more exploration of the state space for short strings (where building all those variants is quicker).
Fuel would be consumed based on either spending CPU time processing an item or allocating.
So in that sense length shrinking would consume 'O(1)' fuel for allocation (it calls the allocation function once; for simplicity let's assume that allocating 1MiB is as cheap as allocating 1 byte, which might even be true if it all fits in the minor heap and is allocated by the bump allocator), and 'O(n)' processing time, although O(1) function calls to perform that processing.
Similarly, exploding the string itself (regardless of how many characters are shrunk) would be O(n) allocation initially, followed by potentially O(n) processing time, including O(n) shrinker calls.
There is a balance here on how to measure things, so for simplicity I propose to consume fuel in this way instead:
- allocating an OCaml value consumes one unit of fuel, regardless of the allocation size: a string would always consume 1, a list of N characters would consume N, an array of N characters would consume just 1 again, an array of N non-integer OCaml values N+1 units, etc.
- calling another shrinker gives the fuel to that shrinker and consumes at least 1 unit on the invocation itself
- calls to stdlib runtime functions may initially be considered to consume 0 fuel, with some constant ratio vs. calls to other shrinkers if it is determined experimentally that 0 is not good enough. E.g. `String.sub` producing N characters vs. calling N char shrinkers would need some measurements on what the actual ratios are, but `String.sub` is a lot cheaper than N individual char-processing functions (at the very least it will copy word-by-word, not char-by-char).
And the idea of 'fuel' is already part of how recursive data structures are generated, so why not use that idea for shrinking as well? (Especially since in QCheck2 it looks like shrinking works on a tree-like structure and can probably track fuel internally without having to change the `set_shrink` API...)
For QCheck version 1 fixing this in a compatible way might be more tricky (e.g. `?n:int -> 'a -> 'a Iter.t`), although I'm not entirely sure whether that would break existing code that wouldn't know about the fuel.
Then small values would do more exhaustive shrinking, and large values (that cannot be further shrunk in length) would perform more limited shrinking.
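As a minimal sketch of the general idea (simplified to spend one unit of fuel per emitted candidate rather than per allocation; `with_fuel` is an invented name, not QCheck API):

```ocaml
(* Hypothetical sketch: wrap an iterator-style shrinker so it stops
   emitting candidates once its fuel budget is exhausted. *)
type 'a iter = ('a -> unit) -> unit

let with_fuel (budget : int) (shrink : 'a -> 'a iter) : 'a -> 'a iter =
 fun x yield ->
  let fuel = ref budget in
  let exception Out_of_fuel in
  try
    shrink x (fun candidate ->
        if !fuel <= 0 then raise Out_of_fuel;
        decr fuel;
        yield candidate)
  with Out_of_fuel -> ()
```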
What do you think?
jmid commented on May 12, 2023
You can use `~shrink:Shrink.nil` as I do above in the second plot.
Plots 2 and 3 were meant to illustrate the difference between shrinking strings with and without char/element reductions.
I also realize now that my message above was unclear:
I am measuring and reasoning about the complexity of the shrinkers in terms of the emitted shrinking candidates (the plots are printing a `'#'` for each candidate). This was not at all clear from my message in hindsight.
From having spent time collecting the shrinker benchmarks in #177 and later trying to improve the QCheck(1) shrinker performance, allocations and processing aren't as dominating as the number of emitted shrinking candidates.
Also, O(log n) shrinking candidates just compose a lot better when put inside a list, which is again inside tuples (or the function graphs underlying the function generator and shrinker), ...
It sounds intriguing - but also ambitious!
I have to admit that I currently have limited time for QCheck work.
So it is on a by-need basis - as here - when it turns out that a too eager `string` shrinker is hurting us.