The goal of this document is to explain how different parsers fit together. When will it backtrack? When will it not?
Say we have keyword "import":
| String | Result |
|---|---|
"import" |
OK{false} |
"imp" |
ERR{true} |
"export" |
ERR{true} |
In our OK{false} notation, we are indicating:
- Did the parser succeed?
OKif yes.ERRif not. - Is it possible to backtrack? So when
keywordsucceeds, backtracking is not allowed anymore. You must continue along that path.
Say we have map func parser:
parser |
Result |
|---|---|
OK{b} |
OK{b} |
ERR{b} |
ERR{b} |
So result of map func parser is always the same as the result of the parser itself.
Say we have map2 func parserA parserB:
parserA |
parserB |
Result |
|---|---|---|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
ERR{b} |
ERR{b} |
If parserA succeeds, we try parserB. If they are both backtrackable, the combined result is backtrackable.
If parserA fails, that is our result.
This is used to define our pipeline operators like this:
(|.) a b = map2 (\keep ignore -> keep) a b
(|=) a b = map2 (\func arg -> func arg) a bSay we have either parserA parserB:
parserA |
parserB |
Result |
|---|---|---|
OK{b} |
OK{b} |
|
ERR{true} |
OK{b} |
OK{b} |
ERR{true} |
ERR{b} |
ERR{b} |
ERR{false} |
ERR{false} |
The 4th case is very important! If parserA is not backtrackable, you do not even try parserB.
The either function does not appear in the public API, but I used it here because it makes the rules a bit easier to read. In the public API, we have oneOf instead. You can think of oneOf as trying either the head of the list, or oneOf the parsers in the tail of the list.
Say we have andThen callback parserA where callback a produces parserB:
parserA |
parserB |
Result |
|---|---|---|
ERR{b} |
ERR{b} |
|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
If both parts are backtrackable, the overall result is backtrackable.
Say we have backtrackable parser:
parser |
Result |
|---|---|
OK{b} |
OK{true} |
ERR{b} |
ERR{true} |
No matter how parser was defined, it is backtrackable now. This becomes very interesting when paired with oneOf. You can have one of the options start with a backtrackable segment, so even if you do start down that path, you can still try the next parser if something fails. This has important yet subtle implications on performance, so definitely read on!
This parser is intended to give you very precise control over backtracking behavior, and I think that is best explained through examples.
Say we have map2 func (backtrackable spaces) (symbol ",") which can eat a bunch of spaces followed by a comma. Here is how it would work on different strings:
| String | Result |
|---|---|
" ," |
OK{false} |
" :" |
ERR{true} |
"abc" |
ERR{true} |
Remember how map2 is backtrackable only if both parsers are backtrackable. So in the first case, the overall result is not backtrackable because symbol "," succeeded.
This becomes useful when paired with either!
Say we have the following parser definition:
parser : Parser (Maybe Int)
parser =
oneOf
[ succeed Just
|. backtrackable spaces
|. symbol ","
|. spaces
|= int
, succeed Nothing
|. spaces
|. symbol "]"
]Here is how it would work on different strings:
| String | Result |
|---|---|
" , 4" |
OK{false} |
" ," |
ERR{false} |
" , a" |
ERR{false} |
" ]" |
OK{false} |
" a" |
ERR{false} |
"abc" |
ERR{true} |
Some of these cases are tricky, so let's look at them in more depth:
" , a"—backtrackable spaces,symbol ",", andspacesall succeed. At that point we haveOK{false}. Theintparser then fails ona, so we finish withERR{false}. That meansoneOfwill NOT try the second possibility." ]"—backtrackable spacessucceeds, butsymbol ","fails. At that point we haveERR{true}, sooneOftries the second possibility. After backtracking,spacesandsymbol "]"succeed withOK{false}." a"—backtrackable spacessucceeds, butsymbol ","fails. At that point we haveERR{true}, sooneOftries the second possibility. After backtracking,spacessucceeds withOK{false}andsymbol "]"fails resulting inERR{false}.
Notice that in the previous example, we parsed spaces twice in some cases. This is inefficient, especially in large files with lots of whitespace. Backtracking is very inefficient in general though, so if you are interested in performance, it is worthwhile to try to eliminate as many uses of backtrackable as possible.
So we can rewrite that last example to never backtrack:
parser : Parser (Maybe Int)
parser =
succeed identity
|. spaces
|= oneOf
[ succeed Just
|. symbol ","
|. spaces
|= int
, succeed Nothing
|. symbol "]"
]Now we are guaranteed to consume the spaces only one time. After that, we decide if we are looking at a , or ], so we never backtrack and reparse things.
If you are strategic in shuffling parsers around, you can write parsers that do not need backtrackable at all. The resulting parsers are quite fast. They are essentially the same as LR(k) parsers, but more pleasant to write. I did this in Elm compiler for parsing Elm code, and it was very significantly faster.