Releases: zesterer/chumsky
0.10.0
The 0.10.0 release of chumsky is a from-scratch rewrite of the crate based on the work that's been ongoing for several years in the 1.0.0 alpha builds. The release of 0.10.0 is, in many ways, a concession to a few inconvenient facts:
- Users were rightly complaining that the latest stable release of chumsky (i.e: that which docs.rs shows by default) was the 0.9.x release, despite the fact that we've been recommending that new users use 1.0.0 alpha builds for a long time now. Many users have accidentally tried to run code taken from the 1.0.0 examples, only to find that it doesn't work with 0.9.
- 1.0.0 has been in development for a long time, and there are still some breaking changes left to make (although we're getting closer!). Users want to be able to pull in a stable version and start working productively, and the existing situation was becoming cumbersome.
- Although 1.0.0 is not yet finished, it is moving closer and closer to its final form. It is unlikely that much about the surface API will change between 0.10.0 and 1.0.0, although some changes left to be made are technically breaking.
We recommend that users of chumsky depend on 0.10.0, if they can. That said, there are a few things to consider:
- (This has now been resolved) Not all documentation has been properly updated yet. Some docs still reference 0.9.x concepts or are not yet complete.
- Some features are still in a state of partial completeness. We don't anticipate significant breakage going forwards, but some features are explicitly in need of more work and future 0.x releases will address them.
Any help from the community to assist in resolving these points is greatly appreciated!
Here follows the changelog for 0.10.0. I also wrote up an informal migration guide.
Added
- Support for zero-copy parsing (i.e: parser outputs that hold references to the parser input)
- Support for parsing nested inputs like token trees
- Support for parsing context-sensitive grammars such as Python-style indentation, Rust-style raw strings, and much more
- Support for parsing by graphemes as well as unicode codepoints
- Support for caching parsers independent of the lifetime of the parser
- A new trait, `IterParser`, that allows expressing parsers that generate many outputs
- The ability to collect iterable parsers into fixed-size arrays, along with a plethora of other container types
- Support for manipulating shared state during parsing, elegantly allowing support for arena allocators, cstrees, interners, and much more
- Support for a vast array of new input types: slices, strings, arrays, `impl Read`ers, iterators, etc.
- Experimental support for memoization, allowing chumsky to parse left-recursive grammars and reducing the computational complexity of parsing certain grammars
- An extension API, allowing third-party crates to extend chumsky's capabilities and introduce new combinators
- A `pratt` parser combinator, allowing for conveniently and simply creating expression parsers with precise operator precedence
- A `regex` combinator, allowing the parsing of terms based on a specific regex pattern
- Properly differentiated ASCII and Unicode text parsers
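The headline feature, zero-copy parsing, means parser outputs can borrow directly from the input rather than copying it into new allocations. As a rough hand-rolled illustration of the idea (plain Rust, not chumsky's actual API):

```rust
// Sketch of the zero-copy idea: the "tokens" this scanner produces
// are &str slices that borrow from the input, so no text is copied
// or allocated while scanning.
fn scan_idents(input: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = input;
    while let Some(start) = rest.find(|c: char| c.is_alphabetic()) {
        let tail = &rest[start..];
        let end = tail
            .find(|c: char| !c.is_alphanumeric())
            .unwrap_or(tail.len());
        out.push(&tail[..end]); // a view into `input`; no copy
        rest = &tail[end..];
    }
    out
}

fn main() {
    let src = "let x = foo + bar2;";
    let idents = scan_idents(src);
    assert_eq!(idents, ["let", "x", "foo", "bar2"]);
    // Each token points into the original source string:
    assert_eq!(idents[1].as_ptr(), src[4..].as_ptr());
    println!("{idents:?}");
}
```

In chumsky, the same property falls out of the combinator API: outputs like identifiers and string contents can be `&str` slices of the input rather than owned `String`s.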
Removed
- `Parser::then_with` has been removed in favour of the new context-sensitive combinators
Changed
- Performance has radically improved
- Error generation and handling is now significantly more flexible
1.0.0-alpha.0 (zero-copy)
This is the first released version of chumsky's 'zero-copy' rewrite.
This release has no precise changelog, although one will be added when 1.0.0 is eventually released in full form. For now, things are still fluctuating enough that a full changelog would inevitably be out of date in a few weeks.
Thanks
This release has been over a year in development and represents the work of a lot of people. In particular:
- @CraftSpider, who effectively co-developed the rewrite with me and came up with large chunks of the core API
- @wackbyte, who ported many combinators over to the new codebase as well as adding no_std support
- @bew, who reworked many combinators around changes to the core API
- @Zij-IT, who ported all of the text combinators across, as well as the (yet to be merged) pratt parser combinator created by @alvra
- Many other contributors who worked on smaller items
How you can help
This is the first alpha release. Do not expect a finished product: many minor API features are still incomplete, missing, or subject to change. Some documentation is incomplete, or still refers to concepts from past versions of the crate. In particular, the tutorial has not yet been updated. You may experience bugs, API footguns, and more issues besides. That said, we're releasing this version because we believe the core of the rewrite is ready to be exposed to users and we want to find out what problems there are and catch them before a full release.
We'd like folks to open issues if they find:
- Bugs
- API oddities (things that don't look/feel right, or could be expressed more neatly)
- Things that feel like they should work, but don't (lifetime issues, unnecessary cloning, etc.)
- Missing features
If you're an even awesome-er sort of person and you feel like contributing to the crate, there's still a lot of work that needs doing in the following areas:
- Documentation
- Writing/updating examples
- Filling API 'holes'
- Porting old APIs over
- Small improvements to existing combinators
- Writing tests
- API design: there's still work to be done on the context-sensitivity, recovery, and iterable parser APIs
All this aside, you'd be helping us out a bunch just by using this alpha release (especially porting existing chumsky parsers over to it) and telling us how you got on: what worked, what didn't work, what things you got stuck on or confused by, etc. If you'd like to give a more casual report like this, feel free to start a discussion.
What's new?
Needless to say, the crate has received a substantial upgrade, overhauling virtually every aspect of its API. It's substantially more capable than it ever was, and now supports the following:
- Zero-copy parsing: parser outputs can hold references to the input
- Nested parsing: parsers can handle nested data structures like token trees
- Stateful parsing: parsers can be parameterised by state, allowing for the natural integration of arena allocators, string interners, etc.
- Memoisation: parsers can opt into memoisation, allowing you to quickly parse awkward grammars that would normally produce exponential behaviour in a traditional recursive descent parser
- Left recursion: the aforementioned memoisation feature can also properly handle left recursive grammars elegantly
- Context-sensitive parsing: parsers can use built-in context sensitivity to carefully parameterise future parsers, allowing you to parse things like Rust-style raw strings, Pythonic indentation, and other context-sensitive syntax that context-free parsers traditionally struggle with
- Iterable parsers: parsers that produce multiple outputs can now be turned into iterators, similar to `logos` lexers
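The memoisation and left-recursion points deserve a word of explanation. The underlying technique is packrat-style caching: parse results are stored per (rule, input position), so backtracking never re-runs a rule at a position it has already visited. Chumsky's real memoisation is opt-in and far more general; this toy recogniser only sketches the technique:

```rust
use std::collections::HashMap;

// Toy packrat-style recogniser for the grammar:
//   expr := term ('+' term)*    term := ASCII digit
// Results are memoised per (rule, position): if backtracking
// revisits a position, the cached answer is reused instead of
// re-running the rule.
struct Packrat<'a> {
    input: &'a [u8],
    // (rule id, start position) -> parse result (end position, or failure)
    memo: HashMap<(u8, usize), Option<usize>>,
}

impl<'a> Packrat<'a> {
    fn term(&mut self, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(0, pos)) {
            return cached; // cache hit: the rule never re-runs here
        }
        let result = match self.input.get(pos) {
            Some(c) if c.is_ascii_digit() => Some(pos + 1),
            _ => None,
        };
        self.memo.insert((0, pos), result);
        result
    }

    fn expr(&mut self, pos: usize) -> Option<usize> {
        let mut end = self.term(pos)?;
        while self.input.get(end) == Some(&b'+') {
            match self.term(end + 1) {
                Some(next) => end = next,
                None => break, // trailing '+': leave it unconsumed
            }
        }
        Some(end)
    }
}

fn main() {
    let mut p = Packrat { input: b"1+2+3", memo: HashMap::new() };
    assert_eq!(p.expr(0), Some(5)); // recognises the whole input
    println!("memoised {} (rule, position) results", p.memo.len());
}
```

The same table also makes left recursion tractable: a left-recursive rule can seed the table with failure at its own position and grow the result iteratively, rather than recursing forever as a naive recursive descent parser would.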
Performance
On top of all of that, we've worked hard to push performance as far as we can, with an innovative internal use of Generic Associated Types (GATs) that allows chumsky to automatically detect when an output is never used (such as with `.then_ignore(...)`) and avoid generating it in the first place. You can find some technical details about this approach in Niko Matsakis' blog, where they discuss chumsky.
Our work on performance has paid off: chumsky's JSON benchmark is now extremely competitive, beating out `nom` and others, and even banging on the door of more traditional hand-written JSON parsers.
In general, you can probably expect this new release to be several times faster than older releases for similar parsers. The JSON benchmark is about 12x faster.
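The GAT trick can be sketched in miniature. This is my own heavily simplified take on the idea, not chumsky's actual internals: a zero-sized 'mode' type decides, at compile time, whether outputs are built at all.

```rust
// In Emit mode outputs are constructed as usual; in Check mode the
// output type is (), so the work of building a value is skipped
// entirely while the control flow stays identical.
trait Mode {
    type Output<T>;
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

struct Emit;
impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> T {
        f() // actually build the output
    }
}

struct Check;
impl Mode for Check {
    type Output<T> = ();
    fn bind<T>(_f: impl FnOnce() -> T) {} // closure never runs
}

// A tiny "parser" that reads one ASCII digit and, in Emit mode only,
// converts it to a number.
fn digit<M: Mode>(input: &str, pos: &mut usize) -> Option<M::Output<u32>> {
    let c = input.as_bytes().get(*pos).copied()?;
    if c.is_ascii_digit() {
        *pos += 1;
        Some(M::bind(|| (c - b'0') as u32))
    } else {
        None
    }
}

fn main() {
    let mut pos = 0;
    // Emit mode: the output is produced.
    assert_eq!(digit::<Emit>("7x", &mut pos), Some(7));
    // Check mode: same parsing logic, but the conversion closure is
    // never called and the output is zero-sized.
    let mut pos = 0;
    assert_eq!(digit::<Check>("7x", &mut pos), Some(()));
}
```

A combinator like `.then_ignore(...)` can drive its inner parser in Check mode, so ignored outputs cost nothing to "produce".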
Conclusion
Pushing zero-copy to the point of a release was always going to be a very long road to walk, but we're finally approaching the end. Thanks for using chumsky, and - if you're fortunate enough to have the resources and kind enough to consider donating them - please support the other contributors I listed at the top of this release!
0.9.0
Added
- A `spill-stack` feature that uses `stacker` to avoid stack overflow errors for deeply recursive parsers
- The ability to access the token span when using `select!`, like `select! { |span| Token::Num(x) => (x, span) }`
- A `skip_parser` recovery strategy that allows you to implement your own recovery strategies in terms of other parsers. For example, `.recover_with(skip_parser(take_until(just(';'))))` skips tokens until after the next semicolon
- A `not` combinator that consumes a single token if it is not the start of a given pattern. For example, `just("\\n").or(just('"')).not()` matches any `char` that is neither the final quote of a string nor the start of a newline escape sequence
- A `semantic_indentation` parser for parsing indentation-sensitive languages. Note that this is likely to be deprecated/removed in the future in favour of a more powerful solution
- A `#[must_use]` attribute for parsers to ensure that they're not accidentally created without being used
- `Option<Vec<T>>` and `Vec<Option<T>>` now implement `Chain<T>`, and `Option<String>` implements `Chain<char>`
- `choice` now supports both arrays and vectors of parsers in addition to tuples
- The `Simple` error type now implements `Eq`
Changed
- `text::whitespace` returns a `Repeated` instead of an `impl Parser`, allowing you to call methods like `at_least` and `exactly` on it
- Improved `no_std` support
- Improved examples and documentation
- Use zero-width spans for EoI by default
- Don't allow defining a recursive parser more than once
- Various minor bug fixes
- Improved `Display` implementations for various built-in error types and `SimpleReason`
- Use an `OrderedContainer` trait to avoid unexpected behaviour for unordered containers in combination with `just`
Fixed
- Made several parsers (`todo`, `unwrapped`, etc.) more useful by reporting the parser's location on panic
- Boxing a parser that is already boxed just gives you the original parser to avoid double indirection
- Improved compilation speeds
0.8
Added
- `then_with` combinator to allow limited support for parsing nested patterns
- `impl From<&[T; N]> for Stream`
- `SkipUntil/SkipThenRetryUntil::skip_start/consume_end` for more precise control over skip-based recovery
Changed
- Allowed `Validate` to map the output type
- Switched to zero-size End Of Input spans for default implementations of `Stream`
- Made `delimited_by` take combinators instead of specific tokens
- Minor optimisations
- Documentation improvements
Fixed
- Compilation error with `--no-default-features`
- Made default behaviour of `skip_until` more sensible
0.7
Added
- A new tutorial to help new users
- `select` macro, a wrapper over `filter_map` that makes extracting data from specific tokens easy
- `choice` parser, a better alternative to long `or` chains (which sometimes have poor compilation performance)
- `todo` parser, that panics when used (but not when created), akin to Rust's `todo!` macro, but for parsers
- `keyword` parser, that parses exact identifiers
- `from_str` combinator to allow converting a pattern to a value inline, using `std::str::FromStr`
- `unwrapped` combinator, to automatically unwrap an output value inline
- `rewind` combinator, that allows reverting the input stream on success. It's most useful when requiring that a pattern is followed by some terminating pattern without the first parser greedily consuming it
- `map_err_with_span` combinator, to allow fetching the span of the input that was parsed by a parser before an error was encountered
- `or_else` combinator, to allow processing and potentially recovering from a parser error
- `SeparatedBy::at_most` to require that a separated pattern appear at most a specific number of times
- `SeparatedBy::exactly` to require that a separated pattern be repeated exactly a specific number of times
- `Repeated::exactly` to require that a pattern be repeated exactly a specific number of times
- More trait implementations for various things, making the crate more useful
Changed
- Made `just`, `one_of`, and `none_of` significantly more useful. They can now accept strings, arrays, slices, vectors, sets, or just single tokens as before
- Added the return type of each parser to its documentation
- More explicit documentation of parser behaviour
- More doc examples
- Deprecated `seq` (`just` has been generalised and can now be used to parse specific input sequences)
- Sealed the `Character` trait so that future changes are not breaking
- Sealed the `Chain` trait and made it more powerful
- Moved trait constraints on `Parser` to where clauses for improved readability
Fixed
- Fixed a subtle bug that allowed `separated_by` to parse an extra trailing separator when it shouldn't
- Filled a 'hole' in the `Error` trait's API that conflated a lack of expected tokens with expectation of end of input
- Made recursive parsers use weak reference-counting to avoid memory leaks