Description
My understanding is that quick-xml supports reading XML but does not check for all well-formedness errors. It reports a whole bunch of them, but not everything. Since I'm interested in well-formedness I'm curious whether we could have a reader layered over the existing ones that does validate for well-formedness.
I'd like to make an inventory of what's missing:
- while `IllFormedError::MissingEndTag` exists, it is not actually produced by the reader normally, only if `read_to_end` is explicitly called.
- putting illegal stuff on the top level such as multiple elements, text nodes, etc. Note that when dealing with XML fragments (as implied here, "fragment" has multiple meanings) it's possible to have multiple elements and text nodes on the top.
- having a declaration without content is currently accepted
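
To make the last two items concrete, here is a rough sketch of what a layered check could look like. It is std-only and runs over a simplified event type: `Ev`, `TopLevelError` and `check_top_level` are hypothetical names made up for this illustration, not quick-xml API. It assumes a full document rather than a fragment, and it deliberately does not check tag balance.

```rust
/// Simplified stand-in for parser events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, PartialEq)]
pub enum Ev {
    /// XML declaration; holds the raw content between `<?xml` and `?>`.
    Decl(String),
    Start,
    End,
    Empty,
    Text(String),
}

#[derive(Debug, PartialEq)]
pub enum TopLevelError {
    MultipleRoots,
    TextOutsideRoot,
    MisplacedDecl,
    DeclWithoutVersion,
    NoRoot,
}

/// True if `s` consists only of XML whitespace (space, tab, CR, LF).
fn is_xml_whitespace(s: &str) -> bool {
    s.chars().all(|c| matches!(c, ' ' | '\t' | '\r' | '\n'))
}

/// Checks document-level rules only: a single root element, no character
/// data outside it, and a declaration that comes first and has a version.
pub fn check_top_level(events: &[Ev]) -> Result<(), TopLevelError> {
    let mut depth = 0usize;
    let mut seen_root = false;
    for (i, ev) in events.iter().enumerate() {
        match ev {
            // The declaration is only allowed as the very first event.
            Ev::Decl(_) if i != 0 => return Err(TopLevelError::MisplacedDecl),
            // Crude content check: the version info is mandatory.
            Ev::Decl(content) => {
                if !content.trim_start().starts_with("version") {
                    return Err(TopLevelError::DeclWithoutVersion);
                }
            }
            Ev::Start | Ev::Empty => {
                if depth == 0 {
                    if seen_root {
                        return Err(TopLevelError::MultipleRoots);
                    }
                    seen_root = true;
                }
                if matches!(ev, Ev::Start) {
                    depth += 1;
                }
            }
            // Balance is out of scope here, so never underflow.
            Ev::End => depth = depth.saturating_sub(1),
            Ev::Text(t) => {
                if depth == 0 && !is_xml_whitespace(t) {
                    return Err(TopLevelError::TextOutsideRoot);
                }
            }
        }
    }
    if seen_root { Ok(()) } else { Err(TopLevelError::NoRoot) }
}
```

With this, `check_top_level(&[Ev::Empty, Ev::Empty])` is rejected as `MultipleRoots`, while whitespace-only text after the root is accepted. A fragment mode would presumably just skip the root and text checks.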
What other aspects of well-formedness did I miss that quick-xml currently does not check for? There are a whole cluster of them around entities, but I'm okay with ignoring DTDs entirely.
To implement checking whether tags are balanced, some kind of stack of which tags have been opened needs to be maintained. When considering how to do this efficiently I noticed that the internal `ReaderState` appears to maintain an efficient structure to track which elements have been started but not ended yet. But I don't think that's exposed to the outside world, is it? Could this indeed be useful for this?
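
For reference, this is roughly the kind of stack I have in mind, written as a standalone sketch. All open names live in one flat byte buffer plus a list of start offsets, so pushing a tag does not allocate per element. `OpenTags` and `BalanceError` are hypothetical names, and whether this layout matches what `ReaderState` does internally is an assumption on my part.

```rust
#[derive(Debug, PartialEq)]
pub enum BalanceError {
    /// An end tag appeared while no element was open.
    UnmatchedEnd(String),
    /// An end tag did not match the innermost open element.
    MismatchedEnd { expected: String, found: String },
    /// Input ended while this element was still open.
    MissingEnd(String),
}

fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Stack of currently open element names (hypothetical helper type).
#[derive(Default)]
pub struct OpenTags {
    /// Concatenated names of all open elements, outermost first.
    names: Vec<u8>,
    /// Offset into `names` where each open element's name begins.
    starts: Vec<usize>,
}

impl OpenTags {
    /// Record a start tag.
    pub fn push(&mut self, name: &[u8]) {
        self.starts.push(self.names.len());
        self.names.extend_from_slice(name);
    }

    /// Record an end tag; it must match the innermost open element.
    /// The innermost element is popped even on a mismatch.
    pub fn pop(&mut self, name: &[u8]) -> Result<(), BalanceError> {
        let start = match self.starts.pop() {
            Some(s) => s,
            None => return Err(BalanceError::UnmatchedEnd(lossy(name))),
        };
        let result = if &self.names[start..] == name {
            Ok(())
        } else {
            Err(BalanceError::MismatchedEnd {
                expected: lossy(&self.names[start..]),
                found: lossy(name),
            })
        };
        self.names.truncate(start);
        result
    }

    /// Call at end of input; reports the innermost element still open.
    pub fn finish(&self) -> Result<(), BalanceError> {
        match self.starts.last() {
            Some(&s) => Err(BalanceError::MissingEnd(lossy(&self.names[s..]))),
            None => Ok(()),
        }
    }
}
```

A wrapper reader would call `push` on every start event, `pop` on every end event, and `finish` on EOF, which is the case that currently goes unreported. Empty-element tags would not touch the stack at all.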
Am I correct that quick-xml is pretty close to providing all the pieces already?