`lxml.etree.tostring` with `pretty_print=True` has this caveat:

> If lxml cannot distinguish between whitespace and data, it will not alter your data. Whitespace is therefore only added between nodes that do not contain data. This is always the case for trees constructed element-by-element, so no problems should be expected here. For parsed trees, a good way to assure that no conflicting whitespace is left in the tree is the `remove_blank_text` option [...]
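The following is a minimal sketch of that caveat using plain lxml. The sample XML is made up for illustration. The same input is serialized unchanged when parsed with the default parser, and it is only indented when parsed with `remove_blank_text=True`.

```python
from lxml import etree

# Illustrative input with whitespace-only text between the elements.
SOURCE = b"<root> <a>text</a> <b/> </root>"

# Default parser: whitespace-only text nodes are kept, so libxml2 cannot
# tell them apart from data and pretty_print adds no indentation.
kept = etree.tostring(etree.fromstring(SOURCE), pretty_print=True, encoding="unicode")

# remove_blank_text=True: ignorable whitespace is dropped while parsing,
# so pretty_print is free to insert newlines and indentation.
blank_free_parser = etree.XMLParser(remove_blank_text=True)
cleaned = etree.tostring(
    etree.fromstring(SOURCE, blank_free_parser), pretty_print=True, encoding="unicode"
)

print(kept)
print(cleaned)
```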
Now, instantiating a `delb.Document` with the `collapse_whitespace` flag somewhat feels like it should remove whitespace in a way that makes the parsed XML suitable for custom formatting, e.g. calling:

`lxml.etree.tostring(document.root._etree_obj, pretty_print=True)`
...or something similar. However, to be able to pretty-print delb content, it is still necessary to pass a custom parser on instantiation, e.g.:

`document = Document(source, parser=etree.XMLParser(remove_blank_text=True))`

...in which case the `collapse_whitespace` flag of the `Document` constructor isn't even relevant.
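A document may already have been parsed without `remove_blank_text`. In that case, one possible workaround is to round-trip the element through a `remove_blank_text` parser before pretty-printing. The sketch below uses only lxml, and `pretty_tostring` is a hypothetical helper name. In the delb case it would presumably be applied to `document.root._etree_obj`. It relies on the same libxml2 heuristics as the parser option, so whitespace-only text between elements is dropped from the output.

```python
from lxml import etree

# Parser that discards whitespace-only text nodes it considers ignorable.
_BLANK_FREE_PARSER = etree.XMLParser(remove_blank_text=True)


def pretty_tostring(element: etree._Element) -> str:
    """Hypothetical helper: pretty-print an element regardless of how it was parsed.

    The element is serialized (without its tail) and re-parsed with
    remove_blank_text=True, so the original tree is left untouched.
    """
    serialized = etree.tostring(element, with_tail=False)
    reparsed = etree.fromstring(serialized, _BLANK_FREE_PARSER)
    return etree.tostring(reparsed, pretty_print=True, encoding="unicode")


# Usage: a tree parsed with the default parser still has blank text nodes.
root = etree.fromstring(b"<root> <a>text</a> <b/> </root>")
print(pretty_tostring(root))
```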
I feel that pretty-printing delb objects is a somewhat justified use case (I needed it today to simplify a test), and I think this behaviour is rather obscure right now and should at least be documented in some way. But maybe it could even be handled in a more user-friendly way. Is there a point in using `delb.Document` with `collapse_whitespace` without an `lxml` parser that also removes whitespace? Or could the use of such a parser be implied by `collapse_whitespace` in general?
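One way the suggestion could look, sketched with plain lxml, is a hypothetical loading function. It is not delb's actual constructor. It derives `remove_blank_text` from the flag unless the caller passes an explicit parser. Whether delb's `collapse_whitespace` semantics make this appropriate is the open question above; the sketch only shows the defaulting logic.

```python
from typing import Optional

from lxml import etree


def parse_source(
    source: bytes,
    collapse_whitespace: bool = False,
    parser: Optional[etree.XMLParser] = None,
) -> etree._Element:
    """Hypothetical loader: imply remove_blank_text when collapse_whitespace is set.

    An explicitly passed parser always wins, so existing callers keep control.
    """
    if parser is None:
        parser = etree.XMLParser(remove_blank_text=collapse_whitespace)
    return etree.fromstring(source, parser)


root = parse_source(b"<root> <a>text</a> </root>", collapse_whitespace=True)
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```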
Should `TagNode` have a `tostring` method with an optional `pretty_print` flag as well?
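Here is a rough sketch of what such a method could look like. `TagNodeSketch` is a hypothetical stand-in that only wraps an lxml element in an `_etree_obj` attribute, the attribute name used above; it is not delb's `TagNode`. When `pretty_print` is requested, it strips whitespace-only text and tails from a copy of the tree before serializing. That is lossy for mixed content such as `<p><b>a</b> <i>b</i></p>`, where the space between the elements is data. This is exactly the ambiguity the lxml caveat describes.

```python
import copy

from lxml import etree


class TagNodeSketch:
    """Hypothetical stand-in for a tag node that wraps an lxml element."""

    def __init__(self, etree_obj: etree._Element) -> None:
        self._etree_obj = etree_obj

    def tostring(self, pretty_print: bool = False) -> str:
        element = self._etree_obj
        if pretty_print:
            # Work on a copy so the wrapped tree keeps its whitespace.
            element = copy.deepcopy(element)
            for node in element.iter():
                # Only elements have a str tag; comments and processing
                # instructions keep their text content.
                is_element = isinstance(node.tag, str)
                if is_element and node.text is not None and not node.text.strip():
                    node.text = None
                # Whitespace-only tails would block indentation of the parent.
                if node.tail is not None and not node.tail.strip():
                    node.tail = None
        return etree.tostring(
            element, pretty_print=pretty_print, encoding="unicode", with_tail=False
        )


node = TagNodeSketch(etree.fromstring(b"<root> <a>text</a> <b/> </root>"))
print(node.tostring())
print(node.tostring(pretty_print=True))
```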