Robustness to control chars #205

@bittlingmayer

Certain input causes the error "not well-formed (invalid token)".

(That's the value of e in parser.on('error', e...).)

If we run the same input through xmllint (the libxml2 command-line tool), the error message is more revealing:

parser error : PCDATA invalid Char value 8

ASCII char 8 is of course a control char, Backspace.

Is this expected? Or is there an option to let the parser handle or ignore such segments?

Right now, a large file can fail cryptically just because one or two segments in a million contain this character, which, like any ASCII character, is not especially exotic.
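As a possible workaround while the question is open, the input could be filtered before it reaches the parser. The following is only a minimal sketch, not an option of the parser itself: `stripInvalidXmlChars` and `INVALID_XML_CONTROL_CHARS` are hypothetical names, and the pattern assumes that only the C0 control characters disallowed by XML 1.0 need to be removed (everything below U+0020 except tab, line feed and carriage return).

```javascript
// Hypothetical helper, not part of any parser library.
// Matches the C0 control characters that XML 1.0 does not allow:
// everything below U+0020 except tab (U+0009), line feed (U+000A)
// and carriage return (U+000D).
const INVALID_XML_CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;

// Returns a copy of `text` with the disallowed control characters removed.
function stripInvalidXmlChars(text) {
  return text.replace(INVALID_XML_CONTROL_CHARS, '');
}

// Example: a segment containing a Backspace (char value 8).
const dirty = '<seg>back\u0008space</seg>';
const clean = stripInvalidXmlChars(dirty);
console.log(JSON.stringify(clean));
```

Each of these characters is a single UTF-16 code unit, so the filter can be applied chunk by chunk to string data before each chunk is handed to the parser. Silently dropping characters changes the data, so logging or counting the removals may be preferable for some corpora.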
