xh_scanner loses data when tag name or attribute is too long #32

I was debugging https://github.com/browsermt/bergamot-translator/issues/273 when I noticed that xh_scanner checks MAX_TOKEN_SIZE everywhere it appends a character to a buffer, but does not call `push_back(c)` when the limit is hit. As a result, whenever one of the for-loops that fill its internal buffers reaches that limit, the character read at that point may be lost.
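To illustrate the failure mode, here is a minimal sketch. It is not the actual xh_scanner code: `ToyScanner`, `next_chunk`, `scan_all` and the tiny `MAX_TOKEN_SIZE` of 4 are hypothetical names and values made up for this example. It only assumes a scanner that reads one character at a time and can push a single character back onto its input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical limit, kept tiny so the effect is easy to see.
constexpr std::size_t MAX_TOKEN_SIZE = 4;

// Toy stand-in for a scanner with one character of push-back.
class ToyScanner {
public:
  explicit ToyScanner(std::string_view input) : input_(input) {}

  // Reads up to MAX_TOKEN_SIZE characters into a buffer.
  // The character that reveals the buffer is full has already been
  // consumed from the input; unless it is pushed back, it is lost.
  std::string next_chunk(bool restore_on_overflow) {
    std::string buffer;
    while (char c = get_char()) {
      if (buffer.size() == MAX_TOKEN_SIZE) {
        if (restore_on_overflow) push_back(c);  // the call the issue says is missing
        break;
      }
      buffer.push_back(c);
    }
    return buffer;
  }

private:
  // Returns the pushed-back character if there is one, else the next
  // input character, else '\0' at end of input.
  char get_char() {
    if (has_pending_) {
      has_pending_ = false;
      return pending_;
    }
    return pos_ < input_.size() ? input_[pos_++] : '\0';
  }

  void push_back(char c) {
    pending_ = c;
    has_pending_ = true;
  }

  std::string_view input_;
  std::size_t pos_ = 0;
  char pending_ = '\0';
  bool has_pending_ = false;
};

// Concatenates all chunks so that lost characters show up as gaps.
std::string scan_all(std::string_view input, bool restore_on_overflow) {
  ToyScanner scanner(input);
  std::string out;
  for (;;) {
    std::string chunk = scanner.next_chunk(restore_on_overflow);
    if (chunk.empty()) break;
    out += chunk;
  }
  return out;
}
```

In this sketch, `scan_all("abcdefghij", false)` yields `abcdfghi`: one character disappears at every point where the limit is hit. With `restore_on_overflow` set to `true` the input comes back intact.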

I think this only affects CDATA sections, comments, attribute values and tag names, so for the main use case of warc2text this bug has little impact.

Edit: Thinking about it, it would only affect the tag filters.
