Invalid encodings are not ignored #70

Open
@JanPetterMG

No errors or warnings should be generated when parsing, yet I still get these:

mb_internal_encoding(): Unknown encoding "OSF10020402" // valid, but not installed
mb_internal_encoding(): Unknown encoding "UTF9" // invalid
mb_internal_encoding(): Unknown encoding "ASCI" // invalid
mb_internal_encoding(): Unknown encoding "ISO8859" // invalid

Such typos and invalid encoding names aren't uncommon when parsing the HTTP header to detect the character encoding.

I think it's a good thing to try to convert everything to UTF-8, but according to the spec, the content is expected to be UTF-8, and any invalid content (due to parsing errors, invalid rules, or anything else) shall be ignored without warnings or errors.

What we need is a custom error handler...
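A minimal sketch of what such a handler could look like, assuming the mbstring extension is loaded. The function name `trySetInternalEncoding` is hypothetical and not part of this library. On PHP 7 an unknown encoding is reported as an `E_WARNING`; on PHP 8 it throws a `ValueError`, so the sketch covers both cases:

```php
<?php
/**
 * Hypothetical helper (not part of the library): try to switch the
 * mbstring internal encoding and silently ignore unknown names.
 */
function trySetInternalEncoding(string $encoding): bool
{
    // Temporarily swallow warnings only; PHP 7 reports an unknown
    // encoding as E_WARNING and returns false.
    set_error_handler(static function (int $severity, string $message): bool {
        return true; // mark as handled, so nothing is printed or logged
    }, E_WARNING);

    try {
        return mb_internal_encoding($encoding) === true;
    } catch (\ValueError $e) {
        // PHP 8 throws ValueError for an unknown encoding instead.
        return false;
    } finally {
        // Always put the previous handler back, even on failure.
        restore_error_handler();
    }
}
```

With this in place, a call such as `trySetInternalEncoding('UTF9')` would simply return `false` and leave the current internal encoding untouched, so the parser could carry on as if the content were UTF-8.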

If a character encoding is used that results in characters being used which are not a subset of UTF-8, this may result in the contents of the file being parsed incorrectly.

Only valid records will be considered; all other content will be ignored. (...) only valid text lines will be taken into account, the rest will be discarded without warning or error.
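As an alternative to catching the warning, the detected name could be checked before it is ever passed to `mb_internal_encoding()`. This is only a sketch, assuming the mbstring extension is loaded; `isKnownEncoding` is a hypothetical helper name, and it only knows about encodings and aliases that mbstring itself reports:

```php
<?php
/**
 * Hypothetical helper (not part of the library): check whether mbstring
 * recognises an encoding name, either as a canonical name or as an alias.
 */
function isKnownEncoding(string $name): bool
{
    static $known = null;

    if ($known === null) {
        $known = [];
        foreach (mb_list_encodings() as $encoding) {
            $known[strtolower($encoding)] = true;
            // Guard against a false return value on older PHP versions.
            foreach ((mb_encoding_aliases($encoding) ?: []) as $alias) {
                $known[strtolower($alias)] = true;
            }
        }
    }

    // Header values often differ in case or carry stray whitespace.
    return isset($known[strtolower(trim($name))]);
}
```

Names such as `UTF9` or `ASCI` would then be rejected up front, and the parser could fall back to treating the content as UTF-8 without any warning being raised.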
