Invalid encodings are not ignored #70

No errors or warnings should be generated when parsing, yet I still get these:

```
mb_internal_encoding(): Unknown encoding "OSF10020402" // valid, but not installed
mb_internal_encoding(): Unknown encoding "UTF9" // invalid
mb_internal_encoding(): Unknown encoding "ASCI" // invalid
mb_internal_encoding(): Unknown encoding "ISO8859" // invalid
```

Such typos and invalid encoding names aren't uncommon when parsing the HTTP header to detect the character encoding.
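One way to avoid the warning entirely is to check the name before it ever reaches `mb_internal_encoding()`. The following is only a sketch: `resolveEncoding()` is a hypothetical helper (not part of this library), and falling back to UTF-8 is an assumption based on the spec's expectation that content is UTF-8.

```php
<?php
// Hypothetical helper: map a charset name taken from the HTTP header to an
// encoding that mbstring knows, or fall back to UTF-8 (assumed default).
function resolveEncoding(string $name, string $fallback = 'UTF-8'): string
{
    $wanted = strtolower(trim($name));
    foreach (mb_list_encodings() as $encoding) {
        // Look up aliases defensively; ignore any entry without an alias list.
        $aliases = @mb_encoding_aliases($encoding);
        if (!is_array($aliases)) {
            $aliases = [];
        }
        // Compare case-insensitively against the canonical name and aliases.
        foreach (array_merge([$encoding], $aliases) as $candidate) {
            if (strtolower($candidate) === $wanted) {
                return $encoding;
            }
        }
    }
    // Unknown or misspelled name: use the fallback silently.
    return $fallback;
}
```

With this in place, something like `mb_internal_encoding(resolveEncoding($charsetFromHeader))` would only ever be called with a name mbstring recognizes (`$charsetFromHeader` is a placeholder for whatever the header parser produced).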

I think trying to convert everything to UTF-8 is a good thing, but according to the spec the content is expected to be UTF-8, and any invalid content (due to parsing errors, invalid rules, or anything else) shall be ignored without warnings or errors.

What we need is a custom error handler...
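A minimal sketch of what such a handler could look like, assuming it is acceptable to suppress warnings only for the duration of the call. `trySetInternalEncoding()` is a hypothetical name, not something the library provides. Note that on PHP 8 an unknown encoding throws a `ValueError` instead of raising a warning, so the sketch covers both cases.

```php
<?php
// Hypothetical helper: try to switch the internal encoding and report
// failure as a boolean instead of a warning or an exception.
function trySetInternalEncoding(string $encoding): bool
{
    // PHP 7: an unknown name raises E_WARNING; swallow it for this call only.
    set_error_handler(static function (int $errno, string $errstr): bool {
        return true; // mark the warning as handled
    }, E_WARNING);

    try {
        return mb_internal_encoding($encoding) === true;
    } catch (\ValueError $e) {
        // PHP 8: an unknown name throws ValueError instead of warning.
        return false;
    } finally {
        // Always put the previous error handler back.
        restore_error_handler();
    }
}
```

A caller could then keep UTF-8 whenever the detected name is unusable, for example `if (!trySetInternalEncoding($detected)) { mb_internal_encoding('UTF-8'); }`, where `$detected` stands for the name taken from the header.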

> If a character encoding is used that results in characters being used which are not a subset of UTF-8, this may result in the contents of the file being parsed incorrectly.
> 
> Only valid records will be considered; all other content will be ignored. (...) only valid text lines will be taken into account, the rest will be discarded without warning or error.

