explore: removing or overhauling the EncodingReader #2513

@flavorjones

The Nokogiri::HTML4::EncodingReader class tries to detect the encoding of HTML4 documents whose encoding is ambiguous.

Recently, a ReDoS vulnerability was found in this code. The other regular expressions in the class should be vetted as well, and we should explore replacing some of them with simpler calls like String#include?.
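To make the "simpler calls" idea concrete, here is a rough sketch of what a literal pre-check could look like. `CharsetSniff`, `mentions_charset?`, `sniff`, and `CHARSET_RE` are hypothetical names for illustration only; this is not the current EncodingReader code, and the pattern is an assumption about what a bounded replacement might look like.

```ruby
# Hypothetical sketch, not Nokogiri's actual implementation.
module CharsetSniff
  # Deliberately simple pattern: one character class, no nested quantifiers,
  # so there is little room for catastrophic backtracking.
  CHARSET_RE = /charset\s*=\s*["']?([a-z0-9_:.\-]+)/i

  # Cheap literal check with String#include?. Works on raw bytes (String#b)
  # so input that is not valid UTF-8 does not raise.
  def self.mentions_charset?(chunk)
    chunk.b.downcase.include?("charset")
  end

  # Returns the declared charset name, or nil if none is found.
  # The regex only runs when the literal check has already passed.
  def self.sniff(chunk)
    bytes = chunk.b
    return nil unless mentions_charset?(bytes)

    match = CHARSET_RE.match(bytes)
    match && match[1]
  end
end

CharsetSniff.sniff('<meta charset="utf-8">')        # the name inside the quotes
CharsetSniff.sniff("<html><body>hi</body></html>")  # nil, regex never runs
```

The point of the design is that most chunks never reach a regex at all, and the one regex that remains is easy to audit by eye.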

This class was written at a time (Ruby 1.9) when Ruby strings were encoded as ASCII-8BIT by default. That hasn't been true since (I think) Ruby 2.0, so this complexity may exist only for an edge case that we no longer need to support. If so, we may be able to remove the entire class, simplifying both the CRuby and JRuby implementations.
