Encoding detection fix #788
base: master
… go official library for encoding detection in place of it.
_, nameOfEncoding, _ := charset.DetermineEncoding(r.Body, contentType) // name of charset/encoding
contentType = "text/plain; charset=" + nameOfEncoding
charset.DetermineEncoding returns the detected encoding, so I think this entire function can be greatly simplified and largely replaced with roughly this:
enc, _, certain := charset.DetermineEncoding(r.Body, contentType)
if !certain && !detectCharset {
	return nil
}
var err error
r.Body, err = ioutil.ReadAll(enc.NewDecoder().Reader(bytes.NewReader(r.Body)))
return err
This needs a unit test.
Thanks! This needs a couple of changes, though (see above).
And please clean up your commit history (or I will do it myself before the merge).
Connected to issue #777, "HTML encoding is not autodetected properly". I removed the current gocolly encoding detection, which tests showed to be unreliable for Cyrillic encodings, and replaced it with the DetermineEncoding function from Go's charset package (golang.org/x/net/html/charset).