Improve handling of non-ASCII filenames #26
The current Content-Disposition parser is very poor (see e.g. #26). Let the browser determine the file name, since browsers are probably better at it.
I decided to roll my own Content-Disposition parser (6f3bbb8) because the library that you suggested is incomplete. The test case that you referenced from wget is not valid either; I opened a bug for it at https://savannah.gnu.org/bugs/index.php?52531
Thank you very much for the parser. It's useful and easy to understand.
Here are some examples that behave differently from vanilla Firefox.
The filename is 測試.txt, while open-in-browser displays __.txt. That's because RFC 6266 is not implemented correctly. The Content-Disposition line for this file is:
According to RFC 6266:
%E6%B8%AC%E8%A9%A6.txt should be used here. Percent-decoded as UTF-8, that's exactly 測試.txt.
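RFC 6266 (section 4.3) says that when both `filename` and `filename*` are present, recipients should pick `filename*` and ignore `filename`. The `filename*` value uses the RFC 5987 form `charset'language'percent-encoded-bytes`. Below is a minimal sketch of that rule, assuming the header has already been split into a dict of parameters (tokenizing and quoted-string handling are out of scope) and assuming the value here is `UTF-8''%E6%B8%AC%E8%A9%A6.txt`, since the actual header line is not reproduced above. The function names `decode_ext_value` and `choose_filename` are hypothetical, not open-in-browser's API.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode an RFC 5987 ext-value such as "UTF-8''%E6%B8%AC.txt"."""
    # Format: charset ' language ' percent-encoded bytes (language may be empty).
    # Raises ValueError if the two apostrophes are missing.
    charset, _language, encoded = ext_value.split("'", 2)
    # This sketch only accepts the two charsets RFC 5987 names explicitly.
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset}")
    # errors="strict" makes invalid byte sequences raise UnicodeDecodeError
    # (a ValueError subclass) instead of silently producing U+FFFD.
    return unquote(encoded, encoding=charset, errors="strict")


def choose_filename(params: Dict[str, str]) -> Optional[str]:
    """Pick a filename from already-parsed Content-Disposition parameters."""
    # RFC 6266: prefer filename* over filename when both are present.
    if "filename*" in params:
        try:
            return decode_ext_value(params["filename*"])
        except ValueError:
            pass  # malformed filename*: fall back to the plain parameter
    return params.get("filename")
```

With `{"filename": "__.txt", "filename*": "UTF-8''%E6%B8%AC%E8%A9%A6.txt"}`, `choose_filename` returns `測試.txt` rather than the ASCII fallback, which is what vanilla Firefox shows.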
Similar bug reports and fixes:
By the way, judging from one of the new test cases in wget's commit, I bet implementing RFC 6266 correctly is not easy.
This website is misconfigured and returns raw UTF-8 filenames without quoting:
If I disable the open-in-browser extension, Firefox uses
國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf
as the filename, while open-in-browser shows:
That's because Firefox decodes the header bytes as ISO-8859-1, so each raw UTF-8 byte turns into a separate Latin-1 character. I guess Firefox has some heuristic for recoding filenames back to UTF-8. In my PR for est31's version, I recode raw filenames back to UTF-8 unconditionally. I'm not sure whether that's a good approach.
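Recoding works because ISO-8859-1 maps every byte to exactly one character, so encoding the garbled string back to Latin-1 recovers the original header bytes losslessly. A minimal sketch of one possible heuristic, assuming the header value reached the extension as a Latin-1-decoded string: recode only when the recovered bytes are valid UTF-8, otherwise keep the string unchanged. This is not necessarily what Firefox does, and `recode_latin1_to_utf8` is a hypothetical name.

```python
def recode_latin1_to_utf8(name: str) -> str:
    """Undo a Latin-1 misdecoding of raw UTF-8 header bytes, if plausible."""
    try:
        # Latin-1 is a 1:1 byte <-> code point mapping, so this recovers
        # the original header bytes exactly.
        raw = name.encode("latin-1")
    except UnicodeEncodeError:
        # Characters above U+00FF: the string was not Latin-1-decoded bytes.
        return name
    try:
        # Bytes that form valid UTF-8 are very likely a UTF-8 filename.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the Latin-1 interpretation.
        return name
```

Pure ASCII names pass through unchanged, since ASCII is valid UTF-8. The trade-off behind the unconditional approach is visible here: a genuine Latin-1 name whose bytes happen to be valid UTF-8 (for example `Ã©`, bytes `C3 A9`) would be turned into `é`, but such names are rare, while falling back on invalid UTF-8 keeps names like `café.txt` intact.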