Improve handling of non-ASCII filenames #26

yan12125 · 2017-11-24T17:25:50Z

Here are some examples that do not behave the same as in vanilla Firefox.

https://drive.google.com/file/d/0B7pIvhrJqP6xaGNkVldaeUpuRG8/view

The filename is 測試.txt, while open-in-browser displays __.txt. That's because RFC 6266 is not correctly implemented. The Content-Disposition line for this file is:

attachment;filename="__.txt";filename*=UTF-8''%E6%B8%AC%E8%A9%A6.txt

According to RFC 6266:

when both "filename" and "filename*" are present in a single header field value, recipients SHOULD pick "filename*" and ignore "filename".

%E6%B8%AC%E8%A9%A6.txt should be used here. That's exactly 測試.txt.
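
To make the RFC 6266 rule concrete, here is a minimal sketch in Python (used purely for illustration; this is not the extension's code). The helper name `pick_filename` is hypothetical, and the parameter splitting is deliberately naive: it assumes no `;` inside quoted values and ignores parameter continuations.

```python
import re
from typing import Optional
from urllib.parse import unquote


def pick_filename(header_value: str) -> Optional[str]:
    """Return the preferred filename from a Content-Disposition value.

    Simplified on purpose: parameters are split on ';', so quoted values
    containing ';' and RFC 2231 continuations are not handled.
    """
    params = {}
    for part in header_value.split(";")[1:]:  # skip the disposition type
        name, sep, value = part.strip().partition("=")
        if sep:
            params[name.strip().lower()] = value.strip()

    ext_value = params.get("filename*")
    if ext_value is not None:
        # RFC 5987 ext-value: charset "'" [ language ] "'" percent-encoded
        match = re.fullmatch(r"([^']*)'([^']*)'(.*)", ext_value)
        if match:
            charset, _language, encoded = match.groups()
            try:
                return unquote(encoded, encoding=charset or "utf-8",
                               errors="strict")
            except (LookupError, UnicodeDecodeError):
                pass  # unknown charset or invalid bytes: use "filename"

    plain = params.get("filename")
    if plain is not None and len(plain) >= 2 and plain[0] == plain[-1] == '"':
        plain = plain[1:-1]  # strip the surrounding double quotes
    return plain


header = "attachment;filename=\"__.txt\";filename*=UTF-8''%E6%B8%AC%E8%A9%A6.txt"
print(pick_filename(header))  # 測試.txt
```

When `filename*` is missing or cannot be decoded, the sketch falls back to the plain `filename` parameter, which is `__.txt` for the header above.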

Similar bug reports and fixes:

- For est31's version: my PR uses an external implementation, https://github.com/jshttp/content-disposition/
- For wget: the bug report and the fix.

By the way, judging from one of the new test cases in wget's commit,

"filename**0=\"A\"; filename**1=\"A.ext\"; filename*0=\"B\";filename*1=\"B\"", "AA.ext"

I bet that correctly implementing RFC 6266 is not easy.

https://www.csie.ntu.edu.tw/download.php?filename=13101_7da5e585.pdf&dir=news&title=%E5%9C%8B%E7%AB%8B%E8%87%BA%E7%81%A3%E5%A4%A7%E5%AD%B8%E5%AD%B8%E7%94%9F%E9%80%95%E8%A1%8C%E4%BF%AE%E8%AE%80%E5%8D%9A%E5%A3%AB%E5%AD%B8%E4%BD%8D%E8%BE%A6%E6%B3%951060609

This website is misconfigured and returns filenames in UTF-8 without quoting:

attachment; filename=國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf

If I disable the open-in-browser extension, Firefox uses 國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf as the filename, while open-in-browser says:

åç«èºç£å¤§å¸å¸çéè¡ä¿®è®åå£«å¸ä½è¾¦æ³1060609.pdf

That's because Firefox re-encodes the header as ISO-8859-1 before the extension sees it, so each UTF-8 byte shows up as a separate character. I guess Firefox has some heuristic for recoding filenames back to UTF-8. In my PR for est31's version, I recode raw filenames back to UTF-8 unconditionally. I'm not sure whether that's a good approach.
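
The recoding step can be sketched as follows, again in Python for illustration only. The function name `recode_latin1_as_utf8` is hypothetical. Unlike the unconditional recoding described above, this variant keeps the original string when the bytes are not valid UTF-8; that fallback is an assumption of this sketch, not a description of Firefox's actual heuristic.

```python
def recode_latin1_as_utf8(raw: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up, if that is what happened.

    Each character of `raw` is mapped back to the byte it came from, and
    the byte sequence is then decoded as UTF-8. If either step fails, the
    input was probably not mis-decoded UTF-8 and is returned unchanged.
    """
    try:
        return raw.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return raw


# Simulate what the extension sees: UTF-8 bytes decoded as ISO-8859-1.
garbled = "國立臺灣大學.pdf".encode("utf-8").decode("iso-8859-1")
print(recode_latin1_as_utf8(garbled))     # 國立臺灣大學.pdf
print(recode_latin1_as_utf8("café.pdf"))  # café.pdf (real Latin-1 text is kept)
```

The fallback matters for servers that really do send ISO-8859-1: a lone byte such as 0xE9 followed by an ASCII character is not valid UTF-8, so the name is left alone. A Latin-1 string whose bytes happen to form valid UTF-8 would still be recoded, so the approach remains a heuristic.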

- Recognize non-ASCII file names (#26)
- Parse Content-Disposition according to RFC 2047, 2231, 5987, 6266
- Fix default action on Linux/macOS when pressing Enter (#27)
- Work around issue that prematurely closed the dialog (#28)

Rob--W · 2017-11-27T16:26:21Z

I decided to roll my own Content-Disposition parser (6f3bbb8) because the library that you suggested is incomplete.
In particular, it does not support RFC 2047 (which is obsolete but still supported in Firefox), and it also lacks support for parameter continuations (jshttp/content-disposition#2).

The test case that you referenced from wget is not valid either; I opened a bug for that at https://savannah.gnu.org/bugs/index.php?52531
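
For readers unfamiliar with the two features named above, here is a rough Python sketch of what RFC 2047 encoded-words and RFC 2231 parameter continuations involve. It is not the parser from 6f3bbb8. The function names are hypothetical, the parameters are assumed to be already split into a dict, and quoted sections are assumed to be ASCII.

```python
import re
from email.header import decode_header
from typing import Dict, Optional
from urllib.parse import unquote_to_bytes


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words such as =?UTF-8?B?...?= in a value."""
    decoded = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            try:
                decoded.append(data.decode(charset or "ascii", errors="replace"))
            except LookupError:  # charset name unknown to Python
                decoded.append(data.decode("ascii", errors="replace"))
        else:
            decoded.append(data)  # no encoded-word present: already a str
    return "".join(decoded)


def join_continuations(params: Dict[str, str],
                       name: str = "filename") -> Optional[str]:
    """Join RFC 2231 sections name*0, name*1, ... into one value.

    A trailing '*' on the parameter name marks a percent-encoded section.
    """
    section = re.compile(re.escape(name) + r"\*(\d+)(\*?)")
    found = {}
    for key, value in params.items():
        match = section.fullmatch(key.lower())
        if match:
            found[int(match.group(1))] = (value, match.group(2) == "*")
    if 0 not in found:
        return None

    charset = "utf-8"
    raw = b""
    index = 0
    while index in found:  # stop at the first gap in the numbering
        value, is_encoded = found[index]
        if is_encoded:
            if index == 0:
                # Only the first section carries charset'language'.
                match = re.fullmatch(r"([^']*)'[^']*'(.*)", value)
                if match:
                    charset = match.group(1) or charset
                    value = match.group(2)
            raw += unquote_to_bytes(value)
        else:
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # strip the surrounding double quotes
            raw += value.encode("latin-1", errors="replace")
        index += 1
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        return raw.decode("latin-1")
```

As a usage example, `join_continuations({"filename*0*": "UTF-8''%E6%B8%AC", "filename*1*": "%E8%A9%A6", "filename*2": '".txt"'})` reassembles the three sections into 測試.txt. A real parser also has to decide how these forms interact with plain `filename` and `filename*`, which this sketch leaves out.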

yan12125 · 2017-11-27T17:20:45Z

Thank you very much for the parser. It's useful and easy to understand.

Rob--W added a commit that referenced this issue Nov 27, 2017

Minimize changes to Content-Disposition response header

2cc8b9c

The current Content-Disposition parser is very poor (see e.g. #26). Let the browser determine the file name since they are probably better at it.

Rob--W closed this as completed in 6f3bbb8 Nov 27, 2017
