fix: HTTP filename parsing for mixed encodings and GBK-encoded #1189
…d UTF-8 Co-authored-by: monkeyWie <[email protected]>
…d preserve original filename Co-authored-by: monkeyWie <[email protected]>
@copilot The garbled-filename issue is probably still not fixed, because the latin1-mangled test case was already parsed correctly by the old code.
…ename*= parsing when mime.ParseMediaType fails) Co-authored-by: monkeyWie <[email protected]>
You're right - the The actual issue is when Fixed in 0d0be4b by:
@copilot Still not right. I want you to reverse-engineer a test case: first produce a case that, before the fix, parses to ����u�.zip, and only then fix it.
Co-authored-by: monkeyWie <[email protected]>
I've identified and fixed the actual garbled filename issue in 7164448. The problem was GBK-encoded filenames from Chinese Windows servers. When a server sends Test case added: The fix detects invalid UTF-8 strings and attempts GBK decoding as a fallback. |
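The fallback described in this comment (detect invalid UTF-8, then try a legacy decoder) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `decodeWithFallback`, `decodeFunc`, and `toyGBK` are hypothetical names, and the decoder is injected so the example runs on the standard library alone. A real build would presumably plug in a full GBK decoder (for instance from `golang.org/x/text`, which is an assumption here).

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// decodeFunc converts legacy-encoded bytes to a UTF-8 string.
// Assumption: in real code this would wrap a complete GBK decoder.
type decodeFunc func(raw []byte) (string, error)

// decodeWithFallback (hypothetical helper) returns name unchanged when it is
// already valid UTF-8; otherwise it tries the legacy decoder and keeps the
// result only if decoding succeeds and yields valid UTF-8.
func decodeWithFallback(name string, decode decodeFunc) string {
	if utf8.ValidString(name) {
		return name
	}
	decoded, err := decode([]byte(name))
	if err != nil || !utf8.ValidString(decoded) {
		return name // keep the original bytes rather than guess
	}
	return decoded
}

// toyGBK is a stand-in decoder that knows only two GBK code points,
// just enough for the demo below.
func toyGBK(raw []byte) (string, error) {
	table := map[[2]byte]rune{{0xB2, 0xE2}: '测', {0xCA, 0xD4}: '试'}
	out := make([]rune, 0, len(raw))
	for i := 0; i < len(raw); {
		if raw[i] < 0x80 { // ASCII passes through unchanged
			out = append(out, rune(raw[i]))
			i++
			continue
		}
		if i+1 >= len(raw) {
			return "", errors.New("truncated double-byte sequence")
		}
		r, ok := table[[2]byte{raw[i], raw[i+1]}]
		if !ok {
			return "", errors.New("byte pair not in toy table")
		}
		out = append(out, r)
		i += 2
	}
	return string(out), nil
}

func main() {
	raw := "\xb2\xe2\xca\xd4.zip" // GBK bytes for 测试.zip
	fmt.Println(decodeWithFallback(raw, toyGBK))         // 测试.zip
	fmt.Println(decodeWithFallback("plain.zip", toyGBK)) // plain.zip
}
```

One limitation of a validity check like this: some GBK byte pairs happen to form valid UTF-8 (the PR description cites `D8 B5`), so a name made only of such pairs would not trigger the fallback.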
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   68.53%   68.61%   +0.08%
==========================================
  Files          43       43
  Lines        4348     4423      +75
==========================================
+ Hits         2980     3035      +55
- Misses       1043     1054      +11
- Partials      325      334       +9
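@copilot Garbled output. Please work backwards once more to how the garbled text is generated, and adjust the corresponding test cases and decoding logic.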
…nnot be recovered Co-authored-by: monkeyWie <[email protected]>
Analyzed in 948d1fd. The hex bytes
Original The corruption happened server-side (some charset conversion replaced bytes with What my fix handles:
The |
@copilot The response message in the test case still does not match the one I provided:
Added in 51f98c1. Created new
The Note: The corrupted bytes |
c3bad0a to 948d1fd
Fixes garbled Chinese filenames in Content-Disposition headers caused by encoding issues.
Problem
Chinese filenames like `下载地址.zip` can appear as garbled characters (e.g., `???ص?ַ.zip`) due to:
- `mime.ParseMediaType` fails but `filename*=UTF-8''...` is still valid

Root Cause Analysis
The garbled filename `???ص?ַ.zip` (hex: `3F 3F 3F D8 B5 3F D6 B7 2E 7A 69 70`) is produced when:
1. `下载地址.zip` is encoded as GBK: `CF C2 D4 D8 B5 D8 D6 B7 2E 7A 69 70`
2. Bytes that do not form valid UTF-8 sequences are replaced with `?` (`0x3F`), while valid sequences (like `D8 B5` → Arabic ص) survive
3. The result is `3F 3F 3F D8 B5 3F D6 B7 2E 7A 69 70`

Changes
- `parseFilename()` function - Centralized parsing that checks `filename*=` (RFC 5987) before `mime.ParseMediaType`
- `parseFilenameExtended()` - Manually parses RFC 5987 format when the header is malformed
- `tryDecodeGBK()` - Detects invalid UTF-8 and attempts GBK decoding as a fallback
- `simulateGBKCorruption()` - Test helper that dynamically reproduces the real-world corruption pattern

Test Endpoints
- `/mixed-encoding` - Headers with both garbled `filename=` and valid `filename*=UTF-8''...`
- `/filename-star` - RFC 5987 format only
- `/gbk-encoded` - Pure GBK bytes (fixable case)
- `/corrupted-filename` - Dynamically generated corrupted bytes matching the real-world pattern

What the fix handles
- `²âÊÔ.zip` (garbled) → `测试.zip` ✓
- `filename*=`
- `???ص?ַ.zip`
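The corruption steps listed under Root Cause Analysis can be reproduced with the standard library alone. The sketch below is an illustration and not the PR's `simulateGBKCorruption()` helper: `replaceInvalidUTF8` is a hypothetical name, and the GBK bytes are hard-coded from the description rather than produced by an encoder.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceInvalidUTF8 mimics a lossy charset conversion: each byte that does
// not begin a valid UTF-8 sequence becomes '?', while valid sequences are
// copied through unchanged.
func replaceInvalidUTF8(raw []byte) []byte {
	out := make([]byte, 0, len(raw))
	for len(raw) > 0 {
		r, size := utf8.DecodeRune(raw)
		if r == utf8.RuneError && size == 1 {
			out = append(out, '?') // invalid byte
		} else {
			out = append(out, raw[:size]...)
		}
		raw = raw[size:]
	}
	return out
}

func main() {
	// GBK bytes of 下载地址.zip, as quoted in the PR description.
	gbk := []byte{0xCF, 0xC2, 0xD4, 0xD8, 0xB5, 0xD8, 0xD6, 0xB7, '.', 'z', 'i', 'p'}
	fmt.Printf("% X\n", replaceInvalidUTF8(gbk))
	// Prints: 3F 3F 3F D8 B5 3F D6 B7 2E 7A 69 70
}
```

Four of the eight non-ASCII bytes are overwritten with `?`, so the original GBK sequence cannot be rebuilt from the output. That is consistent with the thread treating this case as unrecoverable from `filename=` alone.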
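For the `filename*=` path listed under Changes, here is a rough sketch of extracting an RFC 5987 value by hand, without going through `mime.ParseMediaType`. The function name `extendedFilename` is hypothetical, only the UTF-8 charset is handled, and splitting on `;` is a simplification that would misbehave if a quoted `filename=` value itself contained a semicolon.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extendedFilename (hypothetical helper) pulls filename*= out of a raw
// Content-Disposition value. It supports only the UTF-8 charset and
// returns ok=false for anything else.
func extendedFilename(header string) (name string, ok bool) {
	for _, part := range strings.Split(header, ";") {
		key, value, found := strings.Cut(strings.TrimSpace(part), "=")
		if !found || !strings.EqualFold(strings.TrimSpace(key), "filename*") {
			continue
		}
		// ext-value layout: charset ' [language] ' percent-encoded bytes
		fields := strings.SplitN(strings.TrimSpace(value), "'", 3)
		if len(fields) != 3 || !strings.EqualFold(fields[0], "UTF-8") {
			return "", false
		}
		decoded, err := url.PathUnescape(fields[2])
		if err != nil || decoded == "" {
			return "", false
		}
		return decoded, true
	}
	return "", false
}

func main() {
	// A header with a garbled filename= and a valid filename*= side by side.
	h := "attachment; filename=\"\xb2\xe2\xca\xd4.zip\"; filename*=UTF-8''%E6%B5%8B%E8%AF%95.zip"
	fmt.Println(extendedFilename(h)) // 测试.zip true
}
```

Checking `filename*=` first means a well-formed extended value takes precedence over whatever bytes ended up in the plain `filename=` parameter.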