x/text/encoding/traditionalchinese: Garbled text found in encoding output file with traditional chinese #43581
Comments
CC @mpvl
Hi @mengzhuo and the Go team, I’m currently experiencing the same issue regarding garbled text when encoding Traditional Chinese characters using golang.org/x/text/encoding/traditionalchinese. Specifically, characters like “包” are not being encoded correctly, resulting in unexpected characters such as “?” in the output. Is there an ongoing effort to separate the encodings into Big5 and Big5-HKSCS as initially suggested? Additionally, are there any workarounds or recommended practices in the meantime to ensure accurate encoding of Traditional Chinese characters? Thank you for your time and assistance.
Hi @a00012025, would you like to try my modified fork at https://github.com/huyungtang/text? I previously changed encoding/traditionalchinese/maketables.go to split Big5 into Big5 and Big5HK.
@huyungtang Thank you very much 🙏 I'll give it a try!
FYI, the CL text/397534 still requires some work before it can be merged.
What version of Go are you using (go version)?
go version go1.15.6 darwin/amd64
Does this issue reproduce with the latest release?
1.15.6 is the latest stable release
What operating system and processor architecture are you using (go env)?
This has nothing to do with the environment
What did you do?
I used golang.org/x/text/encoding/traditionalchinese to encode text and wrote the Chinese output to a file.
Opening the output file in Visual Studio Code with the encoding "Traditional Chinese (Big5) cp950" shows garbled text.
Reopening it with "Traditional Chinese (Big5-HKSCS) big5hkscs" shows the text correctly.
I found some duplicate records in the source data for "tables.go".
===== http://encoding.spec.whatwg.org/index-big5.txt =====
8007 0x5A77 婷 (<CJK Ideograph>) <-- Big5
19240 0x5A77 婷 (<CJK Ideograph>) <-- Big5HKSCS
8616 0x745C 瑜 (<CJK Ideograph>) <-- Big5
19672 0x745C 瑜 (<CJK Ideograph>) <-- Big5HKSCS
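To make the duplicate records above concrete, here is a minimal sketch that groups index entries by code point and converts each pointer to its two-byte sequence. The pointer arithmetic follows the Big5 encoder steps in the WHATWG Encoding Standard; the names indexEntry, pointerToBig5 and duplicates are hypothetical helpers for illustration and are not part of golang.org/x/text. It does not show which pointer the generated "tables.go" actually selects.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry is one record from index-big5.txt: a pointer and the
// code point it maps to. (Hypothetical type, for illustration only.)
type indexEntry struct {
	pointer int
	r       rune
}

// pointerToBig5 converts an index pointer to lead and trail bytes,
// using the arithmetic from the WHATWG Encoding Standard's Big5 encoder.
func pointerToBig5(pointer int) (lead, trail byte) {
	lead = byte(pointer/157 + 0x81)
	t := pointer % 157
	offset := 0x62
	if t < 0x3F {
		offset = 0x40
	}
	return lead, byte(t + offset)
}

// duplicates groups pointers by code point and keeps only the code
// points that are reachable through more than one pointer.
func duplicates(entries []indexEntry) map[rune][]int {
	byRune := make(map[rune][]int)
	for _, e := range entries {
		byRune[e.r] = append(byRune[e.r], e.pointer)
	}
	for r, ps := range byRune {
		if len(ps) < 2 {
			delete(byRune, r)
			continue
		}
		sort.Ints(ps)
	}
	return byRune
}

func main() {
	// The four records quoted in this issue.
	entries := []indexEntry{
		{8007, 0x5A77}, {19240, 0x5A77},
		{8616, 0x745C}, {19672, 0x745C},
	}
	dups := duplicates(entries)

	// Sort the code points so the report order is stable.
	runes := make([]rune, 0, len(dups))
	for r := range dups {
		runes = append(runes, r)
	}
	sort.Slice(runes, func(i, j int) bool { return runes[i] < runes[j] })

	for _, r := range runes {
		for _, p := range dups[r] {
			lead, trail := pointerToBig5(p)
			fmt.Printf("U+%04X %c pointer %5d -> %02X %02X\n", r, r, p, lead, trail)
		}
	}
}
```

For the records above, the lower pointers (8007, 8616) map to B4 40 and B7 EC, while the higher pointers (19240, 19672) map to FB B8 and FE 6F. Lead bytes 0xFB and 0xFE lie beyond the standard Big5 range, which would be consistent with an editor in plain Big5 (cp950) mode failing to display them if an encoder picked the higher pointer.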
Could you please separate the "traditionalchinese" encoding into two different encodings, "Big5" and "Big5-HKSCS"?