x/text/encoding/traditionalchinese: Garbled text found in encoding output file with traditional chinese #43581

Open
huyungtang opened this issue Jan 8, 2021 · 5 comments · May be fixed by golang/text#31
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@huyungtang

huyungtang commented Jan 8, 2021

What version of Go are you using (go version)?

go version go1.15.6 darwin/amd64

Does this issue reproduce with the latest release?

1.15.6 is the latest stable release

What operating system and processor architecture are you using (go env)?

This has nothing to do with the environment

What did you do?

I used golang.org/x/text/encoding/traditionalchinese to encode Chinese text and write it to a file.
When I open the output file in Visual Studio Code with the encoding "Traditional Chinese (Big5) cp950",
the text is garbled. Reopening it with "Traditional Chinese (Big5-HKSCS) big5hkscs" shows the text correctly.

I found duplicate records in the index file from which "tables.go" is generated.

===== http://encoding.spec.whatwg.org/index-big5.txt =====
8007 0x5A77 婷 (<CJK Ideograph>) <-- Big5
19240 0x5A77 婷 (<CJK Ideograph>) <-- Big5HKSCS

8616 0x745C 瑜 (<CJK Ideograph>) <-- Big5
19672 0x745C 瑜 (<CJK Ideograph>) <-- Big5HKSCS

Could you please separate the "traditionalchinese" encoding into two different encodings, "Big5" and "Big5-HKSCS"?
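
To illustrate why the duplicate records matter, here is a minimal standard-library sketch. It is not the code in maketables.go. It converts each pointer from the four index rows quoted above into a byte pair with the pointer arithmetic from the WHATWG Big5 encoder, then builds an encode table under two policies: keep the first pointer seen for a code point, or let a later pointer overwrite it. The names `entry`, `pointerToBytes` and `buildEncodeTable` are hypothetical, and which policy the current generator follows is an assumption to be confirmed against the source.

```go
package main

import "fmt"

// entry is one row of an index file such as index-big5.txt:
// a pointer and the code point it maps to.
type entry struct {
	pointer int
	r       rune
}

// pointerToBytes converts an index pointer into its two-byte sequence,
// following the pointer arithmetic of the WHATWG Big5 encoder.
func pointerToBytes(p int) [2]byte {
	lead := p/157 + 0x81
	trail := p % 157
	offset := 0x62
	if trail < 0x3F {
		offset = 0x40
	}
	return [2]byte{byte(lead), byte(trail + offset)}
}

// buildEncodeTable inverts the index into a rune-to-bytes table.
// With firstWins set, the first pointer seen for a code point is kept;
// otherwise a later duplicate overwrites the earlier one.
func buildEncodeTable(index []entry, firstWins bool) map[rune][2]byte {
	table := make(map[rune][2]byte)
	for _, e := range index {
		if _, seen := table[e.r]; seen && firstWins {
			continue
		}
		table[e.r] = pointerToBytes(e.pointer)
	}
	return table
}

func main() {
	// The four rows quoted in this issue, in index order.
	index := []entry{
		{8007, 0x5A77}, {8616, 0x745C},
		{19240, 0x5A77}, {19672, 0x745C},
	}
	for _, firstWins := range []bool{true, false} {
		table := buildEncodeTable(index, firstWins)
		for _, r := range []rune{0x5A77, 0x745C} {
			b := table[r]
			fmt.Printf("firstWins=%-5v U+%04X -> % X\n", firstWins, r, b[:])
		}
	}
}
```

With the first pointer kept, the two characters map to B4 40 and B7 EC, which fall in the plain Big5 range. With the later pointer kept, they map to FB B8 and FE 6F, which a plain Big5 (cp950) decoder would not read back as the same characters.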

@mengzhuo
Contributor

mengzhuo commented Jan 8, 2021

CC @mpvl

@toothrot toothrot changed the title Garbled text found in encoding output file with traditional chinese x/text/encoding/traditionalchinese: Garbled text found in encoding output file with traditional chinese Jan 8, 2021
@gopherbot gopherbot added this to the Unreleased milestone Jan 8, 2021
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 8, 2021
@a00012025

Hi @mengzhuo and the Go team,

I’m currently experiencing the same issue regarding garbled text when encoding Traditional Chinese characters using golang.org/x/text/encoding/traditionalchinese. Specifically, characters like “包” are not being encoded correctly, resulting in unexpected characters such as “?” in the output.

Is there an ongoing effort to separate the encodings into Big5 and Big5-HKSCS as initially suggested? Additionally, are there any workarounds or recommended practices in the meantime to ensure accurate encoding of Traditional Chinese characters?

Thank you for your time and assistance.

@huyungtang
Author

huyungtang commented Sep 16, 2024

Hi @a00012025

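Would you like to try my modified version, https://github.com/huyungtang/text?
Just point golang.org/x/text at that repository's path to use it. The PR was submitted a long time ago, and this is what I have been using while it remains unresolved.

Earlier I modified encoding/traditionalchinese/maketables.go to split Big5 into Big5 and Big5HK.
The main change renames the original Big5 to Big5HK and generates a separate Big5 for Traditional Chinese as used in Taiwan. When generating the Taiwan table,
it only skips the characters duplicated in the Hong Kong set; nothing else was changed.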
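
Redirecting a module to a fork like this is normally done with a replace directive in the go.mod of the consuming project. A minimal sketch, assuming the fork still declares its module path as golang.org/x/text; `<version>` is a placeholder and not a real release of the fork:

```
// go.mod of the project that imports golang.org/x/text.
// <version> is a placeholder: substitute a tag or pseudo-version that exists in the fork.
replace golang.org/x/text => github.com/huyungtang/text <version>
```

Running `go mod tidy` afterwards should refresh go.sum for the replaced module.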

@a00012025

@huyungtang Thank you very much 🙏 I'll give it a try!

@mengzhuo
Contributor

FYI, CL text/397534 requires some work before it can be merged.
