SPACING characters for separating THAI sentences are MISSING. #221

Open
1 task
hiohlan opened this issue Jul 30, 2024 · 5 comments
Labels
bug Something isn't working i18n Internationalization-related issue そのうちやる planned or in progress

Comments


hiohlan commented Jul 30, 2024

💡 Summary

Since Thai can omit punctuation marks, sentence separation relies on spacing and context (even commas are rarely used unless absolutely necessary).

See the attached picture: Crowdin vs. Actual vs. Expected.

When I translate in Crowdin, it splits sentences normally. However, when exporting to the live website, some spacing characters are lost. I'm not sure what the cause of this problem is.

🥰 Expected Behavior

There should be Thai sentence spacing characters as shown in Crowdin.
E.g. ไม่ การพัฒนา Misskey ดำเนินการโดยบุคคลทั่วไป (~ No, Misskey development is carried out by individuals.)
(The spacing characters I’m referring to are normal ASCII spaces (U+0020).)

🤬 Actual Behavior

Spacing characters are missing, causing sentences to be unintentionally jumbled and resulting in translation errors.
E.g. ไม่การพัฒนา Misskey ดำเนินการโดยบุคคลทั่วไป (~ Misskey development is not carried out by individuals.)

From what I've observed, there seems to be a problem on almost every Thai translated page.

📝 Steps to Reproduce

  1. Open the Thai translation file for about-misskey.md in Crowdin.
  2. Open the rendered Thai translation of the About Misskey page, or the generated MD file in this repo.
  3. Notice that the content does not match.
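
The mismatch in step 3 can also be located programmatically. The following is a minimal sketch, assuming the Crowdin string and the exported string differ only by dropped ASCII spaces (U+0020). `missing_spaces` is a hypothetical helper written for this illustration, not part of Crowdin or Misskey Hub.

```python
def missing_spaces(expected: str, actual: str) -> list[int]:
    """Return the indices in `expected` of U+0020 characters that are absent from `actual`.

    Assumption: the two strings are identical apart from dropped spaces.
    """
    missing = []
    j = 0  # read position in `actual`
    for i, ch in enumerate(expected):
        if j < len(actual) and actual[j] == ch:
            j += 1  # characters agree, advance in both strings
        elif ch == " ":
            missing.append(i)  # space present in Crowdin, lost in the export
        else:
            raise ValueError(f"strings differ beyond spacing at index {i}")
    if j != len(actual):
        raise ValueError("exported string has extra trailing content")
    return missing


# Strings taken from the Expected / Actual examples in this issue.
crowdin_text = "ไม่ การพัฒนา Misskey ดำเนินการโดยบุคคลทั่วไป"
exported_text = "ไม่การพัฒนา Misskey ดำเนินการโดยบุคคลทั่วไป"
print(missing_spaces(crowdin_text, exported_text))
```

The indices are Unicode code point offsets, so Thai tone marks and vowel signs each count as one position.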

💻 Environment

(Not related to this problem)

(For developer) Do you want to address this bug yourself?

  • Yes, I will patch the bug myself and send a pull request
@hiohlan hiohlan added bug? Maybe it's a bug maybe non-developer May be reported by those who are not familiar with the technical aspects labels Jul 30, 2024
@kakkokari-gtyih kakkokari-gtyih added bug Something isn't working upstream Dependencies-related issue i18n Internationalization-related issue and removed bug? Maybe it's a bug maybe non-developer May be reported by those who are not familiar with the technical aspects labels Jul 31, 2024
Collaborator

kakkokari-gtyih commented Jul 31, 2024

This happens because the original Japanese sentences do not include a space after punctuation marks. Crowdin cannot handle this properly, so spaces are not inserted between sentences in the translation.
For languages where punctuation is clearly identifiable (languages that use punctuation marks), Misskey Hub's transformation plugin inserts the spaces automatically. When there is no punctuation, however, this is extremely difficult for Misskey Hub to handle, because we can't easily identify where a sentence ends.
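
To illustrate why punctuation matters here, the following is a minimal sketch of punctuation-based space insertion. It is not Misskey Hub's actual plugin code: the function name and the set of punctuation marks (`.`, `!`, `?`) are assumptions chosen for the example, and the rule is deliberately naive (it would also split something like a decimal number).

```python
import re

# Assumed sentence-ending marks; the real plugin may use a different set.
_END_THEN_TEXT = re.compile(r"([.!?])(?=[^\s.!?])")


def add_space_after_punctuation(text: str) -> str:
    """Insert U+0020 after sentence-ending punctuation that is directly followed by text."""
    return _END_THEN_TEXT.sub(r"\1 ", text)


# A punctuated language gives the rule something to anchor on.
print(add_space_after_punctuation("First sentence.Second sentence."))

# The Thai example from this issue has no punctuation, so nothing is inserted
# and the lost sentence break cannot be recovered this way.
print(add_space_after_punctuation("ไม่การพัฒนา Misskey ดำเนินการโดยบุคคลทั่วไป"))
```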

I'll share this problem with Crowdin.

@kakkokari-gtyih
Collaborator

Author

hiohlan commented Jul 31, 2024

According to the content in the link you sent, even if the original Japanese text is modified to include a space character at the end of each sentence, Crowdin can only warn translators that they must add the space character at the end of the sentence themselves.

So the solution is that I have to add the spaces at the end of the sentence myself, right?

However, if there is no modification to the original Japanese text, how should we inform other Thai translators to add a space character at the end of the sentence or enforce the use of punctuation marks?

Collaborator

kakkokari-gtyih commented Jul 31, 2024

> So the solution is that I have to add the spaces at the end of the sentence myself, right?

No, that will cause unintended consequences in the future. We are planning to go with the second option (adding spaces internally when importing into Crowdin).
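
One possible reading of "adding spaces internally when importing" is a preprocessing step on the Japanese source, sketched below. This is only an illustration under assumptions: the function name, the set of Japanese sentence-ending marks, and the idea that it runs just before upload to Crowdin are all hypothetical and not confirmed in this thread.

```python
import re

# Assumed Japanese sentence-ending marks (full-width).
_JA_END_THEN_TEXT = re.compile(r"([。！？])(?=[^\s。！？])")


def pad_japanese_sentences(source: str) -> str:
    """Add U+0020 after a Japanese sentence end when another sentence follows directly.

    The padded source would make the sentence boundary visible to translators,
    without changing the Japanese files kept in the repository.
    """
    return _JA_END_THEN_TEXT.sub(r"\1 ", source)


print(pad_japanese_sentences("これはテストです。次の文です。"))
```

The last sentence in a string is left alone, since the pattern only matches when more text follows the punctuation mark.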

Author

hiohlan commented Jul 31, 2024

Noted, thank you for your hard work.

@kakkokari-gtyih kakkokari-gtyih added そのうちやる planned or in progress and removed upstream Dependencies-related issue labels Aug 5, 2024