CLDR-18065 site: typo fixes #4164

Merged (1 commit) on Oct 31, 2024
@@ -22,8 +22,8 @@ Issues with current EOR rules:
1. The ignoring rules for currency etc. should be filtered out in the CLDR context. ( Mark, John, Åke)
2. The rule for U+029F SMALL CAPITAL L is missing (typo in standard). ( Åke )
3. There are relevant comments by Kent Karlsson in ticket #[763](http://unicode.org/cldr/trac/ticket/763) (2010-10-27), with a modified proposal
-1. --- \⃩(\⃩ = [U+20E9](http://unicode.org/cldr/utility/character.jsp?a=20E9) ( ⃩ ) COMBINING WIDE BRIDGE ABOVE) is the (currently) weightiest, at level 2, non-letter general purpose combining mark
-2. --- \⃩ is used in the proposal to make all "variants" come after all single-accented versions of letters
+1. --- ⃩(⃩ = [U+20E9](http://unicode.org/cldr/utility/character.jsp?a=20E9) ( ⃩ ) COMBINING WIDE BRIDGE ABOVE) is the (currently) weightiest, at level 2, non-letter general purpose combining mark
+2. --- ⃩ is used in the proposal to make all "variants" come after all single-accented versions of letters
3. --- resetting to just A, B, etc. would make variant versions come before accented versions
4. ( Åke ) The current reset rules work fine with MimerSQL, but I think you must check the ICU behaviour. Kent might have a vital point here.
5. (Kent) (digraphs) ----tertiary difference in DUCET; keep it that way
14 changes: 8 additions & 6 deletions docs/site/index/cldr-spec/transliteration-guidelines.md
@@ -13,6 +13,7 @@ Transliteration is the general process of converting characters from one script
Transliteration is *not* translation. Rather, transliteration is the conversion of letters from one script to another without translating the underlying words. The following shows a sample of transliteration systems:

Sample Transliteration Systems
+
| Source | Translation | Transliteration | System |
|:---:|:---:|:---:|:---:|
| Αλφαβητικός | Alphabetic | Alphabētikós | Classic |
@@ -32,6 +33,7 @@ While an English speaker may not recognize that the Japanese word kyanpasu is eq
- When a service engineer is sent a program dump that is filled with characters from foreign scripts, it is much easier to diagnose the problem when the text is transliterated and the service engineer can recognize the characters.

Sample Transliterations
+
| Source | Transliteration |
|---|---|
| 김, 국삼 | Gim, Gugsam |
@@ -322,7 +324,7 @@ If you are interested in providing transliterations for one or more scripts, fil

For submission to CLDR, the data needs to supplied in the correct XML format or in the ICU format, and should follow an accepted standard (like UNGEGN, BGN, or others).

-- The format for rules is specified in [Transform\_Rules](http://www.unicode.org/reports/tr35/#Transform_Rules). It is best if the results are tested using the [ICU Transform Demo](https://icu4c-demos.unicode.org/icu-bin/translit) first, since if the data doesn't validate it would not be accepted into CLDR.
+- The format for rules is specified in [Transform\_Rules](https://www.unicode.org/reports/tr35/#Transform_Rules). It is best if the results are tested using the [ICU Transform Demo](https://icu4c-demos.unicode.org/icu-bin/translit) first, since if the data doesn't validate it would not be accepted into CLDR.
- As mentioned above, even if a transliteration is only used in certain countries or contexts CLDR can provide for them with different variant tags.
- For comparison, you can see what is currently in CLDR in the [transforms]() folder online. For example, see [Hebrew\-Latin.xml]().
- Script transliterators should cover every character in the exemplar sets for the CLDR locales using that script.
@@ -331,10 +333,10 @@ For submission to CLDR, the data needs to supplied in the correct XML format or

| Shavian | Relation | Latin | Comments |
|:---:|:---:|:---:|---|
-| \𐑐 | ↔ | p | Map all uppercase to lowercase first |
-| \𐑚 | ↔ | b | |
-| \𐑑 | ↔ | t | |
-| \𐑒\𐑕 | ← | x | fallback |
+| 𐑐 | ↔ | p | Map all uppercase to lowercase first |
+| 𐑚 | ↔ | b | |
+| 𐑑 | ↔ | t | |
+| 𐑒𐑕 | ← | x | fallback |
| ... | | | |
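
A minimal Python sketch of how the four rows shown above could be read, assuming `↔` means the rule applies in both directions and `←` means it applies only from Latin to Shavian. The function and table names are hypothetical, and this is not the CLDR rule syntax or the ICU transform engine; it only illustrates the two-way versus one-way distinction.

```python
# Hypothetical illustration of the table rows above; not CLDR/ICU rule syntax.
# Each pair is (Shavian, Latin).
TWO_WAY = [("𐑐", "p"), ("𐑚", "b"), ("𐑑", "t")]   # rows marked ↔
LATIN_TO_SHAVIAN_ONLY = [("𐑒𐑕", "x")]            # row marked ← (fallback)


def latin_to_shavian(text: str) -> str:
    """Apply both the two-way rules and the one-way fallback rules."""
    text = text.lower()  # "Map all uppercase to lowercase first"
    table = {latin: shavian
             for shavian, latin in TWO_WAY + LATIN_TO_SHAVIAN_ONLY}
    return "".join(table.get(ch, ch) for ch in text)


def shavian_to_latin(text: str) -> str:
    """Apply only the two-way rules; the fallback has no reverse mapping."""
    table = {shavian: latin for shavian, latin in TWO_WAY}
    return "".join(table.get(ch, ch) for ch in text)
```

Under these assumptions, Latin `x` becomes `𐑒𐑕`, but `𐑒𐑕` is not turned back into `x`, because the fallback row is one-way. Characters without a rule pass through unchanged in this sketch.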

## More Information
@@ -349,5 +351,5 @@ For more information, see:
- [ISO\-15915 (Gujarati)](http://transliteration.eki.ee/pdf/Gujarati.pdf)
- [ISO\-15915 (Kannada)](http://transliteration.eki.ee/pdf/Kannada.pdf)
- [ISCII\-91](http://www.cdacindia.com/html/gist/down/iscii_d.asp)
-- [UTS \#35: Locale Data Markup Language (LDML)](http://www.unicode.org/reports/tr35/)
+- [UTS \#35: Locale Data Markup Language (LDML)](https://www.unicode.org/reports/tr35/)
