Extracted a list of Mandarin transliterations from the Hong Kong dataset and created a toneless pinyin map for testing.
Spacing
Hanyu Pinyin (zho_Hani2Latn_GCH_1979) has detailed rules on word segmentation, which have not yet been implemented. Whether a space is needed depends on a number of factors and cannot be decided by mapping rules alone. For example, the place names below all contain the character "灣", but only the first and third rows are transliterated as one word.
A separate parsing layer may be needed to handle space insertion (related to #44).
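To make the split between the two layers concrete, here is a minimal Python sketch. It assumes word boundaries are decided by an earlier step and handed to the mapping layer as a list of words. The `render` function, the `READINGS` table, and the two example segmentations are hypothetical and supplied by hand for illustration; they are not taken from the dataset or from the project's code.

```python
# Hypothetical per-character toneless readings, for illustration only.
READINGS = {"台": "tai", "深": "shen", "圳": "zhen", "灣": "wan"}


def render(words, readings=READINGS):
    """Transliterate pre-segmented text.

    `words` is a list of strings, one per word, as produced by some
    separate segmentation step (not implemented here). Syllables inside
    a word are joined without a space; words are joined with a space.
    """
    rendered = []
    for word in words:
        syllables = [readings[ch] for ch in word]
        rendered.append("".join(syllables).capitalize())
    return " ".join(rendered)


# The same character "灣" ends up inside a word or as its own word,
# depending only on the segmentation passed in.
print(render(["台灣"]))        # Taiwan
print(render(["深圳", "灣"]))  # Shenzhen Wan
```

The character map is identical in both calls; only the segmentation differs, which is why the spacing decision cannot live in the mapping rules themselves.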
Syllable separator for zero-onset syllables
Syllables beginning with a, o, or e should be preceded by a syllable separator (’) unless the syllable is the first in a word, e.g. 西安 Xi’an.
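The separator rule can be sketched in Python as follows. This is a minimal illustration that assumes toneless, lowercase syllables belonging to a single word; the function name `join_syllables` and the constants are hypothetical, not part of any existing tool.

```python
ZERO_ONSET_INITIALS = ("a", "o", "e")
SEPARATOR = "\u2019"  # the ’ character used in the example above


def join_syllables(syllables):
    """Join the toneless syllables of ONE word.

    A separator is inserted before any syllable that starts with
    a, o, or e, except when it is the first syllable of the word.
    """
    parts = []
    for index, syllable in enumerate(syllables):
        if index > 0 and syllable[:1].lower() in ZERO_ONSET_INITIALS:
            parts.append(SEPARATOR)
        parts.append(syllable)
    return "".join(parts)


print(join_syllables(["xi", "an"]).capitalize())   # Xi’an
print(join_syllables(["an", "hui"]).capitalize())  # Anhui
```

Because the rule depends on a syllable's position within a word, it can only be applied once word boundaries are known.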
Hong Kong-specific readings
涌: Chong
仔: Zai
咀: Zui (<嘴)
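One way to handle these readings is an override table layered on top of a general map, sketched below in Python. The names `BASE_READINGS`, `HK_OVERRIDES`, and `transliterate` are hypothetical, and the base readings shown are common dictionary readings chosen only to illustrate the contrast; the actual contents of the attached map may differ.

```python
# Assumed general-purpose toneless readings (illustrative only).
BASE_READINGS = {"涌": "yong", "仔": "zi", "咀": "ju", "灣": "wan"}

# Readings listed above for Hong Kong place names.
HK_OVERRIDES = {"涌": "chong", "仔": "zai", "咀": "zui"}


def transliterate(text, overrides=HK_OVERRIDES):
    """Return one toneless syllable per character.

    Entries in `overrides` take precedence over BASE_READINGS.
    Characters missing from both tables are passed through unchanged.
    """
    readings = {**BASE_READINGS, **overrides}
    return [readings.get(ch, ch) for ch in text]


print(transliterate("灣仔"))                # ['wan', 'zai']
print(transliterate("灣仔", overrides={}))  # ['wan', 'zi']
```

Keeping the overrides in their own table leaves the general map untouched and makes the Hong Kong-specific choices easy to review.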
Mandarin transliteration in HK data
(Originally in #39)
Toneless Pinyin Map with HK place names
cn-chn-Hans-Latn-pinyin_toneless.yaml.zip
Originally posted by @chaaklau in #39 (comment)