CLDR-18002 Update population and likely subtags for MU, TK, ZM and SL
There are 4 manual overrides in GenerateLikelySubtags.java that conflict with other data: for MU, SL, TK, and ZM.

For each country, the local language (mfe, kri, tkl, and bem) is spoken by far more people than English is, even if English is the main language of instruction. Education and literacy levels in each country are low enough that the local languages should be considered the dominant ones.

I was able to find censuses listing language characteristics for MU, TK and ZM. I wasn't able to find data for SL, but I removed the override anyway.

To regenerate the data, run this command: `mvn package -DskipTests=true && java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData && java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags && java -jar tools/cldr-code/target/cldr-code.jar GenerateTestData`

CLDR-18002 Actually make local languages the default matches

The prior change didn't quite work because und_MU was still defaulting to en_Latn_MU. This fixes it to go to mfe, and likewise for the other languages.

The problem is that English is official in these countries, so there's a mismatch.
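
To make the und_XX behavior concrete, here is a minimal Python sketch of a region-only lookup against likelySubtags-style data. It is not the CLDR tooling and not the full UTS #35 algorithm: `load_table` and `maximize` are hypothetical helper names, input scripts are ignored, and the root entry `und` -> `en_Latn_US` is an assumption added for illustration (it does not appear in this diff). The other rows are copied from the hunks in this commit.

```python
import xml.etree.ElementTree as ET

# Rows copied from the likelySubtags.xml hunks in this commit, plus one
# ASSUMED root row (und -> en_Latn_US) that is not shown in this diff.
SAMPLE = """<likelySubtags>
  <likelySubtag from="kri" to="kri_Latn_SL"/>
  <likelySubtag from="toi" to="toi_Latn_ZM"/>
  <likelySubtag from="und_MT" to="mt_Latn_MT"/>
  <likelySubtag from="und_ZW" to="sn_Latn_ZW"/>
  <likelySubtag from="und" to="en_Latn_US"/>
</likelySubtags>"""


def load_table(xml_text):
    """Map each likelySubtag 'from' attribute to its 'to' attribute."""
    root = ET.fromstring(xml_text)
    return {e.get("from"): e.get("to") for e in root.iter("likelySubtag")}


def maximize(lang, region, table):
    """Simplified lookup for a language plus an optional region.

    Tries lang_region first, then lang alone, and fills in only the
    fields the caller left empty. Hypothetical helper, not CLDR code.
    """
    keys = [f"{lang}_{region}", lang] if region else [lang]
    for key in keys:
        if key in table:
            m_lang, m_script, m_region = table[key].split("_")
            out_lang = m_lang if lang == "und" else lang
            return f"{out_lang}_{m_script}_{region or m_region}"
    return None


table = load_table(SAMPLE)
print(maximize("und", "MT", table))  # explicit und_MT row is used
print(maximize("und", "ZM", table))  # no und_ZM row: falls back to "und"
print(maximize("toi", "", table))    # language-only row
```

Under these assumptions, a region with its own und_XX row resolves to that row's language, while a region without one inherits the language and script of the root entry and keeps its own region code. That is why adding or removing a single und_XX row changes the default language for the whole region.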

CLDR-18002 Style fix

`mvn --file=tools/pom.xml spotless:apply`

CLDR-18002 Default to English since it's official
conradarcturus committed Nov 11, 2024
1 parent c5482d3 commit cc75524
Showing 5 changed files with 52 additions and 44 deletions.
16 changes: 7 additions & 9 deletions common/supplemental/likelySubtags.xml
@@ -370,6 +370,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="kok" to="kok_Deva_IN"/> <!--Konkani‧?‧? ➡ Konkani‧Devanagari‧India-->
<likelySubtag from="kos" to="kos_Latn_FM"/> <!--Kosraean‧?‧? ➡ Kosraean‧Latin‧Micronesia-->
<likelySubtag from="kpe" to="kpe_Latn_LR"/> <!--Kpelle‧?‧? ➡ Kpelle‧Latin‧Liberia-->
<likelySubtag from="kqn" to="kqn_Latn_ZM"/> <!--Kaonde‧?‧? ➡ Kaonde‧Latin‧Zambia-->
<likelySubtag from="krc" to="krc_Cyrl_RU"/> <!--Karachay-Balkar‧?‧? ➡ Karachay-Balkar‧Cyrillic‧Russia-->
<likelySubtag from="kri" to="kri_Latn_SL"/> <!--Krio‧?‧? ➡ Krio‧Latin‧Sierra Leone-->
<likelySubtag from="krj" to="krj_Latn_PH"/> <!--Kinaray-a‧?‧? ➡ Kinaray-a‧Latin‧Philippines-->
@@ -407,6 +408,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="lbe" to="lbe_Cyrl_RU"/> <!--Lak‧?‧? ➡ Lak‧Cyrillic‧Russia-->
<likelySubtag from="lbw" to="lbw_Latn_ID"/> <!--Tolaki‧?‧? ➡ Tolaki‧Latin‧Indonesia-->
<likelySubtag from="lcp" to="lcp_Thai_CN"/> <!--Western Lawa‧?‧? ➡ Western Lawa‧Thai‧China-->
<likelySubtag from="leb" to="leb_Latn_ZM"/> <!--Lala-Bisa‧?‧? ➡ Lala-Bisa‧Latin‧Zambia-->
<likelySubtag from="len" to="len_Latn_SV"/> <!--Lenca‧?‧? ➡ Lenca‧Latin‧El Salvador-->
<likelySubtag from="lep" to="lep_Lepc_IN"/> <!--Lepcha‧?‧? ➡ Lepcha‧Lepcha‧India-->
<likelySubtag from="lez" to="lez_Cyrl_RU"/> <!--Lezghian‧?‧? ➡ Lezghian‧Cyrillic‧Russia-->
@@ -432,6 +434,8 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="ltg" to="ltg_Latn_LV"/> <!--Latgalian‧?‧? ➡ Latgalian‧Latin‧Latvia-->
<likelySubtag from="lu" to="lu_Latn_CD"/> <!--Luba-Katanga‧?‧? ➡ Luba-Katanga‧Latin‧Congo - Kinshasa-->
<likelySubtag from="lua" to="lua_Latn_CD"/> <!--Luba-Lulua‧?‧? ➡ Luba-Lulua‧Latin‧Congo - Kinshasa-->
<likelySubtag from="lue" to="lue_Latn_ZM"/> <!--Luvale‧?‧? ➡ Luvale‧Latin‧Zambia-->
<likelySubtag from="lun" to="lun_Latn_ZM"/> <!--Lunda‧?‧? ➡ Lunda‧Latin‧Zambia-->
<likelySubtag from="luo" to="luo_Latn_KE"/> <!--Luo‧?‧? ➡ Luo‧Latin‧Kenya-->
<likelySubtag from="luy" to="luy_Latn_KE"/> <!--Luyia‧?‧? ➡ Luyia‧Latin‧Kenya-->
<likelySubtag from="luz" to="luz_Arab_IR"/> <!--Southern Luri‧?‧? ➡ Southern Luri‧Arabic‧Iran-->
@@ -534,6 +538,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="non" to="non_Runr_SE"/> <!--Old Norse‧?‧? ➡ Old Norse‧Runic‧Sweden-->
<likelySubtag from="nqo" to="nqo_Nkoo_GN"/> <!--N’Ko‧?‧? ➡ N’Ko‧N’Ko‧Guinea-->
<likelySubtag from="nr" to="nr_Latn_ZA"/> <!--South Ndebele‧?‧? ➡ South Ndebele‧Latin‧South Africa-->
<likelySubtag from="nse" to="nse_Latn_ZM"/> <!--Nsenga‧?‧? ➡ Nsenga‧Latin‧Zambia-->
<likelySubtag from="nsk" to="nsk_Cans_CA"/> <!--Naskapi‧?‧? ➡ Naskapi‧Unified Canadian Aboriginal Syllabics‧Canada-->
<likelySubtag from="nso" to="nso_Latn_ZA"/> <!--Northern Sotho‧?‧? ➡ Northern Sotho‧Latin‧South Africa-->
<likelySubtag from="nst" to="nst_Tnsa_IN"/> <!--Tase Naga‧?‧? ➡ Tase Naga‧Tangsa‧India-->
@@ -726,6 +731,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="tnr" to="tnr_Latn_SN"/> <!--Ménik‧?‧? ➡ Ménik‧Latin‧Senegal-->
<likelySubtag from="to" to="to_Latn_TO"/> <!--Tongan‧?‧? ➡ Tongan‧Latin‧Tonga-->
<likelySubtag from="tog" to="tog_Latn_MW"/> <!--Nyasa Tonga‧?‧? ➡ Nyasa Tonga‧Latin‧Malawi-->
<likelySubtag from="toi" to="toi_Latn_ZM"/> <!--Tonga (Zambia)‧?‧? ➡ Tonga (Zambia)‧Latin‧Zambia-->
<likelySubtag from="tok" to="tok_Latn_001"/> <!--Toki Pona‧?‧? ➡ Toki Pona‧Latin‧world-->
<likelySubtag from="tpi" to="tpi_Latn_PG"/> <!--Tok Pisin‧?‧? ➡ Tok Pisin‧Latin‧Papua New Guinea-->
<likelySubtag from="tr" to="tr_Latn_TR"/> <!--Turkish‧?‧? ➡ Turkish‧Latin‧Türkiye-->
@@ -969,7 +975,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_MQ" to="fr_Latn_MQ"/> <!--?‧?‧Martinique ➡ French‧Latin‧Martinique-->
<likelySubtag from="und_MR" to="ar_Arab_MR"/> <!--?‧?‧Mauritania ➡ Arabic‧Arabic‧Mauritania-->
<likelySubtag from="und_MT" to="mt_Latn_MT"/> <!--?‧?‧Malta ➡ Maltese‧Latin‧Malta-->
<likelySubtag from="und_MU" to="mfe_Latn_MU"/> <!--?‧?‧Mauritius ➡ Morisyen‧Latin‧Mauritius-->
<likelySubtag from="und_MU" to="fr_Latn_MU"/> <!--?‧?‧Mauritius ➡ French‧Latin‧Mauritius-->
<likelySubtag from="und_MV" to="dv_Thaa_MV"/> <!--?‧?‧Maldives ➡ Divehi‧Thaana‧Maldives-->
<likelySubtag from="und_MX" to="es_Latn_MX"/> <!--?‧?‧Mexico ➡ Spanish‧Latin‧Mexico-->
<likelySubtag from="und_MY" to="ms_Latn_MY"/> <!--?‧?‧Malaysia ➡ Malay‧Latin‧Malaysia-->
@@ -1008,7 +1014,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_SI" to="sl_Latn_SI"/> <!--?‧?‧Slovenia ➡ Slovenian‧Latin‧Slovenia-->
<likelySubtag from="und_SJ" to="nb_Latn_SJ"/> <!--?‧?‧Svalbard & Jan Mayen ➡ Norwegian Bokmål‧Latin‧Svalbard & Jan Mayen-->
<likelySubtag from="und_SK" to="sk_Latn_SK"/> <!--?‧?‧Slovakia ➡ Slovak‧Latin‧Slovakia-->
<likelySubtag from="und_SL" to="kri_Latn_SL"/> <!--?‧?‧Sierra Leone ➡ Krio‧Latin‧Sierra Leone-->
<likelySubtag from="und_SM" to="it_Latn_SM"/> <!--?‧?‧San Marino ➡ Italian‧Latin‧San Marino-->
<likelySubtag from="und_SN" to="wo_Latn_SN"/> <!--?‧?‧Senegal ➡ Wolof‧Latin‧Senegal-->
<likelySubtag from="und_SO" to="so_Latn_SO"/> <!--?‧?‧Somalia ➡ Somali‧Latin‧Somalia-->
@@ -1044,7 +1049,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_XK" to="sq_Latn_XK"/> <!--?‧?‧Kosovo ➡ Albanian‧Latin‧Kosovo-->
<likelySubtag from="und_YE" to="ar_Arab_YE"/> <!--?‧?‧Yemen ➡ Arabic‧Arabic‧Yemen-->
<likelySubtag from="und_YT" to="fr_Latn_YT"/> <!--?‧?‧Mayotte ➡ French‧Latin‧Mayotte-->
<likelySubtag from="und_ZM" to="bem_Latn_ZM"/> <!--?‧?‧Zambia ➡ Bemba‧Latin‧Zambia-->
<likelySubtag from="und_ZW" to="sn_Latn_ZW"/> <!--?‧?‧Zimbabwe ➡ Shona‧Latin‧Zimbabwe-->
<likelySubtag from="und_Adlm" to="ff_Adlm_GN"/> <!--?‧Adlam‧? ➡ Fula‧Adlam‧Guinea-->
<likelySubtag from="und_Aghb" to="xag_Aghb_AZ"/> <!--?‧Caucasian Albanian‧? ➡ Aghwan‧Caucasian Albanian‧Azerbaijan-->
@@ -3983,7 +3987,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="kqk" to="kqk_Latn_BJ" origin="sil1"/> <!--Kotafon Gbe‧?‧? ➡ Kotafon Gbe‧Latin‧Benin-->
<likelySubtag from="kql" to="kql_Latn_PG" origin="sil1"/> <!--Kyenele‧?‧? ➡ Kyenele‧Latin‧Papua New Guinea-->
<likelySubtag from="kqm" to="kqm_Latn_CI" origin="sil1"/> <!--Khisa‧?‧? ➡ Khisa‧Latin‧Côte d’Ivoire-->
<likelySubtag from="kqn" to="kqn_Latn_ZM" origin="sil1"/> <!--Kaonde‧?‧? ➡ Kaonde‧Latin‧Zambia-->
<likelySubtag from="kqo" to="kqo_Latn_LR" origin="sil1"/> <!--Eastern Krahn‧?‧? ➡ Eastern Krahn‧Latin‧Liberia-->
<likelySubtag from="kqp" to="kqp_Latn_TD" origin="sil1"/> <!--Kimré‧?‧? ➡ Kimré‧Latin‧Chad-->
<likelySubtag from="kqq" to="kqq_Latn_BR" origin="sil1"/> <!--Krenak‧?‧? ➡ Krenak‧Latin‧Brazil-->
@@ -4245,7 +4248,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="ldp" to="ldp_Latn_NG" origin="sil1"/> <!--Tso‧?‧? ➡ Tso‧Latin‧Nigeria-->
<likelySubtag from="ldq" to="ldq_Latn_NG" origin="sil1"/> <!--Lufu‧?‧? ➡ Lufu‧Latin‧Nigeria-->
<likelySubtag from="lea" to="lea_Latn_CD" origin="sil1"/> <!--Lega-Shabunda‧?‧? ➡ Lega-Shabunda‧Latin‧Congo - Kinshasa-->
<likelySubtag from="leb" to="leb_Latn_ZM" origin="sil1"/> <!--Lala-Bisa‧?‧? ➡ Lala-Bisa‧Latin‧Zambia-->
<likelySubtag from="lec" to="lec_Latn_BO" origin="sil1"/> <!--Leco‧?‧? ➡ Leco‧Latin‧Bolivia-->
<likelySubtag from="led" to="led_Latn_CD" origin="sil1"/> <!--Lendu‧?‧? ➡ Lendu‧Latin‧Congo - Kinshasa-->
<likelySubtag from="lee" to="lee_Latn_BF" origin="sil1"/> <!--Lyélé‧?‧? ➡ Lyélé‧Latin‧Burkina Faso-->
@@ -4434,14 +4436,12 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="ltu" to="ltu_Latn_ID" origin="sil1"/> <!--Latu‧?‧? ➡ Latu‧Latin‧Indonesia-->
<likelySubtag from="luc" to="luc_Latn_UG" origin="sil1"/> <!--Aringa‧?‧? ➡ Aringa‧Latin‧Uganda-->
<likelySubtag from="lud" to="lud_Latn_RU" origin="sil1"/> <!--Ludian‧?‧? ➡ Ludian‧Latin‧Russia-->
<likelySubtag from="lue" to="lue_Latn_ZM" origin="sil1"/> <!--Luvale‧?‧? ➡ Luvale‧Latin‧Zambia-->
<likelySubtag from="luf" to="luf_Latn_PG" origin="sil1"/> <!--Laua‧?‧? ➡ Laua‧Latin‧Papua New Guinea-->
<likelySubtag from="lui" to="lui_Latn_US" origin="sil1"/> <!--Luiseno‧?‧? ➡ Luiseno‧Latin‧United States-->
<likelySubtag from="luj" to="luj_Latn_CD" origin="sil1"/> <!--Luna‧?‧? ➡ Luna‧Latin‧Congo - Kinshasa-->
<likelySubtag from="luk" to="luk_Tibt_BT" origin="sil1"/> <!--Lunanakha‧?‧? ➡ Lunanakha‧Tibetan‧Bhutan-->
<likelySubtag from="lul" to="lul_Latn_SS" origin="sil1"/> <!--Olu'bo‧?‧? ➡ Olu'bo‧Latin‧South Sudan-->
<likelySubtag from="lum" to="lum_Latn_AO" origin="sil1"/> <!--Luimbi‧?‧? ➡ Luimbi‧Latin‧Angola-->
<likelySubtag from="lun" to="lun_Latn_ZM" origin="sil1"/> <!--Lunda‧?‧? ➡ Lunda‧Latin‧Zambia-->
<likelySubtag from="lup" to="lup_Latn_GA" origin="sil1"/> <!--Lumbu‧?‧? ➡ Lumbu‧Latin‧Gabon-->
<likelySubtag from="luq" to="luq_Latn_CU" origin="sil1"/> <!--Lucumi‧?‧? ➡ Lucumi‧Latin‧Cuba-->
<likelySubtag from="lur" to="lur_Latn_ID" origin="sil1"/> <!--Laura‧?‧? ➡ Laura‧Latin‧Indonesia-->
@@ -5334,7 +5334,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="nsb" to="nsb_Latn_ZA" origin="sil1"/> <!--Lower Nossob‧?‧? ➡ Lower Nossob‧Latin‧South Africa-->
<likelySubtag from="nsc" to="nsc_Latn_NG" origin="sil1"/> <!--Nshi‧?‧? ➡ Nshi‧Latin‧Nigeria-->
<likelySubtag from="nsd" to="nsd_Yiii_CN" origin="sil1"/> <!--Southern Nisu‧?‧? ➡ Southern Nisu‧Yi‧China-->
<likelySubtag from="nse" to="nse_Latn_ZM" origin="sil1"/> <!--Nsenga‧?‧? ➡ Nsenga‧Latin‧Zambia-->
<likelySubtag from="nsf" to="nsf_Yiii_CN" origin="sil1"/> <!--Northwestern Nisu‧?‧? ➡ Northwestern Nisu‧Yi‧China-->
<likelySubtag from="nsg" to="nsg_Latn_TZ" origin="sil1"/> <!--Ngasa‧?‧? ➡ Ngasa‧Latin‧Tanzania-->
<likelySubtag from="nsh" to="nsh_Latn_CM" origin="sil1"/> <!--Ngoshie‧?‧? ➡ Ngoshie‧Latin‧Cameroon-->
@@ -6664,7 +6663,6 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="tod" to="tod_Latn_GN" origin="sil1"/> <!--Toma‧?‧? ➡ Toma‧Latin‧Guinea-->
<likelySubtag from="tof" to="tof_Latn_PG" origin="sil1"/> <!--Gizrra‧?‧? ➡ Gizrra‧Latin‧Papua New Guinea-->
<likelySubtag from="toh" to="toh_Latn_MZ" origin="sil1"/> <!--Gitonga‧?‧? ➡ Gitonga‧Latin‧Mozambique-->
<likelySubtag from="toi" to="toi_Latn_ZM" origin="sil1"/> <!--Tonga (Zambia)‧?‧? ➡ Tonga (Zambia)‧Latin‧Zambia-->
<likelySubtag from="toj" to="toj_Latn_MX" origin="sil1"/> <!--Tojolabal‧?‧? ➡ Tojolabal‧Latin‧Mexico-->
<likelySubtag from="tol" to="tol_Latn_US" origin="sil1"/> <!--Tolowa‧?‧? ➡ Tolowa‧Latin‧United States-->
<likelySubtag from="tom" to="tom_Latn_ID" origin="sil1"/> <!--Tombulu‧?‧? ➡ Tombulu‧Latin‧Indonesia-->