Test against GeoNames data (Russia) #52

Open
ronaldtse opened this issue Jan 7, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@ronaldtse
Contributor

If you don't use the attached file below, you have to download the data from the original site (http://geonames.nga.mil/gns/html/cntyfile/rs.zip).

rs_populatedplaces_p.txt.zip

The following transliteration systems are used:

  • che_Cyrl2Latn_BGN_2007
  • hye_Armn2Latn_BGN_1981
  • kat_Geor2Latn_GGG_2002
  • rus_Cyrl2Latn_ALA_1997
  • rus_Cyrl2Latn_BGN_1947
ronaldtse added the enhancement label on Jan 7, 2020
@webdev778
Collaborator

The linked file is no longer available.

@webdev778
Collaborator

interscript/geotest#1

For the rs_populatedplaces_p.txt file, GeoTest outputs the following result:

# bundle exec ruby test.rb files/rs_populatedplaces_p.txt 
.....
0 records have a non-unique UNI (should be 0)

Out of 331214 related clusters we get 165598 unique related clusters
Unique clusters have 331214 members in total (this should match a number of related clusters)
Hash of cluster length to a number of clusters of that kind: {2=>165578, 3=>14, 1=>3, 4=>2, 5=>1}

Transliteration systems used:
- "" * 416664 (274164 with a pair)
- "rus_Cyrl2Latn_BGN_1947" * 22972 (20880 with a pair) implemented in Interscript as bgnpcgn-rus-Cyrl-Latn-1947
- "NOT_TRANSLITERATED" * 558 (439 with a pair)
- "che_Cyrl2Latn_BGN_2007" * 532 (337 with a pair)
- "rus_Cyrl2Latn_GOST_1983" * 242 (23 with a pair) implemented in Interscript as gost-rus-Cyrl-Latn-16876-71-1983
- "ukr_Cyrl2Latn_BGN_1965" * 69 (3 with a pair) implemented in Interscript as bgnpcgn-ukr-Cyrl-Latn-1965
- "UNKNOWN" * 1 (0 with a pair)
- "not_transliterated" * 1 (1 with a pair)
- "bel_Cyrl2Latn_BGN_1979" * 1 (1 with a pair) implemented in Interscript as bgnpcgn-bel-Cyrl-Latn-1979
- "rus_Cyrl2Latn_ALA_1997" * 1 (1 with a pair) implemented in Interscript as alalc-rus-Cyrl-Latn-1997

Among the unique clusters:
- 3 clusters are too short
- 1 clusters contain no non-ASCII entries
- 144294 clusters contain no transliteration info
- 3 clusters contain more than 1 non-ASCII entries
- 421 clusters are transliterated with a map not present in Interscript
Remaining 20876 clusters seem to be usable

rus_Cyrl2Latn_BGN_1947: 20614/20845 (98.89%) (Errors: Incorrect punctuation * 119, Incorrect transliteration * 111, Incorrect spacing or punctuation * 1)
rus_Cyrl2Latn_GOST_1983: 11/23 (47.83%) (Errors: Incorrect transliteration * 9, Incorrect punctuation * 3)
: 0/9 (0.0%) (Errors: No support in Interscript * 9)
ukr_Cyrl2Latn_BGN_1965: 3/3 (100.0%)
bel_Cyrl2Latn_BGN_1979: 0/1 (0.0%) (Errors: Incorrect transliteration * 1)
rus_Cyrl2Latn_ALA_1997: 0/1 (0.0%) (Errors: Incorrect transliteration * 1)
che_Cyrl2Latn_BGN_2007: 0/1 (0.0%) (Errors: No support in Interscript * 1)
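The cluster totals in this output can be cross-checked by hand: each cluster length multiplied by its count should add up to the 331214 related clusters, the counts alone should add up to the 165598 unique clusters, and the "Among the unique clusters" categories plus the 20876 usable clusters should also add up to 165598. Below is a minimal Ruby sketch of that check, with the numbers copied by hand from the output; the constant and method names are hypothetical and not part of GeoTest.

```ruby
# Cluster length => number of clusters of that length (copied from the GeoTest output).
CLUSTER_SIZES = { 2 => 165_578, 3 => 14, 1 => 3, 4 => 2, 5 => 1 }.freeze

# Breakdown of the unique clusters (copied from the GeoTest output).
UNIQUE_BREAKDOWN = {
  too_short: 3,
  no_non_ascii: 1,
  no_transliteration_info: 144_294,
  multiple_non_ascii: 3,
  map_not_in_interscript: 421,
  usable: 20_876
}.freeze

# Total number of members across all clusters: sum of length * count.
def total_members(sizes)
  sizes.sum { |length, count| length * count }
end

# Total number of unique clusters: sum of the counts.
def total_clusters(sizes)
  sizes.values.sum
end

puts "members:   #{total_members(CLUSTER_SIZES)}"   # 331214
puts "clusters:  #{total_clusters(CLUSTER_SIZES)}"  # 165598
puts "breakdown: #{UNIQUE_BREAKDOWN.values.sum}"    # 165598
```

All three totals agree with the figures GeoTest reports, so the histogram and the per-category breakdown are internally consistent.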

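The per-system lines can also be verified: each success rate is passed / total × 100, and the listed error counts should account for every failure (for example, 119 + 111 + 1 = 231 = 20845 − 20614 for rus_Cyrl2Latn_BGN_1947). Below is a minimal Ruby sketch, again with values copied by hand from the output; the `RESULTS` hash and the helper methods are hypothetical illustrations, not GeoTest's own code.

```ruby
# Per-system results copied by hand from the GeoTest output.
# Each entry: system name => [passed, total, { error kind => count }].
RESULTS = {
  "rus_Cyrl2Latn_BGN_1947" => [20_614, 20_845, {
    "Incorrect punctuation" => 119,
    "Incorrect transliteration" => 111,
    "Incorrect spacing or punctuation" => 1
  }],
  "rus_Cyrl2Latn_GOST_1983" => [11, 23, {
    "Incorrect transliteration" => 9,
    "Incorrect punctuation" => 3
  }],
  "" => [0, 9, { "No support in Interscript" => 9 }],
  "ukr_Cyrl2Latn_BGN_1965" => [3, 3, {}],
  "bel_Cyrl2Latn_BGN_1979" => [0, 1, { "Incorrect transliteration" => 1 }],
  "rus_Cyrl2Latn_ALA_1997" => [0, 1, { "Incorrect transliteration" => 1 }],
  "che_Cyrl2Latn_BGN_2007" => [0, 1, { "No support in Interscript" => 1 }]
}.freeze

# Percentage of passing clusters, rounded to two decimals.
def accuracy(passed, total)
  (100.0 * passed / total).round(2)
end

# True when the listed error counts account for every failed cluster.
def errors_consistent?(passed, total, errors)
  errors.values.sum == total - passed
end

RESULTS.each do |system, (passed, total, errors)|
  label = system.empty? ? "(no system given)" : system
  puts format("%-25s %6d/%-6d %6.2f%%  consistent=%s",
              label, passed, total, accuracy(passed, total),
              errors_consistent?(passed, total, errors))
end
```

Rounding to two decimals reproduces the percentages shown above (98.89% and 47.83% for the two Russian systems with more than a handful of samples), and every system's error counts sum to its number of failures.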