Add missing region codes #711

stopped-clock · 2025-09-15T08:05:54Z

While trying to add a population field, I noticed that main.csv has a "region code" column, but that it was not filled in for every country that has an ISO 3166 country code. This pull request adds the 40 remaining 2-letter country codes for all geographical entities in the deck that have one.

Justification

The index for most of the .csv files in data is the country name, but the name of a country according to Wikipedia is both unstable (it can change), and somewhat likely to not match the name of the country in other data sources. Having region codes makes it easier to drag in data from other sources for future contributors/people independently modifying their own decks, and provides a stable base for things like the interactive map project.
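To illustrate the point about pulling in other sources, here is a minimal sketch of a join keyed on the code rather than the name. The column names (`region code`, `iso2`, `population`), the rows, and the population figures are all hypothetical placeholders, not taken from main.csv or any real dataset.

```python
import csv
import io

# Hypothetical excerpt of main.csv; the real columns may differ.
MAIN_CSV = """\
country,region code
France,FR
Czech Republic,CZ
"""

# Hypothetical external source that spells one country name differently.
# The population values are placeholders, not real figures.
OTHER_CSV = """\
name,iso2,population
France,FR,100
Czechia,CZ,200
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_by_code(main_rows, other_rows, main_key="region code", other_key="iso2"):
    """Attach the external population to each deck row, matching on the code."""
    by_code = {row[other_key]: row for row in other_rows}
    joined = []
    for row in main_rows:
        match = by_code.get(row[main_key])
        joined.append({**row, "population": match["population"] if match else ""})
    return joined


joined = join_by_code(read_rows(MAIN_CSV), read_rows(OTHER_CSV))
for row in joined:
    print(row["country"], row["population"])
```

A join on the name would miss the second row here, because the two sources disagree on the spelling; the code matches regardless.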

More importantly, the region code field already exists, and if it exists, it is logical that it be complete. My one hesitation here is that this appears to have been added as part of the interactive map addition, so I'm not sure if the presence or absence of a region code was being used to signal the presence or absence of an interactive map. I do think that this would not be the optimal way of doing things for the long term, but I don't want to throw a wrench into anything.

Random interesting things noted when adding region codes

There is a country code for Antarctica! (I added it. It felt a little against the logic of the thing.)
Most entities that had ISO country codes but no entry in the deck were tiny colonial islands that probably would not have a code if the system had been created today.
Most entities that had an entry in the deck but no country code were contested states.
There were some entities in the deck that had "unofficial" country codes. I have not added these for now, to avoid political disputes, though I can see the data benefits of maximalism.

Country	Unofficial code	Asterisks
Northern Ireland	XI	EU taxes
Canary Islands	IC	Spain
Kosovo	XK	temporary, EU
South Ossetia	XO	Russia
Abkhazia	XA	Russia

aplaice · 2025-09-15T19:51:27Z

Thanks for the contribution!

both unstable (it can change), and somewhat likely to not match the name of the country in other data sources

Yes, true! (I've thought about the wikidata id, but while it's stable it definitely doesn't match anything outside the wiki ecosystem.)

so I'm not sure if the presence or absence of a region code was being used to signal the presence or absence of an interactive map.

Currently, yes, it was. (E.g. small countries were excluded because they are not really visible on the map.)

I do think that this would not be the optimal way of doing things for the long term,

Definitely agreed, but as a first, experimental version it made sense. You might have a better suggestion, but one "better" way that comes to mind would be having another (interactive-map) field, either with y or empty and the use of the interactive map controlled by that field. Another could be to store the "unused" region codes in the interactive map javascript.

I suspect the simplest solution for now would be to have a duplicate field/column, where one would be complete and the other would stay as-is, but there might be other less ugly options!
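As a rough sketch of the flag-field option, assuming a hypothetical `interactive-map` column that holds `y` or is left empty (the column name, the rows, and the choice of which country is flagged are all made up for illustration):

```python
import csv
import io

# Hypothetical excerpt of main.csv with a complete code column
# and a separate flag column controlling the interactive map.
MAIN_CSV = """\
country,region code,interactive-map
France,FR,y
Monaco,MC,
"""


def map_enabled_codes(text, code_col="region code", flag_col="interactive-map"):
    """Return the codes of rows whose flag column is 'y'."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row[code_col]
        for row in reader
        if row[flag_col].strip().lower() == "y"
    ]


enabled = map_enabled_codes(MAIN_CSV)
print(enabled)
```

With this layout the code column can be complete for every row, and the map is driven by the flag alone.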

I have no strong opinion on the unofficial country codes.

Thanks also very much for the clean laying out of your PR description!

stopped-clock · 2025-09-24T03:00:39Z

Thank you for your thoughtful reply, and sorry for the delayed response. (Work trips monopolizing my life at the moment, unfortunately.)

The problem of stable country ids is an interesting and difficult one. There are a lot of places where it's useful to have some kind of unchanging index, like when updating country names and not having to do it across multiple csvs, or having something to key off of when updating notes. (Though I think brainbrew takes care of the latter with its own id system, at least?)

I can see a couple of different ways of creating ids:

Use the ids from somewhere else. The wikidata id is a good one there, as it is the only id system I can think of that would have full coverage, and does interact with the system this project uses most to get information. (It also led me down a rabbit hole of investigating Wikidata!)
Create UG-specific ids. A pure list of numbers from 1 to 376 would at the very least solve the stability problem, though it wouldn't be in any way useful for interfacing with other systems.
Take the system from an outside source and supplement it where necessary. For example, it looks like there are 86 entities not covered by country codes. If we expanded to use numbers, those all could be easily covered by two-character "codes" that are clearly distinct from standard country codes that other databases use. It's still an ugly mix, though.
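A minimal sketch of the third option, assuming the supplementary codes are a digit followed by a letter (my own choice for illustration, not an existing convention). Since ISO 3166-1 alpha-2 codes consist of two letters, any code containing a digit cannot collide with them. The entity names below are placeholders.

```python
import itertools
import string


def supplementary_codes():
    """Yield two-character codes that start with a digit: 0A, 0B, ..., 9Z."""
    for digit, letter in itertools.product(string.digits, string.ascii_uppercase):
        yield digit + letter


def assign_codes(entities):
    """Map each name to its ISO code, or to the next spare code if it has none.

    `entities` is a list of (name, iso_code_or_empty_string) pairs.
    """
    spare = supplementary_codes()
    return {name: iso or next(spare) for name, iso in entities}


codes = assign_codes([("France", "FR"), ("Entity A", ""), ("Entity B", "")])
print(codes)
```

This scheme yields 260 spare codes, more than enough for 86 entities. Note that assigning by list order is only stable if the generated codes are written back to the csv once and never regenerated.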

I wrote all that but I didn't create this pull request to try and suggest changing anything this fundamental!

I agree that a good method would be a separate interactive-map field with a y flag, but that right now just having a duplicate column is likely simplest.

I don't want to clutter up the project; if it makes things better I can keep out of the main spreadsheet and keep a separate sheet with the country name - country code correspondences. I did hope the country codes might eventually be a good base for other people as well.

Do you think it would be useful if I revised the pull request to add another ISO country column to main.csv, or go in a different direction?

aplaice · 2025-09-24T18:23:14Z

Thanks for your reply!

I don't want to clutter up the project; if it makes things better I can keep out of the main spreadsheet and keep a separate sheet with the country name - country code correspondences. I did hope the country codes might eventually be a good base for other people as well.

Do you think it would be useful if I revised the pull request to add another ISO country column to main.csv, or go in a different direction?

IMO another ISO country column in main.csv (with values from your current PR) would be generally useful irrespective of future plans and it wouldn't significantly clutter up the csv, so yes please! Minimising differences between forks is also valuable!

The problem of stable country ids is an interesting and difficult one. There are a lot of places where it's useful to have some kind of unchanging index, like when updating country names and not having to do it across multiple csvs, or having something to key off of when updating notes. (Though I think brainbrew takes care of the latter with its own id system, at least?)

The guids we have are Anki's in-built ones, but yes, they're essential to allow easy note changes.

In respect of updating country names without modifying multiple csvs, our current use of the English name is indeed sub-optimal, but I believe that the trade-off for more comprehensible row indices is worth it: we change the other fields more frequently and having the country name present in the row is extremely helpful. (There's also the potential issue that using the English country name as a quasi-id requires it to be unique, but given that we need it to be unique for the Country -> {X} cards anyway, it's not actually a problem.)
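The uniqueness requirement is easy to check mechanically. A minimal sketch, assuming a hypothetical `country` column name and made-up rows:

```python
import csv
import io
from collections import Counter

# Hypothetical rows; the second "Georgia" is a deliberate duplicate.
SAMPLE_CSV = """\
country,capital
Georgia,Tbilisi
Georgia,Atlanta
France,Paris
"""


def duplicate_names(text, name_col="country"):
    """Return the sorted names that appear in more than one row."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter(row[name_col] for row in reader)
    return sorted(name for name, n in counts.items() if n > 1)


dupes = duplicate_names(SAMPLE_CSV)
print(dupes)
```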

Hence, I'm not sure if we need any more "internal" ids. (This obviously has no bearing on the benefits of external ids for interoperability with other sources, though.)

stopped-clock · 2025-11-02T05:33:22Z

Over a month later, here's the revised pull request!

I added a new column for the full ISO country codes and copied the current region codes back into the region code column.
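A quick consistency check between the two columns could look like the sketch below. It assumes, which I have not verified for every row, that wherever the region code is filled in it should equal the ISO code on the same row. The column names and the sample rows (including the made-up "Testland") are hypothetical.

```python
import csv
import io

# Hypothetical excerpt: the ISO column is complete, the region code column
# is only filled in where the interactive map is used.
SAMPLE_CSV = """\
country,ISO,region code
France,FR,FR
Monaco,MC,
Testland,TL,XX
"""


def mismatched_rows(text, iso_col="ISO", region_col="region code", name_col="country"):
    """Return names of rows whose non-empty region code differs from the ISO code."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row[name_col]
        for row in reader
        if row[region_col] and row[region_col] != row[iso_col]
    ]


mismatches = mismatched_rows(SAMPLE_CSV)
print(mismatches)
```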

Because of the intervening time period, there were changes to the ultimate-geography repository that I merged into my branch before making my own changes. This means everything should be up-to-date, but let me know if there is a different way that is more standard git procedure!

aplaice

Thanks very much! This looks great!

git procedure

Given that we usually squash PRs in AUG, merging master into the current branch is simple and convenient. (If/when it's useful to keep (not squash) the individual commits in the PR, I'm a fan of rebasing over master, as (IMO!) it results in a cleaner history, but it's a matter of personal preference.)

stopped-clock added 2 commits September 15, 2025 15:58

Filled in missing region codes in main.csv.

4e04d63

Addition of region codes for the island of Saint Martin/Sint Maarten

c016984

stopped-clock and others added 2 commits November 2, 2025 13:55

Merge branch 'anki-geo:master' into region-codes

f12c8ef

restored previous region codes and created new 'ISO' column with all …

12e9fe2

…official ISO 2-character country codes

aplaice approved these changes Nov 2, 2025

aplaice added this to the v5.4 milestone Nov 8, 2025

aplaice merged commit f91b220 into anki-geo:master Nov 8, 2025
1 check passed
