@stopped-clock
Contributor

In the process of trying to add a population field, I noticed that there was a "region code" column in main.csv, but that not all countries with ISO 3166 country codes had them filled in. So, this pull request would add the 40 remaining 2-letter country codes for all geographical entities in the deck that had them.

Justification

The index for most of the .csv files in data is the country name, but the name of a country according to Wikipedia is both unstable (it can change), and somewhat likely to not match the name of the country in other data sources. Having region codes makes it easier to drag in data from other sources for future contributors/people independently modifying their own decks, and provides a stable base for things like the interactive map project.
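To illustrate the kind of join a stable code makes possible, here is a minimal sketch using only the Python standard library. The column names (`country`, `region code`, `iso`, `population`), the sample rows, and the population figures are all placeholders invented for illustration; they are not the actual layout of main.csv or of any real data source.

```python
import csv
import io

# Hypothetical excerpt of a deck CSV keyed by country name (column names assumed).
DECK_CSV = """country,region code
Ivory Coast,CI
Czechia,CZ
Kosovo,
"""

# Hypothetical external source keyed by two-letter code; names differ from the deck's.
# The population values are placeholders, not real figures.
EXTERNAL_CSV = """iso,name,population
CI,Côte d'Ivoire,1000
CZ,Czech Republic,2000
"""


def attach_population(deck_text, external_text):
    """Return deck rows with a population value looked up by code, not by name."""
    by_code = {row["iso"]: row["population"]
               for row in csv.DictReader(io.StringIO(external_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(deck_text)):
        code = row["region code"]
        # Rows without a code (or without a match) keep an empty population.
        row["population"] = by_code.get(code, "") if code else ""
        merged.append(row)
    return merged


rows = attach_population(DECK_CSV, EXTERNAL_CSV)
```

Matching on the code means the differing spellings ("Ivory Coast" vs. "Côte d'Ivoire") do not matter, while an entity with no code simply gets no value.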

More importantly, the region code field already exists, and if it exists, it is logical that it be complete. My one hesitation here is that this appears to have been added as part of the interactive map addition, so I'm not sure if the presence or absence of a region code was being used to signal the presence or absence of an interactive map. I do think that this would not be the optimal way of doing things for the long term, but I don't want to throw a wrench into anything.

Random interesting things noted when adding region codes

  • There is a country code for Antarctica! (I added it. It felt a little against the logic of the thing.)
  • Most entities that had ISO country codes but no entry in the deck were tiny colonial islands that probably would not have a code if the system had been created today.
  • Most entities that had an entry in the deck but no country code were contested states.
  • There were some entities in the deck that had "unofficial" country codes. I have not added these for now, to avoid political disputes, though I can see the data benefits of maximalism.

| Country | Unofficial code | Asterisks |
| --- | --- | --- |
| Northern Ireland | XI | EU taxes |
| Canary Islands | IC | Spain |
| Kosovo | XK | temporary, EU |
| South Ossetia | XO | Russia |
| Abkhazia | XA | Russia |

@aplaice
Collaborator

aplaice commented Sep 15, 2025

Thanks for the contribution!

both unstable (it can change), and somewhat likely to not match the name of the country in other data sources

Yes, true! (I've thought about the Wikidata id, but while it's stable it definitely doesn't match anything outside the wiki ecosystem.)

so I'm not sure if the presence or absence of a region code was being used to signal the presence or absence of an interactive map.

Currently, yes, it is. (E.g. small countries were excluded because they are not really visible on the map.)

I do think that this would not be the optimal way of doing things for the long term,

Definitely agreed, but as a first, experimental version it made sense. You might have a better suggestion, but one "better" way that comes to mind would be having another (interactive-map) field, set to either y or empty, with the use of the interactive map controlled by that field. Another could be to store the "unused" region codes in the interactive map javascript.

I suspect the simplest solution for now would be to have a duplicate field/column, where one would be complete and the other would stay as-is, but there might be other less ugly options!
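For concreteness, the separate-flag option could look roughly like the sketch below. The field names (`country code`, `interactive map`) and the sample rows are assumptions for illustration only, not the deck's actual columns.

```python
def uses_interactive_map(row):
    """Decide map use from an explicit flag, not from whether a code is present."""
    return row.get("interactive map", "").strip().lower() == "y"


# Hypothetical rows: both have a code, but only one opts in to the map.
sample_rows = [
    {"country": "France", "country code": "FR", "interactive map": "y"},
    {"country": "Monaco", "country code": "MC", "interactive map": ""},
]

with_map = [row["country"] for row in sample_rows if uses_interactive_map(row)]
```

With the flag decoupled from the code, the code column can be filled in completely without changing which entities get a map.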


I have no strong opinion on the unofficial country codes.


Thanks also very much for the clean laying out of your PR description!

@stopped-clock
Contributor Author

Thank you for your thoughtful reply, and sorry for the delayed response. (Work trips monopolizing my life at the moment, unfortunately.)

The problem of stable country ids is an interesting and difficult one. There are a lot of places where it's useful to have some kind of unchanging index, like when updating country names and not having to do it across multiple csvs, or having something to key off of when updating notes. (Though I think brainbrew takes care of the latter with its own id system, at least?)

I can see a couple of different ways of creating ids:

  1. Use the ids from somewhere else. The wikidata id is a good one there, as it is the only id system I can think of that would have full coverage, and does interact with the system this project uses most to get information. (It also led me down a rabbit hole of investigating Wikidata!)
  2. Create UG-specific ids. A pure list of numbers from 1 to 376 would at the very least solve the stability problem, though it wouldn't be in any way useful for interfacing with other systems.
  3. Take the system from an outside source and supplement it where necessary. For example, it looks like there are 86 entities not covered by country codes. If we expanded to use numbers, those all could be easily covered by two-character "codes" that are clearly distinct from standard country codes that other databases use. It's still an ugly mix, though.
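As a rough sketch of option 3, supplemental codes could be drawn from two-character strings that start with a digit. ISO 3166-1 alpha-2 codes consist of two letters, so these cannot collide with them. The function names and the sample entities below are hypothetical, and the leading-digit scheme is only one possible reading of "expanded to use numbers".

```python
import string


def supplemental_codes():
    """Yield two-character codes that start with a digit: '0A', '0B', ..., '99'."""
    for first in string.digits:
        for second in string.ascii_uppercase + string.digits:
            yield first + second


def assign_codes(entities, iso_codes):
    """Map each entity to its ISO code if it has one, else to the next spare code."""
    spare = supplemental_codes()
    return {name: iso_codes.get(name) or next(spare) for name in entities}


# Hypothetical input: one entity with an ISO code, two without.
codes = assign_codes(
    ["France", "Abkhazia", "South Ossetia"],
    {"France": "FR"},
)
```

This scheme gives 10 × 36 = 360 spare codes, enough for the 86 uncovered entities. Note that assigning by position is not stable if the entity list is reordered, so the assigned codes would need to be stored rather than regenerated.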

I wrote all that but I didn't create this pull request to try and suggest changing anything this fundamental!

I agree that a good method would be a separate interactive-map field with a y flag, but that right now just having a duplicate column is likely simplest.

I don't want to clutter up the project; if it makes things better I can keep out of the main spreadsheet and keep a separate sheet with the country name - country code correspondences. I did hope the country codes might eventually be a good base for other people as well.

Do you think it would be useful if I revised the pull request to add another ISO country column to main.csv, or go in a different direction?

@aplaice
Collaborator

aplaice commented Sep 24, 2025

Thanks for your reply!

I don't want to clutter up the project; if it makes things better I can keep out of the main spreadsheet and keep a separate sheet with the country name - country code correspondences. I did hope the country codes might eventually be a good base for other people as well.

Do you think it would be useful if I revised the pull request to add another ISO country column to main.csv, or go in a different direction?

IMO another ISO country column in main.csv (with values from your current PR) would be generally useful irrespective of future plans and it wouldn't significantly clutter up the csv, so yes please! Minimising differences between forks is also valuable!


The problem of stable country ids is an interesting and difficult one. There are a lot of places where it's useful to have some kind of unchanging index, like when updating country names and not having to do it across multiple csvs, or having something to key off of when updating notes. (Though I think brainbrew takes care of the latter with its own id system, at least?)

The guids we have are Anki's in-built ones, but yes, they're essential to allow easy note changes.

In respect of updating country names without modifying multiple csvs, our current use of the English name is indeed sub-optimal, but I believe that trading this off for more comprehensible row indices is worth it: we change the other fields more frequently and having the country name present in the row is extremely helpful. (There's also the potential issue that using the English country name as a quasi-id requires it to be unique, but given that we need it to be unique for the Country -> {X} cards anyway, it's not actually a problem.)

Hence, I'm not sure if we need any more "internal" ids. (This obviously has no bearing on the benefits of external ids for interoperability with other sources, though.)
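The uniqueness requirement on the English name mentioned above is easy to check mechanically. The following is only a sketch with a hypothetical helper name; it is not an existing check in the repository.

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, ignoring surrounding whitespace."""
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up example input; the trailing space in the last entry is deliberate.
dupes = duplicate_names(["Georgia", "Chad", "Georgia "])
```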

@stopped-clock
Contributor Author

Over a month later, here's the revised pull request!

I added a new column for the full ISO country codes and copied the current region codes back into the region code column.
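One property of this layout is that wherever the region code is filled in, it should agree with the new full-code column. A minimal sketch of such a check follows; the column names (`country code`, `region code`) and the sample rows are assumptions for illustration, not necessarily the names used in main.csv.

```python
def inconsistent_rows(rows, full_col="country code", map_col="region code"):
    """Return names of rows whose map code is set but differs from the full code."""
    return [r["country"] for r in rows if r[map_col] and r[map_col] != r[full_col]]


# Hypothetical rows: a match, an empty region code, and a deliberate mismatch.
sample_rows = [
    {"country": "France", "country code": "FR", "region code": "FR"},
    {"country": "Monaco", "country code": "MC", "region code": ""},
    {"country": "Example", "country code": "EX", "region code": "XE"},
]

mismatches = inconsistent_rows(sample_rows)
```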

Because of the intervening time period, there were changes to the ultimate-geography repository that I merged into my branch before making my own changes. This means everything should be up-to-date, but let me know if there is a different way that is more standard git procedure!

Collaborator

@aplaice aplaice left a comment

Thanks very much! This looks great!

git procedure

Given that we usually squash PRs in AUG, merging master into the current branch is simple and convenient. (If/when it's useful to keep (not squash) the individual commits in the PR, I'm a fan of rebasing over master, as (IMO!) it results in a cleaner history, but it's a matter of personal preference.)

@aplaice aplaice added this to the v5.4 milestone Nov 8, 2025
@aplaice aplaice merged commit f91b220 into anki-geo:master Nov 8, 2025
1 check passed