Add missing region codes #711
Thanks for the contribution!
Yes, true! (I've thought about the wikidata id, but while it's stable it definitely doesn't match anything outside the wiki ecosystem.)
Currently, yes, it was. (e.g. Small countries were excluded due to not really being visible on the map.)
Definitely agreed, but as a first, experimental version it made sense. You might have a better suggestion, but one "better" way that comes to mind would be having another ( I suspect the simplest solution for now would be to have a duplicate field/column, where one would be complete and the other would stay as-is, but there might be other less ugly options! I have no strong opinion on the unofficial country codes. Thanks also very much for the clean laying out of your PR description!
Thank you for your thoughtful reply, and sorry for the delayed response. (Work trips are monopolizing my life at the moment, unfortunately.) The problem of stable country ids is an interesting and difficult one. There are a lot of places where it's useful to have some kind of unchanging index, like when updating country names and not having to do it across multiple csvs, or having something to key off of when updating notes. (Though I think brainbrew takes care of the latter with its own id system, at least?) I can see a couple of different ways of creating ids:
I wrote all that but I didn't create this pull request to try and suggest changing anything this fundamental! I agree that a good method would be a separate I don't want to clutter up the project; if it makes things better I can keep them out of the main spreadsheet and keep a separate sheet with the country name - country code correspondences. I did hope the country codes might eventually be a good base for other people as well. Do you think it would be useful if I revised the pull request to add another ISO country column to main.csv, or should I go in a different direction?
Thanks for your reply!
IMO another ISO country column in
The guids we have are Anki's in-built ones, but yes, they're essential to allow easy note changes. In respect of updating country names without modifying multiple csvs, our current use of the English name is indeed sub-optimal, but I believe that trading this off for more comprehensible row indices is worth it: we change the other fields more frequently and having the country name present in the row is extremely helpful. (There's also the potential issue that using the English country name as a quasi-id requires it to be unique, but given that we need it to be unique for the Hence, I'm not sure if we need any more "internal" ids. (This obviously has no bearing on the benefits of external ids for interoperability with other sources, though.)
…official ISO 2-character country codes
Over a month later, here's the revised pull request! I added a new column for the full ISO country codes and copied the current region codes back into the region code column. Because of the intervening time period, there were changes to the ultimate-geography repository that I merged into my branch before making my own changes. This means everything should be up-to-date, but let me know if there is a different way that is more standard git procedure!
Thanks very much! This looks great!
git procedure
Given that we usually squash PRs in AUG, merging master into the current branch is simple and convenient. (If/when it's useful to keep (not squash) the individual commits in the PR, I'm a fan of rebasing over master, as (IMO!) it results in a cleaner history, but it's a matter of personal preference.)
In the process of trying to add a population field, I noticed that there was a "region code" column in main.csv, but that not all countries with ISO 3166 country codes had them filled in. So, this pull request would add the 40 remaining 2-letter country codes for all geographical entities in the deck that had them.
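For anyone who wants to repeat the check described above, here is a minimal sketch of how rows with an empty code could be listed. The column headers (`country`, `region code`) and the sample rows are made up for illustration; the real main.csv has more columns and may use different headers.

```python
import csv
import io

# Made-up excerpt for illustration only; not the actual contents of main.csv.
SAMPLE = """country,region code
France,FR
Tuvalu,
Nauru,
Japan,JP
"""


def rows_missing_code(csv_text, code_column="region code", name_column="country"):
    """Return the names of all rows whose code column is empty or whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[name_column]
        for row in reader
        if not (row[code_column] or "").strip()
    ]


if __name__ == "__main__":
    for name in rows_missing_code(SAMPLE):
        print(name)
```

To run it against a real file, the text could be read with `open(path, newline="", encoding="utf-8").read()` and passed in, with the column names adjusted to match.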
Justification
The index for most of the .csv files in data is the country name, but the name of a country according to Wikipedia is both unstable (it can change), and somewhat likely to not match the name of the country in other data sources. Having region codes makes it easier to drag in data from other sources for future contributors/people independently modifying their own decks, and provides a stable base for things like the interactive map project.
More importantly, the region code field already exists, and if it exists, it is logical that it be complete. My one hesitation here is that this appears to have been added as part of the interactive map addition, so I'm not sure if the presence or absence of a region code was being used to signal the presence or absence of an interactive map. I do think that this would not be the optimal way of doing things for the long term, but I don't want to throw a wrench into anything.
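To illustrate the point about pulling in data from other sources, here is a rough sketch of joining an external table to deck rows by ISO code rather than by name. Everything in it is an assumption for the example: the column names (`country code`, `iso2`, `population`), the two sample tables, and the placeholder population values, which are not real figures.

```python
import csv
import io

# Hypothetical deck excerpt keyed by English name, with an ISO 3166-1 alpha-2 column.
DECK = """country,country code
Czech Republic,CZ
Ivory Coast,CI
"""

# Hypothetical external source that spells the names differently.
# The population values are placeholders, not real data.
EXTERNAL = """name,iso2,population
Czechia,CZ,100
Côte d'Ivoire,CI,200
"""


def attach_by_code(deck_text, external_text):
    """Look up each deck row in the external table using the ISO code as key."""
    external = {
        row["iso2"]: row for row in csv.DictReader(io.StringIO(external_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(deck_text)):
        match = external.get(row["country code"])
        merged.append(
            {
                "country": row["country"],
                # Leave the field empty when the external source has no such code.
                "population": match["population"] if match else "",
            }
        )
    return merged


if __name__ == "__main__":
    for entry in attach_by_code(DECK, EXTERNAL):
        print(entry["country"], entry["population"])
```

Joining on the name instead would match neither row in this sample, since the two sources spell both names differently; the code column sidesteps that.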
Random interesting things noted when adding region codes