Description
- Toy example used here:
- adm0: Brazil https://www.wikidata.org/wiki/Q155 (Turtle http://www.wikidata.org/wiki/Special:EntityData/Q155.ttl)
- adm1: Rio de Janeiro state https://www.wikidata.org/wiki/Q41428 (Turtle http://www.wikidata.org/wiki/Special:EntityData/Q41428.ttl)
- adm2: Rio de Janeiro city https://www.wikidata.org/wiki/Q8678 (Turtle http://www.wikidata.org/wiki/Special:EntityData/Q8678.ttl)
- Note: Rio de Janeiro is likely to be one of the easiest places to find codes for, even though it is not the current capital of a country. This means cities like this one are probably not realistic examples for extending the approach to other adm2 P-Codes.
While the mapping at admin0 ("country level") is straightforward (since we can map the ISO 3166-1 code used as the P-Code prefix, and UN M49), things already get tricky at administrative boundary level 1. We know the P-Code patterns of at least some regions (such as Brazil), and in that case we are even lucky to have a ready-to-use mapping such as https://www.wikidata.org/wiki/Property:P1585. But I'm not sure about the rest.
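To make the Brazil case concrete, here is a minimal sketch that splits a P-Code into its country prefix and numeric part, then builds (without sending it) a SPARQL query for P1585. The names `split_pcode` and `sparql_for_ibge_code` are hypothetical, and the assumed P-Code shape (two-letter ISO 3166-1 prefix followed by digits) would need to be checked country by country.

```python
import re

# Assumption: a P-Code is an ISO 3166-1 alpha-2 prefix followed only by digits.
# The issue itself notes that a few exceptions exist.
PCODE_RE = re.compile(r"^([A-Z]{2})(\d+)$")


def split_pcode(pcode):
    """Return (country_prefix, numeric_part), or None if the shape is unexpected."""
    match = PCODE_RE.match(pcode.strip().upper())
    return (match.group(1), match.group(2)) if match else None


def sparql_for_ibge_code(ibge_code):
    """Build a SPARQL query string for items whose P1585 value equals ibge_code.

    Only the query text is built here; no network request is made.
    """
    if not ibge_code.isdigit():
        raise ValueError("expected a fully numeric code")
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P1585 "{ibge_code}" . '
        "}"
    )


# Illustrative input only: "BR3304557" is assumed to be the P-Code shape
# for Rio de Janeiro city, not a value taken from an official COD-AB file.
parts = split_pcode("BR3304557")
if parts is not None:
    prefix, numeric = parts
    print(prefix, numeric)
    print(sparql_for_ibge_code(numeric))
```

The resulting query string could be pasted into the Wikidata Query Service, where the `wdt:` prefix is predefined. Whether the result is really the same place still needs a human check.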
Why such mappings become relevant
Even if, in the best case, we only manage mappings at admin 1, with only specific administrative regions covered in more detail, this alone would already allow getting more data from Wikidata, which is by far the best place, used by many different people and organizations. I personally think that (at least as soon as it gets decent) it is worth publishing such mappings as a dedicated public domain dataset, so ITOS or OCHA can at least use it, even if only for internal comparisons. However, this "soon" can take time, and for population statistics such as #43 the data obtained through such mappings is likely to be less accurate than what OCHA has, especially for countries with an active crisis.
In any case, once these mappings exist, they let us discover many more mappings (including OpenStreetMap and UN/LOCODE). But I by no means think this will be ready anytime soon (assuming it can be ready at all, since regions change over time).
Potential approaches
Tooling specialized to integrate intermediate controlled vocabularies
Note: by "intermediate controlled vocabularies" we mean anything that could be used to triangulate what could later be assumed to be an exact match with P-Codes.
This topic alone will require creating several scripts and strategies (even if the early ones become less necessary in the medium term) to start learning how to establish the other relations. The ones we should pay most attention to are those that need to run from time to time to discover new changes.
1. (Not sure, needs testing) maybe compare by matching geometries
So far we have not attempted to run any tool that matches by geometry. While this would definitely need human intervention, maybe it could work.
To reach this point, we would not only need to create the scripts, but likely also let them run periodically (maybe weekly or monthly) to check the official COD-ABs against whatever Wikidata uses.
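As a rough idea of what a first geometry check could look like, the sketch below compares bounding boxes by intersection over union (IoU) and keeps only candidates above a threshold for human review. This is a cheap pre-filter, not a real polygon comparison. It assumes outline coordinates are available for both sides, and all names, the threshold, and the sample data are hypothetical.

```python
def bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_iou(a, b):
    """Intersection over union of two bounding boxes, from 0.0 to 1.0."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def candidate_matches(cod_ab, wikidata, threshold=0.8):
    """For each P-Code, list (score, qid) pairs with IoU >= threshold, best first.

    The result is a list of candidates for a person to review, not a final mapping.
    """
    result = {}
    for pcode, outline in cod_ab.items():
        box = bbox(outline)
        scored = [
            (round(bbox_iou(box, bbox(other)), 3), qid)
            for qid, other in wikidata.items()
        ]
        result[pcode] = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return result


# Hypothetical outlines: placeholder identifiers, not real boundaries.
cod_ab = {"XX01": [(0, 0), (2, 0), (2, 2), (0, 2)]}
wikidata = {
    "Q_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Q_B": [(1, 0), (3, 0), (3, 2), (1, 2)],
}
print(candidate_matches(cod_ab, wikidata))
```

A scheduled run (weekly or monthly) could diff this output against the previous run to surface only what changed.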
2. Try to reverse engineer the numeric part of P-Codes (and hope a Wikidata property already exists for them)
Since the documentation on how to design P-Codes has, for more than a decade, recommended reusing existing country codes, it is likely that more regions have equivalents such as the IBGE code (P1585). The only thing we're sure of is that all P-Codes, without the admin0 prefix, are fully numeric (with few exceptions), so this already excludes a lot of potential existing codes.
However, the new problem is whether other countries do have mappings in a Wikidata property (and whether such mappings are kept as up to date by others as P1585 is). Otherwise, even if we could learn country by country how the P-Codes were designed without trial and error (and they turn out to be a 1:1 match, which, again, we can't take for granted without human intervention), we cannot use it.
In any case, whatever the strategy to map P-Codes to Wikidata Q items, we would need to document it very well to allow review.
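One way to keep such a mapping reviewable is to never silently accept a match, and instead sort every P-Code into "exact", "ambiguous", or "missing". The sketch below does this for one country. It assumes the numeric part is simply the P-Code minus its prefix, and that `code_to_qids` was built beforehand from some Wikidata property. The function name and the sample data are hypothetical.

```python
def classify_matches(pcodes, code_to_qids, prefix):
    """Sort P-Codes by how many Wikidata items share their numeric part.

    pcodes: iterable of P-Code strings for one country.
    code_to_qids: dict mapping an external code string to a list of Q ids.
    prefix: the admin0 prefix to strip (assumed ISO 3166-1 alpha-2).
    """
    report = {"exact": {}, "ambiguous": {}, "missing": []}
    for pcode in pcodes:
        numeric = pcode[len(prefix):] if pcode.startswith(prefix) else ""
        # Non-numeric remainders are treated as unmatched, not guessed at.
        qids = code_to_qids.get(numeric, []) if numeric.isdigit() else []
        if len(qids) == 1:
            report["exact"][pcode] = qids[0]
        elif len(qids) > 1:
            report["ambiguous"][pcode] = sorted(qids)
        else:
            report["missing"].append(pcode)
    return report


# Illustrative data only. Q8678 is the Rio de Janeiro city item named above;
# the other codes and ids are placeholders, not real Wikidata content.
code_to_qids = {
    "3304557": ["Q8678"],
    "1111111": ["Q_PLACEHOLDER_A", "Q_PLACEHOLDER_B"],
}
pcodes = ["BR3304557", "BR1111111", "BR2222222"]
print(classify_matches(pcodes, code_to_qids, prefix="BR"))
```

Only the "exact" bucket is a candidate for automatic use, and even that assumes the reverse-engineered pattern is right. The other two buckets are the review queue.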
3. Other inferences
There are several other codes on Wikidata, from OpenStreetMap (https://www.wikidata.org/wiki/Property:P402), UN/LOCODE (https://www.wikidata.org/wiki/Property:P1937), and HASC (https://www.wikidata.org/wiki/Property:P8119) to a popular one, GeoNames (https://www.wikidata.org/wiki/Property:P1566; I'm not sure why this one sometimes has more than one code for the same place). They might allow some way to triangulate with P-Codes, but I'm not sure at the moment.
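If several of these vocabularies can each suggest a Wikidata item for the same P-Code, triangulation could be as simple as a vote. The sketch below accepts a suggestion only when one Q id has strictly more support than any other, and otherwise returns nothing so that a person decides. The function name, the vocabulary labels, and the placeholder ids are hypothetical.

```python
from collections import Counter


def triangulate(evidence):
    """Pick the Q id most vocabularies agree on, or None on a tie or no data.

    evidence: dict mapping a vocabulary name to the Q id it suggests (or None).
    Returns (qid, sorted list of vocabularies that support it), or None.
    """
    votes = Counter(qid for qid in evidence.values() if qid)
    if not votes:
        return None
    (best, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return None  # no clear winner, leave it for human review
    supporters = sorted(name for name, qid in evidence.items() if qid == best)
    return best, supporters


# Hypothetical evidence for one P-Code. Q8678 is the Rio de Janeiro city item
# named above; "Q_OTHER" is a placeholder, not a real item.
evidence = {
    "osm_relation": "Q8678",
    "unlocode": "Q8678",
    "geonames": "Q_OTHER",
    "hasc": None,
}
print(triangulate(evidence))
```

A place with more than one GeoNames code would need to be handled before this step, for example by treating each code as a separate piece of evidence.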