Add inner and outer members of boundary and multipolygon relations #72

Open
1ec5 opened this issue Jan 23, 2024 · 20 comments

@1ec5

1ec5 commented Jan 23, 2024

This query shows that no boundary or multipolygon relation in the OSM Planet dataset has osmrel:members that are ways with the role inner or outer. The only members in the dataset are label and admin_centre nodes, subarea relations, and plenty of tagging mistakes. This makes it difficult to perform tasks such as:

  • Comparing the perimeter of a building that has a courtyard to the perimeter (P2547) property on Wikidata
  • Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary
  • Associating a disputed boundary claim line with a boundary relation
  • Finding murals on walls of buildings that have courtyards

Also, in this OSM discussion, I needed to access the ways that make up a boundary relation in order to determine the total set of ways that would be part of a proposed time zone relation. I had to drop down to Overpass, which has various recursing operators as well as a length() operator.
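
To illustrate why the inner and outer member ways matter for the perimeter use cases above, here is a minimal Python sketch. It assumes the rings have already been assembled from the member ways and projected to planar coordinates (for example metres); `ring_length`, `multipolygon_perimeter`, and the `include_inner` flag are hypothetical names for illustration. Whether a courtyard ring should count toward the perimeter depends on the definition being compared against.

```python
import math


def ring_length(ring):
    """Length of a closed ring given as [(x, y), ...] in planar units.

    The ring may or may not repeat its first point at the end; a repeated
    point only adds a zero-length closing segment.
    """
    points = list(ring)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def multipolygon_perimeter(outer_rings, inner_rings=(), include_inner=True):
    """Sum the lengths of the outer rings and, optionally, the inner rings."""
    total = sum(ring_length(r) for r in outer_rings)
    if include_inner:
        total += sum(ring_length(r) for r in inner_rings)
    return total


# A 10 x 10 building with a 2 x 2 courtyard (made-up coordinates).
building = [(0, 0), (10, 0), (10, 10), (0, 10)]
courtyard = [(4, 4), (6, 4), (6, 6), (4, 6)]

with_courtyard = multipolygon_perimeter([building], [courtyard])
outer_only = multipolygon_perimeter([building], [courtyard], include_inner=False)
```

Without the inner member ways, only the `outer_only` figure could be computed.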

@1ec5
Author

1ec5 commented Jan 23, 2024

Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary

Another possible way to satisfy this use case would be to add osm2rdf:length perimeter triples to areas.
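
For reference, the Polsby–Popper score is 4πA / P², where A is the area and P the perimeter, so a per-area perimeter value would be enough to compute it. A minimal Python sketch, assuming area and perimeter are in consistent units (for example m² and m); the function name is only illustrative:

```python
import math


def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4 * pi * A / P**2.

    A circle scores 1; less compact shapes score closer to 0.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


# Unit-radius circle and a unit square as sanity checks.
circle_score = polsby_popper(math.pi, 2 * math.pi)
square_score = polsby_popper(1.0, 4.0)
```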

@hannahbast
Member

osm2rdf has an option --add-relation-border-members. It seems that the dumps available on https://osm2rdf.cs.uni-freiburg.de are currently built without that option. I think there was a time when we were concerned about very large numbers of triples. But since we now have over 40 billion triples already for OSM Planet, I don't think adding a few more is a problem.

@lehmann-4178656ch @patrickbr Do you agree?

@hannahbast
Member

hannahbast commented Jan 23, 2024

@1ec5 I have set up a SPARQL endpoint for the data from Germany (that was quick to do), where relations now have the mentioned members. Can you please check whether that has all the triples you need: https://qlever.cs.uni-freiburg.de/osm-germany . Note that for that endpoint the geometries are obtained again with geo:hasGeometry (without the geo:asWKT). I didn't do that on purpose, it was accidental, but just so you know.

Here is a query which gives (and shows) all the geometries of all the members of Berlin: https://qlever.cs.uni-freiburg.de/osm-germany/TWlwsr

@1ec5
Author

1ec5 commented Jan 23, 2024

Thank you, yes, this query returns the least compact admin_level=8 boundaries in the extract according to the Polsby–Popper test.

@hannahbast
Member

hannahbast commented Jan 23, 2024

@1ec5 Thanks for the feedback!

@lehmann-4178656ch @patrickbr The increase in the number of triples due to --add-relation-border-members is below 1%. So I would just add that option when building the datasets on https://osm2rdf.cs.uni-freiburg.de . Are there more such options that we could meaningfully add, which would make the datasets more complete?

@patrickbr
Member

patrickbr commented Jan 23, 2024

@lehmann-4178656ch is already working on a PR to make --add-relation-border-members the default. This also greatly simplifies the code. Another PR will add the object timestamps.

Regarding additional data completeness options: we are currently not outputting the "members" (node IDs) of ways. The reason is that most of these nodes are empty (without any attributes). We could do this, but it would significantly increase the dataset size (essentially, we would add 3 triples for each anchor point of a way geometry: (1) a triple connecting the way to the empty OSM node, (2) a hasGeometry triple connecting the OSM node to a geometry object, and (3) an asWKT triple connecting the geometry to its WKT representation).
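
The three triples per anchor point can be sketched as follows. This is only an illustration of the counting argument, not osm2rdf's actual output: the identifier shapes (such as `osm2rdfgeom:osm_node_...`) are hypothetical, and `osmway:node` is the predicate name mentioned later in this thread.

```python
def way_node_triples(way_id, nodes):
    """Return illustrative triples for a way's nodes.

    nodes is a list of (node_id, lon, lat). Each node contributes
    (1) way -> node, (2) node -> geometry, (3) geometry -> WKT.
    """
    triples = []
    for node_id, lon, lat in nodes:
        node = f"osmnode:{node_id}"
        geom = f"osm2rdfgeom:osm_node_{node_id}"  # hypothetical identifier shape
        triples.append((f"osmway:{way_id}", "osmway:node", node))
        triples.append((node, "geo:hasGeometry", geom))
        triples.append((geom, "geo:asWKT", f"POINT({lon} {lat})"))
    return triples


# Made-up way with two nodes: 2 nodes * 3 triples = 6 triples.
example = way_node_triples(1, [(10, 7.8, 48.0), (11, 7.9, 48.1)])
```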

Another thing I just thought of: we are also not outputting author information, which could be present in the input .pbf file (it is present in the input files we use for https://osm2rdf.cs.uni-freiburg.de/). It might be interesting to get all objects last authored by user X.

Also, the changeset id (basically an OSM "commit") is currently not dumped.

@1ec5
Author

1ec5 commented Jan 23, 2024

These way members and changeset metadata are often used in Overpass queries, but I refrained from asking for them upfront because I assumed they’d be of more interest internally to OSM and OHM than externally. Off the top of my head, one external use case would be finding a given building’s entrances, something geocoders might do to better serve navigation applications. Another would be finding street intersections.

For reference, Sophox includes relation members but omits way members. Sophox also includes the element’s version and last changeset, timestamp, and user. Some of the example queries make use of this functionality.

@hannahbast
Member

@1ec5 I am already convinced that these should be in our dataset. Just waiting for feedback from @lehmann-4178656ch and @patrickbr . They already agreed that we should have the information about changeset, timestamp, and user in our dataset. It's just a few billion more triples :-)

@hannahbast
Member

hannahbast commented Jan 5, 2025

A late update to this thread:

  1. For a while now, osm2rdf has computed the predicate osmrel:member by default. It relates an OSM relation to all its members.
  2. For a while now, there has also been an option --add-way-node-order that adds the predicate osmway:node, which relates a way to its nodes. This is now also active on our SPARQL endpoint. For example, here are all nodes on the border of Germany: https://qlever.cs.uni-freiburg.de/osm-planet/JF5pmQ . For a more complex query involving these triples, see Street intersections qlever#1696
  3. Timestamp information is also included by default. For example: https://qlever.cs.uni-freiburg.de/osm-planet/uXtTG2

@patrickbr @lehmann-4178656ch What about the changeset and user information?

@hannahbast
Member

@1ec5 Does my previous comment also address the problem you had with your original timezone query?

@1ec5
Author

1ec5 commented Jan 5, 2025

Wonderful!

Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary

I’ve added an example query to the OSM Wiki.

Does my previous comment also address the problem you had with your original timezone query?

Yes, this query actually gets a bit closer to the point I was trying to make in that forum discussion than Overpass was able to, since QLever allows us to distinguish between sfCovers and sfIntersects.
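
The difference between the two relations can be sketched with axis-aligned boxes in Python. This is a deliberate simplification: real boundaries are arbitrary polygons and lines, and the sketch says nothing about how QLever evaluates these relations. `Box`, `covers`, and `intersects` are hypothetical names, with closed boxes assumed.

```python
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a, b):
    """True if the two closed boxes share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def covers(a, b):
    """True if no point of b lies outside a."""
    return (a.min_x <= b.min_x and a.min_y <= b.min_y
            and a.max_x >= b.max_x and a.max_y >= b.max_y)


zone = Box(0, 0, 10, 10)          # stand-in for a time zone area
inside = Box(2, 2, 4, 4)          # lies entirely within the zone
straddling = Box(8, 8, 12, 12)    # crosses the zone's edge
outside = Box(20, 20, 21, 21)     # disjoint from the zone
```

Everything that is covered also intersects, but a feature that merely crosses the edge intersects without being covered, which is the distinction the comment above relies on.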

@patrickbr
Member

@patrickbr @lehmann-4178656ch What about the changeset and user information?

There is actually an open PR for this: #83

I will merge this into master today.

@patrickbr
Member

patrickbr commented Jan 8, 2025

@1ec5 if you want, you can try out #83. It should always output changeset, version and user if this information was present in the input data. Note that if version or user are missing from the input (which is often the case for OSM dumps), no corresponding triples are written.
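
The behaviour described here, where a triple is written only if the field is present in the input, can be sketched in Python as follows. This is an illustration rather than the code in #83; the `osmmeta:` predicate names are borrowed from the example query at the end of this thread, and the subject ID is made up.

```python
def metadata_triples(subject, changeset=None, version=None, user=None):
    """Return one triple per metadata field that is actually present.

    Fields that are None (missing from the input) produce no triple.
    """
    fields = {
        "osmmeta:changeset": changeset,
        "osmmeta:version": version,
        "osmmeta:user": user,
    }
    return [(subject, pred, value)
            for pred, value in fields.items()
            if value is not None]


# Full metadata versus an extract that lacks version and user.
full = metadata_triples("osmrel:1", changeset=123, version=4, user="alice")
stripped = metadata_triples("osmrel:1", changeset=123)
```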

@hannahbast
Member

@patrickbr Great, thanks! I will consider this for the next version of https://qlever.cs.uni-freiburg.de/osm-planet. Two questions:

  1. Will the new triples be written by default, or is there an option for it?

  2. Do you have an estimate of how many triples this will add?

@patrickbr
Member

patrickbr commented Jan 8, 2025

  1. Will the new triples be written by default, or is there an option for it?

By default at the moment. But I will add a configuration option to disable it.

  2. Do you have an estimate of how many triples this will add?

For an input dataset containing the user and changeset information, this PR should add exactly 4 * (|nodes| + |ways| + |relations|) triples. So around 36 B for planet.osm (+17%).
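
The arithmetic behind this estimate can be checked with a short Python sketch. The function names are hypothetical, and the object count and base size are not stated in this thread; they are only derived here from the 36 B and +17% figures.

```python
def added_metadata_triples(nodes, ways, relations, per_object=4):
    """Triples added by the PR: per_object * (|nodes| + |ways| + |relations|)."""
    return per_object * (nodes + ways + relations)


def implied_object_count(added_triples, per_object=4):
    """Number of OSM objects implied by a given number of added triples."""
    return added_triples / per_object


def implied_base_size(added_triples, relative_increase):
    """Dataset size before the addition, implied by a relative increase."""
    return added_triples / relative_increase


objects = implied_object_count(36e9)   # about 9 billion objects
base = implied_base_size(36e9, 0.17)   # about 212 billion triples before
```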

@1ec5
Author

1ec5 commented Jan 8, 2025

Note that if version or user are empty (which is often the case for OSM dumps)

Right, in particular, the public extracts from Geofabrik omit this information, due to their implementation of GDPR compliance. They have a separate set of extracts containing this information that’s restricted to OSM users and subject to a click-through confidentiality agreement. The official first-party OSM planets don’t have these restrictions, but I’m unfamiliar with the reasoning for the different approaches.

@hannahbast
Member

For an input dataset containing the user and changeset information, this PR should add exactly 4 * (|nodes| + |ways| + |relations|) triples. So around 36 B for planet.osm.

Ok, so around 40B more triples for OSM Planet, piece of cake :-)

@hannahbast
Member

@1ec5 Do you have any idea why GDPR compliance is an issue here, given that the complete dataset (including the user information) is public?

@1ec5
Author

1ec5 commented Jan 8, 2025

Unfortunately, this isn’t my area of expertise. The data protection laws are different where I live (in California). The OSM Wiki has some documents related to the OSMF’s compliance with GDPR, but it’s unclear to me whether the same considerations would apply to QLever.

Besides the official OSMF planet distribution, the major Overpass API instances, Sophox, and OSM by the Slice all expose this metadata based on the official planet. As far as I know, Geofabrik is the only redistributor of raw OSM data that has taken the step of gating access to the metadata behind an OSM account. I just brought it up because it’s the most popular source of extracts in OSM PBF format.

@hannahbast
Member

@1ec5 The latest version of https://qlever.cs.uni-freiburg.de/osm-planet now has predicates for changeset, timestamp, version, and user. Note that some (very few) objects have no user info. Each of these predicates has around 10 B triples and the latest version of the whole dataset now has 250 B triples. Here is an example query.

PREFIX osm: <https://www.openstreetmap.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX osmmeta: <https://www.openstreetmap.org/meta/>
SELECT * WHERE {
  ?osm_id rdf:type osm:relation .
  ?osm_id osmmeta:changeset ?changeset .
  ?osm_id osmmeta:timestamp ?timestamp .
  ?osm_id osmmeta:version ?version .
  ?osm_id osmmeta:user ?user .
}
ORDER BY DESC(?timestamp)

https://qlever.cs.uni-freiburg.de/osm-planet/SaHQVs
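
For scripted access, the query above could be sent over HTTP. The Python sketch below only builds the request. The API URL is an assumption (it is not given in this thread), and `build_request` is a hypothetical helper; it appends a LIMIT because the query as written matches every relation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed API endpoint; check the QLever UI for the actual URL.
API_URL = "https://qlever.cs.uni-freiburg.de/api/osm-planet"

QUERY = """\
PREFIX osm: <https://www.openstreetmap.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX osmmeta: <https://www.openstreetmap.org/meta/>
SELECT * WHERE {
  ?osm_id rdf:type osm:relation .
  ?osm_id osmmeta:changeset ?changeset .
  ?osm_id osmmeta:timestamp ?timestamp .
  ?osm_id osmmeta:version ?version .
  ?osm_id osmmeta:user ?user .
}
ORDER BY DESC(?timestamp)
"""


def build_request(query, limit=10):
    """Build a GET request for a SPARQL query, bounded by a LIMIT clause."""
    bounded = f"{query.rstrip()}\nLIMIT {int(limit)}"
    url = f"{API_URL}?{urlencode({'query': bounded})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY, limit=10)
```

With network access, `urllib.request.urlopen(request)` would send it; the response can then be parsed with `json.load`.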
