-
-
Notifications
You must be signed in to change notification settings - Fork 766
Index and return entrance coordinates for places #3807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for giving that a try. That's a lot less code than I expected and looks quite feasible to do.
So there are two major points that need some discussion here. One is the technical question where we get the entrance data from, see comment. The other one is about the design of the output.
OSM objects can have multiple entrances. Right now you are choosing one more or less at random. Nominatim could get some heuristic to cleverly chose the main entrance but the discussion in #2833 and #536 was already going in the direction that users might want to chose themselves. So we'd probably want a multivalue field. And in the output instaed of just returning lat/lon, have entrances be an array of objects with an extendable set of properties and a coordinate.
Technically speaking, we could just make the entrance column of JSONB type and then save all the entrance data. But if we go for the "clean solution" for finding entrances and have a planet_osm_entrance table, then entrances can be an array of OSM node IDs, which we just join against that entrance table.
Thanks for the feedback! I'll give it another shot with some of these approaches. |
Pushed up a re-implementation! I updated the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach looks good now. Lets go with that and see if it scales to planet-size.
A good next step would be to add a BDD test, which makes sure that the whole import pipeline and the search work together. If our test DB has the data, then a simple API test will do.
'lat', sa.func.ST_Y(entrance_place.c.geometry), | ||
'lon', sa.func.ST_X(entrance_place.c.geometry), | ||
) | ||
)).select_from(entrance_place).filter(sa.any_(t.c.entrance_osm_ids) == entrance_place.c.osm_id).label('entrances')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have been more explicit: the place
table may also disappear, when the database is frozen. That means in theory you would have to copy all the data over to placex
. However, thinking more long term here: we eventually want to have a separate entrance source table and we wouldn't want to have to change the format of the placex table again.
So lets go with this solution for now and simply disable returning entrances for frozen database. Two things we should do for that:
- make querying of entrances here conditional to the existence of the place table. Please use
get_cached_value()
to make sure that the existence of the table is only queried once. - add a warning to the documentation that entrances won't work when the database is frozen
lib-sql/functions/utils.sql
Outdated
SELECT array_agg(osm_id) | ||
FROM (SELECT osm_id, class FROM place WHERE osm_id = ANY(node_ids)) | ||
WHERE class IN ('routing:entrance', 'entrance') | ||
INTO node_ids; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use a join to combine the two selects here.
This second part is going to need a special partial index on place's osm_ids.
lib-sql/tables.sql
Outdated
@@ -157,6 +157,7 @@ CREATE TABLE placex ( | |||
country_code varchar(2), | |||
housenumber TEXT, | |||
postcode TEXT, | |||
entrance_osm_ids BIGINT[] NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOT NULL
is not ideal here. We'd want to use NULL
to indicate something has no entrances because it takes significantly less space than an empty array.
One thing to add to the TODO list: the place table will eventually need a migration that adds the new column. |
7748402
to
1e8723c
Compare
Now that this lookup is indexed, it is much more performant. And testing an unrestricted import of just my US state found 22 potential classes. I suspect it'll be less maintenance to skip this filter. office natural craft emergency information healthcare leisure highway aeroway landuse historic military building waterway club boundary railway place man_made shop tourism amenity
osm_node_id BIGINT NOT NULL, | ||
type TEXT NOT NULL, | ||
geometry GEOMETRY(Point, 4326) NOT NULL | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a stupid idea but if you structure the table like this instead:
place_id BIGINT NOT NULL,
entrance_info JSONB
and then store one line per place with the entrance information already processed in the way you need it in the API calls below, you might be able to avoid all the complications around json-compatibility for sqlite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's not a bad idea. We're not querying against the location data here anyway, so it might as well just live in a JSONB column. And yeah, this sqlite compatibility has grown into a mess.
One more thought: if you consider the entrance data as an optional extra information similar to address details, then the user can choose if they need the extra information and you can greatly simply the code by implementing the query for the entrances only once in add_result_details. |
Alright, this is starting to look respectable! |
Can you rebase on latest master? There are unfortunately some conflicts and I can't start the CI. |
Description
This PR adds logic to record the main entrance location for places. I'd love to get some feedback on the approach from maintainers on the approach here.
In this implementation a new Array column
entrance_osm_ids
is added to the placex table. The entrance metadata is saved into the place table, and the details are returned for each entrance node.Fixes #536
Example output:
TODO: