Index and return entrance coordinates for places #3807

Open: wants to merge 17 commits into base: master

@emlove emlove commented Aug 5, 2025

Description

This PR adds logic to record the main entrance location for places. I'd love to get some feedback from maintainers on the approach here.

In this implementation a new Array column entrance_osm_ids is added to the placex table. The entrance metadata is saved into the place table, and the details are returned for each entrance node.

Fixes #536

Example output:

[
    {
        "place_id": 534548,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "way",
        "osm_id": 108240964,
        "lat": "41.8838293",
        "lon": "-87.6319547",
        "category": "amenity",
        "type": "townhall",
        "place_rank": 30,
        "importance": 9.99999999995449e-06,
        "addresstype": "amenity",
        "name": "Chicago City Hall",
        "display_name": "Chicago City Hall, 121, Pedway, Loop, Chicago, Cook County, 60602, United States",
        "entrances": [
            {
                "lat": 41.883881,
                "lon": -87.6322362,
                "type": "yes",
                "osm_node_id": 2391052802
            },
            {
                "lat": 41.8843544,
                "lon": -87.6317131,
                "type": "yes",
                "osm_node_id": 2391052808
            },
            {
                "lat": 41.8836391,
                "lon": -87.632232,
                "type": "yes",
                "osm_node_id": 2391057836
            },
            {
                "lat": 41.8833509,
                "lon": -87.6317131,
                "type": "yes",
                "osm_node_id": 10922694346
            }
        ],
        "boundingbox": [
            "41.8833431",
            "41.8843544",
            "-87.6322445",
            "-87.6316693"
        ]
    }
]

TODO:

  • Make entrances data opt-in
  • Implement other format outputs
  • Migration
  • Tests

@emlove changed the title from "Index and return entrance coordinates for indexed locations" to "Index and return entrance coordinates for places" on Aug 5, 2025
Member

@lonvia lonvia left a comment

Thanks for giving that a try. That's a lot less code than I expected and looks quite feasible to do.

So there are two major points that need some discussion here. One is the technical question of where we get the entrance data from (see comment). The other one is about the design of the output.

OSM objects can have multiple entrances. Right now you are choosing one more or less at random. Nominatim could get some heuristic to cleverly choose the main entrance, but the discussion in #2833 and #536 was already going in the direction that users might want to choose themselves. So we'd probably want a multivalue field. And in the output, instead of just returning lat/lon, have entrances be an array of objects with an extendable set of properties and a coordinate.
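To illustrate why an array of entrance objects leaves the choice to the caller, here is a minimal client-side sketch. It assumes the response shape shown in the example output of the PR description. The helper name `pick_entrance` and the preference order are hypothetical and not part of Nominatim, and the second entrance's type is changed to "main" purely for illustration.

```python
from typing import Optional

# Hypothetical preference order: an entrance tagged "main" wins over generic ones.
PREFERRED_TYPES = ("main", "yes")


def pick_entrance(place: dict) -> Optional[dict]:
    """Return the most preferred entrance of a result, or None if it has none."""
    entrances = place.get("entrances") or []
    if not entrances:
        return None

    def rank(entrance: dict) -> int:
        etype = entrance.get("type")
        # Unknown types sort after all preferred ones.
        if etype in PREFERRED_TYPES:
            return PREFERRED_TYPES.index(etype)
        return len(PREFERRED_TYPES)

    # min() keeps the first entrance among equally ranked ones.
    return min(entrances, key=rank)


place = {
    "name": "Chicago City Hall",
    "entrances": [
        {"lat": 41.883881, "lon": -87.6322362, "type": "yes",
         "osm_node_id": 2391052802},
        # Type altered from the PR's example output for illustration only.
        {"lat": 41.8843544, "lon": -87.6317131, "type": "main",
         "osm_node_id": 2391052808},
    ],
}
chosen = pick_entrance(place)
```

Because each entrance is an object rather than a bare coordinate pair, further properties could be added later without breaking a client written this way.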

Technically speaking, we could just make the entrance column of JSONB type and then save all the entrance data. But if we go for the "clean solution" for finding entrances and have a planet_osm_entrance table, then entrances can be an array of OSM node IDs, which we just join against that entrance table.

@emlove
Author

emlove commented Aug 6, 2025

Thanks for the feedback! I'll give it another shot with some of these approaches.

@emlove
Author

emlove commented Aug 7, 2025

Pushed up a re-implementation! I updated the PR description.

Member

@lonvia lonvia left a comment

The general approach looks good now. Let's go with that and see if it scales to planet-size.

A good next step would be to add a BDD test, which makes sure that the whole import pipeline and the search work together. If our test DB has the data, then a simple API test will do.

'lat', sa.func.ST_Y(entrance_place.c.geometry),
'lon', sa.func.ST_X(entrance_place.c.geometry),
)
)).select_from(entrance_place).filter(sa.any_(t.c.entrance_osm_ids) == entrance_place.c.osm_id).label('entrances'))
Member

I should have been more explicit: the place table may also disappear when the database is frozen. That means in theory you would have to copy all the data over to placex. However, thinking more long term here: we eventually want to have a separate entrance source table and we wouldn't want to have to change the format of the placex table again.

So let's go with this solution for now and simply disable returning entrances for frozen databases. Two things we should do for that:

  • make querying of entrances here conditional on the existence of the place table. Please use get_cached_value() to make sure that the existence of the table is only queried once.
  • add a warning to the documentation that entrances won't work when the database is frozen
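The "query the table's existence only once" behaviour can be sketched roughly as below. This is not Nominatim's code: `SketchConnection`, `table_exists`, and the `get_cached_value` signature used here are stand-ins invented for illustration, so the real helper's signature should be checked in the source before relying on it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set, Tuple


class SketchConnection:
    """Stand-in for a database connection wrapper; NOT Nominatim's class."""

    def __init__(self, tables: Set[str]) -> None:
        self._tables = tables
        self._cache: Dict[Tuple[str, str], Any] = {}
        self.lookups = 0  # counts how often the "database" is really asked

    async def table_exists(self, name: str) -> bool:
        self.lookups += 1
        return name in self._tables

    async def get_cached_value(self, group: str, name: str,
                               factory: Callable[[], Awaitable[Any]]) -> Any:
        # Assumed signature: run the factory only on a cache miss.
        key = (group, name)
        if key not in self._cache:
            self._cache[key] = await factory()
        return self._cache[key]


async def entrances_available(conn: SketchConnection) -> bool:
    """True when the place table exists, i.e. the database is not frozen."""
    return await conn.get_cached_value(
        "tables", "place", lambda: conn.table_exists("place"))


async def demo() -> tuple:
    live = SketchConnection({"place", "placex"})
    frozen = SketchConnection({"placex"})
    first = await entrances_available(live)
    second = await entrances_available(live)  # served from the cache
    return first, second, live.lookups, await entrances_available(frozen)


result = asyncio.run(demo())
```

With a check like this in place, the entrance sub-query would simply be skipped for a frozen database instead of failing on a missing table.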

SELECT array_agg(osm_id)
FROM (SELECT osm_id, class FROM place WHERE osm_id = ANY(node_ids))
WHERE class IN ('routing:entrance', 'entrance')
INTO node_ids;
Member

You can use a join to combine the two selects here.

This second part is going to need a special partial index on place's osm_ids.
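As a rough illustration of both suggestions, the sketch below flattens the nested select into one query and adds a partial index that covers only entrance rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The reduced table layout, the index name `idx_place_entrance_osm_id`, and the sample rows are assumptions. In the PL/pgSQL function the ID list would come from the `node_ids` array (for example via a join against the unnested array) rather than from bound parameters.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (osm_id INTEGER, class TEXT, type TEXT);
    -- Partial index: only entrance rows are indexed, so it stays small.
    CREATE INDEX idx_place_entrance_osm_id ON place (osm_id)
        WHERE class IN ('entrance', 'routing:entrance');
    INSERT INTO place VALUES
        (1, 'entrance', 'main'),
        (2, 'highway', 'crossing'),
        (3, 'routing:entrance', 'yes'),
        (4, 'entrance', 'yes');
""")


def entrance_ids(db: sqlite3.Connection, node_ids: List[int]) -> List[int]:
    """Filter a way's node IDs down to entrance nodes in a single query."""
    if not node_ids:
        return []
    placeholders = ", ".join("?" for _ in node_ids)
    rows = db.execute(
        f"""SELECT osm_id FROM place
            WHERE osm_id IN ({placeholders})
              AND class IN ('entrance', 'routing:entrance')
            ORDER BY osm_id""",
        node_ids,
    )
    return [row[0] for row in rows]


found = entrance_ids(conn, [1, 2, 3, 99])
```

The class condition in the query repeats the index's WHERE clause on purpose, since a planner can generally only consider a partial index when the query's conditions imply the index predicate.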

@@ -157,6 +157,7 @@ CREATE TABLE placex (
country_code varchar(2),
housenumber TEXT,
postcode TEXT,
entrance_osm_ids BIGINT[] NOT NULL,
Member

NOT NULL is not ideal here. We'd want to use NULL to indicate something has no entrances because it takes significantly less space than an empty array.
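One way to read this suggestion is that writers normalise "no entrances" to NULL while readers treat NULL as an empty list. The helper names below are hypothetical and only sketch that convention; they are not taken from the PR.

```python
from typing import Iterable, List, Optional


def to_column_value(node_ids: Optional[Iterable[int]]) -> Optional[List[int]]:
    """Map 'no entrances' to None (SQL NULL) instead of an empty array."""
    ids = list(node_ids or [])
    return ids or None


def from_column_value(value: Optional[List[int]]) -> List[int]:
    """Readers treat NULL and an empty array alike: no entrances."""
    return value or []
```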

@lonvia
Member

lonvia commented Aug 11, 2025

One thing to add to the TODO list: the place table will eventually need a migration that adds the new column.

@emlove force-pushed the return-entrance-location branch from 7748402 to 1e8723c on August 11, 2025 22:31
emlove added 6 commits August 11, 2025 19:42
Now that this lookup is indexed, it is much more performant. And testing
an unrestricted import of just my US state found 22 potential classes. I
suspect it'll be less maintenance to skip this filter.

office
natural
craft
emergency
information
healthcare
leisure
highway
aeroway
landuse
historic
military
building
waterway
club
boundary
railway
place
man_made
shop
tourism
amenity
osm_node_id BIGINT NOT NULL,
type TEXT NOT NULL,
geometry GEOMETRY(Point, 4326) NOT NULL
);
Member

This may be a stupid idea but if you structure the table like this instead:

   place_id BIGINT NOT NULL,
   entrance_info JSONB

and then store one line per place with the entrance information already processed in the way you need it in the API calls below, you might be able to avoid all the complications around json-compatibility for sqlite.
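A minimal sketch of the one-row-per-place layout, using Python's built-in sqlite3 module with JSON stored as text in place of PostgreSQL's JSONB. The table name `entrance_sketch`, the primary key, and the helper functions are assumptions made for illustration.

```python
import json
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entrance_sketch ("
    " place_id INTEGER NOT NULL PRIMARY KEY,"  # primary key is an assumption
    " entrance_info TEXT)"  # JSON text stands in for a JSONB column
)


def store_entrances(db: sqlite3.Connection, place_id: int,
                    entrances: List[dict]) -> None:
    """Write one row per place with the entrances already in API shape."""
    db.execute("INSERT OR REPLACE INTO entrance_sketch VALUES (?, ?)",
               (place_id, json.dumps(entrances)))


def load_entrances(db: sqlite3.Connection, place_id: int) -> List[dict]:
    """Return the stored entrance list unchanged, or [] if there is none."""
    row = db.execute(
        "SELECT entrance_info FROM entrance_sketch WHERE place_id = ?",
        (place_id,)).fetchone()
    return json.loads(row[0]) if row and row[0] else []


sample = [
    {"lat": 41.883881, "lon": -87.6322362, "type": "yes",
     "osm_node_id": 2391052802},
    {"lat": 41.8843544, "lon": -87.6317131, "type": "yes",
     "osm_node_id": 2391052808},
]
store_entrances(conn, 534548, sample)
loaded = load_entrances(conn, 534548)
```

Since the API code only passes the stored value through, no per-backend JSON construction is needed at query time, which is the simplification the comment is after.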

Author

Yeah that's not a bad idea. We're not querying against the location data here anyway, so it might as well just live in a JSONB column. And yeah, this sqlite compatibility has grown into a mess.

@lonvia
Member

lonvia commented Aug 14, 2025

One more thought: if you consider the entrance data as optional extra information similar to address details, then the user can choose whether they need the extra information, and you can greatly simplify the code by implementing the query for the entrances only once in add_result_details.
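The opt-in idea can be sketched as follows. Only the function name `add_result_details` comes from the comment above; its signature here, the `Result` and `Details` classes, the `entrances` flag, and the `lookup` callback are hypothetical stand-ins rather than Nominatim's actual types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    place_id: int
    entrances: Optional[List[dict]] = None  # stays None unless requested


@dataclass
class Details:
    entrances: bool = False  # hypothetical opt-in flag, off by default


def add_result_details(results: List[Result], details: Details,
                       lookup: Callable[[List[int]], Dict[int, List[dict]]]
                       ) -> None:
    """Attach entrances to all results with one batched lookup, if requested."""
    if not details.entrances or not results:
        return
    by_place = lookup([r.place_id for r in results])
    for r in results:
        r.entrances = by_place.get(r.place_id, [])


calls: List[List[int]] = []


def fake_lookup(place_ids: List[int]) -> Dict[int, List[dict]]:
    # Stand-in for the single database query; records how often it runs.
    calls.append(list(place_ids))
    return {534548: [{"lat": 41.883881, "lon": -87.6322362, "type": "yes",
                      "osm_node_id": 2391052802}]}


plain = [Result(534548)]
add_result_details(plain, Details(), fake_lookup)

detailed = [Result(534548), Result(1)]
add_result_details(detailed, Details(entrances=True), fake_lookup)
```

Keeping the lookup in one place means search, reverse, and lookup code paths would not each need their own entrance sub-query.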

@emlove
Author

emlove commented Aug 14, 2025

Alright, this is starting to look respectable!

@lonvia
Member

lonvia commented Aug 15, 2025

Can you rebase on latest master? There are unfortunately some conflicts and I can't start the CI.

@emlove emlove marked this pull request as ready for review August 15, 2025 20:42
Development

Successfully merging this pull request may close these issues.

return entrance point for large places like airports
2 participants