Updating our network analysis libraries? #555
Replies: 2 comments
@carlhiggs thanks for summarizing this — very useful! I still haven't dug into the network-based measures. Here are my preliminary (and maybe naive!) thoughts:
So it depends on which functionalities we need: for indicators that require combinations of network metrics (e.g., connectivity-based ones), we can use OSMnx/NetworkX/rustworkx. For indicators that need precise shortest-path estimates and multimodal support, r5py could be a valuable option if we don't want to implement that ourselves on top of our network analysis library of choice (I don't know of efficient pure-Python alternatives that do this). Collecting the requirements could also help drive the discussion. Happy to take a more in-depth look at rustworkx and r5py; this area is really interesting to explore! Like Carl, I'm curious to hear if others know of alternative approaches or libraries we could consider.
I will just chime in that the R5 routing engine would be the way to go for multi-modal accessibility analysis; it has been very widely used and is well developed for transportation planning applications. r5py is also very user friendly. I have met Willem in person a couple of times at conferences, and I believe he is still actively working on the project. The only issue my colleague and I encountered was its lack of support for parallel processing (r5py/r5py#387), but the only impact would be on efficiency when processing very large network data; r5r works better for parallel processing in this case. I also agree with @rschifan that the Java dependency would be a concern here, but it could be overcome. Very recently, I learned about the Valhalla routing engine and its Python application, routingpy, which was used to build the Canada nationwide spatial access measure. This is as much as I know, and I'm not sure how it compares to other routing engines, but I offer it as another option to consider.
Related to our planned re-build (#534), our attention was recently drawn to new options for network analysis libraries (in #549).
Currently, we use a mix of the PostgreSQL pgRouting extension (for establishing relationships between nodes to pre-compute local walkable neighbourhood catchments), Pandana (also for pre-computation, in this case of distance to amenities), NetworkX (for shortest path analysis, in addition to the pre-computation of neighbourhood catchments), and some additional geometric analysis to help with pre-computing 'full distances'.
The below is intended to document some of our current operations of network analysis, and provide a place where we can continue a discussion on this.
Examples of current network analysis library usage
Here are examples of where we use each of these approaches to network analysis, which we should look to simplify/consolidate in our future rebuild:
pgRouting
global-indicators/process/subprocesses/_03_create_network_resources.py
Lines 250 to 261 in 2a79b42
Pandana
global-indicators/process/subprocesses/_11_neighbourhood_analysis.py
Lines 144 to 162 in 2a79b42
NetworkX
global-indicators/process/subprocesses/_11_neighbourhood_analysis.py
Lines 79 to 103 in 2a79b42
Alternatives?
I think it could be worth exploring how alternative libraries might perform for our use cases, particularly if we reconsidered our approach from scratch with modern libraries. Of our current libraries, pgRouting depends on a PostgreSQL database; Pandana is no longer being developed and its age has started to cause dependency conflicts; NetworkX is flexible, but not optimised for performance with city-scale networks (hence our use of other techniques).
I don't think we should aim to replicate the operations as we currently do them, so much as reconsider what we are trying to achieve in our two distinct uses of network analysis: 1) deriving shortest path estimates, and 2) deriving buffered subgraphs within distance of an origin node.
The latter are used to approximate a 50 metre buffered walkable network (along the lines of Forsyth et al's sausage buffer technique for measuring food and physical activity built environments). The buffer operation is computationally expensive, so to avoid it we pre-associate each node traversed with the population grid cell within which it is located. We take the collection of 100m grid cells (or whatever size of population grid/areas is being used) intersecting the walkable path as the buffered local walkable catchment area for any particular sampling location (or residential address proxy).
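To make the two uses concrete, here is a minimal, library-agnostic sketch (standard library only) of a distance-limited shortest path search followed by the grid cell lookup. It is not our current implementation: the toy network, the `network_catchment` function, the `node_grid_cell` lookup and the 1000 m threshold are all hypothetical, and it only collects the cells of nodes reached, ignoring partially traversed edges.

```python
import heapq


def network_catchment(adjacency, origin, max_distance):
    """Dijkstra search with a distance cutoff.

    Returns {node: shortest network distance} for every node whose
    shortest path from `origin` is no longer than `max_distance`.
    """
    distances = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, length in adjacency.get(node, {}).items():
            candidate = dist + length
            if candidate <= max_distance and candidate < distances.get(
                neighbour, float("inf")
            ):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances


# Hypothetical toy network: undirected, edge lengths in metres.
adjacency = {
    "a": {"b": 300, "c": 700},
    "b": {"a": 300, "d": 400},
    "c": {"a": 700, "d": 200},
    "d": {"b": 400, "c": 200, "e": 500},
    "e": {"d": 500},
}

# Hypothetical pre-computed association of each node with a population grid cell.
node_grid_cell = {
    "a": "cell_1",
    "b": "cell_1",
    "c": "cell_2",
    "d": "cell_3",
    "e": "cell_4",
}

# Use 1: shortest path estimates from a sampling point, limited to 1000 m.
reached = network_catchment(adjacency, "a", max_distance=1000)

# Use 2: the catchment approximated as the set of grid cells of reached nodes.
catchment_cells = {node_grid_cell[node] for node in reached}
```

Any candidate library would need an efficient equivalent of this cutoff search over city-scale networks; the grid cell union afterwards is a cheap set operation that stands in for the geometric buffer.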
DuckDB extension(s)?
We are in the process of migrating from PostgreSQL to DuckDB, so a first thought is: is there an equivalent to the PostgreSQL pgRouting extension for DuckDB? I think the answer is no, not yet.
There is a research group actively working to develop routing algorithms though, in the extension DuckPGQ, as per the following master's theses and links:
https://homepages.cwi.nl/~boncz/msc/2024-PinganRen.pdf
https://homepages.cwi.nl/~boncz/msc/2022-DanielTenWolde.pdf
https://duckdb.org/community_extensions/extensions/duckpgq.html
https://github.com/cwida/duckpgq-extension
But as the above links note, this is a work in progress.
rustworkx
This is not a drop-in replacement for NetworkX, but may provide speed improvements of 3x to 100x over NetworkX performance.
Treinish et al., (2022). rustworkx: A High-Performance Graph Library for Python. Journal of Open Source Software, 7(79), 3968, https://doi.org/10.21105/joss.03968
https://www.rustworkx.org/
https://www.rustworkx.org/networkx.html
https://github.com/Qiskit/rustworkx
It would be interesting to explore whether and how our current operations could be adapted/improved using rustworkx.
r5py
Shirley and I have previously discussed Conveyal's R5 (Rapid Realistic Routing on Real-world and Reimagined networks), now that we are starting to consider multi-modal analysis, e.g. walking, cycling and public transport.
There is now a Python implementation, r5py ('Rapid Realistic Routing with R5 in Python'), the lead developer of which appears to be Henrikki Tenkanen who developed Pandana (and who @VuokkoH knows, I believe), but also Willem Klumpenhouwer whose GTFS-lite package we currently use (and which I contributed to, to add functionality for us):
Fink, C., Klumpenhouwer, W., Saraiva, M., Pereira, R., & Tenkanen, H., 2022: r5py: Rapid Realistic Routing with R5 in Python. DOI:10.5281/zenodo.7060437
https://r5py.readthedocs.io/stable/
https://github.com/r5py/r5py
There could be co-benefits to using r5py, as it could presumably handle and extend our public transport analysis using GTFS feeds, and we could perhaps leverage its features for more advanced multimodal scenario modelling.
Other options? No doubt many! But it's probably a good time to consider these in a discussion as we start sketching out our rebuild, planning for good trade-offs in the goal of simplified/consolidated/optimised/robust dependencies.
Love to hear your thoughts @healthysustainablecities/ghsci-software-working-group