See blog post
This repo tracks the status of bike stations from various bike-sharing providers. The data is fetched every 15 minutes. The results are stored and versioned as GeoJSON files. This is done using the git scraping technique.
The weather forecast for the next 24 hours is also collected every 15 minutes, for each city.
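Git scraping works best when each snapshot is serialized deterministically, so that a commit's diff only shows what actually changed. The sketch below illustrates the idea of turning station records into a GeoJSON file. It is not the repo's actual code: the field names (`station_id`, `lat`, `lon`, `bikes`, `docks`) and the helper names are assumptions made for illustration.

```python
import json


def stations_to_geojson(stations):
    """Convert a list of station dicts into a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [s["lon"], s["lat"]],
            },
            # Everything except the coordinates becomes a property.
            "properties": {k: v for k, v in s.items() if k not in ("lon", "lat")},
        }
        # Sort by station id so the feature order is stable between runs.
        for s in sorted(stations, key=lambda s: s["station_id"])
    ]
    return {"type": "FeatureCollection", "features": features}


def dump_snapshot(stations):
    """Serialize with sorted keys and fixed indentation for clean git diffs."""
    return json.dumps(stations_to_geojson(stations), indent=2, sort_keys=True)


# Hypothetical sample data, for illustration only.
sample = [
    {"station_id": "2", "lat": 43.60, "lon": 1.44, "bikes": 5, "docks": 10},
    {"station_id": "1", "lat": 43.61, "lon": 1.45, "bikes": 0, "docks": 12},
]
snapshot = dump_snapshot(sample)
```

Because the output does not depend on the order in which the provider returns stations, two fetches with identical data produce identical files, and git records no change.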
Everyone is welcome to add new cities. Simply add the necessary details to `scripts/systems.py`, then open a pull request.
The git history contains the state of each station and of the weather forecast at several points in time. The `archive.py` script turns this history into Parquet files for easy consumption. These files are stored in a GCP bucket.
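As a rough illustration of how a file's git history can be flattened into rows, here is a minimal sketch. It is not the actual `archive.py`: the function names and the `commit_at` column are hypothetical, and it assumes each tracked file is a GeoJSON FeatureCollection.

```python
import json
import subprocess


def iter_file_history(repo, path):
    """Yield (commit_time, file_content) for each commit that added or
    modified `path`, oldest first."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--diff-filter=AM",
         "--format=%H %cI", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        # Each log line holds the commit hash and its ISO 8601 commit date.
        commit, commit_time = line.split(" ", 1)
        content = subprocess.run(
            ["git", "-C", repo, "show", f"{commit}:{path}"],
            capture_output=True, text=True, check=True,
        ).stdout
        yield commit_time, content


def geojson_to_rows(commit_time, content):
    """Flatten one GeoJSON snapshot into one row per feature."""
    geojson = json.loads(content)
    return [
        {"commit_at": commit_time, **feature["properties"]}
        for feature in geojson["features"]
    ]
```

The resulting rows could then be written to Parquet, for example with `pandas.DataFrame(rows).to_parquet(...)`, which needs a Parquet engine such as pyarrow to be installed.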
An easy way to query these files is to use DuckDB. The following Python snippet shows how to fetch all the bike station updates for the city of Toulouse:
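```python
import duckdb

with duckdb.connect(":memory:") as con:
    # httpfs provides s3:// support; recent DuckDB versions typically load it automatically.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # Point the S3 client at Google Cloud Storage's S3-compatible endpoint.
    con.execute("SET s3_endpoint='storage.googleapis.com'")
    updates = con.execute("""
        SELECT *
        FROM READ_PARQUET('s3://bike-sharing-history/toulouse/jcdecaux/*/*.parquet');
    """).fetch_df()
```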
And here's a snippet to fetch the 24-hour weather forecast at different points in time for the city of Toulouse:
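```python
import duckdb

with duckdb.connect(":memory:") as con:
    # Same setup as for the bike station updates.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_endpoint='storage.googleapis.com'")
    weather = con.execute("""
        SELECT *
        FROM READ_PARQUET('s3://weather-forecast-history/toulouse/*/*.parquet');
    """).fetch_df()
```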
If these exports don't suit your needs, feel free to reach out. They can easily be adapted, because the source of truth is the git history.
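To run the same queries for another city, the path patterns from the Toulouse snippets above can be parameterized. The helpers below are a hypothetical sketch. They assume that other cities follow the same bucket layout as Toulouse, and that the city and provider names match the ones used in the buckets.

```python
def bike_updates_glob(city, provider):
    """Glob over the bike station Parquet files of one city and provider.

    Assumes the layout seen for Toulouse: <city>/<provider>/*/*.parquet.
    """
    return f"s3://bike-sharing-history/{city}/{provider}/*/*.parquet"


def weather_forecast_glob(city):
    """Glob over the weather forecast Parquet files of one city.

    Assumes the layout seen for Toulouse: <city>/*/*.parquet.
    """
    return f"s3://weather-forecast-history/{city}/*/*.parquet"


def select_all_query(glob):
    """Build the DuckDB query that reads every file matched by `glob`."""
    return f"SELECT * FROM READ_PARQUET('{glob}')"


toulouse_query = select_all_query(bike_updates_glob("toulouse", "jcdecaux"))
```

The resulting string can be passed to `con.execute(...)` in place of the hard-coded query.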