Description
Add Generation Script for londonTubeLines.json Dataset
The londonTubeLines.json dataset, showcased in this example:
- has a complex lineage
- lacks a generation script
Given the significant community interest in geospatial visualization, maintaining reproducible geographic datasets seems a worthwhile priority. Adding a script to the repo that can generate (or update) londonTubeLines.json from its original source, which is believed to be OpenStreetMap, would secure the dataset's long-term viability. Input from anyone with geospatial data expertise would be welcome.
Background and Current Status
As I understand it, the londonTubeLines.json dataset is a TopoJSON file representing selected London Underground rail lines. It appears to have been added to the repository in this commit. The dataset's description, sources, and license are currently being expanded in pull request #663.
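Because geo2topo is run with -q 1e4 in the commands later in this issue, the resulting TopoJSON is quantized. Per the TopoJSON specification, each arc's positions are integer deltas that must be accumulated and then mapped back through the top-level transform (scale and translate). A LineString references its arcs by index, where a negative index ~i means arc i traversed in reverse. The Python sketch below decodes such a topology back into longitude/latitude coordinates, which could help when comparing geometries between builds. The function names and the toy topology are hypothetical. Only LineString geometries are handled; a MultiLineString would need one more loop.

```python
# Hypothetical helpers: decode a quantized, delta-encoded TopoJSON topology.
# Handles LineString geometries only.


def decode_arcs(topology):
    """Return every arc as a list of absolute [x, y] positions."""
    transform = topology.get("transform")
    decoded = []
    for arc in topology["arcs"]:
        if transform is None:
            # Unquantized topology: positions are already absolute.
            decoded.append([list(p[:2]) for p in arc])
            continue
        (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
        x = y = 0
        points = []
        for position in arc:
            # Each position is a delta from the previous one.
            x += position[0]
            y += position[1]
            points.append([x * sx + tx, y * sy + ty])
        decoded.append(points)
    return decoded


def line_coordinates(geometry, arcs):
    """Stitch a LineString's arc indices into one coordinate list."""
    coords = []
    for index in geometry["arcs"]:
        # A negative index ~i refers to arc i traversed in reverse.
        points = arcs[index] if index >= 0 else arcs[~index][::-1]
        # Consecutive arcs share an endpoint, so drop the duplicate.
        coords.extend(points if not coords else points[1:])
    return coords


# Toy topology (made-up numbers, not taken from londonTubeLines.json).
toy = {
    "type": "Topology",
    "transform": {"scale": [0.25, 0.125], "translate": [-0.5, 51.25]},
    "objects": {"line": {"type": "GeometryCollection", "geometries": [
        {"type": "LineString", "id": "Victoria", "arcs": [0, ~1]},
    ]}},
    "arcs": [[[0, 0], [2, 4]], [[4, 8], [-2, -4]]],
}

arcs = decode_arcs(toy)
victoria = toy["objects"]["line"]["geometries"][0]
print(line_coordinates(victoria, arcs))
# [[-0.5, 51.25], [0.0, 51.75], [0.5, 52.25]]
```

To use it on a real file, load the topology with json.load and iterate over the geometries of its named object.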
The commit history and related documentation suggest the following lineage:
- Original Source (Likely OpenStreetMap): The data was likely originally sourced from OpenStreetMap, although a direct link could not be found.
- Intermediate Source 1 (oobrien/vis): User @oobrien appears to have processed the data into a simplified GeoJSON file, tfl_lines.json. The file can be found in this commit of the oobrien/vis repository, which cites OpenStreetMap. It represents a simplified view of London transport lines from the original source.
- Intermediate Source 2 (gicentre/litvis): @jwoLondon documented the process of converting tfl_lines.json to a TopoJSON file (similar to londonTubeLines.json) in this tutorial. This involved filtering specific lines and mapping properties using ndjson-cli and topojson. When I attempted to follow the instructions (code below), I wasn't quite able to match this repo's version. Also, this code still relies on an intermediate source, not the original source.
TopoJSON files are not limited to areal units. Here, for example, we can import a file containing the geographical routes of selected London Underground tube lines. The conversion of tfl_lines.json follows a similar pattern to the conversion of the borough boundary files, but with some minor differences:
- The file is already in unprojected GeoJSON format, so it does not need reprojecting or conversion from a shapefile.
- ndjson-cat converts the original GeoJSON file to a single line, as needed for further processing.
- The file contains details of more rail lines than we need to map, so ndjson-filter is used with a regular expression to select data for tube and DLR lines only.
- The property we will use for the id (the tube line name) is inside the first element of an array, so we reference it with [0] (where there is more than one element in the array, it indicates that more than one named tube line shares the same physical line).
ndjson-cat < tfl_lines.json \
| ndjson-split 'd.features' \
| ndjson-filter 'd.properties.lines[0].name.match("Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*")' \
| ndjson-map 'd.id = d.properties.lines[0].name,delete d.properties,d' \
| geo2topo -n -q 1e4 line="-" \
> londonTubeLines.json
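To see exactly which features the tutorial's pipeline keeps, its ndjson-split, ndjson-filter, and ndjson-map steps can be mirrored in Python. This is a sketch, assuming tfl_lines.json has the structure the pipeline implies: a FeatureCollection whose features carry properties.lines, a list of objects with a name. The function name and toy features are hypothetical. Two details stand out: only lines[0] is tested, so a segment whose first listed line is not a tube line is dropped even if a tube line shares it; and JavaScript's String.prototype.match, like Python's re.search, is unanchored.

```python
import re

# Same alternation as the tutorial's ndjson-filter expression. re.search, like
# JavaScript's String.prototype.match, is unanchored, so each branch can match
# anywhere in the name, not only at the start.
TUBE_PATTERN = re.compile(r"Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*")


def tutorial_features(collection):
    """Hypothetical mirror of: ndjson-split | ndjson-filter | ndjson-map."""
    kept = []
    for feature in collection["features"]:  # ndjson-split 'd.features'
        name = feature["properties"]["lines"][0]["name"]
        if not TUBE_PATTERN.search(name):  # ndjson-filter: only lines[0] is tested
            continue
        out = dict(feature)  # shallow copy so the input is left untouched
        out["id"] = name  # d.id = d.properties.lines[0].name
        out.pop("properties")  # delete d.properties
        kept.append(out)
    return kept


# Toy input (made-up features, not taken from tfl_lines.json).
toy = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"lines": [{"name": "Victoria"}]}, "geometry": None},
    {"type": "Feature", "properties": {"lines": [{"name": "London Overground"}]}, "geometry": None},
    {"type": "Feature", "properties": {"lines": [{"name": "District"}, {"name": "Circle"}]}, "geometry": None},
    {"type": "Feature", "properties": {"lines": [{"name": "London Overground"}, {"name": "District"}]}, "geometry": None},
]}

print([f["id"] for f in tutorial_features(toy)])
# ['Victoria', 'District']
```

The last toy feature is dropped even though it lists District, which illustrates how the lines[0] rule alone can change which segments end up in the output.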
An initial attempt was made to create a generation script using @oobrien's tfl_lines.json as a starting point. The script relied on ndjson-cli, topojson, and d3-geo-centroid, but its output did not perfectly match the existing londonTubeLines.json in vega-datasets.
1. Setup Commands
npm install -g shapefile ndjson-cli topojson d3-geo-centroid
apt-get install gdal-bin
wget https://raw.githubusercontent.com/oobrien/vis/master/tubecreature/data/tfl_lines.json
ndjson-cat tfl_lines.json \
| ndjson-split 'd.features' \
| ndjson-filter 'd.properties.lines.some((l) => l.name == "DLR" || l.name == "Bakerloo" || l.name == "District" || l.name == "Piccadilly" || l.name == "Northern" || l.name == "Hammersmith & City" || l.name == "Jubilee" || l.name == "Circle" || l.name == "Waterloo & City" || l.name == "Victoria" || l.name == "Metropolitan" || l.name == "Central") && !d.properties.lines.some((l) => l.name == "London Overground")' \
| ndjson-map 'd.id = d.properties.lines[0].name + (d.id ? "_" + d.id : ""), d' \
> tfl_lines_filtered.ndjson
geo2topo -n -q 1e4 line=tfl_lines_filtered.ndjson > londonTubeLines.json
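One way to narrow down where the generated file diverges from the copy in vega-datasets is to compare the two topologies structurally rather than byte by byte. The Python sketch below compares the quantization transform, the number of arcs, and, for each named object, the multiset of geometry ids. It assumes both files are TopoJSON topologies whose objects are GeometryCollections, as geo2topo produces; the function names are hypothetical. Because the ndjson-map step above appends "_" plus the original feature id whenever one exists, some ids from this attempt may carry such suffixes, and the id comparison should surface them.

```python
from collections import Counter


def summarize(topology):
    """Collect the properties most likely to differ between two builds."""
    objects = {}
    for name, obj in topology.get("objects", {}).items():
        # A GeometryCollection lists its members; any other object is one geometry.
        geometries = obj.get("geometries", [obj])
        objects[name] = Counter(str(g.get("id")) for g in geometries)
    return {
        "transform": topology.get("transform"),
        "arc_count": len(topology.get("arcs", [])),
        "objects": objects,
    }


def compare(first, second):
    """Return human-readable differences between two TopoJSON topologies."""
    a, b = summarize(first), summarize(second)
    diffs = []
    for key in ("transform", "arc_count"):
        if a[key] != b[key]:
            diffs.append(f"{key}: {a[key]} != {b[key]}")
    for name in sorted(set(a["objects"]) | set(b["objects"])):
        ids_a, ids_b = a["objects"].get(name), b["objects"].get(name)
        if ids_a is None or ids_b is None:
            where = "second" if ids_a is None else "first"
            diffs.append(f"object {name!r} exists only in the {where} file")
            continue
        # Counter subtraction keeps only ids that are over-represented on one side.
        for label, extra in (("first", ids_a - ids_b), ("second", ids_b - ids_a)):
            for geom_id, count in sorted(extra.items()):
                diffs.append(
                    f"object {name!r}: id {geom_id!r} has {count} extra "
                    f"geometry(ies) in the {label} file"
                )
    return diffs
```

For example, with the repository's copy saved locally as existing.json (a hypothetical name), compare(json.load(open("existing.json")), json.load(open("londonTubeLines.json"))) returns an empty list only when these summaries agree. Matching summaries do not guarantee identical coordinates, so a geometry-level check would still be needed after that.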