Commit 592fb0a

Read version number from the schema (#159)
* Read version number from the schema
* Update lockfile after poetry update
* Generate example before extracting schema
* Allow for differences in the example.parquet file
* Move test script to test directory
1 parent 84ae2d9 commit 592fb0a

11 files changed, with 416 additions and 845 deletions.

.github/workflows/scripts.yml

Lines changed: 6 additions & 8 deletions
@@ -29,12 +29,6 @@ jobs:
             geoparquet_validator $example || exit 1;
           done
 
-      - name: Test json schema
-        run: |
-          python -m pip install pytest
-          cd tests
-          pytest test_json_schema.py -v
-
   test-json-metadata:
     runs-on: ubuntu-latest
     steps:
@@ -56,8 +50,12 @@ jobs:
       - name: Run scripts
         run: |
           cd scripts
+          poetry run pytest test_json_schema.py -v
+          poetry run python generate_example.py
           poetry run python update_example_schemas.py
           cd ../examples
-          # Assert no changes in the git repo, aka that the json version of the
-          # schemas are up to date
+          # Assert that the version number and file metadata are up to date
+          # Allow for differences in example.parquet
+          git restore example.parquet
+          git diff
           test -z "$(git status --porcelain)"
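
The last step of the workflow above fails the job if anything other than `example.parquet` differs after the scripts run. A minimal Python sketch of the same idea follows, assuming porcelain v1 output (two status characters, a space, then a path relative to the repository root). The function name `unexpected_changes` and the file `examples/example_metadata.json` are hypothetical and used only for illustration; the workflow itself uses `git restore` and `test -z`, not this code.

```python
from __future__ import annotations


def unexpected_changes(porcelain_output: str, allowed: set[str]) -> list[str]:
    """Return changed paths from `git status --porcelain` that are not allowed.

    Assumes porcelain v1 lines of the form "XY path" (or "XY old -> new" for
    renames). Quoted paths with special characters are not handled here.
    """
    changed = []
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue
        path = line[3:]  # skip the two status characters and the space
        if " -> " in path:
            path = path.split(" -> ", 1)[1]  # keep the rename target
        if path not in allowed:
            changed.append(path)
    return changed


# Hypothetical status output: the binary example plus one stale JSON file.
status = " M examples/example.parquet\n M examples/example_metadata.json\n"
stale = unexpected_changes(status, allowed={"examples/example.parquet"})
print(stale)  # ['examples/example_metadata.json']
```

An empty result corresponds to the `test -z` check passing. Restoring `example.parquet` first, as the workflow does, has the same effect as allow-listing it here.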

.gitignore

Lines changed: 2 additions & 3 deletions
@@ -1,3 +1,2 @@
-# Ignore GeoPackage file used in conversion to GeoParquet
-*.gpkg*
-tests/data/*
+/scripts/data/
+/scripts/__pycache__/

examples/environment.yml

Lines changed: 0 additions & 8 deletions
This file was deleted.

examples/example.parquet

0 Bytes
Binary file not shown.

format-specs/geoparquet.md

Lines changed: 2 additions & 13 deletions
@@ -41,23 +41,12 @@ A GeoParquet file MUST include a `geo` key in the Parquet metadata (see [`FileMe
 
 | Field Name         | Type   | Description                                                          |
 | ------------------ | ------ | -------------------------------------------------------------------- |
-| version            | string | **REQUIRED.** The version of the GeoParquet metadata standard used when writing. |
-| primary_column     | string | **REQUIRED.** The name of the "primary" geometry column. |
+| version            | string | **REQUIRED.** The version identifier for the GeoParquet specification. |
+| primary_column     | string | **REQUIRED.** The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations. |
 | columns            | object\<string, [Column Metadata](#column-metadata)> | **REQUIRED.** Metadata about geometry columns. Each key is the name of a geometry column in the table. |
 
 At this level, additional implementation-specific fields (e.g. library name) MAY be present, and readers should be robust in ignoring those.
 
-### Additional file metadata information
-
-#### primary_column
-
-This indicates the "primary" or "active" geometry for systems that can store multiple geometries,
-but have a default geometry used for geospatial operations.
-
-#### version
-
-Version of the GeoParquet spec used, currently 0.5.0-dev
-
 ### Column metadata
 
 Each geometry column in the dataset MUST be included in the `columns` field above with the following content, keyed by the column name:
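
To make the file-level table above concrete, here is a small Python sketch of a `geo` metadata object and a check for the three required fields. The helper `missing_required_fields` and the extra `creator` key are hypothetical. The per-column content is not part of this excerpt, so the column entry is left empty rather than guessed.

```python
from __future__ import annotations

# The three fields marked REQUIRED in the file metadata table.
REQUIRED_FIELDS = ("version", "primary_column", "columns")


def missing_required_fields(geo_metadata: dict) -> list[str]:
    """Return the required top-level fields absent from the metadata."""
    return [name for name in REQUIRED_FIELDS if name not in geo_metadata]


example_geo = {
    "version": "0.5.0-dev",
    "primary_column": "geometry",
    # Column metadata contents are outside this excerpt, so left empty here.
    "columns": {"geometry": {}},
    # Hypothetical implementation-specific field; readers should ignore it.
    "creator": "hypothetical-library",
}

print(missing_required_fields(example_geo))  # []
print(missing_required_fields({"version": "0.5.0-dev"}))  # ['primary_column', 'columns']
```

The check looks only for the presence of keys and deliberately ignores unknown ones, matching the statement that readers should be robust to additional fields.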

scripts/README.md

Lines changed: 19 additions & 1 deletion
@@ -21,11 +21,29 @@ poetry update
 To run a script, prefix it with `poetry run`. For example:
 
 ```
-poetry run python update_example_schemas.py
+poetry run python generate_example.py
 ```
 
 Using `poetry run` ensures that you're running the python script using _this_ local environment, not your global environment.
 
+### Tests
+
+To run the tests, change into the `scripts` directory and run the following:
+
+```
+poetry run pytest test_json_schema.py -v
+```
+
+### example.parquet
+
+The `example.parquet` file in the `examples` directory is generated with the `generate_example.py` script. This script needs to be updated and run any time there are changes to the "geo" file metadata or to the version constant in `schema.json`.
+
+To update the `../examples/example.parquet` file, run this from the `scripts` directory:
+
+```
+poetry run python generate_example.py
+```
+
 ### nz-building-outlines to Parquet
 
 ```bash

examples/example.py renamed to scripts/generate_example.py

Lines changed: 9 additions & 2 deletions
@@ -22,8 +22,15 @@
 table = pa.Table.from_pandas(df.head().to_wkb())
 
 
+def get_version() -> str:
+    """Read the version const from the schema.json file"""
+    with open(HERE / "../format-specs/schema.json") as f:
+        spec_schema = json.load(f)
+        return spec_schema["properties"]["version"]["const"]
+
+
 metadata = {
-    "version": "0.5.0-dev",
+    "version": get_version(),
     "primary_column": "geometry",
     "columns": {
         "geometry": {
@@ -42,4 +49,4 @@
 )
 table = table.cast(schema)
 
-pq.write_table(table, HERE / "example.parquet")
+pq.write_table(table, HERE / "../examples/example.parquet")
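
The `get_version()` change above makes the JSON schema the single source of the version string. The following standalone sketch shows the same lookup against a temporary file, so it runs without the repository checked out. The name `read_version_const` and the minimal schema are assumptions for illustration; the real `schema.json` contains more than this.

```python
import json
import tempfile
from pathlib import Path


def read_version_const(schema_path: Path) -> str:
    """Return the `const` value of the top-level `version` property."""
    with open(schema_path) as f:
        schema = json.load(f)
    return schema["properties"]["version"]["const"]


# Hypothetical minimal schema with only the part the lookup needs.
demo_schema = {
    "properties": {"version": {"type": "string", "const": "0.5.0-dev"}}
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "schema.json"
    demo_path.write_text(json.dumps(demo_schema))
    demo_version = read_version_const(demo_path)

print(demo_version)  # 0.5.0-dev
```

If the `version` property or its `const` is missing, the lookup raises `KeyError`, which is probably the desired behavior for a generator script: it fails loudly instead of writing a file with a stale version.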
