
Commit 4c70ea4

Updates for the 1.0.0-beta.1 release
1 parent 592fb0a, commit 4c70ea4

5 files changed: 41 additions, 63 deletions

.github/workflows/release.yml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+name: Release
+
+on:
+  push:
+    tags:
+      - 'v*.*.*'
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Draft Release
+        uses: softprops/action-gh-release@v1
+        with:
+          draft: true
+          generate_release_notes: true
+          files: |
+            format-specs/geoparquet.md
+            format-specs/schema.json
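The workflow above only runs when a pushed tag matches `'v*.*.*'`. As a rough way to see which tag names would trigger it, Python's `fnmatch.fnmatchcase` can stand in for GitHub's filter matcher for this one pattern. This is an approximation, not GitHub's implementation: GitHub's `*` does not match `/`, while `fnmatch`'s does, so the hypothetical helper below rejects slashes explicitly.

```python
from fnmatch import fnmatchcase

# Tag filter copied from .github/workflows/release.yml.
RELEASE_TAG_PATTERN = "v*.*.*"


def triggers_release(tag: str) -> bool:
    """Approximate whether pushing `tag` matches the workflow's tag filter.

    Assumption: GitHub's `*` matches any run of characters except "/".
    The pattern contains no "/", so any tag containing one cannot match.
    """
    if "/" in tag:
        return False
    return fnmatchcase(tag, RELEASE_TAG_PATTERN)


for tag in ["v1.0.0-beta.1", "v1.0.0", "v1.0", "1.0.0"]:
    print(tag, triggers_release(tag))
```

Note that the glob does not require digits: the `1.0.0-beta.1` suffix matches because the last `*` absorbs `0-beta.1`, and a tag like `vfoo.bar.baz` would trigger a draft release too.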

examples/example.parquet

3 Bytes
Binary file not shown.

examples/example_metadata.json

Lines changed: 1 addition & 1 deletion
@@ -115,6 +115,6 @@
 }
 },
 "primary_column": "geometry",
-"version": "0.5.0-dev"
+"version": "1.0.0-beta.1"
 }
 }

format-specs/geoparquet.md

Lines changed: 19 additions & 61 deletions
@@ -1,9 +1,8 @@
-# Geospatial Parquet format
+# GeoParquet Specification

 ## Overview

-The [Apache Parquet](https://parquet.apache.org/) provides a standardized open-source columnar storage format. This specification defines how geospatial data
-should be stored in parquet format, including the representation of geometries and the required additional metadata.
+The [Apache Parquet](https://parquet.apache.org/) provides a standardized open-source columnar storage format. The GeoParquet specification defines how geospatial data should be stored in parquet format, including the representation of geometries and the required additional metadata.

 **Additional resources:**
 * [Examples](../examples/)
@@ -13,7 +12,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S

 ## Version

-This is version 0.5.0-dev of the GeoParquet specification.
+This is version 1.0.0-beta.1 of the GeoParquet specification.

 ## Geometry columns

@@ -65,58 +64,37 @@ Each geometry column in the dataset MUST be included in the `columns` field abov

 The Coordinate Reference System (CRS) is an optional parameter for each geometry column defined in GeoParquet format.

-The CRS MUST be provided in
-[PROJJSON](https://proj.org/specifications/projjson.html) format, which is a JSON encoding of
-[WKT2:2019 / ISO-19162:2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html),
-which itself implements the model of
-[OGC Topic 2: Referencing by coordinates abstract specification / ISO-19111:2019](http://docs.opengeospatial.org/as/18-005r4/18-005r4.html).
-Apart from the difference of encodings, the semantics are intended to match
-WKT2:2019, and a CRS in one encoding can generally be represented in the other.
+The CRS MUST be provided in [PROJJSON](https://proj.org/specifications/projjson.html) format, which is a JSON encoding of [WKT2:2019 / ISO-19162:2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html), which itself implements the model of [OGC Topic 2: Referencing by coordinates abstract specification / ISO-19111:2019](http://docs.opengeospatial.org/as/18-005r4/18-005r4.html). Apart from the difference of encodings, the semantics are intended to match WKT2:2019, and a CRS in one encoding can generally be represented in the other.

-If CRS is not provided, all coordinates in the geometries MUST use longitude, latitude based on the WGS84 datum,
-and the default value is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.
+If CRS is not provided, all coordinates in the geometries MUST use longitude, latitude based on the WGS84 datum, and the default value is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.

 [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) is equivalent to the well-known [EPSG:4326](https://epsg.org/crs_4326/WGS-84.html) but changes the axis from latitude-longitude to longitude-latitude.

-Due to the large number of CRSes available and the difficulty of implementing all of them, we expect that a number of implementations will start without support for the optional `crs` field.
-Users are recommended to store their data in longitude, latitude (OGC:CRS84 or not including the `crs` field) for it to work with the widest number of tools. Data that are more appropriately represented in particular projections may use an alternate coordinate reference system. We expect many tools will support alternate CRSes, but encourage users to check to ensure their chosen tool supports their chosen CRS.
+Due to the large number of CRSes available and the difficulty of implementing all of them, we expect that a number of implementations will start without support for the optional `crs` field. Users are recommended to store their data in longitude, latitude (OGC:CRS84 or not including the `crs` field) for it to work with the widest number of tools. Data that are more appropriately represented in particular projections may use an alternate coordinate reference system. We expect many tools will support alternate CRSes, but encourage users to check to ensure their chosen tool supports their chosen CRS.

 See below for additional details about representing or identifying OGC:CRS84.

-The value of this key may be explicitly set to `null` to indicate that there is no CRS assigned
-to this column (CRS is undefined or unknown).
+The value of this key may be explicitly set to `null` to indicate that there is no CRS assigned to this column (CRS is undefined or unknown).

 #### epoch

-In a dynamic CRS, coordinates of a point on the surface of the Earth may
-change with time. To be unambiguous, the coordinates must always be qualified
-with the epoch at which they are valid.
+In a dynamic CRS, coordinates of a point on the surface of the Earth may change with time. To be unambiguous, the coordinates must always be qualified with the epoch at which they are valid.

-The optional `epoch` field allows to specify this in case the `crs` field
-defines a a dynamic CRS. The coordinate epoch is expressed as a decimal year
-(e.g. `2021.47`). Currently, this specification only supports an epoch per
-column (and not per geometry).
+The optional `epoch` field allows to specify this in case the `crs` field defines a a dynamic CRS. The coordinate epoch is expressed as a decimal year (e.g. `2021.47`). Currently, this specification only supports an epoch per column (and not per geometry).

 #### encoding

-This is the binary format that the geometry is encoded in.
-The string `"WKB"`, signifying Well Known Binary is the only current option, but future versions
-of the spec may support alternative encodings. This SHOULD be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.
+This is the binary format that the geometry is encoded in. The string `"WKB"`, signifying Well Known Binary is the only current option, but future versions of the spec may support alternative encodings. This SHOULD be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.

 Note that the current version of the spec only allows for a subset of WKB: 2D or 3D geometries of the standard geometry types (the Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection geometry types). This means that M values or non-linear geometry types are not yet supported.

 #### Coordinate axis order

-The axis order of the coordinates in WKB stored in a GeoParquet follows the de facto standard for axis order in WKB and is therefore always
-(x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS.
-This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).
+The axis order of the coordinates in WKB stored in a GeoParquet follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).

 #### geometry_types

-This field captures the geometry types of the geometries in the
-column, when known. Accepted geometry types are: `"Point"`, `"LineString"`,
-`"Polygon"`, `"MultiPoint"`, `"MultiLineString"`, `"MultiPolygon"`,
-`"GeometryCollection"`.
+This field captures the geometry types of the geometries in the column, when known. Accepted geometry types are: `"Point"`, `"LineString"`, `"Polygon"`, `"MultiPoint"`, `"MultiLineString"`, `"MultiPolygon"`, `"GeometryCollection"`.

 In addition, the following rules are used:

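The spec text in this hunk distinguishes a missing `crs` key (longitude, latitude on WGS84, i.e. OGC:CRS84) from a `crs` explicitly set to `null` (no CRS assigned). A minimal sketch of that distinction, assuming the column metadata has already been parsed from JSON into a dict; the function names and the two string markers are hypothetical, not part of the spec:

```python
import json
from typing import Any, Dict, Optional

# Hypothetical markers for the two cases where no PROJJSON object is stored.
DEFAULT_CRS84 = "OGC:CRS84"
UNKNOWN_CRS = "unknown"


def resolve_crs(column_meta: Dict[str, Any]) -> Any:
    """Interpret a geometry column's `crs` key.

    - key absent      -> longitude/latitude on WGS84 (OGC:CRS84 default)
    - key set to null -> no CRS assigned (undefined or unknown)
    - anything else   -> the PROJJSON object, returned unchanged
    """
    if "crs" not in column_meta:
        return DEFAULT_CRS84
    if column_meta["crs"] is None:
        return UNKNOWN_CRS
    return column_meta["crs"]


def resolve_epoch(column_meta: Dict[str, Any]) -> Optional[float]:
    """Return the optional per-column coordinate epoch (a decimal year), if any."""
    epoch = column_meta.get("epoch")
    return None if epoch is None else float(epoch)


columns = json.loads('{"a": {"encoding": "WKB"}, "b": {"encoding": "WKB", "crs": null}}')
print(resolve_crs(columns["a"]))  # OGC:CRS84
print(resolve_crs(columns["b"]))  # unknown
```

Checking `"crs" in column_meta` before reading the value matters: `dict.get("crs")` returns `None` in both cases and would collapse "default CRS84" and "unknown CRS" into one.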
@@ -125,11 +103,7 @@ In addition, the following rules are used:
 - An empty array explicitly signals that the geometry types are not known.
 - The geometry types in the list must be unique (e.g. `["Point", "Point"]` is not valid).

-It is expected that this field is strictly correct. For
-example, if having both polygons and multipolygons, it is not sufficient to
-specify `["MultiPolygon"]`, but it is expected to specify
-`["Polygon", "MultiPolygon"]`. Or if having 3D points, it is not sufficient to
-specify `["Point"]`, but it is expected to list `["Point Z"]`.
+It is expected that this field is strictly correct. For example, if having both polygons and multipolygons, it is not sufficient to specify `["MultiPolygon"]`, but it is expected to specify `["Polygon", "MultiPolygon"]`. Or if having 3D points, it is not sufficient to specify `["Point"]`, but it is expected to list `["Point Z"]`.

 #### orientation

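A strictly correct, duplicate-free `geometry_types` list can be derived from the WKB headers themselves. A minimal sketch, assuming ISO WKB as described in the encoding section (byte-order flag, then a uint32 type code where 1-7 are 2D types and 1001-1007 their 3D variants); the helper names are hypothetical, and PostGIS-style EWKB flag bits are not handled:

```python
import struct
from typing import Iterable, List

# ISO WKB base type codes; 3D ("Z") variants add 1000 (1001-1007).
BASE_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_type_name(wkb: bytes) -> str:
    """Read the WKB header and return a geometry_types entry such as "Point Z"."""
    byte_order = "<" if wkb[0] == 1 else ">"  # 1 = little endian (NDR), 0 = big endian (XDR)
    (code,) = struct.unpack_from(byte_order + "I", wkb, 1)
    dim_flag, base = divmod(code, 1000)  # 0 = XY, 1 = XYZ, 2 = XYM, 3 = XYZM
    if base not in BASE_TYPES or dim_flag not in (0, 1):
        # M values and non-linear types are not allowed by this version of the spec.
        raise ValueError(f"unsupported WKB geometry type code {code}")
    return BASE_TYPES[base] + (" Z" if dim_flag == 1 else "")


def geometry_types(wkbs: Iterable[bytes]) -> List[str]:
    """Collect each distinct type actually present, so the list is exact and unique."""
    return sorted({wkb_type_name(w) for w in wkbs})


point_2d = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
point_3d = struct.pack("<BIddd", 1, 1001, 1.0, 2.0, 3.0)
print(geometry_types([point_2d, point_3d, point_2d]))  # ['Point', 'Point Z']
```

Building the list from a set gives uniqueness for free. An empty input yields `[]`, which the spec reads as "types not known", so a writer may prefer to special-case empty columns.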
@@ -149,25 +123,15 @@ This attribute indicates how to interpret the edges of the geometries: whether t

 If no value is set, the default value to assume is `"planar"`.

-Note if `edges` is `"spherical"` then it is RECOMMENDED that `orientation` is always ensured to be `"counterclockwise"`. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly.
-In this case, software will typically interpret the rings of a polygon such that it encloses at most half of the sphere (i.e. the smallest polygon of both ways it could be interpreted). But the specification itself does not make any guarantee about this.
+Note if `edges` is `"spherical"` then it is RECOMMENDED that `orientation` is always ensured to be `"counterclockwise"`. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly. In this case, software will typically interpret the rings of a polygon such that it encloses at most half of the sphere (i.e. the smallest polygon of both ways it could be interpreted). But the specification itself does not make any guarantee about this.

 #### bbox

-Bounding boxes are used to help define the spatial extent of each geometry column.
-Implementations of this schema may choose to use those bounding boxes to filter
-partitions (files) of a partitioned dataset.
+Bounding boxes are used to help define the spatial extent of each geometry column. Implementations of this schema may choose to use those bounding boxes to filter partitions (files) of a partitioned dataset.

-The bbox, if specified, MUST be encoded with an array representing the range of values for each dimension in the
-geometry coordinates. For geometries in a geographic coordinate reference system, longitude and latitude values are
-listed for the most southwesterly coordinate followed by values for the most northeasterly coordinate. This follows the
-GeoJSON specification ([RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5)), which also describes how
-to represent the bbox for a set of geometries that cross the antimeridian.
+The bbox, if specified, MUST be encoded with an array representing the range of values for each dimension in the geometry coordinates. For geometries in a geographic coordinate reference system, longitude and latitude values are listed for the most southwesterly coordinate followed by values for the most northeasterly coordinate. This follows the GeoJSON specification ([RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5)), which also describes how to represent the bbox for a set of geometries that cross the antimeridian.

-For non-geographic coordinate reference systems, the items in the bbox are minimum values for each dimension followed by
-maximum values for each dimension. For example, given geometries that have coordinates with two dimensions, the bbox
-would have the form `[<xmin>, <ymin>, <xmax>, <ymax>]`. For three dimensions, the bbox would have the form
-`[<xmin>, <ymin>, <zmin>, <xmax>, <ymax>, <zmax>]`.
+For non-geographic coordinate reference systems, the items in the bbox are minimum values for each dimension followed by maximum values for each dimension. For example, given geometries that have coordinates with two dimensions, the bbox would have the form `[<xmin>, <ymin>, <xmax>, <ymax>]`. For three dimensions, the bbox would have the form `[<xmin>, <ymin>, <zmin>, <xmax>, <ymax>, <zmax>]`.

 The bbox values are in the same coordinate reference system as the geometry.

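The bbox layout described in this hunk (all minimums first, then all maximums) can be sketched as follows. This assumes the coordinates have already been decoded into tuples of equal length. The `bbox` helper is hypothetical and ignores the RFC 7946 antimeridian convention, so it only fits data that does not cross the antimeridian. In that case the min/max layout coincides with the southwest/northeast order for longitude, latitude data.

```python
from typing import Iterable, List, Sequence


def bbox(coords: Iterable[Sequence[float]]) -> List[float]:
    """Return [min of each dimension..., max of each dimension...] for 2D or 3D points.

    Does not implement the antimeridian-crossing form described in RFC 7946.
    """
    points = list(coords)
    if not points:
        raise ValueError("cannot compute a bbox of no coordinates")
    ndim = len(points[0])
    if any(len(p) != ndim for p in points):
        raise ValueError("all coordinates must have the same number of dimensions")
    mins = [min(p[i] for p in points) for i in range(ndim)]
    maxs = [max(p[i] for p in points) for i in range(ndim)]
    return mins + maxs


print(bbox([(-10.5, 40.0), (3.25, 52.1), (0.0, 45.0)]))
# [-10.5, 40.0, 3.25, 52.1]
print(bbox([(0, 0, 5), (2, 1, -1)]))
# [0, 0, -1, 2, 1, 5]
```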
@@ -219,19 +183,13 @@ The PROJJSON object for OGC:CRS84 is:
 }
 ```

-For implementations that operate entirely with longitude, latitude coordinates
-and are not CRS-aware or do not have easy access to CRS-aware libraries that can
-fully parse PROJJSON, it may be possible to infer that coordinates conform to
-the OGC:CRS84 CRS based on elements of the `crs` field. For simplicity, Javascript
-object dot notation is used to refer to nested elements.
+For implementations that operate entirely with longitude, latitude coordinates and are not CRS-aware or do not have easy access to CRS-aware libraries that can fully parse PROJJSON, it may be possible to infer that coordinates conform to the OGC:CRS84 CRS based on elements of the `crs` field. For simplicity, Javascript object dot notation is used to refer to nested elements.

 The CRS is likely equivalent to OGC:CRS84 for a GeoParquet file if the `id` element is present:

 * `id.authority` = `"OGC"` and `id.code` = `"CRS84"`
 * `id.authority` = `"EPSG"` and `id.code` = `4326` (due to longitude, latitude ordering in this specification)

-It is reasonable for implementations to require that one of the above `id`
-elements are present and skip further tests to determine if the CRS is
-functionally equivalent with OGC:CRS84.
+It is reasonable for implementations to require that one of the above `id` elements are present and skip further tests to determine if the CRS is functionally equivalent with OGC:CRS84.

 Note: EPSG:4326 and OGC:CRS84 are equivalent with respect to this specification because this specification specifically overrides the coordinate axis order in the `crs` to be longitude-latitude.
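The `id`-based shortcut above can be sketched in a few lines for implementations that are not CRS-aware. This assumes the column metadata is a parsed JSON dict. The function name is hypothetical, and the code values are compared exactly as the text writes them: the string `"CRS84"` and the integer `4326`. Returning `False` here only means "not identified by the shortcut", not "definitely not CRS84".

```python
from collections.abc import Mapping
from typing import Any


def column_is_likely_crs84(column_meta: Mapping[str, Any]) -> bool:
    """Id-based shortcut: inspect only crs.id.authority and crs.id.code."""
    if "crs" not in column_meta:
        return True  # an omitted crs defaults to OGC:CRS84
    crs = column_meta["crs"]
    if not isinstance(crs, Mapping):
        return False  # explicit null (unknown CRS) or a malformed value
    ident = crs.get("id")
    if not isinstance(ident, Mapping):
        return False  # no id element: a full PROJJSON comparison would be needed
    authority, code = ident.get("authority"), ident.get("code")
    return (authority == "OGC" and code == "CRS84") or (
        authority == "EPSG" and code == 4326
    )


print(column_is_likely_crs84({"crs": {"id": {"authority": "EPSG", "code": 4326}}}))  # True
print(column_is_likely_crs84({"crs": None}))  # False
```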

format-specs/schema.json

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 "properties": {
 "version": {
 "type": "string",
-"const": "0.5.0-dev"
+"const": "1.0.0-beta.1"
 },
 "primary_column": {
 "type": "string",
