Commit 84ae2d9

Refer to RFC 2119 for definition of requirement levels (#160)
1 parent a6c102d commit 84ae2d9

1 file changed: +15 -15 lines changed

format-specs/geoparquet.md

Lines changed: 15 additions & 15 deletions
@@ -9,22 +9,24 @@ should be stored in parquet format, including the representation of geometries a
99
* [Examples](../examples/)
1010
* [JSON Schema](schema.json)
1111

12+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
13+
1214
## Version
1315

1416
This is version 0.5.0-dev of the GeoParquet specification.
1517

1618
## Geometry columns
1719

18-
Geometry columns are stored using the `BYTE_ARRAY` parquet type. They are encoded as [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary).
20+
Geometry columns MUST be stored using the `BYTE_ARRAY` parquet type. They MUST be encoded as [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary).
1921
See the [encoding](#encoding) section below for more details.
2022

2123
### Nesting
2224

23-
Geometry columns must be at the root of the schema. A geometry cannot be a group field or nested in a group. In practice, this means that when writing to GeoParquet from another format, geometries cannot be contained in complex or nested types such as structs, lists, arrays, or map types.
25+
Geometry columns MUST be at the root of the schema. A geometry MUST NOT be a group field or nested in a group. In practice, this means that when writing to GeoParquet from another format, geometries cannot be contained in complex or nested types such as structs, lists, arrays, or map types.
2426

2527
### Repetition
2628

27-
The repetition for all geometry columns must be "required" (exactly one) or "optional" (zero or one). A geometry column must not be repeated. A GeoParquet file may have multiple geometry columns with different names, but those geometry columns cannot be repeated.
29+
The repetition for all geometry columns MUST be "required" (exactly one) or "optional" (zero or one). A geometry column MUST NOT be repeated. A GeoParquet file MAY have multiple geometry columns with different names, but those geometry columns cannot be repeated.
2830

2931
## Metadata
3032

@@ -33,19 +35,17 @@ GeoParquet files include additional metadata at two levels:
3335
1. File metadata indicating things like the version of this specification used
3436
2. Column metadata with additional metadata for each geometry column
3537

36-
These are both stored under a `geo` key in the parquet metadata (the [`FileMetaData::key_value_metadata`](https://github.com/apache/parquet-format#metadata)) as a JSON-encoded UTF-8 string.
38+
A GeoParquet file MUST include a `geo` key in the Parquet metadata (see [`FileMetaData::key_value_metadata`](https://github.com/apache/parquet-format#metadata)). The value of this key MUST be a JSON-encoded UTF-8 string representing the file and column metadata that validates against the [GeoParquet metadata schema](schema.json). The file and column metadata fields are described below.
3739

3840
## File metadata
3941

40-
All file-level metadata should be included under the `geo` key in the parquet metadata.
41-
4242
| Field Name | Type | Description |
4343
| ------------------ | ------ | -------------------------------------------------------------------- |
4444
| version | string | **REQUIRED.** The version of the GeoParquet metadata standard used when writing. |
4545
| primary_column | string | **REQUIRED.** The name of the "primary" geometry column. |
4646
| columns | object\<string, [Column Metadata](#column-metadata)> | **REQUIRED.** Metadata about geometry columns. Each key is the name of a geometry column in the table. |
4747

48-
At this level, additional implementation-specific fields (e.g. library name) are allowed, and thus readers should be robust in ignoring those.
48+
At this level, additional implementation-specific fields (e.g. library name) MAY be present, and readers should be robust in ignoring those.
4949

5050
### Additional file metadata information
5151

@@ -60,7 +60,7 @@ Version of the GeoParquet spec used, currently 0.5.0-dev
6060

6161
### Column metadata
6262

63-
Each geometry column in the dataset must be included in the columns field above with the following content, keyed by the column name:
63+
Each geometry column in the dataset MUST be included in the `columns` field above with the following content, keyed by the column name:
6464

6565
| Field Name | Type | Description |
6666
| -------------- | ------------ | ----------- |
@@ -76,15 +76,15 @@ Each geometry column in the dataset must be included in the columns field above
7676

7777
The Coordinate Reference System (CRS) is an optional parameter for each geometry column defined in GeoParquet format.
7878

79-
The CRS must be provided in
79+
The CRS MUST be provided in
8080
[PROJJSON](https://proj.org/specifications/projjson.html) format, which is a JSON encoding of
8181
[WKT2:2019 / ISO-19162:2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html),
8282
which itself implements the model of
8383
[OGC Topic 2: Referencing by coordinates abstract specification / ISO-19111:2019](http://docs.opengeospatial.org/as/18-005r4/18-005r4.html).
8484
Apart from the difference of encodings, the semantics are intended to match
8585
WKT2:2019, and a CRS in one encoding can generally be represented in the other.
8686

87-
If CRS is not provided, all coordinates in the geometries must use longitude, latitude based on the WGS84 datum,
87+
If CRS is not provided, all coordinates in the geometries MUST use longitude, latitude based on the WGS84 datum,
8888
and the default value is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.
8989

9090
[OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) is equivalent to the well-known [EPSG:4326](https://epsg.org/crs_4326/WGS-84.html) but changes the axis from latitude-longitude to longitude-latitude.
@@ -112,7 +112,7 @@ column (and not per geometry).
112112

113113
This is the binary format that the geometry is encoded in.
114114
The string `"WKB"`, signifying Well Known Binary is the only current option, but future versions
115-
of the spec may support alternative encodings. This should be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.
115+
of the spec may support alternative encodings. This SHOULD be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.
116116

117117
Note that the current version of the spec only allows for a subset of WKB: 2D or 3D geometries of the standard geometry types (the Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection geometry types). This means that M values or non-linear geometry types are not yet supported.
118118

@@ -150,7 +150,7 @@ If no value is set, no assertions are made about winding order or consistency of
150150

151151
Writers are encouraged but not required to set `orientation="counterclockwise"` for portability of the data within the broader ecosystem.
152152

153-
It is recommended to always set the orientation (to counterclockwise) if `edges` is `"spherical"` (see below).
153+
It is RECOMMENDED to always set the orientation (to counterclockwise) if `edges` is `"spherical"` (see below).
154154

155155
#### edges
156156

@@ -160,7 +160,7 @@ This attribute indicates how to interpret the edges of the geometries: whether t
160160

161161
If no value is set, the default value to assume is `"planar"`.
162162

163-
Note if `edges` is `"spherical"` then it is recommended that `orientation` is always ensured to be `"counterclockwise"`. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly.
163+
Note if `edges` is `"spherical"` then it is RECOMMENDED that `orientation` is always ensured to be `"counterclockwise"`. If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly.
164164
In this case, software will typically interpret the rings of a polygon such that it encloses at most half of the sphere (i.e. the smallest polygon of both ways it could be interpreted). But the specification itself does not make any guarantee about this.
165165

166166
#### bbox
@@ -169,7 +169,7 @@ Bounding boxes are used to help define the spatial extent of each geometry colum
169169
Implementations of this schema may choose to use those bounding boxes to filter
170170
partitions (files) of a partitioned dataset.
171171

172-
The bbox, if specified, must be encoded with an array representing the range of values for each dimension in the
172+
The bbox, if specified, MUST be encoded with an array representing the range of values for each dimension in the
173173
geometry coordinates. For geometries in a geographic coordinate reference system, longitude and latitude values are
174174
listed for the most southwesterly coordinate followed by values for the most northeasterly coordinate. This follows the
175175
GeoJSON specification ([RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5)), which also describes how
@@ -186,7 +186,7 @@ The bbox values are in the same coordinate reference system as the geometry.
186186

187187
#### Feature identifiers
188188

189-
If you are using GeoParquet to serialize geospatial data with feature identifiers, it is recommended that you create your own [file key/value metadata](https://github.com/apache/parquet-format#metadata) to indicate the column that represents this identifier. As an example, GDAL writes additional metadata using the `gdal:schema` key including information about feature identifiers and other information outside the scope of the GeoParquet specification.
189+
If you are using GeoParquet to serialize geospatial data with feature identifiers, it is RECOMMENDED that you create your own [file key/value metadata](https://github.com/apache/parquet-format#metadata) to indicate the column that represents this identifier. As an example, GDAL writes additional metadata using the `gdal:schema` key including information about feature identifiers and other information outside the scope of the GeoParquet specification.
190190

191191
### OGC:CRS84 details
192192

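For illustration only, the file-level requirements that this change makes normative (a `geo` key MUST be present, its value MUST be a JSON-encoded string, and `version`, `primary_column`, and `columns` are REQUIRED) can be sketched as a small check. This is a minimal sketch, not part of the commit or the specification: the function name `check_geo_metadata` and the plain-dict input are hypothetical, and it does not replace validation against the GeoParquet JSON Schema.

```python
import json

# File-level fields marked REQUIRED in the "File metadata" table.
REQUIRED_FILE_FIELDS = ("version", "primary_column", "columns")


def check_geo_metadata(key_value_metadata):
    """Return a list of problems found in the `geo` entry (empty if none).

    `key_value_metadata` is assumed to be a plain dict holding the Parquet
    file's key/value metadata; how it is read from a file is out of scope.
    """
    raw = key_value_metadata.get("geo")
    if raw is None:
        return ["missing 'geo' key"]

    try:
        geo = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"'geo' value is not valid JSON: {exc}"]

    if not isinstance(geo, dict):
        return ["'geo' value must be a JSON object"]

    problems = []
    for field in REQUIRED_FILE_FIELDS:
        if field not in geo:
            problems.append(f"missing required field '{field}'")

    # `columns` is an object keyed by geometry column name.
    if "columns" in geo and not isinstance(geo["columns"], dict):
        problems.append("'columns' must be a JSON object")

    # Additional implementation-specific fields MAY be present,
    # so unknown keys are deliberately ignored.
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every missing field in a single pass.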