Skip to content

spec: add clarification about the Geometry type calculation #13227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jiayuasu
Copy link
Member

@jiayuasu jiayuasu commented Jun 3, 2025

Background

Several engines including Spark/Sedona and Presto explicitly mention the calculations on geometries are done via Cartesian mathematics. It would be good if we add clarification to the Iceberg spec

Proposed change

Added the following sentence

and geometric calculations are performed using Cartesian mathematics along straight lines

@github-actions github-actions bot added the Specification Issues that may introduce spec changes. label Jun 3, 2025
@jiayuasu
Copy link
Member Author

jiayuasu commented Jun 3, 2025

@rdblue @szehon-ho please review :-)

format/spec.md Outdated
@@ -221,7 +221,7 @@ Supported primitive types are defined in the table below. Primitive types added
| | **`uuid`** | Universally unique identifiers | Should use 16-byte fixed |
| | **`fixed(L)`** | Fixed-length byte array of length L | |
| | **`binary`** | Arbitrary-length byte array | |
| [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | |
| [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar, and geometric calculations are performed using Cartesian mathematics along straight lines. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Edge-interpolation is always linear/planar, ie geometric calculations are performed using Cartesian mathematics along straight lines.

Then its more clear its a clarification rather than another fact

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

@@ -221,7 +221,7 @@ Supported primitive types are defined in the table below. Primitive types added
| | **`uuid`** | Universally unique identifiers | Should use 16-byte fixed |
| | **`fixed(L)`** | Fixed-length byte array of length L | |
| | **`binary`** | Arbitrary-length byte array | |
| [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | |
| [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar, i.e., geometric calculations are performed using Cartesian mathematics along straight lines. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention that the purpose of Cartesian mathematics is to avoid the need to interpret the CRS?

Something like:

The CRS may be used to interpret points as geographic locations, but does not affect calculations, which are always Cartesian.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed. Please review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but I think @mkaravel suggests some good changes to make to clarify the bounding boxes for geometry.

@mkaravel
Copy link

mkaravel commented Jun 5, 2025

@jiayuasu Thank you for driving this!

I am not able to comment directly at appropriate place since there are no changes there, which is why I writing this as a high level comment in the PR.

In line 684 of the existing spec (prior to this PR) we state:

For geometry and geography types, lower_bounds and upper_bounds are both points of the following coordinates X, Y, Z, and M (see Appendix G) which are the lower / upper bound of all objects in the file. For the X values only, xmin may be greater than xmax, in which case an object in this bounding box may match if it contains an X such that x >= xmin ORx <= xmax. In geographic terminology, the concepts of xmin, xmax, ymin, and ymax are also known as westernmost, easternmost, southernmost and northernmost, respectively. For geography types, these points are further restricted to the canonical ranges of [-180 180] for X and [-90 90] for Y.

I believe that in order to make the spec self-consistent it would make sense to rephrase this and specify that xmin can be greater than xmax for geographies only.

How about the following slight modification?

For geometry and geography types, lower_bounds and upper_bounds are both points of the following coordinates X, Y, Z, and M (see Appendix G) which are the lower / upper bound of all objects in the file.

For geography, for the X values only, xmin may be greater than xmax, in which case an object in this bounding box may match if it contains an X such that x >= xmin OR x <= xmax. In geographic terminology, the concepts of xmin, xmax, ymin, and ymax are also known as westernmost, easternmost, southernmost and northernmost, respectively. These points are further restricted to the canonical ranges of [-180..180] for X and [-90..90] for Y.

@jiayuasu
Copy link
Member Author

jiayuasu commented Jun 5, 2025

@mkaravel I think this is relevant to our discussion in Slack. After giving some thoughts, I think it makes sense to consider that together with the linear option in Geography type. I guess we can make another PR to discuss that specific issue? This PR is just to add 1 sentence clarification about the geometric calculation

@rdblue
Copy link
Contributor

rdblue commented Jun 6, 2025

@jiayuasu, can you help me understand why we wouldn't make the clarifications @mkaravel suggests in this PR?

If we don't expect the CRS to an effect on calculations for geometry and only geography is restricted to [-180, 180] then it makes sense to me that geometry bboxes should never have xmin > xmax. Clarifying that here where we are already noting that the calculations are purely Cartesian makes sense to me!

@jiayuasu
Copy link
Member Author

jiayuasu commented Jun 6, 2025

@rdblue I think it makes sense to drop xmin > xmax wraparound in geometry if we don't expect CRS to have an effect.

On the other hand, I'd like to also add linear interpolation to geography if we want to drop this. This is an idea proposed by @mkaravel before. Because there will be use cases that needs linear interpolation (cartesian mathematics) on antimeridian crossing geometries on geographic CRS.

I am fine adding the sentence to disallow wraparound here and create another PR to add linear interpolation although I think the 2 issues are interconnected. Or, I can put everything in this PR instead so we can solve these issues together

Please let me know what you think

@mkaravel
Copy link

mkaravel commented Jun 6, 2025

@rdblue I think it makes sense to drop xmin > xmax wraparound in geometry if we don't expect CRS to have an effect.

On the other hand, I'd like to also add linear interpolation to geography if we want to drop this. This is an idea proposed by @mkaravel before. Because there will be use cases that needs linear interpolation (cartesian mathematics) on antimeridian crossing geometries on geographic CRS.

@jiayuasu I am obviously (since I have suggested it in the past) totally on board with this plan. This will make the semantics of the types very clean (geometry always Cartesian, calculations independent of the CRS) and allow geography to have an easy, practical, and meaningful way to deal with the antimeridian geometries (also very clean IMHO).

I am fine adding the sentence to disallow wraparound here and create another PR to add linear interpolation although I think the 2 issues are interconnected. Or, I can put everything in this PR instead so we can solve these issues together

My thinking when I proposed the changes above in this PR was to make it self-consistent and self-contained regarding the geometry type. We can follow up with a separate PR for geography or add the changes here. Either way is fine with me. Maybe a single PR is just simpler.

@rdblue Are you okay with this? Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Specification Issues that may introduce spec changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants