-
Notifications
You must be signed in to change notification settings - Fork 49
feat: Implement ST_LENGTH geography function #1791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters. The implementation includes: - A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`. - The user-facing function `st_length` in `bigframes.bigquery._operations.geo`. - Exposure of the new operation and function in relevant `__init__.py` files. - Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs. The function behaves as per the BigQuery ST_LENGTH documentation: - Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs. - Returns the perimeter for POLYGON and MULTIPOLYGON. - Returns the total length for LINESTRING and MULTILINESTRING. - For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.
This commit adds a `length` property to the `GeoSeries` class. Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead. This change includes: - The `length` property in `bigframes/geopandas/geoseries.py`. - A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.
This commit applies a user-provided patch that includes: - Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`. - Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`. - Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`. - Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`. Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.
This commit introduces the `use_spheroid` parameter to the `ST_LENGTH` geography function, aligning it more closely with the BigQuery ST_LENGTH(geography_expression[, use_spheroid]) signature. Key changes: - `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass that accepts `use_spheroid` (defaulting to `False`). A check is included to raise `NotImplementedError` if `use_spheroid` is `True`, as this is the current limitation in BigQuery. - The Ibis compiler implementation for `geo_st_length_op` in `bigframes.core.compile.scalar_op_compiler.py` has been updated to accept the new `GeoStLengthOp` operator type. - The user-facing `st_length` function in `bigframes.bigquery._operations.geo.py` now includes the `use_spheroid` keyword argument. - The docstring for `st_length` has been updated to match the official BigQuery documentation, clarifying that only lines contribute to the length (points and polygons result in 0 length), and detailing the `use_spheroid` parameter. Examples have been updated accordingly. - Tests in `tests/system/small/bigquery/test_geo.py` have been updated to: - Reflect the correct behavior (0 length for polygons/points). - Test calls with both default `use_spheroid` and explicit `use_spheroid=False`. - Verify that `use_spheroid=True` raises a `NotImplementedError`. Note: System tests requiring Google Cloud authentication were not re-executed for this specific commit due to environment limitations identified in previous steps. The changes primarily affect the operator definition, function signature, and client-side validation, with the core Ibis compilation logic for length remaining unchanged.
This commit refactors the ST_LENGTH implementation to correctly pass the `use_spheroid` parameter to BigQuery by using Ibis's `ibis_udf.scalar.builtin('ST_LENGTH', ...)` function. Key changes: - `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side `NotImplementedError` for `use_spheroid=True` (raised in `__post_init__`) has been removed. BigQuery DataFrames will now pass this parameter directly to BigQuery. - `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`: The implementation now always uses `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)` instead of `x.length()`. This ensures the `use_spheroid` parameter is included in the SQL generated for BigQuery. - `tests/system/small/bigquery/test_geo.py`: - The test expecting a client-side `NotImplementedError` for `use_spheroid=True` has been removed. - A new test `test_st_length_use_spheroid_true_errors_from_bq` has been added. This test calls `st_length` with `use_spheroid=True` and asserts that an exception is raised from BigQuery, as BigQuery itself currently only supports `use_spheroid=False` for the `ST_LENGTH` function. - Existing tests for `st_length` were already updated in a previous commit to reflect that only line geometries contribute to the length, and these continue to verify behavior with `use_spheroid=False`. This change ensures that BigQuery DataFrames accurately reflects BigQuery's `ST_LENGTH` capabilities concerning the `use_spheroid` parameter.
This commit refactors the ST_LENGTH geography operation to use an Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with the pattern exemplified by other built-in functions like ST_DISTANCE when a direct Ibis method with all necessary parameters is not available. Key changes: - A new `st_length` function is defined in `bigframes/core/compile/scalar_op_compiler.py` using `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's `ST_LENGTH(geography, use_spheroid)` function. - The `geo_length_op_impl` in the same file now calls this `st_length` Ibis UDF, replacing the previous use of `op_typing.ibis_function`. - The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and the user-facing `st_length` function in `bigframes/bigquery/_operations/geo.py` remain unchanged from the previous version, as they correctly define the operation's interface and parameters. This change provides a cleaner and more direct way to map the BigQuery DataFrames operation to the specific BigQuery ST_LENGTH SQL function signature, while maintaining the existing BigQuery DataFrames operation structure. The behavior of the `st_length` function, including its handling of the `use_spheroid` parameter and error conditions from BigQuery, remains the same.
This commit refactors the system tests for the `st_length` geography function in `tests/system/small/bigquery/test_geo.py`. The numerous individual test cases for different geometry types have been combined into a single, comprehensive test function `test_st_length_various_geometries`. This new test uses a single GeoSeries with a variety of inputs (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, None/Empty) and compares the output of `st_length` (with both default and explicit `use_spheroid=False`) against a pandas Series of expected lengths. This consolidation improves the conciseness and maintainability of the tests for `st_length`. The test for `use_spheroid=True` (expecting an error from BigQuery) remains separate.
This commit fixes an ImportError caused by an incorrect name being used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`. When `geo_st_length_op` (a variable) was replaced by the dataclass `GeoStLengthOp`, the import and `__all__` list in this `__init__.py` file were not updated. This commit changes the import from `.geo_ops` to correctly import `GeoStLengthOp` and updates the `__all__` list to export `GeoStLengthOp`.
@chelsea-lin This is mostly generated via jules.google.com/ but I patched some of the tests to make it work. I think this now ready for review. |
) -> bigframes.series.Series: | ||
"""Returns the total length in meters of the lines in the input GEOGRAPHY. | ||
|
||
If geography_expression is a point or a polygon, returns zero. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update geography_expression
into series
bigframes.series.Series: | ||
Series of floats representing the lengths in meters. | ||
""" | ||
series = series._apply_unary_op(ops.GeoStLengthOp(use_spheroid=use_spheroid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC from the docstring, we don't support for a True value of use_spheroid
. Can you please throw a NotImplementedError here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why we cannot support a True value of use_spheroid
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a server-side limitation: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length
I'd rather avoid any client-side checks, as maybe someday the server side will implement this feature.
I'm not sure why they have the parameter at all if it's not implemented, to be honest.
Looks like this broke some other functions. I need to take a closer look:
|
This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters.
The implementation includes:
geo_st_length_op
inbigframes.operations.geo_ops
.st_length
inbigframes.bigquery._operations.geo
.__init__.py
files.The function behaves as per the BigQuery ST_LENGTH documentation:
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕