Skip to content

feat: Implement ST_LENGTH geography function #1791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented Jun 4, 2025

This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters.

The implementation includes:

  • A new operation geo_st_length_op in bigframes.operations.geo_ops.
  • The user-facing function st_length in bigframes.bigquery._operations.geo.
  • Exposure of the new operation and function in relevant __init__.py files.
  • Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs.

The function behaves as per the BigQuery ST_LENGTH documentation:

  • Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs.
  • Returns the perimeter for POLYGON and MULTIPOLYGON.
  • Returns the total length for LINESTRING and MULTILINESTRING.
  • For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

This commit introduces the ST_LENGTH function for BigQuery DataFrames.
ST_LENGTH computes the length of GEOGRAPHY objects in meters.

The implementation includes:
- A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`.
- The user-facing function `st_length` in `bigframes.bigquery._operations.geo`.
- Exposure of the new operation and function in relevant `__init__.py` files.
- Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs.

The function behaves as per the BigQuery ST_LENGTH documentation:
- Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs.
- Returns the perimeter for POLYGON and MULTIPOLYGON.
- Returns the total length for LINESTRING and MULTILINESTRING.
- For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 4, 2025
This commit adds a `length` property to the `GeoSeries` class.
Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead.

This change includes:
- The `length` property in `bigframes/geopandas/geoseries.py`.
- A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.
This commit applies a user-provided patch that includes:
- Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`.
- Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`.
- Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`.
- Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`.

Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.
This commit introduces the `use_spheroid` parameter to the `ST_LENGTH`
geography function, aligning it more closely with the BigQuery
ST_LENGTH(geography_expression[, use_spheroid]) signature.

Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass
  that accepts `use_spheroid` (defaulting to `False`). A check is
  included to raise `NotImplementedError` if `use_spheroid` is `True`,
  as this is the current limitation in BigQuery.
- The Ibis compiler implementation for `geo_st_length_op` in
  `bigframes.core.compile.scalar_op_compiler.py` has been updated
  to accept the new `GeoStLengthOp` operator type.
- The user-facing `st_length` function in
  `bigframes.bigquery._operations.geo.py` now includes the
  `use_spheroid` keyword argument.
- The docstring for `st_length` has been updated to match the
  official BigQuery documentation, clarifying that only lines contribute
  to the length (points and polygons result in 0 length), and
  detailing the `use_spheroid` parameter. Examples have been
  updated accordingly.
- Tests in `tests/system/small/bigquery/test_geo.py` have been
  updated to:
    - Reflect the correct behavior (0 length for polygons/points).
    - Test calls with both default `use_spheroid` and explicit
      `use_spheroid=False`.
    - Verify that `use_spheroid=True` raises a `NotImplementedError`.

Note: System tests requiring Google Cloud authentication were not
re-executed for this specific commit due to environment limitations
identified in previous steps. The changes primarily affect the operator
definition, function signature, and client-side validation, with the
core Ibis compilation logic for length remaining unchanged.
This commit refactors the ST_LENGTH implementation to correctly
pass the `use_spheroid` parameter to BigQuery by using Ibis's
`ibis_udf.scalar.builtin('ST_LENGTH', ...)` function.

Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side
  `NotImplementedError` for `use_spheroid=True` (raised in
  `__post_init__`) has been removed. BigQuery DataFrames will now
  pass this parameter directly to BigQuery.
- `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`:
  The implementation now always uses
  `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)`
  instead of `x.length()`. This ensures the `use_spheroid`
  parameter is included in the SQL generated for BigQuery.
- `tests/system/small/bigquery/test_geo.py`:
    - The test expecting a client-side `NotImplementedError` for
      `use_spheroid=True` has been removed.
    - A new test `test_st_length_use_spheroid_true_errors_from_bq`
      has been added. This test calls `st_length` with
      `use_spheroid=True` and asserts that an exception is raised
      from BigQuery, as BigQuery itself currently only supports
      `use_spheroid=False` for the `ST_LENGTH` function.
    - Existing tests for `st_length` were already updated in a
      previous commit to reflect that only line geometries contribute
      to the length, and these continue to verify behavior with
      `use_spheroid=False`.

This change ensures that BigQuery DataFrames accurately reflects BigQuery's
`ST_LENGTH` capabilities concerning the `use_spheroid` parameter.
This commit refactors the ST_LENGTH geography operation to use an
Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with
the pattern exemplified by other built-in functions like ST_DISTANCE
when a direct Ibis method with all necessary parameters is not available.

Key changes:
- A new `st_length` function is defined in
  `bigframes/core/compile/scalar_op_compiler.py` using
  `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's
  `ST_LENGTH(geography, use_spheroid)` function.
- The `geo_length_op_impl` in the same file now calls this
  `st_length` Ibis UDF, replacing the previous use of
  `op_typing.ibis_function`.
- The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and
  the user-facing `st_length` function in
  `bigframes/bigquery/_operations/geo.py` remain unchanged from
  the previous version, as they correctly define the operation's
  interface and parameters.

This change provides a cleaner and more direct way to map the
BigQuery DataFrames operation to the specific BigQuery ST_LENGTH
SQL function signature, while maintaining the existing BigQuery DataFrames
operation structure. The behavior of the `st_length` function,
including its handling of the `use_spheroid` parameter and error
conditions from BigQuery, remains the same.
This commit refactors the system tests for the `st_length` geography
function in `tests/system/small/bigquery/test_geo.py`.

The numerous individual test cases for different geometry types
have been combined into a single, comprehensive test function
`test_st_length_various_geometries`. This new test uses a single
GeoSeries with a variety of inputs (Point, LineString, Polygon,
MultiPoint, MultiLineString, MultiPolygon, GeometryCollection,
None/Empty) and compares the output of `st_length` (with both
default and explicit `use_spheroid=False`) against a pandas Series
of expected lengths.

This consolidation improves the conciseness and maintainability of
the tests for `st_length`. The test for `use_spheroid=True`
(expecting an error from BigQuery) remains separate.
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Jun 5, 2025
google-labs-jules bot and others added 5 commits June 5, 2025 17:30
This commit fixes an ImportError caused by an incorrect name being
used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`.

When `geo_st_length_op` (a variable) was replaced by the dataclass
`GeoStLengthOp`, the import and `__all__` list in this `__init__.py`
file were not updated. This commit changes the import from `.geo_ops`
to correctly import `GeoStLengthOp` and updates the `__all__` list
to export `GeoStLengthOp`.
@tswast tswast marked this pull request as ready for review June 5, 2025 18:02
@tswast tswast requested review from a team as code owners June 5, 2025 18:02
@tswast tswast requested a review from chelsea-lin June 5, 2025 18:02
@tswast
Copy link
Collaborator Author

tswast commented Jun 5, 2025

@chelsea-lin This is mostly generated via jules.google.com/ but I patched some of the tests to make it work. I think this now ready for review.

) -> bigframes.series.Series:
"""Returns the total length in meters of the lines in the input GEOGRAPHY.

If geography_expression is a point or a polygon, returns zero. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update geography_expression into series

bigframes.series.Series:
Series of floats representing the lengths in meters.
"""
series = series._apply_unary_op(ops.GeoStLengthOp(use_spheroid=use_spheroid))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC from the docstring, we don't support for a True value of use_spheroid. Can you please throw a NotImplementedError here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, why we cannot support a True value of use_spheroid here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a server-side limitation: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length

I'd rather avoid any client-side checks, as maybe someday the server side will implement this feature.

I'm not sure why they have the parameter at all if it's not implemented, to be honest.

@tswast tswast requested a review from chelsea-lin June 5, 2025 19:27
chelsea-lin
chelsea-lin previously approved these changes Jun 5, 2025
@tswast
Copy link
Collaborator Author

tswast commented Jun 6, 2025

Looks like this broke some other functions. I need to take a closer look:

__________________________________ test_geo_x __________________________________
[gw6] linux -- Python 3.9.20 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-9/bin/python

urban_areas_dfs = (       urban_area_code                      name  \
geo_id                                             
0            ..., -83.74099 ...
84630            84630  ...  MULTIPOLYGON (((-79.13478 38.15216, -79.13484 ...

[22 rows x 13 columns])

    def test_geo_x(urban_areas_dfs):
        bf_ua, pd_ua = urban_areas_dfs
        bf_series: bigframes.geopandas.GeoSeries = bf_ua["internal_point_geom"].geo
        pd_series: geopandas.GeoSeries = geopandas.GeoSeries(pd_ua["internal_point_geom"])
>       bf_result = bf_series.x.to_pandas()

[tests/system/small/geopandas/test_geoseries.py:48](https://cs.corp.google.com/piper///depot/google3/tests/system/small/geopandas/test_geoseries.py?l=48): 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[bigframes/core/log_adapter.py:175](https://cs.corp.google.com/piper///depot/google3/bigframes/core/log_adapter.py?l=175): in wrapper
    return method(*args, **kwargs)
[bigframes/series.py:577](https://cs.corp.google.com/piper///depot/google3/bigframes/series.py?l=577): in to_pandas
    df, query_job = self._block.to_pandas(
[bigframes/core/blocks.py:531](https://cs.corp.google.com/piper///depot/google3/bigframes/core/blocks.py?l=531): in to_pandas
    df, query_job = self._materialize_local(
[bigframes/core/blocks.py:619](https://cs.corp.google.com/piper///depot/google3/bigframes/core/blocks.py?l=619): in _materialize_local
    execute_result = self.session._executor.execute(
[bigframes/session/bq_caching_executor.py:196](https://cs.corp.google.com/piper///depot/google3/bigframes/session/bq_caching_executor.py?l=196): in execute
    return self._execute_plan(
[bigframes/session/bq_caching_executor.py:661](https://cs.corp.google.com/piper///depot/google3/bigframes/session/bq_caching_executor.py?l=661): in _execute_plan
    compiled = compile.compile_sql(
[bigframes/core/compile/compiler.py:60](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=60): in compile_sql
    sql = compile_result_node(result_node)
[bigframes/core/compile/compiler.py:87](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=87): in compile_result_node
    return compile_node(root.child).to_sql(
[bigframes/core/compile/compiler.py:98](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=98): in compile_node
    return node.reduce_up(lambda node, children: _compile_node(node, *children))
[bigframes/core/bigframe_node.py:376](https://cs.corp.google.com/piper///depot/google3/bigframes/core/bigframe_node.py?l=376): in reduce_up
    result = reduction(node, child_results)
[bigframes/core/compile/compiler.py:98](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=98): in <lambda>
    return node.reduce_up(lambda node, children: _compile_node(node, *children))
/[usr/local/lib/python3.9/functools.py:888](https://cs.corp.google.com/piper///depot/google3/usr/local/lib/python3.9/functools.py?l=888): in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
[bigframes/core/compile/compiler.py:243](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=243): in compile_projection
    return child.projection(tuple(projections))
[bigframes/core/compile/compiled.py:129](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=129): in projection
    new_values = [
[bigframes/core/compile/compiled.py:130](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=130): in <listcomp>
    op_compiler.compile_expression(expression, bindings).name(id)
/[usr/local/lib/python3.9/functools.py:938](https://cs.corp.google.com/piper///depot/google3/usr/local/lib/python3.9/functools.py?l=938): in _method
    return method.__get__(obj, cls)(*args, **kwargs)
[bigframes/core/compile/scalar_op_compiler.py:102](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/scalar_op_compiler.py?l=102): in _
    return self.compile_row_op(expression.op, inputs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <bigframes.core.compile.scalar_op_compiler.ScalarOpCompiler object at 0x1478cde8d580>
op = geo_x()
inputs = [r0 := SQLQueryResult
  query:
    SELECT `geo_id`, `internal_point_geom` FROM `bigframes- …
  schema:
    geo_id     ...ial:geography;4326
    urban_area_geom          geospatial:geography;4326

internal_point_geom: r0.internal_point_geom]

    def compile_row_op(
        self, op: ops.RowOp, inputs: typing.Sequence[ibis_types.Value]
    ) -> ibis_types.Value:
>       impl = self._registry[op.name]
E       KeyError: 'geo_x'

[bigframes/core/compile/scalar_op_compiler.py:107](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/scalar_op_compiler.py?l=107): KeyError
_____________________________ test_geo_x_non_point _____________________________
[gw6] linux -- Python 3.9.20 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-9/bin/python

urban_areas_dfs = (       urban_area_code                      name  \
geo_id                                             
0            ..., -83.74099 ...
84630            84630  ...  MULTIPOLYGON (((-79.13478 38.15216, -79.13484 ...

[22 rows x 13 columns])

    def test_geo_x_non_point(urban_areas_dfs):
        bf_ua, _ = urban_areas_dfs
        bf_series: bigframes.geopandas.GeoSeries = bf_ua["urban_area_geom"].geo
    
        with pytest.raises(google.api_core.exceptions.BadRequest, match="ST_X"):
>           bf_series.x.to_pandas()

[tests/system/small/geopandas/test_geoseries.py:62](https://cs.corp.google.com/piper///depot/google3/tests/system/small/geopandas/test_geoseries.py?l=62):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants