
ExpectColumnValueLengthsToEqual is failing/raising exception when applied on a column having null values as well #10917

Open
suchintakp5 opened this issue Feb 6, 2025 · 2 comments

Comments

@suchintakp5

Describe the bug
ExpectColumnValueLengthsToEqual raises an exception in version 1.3.5 when applied to a column that also contains null values. This did not fail in version 0.18.

To Reproduce

import great_expectations as gx
import great_expectations.expectations as gxe

# Retrieve your Data Context
data_context = gx.get_context(mode="ephemeral")
# Define the Data Source name
data_source_name = "source_system_name_spark_dataframe"
# Add the Data Source to the Data Context
data_source = data_context.data_sources.add_spark(name=data_source_name)
# Define the Data Asset name
data_asset_name = "dataset_name"
# Add a Data Asset to the Data Source
data_asset = data_source.add_dataframe_asset(name=data_asset_name)
# Define the Batch Definition name
batch_definition_name = "dataset_batch_definition"

# Add a Batch Definition to the Data Asset
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    batch_definition_name
)

df = <A pyspark dataframe containing a few null values in a string column>

batch_parameters = {"dataframe": df}
# Get the dataframe as a Batch
batch = batch_definition.get_batch(batch_parameters=batch_parameters)

test = gxe.ExpectColumnValueLengthsToEqual(column=<column_name>, value=<length>)
# Test the Expectation
validation_results = batch.validate(test, result_format="COMPLETE")
print(validation_results)

Expected behavior
No exception should be raised. For null values, either the length should be treated as zero, or the null values should not be considered part of the expectation result.
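A plain-Python sketch (not GX internals) of the expected null-handling semantics described above: null values are excluded from the length check, and only non-null values count toward the result. The sample values and column length are made up for illustration.

```python
# Illustrative sketch of the expected behavior, assuming nulls are excluded
# from the check (as in version 0.18) rather than raising an exception.
values = ["1234567", None, "1234567", None, "123"]  # hypothetical column values
expected_length = 7

# Nulls are filtered out before the length comparison.
non_null = [v for v in values if v is not None]
# Only non-null values can be "unexpected".
unexpected = [v for v in non_null if len(v) != expected_length]
success = len(unexpected) == 0

print(success, unexpected)  # -> False ['123']
```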

Environment (please complete the following information):

  • Great Expectations Version: [1.3.5]
  • Data Source: [Spark dataframe created from a csv file]
  • Cloud environment: [Azure Databricks]

Additional context

{
  "success": false,
  "expectation_config": {
    "type": "expect_column_value_lengths_to_equal",
    "kwargs": {
      "column": "ID Number",
      "value": 7.0,
      "batch_id": "source_system_name_spark_dataframe-dataset_name"
    },
    "meta": {}
  },
  "result": {},
  "meta": {},
  "exception_info": {
    "('column_values.value_length.map', '0464e137b2cdb1dd819e7ee85c081f95', ())": {
      "exception_traceback": "Traceback (most recent call last):\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 532, in _process_direct_and_bundled_metric_computation_configurations\n    metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/expectations/metrics/metric_provider.py\", line 99, in inner_func\n    return metric_fn(*args, **kwargs)\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/expectations/metrics/map_metric_provider/column_function_partial.py\", line 239, in inner_func\n    ) = execution_engine.get_compute_domain(\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/sparkdf_execution_engine.py\", line 800, in get_compute_domain\n    data: pyspark.DataFrame = self.get_domain_records(domain_kwargs=domain_kwargs)\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/sparkdf_execution_engine.py\", line 689, in get_domain_records\n    data = data.filter(filter_condition.condition)\n  File \"/databricks/spark/python/pyspark/instrumentation_utils.py\", line 48, in wrapper\n    res = func(*args, **kwargs)\n  File \"/databricks/spark/python/pyspark/sql/dataframe.py\", line 3123, in filter\n    jdf = self._jdf.filter(condition)\n  File \"/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py\", line 1321, in __call__\n    return_value = get_return_value(\n  File \"/databricks/spark/python/pyspark/errors/exceptions.py\", line 234, in deco\n    raise converted from 
None\npyspark.errors.exceptions.ParseException: \n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/validator/validation_graph.py\", line 276, in _resolve\n    self._execution_engine.resolve_metrics(\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 279, in resolve_metrics\n    return self._process_direct_and_bundled_metric_computation_configurations(\n  File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 537, in _process_direct_and_bundled_metric_computation_configurations\n    raise gx_exceptions.MetricResolutionError(\ngreat_expectations.exceptions.exceptions.MetricResolutionError: \n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n\n",
      "exception_message": "\n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n",
      "raised_exception": true
    }
  }
}
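The traceback above shows the failing filter condition is the raw string `ID Number IS NOT NULL`: the column name contains a space and is passed to `DataFrame.filter` unquoted, so Spark's SQL parser chokes at `IS`. A minimal sketch of backtick-quoting, which is how Spark SQL escapes such identifiers (the helper name is hypothetical, not a GX or Spark API):

```python
# Hypothetical helper illustrating the suspected root cause: a column name
# containing a space must be backtick-quoted before being embedded in a
# Spark SQL expression such as a filter condition.
def quote_spark_identifier(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

# Unquoted, this is the string Spark failed to parse in the traceback:
print("ID Number IS NOT NULL")
# Quoted, the same condition parses cleanly:
print(quote_spark_identifier("ID Number") + " IS NOT NULL")
# -> `ID Number` IS NOT NULL
```

As a workaround until this is fixed, renaming the column to remove the space (e.g. `ID Number` to `ID_Number`) before validating should avoid the parse error.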
@adeola-ak
Contributor

Hi there. What is your Databricks Runtime version? I'm not able to replicate this. Can you also check whether the issue occurs in both single-user and multi-user clusters? Have you tested this outside of Databricks?

@suchintakp5
Author

This issue is reproducible in DBR version 15.4 in both single-user and multi-user clusters.
