Describe the bug
ExpectColumnValueLengthsToEqual raises an exception in version 1.3.5 when applied to a column that also contains null values. This did not fail in version 0.18.
To Reproduce
import great_expectations as gx
import great_expectations.expectations as gxe
# Retrieve your Data Context
data_context = gx.get_context(mode="ephemeral")
# Define the Data Source name
data_source_name = "source_system_name_spark_dataframe"
# Add the Data Source to the Data Context
data_source = data_context.data_sources.add_spark(name=data_source_name)
# Define the Data Asset name
data_asset_name = "dataset_name"
# Add a Data Asset to the Data Source
data_asset = data_source.add_dataframe_asset(name=data_asset_name)
# Define the Batch Definition name
batch_definition_name = "dataset_batch_definition"
# Add a Batch Definition to the Data Asset
batch_definition = data_asset.add_batch_definition_whole_dataframe(
batch_definition_name
)
# Example dataframe: a string column containing some null values.
# The column name is taken from the reported expectation config below;
# the data values themselves are illustrative.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("1234567",), (None,), ("7654321",)],
    schema=["ID Number"],
)
batch_parameters = {"dataframe": df}
# Get the dataframe as a Batch
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
# Column name and expected length taken from the reported expectation config below
test = gxe.ExpectColumnValueLengthsToEqual(column="ID Number", value=7)
# Test the Expectation
validation_results = batch.validate(test, result_format="COMPLETE")
print(validation_results)
Expected behavior
No exception should be raised. Null values should either be treated as having length zero or be excluded from the expectation result.
Environment (please complete the following information):
Great Expectations Version: [1.3.5]
Data Source: [Spark dataframe created from a csv file]
Cloud environment: [Azure Databricks]
Additional context
{
"success": false,
"expectation_config": {
"type": "expect_column_value_lengths_to_equal",
"kwargs": {
"column": "ID Number",
"value": 7.0,
"batch_id": "source_system_name_spark_dataframe-dataset_name"
},
"meta": {}
},
"result": {},
"meta": {},
"exception_info": {
"('column_values.value_length.map', '0464e137b2cdb1dd819e7ee85c081f95', ())": {
"exception_traceback": "Traceback (most recent call last):\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 532, in _process_direct_and_bundled_metric_computation_configurations\n metric_computation_configuration.metric_fn( # type: ignore[misc] # F not callable\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/expectations/metrics/metric_provider.py\", line 99, in inner_func\n return metric_fn(*args, **kwargs)\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/expectations/metrics/map_metric_provider/column_function_partial.py\", line 239, in inner_func\n ) = execution_engine.get_compute_domain(\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/sparkdf_execution_engine.py\", line 800, in get_compute_domain\n data: pyspark.DataFrame = self.get_domain_records(domain_kwargs=domain_kwargs)\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/sparkdf_execution_engine.py\", line 689, in get_domain_records\n data = data.filter(filter_condition.condition)\n File \"/databricks/spark/python/pyspark/instrumentation_utils.py\", line 48, in wrapper\n res = func(*args, **kwargs)\n File \"/databricks/spark/python/pyspark/sql/dataframe.py\", line 3123, in filter\n jdf = self._jdf.filter(condition)\n File \"/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py\", line 1321, in __call__\n return_value = get_return_value(\n File \"/databricks/spark/python/pyspark/errors/exceptions.py\", line 234, in deco\n raise converted from 
None\npyspark.errors.exceptions.ParseException: \n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/validator/validation_graph.py\", line 276, in _resolve\n self._execution_engine.resolve_metrics(\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 279, in resolve_metrics\n return self._process_direct_and_bundled_metric_computation_configurations(\n File \"/local_disk0/.ephemeral_nfs/envs/pythonEnv-70771ece-6841-4d7b-a9e8-4a8bc864ed04/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py\", line 537, in _process_direct_and_bundled_metric_computation_configurations\n raise gx_exceptions.MetricResolutionError(\ngreat_expectations.exceptions.exceptions.MetricResolutionError: \n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n\n",
"exception_message": "\n[PARSE_SYNTAX_ERROR] Syntax error at or near 'IS'.(line 1, pos 10)\n\n== SQL ==\nID Number IS NOT NULL\n----------^^^\n",
"raised_exception": true
}
}
}
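The traceback suggests a likely cause: the Spark execution engine builds its not-null filter as a raw SQL string (`ID Number IS NOT NULL`), and a column name containing a space is not backtick-quoted, so Spark's parser fails exactly at the space (position 10). A minimal sketch of the quoting that would avoid the parse error; `quote_spark_identifier` is a hypothetical helper for illustration, not a Great Expectations API:

```python
def quote_spark_identifier(name: str) -> str:
    # Backtick-quote a Spark SQL identifier, escaping any embedded backticks
    return "`" + name.replace("`", "``") + "`"

column = "ID Number"

# What the filter condition appears to be in 1.3.5 -> PARSE_SYNTAX_ERROR at the space
unquoted_condition = f"{column} IS NOT NULL"

# Quoting the identifier produces a condition Spark can parse
quoted_condition = f"{quote_spark_identifier(column)} IS NOT NULL"

print(unquoted_condition)  # ID Number IS NOT NULL
print(quoted_condition)    # `ID Number` IS NOT NULL
```

As a possible workaround until this is resolved, renaming the column to remove the space before validating (e.g. `df.withColumnRenamed("ID Number", "ID_Number")`) may avoid the failing filter.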
Hi there. What is your Databricks Runtime version? I'm not able to replicate this. Can you also check whether the issue occurs on both single-user and multi-user clusters? Have you tested this outside of Databricks?