SQL execution Exception executing validations, using SQL and Pandas data sources: "Duplicated field name in view schema: column_values.nonnull.unexpected_count"
#10926
Open
bluinchiostro opened this issue on Feb 11, 2025 · 0 comments
Bug description
SQL execution exception when executing validations with SQL and Pandas data sources.
I created a File Data Context connected to SQL data, filesystem data and Pandas DataFrames. I defined my expectation suite, created a validation definition and ran it. When executing
validation_results = validation_definition.run()
with the SQL and Pandas data sources, I get the following error:
Calculating Metrics: 86%|████████▌ | 24/28 [00:01<00:00, 20.39it/s]
An SQL execution Exception occurred. OperationalError: "(psycopg2.OperationalError) Duplicated field name in view schema: column_values.nonnull.unexpected_count
DETAIL: java.sql.SQLException: Duplicated field name in view schema: column_values.nonnull.unexpected_count
[SQL: SELECT sum(CASE WHEN (field1 IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field2 IS NULL) THEN %(param_3)s ELSE %(param_4)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field3 IS NULL) THEN %(param_5)s ELSE %(param_6)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field4 IS NULL) THEN %(param_7)s ELSE %(param_8)s END) AS "column_values.nonnull.unexpected_count"
FROM (SELECT *
FROM (SELECT * from my_table_data) AS anon_1
WHERE true) AS anon_1]
[parameters: {'param_1': 1, 'param_2': 0, 'param_3': 1, 'param_4': 0, 'param_5': 1, 'param_6': 0, 'param_7': 1, 'param_8': 0}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)". Traceback: "Traceback (most recent call last):
  File "d:\Veronica\Anaconda\envs\gxenv\lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "d:\Veronica\Anaconda\envs\gxenv\lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: Duplicated field name in view schema: column_values.nonnull.unexpected_count
DETAIL: java.sql.SQLException: Duplicated field name in view schema: column_values.nonnull.unexpected_count
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
result = validation_definition.run()
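The shape of the failing statement can be approximated with Python's built-in sqlite3 module. SQLite, like stock PostgreSQL, accepts a top-level SELECT whose output columns share one label, so the query is legal there; the server in this report rejects it, and the java.sql.SQLException in the DETAIL line suggests a JVM-based engine behind the PostgreSQL wire protocol rather than stock PostgreSQL. The table rows below are invented for illustration, and the query is trimmed from four aggregates to two.

```python
import sqlite3

# In-memory stand-in for "my_table_data"; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table_data (field1 INTEGER, field2 TEXT)")
conn.executemany(
    "INSERT INTO my_table_data VALUES (?, ?)",
    [(1, "a"), (None, "b"), (3, "c"), (None, None)],
)

# Same shape as the reported statement: one null-count aggregate per column,
# every aggregate labelled with the same alias.
query = """
SELECT
  sum(CASE WHEN (field1 IS NULL) THEN ? ELSE ? END) AS "column_values.nonnull.unexpected_count",
  sum(CASE WHEN (field2 IS NULL) THEN ? ELSE ? END) AS "column_values.nonnull.unexpected_count"
FROM (SELECT * FROM my_table_data) AS anon_1
"""
cursor = conn.execute(query, (1, 0, 1, 0))
column_names = [description[0] for description in cursor.description]
row = cursor.fetchone()
conn.close()

# Mapping results by column name collapses the two counts into one key
# (the last value wins).
as_dict = dict(zip(column_names, row))

print(column_names)  # both result columns carry the identical label
print(row)           # null counts for field1 and field2, by position
print(as_dict)       # only one entry survives
```

Both counts come back, but they can only be told apart by position. Anything that keys results by column name, such as a dict or a view schema, sees one field where several were intended.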
Expected behavior
The validation should complete without this error. My use case requires the complete validation of the dataset with the Pandas data source.
Environment:
Operating System: tested in Windows and Linux
Great Expectations Version: 1.3.5
Data Source: Pandas, SQL, filesystem
Cloud environment: none
Additional context
The validation works with all three data sources if the expectation suite contains only one expectation; with multiple expectations in the suite, it works only with the filesystem data source. My use case requires the complete validation of the dataset with the Pandas data source; the other data sources were added only for testing. I checked the requirements.txt in this repository, and my environment is aligned with it.
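The single-expectation versus multiple-expectation behavior is consistent with the generated SQL: each per-column aggregate is aliased with the metric name alone, so bundling several of them into one SELECT repeats the label. The sketch below models only that aliasing, presumably for four not-null expectations on field1 to field4 as the query suggests. The helpers bundled_aliases and duplicated, and the column-qualified naming scheme, are hypothetical illustrations, not Great Expectations internals or its fix.

```python
from collections import Counter


def bundled_aliases(metric_name, columns, qualify=False):
    """Return result-column aliases for one bundled aggregate query (hypothetical).

    qualify=False mimics the reported query: every column shares one label.
    qualify=True is an illustrative alternative that appends the column name.
    """
    if qualify:
        return [f"{metric_name}.{column}" for column in columns]
    return [metric_name for _ in columns]


def duplicated(aliases):
    """List aliases that appear more than once in a single SELECT list."""
    return sorted(name for name, count in Counter(aliases).items() if count > 1)


METRIC = "column_values.nonnull.unexpected_count"
FIELDS = ["field1", "field2", "field3", "field4"]

one_expectation = bundled_aliases(METRIC, ["field1"])
four_expectations = bundled_aliases(METRIC, FIELDS)
four_qualified = bundled_aliases(METRIC, FIELDS, qualify=True)

print(duplicated(one_expectation))    # []
print(duplicated(four_expectations))  # ['column_values.nonnull.unexpected_count']
print(duplicated(four_qualified))     # []
```

With one column there is nothing to collide with, which would explain why a single-expectation suite passes on every data source.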
The DataFrame does not contain any duplicated field names.
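One quick way to confirm this on the pandas side is Index.duplicated() on the column labels. The frame below is a hypothetical stand-in that reuses the four column names from the generated SQL; an empty result means no label repeats, which points at the aliases in the generated query rather than at the data itself.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame passed as a batch parameter.
df = pd.DataFrame(
    {
        "field1": [1, None],
        "field2": ["a", "b"],
        "field3": [1.0, 2.0],
        "field4": [True, False],
    }
)

# Index.duplicated() marks every repeat of a label that appeared earlier.
duplicate_columns = df.columns[df.columns.duplicated()].tolist()
print(duplicate_columns)  # []
```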
To Reproduce
great_expectations.yml config:
config_version: 4.0
config_variables_file_path: uncommitted/config_variables.yml
plugins_directory: plugins/
stores:
data_docs_sites:
My code:
"""Retrieve the dataframe Batch Definition"""
"""Get the dataframe as a Batch"""