This is not covered in the migration guide, so I need a bit of guidance. In our v0 implementation we use a `RuntimeBatchRequest` object and pass a `batch_identifiers` dictionary that later appears in the result JSON:
```python
batch_request = RuntimeBatchRequest(
    datasource_name=data_source_name,
    data_connector_name="dataframe_connector",
    data_asset_name=data_asset_name,
    batch_identifiers={
        "effective_date": reference_date,
        "layer_name_type_identifier": pipeline_stage,
        # note: date.today() carries no time-of-day, so %H%M%S formats as 000000;
        # datetime.datetime.now() would give a real timestamp
        "run_id": f"dq_run_at_{datetime.date.today().strftime('%Y%m%d_%H%M%S')}",
        "source_system_type_identifier": domain,
        "data_product_identifier": "Other",
        "showstopper": is_blocking,
    },
    runtime_parameters={"batch_data": df},
)
```
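As a quick aside on the `run_id` above: `datetime.date.today()` returns a `date`, which has no time-of-day, so `strftime` fills `%H%M%S` with zeros; `datetime.datetime.now()` is needed for a real timestamp. A minimal stdlib demonstration:

```python
import datetime

# A date object zero-fills the time fields when formatted with %H%M%S,
# so every run on the same day gets an identical run_id suffix.
from_date = datetime.date.today().strftime("%Y%m%d_%H%M%S")
print(from_date.endswith("_000000"))  # always True

# A datetime object carries the actual wall-clock time.
from_datetime = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
```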
I haven't yet found a way to pass these `batch_identifiers` anywhere in the v1 implementation. I'm validating in-memory Spark DataFrames as follows:
```python
batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)
suite = context.suites.get(expectation_suite_name)
validation_definition = context.validation_definitions.add(
    gx.ValidationDefinition(
        data=batch_definition, suite=suite, name=validation_definition_name
    )
)
checkpoint = gx.Checkpoint(
    name=checkpoint_name,
    validation_definitions=[validation_definition],
    result_format={
        "result_format": "SUMMARY",
        "partial_unexpected_count": 50,
        "unexpected_index_column_names": primary_keys,
    },
)
return checkpoint.run(
    batch_parameters={"dataframe": df},
)
```
Where do the `batch_identifiers` fit into the v1 logic above? I went through all the documentation and haven't found the answer yet. Any help is highly appreciated. Thank you in advance!