Row-level test failure documentation: pointblank extension package #638

petrbouchal · 2025-07-12T15:35:14Z


I have been working through a use case with the following characteristics:

- a large dataset: millions of rows in a Parquet file, hundreds of columns
- a need for very specific information on which rows fail which check
- this information is needed downstream, because which failing rows get excluded depends on the analysis (e.g. with a payroll dataset, for calculating mean salary I want to exclude only rows with implausible values in the salary columns, but for making comparisons I want to exclude everyone with an incorrectly recorded seniority level)
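
To make the downstream use concrete, here is a minimal sketch in plain Python. It is not pointblank or pointblankops code: the column names, check names, and thresholds are invented for illustration. It shows a long-format failure log (one `row_id`/check pair per failure, with no row payload) and how each analysis excludes only the rows that fail the checks relevant to it.

```python
from statistics import mean

# Hypothetical payroll rows; column names and values are made up for illustration.
rows = [
    {"row_id": 1, "salary": 52000, "seniority": "junior"},
    {"row_id": 2, "salary": -10, "seniority": "senior"},
    {"row_id": 3, "salary": 61000, "seniority": "unknown"},
    {"row_id": 4, "salary": 75000, "seniority": "senior"},
]

# Hypothetical checks: name -> predicate that returns True when the row passes.
checks = {
    "salary_plausible": lambda r: 0 < r["salary"] < 1_000_000,
    "seniority_valid": lambda r: r["seniority"] in {"junior", "mid", "senior"},
}

# Long-format failure log: one (row_id, check) pair per failure, no row payload.
failure_log = [
    (r["row_id"], name)
    for r in rows
    for name, passes in checks.items()
    if not passes(r)
]


def rows_passing(rows, failure_log, relevant_checks):
    """Keep rows that fail none of the checks relevant to a given analysis."""
    excluded = {row_id for row_id, check in failure_log if check in relevant_checks}
    return [r for r in rows if r["row_id"] not in excluded]


# Mean salary: drop only rows with an implausible salary.
salary_rows = rows_passing(rows, failure_log, {"salary_plausible"})
mean_salary = mean(r["salary"] for r in salary_rows)

# Comparisons: drop only rows with an invalid seniority level.
comparison_rows = rows_passing(rows, failure_log, {"seniority_valid"})
```

The point of the long format is that the log stays narrow regardless of how many columns the input has, so it can be joined back to the data per analysis.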

This turns out to be quite difficult with the current interrogation flow: the post-interrogation agent is quite large, and one needs full-sample failure extracts (which contain the whole row, in our case with hundreds of columns) and has to compile them manually, in memory. This soon becomes unworkable in terms of memory, speed, and code maintainability.

The size of the input data can be dealt with via databases; the main problem is the size and manipulation of the failed rows.

To solve this, I created an extension package that accommodates this use case: it extracts per-row failure logs directly into a database, file, or R object without creating a large post-interrogation agent.

See https://petrbouchal.xyz/pointblankops/

The implementation is as follows: create a lightweight agent (called an operative), then, instead of interrogating it, debrief the operative. Debriefing is a lightweight version of interrogation that creates a row-level failure log instead of an agent with the report.
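
The general idea of logging failures without building a large result object can be sketched as follows. This is a conceptual illustration in plain Python with SQLite, not the pointblankops implementation: `debrief_sketch`, `make_batches`, the table name, and the check are all hypothetical. Rows are processed in batches, and only `(row_id, check_name)` pairs are written to the database.

```python
import sqlite3


def debrief_sketch(batches, checks, conn):
    """Hypothetical stand-in for a 'debrief' step: log failures, keep nothing else."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS failure_log (row_id INTEGER, check_name TEXT)"
    )
    for batch in batches:
        # Only the failing (row_id, check) pairs are kept, never the full rows.
        failures = [
            (row["row_id"], name)
            for row in batch
            for name, passes in checks.items()
            if not passes(row)
        ]
        conn.executemany("INSERT INTO failure_log VALUES (?, ?)", failures)
    conn.commit()


def make_batches(n_rows, batch_size):
    """Generate made-up rows lazily so the full dataset is never held in memory."""
    for start in range(0, n_rows, batch_size):
        yield [
            {"row_id": i, "salary": -1 if i % 4 == 0 else 50000 + i}
            for i in range(start, min(start + batch_size, n_rows))
        ]


# Hypothetical check: name -> predicate that returns True when the row passes.
checks = {"salary_plausible": lambda r: r["salary"] > 0}

conn = sqlite3.connect(":memory:")
debrief_sketch(make_batches(10, 3), checks, conn)
logged = conn.execute(
    "SELECT row_id, check_name FROM failure_log ORDER BY row_id"
).fetchall()
```

Memory use in this sketch is bounded by the batch size rather than by the number of failing rows, which is the property that matters for the use case above.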

If this worked as a new "validation workflow" in pointblank, I would be happy for it to be incorporated, but the code as it is now has not been tested extensively for the various pointblank scenarios.
