Add a method to get error_reports per_column in a dataframe #237

@paddymul

Proposal

I want to build an integration between pointblank validations and Buckaroo. This will allow viewing the original dataframe inline and highlighting errors. For reference, Buckaroo is an interactive dataframe UI for notebooks; it lazily loads data into the browser and can scroll infinitely.

To do this, I could use some help from pointblank.

Can you add a method that returns the original dataframe with "error_columns"?

There could be a couple of approaches, but basically Buckaroo needs an additional column for each original column, with a predictable name. This column should be null everywhere there isn't an error; where there is an error, it should just contain the text explaining the error for that cell.

For pandas, a multi-index of columns would be pretty cool: each original column would become the pair ('column_name', 'orig'), ('column_name', 'errors').
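To make the multi-index shape concrete, here is a rough sketch in plain pandas. It is not pointblank API: the function name `to_orig_errors_multiindex` and the `errors` frame (same index and columns as the data, null wherever a cell has no error) are assumptions made up for illustration.

```python
import pandas as pd


def to_orig_errors_multiindex(df, error_frame):
    """Pair every column with its error text under a two-level column index.

    `error_frame` is a hypothetical input: same index and columns as `df`,
    holding the error message for a cell, or null where the cell is fine.
    """
    # concat with a dict puts the keys on the outer level: ('orig', 'a'), ...
    combined = pd.concat({"orig": df, "errors": error_frame}, axis=1)
    # Swap levels so the original column name comes first: ('a', 'orig'), ...
    combined = combined.swaplevel(axis=1)
    # Interleave so each column sits next to its errors column.
    order = pd.MultiIndex.from_product([df.columns, ["orig", "errors"]])
    return combined.reindex(columns=order)


df = pd.DataFrame({"a": [1, -2, 3], "b": ["x", "y", "z"]})

# Hypothetical validation result: one failing cell in column "a", row 1.
errors = pd.DataFrame(index=df.index, columns=df.columns, dtype="object")
errors.loc[1, "a"] = "must be > 0"

result = to_orig_errors_multiindex(df, errors)
```

Selecting `result["a"]` then gives a two-column frame with `orig` and `errors` side by side, which is the pairing a UI would need for highlighting.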

For polars you could make similar tuple columns.

Initially, though, just adding columns with a predictable name like 'column_name1__errors', 'column_name_foo__errors' will be easier for Buckaroo, which can easily support this column name format right now. That scheme will inevitably run into some type of escaping problem, such as an original column whose name already ends in '__errors'. I'm very close to releasing multi-index column support and tuple column support.
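As a sketch of the flat-name variant, assuming pandas and a made-up input format: `add_error_columns` and the `cell_errors` mapping (column name to a dict of row label to message) are hypothetical, not something pointblank provides today. The check at the top shows one way the naming scheme can collide with existing columns.

```python
import pandas as pd


def add_error_columns(df, cell_errors, suffix="__errors"):
    """Return a copy of `df` with one '<column><suffix>' column per original column.

    `cell_errors` is a hypothetical mapping: {column name: {row label: message}}.
    Cells without an error are left null.
    """
    out = df.copy()
    for col in df.columns:
        error_col = f"{col}{suffix}"
        # The predictable name can clash with a real column; fail loudly.
        if error_col in df.columns:
            raise ValueError(f"column {error_col!r} already exists")
        messages = pd.Series([None] * len(df), index=df.index, dtype="object")
        for row_label, message in cell_errors.get(col, {}).items():
            messages.loc[row_label] = message
        out[error_col] = messages
    return out


df = pd.DataFrame({"a": [1, -2, 3], "b": ["x", "y", "z"]})
flat = add_error_columns(df, {"a": {1: "must be > 0"}})
```

Raising on a clash is only one option; a different separator or the tuple/multi-index columns mentioned above would avoid the problem instead of reporting it.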

What other considerations would go into a feature like that from y'all's side?

Here is a similar tool I built for pandera on top of Buckaroo:
https://marimo.io/p/@paddy-mullen/buckaroo-pandera
