Description
Prework
- [x] Read and abide by the Pointblank code of conduct and contributing guidelines.
- [x] Search for duplicates among the existing issues (both open and closed).
Proposal
I want to build an integration between Pointblank validations and Buckaroo. This would allow viewing the original dataframe inline with errors highlighted. For reference, Buckaroo is an interactive dataframe UI for notebooks; it lazily loads data into the browser and can scroll infinitely.
To do this, I could use some help from pointblank.
Can you add a method that returns the original dataframe with "error_columns"?
There could be a couple of approaches, but basically Buckaroo needs one additional column per original column, with a predictable name. This column should be null wherever there isn't an error; where there is an error, it should just contain the text explaining the error for that cell.
For pandas, a multi-index of columns would be pretty cool: each original column becomes the pair ('column_name', 'orig'), ('column_name', 'errors').
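To make the multi-index shape concrete, here is a rough sketch of the output I have in mind. The `with_error_level` helper and the `errors` dict (column name to `{row label: message}`) are made up for illustration; they are not Pointblank or Buckaroo APIs, and Pointblank would presumably derive the messages from its own validation results.

```python
import pandas as pd


def with_error_level(df, errors):
    """Return df with two-level columns: (name, 'orig') and (name, 'errors').

    `errors` is a hypothetical input: {column name: {row label: message}}.
    """
    keys, pieces = [], []
    for col in df.columns:
        msgs = errors.get(col, {})
        # Original values, untouched.
        keys.append((col, "orig"))
        pieces.append(df[col])
        # None wherever the cell has no error, the message text otherwise.
        keys.append((col, "errors"))
        pieces.append(
            pd.Series([msgs.get(i) for i in df.index], index=df.index, dtype="object")
        )
    out = pd.concat(pieces, axis=1)
    out.columns = pd.MultiIndex.from_tuples(keys)
    return out


df = pd.DataFrame({"a": [1, -2, 3], "b": ["x", "y", "z"]})
errors = {"a": {1: "a must be > 0"}}  # hypothetical error report
wide = with_error_level(df, errors)
```

With this layout, `wide[("a", "errors")]` is null except at row 1, and `wide.xs("orig", axis=1, level=1)` recovers the original frame.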
For polars, you could make similar tuple columns.
Initially, though, just adding columns with a predictable name like 'column_name1__errors', 'column_name_foo__errors' will be easier for Buckaroo, which can easily support this column name format right now. That approach will inevitably run into some type of escaping error, for example when an original column name already ends in '__errors'. I'm very close to releasing multi-index column support and tuple column support.
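A minimal sketch of the flat-name variant, again assuming a made-up `errors` dict of column name to `{row label: message}` and a hypothetical `add_error_columns` helper (neither exists in Pointblank today). It also shows one way the naming collision could be surfaced instead of silently overwriting a column.

```python
import pandas as pd


def add_error_columns(df, errors, suffix="__errors"):
    """Return a copy of df with one '<column><suffix>' column per original column.

    `errors` is a hypothetical input: {column name: {row label: message}}.
    """
    out = df.copy()
    for col in df.columns:
        name = f"{col}{suffix}"
        # The predictable-name scheme breaks if the name is already taken.
        if name in df.columns:
            raise ValueError(f"column {name!r} already exists in the input")
        msgs = errors.get(col, {})
        # None wherever the cell has no error, the message text otherwise.
        out[name] = pd.Series(
            [msgs.get(i) for i in df.index], index=df.index, dtype="object"
        )
    return out


df = pd.DataFrame({"a": [1, -2, 3], "b": ["x", "y", "z"]})
errors = {"a": {1: "a must be > 0"}}  # hypothetical error report
flat = add_error_columns(df, errors)
```

Here `flat` has columns `a`, `b`, `a__errors`, `b__errors`, and a frame that already contains `a__errors` raises instead of producing ambiguous output.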
What other considerations would go into a feature like this from y'all's side?
Here is a similar tool I built for pandera on top of Buckaroo:
https://marimo.io/p/@paddy-mullen/buckaroo-pandera