Overview
Adding skip_errors to the validate() function has a dramatic impact on the function's performance.
The following output is from some data I'm working on. Validation goes from 140ms to about 60s when skipping errors:
In [12]: %time report = validate('./data/plans-barcelona-small.xlsx', skip_errors=['blank-row'])
CPU times: user 60 s, sys: 119 μs, total: 60 s
Wall time: 60 s
In [13]: %time report = validate('./data/plans-barcelona-small.xlsx')
CPU times: user 141 ms, sys: 12 μs, total: 141 ms
Wall time: 140 ms
In [14]: %time report = validate('./data/plans-barcelona-small.xlsx', skip_errors=['blank-row'])
CPU times: user 1min 2s, sys: 4 ms, total: 1min 2s
Wall time: 1min 2s
Even a small blank-rows.xlsx file goes from roughly 100ms to over 500ms just by skipping errors:
In [29]: %time report = validate('./data/blank-rows.xlsx', skip_errors=['blank-row'])
CPU times: user 590 ms, sys: 3.98 ms, total: 594 ms
Wall time: 593 ms
In [30]: %time report = validate('./data/blank-rows.xlsx')
CPU times: user 117 ms, sys: 5 μs, total: 117 ms
Wall time: 116 ms
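To reproduce the comparison outside IPython, something like the following sketch might help. The helper names time_call and compare_skip_errors are hypothetical and not part of the library; validate is passed in as an argument so the sketch does not assume any particular import path, only the skip_errors keyword shown in the calls above.

```python
import time
from typing import Any, Callable, Sequence


def time_call(func: Callable[..., Any], *args: Any, repeat: int = 3, **kwargs: Any) -> float:
    """Return the best wall-clock time in seconds over `repeat` calls (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def compare_skip_errors(validate: Callable[..., Any], path: str, skipped: Sequence[str]) -> float:
    """Return how many times slower `validate` is when `skip_errors` is passed."""
    baseline = time_call(validate, path)
    with_skip = time_call(validate, path, skip_errors=list(skipped))
    return with_skip / baseline
```

With the real function in scope, compare_skip_errors(validate, './data/blank-rows.xlsx', ['blank-row']) would return the slowdown factor for the small file above. Taking the best of several runs is only meant to reduce noise from caching and other processes.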