feature request: multiprocessing the validation of a single file #1721

pierrecamilleri · 2024-12-13T15:01:47Z

Using the --parallel flag of frictionless validate on the command line, or validate(parallel=True) in Python code, does not seem to trigger any parallel processing.

Validation time for a moderately large CSV file (~30 MB) is the same with or without this option. In addition, monitoring CPU usage shows that only one core seems to be used.

In the case of a datapackage, the command does not even run (see #1644).

To reproduce

time frictionless validate --schema schema.json --parallel data.csv

vs

time frictionless validate --schema schema.json data.csv
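For a repeatable comparison, the two commands above can be timed from a small script. This is only a sketch: time_command and compare are hypothetical helper names, and it assumes the frictionless CLI is on the PATH and that schema.json and data.csv are in the working directory.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


def compare(schema="schema.json", data="data.csv"):
    """Time the same validation with and without --parallel."""
    base = ["frictionless", "validate", "--schema", schema]
    return {
        "serial": time_command(base + [data]),
        "parallel": time_command(base + ["--parallel", data]),
    }
```

Calling compare() returns the elapsed time and exit code of each run. If the two timings are roughly equal, that matches the behaviour described above.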


pierrecamilleri · 2024-12-13T16:27:04Z

After looking into the code, the option does not seem to be meant for multiprocessing the validation of a single file (which would really be a neat feature), but rather for processing the files of a datapackage in parallel.
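To make the request concrete, here is a rough sketch of what validating a single file in parallel could look like. It is not how frictionless works and uses none of its API: the SCHEMA mapping, validate_chunk, iter_chunks and validate_file_parallel are all hypothetical names, and the type checks are a stand-in for real Table Schema validation. The idea is to read the CSV once, split the rows into chunks, and check each chunk in a separate worker process.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

# Hypothetical stand-in for a Table Schema: field name -> cast function.
SCHEMA = {"id": int, "price": float}


def validate_chunk(args):
    """Check one chunk of rows; return a list of (row_number, field, message)."""
    start_row, header, rows = args
    errors = []
    for offset, row in enumerate(rows):
        row_number = start_row + offset
        if len(row) != len(header):
            errors.append((row_number, None, "wrong number of cells"))
            continue
        for name, cell in zip(header, row):
            caster = SCHEMA.get(name)
            if caster is None:
                continue  # field not described in the schema, nothing to check
            try:
                caster(cell)
            except ValueError:
                errors.append(
                    (row_number, name, f"cannot cast {cell!r} to {caster.__name__}")
                )
    return errors


def iter_chunks(reader, header, chunk_size):
    """Yield (first_row_number, header, rows) tuples of at most chunk_size rows."""
    start_row = 2  # row 1 is the header
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            return
        yield (start_row, header, rows)
        start_row += len(rows)


def validate_file_parallel(path, chunk_size=10_000, workers=None):
    """Validate one CSV file by spreading row chunks over worker processes."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return []  # empty file
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # map() returns results in chunk order, so errors stay sorted by row.
            results = pool.map(validate_chunk, iter_chunks(reader, header, chunk_size))
            return [error for chunk_errors in results for error in chunk_errors]
```

In this sketch the parsing still happens serially in the parent process, so any speed-up would come only from the per-cell checks. Row numbers count records, not physical lines, and checks that need to see the whole file (unique or foreign keys, for example) would need an extra merge step that is not shown here.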

I am tagging this as feature for the feature request, and as documentation because this option is clearly under-documented.

pierrecamilleri added the feature (New functionality) and comms (Documentation related issues) labels Dec 13, 2024

pierrecamilleri changed the title ~~parallel validation option does not seem to work~~ feature request: multiprocessing the validation of a single file Dec 13, 2024
