Using the frictionless validate --parallel flag at the command line, or validate(parallel = True) in Python code, does not seem to trigger any parallel processing.
Validation time for a moderately large CSV file (~30 MB) is the same with or without this option. In addition,
monitoring CPU usage shows that only one core seems to be used.
In the case of a datapackage, the command does not even run (see #1644).
After looking into the code, the option does not seem to be meant for multiprocessing a single file (which would be a really neat feature), but for processing the files of a datapackage in parallel.
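To make the request concrete, here is a minimal sketch of what single-file multiprocessing could look like. It uses only the Python standard library and is not based on frictionless internals: the names check_chunk, iter_chunks and validate_in_chunks are hypothetical, and the only check performed (cell count against the header) is a stand-in for real schema validation. It also assumes that rows can be checked independently, which would not hold for cross-row constraints such as uniqueness.

```python
import csv
import io
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Iterator, List, Tuple

Row = List[str]
Issue = Tuple[int, str]  # (1-based data row number, message)


def check_chunk(args: Tuple[int, List[Row], int]) -> List[Issue]:
    """Hypothetical per-chunk check: flag rows whose cell count differs from the header."""
    start, rows, expected_width = args
    return [
        (start + offset, f"expected {expected_width} cells, got {len(row)}")
        for offset, row in enumerate(rows)
        if len(row) != expected_width
    ]


def iter_chunks(reader, chunk_size: int) -> Iterator[Tuple[int, List[Row]]]:
    """Yield (first row number, rows) pairs of at most chunk_size rows each."""
    start = 1
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            return
        yield start, rows
        start += len(rows)


def validate_in_chunks(
    text: str, chunk_size: int = 10_000, executor_cls=ProcessPoolExecutor
) -> List[Issue]:
    """Split the data rows of one CSV text into chunks and check them in a worker pool."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []
    tasks = (
        (start, rows, len(header))
        for start, rows in iter_chunks(reader, chunk_size)
    )
    with executor_cls() as pool:
        # map() returns results in submission order, so issues stay sorted by row.
        return [issue for chunk in pool.map(check_chunk, tasks) for issue in chunk]
```

With the default ProcessPoolExecutor, each chunk is checked in a separate process, so a call such as validate_in_chunks(open("data.csv").read()) should be made under an if __name__ == "__main__": guard. Whether this would speed up validation in practice depends on how much time goes to parsing versus checking, which I have not measured.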
I am tagging this as feature for the feature request, and as documentation because this option is clearly under-documented.
pierrecamilleri changed the title from "parallel validation option does not seem to work" to "feature request: multiprocessing the validation of a single file" on Dec 13, 2024
To reproduce
time frictionless validate --schema schema.json --parallel data.csv
vs
time frictionless validate --schema schema.json data.csv