feature request: multiprocessing the validation of a single file #1721

Open
pierrecamilleri opened this issue Dec 13, 2024 · 1 comment
Labels
comms (Documentation related issues), feature (New functionality)

Comments

@pierrecamilleri
Collaborator

pierrecamilleri commented Dec 13, 2024

Passing the --parallel flag to frictionless validate on the command line, or calling validate(parallel=True) from Python, does not seem to trigger any parallel processing.

Validation time for a moderately large CSV file (~30 MB) is the same with or without this option. In addition, CPU monitoring shows that only one core seems to be used.

In the case of a datapackage, the command does not even run (see #1644).

To reproduce

time frictionless validate --schema schema.json --parallel data.csv

vs

time frictionless validate --schema schema.json data.csv
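For anyone who wants a number rather than eyeballing a CPU monitor, here is a rough sketch that runs both commands and compares child CPU time to wall-clock time. The helper name cpu_to_wall_ratio is made up for this example, and it assumes a Unix-like system (os.times() reports zero child times on Windows) with schema.json and data.csv in the current directory. A ratio close to 1 suggests a single busy core; a ratio well above 1 would suggest real parallelism.

```python
import os
import shutil
import subprocess
import time


def cpu_to_wall_ratio(command):
    """Run a command and return (wall_seconds, child_cpu_seconds / wall_seconds)."""
    before = os.times()
    start = time.perf_counter()
    # Output is discarded; only the timing matters here.
    subprocess.run(
        command,
        check=False,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time consumed by waited-for child processes (user + system).
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu / wall


if __name__ == "__main__":
    if shutil.which("frictionless") is None:
        print("frictionless CLI not found on PATH; nothing to measure")
    else:
        base = ["frictionless", "validate", "--schema", "schema.json"]
        for label, extra in (("serial", []), ("parallel", ["--parallel"])):
            wall, ratio = cpu_to_wall_ratio(base + extra + ["data.csv"])
            print(f"{label}: {wall:.2f}s wall, CPU/wall ratio {ratio:.2f}")
```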

@pierrecamilleri
Collaborator Author

After looking into the code, the option does not seem to be meant for multiprocessing of a single file (which would really be a neat feature), but rather for processing the files of a datapackage in parallel.

I am tagging this as feature for the feature request and as documentation, since this option is clearly under-documented.
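To make the feature request a bit more concrete, here is a minimal sketch of what chunked, per-row validation of a single CSV file could look like. This is not how frictionless works internally and it uses none of its API: the toy SCHEMA and the names check_chunk, chunked and validate_in_chunks are all hypothetical, chosen only for illustration. The idea is to split the rows into chunks, check each chunk in a worker, and merge the errors in row order.

```python
import concurrent.futures as cf
import csv
import io
from itertools import islice

# Hypothetical toy schema: column name -> parser that raises on bad input.
SCHEMA = {"id": int, "price": float}


def check_chunk(chunk):
    """Check one chunk of (row_number, row_dict) pairs; return (row_number, field) errors."""
    errors = []
    for row_number, row in chunk:
        for field, parse in SCHEMA.items():
            try:
                parse(row[field])
            except (KeyError, TypeError, ValueError):
                errors.append((row_number, field))
    return errors


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def validate_in_chunks(text, executor, chunk_size=1000):
    """Validate CSV text by mapping chunks of rows onto any concurrent.futures executor."""
    reader = csv.DictReader(io.StringIO(text))
    numbered = enumerate(reader, start=2)  # row 1 is the header
    errors = []
    # executor.map returns results in submission order, so errors stay sorted by row.
    for chunk_errors in executor.map(check_chunk, chunked(numbered, chunk_size)):
        errors.extend(chunk_errors)
    return errors


if __name__ == "__main__":
    sample = "id,price\n1,9.99\nx,5\n3,abc\n"
    with cf.ProcessPoolExecutor() as pool:
        print(validate_in_chunks(sample, pool, chunk_size=2))
        # prints [(3, 'id'), (4, 'price')]
```

This only covers checks that look at one row at a time. Anything that spans rows (unique keys, foreign keys) would need a merge step after the chunks come back, and the gain depends on whether the per-row checks cost more than shipping the rows to the workers.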

pierrecamilleri added the feature (New functionality) and comms (Documentation related issues) labels on Dec 13, 2024
pierrecamilleri changed the title from "parallel validation option does not seem to work" to "feature request: multiprocessing the validation of a single file" on Dec 13, 2024