The --exec-batch parameter runs the specified command multiple times over the biggest possible batches of file names. However, if a lot of files are to be found, and this parameter is used, it very easily becomes the bottleneck.
Batches always (appear to) run sequentially, ignoring the --threads parameter. If the command takes a long time (which tends to happen when you throw 10000 filenames at a single program lol), each invocation just sits there until it is done, maxing out one CPU core while the other 15 cores and fd itself sit idle.
It'd be nice to be able to combine the parallelism benefits of --exec with the batching benefits of --exec-batch.
My particular use case right now is that I'm bored and I want to scan a lot of files with ClamAV. Its clamscan tool appears to process files completely sequentially as well, running on one CPU, and it also takes a few seconds at startup just to load its databases. While running, it seems to consume at most ~1.5GiB of RAM. I have 64GiB of RAM and 16 cores, so running several instances in parallel seems to be well within budget on my machine.
In particular, I'm running this:
sudo fd --type f --exec-batch clamscan -i --no-summary
As of right now, my only options seem to be: use only one core; write some weird script that batches files up and dispatches instances of clamscan myself (I'm too lazy for that though lol); or use --exec instead and have all cores fully used at the cost of ~10000x the "loading database" overhead, since there would be one invocation per file.
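For what it's worth, the "weird script" workaround can probably be approximated without writing a real script, by piping fd's NUL-separated output into GNU xargs, which can both batch (-n) and parallelize (-P). This is an untested sketch; the batch size of 256 and the 16 parallel jobs are illustrative numbers I picked to match my core count, not recommendations:

```shell
# Let fd only produce the file list (-0 = NUL-separated paths),
# then have xargs split it into batches of 256 (-n 256) and run
# up to 16 clamscan processes at a time (-P 16).
sudo fd --type f -0 | sudo xargs -0 -n 256 -P 16 clamscan -i --no-summary
```

That still pays the database-loading cost once per batch rather than once total, though, so native parallel batching in fd would be nicer.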
I hope I'm not missing something lol