Isn't your Benchmark misleading? #1661

Open
bedilbek opened this issue Jan 23, 2025 · 5 comments

@bedilbek

What version of fd are you using?
fd 10.2.0

If I understand correctly, fd uses all available CPU cores by default, and I think that accounts for most of its performance advantage.

It would be better to state this explicitly in the Benchmark section and also show a comparison where only 1 thread is used via --threads=1.

@tmccombs
Collaborator

The third bullet point in the features list of the README explicitly states that the speed is due to being parallelized:

Very fast due to parallelized directory traversal.

(emphasis mine)

find is always single-threaded.

Using all of your cores is why fd is faster.

@tavianator
Collaborator

Well, fd -j1 is also faster than find for me, generally. But the parallelism is why it's significantly faster in practice.
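The single-threaded comparison discussed here could be scripted roughly as follows. This is only a sketch, not the project's benchmark: `measure`, `COMMANDS`, `ROOT`, and `GLOB` are hypothetical names chosen for illustration, and the glob search is an assumed stand-in for whatever pattern you care about. It uses fd's `--threads` option mentioned in this thread plus Python's standard library, and is Unix-only because of the `resource` module.

```python
import resource
import shutil
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once and return (wall_seconds, cpu_seconds) of the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    # Discard output so terminal rendering does not skew the timing.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # user + sys time accumulated by the child we just waited for
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


# Hypothetical search root and glob; adjust to your own tree.
ROOT = "."
GLOB = "config*.json"

COMMANDS = {
    "find": ["find", ROOT, "-iname", GLOB],
    "fd (all cores)": ["fd", "-HI", "--glob", GLOB, ROOT],
    "fd --threads=1": ["fd", "-HI", "--threads=1", "--glob", GLOB, ROOT],
}

if __name__ == "__main__":
    for label, cmd in COMMANDS.items():
        if shutil.which(cmd[0]) is None:
            print(f"{label}: {cmd[0]} not installed, skipping", file=sys.stderr)
            continue
        wall, cpu = measure(cmd)
        print(f"{label:16s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

A single run like this is noisy and sensitive to the page cache, so in practice each command would be repeated several times (or run under a dedicated benchmarking tool) before drawing conclusions.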

@aidaho

aidaho commented Feb 1, 2025

I've heard fd is much faster than find, which interests me, since my /home contains over 33 million files in about half a million directories. As you can imagine, file searches are not particularly fast.
I've benchmarked the find and fd versions available in Debian 12.

I've tried both hot-cache and cold-cache regex searches, where fd was advertised as being massively faster.

Hot cache:

time find ~/ -xdev -iregex "^config.*\.json$"
real    0m14.673s
user    0m8.657s
sys     0m5.962s

time fd --unrestricted --xdev "^config.*\.json$" ~/
real    0m18.670s
user    1m15.239s
sys     1m48.359s

On a hot cache, fd is slightly slower in wall-clock time (18.7s vs. 14.7s) while consuming a massive 12.5x the CPU time.

Cold cache:

echo 3 > /proc/sys/vm/drop_caches

time find ~/ -xdev -iregex "^config.*\.json$"
real    0m25.325s
user    0m8.953s
sys     0m8.514s

time fd --unrestricted --xdev "^config.*\.json$" ~/
real    0m19.849s
user    1m12.760s
sys     1m53.222s

On a cold cache, fd is slightly faster in wall-clock time (19.8s vs. 25.3s), with roughly the same order-of-magnitude overhead in CPU time (about 10.6x).
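The ratios quoted for these runs can be re-derived from the `time` output. The following is a small sketch of that arithmetic; the helper names (`to_seconds`, `cpu_seconds`, `compare`) are made up for illustration, and the figures are copied from the hot- and cold-cache runs above.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m15.239s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def cpu_seconds(user, sys_):
    """Total CPU time is user time plus system (kernel) time."""
    return to_seconds(user) + to_seconds(sys_)


# (real, user, sys) as reported in the comment above.
RUNS = {
    "hot": {
        "find": ("0m14.673s", "0m8.657s", "0m5.962s"),
        "fd": ("0m18.670s", "1m15.239s", "1m48.359s"),
    },
    "cold": {
        "find": ("0m25.325s", "0m8.953s", "0m8.514s"),
        "fd": ("0m19.849s", "1m12.760s", "1m53.222s"),
    },
}


def compare(run):
    """Return (fd wall time / find wall time, fd CPU time / find CPU time)."""
    f_real, f_user, f_sys = run["find"]
    d_real, d_user, d_sys = run["fd"]
    wall_ratio = to_seconds(d_real) / to_seconds(f_real)
    cpu_ratio = cpu_seconds(d_user, d_sys) / cpu_seconds(f_user, f_sys)
    return wall_ratio, cpu_ratio


for cache, run in RUNS.items():
    wall_ratio, cpu_ratio = compare(run)
    print(f"{cache}: fd/find wall = {wall_ratio:.2f}x, CPU = {cpu_ratio:.1f}x")
```

A wall ratio above 1 means fd took longer than find; below 1 means it finished sooner.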

@tavianator
Collaborator

> fd versions available in Debian 12.

What version is this? Looks like 8.6.0? It would be worth trying a newer version; a lot of performance improvements were introduced in 9.0.0.

@tavianator tavianator reopened this Feb 2, 2025
@aidaho

aidaho commented Feb 2, 2025

I've built current master in release mode. Here are the results:

Hot cache:

time ~/temp/fd/target/release/fd -HI --unrestricted --color never --xdev "^config.*\.json$" ~/
real    0m3.675s
user    0m21.727s
sys     0m15.676s

fd was 4x faster than find, while consuming about 2.5x the CPU time.

Cold cache:

time ~/temp/fd/target/release/fd -HI --unrestricted --color never --xdev "^config.*\.json$" ~/
real    0m5.645s
user    0m19.529s
sys     0m23.103s

fd was 4.5x faster than find, with roughly the same 2.5x CPU time overhead.

In conclusion, it is true that current fd master is significantly faster than find.
Yet it is perhaps not nearly as fast as it is claimed to be.

I do agree with the title of the issue: the current benchmarks look misleading to me.
Putting a warning in the README about the performance of the old prebuilt versions might be a good idea.
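Since the question in this thread is how much of the speedup comes from parallelism, one rough way to read these numbers is to divide total CPU time by wall-clock time, which gives the average number of cores kept busy. The sketch below does that for the runs reported in this thread; `busy_cores` and the labels are illustrative names, and the metric is only an approximation (time spent waiting on I/O lowers it).

```python
def busy_cores(real, user, sys_):
    """Average number of cores kept busy: (user + sys) / wall-clock time."""
    return (user + sys_) / real


# (real, user, sys) in seconds, copied from the runs in this thread.
RUNS = {
    "find, hot cache": (14.673, 8.657, 5.962),
    "find, cold cache": (25.325, 8.953, 8.514),
    "fd (Debian 12), hot cache": (18.670, 75.239, 108.359),
    "fd master, hot cache": (3.675, 21.727, 15.676),
    "fd master, cold cache": (5.645, 19.529, 23.103),
}

for label, (real, user, sys_) in RUNS.items():
    print(f"{label:28s} {busy_cores(real, user, sys_):5.2f} cores busy on average")

# Wall-clock speedup of fd master over find, per cache state.
hot_speedup = RUNS["find, hot cache"][0] / RUNS["fd master, hot cache"][0]
cold_speedup = RUNS["find, cold cache"][0] / RUNS["fd master, cold cache"][0]
print(f"speedup: hot {hot_speedup:.2f}x, cold {cold_speedup:.2f}x")
```

By this measure find stays at or below one core, as expected for a single-threaded tool, while both fd builds keep several cores busy; the master build simply gets far more work done per CPU-second than the Debian 12 package.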
