Linux specific improvements (BIG potential speedup)

Hi All,

Links:

https://crates.io/crates/fdf

https://github.com/alexcu2718/fdf


https://github.com/alexcu2718/fdf/tree/main/fd_benchmarks


I've made a rough skeleton copy of fd.

I did this because I'm learning Rust and C, I'm terribly disorganised, and I wanted to combine my efforts into one project that uses both.



So I did the *natural* thing and... made an overly complicated tool to fight it. I'm a genius, you don't need to tell me.


I have replicated about 30-40% of the features. Note that I'm not too inclined to recreate the rest.

I'm posting this here as a question: would the maintainers want me to develop the idea further and contribute it to this project, or should I take it into my own project?




Some small issues I haven't touched yet, because this is still VERY work-in-progress:

1. I have not implemented custom errors; it's pretty much `Box<dyn Error>` style... (errors are handled at the wrap-up, not early on).

2. I believe my parallelism attempts are far from ideal. I think I can refine my traversal strategy a lot.

3. I do not know how reliable my methodology would be on e.g. btrfs or ext2, since it relies on basic, cheap syscalls.
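
On point 3, one known source of filesystem-dependent behaviour is the `d_type` field of a directory entry: the Linux man pages say applications must handle `DT_UNKNOWN`, because not every filesystem fills the type in. The post does not say that fdf relies on `d_type`, so the following is only a sketch under that assumption. `Kind` and `kind_from_d_type` are hypothetical names, and the fallback uses the standard library's `symlink_metadata` (an lstat-style call) rather than anything from fdf itself.

```rust
use std::fs;
use std::io;
use std::path::Path;

// d_type values as defined in <dirent.h> on Linux.
const DT_UNKNOWN: u8 = 0;
const DT_DIR: u8 = 4;
const DT_REG: u8 = 8;
const DT_LNK: u8 = 10;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Kind {
    File,
    Dir,
    Symlink,
    Other,
}

/// Hypothetical helper: trust `d_type` when the filesystem filled it in,
/// otherwise pay for one lstat-style call on `path`.
pub fn kind_from_d_type(d_type: u8, path: &Path) -> io::Result<Kind> {
    match d_type {
        DT_REG => Ok(Kind::File),
        DT_DIR => Ok(Kind::Dir),
        DT_LNK => Ok(Kind::Symlink),
        DT_UNKNOWN => {
            // Slow path: no type was reported for this entry.
            let ft = fs::symlink_metadata(path)?.file_type();
            Ok(if ft.is_symlink() {
                Kind::Symlink
            } else if ft.is_dir() {
                Kind::Dir
            } else if ft.is_file() {
                Kind::File
            } else {
                Kind::Other
            })
        }
        // FIFOs, sockets, block and character devices.
        _ => Ok(Kind::Other),
    }
}
```

With a fallback like this, the fast path costs nothing extra on filesystems that report types, and only entries reported as unknown pay for the additional syscall.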






Quick rundown of the methodology:




I basically remade a read_dir that takes and returns raw bytes. This is handy because I can pass the bytes straight to regex without any extra cost (and also recurse without any overhead!).
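
To illustrate why raw bytes help, here is a minimal sketch that filters file names without ever validating or converting them to UTF-8. It is not the custom read_dir from the post: it uses `std::fs::read_dir` and a hand-rolled byte check standing in for the benchmark pattern `[0-9]\.jpg$`. The function names are hypothetical; in a real tool the check would more likely be a byte-oriented regex (the `regex` crate has a `regex::bytes` module for this).

```rust
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Byte-level stand-in for the pattern `[0-9]\.jpg$`:
/// the name must end in ".jpg" with an ASCII digit right before it.
pub fn ends_with_digit_jpg(name: &[u8]) -> bool {
    name.len() >= 5
        && name.ends_with(b".jpg")
        && name[name.len() - 5].is_ascii_digit()
}

/// Collect the names in `dir` that match, comparing raw bytes only.
pub fn matching_names(dir: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut hits = Vec::new();
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name();
        // On Unix an OsString is just bytes, so this view is free
        // and works even for names that are not valid UTF-8.
        let bytes: &[u8] = name.as_bytes();
        if ends_with_digit_jpg(bytes) {
            hits.push(bytes.to_vec());
        }
    }
    hits.sort();
    Ok(hits)
}
```

For example, `ends_with_digit_jpg(b"holiday_7.jpg")` is true and `ends_with_digit_jpg(b"holiday.jpg")` is false. A name containing invalid UTF-8 such as `b"\xff\xfe9.jpg"` is matched just the same, with no lossy conversion step.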

I've minimised heap allocations, though not enough, I believe; I'm still very new to C/Rust.

By using cheaper syscalls than e.g. fstat, I manage to keep the speed pretty damn good, and I get a lot of metadata for free. Notable exceptions are symlinks and executables, although filtering on these is still faster than fd.
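
The split between "free" and "not free" metadata can be sketched with the standard library alone. This is an illustration, not fdf's code: `Counts`, `count_kinds` and `is_executable` are hypothetical names. The std documentation notes that `DirEntry::file_type` needs no extra system call on most Unix platforms, whereas permission bits are not part of a directory entry, so an executable check needs a stat-family call per file. That is presumably (an assumption) why executables are one of the exceptions mentioned above.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub files: usize,
    pub dirs: usize,
    pub symlinks: usize,
}

/// Cheap pass: classify entries from the directory listing alone.
pub fn count_kinds(dir: &Path) -> io::Result<Counts> {
    let mut counts = Counts::default();
    for entry in fs::read_dir(dir)? {
        // Usually answered from the directory entry itself on Linux.
        let ft = entry?.file_type()?;
        if ft.is_symlink() {
            counts.symlinks += 1;
        } else if ft.is_dir() {
            counts.dirs += 1;
        } else if ft.is_file() {
            counts.files += 1;
        }
    }
    Ok(counts)
}

/// Expensive check: permission bits are not in the directory entry,
/// so this costs one stat-family call (and follows symlinks).
pub fn is_executable(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    Ok((mode & 0o111) != 0)
}
```

A filter such as `--type l` can be answered by `count_kinds`-style logic alone, while an executable filter has to call something like `is_executable` on each candidate.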


There's a lot of unsafe code in here, mostly raw pointer casts. I've tested it on recent Arch and Debian installs; it works out a lot quicker and I've seen no signs of UB.







NOTE:

I HAVE NOT DONE THE 'NO PATTERN' benchmarks, as there are some weird bugs where the results don't align.
(There are weird issues with either truncation or an extra slash being added? Not sure.
Given that the rest of the benchmarks are spot on, I'm wondering if it's temporary files or whatever.)


The results in the benchmarks shown here match 100% (IT IS MUCH FASTER though).

The following benchmarks (works on my machine™):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf -HI '.*[0-9]\.jpg$' '/home/alexc'` | 354.1 ± 1.3 | 352.6 | 356.6 | 5.88 ± 0.08 |
| `fdf  '.*[0-9]\.jpg$' '/home/alexc'` | 60.2 ± 0.8 | 59.1 | 63.8 | 1.00 |
| `fd -HI '.*[0-9]\.jpg$' '/home/alexc'` | 460.0 ± 13.8 | 446.8 | 490.4 | 7.64 ± 0.25 |
| `fd '.*[0-9]\.jpg$' '/home/alexc'` | 152.2 ± 1.1 | 150.4 | 154.8 | 2.53 ± 0.04 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf -HI --extension 'jpg' '' '/home/alexc'` | 451.2 ± 2.7 | 447.8 | 456.0 | 1.00 |
| `fd -HI --extension 'jpg' '' '/home/alexc'` | 669.9 ± 13.0 | 659.1 | 703.1 | 1.48 ± 0.03 |


| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf .  '/home/alexc' -HI --type l` | 489.0 ± 2.2 | 484.6 | 491.7 | 1.00 |
| `fd -HI '' '/home/alexc' --type l` | 622.2 ± 3.2 | 616.3 | 625.9 | 1.27 ± 0.01 |




I will say that this has some pretty IFFY* choices performance-wise in some regards; mostly I wanted to get the main skeleton working. I'm also aware I might need to totally redesign some aspects. What do you expect from a guy who's been learning for only 4 months because he's sick of his shitty python/bash/C# job?



(*Though I think my DirEntry is pretty damn good efficiency-wise!)



So, please let me know your thoughts. If you'd like me to do a proper rewrite, and you'd accept the code (if it looked good), I'd be happy to do so.


Thanks,

Alex









Linux specific improvements (BIG potential speedup) #1687
