Skip to content

Conversation

@Gofastasf
Copy link
Contributor

@Gofastasf Gofastasf commented Mar 29, 2025

  • Runtime switching of godirwalk and filepath.WalkDir
  • Default to godirwalk as it still keeps 15 - 30% efficiency memory wise.
  • Benchmarked godirwalk vs filepath.WalkDir

Related to #212

@Gofastasf Gofastasf requested a review from alex-aizman as a code owner March 29, 2025 19:58
@Gofastasf
Copy link
Contributor Author

Gofastasf commented Mar 29, 2025

Here is the benchmark code I ran.

https://go.dev/play/p/oBjMnTFm6Sn

@alex-aizman
Copy link
Member

will look at it on Mon; now running github workflows: lint and integration tests

@alex-aizman
Copy link
Member

alex-aizman commented Mar 29, 2025

lint already failed, take a look. Also, maybe rebase on top

make lint

Must pass locally.

@alex-aizman
Copy link
Member

running more tests now; separately, note:

@alex-aizman
Copy link
Member

There are two test fails:

Secondly, benchmark must be part of the commit - goes under bench/micro

I slightly modified it as follows:

@@ -68,12 +68,19 @@
        return dir
 }

+const (
+       numFilesDflt = 500_000   // NOTE: 50K files is way too small
+       scratchDftl  = 64 * 1024
+       depthDflt    = 16
+)
+
 // BenchmarkGodirwalkLargeDir benchmarks godirwalk on a flat directory.
 func BenchmarkGodirwalkLargeDir(b *testing.B) {
-       dir := setupLargeDir(b, 50000)
+       dir := setupLargeDir(b, numFilesDflt)
        defer os.RemoveAll(dir)
-       scratch := make([]byte, 32*1024)
+       scratch := make([]byte, scratchDftl)

+       b.ResetTimer()
        var count int64
        for i := 0; i < b.N; i++ {
                err := godirwalk.Walk(dir, &godirwalk.Options{
@@ -94,9 +101,10 @@
...

Results on my machine:

$ go test -bench=. -benchtime=10s -benchmem
goos: linux
goarch: amd64
pkg: github.com/NVIDIA/aistore/tmp
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
BenchmarkGodirwalkLargeDir-16                 20         509249643 ns/op        69046790 B/op    1550438 allocs/op
BenchmarkWalkDirLargeDir-16                   19         541173847 ns/op        99095748 B/op    1553091 allocs/op
BenchmarkGodirwalkDeepTree-16                 25         407719713 ns/op        102886230 B/op   1540834 allocs/op
BenchmarkWalkDirDeepTree-16                   25         406888381 ns/op        132332444 B/op   1540819 allocs/op
PASS
ok      github.com/NVIDIA/aistore/tmp   268.725s

There are a couple observations but most importantly, godirwarlk still maintains a fairly significant advantage memory-wise.

@alex-aizman
Copy link
Member

finally, I'd recommend a bench that runs on multiple cores, as in:

  • b.RunParallel(func(pb *testing.PB) { ... }

More realistic, given that we never walk filesystems in a serial single-threaded way.

Copy link
Contributor

@VirrageS VirrageS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as @alex-aizman mentioned we have to have benchmarks as part of the commit.

fs/walkbck.go Outdated
}

func (j *joggerBck) cb(fqn string, de DirEntry) error {
func (j *joggerBck) cb(fqn string, de fs.DirEntry, err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check if err != nil in this function?

fs/walk.go Outdated
import (
"context"
iofs "io/fs"
"io/fs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use iofs "io/fs" across the board

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Gofastasf
Copy link
Contributor Author

@alex-aizman @VirrageS Added benchmark code. Do let me know if there is anymore correction needed. The test failure seems to be not related to code.

@Gofastasf
Copy link
Contributor Author

Gofastasf commented Apr 1, 2025

Benchmark on my macine.

goos: linux
goarch: amd64
pkg: github.com/NVIDIA/aistore/bench/walk
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkGodirwalkLargeDir-8          82         437096095 ns/op        67334033 B/op    1512335 allocs/op
BenchmarkWalkDirLargeDir-8            90         421606610 ns/op        97212893 B/op    1511242 allocs/op
BenchmarkGodirwalkDeepTree-8         132         249528975 ns/op        101420901 B/op   1508148 allocs/op
BenchmarkWalkDirDeepTree-8           129         273513059 ns/op        130853701 B/op   1508303 allocs/op
PASS
ok      github.com/NVIDIA/aistore/bench/walk    821.341s

As godirwalk is more memory efficient do you want to keep godirwalk as it is now ?

@alex-aizman
Copy link
Member

started now github workflows on your last commit

As godirwalk is more memory efficient do you want to keep godirwalk as it is now ?

That is certainly the main question as godirwalk maintains a 15% and 30% edge over the standard library. Which was not surprising 3 years ago but it is, frankly, somewhat of a surprise today...

I don't know. How about a build tag? What do you think? Ideas welcome.

@Gofastasf
Copy link
Contributor Author

Gofastasf commented Apr 3, 2025

Sorry for the delay. Got caught up with something else. Will be fixing this asap.

That is certainly the main question as godirwalk maintains a 15% and 30% edge over the standard library. Which was not surprising 3 years ago but it is, frankly, somewhat of a surprise today...

Actually I started off this seeing so many other repos migrating to std lib WalkDir referencing the blog which benchmarked with linux kernel as test data. The godirwalk maintainer also stopped maintaining it linking the reference to that blog.
But it was kind of misleading. That benchmark ignored the buffer option and hence giving massive performance. It was benchmarked with hyperfine and was also not considering memory efficiency.

How about a build tag? What do you think? Ideas welcome.

That would be a good idea to consider but I also came to know about this lib fastwalk which is a parallel version of filepath.WalkDir.

According to fastwalk benchmark they put on their repo filepath.WalkDir has significant perf benifit to godirwalk
image

But will have look into their benchmark code carefully to confirm. Do you think there were any discrepancies in our benchmark ?

@alex-aizman
Copy link
Member

alex-aizman commented Apr 3, 2025

Never heard about it. Quote:

"Package fastwalk provides a fast parallel version of filepath.WalkDir that is ~2.5x faster on macOS, ~4x faster on Linux, ~6x faster on Windows, allocates 50% less memory, and requires 25% fewer memory allocations. Additionally, it is ~4-5x faster than godirwalk across OSes."

Well, that's quite a claim to make. Unclear though why do they not appear to be very popular...

Anyway, back to this PR. Other than the benchmark (a solid piece of code) most of the changes look to me quite cosmetic:

--- a/ais/tgttxn.go
+++ b/ais/tgttxn.go
@@ -7,6 +7,7 @@ package ais
 import (
        "errors"
        "fmt"
+       iofs "io/fs"
        "net/http"
        "net/url"
        "os"
@@ -1001,7 +1002,7 @@ func prmScan(dirFQN string, prmMsg *apc.PromoteArgs) (fqns []string, totalN int,
                cksum      *cos.CksumHash
                autoDetect = !prmMsg.SrcIsNotFshare || !cmn.Rom.Features().IsSet(feat.DontAutoDetectFshare)
        )
-       cb := func(fqn string, de fs.DirEntry) (err error) {
+       cb := func(fqn string, de iofs.DirEntry, _ error) (err error) {
                if de.IsDir() {
                        return
                }
@@ -1021,7 +1022,7 @@ func prmScan(dirFQN string, prmMsg *apc.PromoteArgs) (fqns []string, totalN int,
                cksum = cos.NewCksumHash(cos.ChecksumCesXxh)
        }
        if prmMsg.Recursive {
-               opts := &fs.WalkOpts{Dir: dirFQN, Callback: cb, Sorted: true}
+               opts := &fs.WalkOpts{Dir: dirFQN, Callback: cb}
...

I wonder if it'd be possible to absorb it into some sort of runtime polymorphism - at runtime choice of one possible walk or the other. With default being godirwalk at least for now.

As far as fastwalk it's your call. For starters I'd be skeptical but you never know...

Finally, there's a different idea of what do we REALLY need from this type "walking" functionality. Hint: think NVMe drives.

But - one step at a time. Let's get to some closure with this PR.

PS. Please read and accept CONTRIBUTING.md (sign-off == acceptance)

@Gofastasf
Copy link
Contributor Author

I dont recommend using fastwalk as its fairly new and nobody seems to be using it atleast for now. I was just hinting at their benchmark numbers and std lib WalkDir was performing much better than godirwalk in their benchmark.

I wonder if it'd be possible to absorb it into some sort of runtime polymorphism - at runtime choice of one possible walk or the other. With default being godirwalk at least for now.

Build tag would be an easier integration I think. What do you think ?

@alex-aizman
Copy link
Member

sure, let's do build tag. There are pros and cons, we'll see how it looks

@Gofastasf
Copy link
Contributor Author

Gofastasf commented Apr 7, 2025

Hey I went with runtime switching. No rebuild needed easier switching and extensible. Do let me know if there is any correction needed or if you want to do build tag. Thanks!

@Gofastasf Gofastasf changed the title Replace: godirwalk with filepath.WalkDir feat: runtime switching for godirwalk and filepath.WalkDir and benchmark. Apr 7, 2025
@alex-aizman
Copy link
Member

I'll check later. Meantime, started all github workflows: lint, et al.

fs/walk.go Outdated
}
if cmn.IsErrObjLevel(err) {
ew.counter.Inc()
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In godirwalk we skip the entire directory by returning godirwalk.SkipNode. Should we do a similar thing for a standardWalk? We can return filepath.SkipDir: https://pkg.go.dev/path/filepath?utm_source=godoc#SkipDir

Copy link
Contributor Author

@Gofastasf Gofastasf Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nil fits better here. As per the WalkDirFunc docs (https://pkg.go.dev/io/fs#WalkDirFunc), SkipDir skips the current directory if d.IsDir() is true, but for files, it skips the parent directory. Returning nil skips just the node (file or directory) and continues the walk, which aligns with godirwalk.SkipNode’s behavior (https://pkg.go.dev/github.com/karrick/godirwalk#SkipNode).

@alex-aizman
Copy link
Member

alex-aizman commented Apr 8, 2025

I am ready to merge but I need you to sign-off as previously mentioned

The following is optional (but still desirable), so let me know.

  1. rebase on top of the current main; update linter as follows: make lint-update; run make lint and PASS it
  2. add fs/walk_godir.go and fs/walk_stdlib.go, and move the corresponding sources accordingly
  3. remove:
var useWalker walker = &godir{}
 
func UseGodirWalker()  { useWalker = &godir{} }
func UseStdlibWalker() { useWalker = &stdlib{} }

None of it it is actually needed. Use //nolint to overcome lint errors (that will show up). Hardcode godir walker with a BIG BOLD COMMENT on how to switch to the standard one.

PS. Maybe in a separate commit.

@Gofastasf
Copy link
Contributor Author

Gofastasf commented Apr 9, 2025

I need you to sign-off as previously mentioned

I realised it was GPG signed and wont show up on commit message after the CI was ran so thought to do it handle while merging.

@alex-aizman
Copy link
Member

I need you to sign-off as previously mentioned

I realised it was GPG signed and wont show up on commit message

No, that's git commit -s option. As in: https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--s

See https://github.com/NVIDIA/aistore/blob/main/CONTRIBUTING.md#signing-off-commits

@Gofastasf
Copy link
Contributor Author

Hey sorry I meant I did GPG signing (-S) instead of the other one (-s).

- Add runtime switching with default as godirwalk for better memory efficiency.
- Benchmark godirwalk vs filepath.WalkDir.

Signed-off-by: Gofastasf <[email protected]>
@Gofastasf
Copy link
Contributor Author

Hey I think it's ready to get merged now!

@alex-aizman
Copy link
Member

@Gofastasf thank you so much. Incredible work and perseverance.

@alex-aizman alex-aizman merged commit 8b41992 into NVIDIA:main Apr 10, 2025
13 checks passed
@Gofastasf
Copy link
Contributor Author

@alex-aizman Hey thank you so much. Means a lot!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants