Skip to content

alternative iterators API #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

43081j
Copy link
Contributor

@43081j 43081j commented Jul 13, 2025

an alternative to #154

draft until one day we can settle on a direction. probably end up with a 3rd branch once this one has been reviewed

43081j and others added 17 commits July 10, 2025 07:50
Doesn't work yet, but future James will sort that out.
Works almost but is an absolute trainwreck. Will write it better now
that i know the problem.
Of course, we need a queue! Now things work because we're trying to
async iterate a synchronous task within an asynchronous task.
Sometimes it is possible we complete before the next iteration has run,
so we leave a promise dangling or some such thing.
We currently exceed `maxFiles` and slice the result later. This doesn't
work in an iterator since there is no end "result".

This change instead makes the walker stop walking when `maxFiles` is
met, so we already have the correctly sized result set.
* refactor: return an `AsyncGenerator` instead

* chore: update test
Seems a waste jumping through hoops with sync-async when we can just
inline cleaner logic.
state.visited.push(crawlPath);

try {
const entries = await promisify(fs.readdir)(crawlPath || ".", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this too expensive? it's promisifying the function each time it calls a directory

Copy link
Contributor Author

@43081j 43081j Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a TODO somewhere saying this exact thing

i did already know this when writing it, but wasn't sure at the time what to do with it yet. since someone can change the fs under the hood, we can't really hold a promisified reference around in case they change it. or we do, and just assume nobody will mutate things under us (my preferred solution)

3rd option is we change FSLike to include promises - so custom FS implementations must provide some async methods then

@thecodrr
Copy link
Owner

I was able to get a prototype working with full iterator support using opendir. Will open a PR soon for testing and review. It's still a little rough but looks "okay" overall.

@43081j
Copy link
Contributor Author

43081j commented Jul 14, 2025

this is a "full iterator", switching it to use opendir is trivial now since its just an implementation detail

do you mean you made a new branch unrelated to this and my other one?

if that's the case, feedback would have been nice as these two branches took a lot of effort. and every comment that's been made was stuff i was already well aware of, but didn't want scope creep. i just needed to know we were open to larger change and would have done it

i just wish you said sooner so i didn't spend so much time building this 😅

@thecodrr
Copy link
Owner

do you mean you made a new branch unrelated to this and my other one?

Not completely unrelated since the whole thing is still based on your PR (and idea). I was experimenting yesterday with your PR trying different approaches and came upon it. Still not sure how it performs or even if it is merge worthy yet.

i just wish you said sooner so i didn't spend so much time building this 😅

I am sorry if the above comment came across as me undermining your work. My intention was to share a different approach to the same problem and see if we can reach a better solution overall. Especially since its a fundamentally different approach, I couldn't just leave a review.

I hope that clarifies things.

@43081j
Copy link
Contributor Author

43081j commented Jul 14, 2025

No worries at least this is still somewhat useful to help us figure out how we want to implement this

To be honest, I think we need to rework the various push functions. Split them into test functions ("should we push this"), and push functions (actually push it).

Then the iterator can share the same tests but do its own pushing (yielding).

If we switch to opendir, we won't need any of the push functions but will need the test functions (to avoid duplicating conditional logic)

@thecodrr
Copy link
Owner

@43081j that is certainly a design limitation right now. The main idea behind this approach was to reduce branching on the hot paths e.g. if you don't use a filter, the push function should be inlined by the engine. Is there a way to separate the "check" part and the "push" part without introducing branching at the callsite?

@43081j
Copy link
Contributor Author

43081j commented Jul 14, 2025

Maybe we can have a "file filter" and a "directory filter" we build up at initialisation time, the same way we do with push functions now

Then any time we want to push something, we call the appropriate filter and just directly push (or have a separate push function)

So basically you would fileFilter.build(options) or something. Which checks if there's filters and returns a static truthy function if there aren't

@thecodrr
Copy link
Owner

I see no way to yield without at least if condition. Even if we build a filter on init, we'll have to check its result inside the loop. The only other alternative I found was to build a generator version of pushFile and pushDirectory and then just yield *. Unfortunately, that is very slow compared to just doing if then yield.

@43081j
Copy link
Contributor Author

43081j commented Jul 15, 2025

yes we would need a condition either way, like this:

if (this.filterFileFn(path)) {
  yield path;
}

where:

filterFileFn = filterFile.build(options, state);

// and elsewhere

const allFilter = () => true;
function build(options, state) {
  if (!options.filters) {
    return allFilter;
  }

  // etc...
}

is your concern that the non-iterator code can bypass this because it won't have a condition? it'll just always push(path) and rely on the fact the push function does no conditional logic (if no filters were set)?

there's not really a clean way to achieve that when yielding i think. but a no-op function that is always true should be pretty fast

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants