feat: add withIterator #154


Open
43081j wants to merge 14 commits into master from iterator-madness

Conversation

43081j (Contributor) commented Jul 2, 2025

Fixes #40

It isn't a true iterator of paths, since we still read each directory as a batch under the hood, but it does mean we're only ever holding at most what's in one directory in memory.

We could one day follow up on this by switching to `opendir` and reading entries one by one.

If this implementation seems of any interest, let me know and I can review it with you.

@thecodrr @SuperchupuDev

Notable changes

Tests use an execute helper

An execute helper has been extracted so we can run the tests against all output types without caring whether the result is an iterator or not.

Now in a test, you just `await execute(api, type);` and it'll handle normalising the result.
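
As a rough illustration of the shape such a helper can take (a hypothetical sketch, not the PR's actual code; `api` is assumed to be the builder returned by `crawl()`):

```ts
type OutputType = "sync" | "withPromise" | "withIterator";

// Hypothetical `execute` helper: normalise every output type to a plain
// array of paths so the assertions in each test stay identical.
async function execute(api: any, type: OutputType): Promise<string[]> {
  if (type === "sync") return api.sync();
  if (type === "withPromise") return await api.withPromise();
  // withIterator: drain the async iterator into an array.
  const paths: string[] = [];
  for await (const path of api.withIterator()) {
    paths.push(path);
  }
  return paths;
}
```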

onlyCounts tests disabled for withIterator type

It wouldn't throw if we tried, but onlyCounts doesn't make any sense when using withIterator since there's no result; it'd be an empty iteration. So we just don't test this in that case (and I suspect there'd be type errors too).

group-files now exposes a pushGroup parameter

The file grouping mechanism decides whether to group or not, then pushes a group if it needs to.

With an iterator, we need a way to hook into the underlying push, not the outer groupFiles function.

For that reason, I've introduced a `pushGroup` parameter, which is `(item, arr) => void`.

This means we can pass a custom `pushGroup` in iterator mode which doesn't actually push to `arr`, but feeds the group into the iterable. In all other cases, we use a default `pushGroup` which just pushes to `arr`.
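
A minimal sketch of the idea (the `Group` shape and helper names here are assumptions, not the PR's literal code):

```ts
interface Group {
  directory: string;
  files: string[];
}

type PushGroup = (group: Group, arr: Group[]) => void;

// Default behaviour: collect groups into the result array, as before.
const defaultPushGroup: PushGroup = (group, arr) => {
  arr.push(group);
};

// Iterator mode: ignore `arr` and hand each group to whatever feeds the
// async iterable (`emit` is an assumed callback, not fdir API).
const makeIteratorPushGroup =
  (emit: (group: Group) => void): PushGroup =>
  (group, _arr) => {
    emit(group);
  };
```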

pushFile and pushDirectory also have a pushPath parameter

For the same reason as `groupFiles`, both `pushFile` and `pushDirectory` now take a `pushPath` parameter.

This is used when we want to push a path, whether it is a directory or not, to the result set. In iterator mode, we can use this to feed the path into the iterable. Otherwise, we push to `this.state.paths` (`arr`).
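
Sketched in the same spirit (hypothetical signature; the real `pushFile` also handles filters and the like):

```ts
type PushPath = (path: string, paths: string[]) => void;

// The default pushPath just appends to the array (i.e. this.state.paths);
// iterator mode would pass a pushPath that feeds the iterable instead.
function pushFile(
  filename: string,
  paths: string[],
  counts: { files: number },
  pushPath: PushPath = (p, arr) => {
    arr.push(p);
  }
): void {
  pushPath(filename, paths);
  counts.files++;
}
```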

Counts#directories takes filters into account

The directories count used to be all directories we have visited, rather than directories we have matched. This means filters and such don't get taken into account (e.g. if we don't include dirs, we still count them right now, and if we filter dirs, we sometimes count the ones we filtered out).

Meanwhile, files count was always the files we have matched.

In iterator mode, we need a way to know how many paths (files or directories) we have pushed so far, as we don't have a result set anywhere to check the length of. This would be as simple as `directories + files` if not for the inconsistency mentioned above.

So I have fixed the inconsistency, only incrementing `directories` for matched directories (i.e. those that end up in the result).
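
For example (an illustrative sketch, not fdir's literal code), a directory now only bumps the count when it actually makes it into the result:

```ts
type Filter = (path: string, isDirectory: boolean) => boolean;

function maybePushDirectory(
  dirPath: string,
  paths: string[],
  counts: { files: number; directories: number },
  filters: Filter[],
  includeDirs: boolean
): void {
  if (includeDirs && filters.every((filter) => filter(dirPath, true))) {
    paths.push(dirPath);
    counts.directories++; // counted only when matched, not merely visited
  }
}
```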

WalkerIterator

A basic iterator that essentially loops until the async callback of the existing Walker has been called.

On each iteration, it waits until we either push a new path or we complete (so it isn't just mindlessly looping).
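
The general shape of that pattern, sketched as a standalone async generator (this is not fdir's actual `WalkerIterator`; `start`, `push` and `done` are assumed names for the hooks the walker would call):

```ts
// Queue-and-wake bridge: the walker pushes paths via `push` and signals
// completion via `done`; the generator drains the queue and, when it is
// empty, waits on a promise until one of the two callbacks fires.
async function* iterate<T>(
  start: (push: (item: T) => void, done: () => void) => void
): AsyncGenerator<T> {
  const queue: T[] = [];
  let finished = false;
  let wake: (() => void) | undefined;

  start(
    (item) => {
      queue.push(item);
      wake?.();
      wake = undefined;
    },
    () => {
      finished = true;
      wake?.();
      wake = undefined;
    }
  );

  for (;;) {
    // Drain anything queued since the last yield.
    while (queue.length > 0) {
      yield queue.shift()!;
    }
    if (finished) return;
    // Wait until a new path is pushed or the walk completes.
    await new Promise<void>((resolve) => {
      wake = resolve;
    });
  }
}
```

In iterator mode, the overridden `pushPath`/`pushGroup` callbacks would play the role of `push` here.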

@43081j 43081j marked this pull request as ready for review July 2, 2025 13:16

```ts
)
  return;

const files = this.getArray(this.state.paths);
for (let i = 0; i < entries.length; ++i) {
  if (maxFiles && counts.directories + counts.files >= maxFiles) {
```

Contributor
I wonder if this should happen after the push so that you only check it when adding something new vs checking it for every entry. It might be faster that way.

Contributor
Or maybe the check should happen inside the push methods and they can call the abort controller?

Contributor Author
I did originally have it in the push function, but it means you either need:

  • to give the pushFile and pushDirectory functions access to the counts and maxFiles
  • to do the check in the default pushPath callback, which means you need to remember to do it in any overriding ones too (such as the iterator's)

The way it is now, we don't need to pass these things all the way down and can do the check in one place.

43081j added 12 commits July 10, 2025 07:50
* Doesn't work yet, but future James will sort that out.
* Almost works but is an absolute trainwreck. Will write it better now that I know the problem.
* Of course, we need a queue! Now things work because we're trying to async iterate a synchronous task within an asynchronous task.
* Sometimes it is possible we complete before the next iteration has run, so we leave a promise dangling or some such thing.
* We currently exceed `maxFiles` and slice the result later. This doesn't work in an iterator since there is no end "result". This change instead makes the walker stop walking when `maxFiles` is met, so we already have the correctly sized result set.
43081j force-pushed the iterator-madness branch from a15c95c to 8cc1f85 on July 10, 2025 06:50
SuperchupuDev and others added 2 commits July 11, 2025 11:55
* refactor: return an `AsyncGenerator` instead

* chore: update test

thecodrr (Owner) left a comment

I am not sure this is the best approach. It looks super hacky and error-prone. I am sure we can find a better approach, even if that requires rewriting/duplicating some parts of fdir instead of shoehorning a solution onto the existing implementation.

43081j commented Jul 11, 2025

Can you expand on which parts you think are hacky?

The outer part is just a generator; whatever way we implement this, that'll have to exist somewhere.

The way we hook into which paths have been emitted is the way it is because I didn't want to change how we read directories under the hood.

If we change the way reading directories works, I imagine we could trigger the next read only when the next iteration happens.

thecodrr (Owner)

> Can you expand on which parts you think are hacky?

The way the `pushPath` and `pushGroup` functions are passed down is not the most elegant. I also think an iterator should be different from the normal walker; there's no need to support `onlyCounts` and `withGroups` in an iterator. The `WalkerIterator` itself looks very confusing since it is basically overriding an existing implementation, and I am sure there are edge cases it doesn't handle (e.g. what happens to items that enter the `#queue` while the for loop is running?).

> The way we hook into which paths have been emitted is the way it is because I didn't want to change how we read directories under the hood.
>
> If we change the way reading directories works, I imagine we could trigger the next read only when the next iteration happens.

It wouldn't be a bad idea to write separate directory walking logic for the iterator. We can reuse whatever parts we can.

43081j commented Jul 12, 2025

> There's no need to support `onlyCounts` and `withGroups` in an iterator

To be clear, it doesn't support `onlyCounts`; it does support `withGroups`.

> The `WalkerIterator` itself looks very confusing since it is basically overriding an existing implementation, and I am sure there are edge cases it doesn't handle (e.g. what happens to items that enter the `#queue` while the for loop is running?)

I was just trying to provide an iterator around what already exists; it can be written better if we don't need to use the existing implementation. Worth noting it is a regular pattern for implementing an iterator over a mixture of async/sync work, though. I haven't done anything unusual here; it's a fairly normal solution if we want to use the existing implementation.

It seems clear that you'd rather just use a new implementation, so I will revisit this some day when I get time.


Successfully merging this pull request may close these issues.

Stream API / Async Iterator API