feat: add withIterator #154


Open
43081j wants to merge 14 commits into master from iterator-madness

Conversation

43081j (Contributor) commented Jul 2, 2025

Fixes #40

It isn't a true iterator of paths, since we still read each directory as a batch under the hood, but it does mean we're only ever holding at most what's in one directory in memory.

We could one day follow up on this by switching to `opendir` and reading entries one by one.

If this implementation seems of any interest, let me know and I can review it with you.

@thecodrr @SuperchupuDev

Notable changes

Tests use an execute helper

An execute helper has been extracted so we can run the tests against all output types without caring whether the result is an iterator or not.

Now in a test, you just `await execute(api, type);` and it'll handle normalising the result.
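
As a rough illustration of the shape such a helper can take (a hypothetical sketch, not the PR's actual code; `api` is assumed to be the builder returned by `crawl()`):

```ts
type OutputType = "sync" | "withPromise" | "withIterator";

// Hypothetical `execute` helper: normalise every output type to a plain
// array of paths so the assertions in each test stay identical.
async function execute(api: any, type: OutputType): Promise<string[]> {
  if (type === "sync") return api.sync();
  if (type === "withPromise") return await api.withPromise();
  // withIterator: drain the async iterator into an array.
  const paths: string[] = [];
  for await (const path of api.withIterator()) {
    paths.push(path);
  }
  return paths;
}
```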

onlyCounts tests disabled for withIterator type

It wouldn't throw if we tried, but onlyCounts doesn't make any sense when using withIterator since there's no result; it'd be an empty iteration. So we just don't test this in that case (and I suspect there'd be type errors too).

group-files now exposes a pushGroup parameter

The file grouping mechanism decides whether to group or not, then pushes a group if it needs to.

With an iterator, we need a way to hook into the underlying push, not the outer groupFiles function.

For that reason, I've introduced a `pushGroup` parameter, which is `(item, arr) => void`.

This means we can pass a custom `pushGroup` in iterator mode which doesn't actually push to `arr`, but feeds the group into the iterable. In all other cases, we use a default `pushGroup` which just pushes to `arr`.
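
A minimal sketch of the idea (the `Group` shape and helper names here are assumptions, not the PR's literal code):

```ts
interface Group {
  directory: string;
  files: string[];
}

type PushGroup = (group: Group, arr: Group[]) => void;

// Default behaviour: collect groups into the result array, as before.
const defaultPushGroup: PushGroup = (group, arr) => {
  arr.push(group);
};

// Iterator mode: ignore `arr` and hand each group to whatever feeds the
// async iterable (`emit` is an assumed callback, not fdir API).
const makeIteratorPushGroup =
  (emit: (group: Group) => void): PushGroup =>
  (group, _arr) => {
    emit(group);
  };
```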

pushFile and pushDirectory also have a pushPath parameter

For the same reason as `groupFiles`, both `pushFile` and `pushDirectory` now take a `pushPath` parameter.

This is used when we want to push a path, whether it is a directory or not, to the result set. In iterator mode, we can use this to feed the path into the iterable. Otherwise, we push to `this.state.paths` (`arr`).
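
Sketched in the same spirit (hypothetical signature; the real `pushFile` also handles filters and the like):

```ts
type PushPath = (path: string, paths: string[]) => void;

// The default pushPath just appends to the array (i.e. this.state.paths);
// iterator mode would pass a pushPath that feeds the iterable instead.
function pushFile(
  filename: string,
  paths: string[],
  counts: { files: number },
  pushPath: PushPath = (p, arr) => {
    arr.push(p);
  }
): void {
  pushPath(filename, paths);
  counts.files++;
}
```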

Counts#directories takes filters into account

The directories count used to be all directories we have visited, rather than directories we have matched. This means filters and such don't get taken into account (e.g. if we don't include dirs, we still count them right now, and if we filter dirs, we sometimes count the ones we filtered out).

Meanwhile, files count was always the files we have matched.

In iterator mode, we need a way to know how many paths (files or directories) we have pushed so far, as we don't have a result set anywhere to check the length of. This would be as simple as `directories + files` if not for the inconsistency mentioned above.

So I have fixed the inconsistency, only incrementing `directories` for matched directories (i.e. those that end up in the result).
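
For example (an illustrative sketch, not fdir's literal code), a directory now only bumps the count when it actually makes it into the result:

```ts
type Filter = (path: string, isDirectory: boolean) => boolean;

function maybePushDirectory(
  dirPath: string,
  paths: string[],
  counts: { files: number; directories: number },
  filters: Filter[],
  includeDirs: boolean
): void {
  if (includeDirs && filters.every((filter) => filter(dirPath, true))) {
    paths.push(dirPath);
    counts.directories++; // counted only when matched, not merely visited
  }
}
```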

WalkerIterator

A basic iterator that essentially loops until the async callback of the existing Walker has been called.

On each iteration, it waits until we either push a new path or we complete (so it isn't just mindlessly looping).
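
The general shape of that pattern, sketched as a standalone async generator (this is not fdir's actual `WalkerIterator`; `start`, `push` and `done` are assumed names for the hooks the walker would call):

```ts
// Queue-and-wake bridge: the walker pushes paths via `push` and signals
// completion via `done`; the generator drains the queue and, when it is
// empty, waits on a promise until one of the two callbacks fires.
async function* iterate<T>(
  start: (push: (item: T) => void, done: () => void) => void
): AsyncGenerator<T> {
  const queue: T[] = [];
  let finished = false;
  let wake: (() => void) | undefined;

  start(
    (item) => {
      queue.push(item);
      wake?.();
      wake = undefined;
    },
    () => {
      finished = true;
      wake?.();
      wake = undefined;
    }
  );

  for (;;) {
    // Drain anything queued since the last yield.
    while (queue.length > 0) {
      yield queue.shift()!;
    }
    if (finished) return;
    // Wait until a new path is pushed or the walk completes.
    await new Promise<void>((resolve) => {
      wake = resolve;
    });
  }
}
```

In iterator mode, the overridden `pushPath`/`pushGroup` callbacks would play the role of `push` here.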

@43081j 43081j marked this pull request as ready for review July 2, 2025 13:16

```ts
)
  return;

const files = this.getArray(this.state.paths);
for (let i = 0; i < entries.length; ++i) {
  if (maxFiles && counts.directories + counts.files >= maxFiles) {
```

Contributor
I wonder if this should happen after the push so that you only check it when adding something new vs checking it for every entry. It might be faster that way.

Contributor
Or maybe the check should happen inside the push methods and they can call the abort controller?

Contributor Author
I did originally have it in the push function, but it means you either need:

  • to give the pushFile and pushDirectory functions access to the counts and maxFiles
  • to do the check in the default pushPath callback, which means you need to remember to do it in any overriding ones too (such as the iterator's)

The way it is now, we don't need to pass these things all the way down and can do the check in one place.

43081j added 12 commits July 10, 2025 07:50
* Doesn't work yet, but future James will sort that out.
* Almost works but is an absolute trainwreck. Will write it better now that I know the problem.
* Of course, we need a queue! Now things work because we're trying to async iterate a synchronous task within an asynchronous task.
* Sometimes it is possible we complete before the next iteration has run, so we leave a promise dangling or some such thing.
* We currently exceed `maxFiles` and slice the result later. This doesn't work in an iterator since there is no end "result". This change instead makes the walker stop walking when `maxFiles` is met, so we already have the correctly sized result set.
43081j force-pushed the iterator-madness branch from a15c95c to 8cc1f85 on July 10, 2025 06:50
SuperchupuDev and others added 2 commits July 11, 2025 11:55
* refactor: return an `AsyncGenerator` instead

* chore: update test

thecodrr (Owner) left a comment

I am not sure this is the best approach. It looks super hacky and error-prone. I am sure we can find a better approach, even if that requires rewriting/duplicating some parts of fdir instead of shoehorning a solution onto the existing implementation.

43081j commented Jul 11, 2025

Can you expand on which parts you think are hacky?

The outer part is just a generator; whatever way we implement this, that'll have to exist somewhere.

The way we hook into which paths have been emitted is the way it is because I didn't want to change how we read directories under the hood.

If we change the way reading directories works, I imagine we could trigger the next read only when the next iteration happens.

thecodrr (Owner)

> Can you expand on which parts you think are hacky?

The way the `pushPath` and `pushGroup` functions are passed down is not the most elegant. I also think an iterator should be different from the normal walker; there's no need to support `onlyCounts` and `withGroups` in an iterator. The `WalkerIterator` itself looks very confusing since it is basically overriding an existing implementation, and I am sure there are edge cases it doesn't handle (e.g. what happens to items that enter the `#queue` while the for loop is running?).

> The way we hook into which paths have been emitted is the way it is because I didn't want to change how we read directories under the hood.
>
> If we change the way reading directories works, I imagine we could trigger the next read only when the next iteration happens.

It wouldn't be a bad idea to write separate directory walking logic for the iterator. We can reuse whatever parts we can.

43081j commented Jul 12, 2025

> There's no need to support `onlyCounts` and `withGroups` in an iterator

To be clear, it doesn't support `onlyCounts`; it does support `withGroups`.

> The `WalkerIterator` itself looks very confusing since it is basically overriding an existing implementation, and I am sure there are edge cases it doesn't handle (e.g. what happens to items that enter the `#queue` while the for loop is running?)

I was just trying to provide an iterator around what already exists; it can be written better if we don't need to use the existing implementation. Worth noting it is a regular pattern for implementing an iterator over a mixture of async/sync work, though. I haven't done anything unusual here; it's a fairly normal solution if we want to use the existing implementation.

It seems clear that you'd rather just use a new implementation, so I will revisit this some day when I get time.


Successfully merging this pull request may close these issues.

Stream API / Async Iterator API