
Optimize file scanning in Filebeat's filestream #48686

@rdner

Description


Describe the enhancement:

Looks like we exclude files in the wrong place.

When we resolve the glob expression, we allocate memory for every file path that matches it, and then iterate over all of them, checking each one against the `exclude_files` filter. I think excluded paths should not be added to this list at all; they should be filtered out during glob resolution.

Perhaps we should introduce our own optimized glob implementation that never lists excluded files in the first place.

Another alternative is to resolve the glob using an iterator pattern (a function that accepts a callback invoked for each match). It never allocates the entire list in memory and works file by file.

Describe a specific use case for the enhancement or feature:

Some users set a very broad glob expression in the `paths` config. This glob expression may match hundreds of thousands of files. Users expect that setting a few patterns in `exclude_files` will eliminate all unnecessary files. It does, but in a very inefficient way.

Perhaps we can use some existing implementations or borrow some of their principles.

https://burntsushi.net/ripgrep/ has the best overview and links to some competing tools written in Go, such as https://github.com/monochromegane/the_platinum_searcher/tree/master

Metadata


Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
