glob is not reporting empty folders #908

Open
NicholasFiorentini opened this issue Oct 30, 2024 · 7 comments

Comments

@NicholasFiorentini

When using S3FileSystem.glob with a pattern, empty folders aren't reported.

Expected Behaviour

Empty folders should be reported, since S3FileSystem.ls does list those "folders".

How to Reproduce

From the bucket's page in the AWS S3 Console, create an empty folder (with the "Create folder" button) and a non-empty folder (a folder containing a non-empty file).

Then,

>>> import s3fs
>>> s3 = s3fs.S3FileSystem(anon=False)
>>> s3.glob("s3://my-test-bucket/20241030-tests/**/*")
['my-test-bucket/20241030-tests/non-empty', 'my-test-bucket/20241030-tests/non-empty/timsdevel-test-folders.pdf']

Notice that:

>>> s3.ls("s3://my-test-bucket/20241030-tests/")
['my-test-bucket/20241030-tests/', 'my-test-bucket/20241030-tests/empty', 'my-test-bucket/20241030-tests/non-empty']

Information

  • Python 3.10.13
  • s3fs version 2024.10.0
@martindurant
Member

The following test passes; am I misunderstanding your situation?

def test_glob_empty_folder(s3):
    s3.touch(f"{test_bucket_name}/glob/empty/")
    s3.touch(f"{test_bucket_name}/glob/not_empty/")
    s3.touch(f"{test_bucket_name}/glob/not_empty/file")

    out = s3.glob(f"{test_bucket_name}/glob/**/*")
    assert out == ['test/glob/empty', 'test/glob/not_empty', 'test/glob/not_empty/file']

@NicholasFiorentini
Author

> The following test passes; am I misunderstanding your situation?

I wonder whether touch is equivalent to "Create folder" in the AWS Console.

@martindurant
Member

I believe "Create folder" makes a zero-length object whose key ends in "/"; but it's possible (I can check) that touch strips that trailing "/" for exactly the same reason.
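For context, S3 has no real directories: a "folder" is either a prefix shared by other keys or, when made with the console's "Create folder" button, a zero-byte object whose key ends in "/". The minimal sketch below works over a hypothetical in-memory list of keys (implied_dirs and real_files are illustrative helpers, not s3fs functions) and shows how both kinds of folder arise from a flat listing. If touch does strip the trailing "/", as suspected in the comment above, "glob/empty" becomes an ordinary zero-length file rather than a folder.

```python
def implied_dirs(keys):
    """Return the folder paths implied by a flat list of S3 keys.

    Hypothetical helper for illustration only: a key ending in "/" is
    treated as a folder marker, and every key also contributes each of
    its parent prefixes as a folder.
    """
    dirs = set()
    for key in keys:
        path = key.rstrip("/")
        if key.endswith("/"):
            # Console-style marker object: the key itself names a folder.
            dirs.add(path)
        # Every prefix before a "/" is also a folder.
        parts = path.split("/")
        for i in range(1, len(parts)):
            dirs.add("/".join(parts[:i]))
    return dirs


def real_files(keys):
    """Return the keys that are ordinary objects rather than folder markers."""
    return {key for key in keys if not key.endswith("/")}


# Keys as the console would create them (a marker for the empty folder).
console_keys = ["glob/empty/", "glob/not_empty/file"]
# Keys as they would look if the trailing "/" were stripped on write.
stripped_keys = ["glob/empty", "glob/not_empty/file"]

print(sorted(implied_dirs(console_keys)))   # ['glob', 'glob/empty', 'glob/not_empty']
print(sorted(real_files(console_keys)))     # ['glob/not_empty/file']
print(sorted(implied_dirs(stripped_keys)))  # ['glob', 'glob/not_empty']
print(sorted(real_files(stripped_keys)))    # ['glob/empty', 'glob/not_empty/file']
```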

@NicholasFiorentini
Copy link
Author

> I believe "Create folder" makes a zero-length object whose key ends in "/"; but it's possible (I can check) that touch strips that trailing "/" for exactly the same reason.

Thanks, I'll also have a look tomorrow to make sure you can reproduce it.

@martindurant
Member

Indeed, if I change it to the following, it fails - so this is the reproducer to fix.

def test_glob_empty_folder(s3):
    s3.call_s3("put_object", Bucket=test_bucket_name, Key="glob/empty/")
    s3.touch(f"{test_bucket_name}/glob/not_empty/")
    s3.touch(f"{test_bucket_name}/glob/not_empty/file")

    out = s3.glob(f"{test_bucket_name}/glob/**/*")
    assert out == ['test/glob/empty', 'test/glob/not_empty', 'test/glob/not_empty/file']

@martindurant
Member

The output of find, including directories, is:

>>> s3.find(f"{test_bucket_name}/glob", withdirs=True)
['test/glob', 'test/glob/empty/', 'test/glob/not_empty', 'test/glob/not_empty', 'test/glob/not_empty/file']

So the question is: does "test/glob/**/*" really match "test/glob/empty/"?
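One way to frame the question: in a segment-based glob where "*" must match at least one character within a path segment, a key ending in "/" leaves an empty last segment, so "test/glob/empty/" cannot match "test/glob/**/*" unless the trailing "/" is stripped first (ls already reports the console folder without it). The sketch below runs a simplified, hypothetical matcher (glob_to_regex and glob_paths are illustrative names, not fsspec's glob implementation) over the find output quoted above.

```python
import re


def glob_to_regex(pattern):
    """Compile a simplified glob into a regex (hypothetical, not fsspec's).

    "**/" matches zero or more whole path segments; "*" matches one or
    more characters inside a single segment. Other uses of "**" are not
    special-cased in this sketch.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]+")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def glob_paths(paths, pattern, strip_trailing_slash):
    """Filter paths by pattern, optionally normalising folder-marker keys."""
    regex = glob_to_regex(pattern)
    result = []
    for p in paths:
        if strip_trailing_slash:
            p = p.rstrip("/")
        # De-duplicate while keeping the original order.
        if p not in result and regex.fullmatch(p):
            result.append(p)
    return result


# Output of s3.find(..., withdirs=True) quoted in the comment above.
found = ['test/glob', 'test/glob/empty/', 'test/glob/not_empty',
         'test/glob/not_empty', 'test/glob/not_empty/file']

print(glob_paths(found, "test/glob/**/*", strip_trailing_slash=False))
# ['test/glob/not_empty', 'test/glob/not_empty/file']
print(glob_paths(found, "test/glob/**/*", strip_trailing_slash=True))
# ['test/glob/empty', 'test/glob/not_empty', 'test/glob/not_empty/file']
```

With the marker key left as-is, the result has the same shape as the glob output reported in the issue; stripping the slash and de-duplicating gives the list asserted in the reproducer test. Whether s3fs should normalise marker keys this way is the open design question here.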

@NicholasFiorentini
Author

> So the question is: does "test/glob/**/*" really match "test/glob/empty/"?

The empty folder is matched when using the pathlib Path interface on a local filesystem:

>>> from pathlib import Path
>>> list(Path(".").glob("test/**/*"))
[PosixPath('test/file.txt'), PosixPath('test/nonempty'), PosixPath('test/empty'), PosixPath('test/nonempty/file.txt')]

With the following folder structure:

lsla -R
Permissions Size User                Date Modified Name
drwxr-x---@    -   4 Dec 11:10  empty
.rw-r-----@    0   4 Dec 11:18  file.txt
drwxr-x---@    -   4 Dec 11:18  nonempty

./empty:

./nonempty:
Permissions Size User                Date Modified Name
.rw-r-----@    0   4 Dec 11:18  file.txt
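The local comparison above can be reproduced with only the standard library. This sketch rebuilds the same layout (names taken from the listing above) in a temporary directory and runs the same pathlib glob. On a local filesystem the empty directory is a real entry, so "*" matches it by name and no trailing-slash question arises. The results are sorted because Path.glob does not guarantee an order.

```python
import tempfile
from pathlib import Path

# Recreate the layout from the listing above in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "test" / "empty").mkdir(parents=True)
    (root / "test" / "nonempty").mkdir()
    (root / "test" / "file.txt").touch()
    (root / "test" / "nonempty" / "file.txt").touch()

    # "**" matches "test" itself and every directory below it; the final
    # "*" then matches every entry inside those directories.
    matches = sorted(p.relative_to(root).as_posix()
                     for p in root.glob("test/**/*"))

print(matches)
# ['test/empty', 'test/file.txt', 'test/nonempty', 'test/nonempty/file.txt']
```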
