Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cachetools exceptions when reading table metadata #1905

Open
1 of 3 tasks
bdilday opened this issue Apr 11, 2025 · 1 comment
Open
1 of 3 tasks

cachetools exceptions when reading table metadata #1905

bdilday opened this issue Apr 11, 2025 · 1 comment

Comments

@bdilday
Copy link
Contributor

bdilday commented Apr 11, 2025

Apache Iceberg version

0.8.0

Please describe the bug 🐞

While reading metadata via table.inspect.partitions(), have seen exceptions from the cachetools library. There are 2 different variants as shown below. Note that we are using multithreading and this might be a race condition when multiple threads alter the cache?

File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions    for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
return list(_manifests(io, self.manifest_list))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
cache_setitem(self, key, value)\n  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
self.popitem()
File "/site-packages/cachetools/__init__.py", line 227, in popitem
key = next(iter(self.__order))
    ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: OrderedDict mutated during iteration
File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions
for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
    return list(_manifests(io, self.manifest_list))\n                
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
cache_setitem(self, key, value)\n  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
self.popitem()
File "/site-packages/cachetools/__init__.py", line 231, in popitem
return (key, self.pop(key))
^^^^^^^^^^^^^
File "/site-packages/cachetools/__init__.py", line 116, in pop
raise KeyError(key)\nKeyError: (\'s3a://.../metadata/snap-7359430581510295461-0-b204a3ad-087b-4a79-87fb-9fc023e258af.avro\',)

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@kevinjqliu
Copy link
Contributor

thanks for reporting this.

cachetools was introduced here and hasn't changed since 0.8.0

Note that we are using multithreading and this might be a race condition when multiple threads alter the cache?

i wonder if its due to the LRU cache size of 128. how many manifest list files are you typically reading?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants