Wrap cache-file-management operations for CachingFileSystem in thread locks #1127
Conversation
fsspec/implementations/cached.py (review comment on an outdated diff):

```python
existcond = all(os.path.exists(storage) for storage in self.storage)
if timecond or not existcond:
    self.load_cache()
with self.lock:
```
self.load_cache() already has a lock, and I think all the other operations here are safe for parallel execution (nothing binds to self, and the file-presence check is fine without a lock), so why lock here?

Overall, please analyze the code and see if maybe we could avoid as much of the locking as possible, and maybe even avoid using a reentrant lock (RLock) by clearly identifying the "problematic" code blocks.
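For illustration, a minimal sketch of the fine-grained approach being suggested here: hold a plain threading.Lock only around the blocks that actually mutate shared state. The class and method names are hypothetical, not fsspec's actual API.

```python
import threading

class MetadataCache:
    """Hypothetical sketch: lock only the blocks that mutate shared
    state, so a plain (non-reentrant) threading.Lock suffices."""

    def __init__(self):
        self._lock = threading.Lock()  # plain Lock, not RLock
        self._entries = {}

    def load(self, new_entries):
        # Parsing can happen outside the lock; only the mutation of
        # shared state needs to be serialized.
        parsed = dict(new_entries)
        with self._lock:
            self._entries.update(parsed)

    def snapshot(self):
        # Copy under the lock, then work with the copy lock-free.
        with self._lock:
            return dict(self._entries)
```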
@yarikoptic I've removed this lock. I don't think any more can be removed, and RLock is needed due to things like clear_cache().
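To illustrate the reentrancy issue mentioned above, here is a hedged sketch (hypothetical names, not the actual fsspec code) of why a method like clear_cache() forces an RLock: it acquires the lock and then calls another method that acquires the same lock again.

```python
import threading

class CachedFS:
    """Hypothetical sketch: with a plain threading.Lock, the nested
    acquire below would deadlock; threading.RLock lets the same
    thread re-enter the lock it already holds."""

    def __init__(self):
        self._lock = threading.RLock()
        self._cache = {}

    def save_cache(self):
        with self._lock:      # re-entered while clear_cache holds it
            ...               # write self._cache out to disk

    def clear_cache(self):
        with self._lock:
            self._cache.clear()
            self.save_cache()  # nested acquisition: needs RLock
```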
Is this PR enough to revert datalad/datalad-fuse#83 and get the dandi healthcheck working without erroring out? If so, do we see any more sensible performance from healthcheck?
If this is the PR currently being tested on drogon, it seems it did improve the situation over locking at the datalad-fuse level. I see more of the MATLAB processes getting at least somewhat busy, and the total load is about 20 rather than 60, but taking 100 samples of any of those processes shows that we are spending lots of time on writing/re-reading the cache file. Could that maybe be avoided?
The continuous writing of the cache file is designed for situations where you might have simultaneous access to the cache from multiple processes. If you only ever have the one FUSE process using the cache, I don't suppose there's any good reason to write the cache until program end (if you can guarantee not to crash).
Thanks @martindurant, so it feels like ideally this could be made an option.
Exactly. Again, it would be nice to decompose the caching logic into separate components.
@martindurant that sounds like a good amount of work to overhaul the entire caching layer. Would you consider a PR which would first just augment the life cycle of the cache with "always" and "exit" strategies for when it is dumped?
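As a rough illustration of the proposed dump strategies, here is a minimal sketch; the class, keyword, and strategy names are invented for illustration and are not fsspec's API. "always" re-dumps after every change (the current multi-process-safe behaviour), while "exit" defers the write to interpreter shutdown via atexit.

```python
import atexit
import json
import threading

class CacheWriter:
    """Hypothetical sketch of 'always' vs 'exit' dump strategies."""

    def __init__(self, path, strategy="always"):
        assert strategy in ("always", "exit")
        self.path = path
        self.strategy = strategy
        self._lock = threading.Lock()
        self._entries = {}
        if strategy == "exit":
            # Write the cache file once, at interpreter shutdown.
            atexit.register(self.dump)

    def update(self, key, value):
        with self._lock:
            self._entries[key] = value
        if self.strategy == "always":
            # Re-dump after every change so other processes sharing
            # the cache directory see a current state.
            self.dump()

    def dump(self):
        with self._lock:
            with open(self.path, "w") as f:
                json.dump(self._entries, f)
```

Note the trade-off martindurant describes: "exit" avoids the constant write/re-read overhead but loses the cache state if the process crashes.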
I can understand hesitating to do the whole job, and the augmentation may be alright. What I worry about is extra keyword arguments that we will find hard to go back on later.
@martindurant before any other major RF: what do you think about this particular PR? IMHO it is "the best we could do" ATM to ensure thread-safety; it seems to work for us in that at least we do not crash (right, @jwodder?), and it should not have a really noticeable penalty for users (whose run-time performance would be bottlenecked primarily by all those cache reads/writes anyway).
I think it's OK... but I still share the worry that the locking is too aggressive. Are the list operations really problematic, or just the file ops (which is what the original issue was about)?
Hi. I did not look too deeply into the details of the struggles here, but I wanted to take the chance to humbly remind you about GH-895 by @gutzbenj. The patch probably needs a refresh, but the core idea was to start using DiskCache, as it promises to be a rock-solid (thread-safe and process-safe) solution to disk-based caching, without needing to run any kind of cache or database server. On this matter, also being tested on Linux, macOS, and Windows, we think this package is one of a kind.
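For anyone unfamiliar with it, basic DiskCache usage looks like this (paths and keys here are made up for the example):

```python
# pip install diskcache
from diskcache import Cache

cache = Cache("/tmp/fsspec-demo-cache")  # SQLite-backed, on disk

# Safe to call concurrently from multiple threads and processes.
cache.set("blocks:s3://bucket/key", b"\x00" * 1024)
data = cache.get("blocks:s3://bucket/key")

cache.close()
```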
You may be right, but DiskCache should only be optional, as fsspec is very careful not to require any dependencies. Also, you should be aware that we wish to be able to cache files to remote stores too! It may seem counterintuitive, but some remotes are faster than others, and egress fees often mean that copying into a near remote once is worthwhile if there are to be many reads.
@martindurant: Thank you for sharing those important details we had not been aware of. Happy new year, everyone!
So where shall we go from here? A full rewrite of the cache classes seems to be beyond anyone's capacity right now :(
Note that diskcache might be unfit for cases where the cache location is on NFS. grantjenks/python-diskcache#198 was "resolved", but it is not clear to me whether there is an NFS-safe way to use diskcache. I hope that @jwodder could have a look and maybe see how to refactor/structure the interface to encapsulate caching, so that alternative implementations could be provided.
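One possible shape for that encapsulation, sketched under the assumption that the cache metadata is a simple mapping; every name here is hypothetical, not existing fsspec code:

```python
import json
import os
from abc import ABC, abstractmethod

class CacheBackend(ABC):
    """Hypothetical interface: encapsulate cache-metadata storage so
    that alternative backends (JSON file, DiskCache, ...) are swappable."""

    @abstractmethod
    def load(self) -> dict: ...

    @abstractmethod
    def save(self, entries: dict) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...

class JSONFileBackend(CacheBackend):
    # Roughly the current behaviour: one metadata file per cache dir.
    def __init__(self, path):
        self.path = path

    def load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, entries: dict) -> None:
        with open(self.path, "w") as f:
            json.dump(entries, f)

    def clear(self) -> None:
        if os.path.exists(self.path):
            os.remove(self.path)
```

A DiskCache-backed implementation of the same interface could then be offered as an optional extra without fsspec depending on it.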
Tries to address #1126.