In the `_isdir` function, `_lsdir` is used to determine whether a directory exists, but `max_items` is not set. As a result, the listing fetches up to the default of 1000 items, even though a single item is enough to confirm that the directory exists.
s3fs is widely used with object storage systems that are compatible with the S3 protocol. For example, some users access Aliyun OSS through s3fs; in such cases, limiting `max_items` can bring significant performance improvements.
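A minimal sketch of the proposed existence check, assuming an S3-style listing that caps the number of returned entries (in S3's ListObjectsV2 this is `MaxKeys`, which defaults to 1000). `FakeBucket`, `list_objects`, and `is_dir` are hypothetical names used only for illustration, not the actual s3fs `_isdir`/`_lsdir` code; the point is simply that requesting one entry is enough to answer "does this directory exist?".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for an S3-compatible bucket."""
    keys: List[str]
    items_returned: int = 0  # total entries sent back across all list calls

    def list_objects(self, prefix: str, delimiter: str = "/", max_keys: int = 1000) -> dict:
        # Loosely mimic ListObjectsV2: keys with a further delimiter are grouped
        # into CommonPrefixes, and the combined entry count is capped at max_keys.
        entries = []
        seen_prefixes = set()
        for key in sorted(self.keys):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                common = prefix + rest.split(delimiter, 1)[0] + delimiter
                if common in seen_prefixes:
                    continue
                seen_prefixes.add(common)
                entries.append(("CommonPrefixes", common))
            else:
                entries.append(("Contents", key))
            if len(entries) == max_keys:
                break
        self.items_returned += len(entries)
        result = {"Contents": [], "CommonPrefixes": []}
        for kind, name in entries:
            result[kind].append(name)
        return result


def is_dir(bucket: FakeBucket, path: str) -> bool:
    # A "directory" exists if at least one key lives under "path/",
    # so asking for a single entry (max_keys=1) is sufficient.
    prefix = path.rstrip("/") + "/"
    page = bucket.list_objects(prefix, max_keys=1)
    return bool(page["Contents"] or page["CommonPrefixes"])
```

With 5000 keys under `data/train/`, `is_dir(bucket, "data/train")` receives a single entry, whereas a `list_objects` call with the default cap would return 1000.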
Of course, we need to be mindful of the impact on the dircache: a result fetched with `max_items` is incomplete and should not be cached. However, when a directory holds an especially large number of files, as AI training datasets often do, listing it in full every time we check whether it exists becomes prohibitively expensive. It is sufficient to update the dircache only when the user explicitly performs an `ls`.
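The dircache concern can be handled by caching only complete listings. The sketch below assumes a hypothetical `list_entries(prefix, max_items)` callable standing in for the storage backend; `DirLister` is an illustrative class, not s3fs's implementation. `isdir` reuses a cached full listing when one exists and otherwise issues a one-item probe whose truncated result is never stored.

```python
from typing import Callable, Dict, List, Optional


class DirLister:
    """Hypothetical sketch: cache only complete listings, never truncated probes."""

    def __init__(self, list_entries: Callable[[str, Optional[int]], List[str]]):
        # list_entries(prefix, max_items) is an assumed backend call that returns
        # up to max_items entries under prefix, or every entry when max_items is None.
        self._list_entries = list_entries
        self.dircache: Dict[str, List[str]] = {}

    def ls(self, path: str, refresh: bool = False) -> List[str]:
        prefix = path.rstrip("/") + "/"
        if refresh or prefix not in self.dircache:
            # Full listing: safe to cache because it is complete.
            self.dircache[prefix] = self._list_entries(prefix, None)
        return self.dircache[prefix]

    def isdir(self, path: str) -> bool:
        prefix = path.rstrip("/") + "/"
        if prefix in self.dircache:
            # Reuse a complete listing that an earlier ls() already paid for.
            return bool(self.dircache[prefix])
        # Cheap probe: one entry proves existence, but the truncated
        # result is deliberately NOT written to dircache.
        return bool(self._list_entries(prefix, 1))
```

Every entry in `dircache` is therefore a full listing, so `ls` never returns a truncated result; the trade-off is that `isdir` alone does not warm the cache.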
Your suggestion is reasonable, and I would accept a PR. However, be aware that there is one downside: for a complete directory listing, we cache the result, making subsequent `isdir` or listing operations much faster. Still, you have convinced me that when a user explicitly calls `isdir`, they probably won't be calling it multiple times.