Description
Hi,
I'm completely new to fsspec and was very interested in the rsync
utility (even though it is flagged as experimental), so I gave it a try (version 2025.5.1).
My first use case is to sync an S3 source to a local directory.
After a quick look at the source code, I tried something like this:
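from fsspec.generic import rsync

rsync(
    's3://bucket-name/key',
    './path/to/local/directory',
    inst_kwargs={
        # 'options': build each filesystem from the per-protocol storage_options below
        'default_method': 'options',
        'storage_options': {
            's3': {'endpoint_url': 'xxxxxx', 'key': 'xxxxxxx', 'secret': 'xxxx'},
        },
    },
)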
The idea being to let the GenericFileSystem instance used by rsync resolve both underlying filesystems behind source and destination.
Unfortunately this didn't work, because GenericFileSystem:
- didn't override all the methods rsync uses, namely isdir
- didn't resolve URLs correctly
The first problem can be solved by adding the following method to GenericFileSystem
(I have absolutely no idea whether this is correct; it simply mimics the way it is done for other methods):
async def _isdir(self, url, **kwargs):
    fs = _resolve_fs(url, self.method, storage_options=self.st_opts)
    if fs.async_impl:
        return await fs._isdir(url, **kwargs)
    else:
        return fs.isdir(url, **kwargs)
The second problem was addressed by making sure every call to _resolve_fs passes the storage_options=self.st_opts argument (it is missing in many places).
With both changes in place, it worked.
This is not very efficient, however, because every call to _resolve_fs creates a new instance. rsync should probably call it only twice (once for the source, once for the destination, if they do not have the same protocol).
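The "resolve once per protocol" idea can be sketched roughly as below. This is not fsspec code: make_fs is a hypothetical stand-in for whatever creates a filesystem instance, and CachingResolver is a name invented for this illustration. It only shows the caching pattern, under the assumption that a URL without a scheme is a local path.

```python
from urllib.parse import urlsplit


def make_fs(protocol, **options):
    # Hypothetical stand-in for creating a filesystem instance;
    # it returns a plain dict so the sketch runs without fsspec.
    return {"protocol": protocol, "options": options}


class CachingResolver:
    """Resolve a URL to a filesystem, creating one instance per protocol."""

    def __init__(self, storage_options=None, factory=make_fs):
        self.storage_options = storage_options or {}
        self.factory = factory
        self._cache = {}
        self.created = 0  # counts factory calls, for illustration only

    def resolve(self, url):
        # Assumption: URLs without a scheme are treated as local paths.
        protocol = urlsplit(url).scheme or "file"
        if protocol not in self._cache:
            options = self.storage_options.get(protocol, {})
            self._cache[protocol] = self.factory(protocol, **options)
            self.created += 1
        return self._cache[protocol]


resolver = CachingResolver({"s3": {"endpoint_url": "xxxxxx"}})
for url in ["s3://bucket-name/key/a", "s3://bucket-name/key/b", "./path/to/local/directory"]:
    resolver.resolve(url)
print(resolver.created)  # number of instances actually created
```

With one S3 source and one local destination, the factory runs once per protocol regardless of how many files are visited.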
Then came the next question: what if I want to synchronise two S3 directories in different buckets with different credentials? That is also a use case I have.
Since _resolve_fs only uses the protocol to find a filesystem instance, this will not work.
Among the possible solutions I came up with:
- let _resolve_fs handle the URL as a URI and discriminate on both protocol and authority. At least the bucket name could be used to select different credentials in inst_kwargs. This would also probably work with other protocols
- make rsync take a source_fs and a destination_fs argument instead of a single fs
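The first option could look roughly like the sketch below. It is not fsspec code: the select_options helper and the "protocol://authority" key format in storage_options are assumptions made purely for illustration. The lookup tries protocol plus authority (here, the bucket name) first and falls back to the protocol-only entry.

```python
from urllib.parse import urlsplit


def select_options(url, storage_options):
    """Pick options for a URL by protocol and authority, then by protocol alone.

    The "protocol://authority" key format is a hypothetical convention.
    """
    parts = urlsplit(url)
    # Assumption: URLs without a scheme are treated as local paths.
    protocol = parts.scheme or "file"
    specific_key = f"{protocol}://{parts.netloc}"
    if parts.netloc and specific_key in storage_options:
        return storage_options[specific_key]
    return storage_options.get(protocol, {})


storage_options = {
    "s3": {"key": "default-key", "secret": "xxxx"},
    "s3://other-bucket": {"key": "other-key", "secret": "xxxx"},
}

# Two S3 URLs in different buckets end up with different credentials.
source_opts = select_options("s3://bucket-name/key", storage_options)
dest_opts = select_options("s3://other-bucket/key", storage_options)
```

A lookup like this keeps the existing per-protocol configuration working unchanged, since a bucket-specific entry is only used when one is present.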
Hope this helps.
Best regards,
Antoine