Bulk downloading the data can be quite slow, especially when we are being selective about which objects we pull from S3.
This is because we have many, many small files: before downloading anything we have to page through all of the keys in the bucket, which can take some time.
I am wondering how we can make this process more performant.
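To get a feel for why the listing step dominates, here is a rough back-of-envelope sketch. The object count and per-request latency are hypothetical placeholders, not measurements of our bucket; the page size of 1000 is the `ListObjectsV2` maximum.

```python
import math

def listing_cost(n_objects: int, page_size: int = 1000,
                 latency_s: float = 0.1) -> tuple[int, float]:
    """Sequential ListObjectsV2 requests needed to enumerate a bucket,
    plus a rough wall-clock estimate (pages must be fetched in order,
    since each page's continuation token comes from the previous one)."""
    requests = math.ceil(n_objects / page_size)
    return requests, requests * latency_s

# e.g. two million small objects at ~100 ms per page:
# 2000 sequential requests, ~200 s of wall clock before a single byte
# of actual data has been downloaded.
```

The sequential continuation-token chain is the key constraint: unlike the downloads themselves, listing cannot be trivially parallelised.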
One solution is to maintain a file index and query that instead, but it would mean introducing an access layer.
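A lightweight variant of the index idea is to publish a manifest file alongside the data, so clients fetch one object and filter keys locally instead of paging through the bucket. This is only a sketch; the manifest contents and key layout below are made up for illustration.

```python
# Hypothetical manifest: one JSON object (published with the data) mapping
# every key to its size. Clients download this single file once, then
# select keys locally -- no paged ListObjectsV2 calls needed.
manifest = {
    "dataset/2023/part-0001.parquet": 1024,
    "dataset/2023/part-0002.parquet": 2048,
    "dataset/2024/part-0001.parquet": 512,
}

def select_keys(manifest: dict[str, int], prefix: str) -> list[str]:
    """Return the sorted keys under `prefix`, filtered client-side."""
    return sorted(k for k in manifest if k.startswith(prefix))
```

The trade-off is that the manifest must be regenerated whenever objects are added or removed, which is the "access layer" cost in miniature.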
Another solution is to push users towards local caching, so they only download what they need.
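The caching approach could be as simple as a check-before-fetch wrapper: look in a local cache directory first and only hit S3 on a miss. A minimal sketch, with the `download` callable standing in for whatever S3 client call (e.g. a `GetObject`) we actually use:

```python
from pathlib import Path
from typing import Callable

def cached_fetch(key: str, cache_dir: Path,
                 download: Callable[[str], bytes]) -> Path:
    """Return the local path for `key`, downloading only on a cache miss.

    `download` is a placeholder for the real S3 fetch; keeping it injectable
    makes the cache logic testable without network access.
    """
    local = cache_dir / key
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(download(key))
    return local
```

Repeated runs then touch S3 only for keys not yet on disk, which sidesteps the full-bucket listing entirely for users who know which keys they want.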
The second solution has implications for how you access the data on HPC systems. For CSD3 we should have internally facing S3 storage; other sites will need an internet connection.