Description
Asking the user to specify the input size is error-prone and inconvenient.
Whenever we download a file to local disk only to upload it to the job store, we should switch to Toil's import functionality instead. It streams the data rather than staging it on local disk, which eliminates the need to estimate a disk requirement for the import job. As imports are currently implemented in Toil, this approach might be less reliable and slower than using s3am, but we can address those issues in Toil if and when they occur.
What do we do in cases where files are processed immediately after being downloaded from an external location and the job store upload is skipped? Not skipping the upload is one option. Trying to determine the file size up front is another. For HTTP this can be done with a HEAD request; for S3 there is an API call, which is probably also a HEAD request under the hood.
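For the HTTP case, a minimal sketch of the HEAD approach might look like the following. The helper name `remote_file_size` and the timeout value are illustrative assumptions, not existing Toil or pipeline code. Servers are not required to send `Content-Length`, so the sketch returns `None` in that case and the caller would still need a fallback.

```python
import urllib.request


def remote_file_size(url, timeout=30):
    """Return the size in bytes the server reports for `url`, or None.

    Hypothetical helper: issues an HTTP HEAD request so that only the
    headers are transferred, then reads the Content-Length header.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    # Content-Length is optional (e.g. chunked responses), so it may be absent.
    return int(length) if length is not None else None
```

The returned value could then be used as the disk requirement of the job that downloads and processes the file, with some headroom added for intermediate output.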