Eliminate config options for approximate input file size #427

@hannes-ucsc
Contributor

Asking the user to specify the input size is error-prone and inconvenient.

Whenever we download a file to local disk only to upload it to the job store, we should use Toil's import functionality instead. It streams the data rather than staging it on local disk, which eliminates the need to estimate a disk requirement for the import job. As imports are currently implemented in Toil, this approach might be less reliable and slower than using s3am, but we can address those issues in Toil if and when they occur.

What do we do in cases where files are processed immediately after being downloaded from an external location and the job store upload is skipped? Not skipping the upload is one option. Trying to determine the file size up front is another: for HTTP this can be done with a HEAD request, and for S3 there is an API call, which is probably also a HEAD request under the hood.
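
A minimal sketch of the size-lookup option, to make the idea concrete. The function names (`http_content_length`, `s3_object_size`, `remote_file_size`) are hypothetical and not part of toil-scripts or Toil. The S3 branch assumes boto3 and its `head_object` call, which reports the size as `ContentLength`. The HTTP branch uses only the standard library.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def http_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Issue a HEAD request and return the reported size in bytes.

    Returns None when the server sends no Content-Length header.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def s3_object_size(bucket: str, key: str, client=None) -> int:
    """Return the size in bytes of an S3 object without downloading it."""
    if client is None:
        # Assumption: boto3 is installed. Imported lazily so the HTTP
        # path works without it.
        import boto3
        client = boto3.client("s3")
    return client.head_object(Bucket=bucket, Key=key)["ContentLength"]


def remote_file_size(url: str, s3_client=None) -> Optional[int]:
    """Dispatch on the URL scheme to find the size of a remote input file."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        return http_content_length(url)
    if parsed.scheme == "s3":
        return s3_object_size(parsed.netloc, parsed.path.lstrip("/"), client=s3_client)
    raise ValueError("Unsupported URL scheme: %r" % parsed.scheme)
```

A job could use the returned value as its disk requirement, possibly with some headroom for derived files. When the lookup returns `None` (for example, a server that omits `Content-Length`), some fallback is still needed, such as not skipping the job store upload.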

          Eliminate config options for approximate input file size · Issue #427 · BD2KGenomics/toil-scripts