Skip to content

Passing large numbers of files to createsetdb #5

@SDmetagenomics

Description

@SDmetagenomics

I would like to run spacedust on a plasmid database. This database has ~60k individual files that represent separate plasmid "genomes". However when I pass the following command to spacedust:

$spacedust createsetdb /individual_faa/*.faa SpacedustDB tmp --threads 18

bash: /shared/software/bin/spacedust: Argument list too long

I receive a bash error that the arguments list is too long. I have tried a number of workarounds to this such as passing an environment variable that contains all the file names...but to no avail

It would be useful if instead of passing a file glob (*), that spacedust createsetdb could instead take a single input file with paths to each of the .faa files needed for db creation. Alternatively if I could create databases in batches and combine them that could be another approach, just not sure if that is supported. Finally, if you have any other suggestions I would be forever greatful.

In terms of the total number of proteins in these plasmid "genomes" it would be quite similar to the 9000 genomes you ran in the spacedust paper since plasmids are much smaller in size. So I think computationally it should be managable just trouble getting all the files in :-)

My Environment

  • Linux
  • Using Statically compiled spacedust executable for AVX2 instruction set

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions