Commit 93bda84

Merge pull request #117 from sanger-tol/misc_fixes
Misc fixes before release
2 parents 102dbf4 + dfb4655 commit 93bda84

13 files changed: +46 −22 lines

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -11,6 +11,16 @@ The pipeline is now considered to be a complete and suitable replacement for the
   "grid plots".
 - Fill in accurate read information in the blobDir. Users are now required
   to indicate in the samplesheet whether the reads are paired or single.
+- Updated the Blastn settings to allow 7 days runtime at most, since that
+  covers 99.7% of the jobs.
+
+### Software dependencies
+
+Note, since the pipeline is using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. Only `Docker` or `Singularity` containers are supported, `conda` is not supported.
+
+| Dependency  | Old version | New version |
+| ----------- | ----------- | ----------- |
+| blobtoolkit | 4.3.9       | 4.3.13      |
 
 ## [[0.6.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.6.0)] – Bellsprout – [2024-09-13]

conf/base.config

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -106,14 +106,15 @@ process {
106106

107107
withName: "BLAST_BLASTN" {
108108

109-
// There are blast failures we don't know how to fix. Just ignore for now
110-
errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == process.maxRetries ? 'ignore' : 'retry') : 'finish' }
109+
// There are blast failures we don't know how to fix. We just give up after 3 attempts
110+
errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == 3 ? 'ignore' : 'retry') : 'finish' }
111+
111112

112113
// Most jobs complete quickly but some need a lot longer. For those outliers,
113-
// the CPU usage remains usually low, often nearing a single CPU
114-
cpus = { check_max( 6 - (task.attempt-1), 'cpus' ) }
115-
memory = { check_max( 1.GB * Math.pow(4, task.attempt-1), 'memory' ) }
116-
time = { check_max( 10.h * Math.pow(4, task.attempt-1), 'time' ) }
114+
// the CPU usage remains usually low, averaging a single CPU
115+
cpus = { check_max( task.attempt == 1 ? 4 : 1, 'cpus' ) }
116+
memory = { check_max( 2.GB, 'memory' ) }
117+
time = { check_max( task.attempt == 1 ? 4.h : ( task.attempt == 2 ? 47.h : 167.h ), 'time' ) }
117118
}
118119

119120
withName:CUSTOM_DUMPSOFTWAREVERSIONS {
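Outside Nextflow, the effect of these new settings can be sketched in plain Python. This is a hypothetical illustration, not pipeline code; `error_strategy` and `time_limit_hours` are made-up names that mirror the closures above:

```python
# Hypothetical mirror of the BLAST_BLASTN settings above -- not pipeline code.
RETRYABLE = set(range(130, 146)) | {104}  # exit statuses (130..145) + 104

def error_strategy(exit_status: int, attempt: int) -> str:
    """Retry known blast failures, give up ('ignore') after 3 attempts,
    and stop the whole run ('finish') on any other failure."""
    if exit_status in RETRYABLE:
        return "ignore" if attempt == 3 else "retry"
    return "finish"

def time_limit_hours(attempt: int) -> int:
    """Escalating walltime: 4 h, then 47 h, then 167 h (just under the
    7-day maximum mentioned in the changelog)."""
    if attempt == 1:
        return 4
    return 47 if attempt == 2 else 167
```

Under this policy a job killed with, say, exit status 137 is retried twice with longer time limits before being ignored, while any other failure halts the run.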

docs/usage.md

Lines changed: 19 additions & 6 deletions
@@ -54,7 +54,20 @@ An [example samplesheet](assets/test/samplesheet.csv) has been provided with the
 The pipeline can also accept a samplesheet generated by the [nf-core/fetchngs](https://nf-co.re/fetchngs) pipeline (tested with version 1.11.0).
 The pipeline then needs the `--fetchngs_samplesheet true` option _and_ `--align true`, since the data files would all be unaligned.
 
-## Getting databases ready for the pipeline
+## Database parameters
+
+Configure access to your local databases with the `--busco`, `--blastp`, `--blastx`, `--blastn`, and `--taxdump` parameters.
+
+Note that `--busco` refers to the download path of _all_ lineages.
+Then, when explicitly selecting the lineages to run the pipeline on,
+provide the names of these lineages _with_ their `_odb10` suffix as a comma-separated string.
+For instance:
+
+```bash
+--busco path-to-databases/busco/ --busco_lineages vertebrata_odb10,bacteria_odb10,fungi_odb10
+```
+
+### Getting databases ready for the pipeline
 
 The BlobToolKit pipeline can be run in many different ways. The default way requires access to several databases:
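That naming rule can be sketched with a small helper (hypothetical, not part of the pipeline): bare lineage names map to the expected comma-separated value like this:

```python
# Hypothetical helper -- not part of the pipeline.
def busco_lineages(names):
    """Join bare lineage names into the comma-separated string that
    --busco_lineages expects, adding the _odb10 suffix to each."""
    return ",".join(f"{name}_odb10" for name in names)
```

For example, `busco_lineages(["vertebrata", "bacteria", "fungi"])` reproduces the value shown in the usage example.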

@@ -65,7 +78,7 @@ The BlobToolKit pipeline can be run in many different ways. The default way requ
 
 It is a good idea to put a date suffix for each database location so you know at a glance whether you are using the latest version. We are using the `YYYY_MM` format as we do not expect the databases to be updated more frequently than once a month. However, feel free to use `DATE=YYYY_MM_DD` or a different format if you prefer.
 
-### 1. NCBI taxdump database
+#### 1. NCBI taxdump database
 
 Create the database directory and move into the directory:
 
@@ -82,7 +95,7 @@ Retrieve and decompress the NCBI taxdump:
 curl -L ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz | tar xzf -
 ```
 
-### 2. NCBI nucleotide BLAST database
+#### 2. NCBI nucleotide BLAST database
 
 Create the database directory and move into the directory:
 
@@ -106,7 +119,7 @@ tar xf taxdb.tar.gz -C $NT &&
 rm taxdb.tar.gz
 ```
 
-### 3. UniProt reference proteomes database
+#### 3. UniProt reference proteomes database
 
 You need [diamond blast](https://github.com/bbuchfink/diamond) installed for this step. The easiest way is probably using [conda](https://anaconda.org/bioconda/diamond). Make sure you have the latest version of Diamond (>2.x.x) otherwise the `--taxonnames` argument may not work.
 
@@ -140,7 +153,7 @@ zcat */*/*.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t
 diamond makedb -p 16 --in reference_proteomes.fasta.gz --taxonmap reference_proteomes.taxid_map --taxonnodes $TAXDUMP/nodes.dmp --taxonnames $TAXDUMP/names.dmp -d reference_proteomes.dmnd
 ```
 
-### 4. BUSCO databases
+#### 4. BUSCO databases
 
 Create the database directory and move into the directory:
 
@@ -232,7 +245,7 @@ List of tools for any given dataset can be fetched from the API, for example htt
 
 | Dependency        | Snakemake | Nextflow |
 | ----------------- | --------- | -------- |
-| blobtoolkit       | 4.3.2     | 4.3.9    |
+| blobtoolkit       | 4.3.2     | 4.3.13   |
 | blast             | 2.12.0    | 2.14.1   |
 | blobtk            | 0.5.0     | 0.5.1    |
 | busco             | 5.3.2     | 5.5.0    |

modules/local/blobtoolkit/chunk.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CHUNK {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_CHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta) , path(fasta)
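The guard at the top of this module (repeated in each local module changed by this commit) rejects Conda-based profiles. Its logic can be mirrored in Python as a hypothetical sketch, not pipeline code:

```python
# Hypothetical mirror of the Groovy profile guard -- not pipeline code.
def conda_profile_requested(profile: str) -> bool:
    """True if any entry in the comma-separated -profile value is
    'conda' or 'mamba', i.e. the module should abort with exit 1."""
    return bool(set(profile.split(",")) & {"conda", "mamba"})
```

So `-profile docker` passes, while `-profile test,conda` trips the guard.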

modules/local/blobtoolkit/countbuscos.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_COUNTBUSCOS {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_COUNTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(table, stageAs: 'dir??/*')

modules/local/blobtoolkit/createblobdir.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CREATEBLOBDIR {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_BLOBDIR module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(window, stageAs: 'windowstats/*')

modules/local/blobtoolkit/extractbuscos.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_EXTRACTBUSCOS {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_EXTRACTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(fasta)

modules/local/blobtoolkit/summary.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_SUMMARY {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_SUMMARY module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(blobdir)

modules/local/blobtoolkit/unchunk.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UNCHUNK {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_UNCHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(blast_table)

modules/local/blobtoolkit/updateblobdir.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEBLOBDIR {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_BLOBDIR module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(input, stageAs: "input_blobdir")

modules/local/blobtoolkit/updatemeta.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEMETA {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_UPDATEMETA module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(input)

modules/local/blobtoolkit/windowstats.nf

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ process BLOBTOOLKIT_WINDOWSTATS {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "BLOBTOOLKIT_WINDOWSTATS module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), path(tsv)

modules/local/generate_config.nf

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ process GENERATE_CONFIG {
     label 'process_single'
 
     conda "conda-forge::requests=2.28.1 conda-forge::pyyaml=6.0"
-    container "docker.io/genomehubs/blobtoolkit:4.3.9"
+    container "docker.io/genomehubs/blobtoolkit:4.3.13"
 
     input:
     tuple val(meta), val(fasta)
