
Commit 03d1f11

Merge pull request #183 from sanger-tol/dev
Release 0.8.0
2 parents 06ba00c + e8c2043 commit 03d1f11

39 files changed: 700 additions, 1007 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
@@ -81,5 +81,6 @@ jobs:
         uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1

       - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}"
+        continue-on-error: ${{ matrix.NXF_VER == 'latest-everything' }}
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results

.github/workflows/linting.yml

Lines changed: 2 additions & 2 deletions
@@ -53,15 +53,15 @@ jobs:
           pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }}

       - name: Run nf-core pipelines lint
-        if: ${{ github.base_ref != 'master' }}
+        if: ${{ github.base_ref != 'main' }}
         env:
           GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
         run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md

       - name: Run nf-core pipelines lint --release
-        if: ${{ github.base_ref == 'master' }}
+        if: ${{ github.base_ref == 'main' }}
         env:
           GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.nf-core.yml

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ lint:
     - manifest.name
     - manifest.homePage
   template_strings: false
-nf_core_version: 3.2.0
+nf_core_version: 3.2.1
 repository_type: pipeline
 template:
   author: priyanka-surana
@@ -43,4 +43,4 @@ template:
   outdir: .
   skip_features:
     - igenomes
-  version: 0.7.1
+  version: 0.8.0

CHANGELOG.md

Lines changed: 34 additions & 1 deletion
@@ -3,7 +3,36 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [[0.7.1](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.7.1)] – Psyduck (patch 1) – [2025-03-297
+## [[0.8.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.8.0)] – Sprigatito – [2025-05-19]
+
+### Enhancements & fixes
+
+- Runtime of the blast commands is now capped at 12 hours (#166)
+- Upgraded Busco and added an option to control the gene predictor used (#160, #174, #181)
+- nf-core template upgrade (to version 3.2.1) (#164, #176)
+- Documentation fixes (broken links) (#175)
+- Ability to run without any read data (#177)
+
+### Software dependencies
+
+Note, since the pipeline is using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. Only `Docker` or `Singularity` containers are supported, `conda` is not supported.
+
+| Dependency | Old version | New version |
+| ----------- | ----------- | ----------- |
+| blobtoolkit | 4.4.4 | 4.4.6 |
+| busco | 5.7.1 | 5.8.3 |
+
+### Parameters
+
+| Old parameter | New parameter |
+| ------------- | ---------------------- |
+| | --busco_gene_predictor |
+
+> **NB:** Parameter has been **updated** if both old and new parameter information is present. </br> **NB:** Parameter has been **added** if just the new parameter information is present. </br> **NB:** Parameter has been **removed** if new parameter information isn't present.
+
+## [[0.7.1](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.7.1)] – Psyduck (patch 1) – [2025-03-29]
+
+### Enhancements & fixes

 - Upgraded the blobtools version which contains a bugfix

@@ -17,6 +46,8 @@ Note, since the pipeline is using Nextflow DSL2, each process will be run with i

 ## [[0.7.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.7.0)] – Psyduck – [2025-03-19]

+### Enhancements & fixes
+
 - Fetch information about the chromosomes of the assemblies. Used to power
   "grid plots".
 - Fill in accurate read information in the blobDir. Users are now reqiured
@@ -54,6 +85,8 @@ Note, since the pipeline is using Nextflow DSL2, each process will be run with i

 ## [[0.6.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.6.0)] – Bellsprout – [2024-09-13]

+### Enhancements & fixes
+
 The pipeline has now been validated for draft (unpublished) assemblies.

 - The pipeline now queries the NCBI database instead of GoaT to establish the

CITATION.cff

Lines changed: 58 additions & 37 deletions
@@ -1,48 +1,69 @@
-cff-version: 1.2.0
-title: sanger-tol/blobtoolkit v0.7.1
 authors:
-  - family-names: Butt
+  - affiliation: Wellcome Sanger Institute
+    family-names: Butt
     given-names: Zaynab
-    affiliation: Wellcome Sanger Institute
-    orcid: 0009-0009-7934-8440
-  - family-names: Chafin
+    orcid: https://orcid.org/0009-0009-7934-8440
+    website: https://github.com/zb32
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Chafin
     given-names: Tyler
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0001-8687-5905
-  - family-names: Challis
+    orcid: https://orcid.org/0000-0001-8687-5905
+    website: https://github.com/tkchafin
+  - affiliation: Wellcome Sanger Institute
+    family-names: Challis
     given-names: Rich
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0002-3502-1122
-  - family-names: Kumar
+    orcid: https://orcid.org/0000-0002-3502-1122
+    website: https://github.com/rjchallis
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Kumar
     given-names: Sujai
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0001-5902-6641
-  - family-names: Muffato
+    orcid: https://orcid.org/0000-0001-5902-6641
+    website: https://github.com/sujaikumar
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Muffato
     given-names: Matthieu
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0002-7860-3560
-  - family-names: Pointon
-    given-names: Damon-Lee
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0003-2949-6719
-  - family-names: Qi
+    orcid: https://orcid.org/0000-0002-7860-3560
+    website: https://github.com/muffato
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Qi
     given-names: Guoying
-    orcid: 0000-0003-1262-8973
-    affiliation: Wellcome Sanger Institute
-  - family-names: "Ramos D\xEDaz"
+    orcid: https://orcid.org/0000-0003-1262-8973
+    website: https://github.com/gq1
+
+    family-names: "Ramos D\xEDaz"
     given-names: Alexander
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0001-6410-3349
-  - family-names: Surana
+    orcid: https://orcid.org/0000-0001-6410-3349
+    website: https://github.com/alxndrdiaz
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Sims
+    given-names: Yumi
+    orcid: https://orcid.org/0000-0003-4765-4872
+    website: https://github.com/yumisims
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Surana
     given-names: Priyanka
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0002-7167-0875
-  - family-names: Yates
+    orcid: https://orcid.org/0000-0002-7167-0875
+    website: https://github.com/priyanka-surana
+  - affiliation: Wellcome Sanger Institute
+
+    family-names: Yates
     given-names: Bethan
-    affiliation: Wellcome Sanger Institute
-    orcid: 0000-0003-1658-1762
-doi: 10.5281/zenodo.13758882
-repository-code: "https://github.com/sanger-tol/blobtoolkit"
+    orcid: https://orcid.org/0000-0003-1658-1762
+    website: https://github.com/BethYates
+cff-version: 1.2.0
+date-released: "2025-04-25"
+doi: 10.5281/zenodo.7949058
 license: MIT
-version: 0.7.1
-date-released: "2024-09-13"
+message: If you use this software, please cite it using the metadata from this file
+  and all references from CITATIONS.md .
+repository-code: https://github.com/sanger-tol/blobtoolkit
+title: sanger-tol/blobtoolkit v0.8.0 -
+type: software
+url: https://pipelines.tol.sanger.ac.uk/blobtoolkit
+version: 0.8.0

bin/generate_config.py

Lines changed: 15 additions & 2 deletions
@@ -53,8 +53,21 @@ def parse_args(args=None):
     parser.add_argument("--taxdump", help="Path to the taxonomy database", required=True)
     parser.add_argument("--precomputed_busco", action="append", help="Path to precomputed BUSCO outputs", required=False)
     parser.add_argument("--version", action="version", version="%(prog)s 2.0")
-    return parser.parse_args(args)
+    args = parser.parse_args(args)
+
+    if not args.read_id and not args.read_type and not args.read_layout and not args.read_path:
+        # All read arguments skipped, OK
+        pass
+    elif args.read_id and args.read_type and args.read_layout and args.read_path:
+        # All read arguments passed
+        if len(set([len(args.read_id), len(args.read_type), len(args.read_layout), len(args.read_path)])) != 1:
+            print(f"The --read_id, --read_type, --read_layout, and --read_path, must be passed the same number of times", file=sys.stderr)
+            sys.exit(1)
+    else:
+        print(f"The --read_id, --read_type, --read_layout, and --read_path, must be passed the same number of times", file=sys.stderr)
+        sys.exit(1)

+    return args


 def make_dir(path):
     if len(path) > 0:
@@ -360,7 +373,7 @@ def main(args=None):
     if sequence_report:
         print_tsvs(args.output_prefix, sequence_report)

-    reads = zip(args.read_id, args.read_type, args.read_layout, args.read_path)
+    reads = zip(args.read_id, args.read_type, args.read_layout, args.read_path) if args.read_id else []

     print_yaml(
         f"{args.output_prefix}.yaml",
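
With this change `parse_args` accepts the four read options either all together (each given the same number of times) or not at all, and `main` falls back to an empty read list when none were given (#177). Below is a minimal Python sketch of that all-or-none rule. It assumes the read options are declared with `action="append"` (suggested by the `len(...)` and `zip(...)` calls above; their declarations are not part of this diff), and `parse_read_args` / `read_records` are hypothetical names, not functions from `generate_config.py`. The example values come from the `sample3,ont,ont.cram,SINGLE` samplesheet row in `docs/usage.md`.

```python
import argparse
import sys

# The four repeatable read options checked by parse_args in generate_config.py
READ_OPTIONS = ("read_id", "read_type", "read_layout", "read_path")


def parse_read_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of the all-or-none read checks")
    # Assumption: each read option uses action="append", so it defaults to
    # None and collects a list when given one or more times
    for name in READ_OPTIONS:
        parser.add_argument(f"--{name}", action="append")
    args = parser.parse_args(argv)

    values = [getattr(args, name) for name in READ_OPTIONS]
    if not any(values):
        return args  # no read data at all: allowed from 0.8.0 on
    if all(values) and len({len(v) for v in values}) == 1:
        return args  # every option given, the same number of times
    print(
        "--read_id, --read_type, --read_layout and --read_path must all be given "
        "the same number of times, or not at all",
        file=sys.stderr,
    )
    sys.exit(1)


def read_records(args):
    """Hypothetical helper mirroring `zip(...) if args.read_id else []` in main()."""
    if not args.read_id:
        return []
    return list(zip(args.read_id, args.read_type, args.read_layout, args.read_path))


if __name__ == "__main__":
    print(read_records(parse_read_args([])))  # -> []
    print(read_records(parse_read_args(
        ["--read_id", "sample3", "--read_type", "ont",
         "--read_layout", "SINGLE", "--read_path", "ont.cram"]
    )))  # -> [('sample3', 'ont', 'SINGLE', 'ont.cram')]
```

The sketch reports a single message covering both failure cases (partial options, mismatched counts), whereas the committed code prints the "same number of times" message in both branches.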

conf/base.config

Lines changed: 8 additions & 9 deletions
@@ -105,15 +105,14 @@ process {
     }

     withName: "BLAST_BLASTN" {
-
-        // There are blast failures we don't know how to fix. We just give up after 3 attempts
-        errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == 3 ? 'ignore' : 'retry') : 'finish' }
-
-
-        // Most jobs complete quickly but some need a lot longer. For those outliers,
-        // the CPU usage remains usually low, averaging a single CPU
-        cpus = { task.attempt == 1 ? 4 : 1 }
+        cpus = 4
         memory = 2.GB
-        time = { task.attempt == 1 ? 4.h : ( task.attempt == 2 ? 47.h : 167.h ) }
+        time = 12.h
+    }
+
+    withName: "NOHIT_LIST" {
+        cpus = { task.attempt }
+        memory = { 1.GB * Math.pow(4, task.attempt) }
+        time = { 1.h * Math.pow(8, task.attempt) }
     }
 }
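
`BLAST_BLASTN` now requests fixed resources (4 CPUs, 2 GB, 12 h), while the new `NOHIT_LIST` block scales its request with `task.attempt`. The Python sketch below simply evaluates those three closures for the first few attempts. Whether a second or third attempt ever happens depends on the `errorStrategy` and `maxRetries` settings elsewhere in the configuration (and on any resource caps), none of which appear in this diff.

```python
def nohit_list_resources(attempt: int) -> dict:
    """Python rendering of the NOHIT_LIST closures in conf/base.config."""
    return {
        "cpus": attempt,            # cpus   = { task.attempt }
        "memory_gb": 4 ** attempt,  # memory = { 1.GB * Math.pow(4, task.attempt) }
        "time_h": 8 ** attempt,     # time   = { 1.h * Math.pow(8, task.attempt) }
    }


for attempt in (1, 2, 3):
    print(attempt, nohit_list_resources(attempt))
# 1 {'cpus': 1, 'memory_gb': 4, 'time_h': 8}
# 2 {'cpus': 2, 'memory_gb': 16, 'time_h': 64}
# 3 {'cpus': 3, 'memory_gb': 64, 'time_h': 512}
```

Memory grows 4x and walltime 8x per attempt, so each retry asks for considerably more than a linear increase would.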

conf/modules.config

Lines changed: 9 additions & 5 deletions
@@ -85,11 +85,15 @@ process {
     withName: "BUSCO_BUSCO" {
         // Obey "use_work_dir_as_temp", except for large genomes
         scratch = { !params.use_work_dir_as_temp || (meta.genome_size < 2000000000) }
-        ext.args = { 'test' in workflow.profile.tokenize(',') || 'test_raw' in workflow.profile.tokenize(',') || 'test_nobusco' in workflow.profile.tokenize(',') ?
-            // Additional configuration to speed processes up during testing.
-            // Note: BUSCO *must* see the double-quotes around the parameters
-            '--force --metaeuk --metaeuk_parameters \'"-s=2"\' --metaeuk_rerun_parameters \'"-s=2"\''
-            : '--force --metaeuk' }
+        ext.args = {
+            def base = '--force'
+            // is a certain predictor requested ?
+            if (params.busco_gene_predictor) {
+                base += ' --' + params.busco_gene_predictor
+            }
+            // otherwise, let's go with the default (miniprot)
+            return base
+        }
     }

     withName: "RESTRUCTUREBUSCODIR" {
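
The new `ext.args` closure for `BUSCO_BUSCO` always passes `--force` and appends `--<value>` only when `--busco_gene_predictor` is set; the test-profile MetaEuk tuning of the old version is gone. Here is a Python mirror of that string building, as a sketch. The value is appended verbatim, so it has to name a BUSCO flag such as `miniprot`, `metaeuk` or `augustus` (BUSCO 5.x options). Whether the pipeline's parameter schema restricts the accepted values is not visible in this diff.

```python
from typing import Optional


def busco_ext_args(busco_gene_predictor: Optional[str]) -> str:
    """Python mirror of the BUSCO_BUSCO ext.args closure in conf/modules.config."""
    base = "--force"
    # As with Groovy truth, None and "" are both falsy, so no predictor flag is added
    if busco_gene_predictor:
        base += " --" + busco_gene_predictor
    # Otherwise BUSCO falls back to its own default gene predictor
    return base


print(busco_ext_args(None))       # -> --force
print(busco_ext_args("metaeuk"))  # -> --force --metaeuk
```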

docs/usage.md

Lines changed: 29 additions & 13 deletions
@@ -45,13 +45,29 @@ sample3,ont,ont.cram,SINGLE
 | `datafile` | Full path to read data file. |
 | `library_layout` | Layout of the library. Must be one of `SINGLE`, `PAIRED`. |

-An [example samplesheet](assets/test/samplesheet.csv) has been provided with the pipeline.
+An [example samplesheet](../assets/test/samplesheet.csv) has been provided with the pipeline.

 ### Support for [nf-core/fetchngs](https://nf-co.re/fetchngs)

 The pipeline can also accept a samplesheet generated by the [nf-core/fetchngs](https://nf-co.re/fetchngs) pipeline (tested with version 1.11.0).
 The pipeline then needs the `--fetchngs_samplesheet true` option _and_ `--align true`, since the data files would all be unaligned.

+## BUSCO
+
+BUSCO is an important part of the assessment done by the pipeline.
+
+### Gene prediction method
+
+Busco starts with a quick gene prediction run, for which it has the following options (by decreasing speed):
+
+- Miniprot
+- Metaeuk
+- Augustus
+
+The default value has changed across Busco versions (and may change in the future).
+The pipeline exposes the `--busco_gene_predictor` option to force a specific method to be used.
+Otherwise, the pipeline will default to Busco's own default (currently Miniprot).
+
 ### Support for pre-computed `BUSCO` outputs

 The pipeline may be optionally run with a set of pre-computed [`BUSCO`](https://busco.ezlab.org) runs, provided using the `--precomputed_busco` parameter. These can be provided as either a directory path, or a `.tar.gz` compressed archive. The contents should be each `run_` output directory (directly from `BUSCO`) named as `run_[odb_dabasase_name]`:
@@ -261,31 +277,31 @@ see <https://training.nextflow.io/basic_training/config/> for some examples.
 Here is a full list of snakemake subworkflows and their Nextflow couterparts:

 - **`minimap.smk`**
-  - Implemented as [`minimap_alignment.nf`](subworkflows/local/minimap_alignment.nf).
+  - Implemented as [`minimap_alignment.nf`](../subworkflows/local/minimap_alignment.nf).
   - Optimised alignment is done using the [sanger-tol/readmapping](https://github.com/sanger-tol/readmapping) pipeline.
 - **`windowmasker.smk`**
-  - Implemented as part of [`prepare_genome.nf`](subworkflows/local/prepare_genome.nf).
+  - Implemented as part of [`prepare_genome.nf`](../subworkflows/local/prepare_genome.nf).
   - Genomes downloaded by [sanger-tol/insdcdownload](https://github.com/sanger-tol/insdcdownload) are already masked.
 - **`chunk_stats.smk`**
-  - Modified implementation as part of [`coverage_stats.nf`](subworkflows/local/coverage_stats.nf).
+  - Modified implementation as part of [`coverage_stats.nf`](../subworkflows/local/coverage_stats.nf).
   - BED file and additional statistics calculated using [`fasta_windows`](https://github.com/tolkit/fasta_windows).
 - **`busco.smk`**
-  - Implemented as [`busco_diamond_blastp.nf`](subworkflows/local/busco_diamond_blastp.nf).
+  - Implemented as [`busco_diamond_blastp.nf`](../subworkflows/local/busco_diamond_blastp.nf).
 - **`cov_stats.smk`**
-  - Implemented as part of [`coverage_stats.nf`](subworkflows/local/coverage_stats.nf).
-  - Combining the various tsv files is done in subworkflow [`collate_stats.nf`](subworkflows/local/collate_stats.nf).
+  - Implemented as part of [`coverage_stats.nf`](../subworkflows/local/coverage_stats.nf).
+  - Combining the various tsv files is done in subworkflow [`collate_stats.nf`](../subworkflows/local/collate_stats.nf).
 - **`window_stats.smk`**
-  - Implemented as part of [`collate_stats.nf`](subworkflows/local/collate_stats.nf).
+  - Implemented as part of [`collate_stats.nf`](../subworkflows/local/collate_stats.nf).
 - **`diamond_blastp.smk`**
-  - Implemented as [`busco_diamond_blastp.nf`](subworkflows/local/busco_diamond_blastp.nf).
+  - Implemented as [`busco_diamond_blastp.nf`](../subworkflows/local/busco_diamond_blastp.nf).
 - **`diamond.smk`**
-  - Implemented as [`run_blastx.nf`](subworkflows/local/run_blastx.nf).
+  - Implemented as [`run_blastx.nf`](../subworkflows/local/run_blastx.nf).
 - **`blastn.smk`**
-  - Implemented as [`run_blastn.nf`](subworkflows/local/run_blastn.nf).
+  - Implemented as [`run_blastn.nf`](../subworkflows/local/run_blastn.nf).
 - **`blobtools.smk`**
-  - Implemented as [`blobtools.nf`](subworkflows/local/blobtools.nf).
+  - Implemented as [`blobtools.nf`](../subworkflows/local/blobtools.nf).
 - **`view.smk`**
-  - Implemented as [`view.nf`](subworkflows/local/view.nf).
+  - Implemented as [`view.nf`](../subworkflows/local/view.nf).

 ### Software dependencies
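
The link fixes in `docs/usage.md` (#175) add `../` because relative Markdown links are resolved against the directory of the file containing them (here `docs/`), as GitHub does when rendering. A short Python sketch using `posixpath` shows the resolution before and after the fix; `resolve_link` is a hypothetical helper written only for this illustration.

```python
import posixpath


def resolve_link(markdown_file: str, link: str) -> str:
    """Resolve a relative link against the directory containing the Markdown
    file, then normalise any '..' segments."""
    base_dir = posixpath.dirname(markdown_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


# Before the fix: the link resolves under docs/
print(resolve_link("docs/usage.md", "assets/test/samplesheet.csv"))
# -> docs/assets/test/samplesheet.csv

# After the fix: '..' climbs back to the repository root
print(resolve_link("docs/usage.md", "../assets/test/samplesheet.csv"))
# -> assets/test/samplesheet.csv
print(resolve_link("docs/usage.md", "../subworkflows/local/view.nf"))
# -> subworkflows/local/view.nf
```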

modules.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
                     },
                     "busco/busco": {
                         "branch": "master",
-                        "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+                        "git_sha": "05954dab2ff481bcb999f24455da29a5828af08d",
                         "installed_by": ["modules"],
                         "patch": "modules/nf-core/busco/busco/busco-busco.diff"
                     },

modules/local/blobtoolkit/chunk.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CHUNK {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_CHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.4.5"
+    container "docker.io/genomehubs/blobtoolkit:4.4.6"

     input:
     tuple val(meta) , path(fasta)

modules/local/blobtoolkit/countbuscos.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_COUNTBUSCOS {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_COUNTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.4.5"
+    container "docker.io/genomehubs/blobtoolkit:4.4.6"

     input:
     tuple val(meta), path(table, stageAs: 'dir??/*')

modules/local/blobtoolkit/createblobdir.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CREATEBLOBDIR {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_BLOBDIR module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.4.5"
+    container "docker.io/genomehubs/blobtoolkit:4.4.6"

     input:
     tuple val(meta), path(window, stageAs: 'windowstats/*')

modules/local/blobtoolkit/extractbuscos.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_EXTRACTBUSCOS {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         exit 1, "BLOBTOOLKIT_EXTRACTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    container "docker.io/genomehubs/blobtoolkit:4.4.5"
+    container "docker.io/genomehubs/blobtoolkit:4.4.6"

     input:
     tuple val(meta), path(fasta)
