Commit 18d2daf

Merge pull request #97 from sanger-tol/draft_assemblies
Support draft assemblies
2 parents 544c135 + 8c70c77 commit 18d2daf

23 files changed: 708 additions, 232 deletions

CHANGELOG.md

Lines changed: 39 additions & 0 deletions
@@ -3,6 +3,45 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## – Bellsprout – []
+
+The pipeline has now been validated for draft (unpublished) assemblies.
+
+- The pipeline now queries the NCBI database instead of GoaT to establish the
+  taxonomic classification of the species and the relevant Busco lineages.
+  In case the taxon_id is not found, the pipeline falls back to GoaT, which
+  is aware of upcoming taxon_ids in ENA.
+- New `--busco_lineages` parameter to choose specific Busco lineages instead of
+  automatically selecting based on the taxonomy.
+- All parameters are now passed the regular Nextflow way. There is no support
+  for the original Yaml configuration files of the Snakemake version.
+- New option `--skip_taxon_filtering` to skip the taxon filtering in blast searches.
+  Mostly relevant for draft assemblies.
+
+### Parameters
+
+| Old parameter | New parameter          |
+| ------------- | ---------------------- |
+| --yaml        |                        |
+|               | --busco_lineages       |
+|               | --skip_taxon_filtering |
+
+> **NB:** Parameter has been **updated** if both old and new parameter information is present. </br> **NB:** Parameter has been **added** if just the new parameter information is present. </br> **NB:** Parameter has been **removed** if new parameter information isn't present.
+
+### Software dependencies
+
+Note, since the pipeline is using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. Only `Docker` or `Singularity` containers are supported, `conda` is not supported.
+
+| Dependency | Old version | New version |
+| ---------- | ----------- | ----------- |
+| goat       | 0.2.5       |             |
+
+## [[0.5.1](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.1)] – Snorlax (patch 1) – [2024-08-22]
+
+### Enhancements & fixes
+
+- Bugfix: skip BLASTN if there are no chunks to align
+
 ## [[0.5.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.0)] – Snorlax – [2024-07-31]
 
 General tidy up of the configuration and the pipeline
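
The changelog entry above describes a two-step lookup: query NCBI first, and fall back to GoaT only when the taxon_id is not found. The following is a minimal sketch of that control flow, not the pipeline's actual code. The function name `resolve_lineage`, the `Lookup` type, and the two in-memory dictionaries standing in for the NCBI and GoaT queries are all hypothetical; `9999999` is a made-up taxon ID used to represent one that only the fallback source knows about.

```python
from typing import Callable, Optional, Sequence

# A lookup takes a taxon ID and returns its lineage as a list of taxon IDs
# (most specific first), or None when the source does not know the taxon ID.
Lookup = Callable[[int], Optional[Sequence[int]]]


def resolve_lineage(taxon_id: int, primary: Lookup, fallback: Lookup) -> Sequence[int]:
    """Try the primary source first; use the fallback only if it finds nothing."""
    lineage = primary(taxon_id)
    if lineage is None:
        lineage = fallback(taxon_id)
    if lineage is None:
        raise LookupError(f"taxon_id {taxon_id} not found in either source")
    return lineage


# Hypothetical in-memory stand-ins for the NCBI and GoaT queries.
ncbi_stub = {7088: [7088, 50557, 6656, 33208, 2759]}
goat_stub = {9999999: [9999999, 7088, 50557, 6656, 33208, 2759]}

# Known to the primary source: the fallback is never consulted.
print(resolve_lineage(7088, ncbi_stub.get, goat_stub.get))
# Unknown to the primary source: the fallback supplies the lineage.
print(resolve_lineage(9999999, ncbi_stub.get, goat_stub.get))
```

Passing the two sources as callables keeps the fallback logic separate from how each source is queried, so the stubs can be swapped for real clients without touching `resolve_lineage`.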

README.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ It takes a samplesheet of BAM/CRAM/FASTQ/FASTA files as input, calculates genome
 
 1. Calculate genome statistics in windows ([`fastawindows`](https://github.com/tolkit/fasta_windows))
 2. Calculate Coverage ([`blobtk/depth`](https://github.com/blobtoolkit/blobtk))
-3. Fetch associated BUSCO lineages ([`goat/taxonsearch`](https://github.com/genomehubs/goat-cli))
+3. Determine the appropriate BUSCO lineages from the taxonomy.
 4. Run BUSCO ([`busco`](https://busco.ezlab.org/))
 5. Extract BUSCO genes ([`blobtoolkit/extractbuscos`](https://github.com/blobtoolkit/blobtoolkit))
 6. Run Diamond BLASTp against extracted BUSCO genes ([`diamond/blastp`](https://github.com/bbuchfink/diamond))
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
+422676 aconoidasida
+7898 actinopterygii
+5338 agaricales
+155619 agaricomycetes
+33630 alveolata
+5794 apicomplexa
+6854 arachnida
+6656 arthropoda
+4890 ascomycota
+8782 aves
+5204 basidiomycota
+68889 boletales
+3699 brassicales
+134362 capnodiales
+33554 carnivora
+91561 cetartiodactyla
+34395 chaetothyriales
+3041 chlorophyta
+5796 coccidia
+28738 cyprinodontiformes
+7147 diptera
+147541 dothideomycetes
+3193 embryophyta
+33392 endopterygota
+314146 euarchontoglires
+33682 euglenozoa
+2759 eukaryota
+5042 eurotiales
+147545 eurotiomycetes
+9347 eutheria
+72025 fabales
+4751 fungi
+314147 glires
+1028384 glomerellales
+5178 helotiales
+7524 hemiptera
+7399 hymenoptera
+5125 hypocreales
+50557 insecta
+314145 laurasiatheria
+147548 leotiomycetes
+7088 lepidoptera
+4447 liliopsida
+40674 mammalia
+33208 metazoa
+6029 microsporidia
+6447 mollusca
+4827 mucorales
+1913637 mucoromycota
+6231 nematoda
+33183 onygenales
+9126 passeriformes
+5820 plasmodium
+92860 pleosporales
+38820 poales
+5303 polyporales
+9443 primates
+4891 saccharomycetes
+8457 sauropsida
+4069 solanales
+147550 sordariomycetes
+33634 stramenopiles
+32523 tetrapoda
+155616 tremellomycetes
+7742 vertebrata
+33090 viridiplantae
+71240 eudicots
+57723 acidobacteria
+201174 actinobacteria_phylum
+1760 actinobacteria_class
+28211 alphaproteobacteria
+135622 alteromonadales
+200783 aquificae
+1385 bacillales
+91061 bacilli
+2 bacteria
+171549 bacteroidales
+976 bacteroidetes
+68336 bacteroidetes-chlorobi_group
+200643 bacteroidia
+28216 betaproteobacteria
+80840 burkholderiales
+213849 campylobacterales
+1706369 cellvibrionales
+204428 chlamydiae
+1090 chlorobi
+200795 chloroflexi
+135613 chromatiales
+1118 chroococcales
+186801 clostridia
+186802 clostridiales
+84999 coriobacteriales
+84998 coriobacteriia
+85007 corynebacteriales
+1117 cyanobacteria
+768507 cytophagales
+768503 cytophagia
+68525 delta-epsilon-subdivisions
+28221 deltaproteobacteria
+213118 desulfobacterales
+213115 desulfovibrionales
+69541 desulfuromonadales
+91347 enterobacterales
+186328 entomoplasmatales
+29547 epsilonproteobacteria
+1239 firmicutes
+200644 flavobacteriales
+117743 flavobacteriia
+32066 fusobacteria
+203491 fusobacteriales
+1236 gammaproteobacteria
+186826 lactobacillales
+118969 legionellales
+85006 micrococcales
+31969 mollicutes
+2085 mycoplasmatales
+206351 neisseriales
+32003 nitrosomonadales
+1161 nostocales
+135619 oceanospirillales
+1150 oscillatoriales
+135625 pasteurellales
+203682 planctomycetes
+85009 propionibacteriales
+1224 proteobacteria
+72274 pseudomonadales
+356 rhizobiales
+227290 rhizobium-agrobacterium_group
+204455 rhodobacterales
+204441 rhodospirillales
+766 rickettsiales
+909929 selenomonadales
+117747 sphingobacteriia
+204457 sphingomonadales
+136 spirochaetales
+203691 spirochaetes
+203692 spirochaetia
+85011 streptomycetales
+85012 streptosporangiales
+1890424 synechococcales
+508458 synergistetes
+544448 tenericutes
+68295 thermoanaerobacterales
+200918 thermotogae
+72273 thiotrichales
+1737405 tissierellales
+1737404 tissierellia
+74201 verrucomicrobia
+135623 vibrionales
+135614 xanthomonadales
+2157 archaea
+2266 thermoproteales
+2281 sulfolobales
+114380 desulfurococcales
+183967 thermoplasmata
+651137 thaumarchaeota
+2182 methanococcales
+2191 methanomicrobiales
+183925 methanobacteria
+183924 thermoprotei
+2235 halobacteriales
+1644060 natrialbales
+224756 methanomicrobia
+1644055 haloferacales
+183963 halobacteria
+28890 euryarchaeota
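
The table above pairs NCBI taxon IDs with BUSCO lineage names. One plausible way to use such a table, sketched here under stated assumptions rather than taken from the pipeline's code, is to walk a species' taxonomic lineage and keep every taxon that has a BUSCO lineage, unless the user has requested specific lineages. The function names `parse_lineage_table` and `choose_lineages` are hypothetical, the comma-separated form of the `requested` argument is an assumption about how a value such as the one given to `--busco_lineages` might look, and `123456789` is a placeholder for a taxon that is absent from the table.

```python
from typing import Dict, Iterable, List, Optional


def parse_lineage_table(text: str) -> Dict[int, str]:
    """Parse whitespace-separated 'taxon_id name' lines into a dictionary."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        if line.strip():
            taxon_id, name = line.split()
            table[int(taxon_id)] = name
    return table


def choose_lineages(
    taxonomy: Iterable[int],
    table: Dict[int, str],
    requested: Optional[str] = None,
) -> List[str]:
    """Return BUSCO lineage names, most specific first.

    If `requested` is given (assumed comma-separated), it overrides the
    automatic selection based on the taxonomy.
    """
    if requested:
        return [name.strip() for name in requested.split(",") if name.strip()]
    return [table[taxon_id] for taxon_id in taxonomy if taxon_id in table]


# A few rows copied from the table above.
table = parse_lineage_table("""
7088 lepidoptera
33392 endopterygota
50557 insecta
6656 arthropoda
33208 metazoa
2759 eukaryota
""")

# Illustrative lineage, most specific first; 123456789 is a placeholder
# for a taxon that has no BUSCO lineage in the table.
taxonomy = [123456789, 7088, 33392, 50557, 6656, 33208, 2759]

print(choose_lineages(taxonomy, table))
print(choose_lineages(taxonomy, table, requested="lepidoptera, eukaryota"))
```

Taxa without an entry in the table are skipped, so the result contains only names that correspond to a BUSCO lineage.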
Binary file not shown.
Binary file not shown.
