Commit fd2ff09

Merge pull request #87 from SPAAM-community/2024-pipelineupdate
More tweaks fixing aMeta commands
2 parents e4a8497 + a127ad3 commit fd2ff09

3 files changed (+89 -42 lines)

ancient-metagenomic-pipelines.qmd

Lines changed: 85 additions & 40 deletions
@@ -538,11 +538,7 @@ rm -r *
 ## What is aMeta?
 
 ::: {.callout-note collapse="true" title="Self guided: chapter environment setup"}
-For this chapter's exercises, if not already performed, we will need to create the [conda environment](before-you-start.qmd#creating-a-conda-environment) from the following [`yml` file](https://github.com/NBISweden/aMeta/blob/main/workflow/envs/environment.yaml), and activate the environment.
-
-```bash
-conda activate aMeta
-```
+For this chapter's exercises, if not already performed, we will need to create the special aMeta [conda environment](before-you-start.qmd#creating-a-conda-environment) and activate the environment.
 :::
 
 While nf-core/eager is a solid pipeline for microbial genomics, and can also perform metagenomic screening via the integrated HOPS pipeline [@Hubler2019-qw] or `Kraken2` [@Wood2019-mf], in some cases we may wish to have a more accurate and resource-efficient pipeline. In this section, we will demonstrate an example of using aMeta, a `Snakemake` workflow proposed by @Pochon2022-hj that aims to minimise resource usage by combining low-resource k-mer based taxonomic profiling with accurate read alignment ([@fig-ancientmetagenomicpipelines-ametadiagram]).
@@ -564,40 +560,33 @@ In this tutorial we will try running the small test data that comes with aMeta.
 
 aMeta has been written in `Snakemake`, which means the pipeline has to be installed in a slightly different manner to the `nextflow pull` command that can be used for nf-core/eager.
 
+Make sure you have followed the instructions in the [Before You Start Chapter](/before-you-start.qmd#ancient-metagenomic-pipelines) for cloning the aMeta GitHub repository into the `ancient-metagenomic-pipelines/` directory. Once we have done this, we can change into the aMeta directory, if we are not there already.
+
 ```bash
 cd /<path>/<to>/ancient-metagenomic-pipelines/aMeta
 ```
 
-As aMeta also includes tools that normally require very large computational resources that cannot fit on a standard laptop, we will instead try to re-use the internal very small 'fake' data the aMeta developers use to test the pipeline.
+Then activate the dedicated aMeta conda environment.
 
+```bash
+conda activate aMeta
+```
 
-We don't have to worry about trying to understand exactly what the following commands are doing, they will not be important for the rest of the chapter.
-However generally the commands try to pull all the relevant software (via conda), make a fake database and download other required files, and then reconstruct the basic directory and file structure required for running the pipeline.
+As aMeta includes tools that normally require computational resources far beyond what a standard laptop provides, we will instead re-use the very small 'fake' data that the aMeta developers use to test the pipeline.
 
 :::{.callout-warning}
-The next steps, particularly the `set up conda envs` will take a very long time!
-
-If we are impatient, we can speed this process up by using `mamba` rather than conda.
-
-```bash
-## Download installation script and run
-wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
-bash Miniforge3-Linux-x86_64.sh
+The following steps have already been performed for students in the summer schools, as they (particularly the `set up conda envs` step) take a very long time!
 
-## When asked: Accept the license
-## When asked: Press 'y' to run conda init
+If you are doing this chapter self-guided, it is critical to perform the following set up steps!
 
-## Turn off base
-conda config --set auto_activate_base false
+We don't have to worry about understanding exactly what the following commands are doing; they will not be important for the rest of the chapter.
+In general, the commands pull all the relevant software (via conda), make a fake database, download other required files, and then reconstruct the basic directory and file structure required for running the pipeline.
 
-## Re-add original conda environments to be recognised by mamba
-conda config --append envs_dirs /home/ubuntu/bin/miniconda3/envs/
-```
-:::
 
+:::{.callout-note title="Self-guided: aMeta set up and configuration" collapse=true}
 ```bash
 ## Change into .test/ to set up all the required test resources (databases etc.)
-cd ~/.test
+cd .test/
 
 ## Set up conda envs
 ## If we get an error about a 'non-default solver backend', then run `conda config --set solver classic` and re-start the command
@@ -610,7 +599,7 @@ source $(dirname $(dirname $CONDA_EXE))/etc/profile.d/conda.sh
 ## Build dummy KrakenUniq database
 env=$(grep krakenuniq .snakemake/conda/*yaml | awk '{print $1}' | sed -e "s/.yaml://g")
 conda activate $env
-krakenuniq-build --db resources/KrakenUniq_DB --kmer-len 21 --minimiser-len 11 --jellyfish-bin $(pwd)/$env/bin/jellyfish
+krakenuniq-build --db resources/KrakenUniq_DB --kmer-len 21 --minimizer-len 11 --jellyfish-bin $(pwd)/$env/bin/jellyfish
 conda deactivate
 
 ## Get Krona taxonomy tax dump
@@ -633,13 +622,11 @@ conda deactivate
 
 touch .initdb
 
-## Run a quick test
-snakemake --use-conda --show-failed-logs --conda-cleanup-pkgs cache -s ../workflow/Snakefile $@ --conda-frontend conda
-
+## Run a quick test and generate the report (you can open the report to check that everything was generated)
+snakemake --use-conda --show-failed-logs --conda-cleanup-pkgs cache -s ../workflow/Snakefile $@ --conda-frontend conda -j 4
 snakemake -s ../workflow/Snakefile --report --report-stylesheet ../workflow/report/custom.css --conda-frontend conda
 
 ## Now we move back into the main repository where we can symlink all the database files back to try running our 'own' test
-
 cd ../
 cd resources/
 ln -s ../.test/resources/* .
@@ -648,17 +635,27 @@ mv config.yaml config.yaml.bkp
 mv samples.tsv samples.tsv.bkp
 cd ../
 ln -s .test/data/ .
+ln -s .test/.snakemake/ . ## so we can re-use conda environments from the `.test` directory for the summer school run
 
-## Again get hte taxonomy tax dump for Krona, but this time for a real run
-env=$(grep krona .snakemake/conda/*yaml | awk '{print $1}' | sed -e "s/.yaml://g" | head -1)
+## Again get the taxonomy tax dump for Krona, but this time for a real run
+## Make sure you're now in the root directory of the repository!
+env=$(grep krona .test/.snakemake/conda/*yaml | awk '{print $1}' | sed -e "s/.yaml://g" | head -1)
 conda activate $env
 cd $env/opt/krona
 ./updateTaxonomy.sh taxonomy
 cd -
 conda deactivate
+
+## And make sure we are back in the root of the repo for practising aMeta properly!
+cd /<path>/<to>/ancient-metagenomic-pipelines/aMeta
 ```
 
+Now hopefully we can forget all this, and imagine we are running data through aMeta as we normally would from scratch.
+:::
+
 OK, now that aMeta is all set up, we can simulate running a 'real' pipeline job!
+:::
+
 
 
 ### aMeta configuration
@@ -668,8 +665,8 @@ In a text editor (e.g. `nano`), write the following names paths in TSV format.
 
 ```bash
 sample fastq
-foo data/bar.fq.gz
-bar data/foo.fq.gz
+bar data/bar.fq.gz
+foo data/foo.fq.gz
 ```
 
 :::{.callout-warning}
@@ -679,8 +676,11 @@ Make sure when copy pasting into our test editor, tabs are not replaced with spa
 Then we need to write a config file.
 This tells aMeta where to find things such as database files and other settings.
 
-These paths and settings go inside a `config.yaml` file inside `aMeta/config/`.
 A minimal example `config.yaml` file can look like this.
+This includes specifying the location of the main samplesheet, which points to a TSV file listing the FASTQ files of the samples we want to analyse, and the paths to all the required database files and reference genomes we may need.
+These paths and settings go inside the `config.yaml` file, which must be placed inside `aMeta/config/`.
+
+Make a configuration file with your text editor of choice (e.g. `nano`).
 
 ```yaml
 samplesheet: "config/samples.tsv"
@@ -699,24 +699,38 @@ ncbi_db: resources/ncbi
 
 n_unique_kmers: 1000
 n_tax_reads: 200
+```
 
+And make a two-column samplesheet with the following content in a file called `samples.tsv`, also under `config/`.
+
+```tsv
+sample fastq
+foo data/foo.fq.gz
+bar data/bar.fq.gz
 ```
 
+:::{.callout-warning}
+aMeta (v1.0.0) currently only supports single-end or pre-merged data!
+:::
+
+Once the config file and samplesheet are in place, we can start the run.
+
+
 :::{.callout-note}
 As this is only a dummy run (due to the large-ish computational resources required for KrakenUniq), we re-use some of the resource files here.
 While this will produce nonsense output, it is used here to demonstrate how we would execute the pipeline.
-
 :::
 
 ### Prepare and run aMeta
 
-Make sure we're still in the `aMeta` conda environment, and in the main aMeta directory with the following.
+Make sure we are still in the `aMeta` conda environment and in the main aMeta directory, with the following.
 
 ```bash
-cd /<path/<to>/ancient-metagenomic-pipelines/ameta/aMeta/
+conda activate aMeta
+cd /<path>/<to>/ancient-metagenomic-pipelines/aMeta/
 ```
 
-And, finally, we are ready to run aMeta!
+Finally, we are ready to run aMeta, which will automatically pick up the config and samplesheet files we placed in `config/`!
 
 ```bash
 #| eval: false
@@ -746,6 +760,14 @@ Complete log: .snakemake/log/2023-10-05T155051.524987.snakemake.log
 All output files of the workflow are located in the `aMeta/results` directory.
 To get a quick overview of ancient microbes present in our samples, we should check the heatmap in `results/overview_heatmap_scores.pdf`.
 
+:::{.callout-warning}
+If running during the summer school, you can use the following command to open the PDF file from the command line.
+
+```bash
+evince results/overview_heatmap_scores.pdf
+```
+:::
+
 ![Example microbiome profiling summary heatmap from aMeta.
 The columns represent different samples, and the rows different species.
 The cells of the heatmap are coloured from blue, to yellow, to red, representing aMeta authentication scores from 0 to 10, where the higher the number, the more confident aMeta is that the hit is both the correct taxonomic assignment and ancient.
@@ -794,11 +816,34 @@ From Left to Right, Top from bottom, the panels consists of:
 9. A general statistics table including the name of the taxonomic node, number of reads, duplicates, and mean read length etc.
 ](assets/images/chapters/ancient-metagenomic-pipelines/aMeta_output.png){#fig-ancientmetagenomicpipelines-persampleplot}
 
+::: {.callout-tip title="Question" appearance="simple"}
+In our test data, what score does the sample 'foo' get for the hit against _Yersinia pestis_?
+Is this a good score?
+
+Inspect the results file `AUTHENTICATION/xxx/authentic_Sample_foo_*.pdf`.
+What could have contributed to this particular score?
+
+Hint: check Supplementary File 2, section S5 of [@Pochon2022-hj].
+:::
+
+::: {.callout-note collapse="true" title="Answer"}
+The sample foo gets a score of `4`.
+This is a low score, and indicates that aMeta is not very confident that this is a true hit.
+The metrics that contribute to this score are:
+
+- Edit distance all reads (+1)
+- Deamination plot (+2)
+- Reads mapped with identity (+1)
+:::
+
 ### Clean up
 
-Before continuing onto the next section of this chapter, we will need to deactivate from the conda environment.
+Before continuing onto the next section of this chapter, we will need to remove the output files and deactivate the conda environment.
 
 ```bash
+rm -r results/ log/
+## You can also optionally remove the conda environments if we are running out of space
+# rm -r .snakemake/ .test/.snakemake
 conda deactivate
 ```
 
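
The samplesheet steps above warn that copy-pasting into a text editor can silently replace tabs with spaces. A minimal pre-flight sketch, assuming Bash and standard GNU/Linux tools, is shown here; `check_samplesheet` is a hypothetical helper, not part of aMeta, and it only assumes what the chapter states: a `sample<TAB>fastq` header followed by one sample per line, with FASTQ paths relative to the directory it is run from.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of aMeta: sanity-check a two-column samplesheet
# ("sample<TAB>fastq" header, one sample per line) before starting a run.
check_samplesheet() {
    local sheet="$1"
    local tab
    tab="$(printf '\t')"

    # 1. Header must be exactly "sample<TAB>fastq" (catches tabs pasted as spaces)
    if [ "$(head -n 1 "$sheet")" != "sample${tab}fastq" ]; then
        echo "ERROR: header of $sheet is not 'sample<TAB>fastq'" >&2
        return 1
    fi

    # 2. Every line must split into exactly two tab-separated columns
    if ! awk -F'\t' 'NF != 2 { print "ERROR: line " NR " does not have 2 tab-separated columns" > "/dev/stderr"; bad = 1 }
                     END { exit bad }' "$sheet"; then
        return 1
    fi

    # 3. Every FASTQ path must exist (relative to the current directory)
    local missing=0 sample fastq
    while IFS="$tab" read -r sample fastq || [ -n "$sample" ]; do
        if [ ! -e "$fastq" ]; then
            echo "ERROR: FASTQ for sample '$sample' not found: $fastq" >&2
            missing=1
        fi
    done < <(tail -n +2 "$sheet")

    # Prints OK and succeeds only if no FASTQ was missing
    [ "$missing" -eq 0 ] && echo "OK: $sheet"
}
```

Run from the main aMeta directory as `check_samplesheet config/samples.tsv` before launching the pipeline. It only reports problems; it does not try to repair the file.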

before-you-start.qmd

Lines changed: 2 additions & 0 deletions
@@ -216,5 +216,7 @@ For some chapters you may need the following software/and or data manually insta
 cd /<path>/<to>/ancient-metagenomic-pipelines/
 git clone https://github.com/NBISweden/aMeta
 cd aMeta
+## We have to patch the environment to use an old version of Snakemake as aMeta is not compatible with the latest version
+sed -i 's/snakemake-minimal>=5.18/snakemake <=6.3.0/' workflow/envs/environment.yaml
 conda env create -f workflow/envs/environment.yaml
 ```
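
The `sed` patch above pins Snakemake to `<=6.3.0` because aMeta is not compatible with the latest release. As a hedged sketch, assuming Bash and GNU `sort -V`, the helpers below confirm the patch was applied before running `conda env create`; `check_snakemake_pin` and `version_le` are hypothetical names, not part of aMeta or the course material.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, NOT part of aMeta or the course material.

# Succeed if version $1 <= version $2 (relies on GNU `sort -V`).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Check that the sed patch from the setup steps was applied to the
# environment file before running `conda env create`.
check_snakemake_pin() {
    local env_file="${1:-workflow/envs/environment.yaml}"
    if grep -q 'snakemake-minimal>=5.18' "$env_file"; then
        echo "ERROR: $env_file still requests snakemake-minimal>=5.18; apply the sed patch first" >&2
        return 1
    fi
    if ! grep -q 'snakemake <=6.3.0' "$env_file"; then
        echo "ERROR: no 'snakemake <=6.3.0' pin found in $env_file" >&2
        return 1
    fi
    echo "OK: Snakemake pinned to <=6.3.0 in $env_file"
}
```

Once the environment exists, `version_le "$(conda run -n aMeta snakemake --version)" 6.3.0 && echo "Snakemake version OK"` could additionally check the installed version, assuming `snakemake --version` prints only the version number.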

git-github.qmd

Lines changed: 2 additions & 2 deletions
@@ -840,9 +840,9 @@ Once the edit window is opened, add your name and GitHub user name to the list (
 
 ![Screenshot of GitHub file edit window, with a name added to a bullet point list at the bottom.](assets/images/chapters/git-github/github-fork-addname.png){#fig-gitgithub-fork-addname}
 
-Make our commit to record the change to Git history (@fig-accessingdata-firstpagefig-gitgithub-fork-commitedit) and double check we've made the change ()
+Make our commit to record the change to Git history (@fig-accessingdata-firstpagefig-gitgithub-fork-commitedit) and double check we've made the change (@fig-gitgithub-fork-confirmedit).
 
-![A commit message being written describing the addition of a new name in the GitHub commit interface.](assets/images/chapters/git-github/github-fork-commitedit.png){#fig-gitgithub-fork-commitedit}
+![A commit message being written describing the addition of a new name in the GitHub commit interface.](assets/images/chapters/git-github/github-fork-commitedit.png){#fig-accessingdata-firstpagefig-gitgithub-fork-commitedit}
 
 ![The rendered README with the newly added name at the bottom of the list.](assets/images/chapters/git-github/github-fork-confirmedit.png){#fig-gitgithub-fork-confirmedit}
 
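
Before this commit, the reference `@fig-accessingdata-firstpagefig-gitgithub-fork-commitedit` in `git-github.qmd` had no matching figure label. A minimal sketch for catching such mismatches is below, assuming Bash, figure labels written as `{#fig-...}`, and references written as `@fig-...` in the same file (cross-file references are not handled); `find_unresolved_figrefs` is a hypothetical helper, not a Quarto command.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT a Quarto command: print every @fig-... reference in a
# .qmd file that has no matching {#fig-...} label in the same file.
find_unresolved_figrefs() {
    local qmd="$1"
    # comm needs both lists sorted with the same collation, so force the C locale;
    # -23 keeps only lines unique to the first list (references without a label)
    LC_ALL=C comm -23 \
        <(grep -o '@fig-[A-Za-z0-9_-]*' "$qmd" | sed 's/^@//' | LC_ALL=C sort -u) \
        <(grep -o '{#fig-[A-Za-z0-9_-]*' "$qmd" | sed 's/^{#//' | LC_ALL=C sort -u)
}
```

Running `find_unresolved_figrefs git-github.qmd` prints nothing when every figure reference resolves to a label in that file.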
