For this chapter's exercises, if not already performed, we will need to create the special aMeta [conda environment](before-you-start.qmd#creating-a-conda-environment) and activate the environment.
:::
While nf-core/eager is a solid pipeline for microbial genomics, and can also perform metagenomic screening via the integrated HOPS pipeline [@Hubler2019-qw] or `Kraken2` [@Wood2019-mf], in some cases we may wish to have a more accurate and resource-efficient pipeline. In this section, we will demonstrate an example of using aMeta, a `Snakemake` workflow proposed by @Pochon2022-hj that aims to minimise resource usage by combining low-resource k-mer based taxonomic profiling with accurate read alignment ([@fig-ancientmetagenomicpipelines-ametadiagram]).
In this tutorial we will try running the small test data that comes with aMeta.
aMeta has been written in `Snakemake`, which means the pipeline has to be installed in a slightly different manner to the `nextflow pull` command that can be used for nf-core/eager.
Make sure you have followed the instructions in the [Before You Start Chapter](/before-you-start.qmd#ancient-metagenomic-pipelines) for cloning the aMeta GitHub repository to the `ancient-metagenomic-pipelines/` directory. Once we have done this, we can make sure we are in the aMeta directory, if not already.
```bash
cd /<path>/<to>/ancient-metagenomic-pipelines/aMeta
```
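As a quick sanity check (not part of the official aMeta instructions), we can confirm we really are in the repository root by looking for the Snakemake workflow file that ships with aMeta; the `workflow/Snakefile` path follows the standard Snakemake project layout the repository uses.

```bash
## Sanity check: the aMeta repository root should contain the Snakemake workflow
if [ -f workflow/Snakefile ]; then
    echo "Looks like the aMeta repository root"
else
    echo "workflow/Snakefile not found - check the path!" >&2
fi
```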
And activate the dedicated aMeta conda environment.
```bash
conda activate aMeta
```
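To confirm the switch worked, we can print the name of the active environment; conda exports it in the `CONDA_DEFAULT_ENV` shell variable (a small optional check, not required by aMeta).

```bash
## conda exports the active environment name in CONDA_DEFAULT_ENV
echo "$CONDA_DEFAULT_ENV"   ## should print: aMeta
```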
As aMeta also includes tools that normally require very large computational resources that cannot fit on a standard laptop, we will instead re-use the very small internal 'fake' data the aMeta developers use to test the pipeline.
:::{.callout-warning}
The following steps are already performed for students at the summer schools, as some of them, particularly the `set up conda envs` step, will take a very long time!
If you are doing this chapter self-guided, it is critical to perform the following set-up steps!
We don't have to worry about trying to understand exactly what the following commands are doing; they will not be important for the rest of the chapter.
However, generally the commands pull all the relevant software (via conda), make a fake database, download other required files, and then reconstruct the basic directory and file structure required for running the pipeline.
```bash
ln -s .test/.snakemake/ . ## so we can re-use conda environments from the `.test` directory for the summer school run
## Again get the taxonomy tax dump for Krona, but this time for a real run
## Make sure you're now in the root directory of the repository!
env=$(grep krona .test/.snakemake/conda/*yaml | awk '{print $1}' | sed -e "s/.yaml://g" | head -1)
conda activate $env
cd $env/opt/krona
./updateTaxonomy.sh taxonomy
cd -
conda deactivate
## And back to the root of the repo for practising aMeta properly!
cd ../
```
Now hopefully we can forget all this, and imagine we are running data through aMeta as we would normally from scratch.
:::
OK, now that aMeta is all set up, we can simulate running a 'real' pipeline job!
### aMeta configuration
In a text editor (e.g. `nano`), write the following names and paths in TSV format.
```tsv
sample fastq
bar data/bar.fq.gz
foo data/foo.fq.gz
```
:::{.callout-warning}
Make sure when copy-pasting into our text editor, tabs are not replaced with spaces!
:::
Then we need to write a config file.
This tells aMeta where to find things such as database files and other settings.
A minimal example `config.yaml` file can look like this.
This includes specifying the location of the main samplesheet, which points to a TSV file listing all the FASTQs of the samples we want to analyse, and paths to all the required database files and reference genomes you may need.
These paths and settings go inside the `config.yaml` file that must be placed inside `aMeta/config/`.
Make a configuration file with your text editor of choice (e.g. `nano`).
```yaml
samplesheet: "config/samples.tsv"
ncbi_db: resources/ncbi
n_unique_kmers: 1000
n_tax_reads: 200
```
And make a two-column samplesheet file called `samples.tsv`, also under `config/`, with the following content.
```tsv
sample fastq
foo data/foo.fq.gz
bar data/bar.fq.gz
```
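Since the delimiter must be a real tab, one way to sidestep copy-paste problems entirely is to write the file with `printf` and then inspect it with `cat -A`, which renders tab characters as `^I`. This is only a sketch; the path assumes the samplesheet lives at `config/samples.tsv`, as named in `config.yaml`.

```bash
## printf guarantees real tab characters between the columns
printf 'sample\tfastq\nfoo\tdata/foo.fq.gz\nbar\tdata/bar.fq.gz\n' > config/samples.tsv

## 'cat -A' (GNU coreutils) shows tabs as ^I, making bad delimiters easy to spot
cat -A config/samples.tsv
```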
:::{.callout-warning}
aMeta (v1.0.0) currently supports single-end or pre-merged data only!
:::
Once this config file is generated, we can start the run.
:::{.callout-note}
As this is only a dummy run (due to the large-ish computational resources required for KrakenUniq), we re-use some of the resource files here.
While this will produce nonsense output, it is used here to demonstrate how we would execute the pipeline.
:::
### Prepare and run aMeta
Make sure we're still in the `aMeta` conda environment, and that we are still in the main aMeta directory with the following.
```bash
conda activate aMeta
cd /<path>/<to>/ancient-metagenomic-pipelines/aMeta/
```
Finally, we are ready to run aMeta, which will automatically pick up the config and samplesheet files we placed in `config/`!
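The launch itself is a plain Snakemake invocation from the repository root. The exact flags below are a sketch based on the aMeta documentation, so double-check the repository README if they have changed: `--use-conda` makes Snakemake use the per-rule conda environments we prepared above, and `-j` caps the number of parallel jobs.

```bash
## Sketch of a typical launch; adjust -j to your available CPU cores
snakemake --snakefile workflow/Snakefile -j 4 --use-conda
```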
All output files of the workflow are located in the `aMeta/results` directory.
To get a quick overview of the ancient microbes present in our samples, we should check the heatmap in `results/overview_heatmap_scores.pdf`.
:::{.callout-warning}
If running during the summer school, you can use the following command to open the PDF file from the command line.
```bash
evince results/overview_heatmap_scores.pdf
```
:::
{#fig-gitgithub-fork-addname}
Make our commit to record the change to Git history (@fig-accessingdata-firstpagefig-gitgithub-fork-commitedit) and double check we've made the change (@fig-gitgithub-fork-confirmedit).
{#fig-accessingdata-firstpagefig-gitgithub-fork-commitedit}
{#fig-gitgithub-fork-confirmedit}