diff --git a/MOP1.1/404.html b/MOP1.1/404.html index 96a7b9c..d6a8667 100644 --- a/MOP1.1/404.html +++ b/MOP1.1/404.html @@ -8,10 +8,10 @@ Master of Pores @@ -19,15 +19,15 @@
diff --git a/MOP1.1/assets/js/database.js b/MOP1.1/assets/js/database.js index c4bcf77..a0c2b2a 100644 --- a/MOP1.1/assets/js/database.js +++ b/MOP1.1/assets/js/database.js @@ -8,7 +8,7 @@ window.database = { "category": "", "content": "Amazon Web Service EC2The simplest option is running an EC2 instance interactively where the pipeline can be installed as explained in the previous documentation pages.The latest Amazon Machine Image (AMI) we provide is: ami-0bf3a9a6cb7a5ea9f (Ubuntu 18.04, CUDA compatible, Docker 19.x, Singularity 3.2.1 and Nextflow 19.10 preinstalled)When choosing among the different available instance types, minimum CPU and memory requirements must be taken into account. These must fit the Nextflow executor process configuration values. In our sample configuration files we use m4.2xlarge for CPU and p3.2xlarge for GPU as examples.Keep in mind that not all Amazon infrastructure regions may offer the same instance types. As an example, in January 2020 Frankfurt had GPU nodes, but Paris did not.Launch an instance from the AMI above (go to EC2 > Images > AMI and copy-paste the ID provided above, filtering in public images). Once you find that image, you can launch an instance from it.You can connect to the launched instance by using the command below:ssh -i \"key-nf.pem\" ubuntu@xxx.eu-central-1.compute.amazonaws.comwhere key-nf.pem is your private key (reference) and host details can be obtained from the Connect popup in the EC2 instances dashboard.TerraformFor the sake of convenience, you may prefer to automate the deployment of EC2 instances and S3 buckets. Terraform is a convenient tool for this.Place the terraform binary in your local workstation path and move to where you are keeping your .tf files. 
Examples are provided in the terraform directory of this repository.Adapt the terraform configuration files to include your credentials, your chosen instance types, the key pair they are associated with, and whether files in the S3 bucket should be kept or not (force_destroy parameter).Initialize the terraform directory:terraform initValidate the terraform files:terraform validateInspect what changes are going to be performed in your AWS account:terraform planProceed:terraform applyOnce analyses are finished, the infrastructure can be dismantled with:terraform destroyShare files in Amazon S3Amazon Simple Storage Service (S3) is a convenient web service storage system for sharing raw input and final output files between your premises and your cloud computing instances.Below we provide some instructions and advice to set up an S3 bucket.Some convenient instructions for S3 permissions in your EC2 instance can be found here. From the previous link you can learn how to retrieve the key and password to place in /root/.passwd-s3fsEnsure proper permissions as well: chmod 600 /root/.passwd-s3fsYou can include in /etc/fstab the following mounting point (adapt according to your case):frankfurt-nf /mnt/frankfurt-nf fuse.s3fs _netdev,allow_other,passwd_file=/root/.passwd-s3fs,uid=1000,gid=1000 0 0If not already mounted, you can mount it by simply running:sudo mount /mnt/frankfurt-nfAdapt your S3 bucket and mounting point names according to your choice.Especially for large amounts of data, we suggest using the AWS CLI to transfer files from your premises to an S3 bucket (Ref). For instance, the command line below uploads the example data file to a pre-existing bucket.aws s3 cp multifast5_1.fast5 s3://frankfurt-nfModify your Nextflow configuration files to point your input files to the mounted S3 bucket. 
Both input and final output files can be placed in that mounted S3 storage, but we do not recommend keeping the Nextflow work directory (containing pipeline intermediate files) there, since it significantly slows down the whole process. Choose a suitable disk size for your instance depending on the amount of data to be processed.", "url": "/cloud.html", - "href": "/cloud.html" + "href": "/master_of_pores/MOP1.1/cloud.html" } , @@ -22,7 +22,7 @@ window.database = { "category": "", "content": "[![Docker Build Status](https://img.shields.io/docker/automated/biocorecrg/nanopore.svg)](https://cloud.docker.com/u/biocorecrg/repository/docker/biocorecrg/nanopore/builds)[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3518291.svg)](https://doi.org/10.5281/zenodo.3518291)[![Nextflow version](https://img.shields.io/badge/Nextflow-20.01.0-brightgreen)](https://www.nextflow.io/)[![Singularity version](https://img.shields.io/badge/Singularity-v3.2.1-green.svg)](https://www.sylabs.io/)[![Docker version](https://img.shields.io/badge/Docker-v19.03-blue)](https://www.docker.com/)# ![Direct RNA nanopore analysis pipeline](https://raw.githubusercontent.com/biocorecrg/master_of_pores/master/docs/logo_master.jpg) # Nanopore analysis pipeline v1.5Nextflow pipeline for the analysis of Nanopore data from direct RNA sequencing. This is a joint project between the [CRG bioinformatics core](https://biocore.crg.eu/) and the [Epitranscriptomics and RNA Dynamics research group](https://www.crg.eu/en/programmes-groups/novoa-lab). 
## BackgroundThe direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need of conversion to complementary DNA (cDNA), and as such, is virtually capable of detecting any given RNA modification present in the molecule that is being sequenced.Although the technology has been publicly available since 2017, the complexity of the raw current intensity output data generated by nanopore sequencing, together with lack of systematic and reproducible pipelines for the analysis of direct RNA sequencing datasets, have greatly hindered the access of this technology to the general user. Here we provide an in silico scalable and parallelizable workflow for the analysis of direct RNA sequencing reads, which converts raw current intensities into multiple types of processed data, providing metrics of the quality of the run, per-gene counts, RNA modification predictions and polyA tail length predictions.The workflow named Master of Pores, which has been built using the Nextflow framework and is distributed with Docker and Singularity containers, can be executed on any Unix-compatible OS (both Linux and Mac OSX) on a computer, cluster or cloud without the need of installing any additional software or dependencies. The workflow is easily scalable, as it can incorporate updated software versions or algorithms that may be released in the future in a modular manner. We expect that our pipeline will make the analysis of direct RNA sequencing datasets highly simplified and accessible to the non-bioinformatic expert, and thus boost our understanding of the epitranscriptome with single molecule resolution.## Modules includedThe MasterOfPores workflow includes all steps needed to process raw FAST5 files produced by Nanopore direct RNA sequencing and executes the following steps, allowing users a choice among different algorithms. 
The pipeline consists of 4 modules:- ### Module 1: *NanoPreprocess*This module takes as input the raw Fast5 reads and produces as output base-called FASTQ and BAM. The pre-processing module performs base-calling, demultiplexing, filtering, quality control, mapping and read counting, and generates a final report of the performance and results of each of the steps performed. It automatically detects the kind of input fast5 file (single or multi sequence).The NanoPreprocess module comprises 8 main steps:1. *Read base-calling* with the algorithm of choice, using ***Albacore*** (https://nanoporetech.com) or ***Guppy*** (https://nanoporetech.com). This step can be run in parallel and the user can decide the number of files to be processed in a single job with the --granularity parameter. When using a GPU, granularity is ignored and all the files are analyzed sequentially.2. *Filtering* of the resulting fastq files using ***Nanofilt*** (https://github.com/wdecoster/nanofilt). This step is optional and can be run in parallel.3. *Demultiplexing* of the fastq files using one of the following tools or combinations of tools:- ***DeePlexiCon*** (demultiplexing= \"deeplexicon\") (https://github.com/Psy-Fer/deeplexicon). - ***Guppy*** (demultiplexing= \"guppy\") (https://nanoporetech.com).- ***Guppy + Readucks*** (demultiplexing= \"guppy-readucks\") (https://github.com/artic-network/readucks)This step is optional. DeePlexiCon needs the model specified as an option in params.config.4. *Quality control* of the base-called data using ***MinIONQC*** (https://github.com/roblanf/minion_qc) and ***FastQC*** (http://www.bioinformatics.babraham.ac.uk/projects/fastqc).5. *Read mapping* to the reference genome or transcriptome using ***minimap2*** (https://github.com/lh3/minimap2), ***Graphmap*** (https://github.com/isovic/graphmap) or ***Graphmap2*** (https://github.com/lbcb-sci/graphmap2). 6. 
*Quality control* of the alignment using ***NanoPlot*** (https://github.com/wdecoster/NanoPlot) and ***bam2stats*** (https://github.com/lpryszcz/bin).7. *Gene or Isoform quantification* using ***HTSeq*** (https://htseq.readthedocs.io/) or ***NanoCount*** (https://github.com/a-slide/NanoCount), which estimates transcript abundance using an expectation-maximization algorithm. Of note, NanoCount is run if the reads have been mapped to the transcriptome, using the flag --reference_type transcriptome, while HTSeq is used when mapping to the genome. By default, reads are mapped to the genome and HTSeq is used to quantify per-gene counts. 8. *Final report* of the data processing using ***MultiQC*** (https://github.com/ewels/MultiQC), which combines the single quality controls done previously, as well as global run statistics. - ### Module 2: *NanoPreprocessSimple* This module is essentially NanoPreprocess without the base-calling step, so it needs base-called fastq files and optionally fast5 files (if you need demultiplexing with ***DeePlexiCon***).- ### Module 3: *NanoTail* This module takes as input the output produced by the NanoPreprocess module and produces polyA tail estimations.The NanoTail module estimates polyA tail lengths using ***Nanopolish*** (https://github.com/jts/nanopolish) and ***Tailfindr*** (https://github.com/adnaniazi/tailfindr), producing a plain text file that includes polyA tail length estimates for each read, computed using both algorithms. The correlation between the two algorithms is also reported as a plot. 
- ### Module 4: *NanoMod* This module takes as input the output produced by the NanoPreprocess module and produces a flat text file which includes the predicted RNA modifications.The NanoMod module predicts RNA modifications using ***Tombo*** (https://github.com/nanoporetech/tombo) and ***EpiNano*** (https://github.com/enovoa/EpiNano), producing a plain text file that is the intersection of the sites predicted by both algorithms, to reduce the number of false positives. ## Citing this workIf you use this tool, please cite our paper:\"MasterOfPores: A Workflow for the Analysis of Oxford Nanopore Direct RNA Sequencing Datasets\" Luca Cozzuto, Huanle Liu, Leszek P. Pryszcz, Toni Hermoso Pulido, Anna Delgado-Tejedor, Julia Ponomarenko, Eva Maria Novoa. Front. Genet., 17 March 2020. https://doi.org/10.3389/fgene.2020.00211", "url": "/", - "href": "/" + "href": "/master_of_pores/MOP1.1/" } , @@ -32,7 +32,7 @@ window.database = { "category": "", "content": "## Pre-requisitesTo use the pipeline, [Nextflow](https://www.nextflow.io/) and a Linux container engine (either [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/guides/3.1/user-guide/cli/singularity_apps.html)) need to be installed. The pipeline can be run on Mac OSX and Linux operating systems. ## Installation### 1. Install Nextflow (version 19.10.0)```bashcurl -s https://get.nextflow.io | bash```### 2. Clone the MasterOfPores repositoryThe pipeline can be cloned in this way using **git**:```bashgit clone --depth 1 https://github.com/biocorecrg/master_of_pores.git```### 3. Install Docker and/or Singularity - Docker: https://docs.docker.com/install/ (version 19.03 or later is required)- Singularity: https://sylabs.io/guides/2.6/user-guide/quick_start.html#quick-installation-steps (version 2.6.1 or 3.2.1 is required)### 4. 
Download Nanopore base-calling algorithmsBecause of redistribution restrictions on the basecallers **Albacore** and **Guppy**, we cannot provide them inside the docker image, so you need to download the binaries from the official website https://nanoporetech.com and place them inside the **master_of_pores/NanoPreprocess/bin** folder.#### a) Both Albacore and Guppy```bashcd master_of_pores/NanoPreprocess/bintar -zvxf ont-guppy_3.1.5_linux64.tar.gzln -s ont-guppy_3.1.5_linux64/ont-guppy/bin/guppy_* .pip3 install --target=./albacore ont_albacore-2.1.7-cp36-cp36m-manylinux1_x86_64.whlln -s albacore/bin/read_fast5_basecaller.py .```#### b) AlbacoreDownload the wheel file.```bashcd master_of_pores/NanoPreprocess/binpip3 install --target=./albacore ont_albacore-2.1.7-cp36-cp36m-manylinux1_x86_64.whlln -s albacore/bin/multi_to_single_fast5 .ln -s albacore/bin/read_fast5_basecaller.py .```#### c) GuppyPlease note that Guppy versions older than 3.1 (e.g. 3.0.3) only run on CPUs. Newer versions (e.g. 3.1.5 and above) work on both CPUs and GPUs. GPU base-calling is more than 10 times faster than CPU.```bashcd master_of_pores/NanoPreprocess/bintar -zvxf ont-guppy_3.1.5_linux64.tar.gzln -s ont-guppy_3.1.5_linux64/ont-guppy/bin/guppy_* .```### 5. Optional step: install CUDA drivers (only needed for GPU support): In case you want to use the GPU, you need to install the [CUDA drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) ### 6. 
Run the pipeline:Using Singularity:```bashcd master_of_pores/NanoPreprocess/nextflow run preprocessing.nf -with-singularity```Using Docker:```bashcd master_of_pores/NanoPreprocess/nextflow run preprocessing.nf -with-docker``` ", "url": "/install.html", - "href": "/install.html" + "href": "/master_of_pores/MOP1.1/install.html" } , @@ -42,7 +42,7 @@ window.database = { "category": "", "content": "# NanoModThis module predicts loci with RNA modifications starting from the data produced by NanoPreprocess.## Workflow* **index_reference** indexes the reference file for Epinano* **call_variants** uses Samtools to call variants for Epinano* **calc_var_frequencies** uses TSV_to_Variants_Freq.py3 to calculate the frequency of each variant for Epinano* **predict_with_EPInano** predicts the modifications with Epinano in parallel, splitting the input file into chunks of 1 million rows* **combineEpinanoPred** combines the results from Epinano * **resquiggling** resquiggles fast5 files for Tombo* **getModifications** estimates the modifications using Tombo, comparing WT vs KO## Input Parameters1. **input_path** path to the folders produced by the NanoPreprocess step.1. **comparison** tab-separated text file containing the list of comparisons. Here is an example:```bashWT1 KO1WT2 KO2WT3 KO3```1. **reference** reference transcriptome1. **output** folder1. **coverage** read coverage threshold for prediction1. **tombo_opt** options for tombo1. **epinano_opt** options for epinano1. **email**## ResultsThree folders are produced by this module:1. Epinano, containing the results obtained with this method. 
You have a single file with putative modifications: ```bash#Kmer,Window,Ref,Coverage,q1,q2,q3,q4,q5,mis1,mis2,mis3,mis4,mis5,ins1,ins2,ins3,ins4,ins5,del1,del2,del3,del4,del5,prediction,dist,ProbM,ProbUAGTGG,394404:394405:394406:394407:394408,chr2,8.0:8.0:7.0:7.0:7.0,21.5,21.25,19.857,23.0,16.285999999999998,0.0,0.0,0.0,0.0,0.0,0.0,0.062,0.071,0.0,0.0,0.0,0.0,0.0,0.0,0.0,unm,19.26143361547619,3.00000089999998e-14,0.9999999999999699TTTTT,12150:12151:12152:12153:12154,chr8,3.0:3.0:3.0:3.0:3.0,0.0,16.5,18.5,16.0,16.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.33299999999999996,0.33299999999999996,0.0,0.0,unm,2.5976484688977424,0.06071658133381308,0.9392834186661868ACATT,438165:438166:438167:438168:438169,chr13,67.0:67.0:67.0:68.0:68.0,13.635,13.446,9.323,9.6,12.127,0.03,0.045,0.015,0.147,0.07400000000000001,0.0,0.0,0.0,0.0,0.0,0.06,0.03,0.075,0.11800000000000001,0.07400000000000001,unm,0.08435556637195174,0.519879422458087,0.48012057754191295...```and three plots in pdf indicating possible events related to insertions, deletions and mismatches; see the example below. 2. Tombo, containing the results obtained with this method in fasta format. You have one file for each WT vs KO comparison```bash>chr11:455562:- Est. Frac. Alternate: 0.98TGACA>chr12:1008723:- Est. Frac. Alternate: 0.98TATCT>chr15:491587:+ Est. Frac. Alternate: 0.96TATAT>chr10:425794:- Est. Frac. Alternate: 0.95ATGTT>chr13:510759:+ Est. Frac. Alternate: 0.95...```And, for convenience, six BED files with the coordinates of the events.", "url": "/nanomod.html", - "href": "/nanomod.html" + "href": "/master_of_pores/MOP1.1/nanomod.html" } , @@ -52,7 +52,7 @@ window.database = { "category": "", "content": "# NanoPreprocessThis module takes as input the raw fast5 reads - single or multi - and produces a number of outputs (basecalled fast5, sequences in fastq format, aligned reads in BAM format etc). 
The pre-processing module performs base-calling, demultiplexing (optional), filtering, quality control, mapping to a genome / transcriptome reference and feature counting, and it generates a final report of the performance and results of each of the steps performed. It automatically detects the kind of input fast5 file (single or multi sequence).## Workflow| Process name | Description || ------------- | ------------- ||**testInput**|Detection of kind of fast5 (multi or single)||**baseCalling**|Basecalling with *Albacore* or *Guppy* (up to guppy 4.0)||**demultiplexing**|Demultiplexing (optional)||**concatenateFastQFiles**|This process concatenates the fastq files produced by each single basecalling job||**QC**|Performed with *MinIONQC*||**fastQC**|Executed on fastq files||**mapping**|Mapping to genome / transcriptome with either *minimap2*, *graphmap* or *graphmap2*||**counting**|If mapping to the genome, it obtains counts per gene with *htseq-count*. Otherwise, if mapping to the transcriptome, transcript counts are generated with *NanoCount*. Reads are also assigned to a gene or to a transcript if they are uniquely mapping. A report file is also generated.||**alnQC2**|QC of aligned reads with *NanoPlot*. The plots PercentIdentityvsAverageBaseQuality_kde, LengthvsQualityScatterPlot_dot, HistogramReadlength and Weighted_HistogramReadlength are then merged together in a single picture.||**alnQC**|QC of aligned reads with *bam2stats*.||**cram_conversion**|Generating cram file from alignment.||**joinAlnQCs**|Merging the QC files generated by the alnQC step.||**joinCountQCs**|Merging the report files generated by the counting step.||**multiQC**|Final report generation - optionally sent to the user by email.| ## Input Parameters| Parameter name | Description || ------------- | --------------||**fast5 files**|Path to fast5 input files (single or multi-fast5 files). They should be inside a folder that will be used as sample name. 
**[/Path/sample_name/*.fast5]**||**reference**|File in fasta format. **[Reference_file.fa]**||**ref_type**| Specify if the reference is a genome or a transcriptome. **[genome / transcriptome]** ||**kit**|Kit used in library prep - required for basecalling.||**flowcell**|Flowcell used in sequencing - required for basecalling. ||**annotation**|Annotation file in GTF format. It is optional and needed only in case of mapping to the genome and when interested in gene counts. **[Annotation_file.gtf]** ||**seq_type**| Sequence type. **[RNA / DNA]** ||**output**|Output folder name. **[/Path/to_output_folder]**||**granularity**|indicates the number of input fast5 files analyzed in a single process. It is by default 4000 for single-sequence fast5 files and 1 for multi-sequence fast5 files. In case **GPU** option is turned on this value is not needed since every file will be analyzed sequentially.||**basecaller**|Algorithm to perform the basecalling. guppy or albacore are supported. **[albacore / guppy]**||**basecaller_opt**| Command line options for basecaller program. Check available options in respective basecaller repository.||**GPU**| Allow the pipeline to run with GPU. **[OFF / ON]**||**demultiplexing**| Demultiplexing algorithm to be used. **[OFF / deeplexicon / guppy / guppy-readucks]**||**demultiplexing_opt**|Command line options for the demultiplexing software. ||**demulti_fast5**| If performing demultiplexing, also generate demultiplexed multifast5 files. **[OFF / ON]**||**filter**| Program to filter fastq files. **[nanofilt / OFF]**||**filter_opt**| Command line options of the filtering program. ||**mapper**| Mapping algorithm. **[minimap2 / graphmap / graphmap2]** ||**mapper_opt**| Command line options of the mapping algorithm. ||**map_type**|Spliced - recommended for genome mapping - or unspliced - recommended for transcriptome mapping. **[spliced / unspliced]**||**counter**| Generating gene counts (genome mapping) or transcript counts (transcriptome mapping). 
**[YES / \"\"]**||**counter_opt**|Command line options of the counter program: NanoCount for transcripts and Htseq-count for genes.||**email**| User's email for receiving the final report when the pipeline is finished. **[user_email]**|You can change them by editing the **params.config** file or using the command line - please see the next section. ## How to run the pipelineBefore launching the pipeline, the user should decide which containers to use - either docker or singularity **[-with-docker / -with-singularity]**.Then, to launch the pipeline, please use the following command:```bashnextflow run nanopreprocess.nf -with-singularity > log.txt```* Run the pipeline in the background:```bashnextflow run nanopreprocess.nf -with-singularity -bg > log.txt```* Run the pipeline while changing the **params.config** file:```bashnextflow run nanopreprocess.nf -with-singularity -bg --output test2 > log.txt```* Specify a directory for the working directory (temporary files location):```bashnextflow run nanopreprocess.nf -with-singularity -bg -w /path/working_directory > log.txt```* Run the pipeline with GPU - **CRG GPU cluster users**```bashnextflow run nanopreprocess.nf -with-singularity -bg -w /path/working_directory -profile cluster > log.txt```* Run the pipeline with GPU - **local GPU** ```bashnextflow run nanopreprocess.nf -with-singularity -bg -w /path/working_directory -profile local > log.txt```## Troubleshooting* Checking what has gone wrong: If there is an error, please see the log file (log.txt) for more details. Furthermore, if more information is needed, you can also find the working directory of the process in the file. Then, access that directory and check both the `.command.log` and `.command.err` files. * Resume an execution: Once the error has been solved or if you change a specific parameter, you can resume the execution with the **Nextflow** parameter **-resume** (only one dash!). 
If there was an error, the pipeline will resume from the process that had the error and proceed with the rest. If a parameter was changed, only processes affected by this parameter will be re-run. ```bashnextflow run nanopreprocess.nf -with-singularity -bg -resume > log_resumed.txt```To check whether the pipeline has been resumed properly, please check the log file. If previously and correctly executed processes are reported as *Cached*, the resume worked!```...[warm up] executor > crg[e8/2e64bd] Cached process > baseCalling (RNA081120181_1)[b2/21f680] Cached process > QC (RNA081120181_1)[c8/3f5d17] Cached process > mapping (RNA081120181_1)...```**IMPORTANT:** To resume the execution, temporary files generated previously by the pipeline must be kept. Otherwise, the pipeline will restart from the beginning. ## Results:Several folders are created by the pipeline within the output directory specified by the **output** parameter, and the input folder name is taken as the sample name. * **fast5_files**: Contains the basecalled multifast5 files. Each batch contains 4000 sequences. * **fastq_files**: Contains one or, in case of demultiplexing, more fastq files.* **QC_files**: Contains each single QC produced by the pipeline.* **alignment**: Contains the bam file(s).* **cram_files**: Contains the cram file(s).* **counts (OPTIONAL)**: Contains read counts per gene / transcript if counting was performed.* **assigned (OPTIONAL)**: Contains the assignment of each read to a given gene / transcript if counting was performed.* **report**: Contains the final multiqc report. * **variants (OPTIONAL)**: still experimental. It contains variant calling. -----------------------------------------------------# NanoPreprocessSimpleThis is a light version of NanoPreprocess that does not perform the basecalling step. It allows performing the same analysis starting from basecalled reads in fastq format. 
You can also provide fast5 files if you need to demultiplex using DeePlexiCon.This module allows running the pipeline on multiple input samples by using this syntax in the params.file:```bashfastq = \"$baseDir/../../../org_data/**/*.fastq.gz\"```In this way it will produce a number of output files with the same sample name indicated by the two asterisks.", "url": "/nanopreprocess.html", - "href": "/nanopreprocess.html" + "href": "/master_of_pores/MOP1.1/nanopreprocess.html" } , @@ -62,7 +62,7 @@ window.database = { "category": "", "content": "# NanoTailThis module estimates polyA tail sizes using two different methods (Nanopolish and Tailfindr). It reads directly the output produced by NanoPreprocess and, in particular, it needs the read counts / assignments.# Workflow 1. **check_reference** It verifies whether the reference is zipped and unzips it if needed 1. **tailfindr** It runs the *tailfindr* tool in parallel. 1. **collect_tailfindr_results** It collects the results of tailfindr. 1. **filter_bam** Bam files are filtered with *samtools* to keep only mapped reads and remove secondary alignments 1. **tail_nanopolish** It runs *nanopolish* in parallel. 1. **collect_nanopolish_results** It collects the results of tail_nanopolish. 1. **join_results** It merges the results from the two algorithms and makes a plot of the correlation.## Input Parameters1. **input_folders** path to the folders produced by the NanoPreprocess step.1. **nanopolish_opt** options for the nanopolish program1. **tailfindr_opt** options for the tailfindr program1. **reference** reference genome / transcriptome1. **output** folder1. **email** ## ResultsThree folders are created by the pipeline within the output folder:1. NanoPolish: contains the output of the *nanopolish* tool.1. Tailfindr: contains the output of the *tailfindr* tool.1. PolyA_final: contains the txt files with the combined results (i.e. predicted polyA sizes). 
Here is an example from a test run:```bash\"Read name\"\t\"Tailfindr\"\t\"Nanopolish\"\t\"Gene Name\"\"013a5dde-9c52-4de1-83eb-db70fb2cd130\"\t52.16\t49.39\t\"YKR072C\"\"01119f62-ca68-458d-aa1f-cf8c8c04cd3b\"\t231.64\t274.28\t\"YDR133C\"\"0154ce9c-fe6b-4ebc-bbb1-517fdc524207\"\t24.05\t24.24\t\"YFL044C\"\"020cde28-970d-4710-90a5-977e4b4bbc46\"\t41.27\t56.79\t\"YGL238W\"```A plot showing the correlation between the two methods is also produced.", "url": "/nanotail.html", - "href": "/nanotail.html" + "href": "/master_of_pores/MOP1.1/nanotail.html" } , diff --git a/MOP1.1/cloud.html b/MOP1.1/cloud.html index b8245c3..beef025 100644 --- a/MOP1.1/cloud.html +++ b/MOP1.1/cloud.html @@ -8,10 +8,10 @@ Running on the cloud @@ -19,15 +19,15 @@
diff --git a/MOP1.1/feed.xml b/MOP1.1/feed.xml index 98069f1..4591711 100644 --- a/MOP1.1/feed.xml +++ b/MOP1.1/feed.xml @@ -1 +1 @@ -Jekyll2024-12-03T17:35:19+00:00https://biocorecrg.github.io/master_of_pores/MOP1.1/feed.xmlMaster of PoresNextflow pipeline for analysis of Nanopore data from direct RNA sequencing. \ No newline at end of file +Jekyll2024-12-03T17:40:07+00:00https://biocorecrg.github.io/master_of_pores/MOP1.1/feed.xmlMaster of PoresNextflow pipeline for analysis of Nanopore data from direct RNA sequencing. \ No newline at end of file diff --git a/MOP1.1/index.html b/MOP1.1/index.html index 03e7466..7a092e4 100644 --- a/MOP1.1/index.html +++ b/MOP1.1/index.html @@ -8,10 +8,10 @@ Home @@ -19,15 +19,15 @@
diff --git a/MOP1.1/install.html b/MOP1.1/install.html index c0f82c4..d5b2a1b 100644 --- a/MOP1.1/install.html +++ b/MOP1.1/install.html @@ -8,10 +8,10 @@ Installation @@ -19,15 +19,15 @@
diff --git a/MOP1.1/nanomod.html b/MOP1.1/nanomod.html index 23d6fb1..052e9d9 100644 --- a/MOP1.1/nanomod.html +++ b/MOP1.1/nanomod.html @@ -8,10 +8,10 @@ NanoMod @@ -19,15 +19,15 @@
diff --git a/MOP1.1/nanopreprocess.html b/MOP1.1/nanopreprocess.html index 11a2689..37771ef 100644 --- a/MOP1.1/nanopreprocess.html +++ b/MOP1.1/nanopreprocess.html @@ -8,10 +8,10 @@ NanoPreprocess @@ -19,15 +19,15 @@
diff --git a/MOP1.1/nanotail.html b/MOP1.1/nanotail.html index 882e81c..c8b2535 100644 --- a/MOP1.1/nanotail.html +++ b/MOP1.1/nanotail.html @@ -8,10 +8,10 @@ NanoTail @@ -19,15 +19,15 @@
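Every `href` change in this patch follows the same mechanical rule: site-local paths in `database.js` gain the project baseurl so search results resolve under `https://biocorecrg.github.io/master_of_pores/MOP1.1/`. A minimal sketch of that rule is below; the helper name `prefix_href` is hypothetical (the real files are regenerated by Jekyll from the site's `baseurl` setting, not rewritten by a script):

```python
# Sketch of the URL rewrite applied throughout this patch: prefix site-local
# hrefs with the project baseurl. `prefix_href` is a hypothetical helper, not
# part of the repository; Jekyll performs the equivalent rewrite at build time.
BASEURL = "/master_of_pores/MOP1.1"

def prefix_href(entry):
    """Return a copy of a search-database entry with its href prefixed."""
    fixed = dict(entry)
    href = entry.get("href", "")
    # Rewrite only site-local links that are not already prefixed;
    # external URLs and already-migrated entries are left untouched.
    if href.startswith("/") and not href.startswith(BASEURL):
        fixed["href"] = BASEURL + href
    return fixed

print(prefix_href({"url": "/install.html", "href": "/install.html"})["href"])
# -> /master_of_pores/MOP1.1/install.html
```

Note that the rule also covers the root entry: `"/"` becomes `"/master_of_pores/MOP1.1/"`, matching the change to the home-page record above.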