remove_primers
Locating and removing primers from amplicon data sets (PCR-amplified regions, typically 16S rRNA)
can be done with remove_primers, which locates forward and reverse primers and removes them
from the sequence if found. Ambiguity codes can be used, and a given percentage of mismatches,
insertions, and deletions can be allowed. If the forward primer is found, FORWARD_POS and
FORWARD_LEN keys are added to the record and the sequence and scores are trimmed. If the reverse
primer is found, REVERSE_POS and REVERSE_LEN keys are added to the record and the sequence and
scores are trimmed.
The resulting records look like this:
SEQ_NAME: test
SEQ: ACTGGGTGGAGCACATCAA
SEQ_LEN: 19
FORWARD_POS: 3
FORWARD_LEN: 11
REVERSE_POS: 66
REVERSE_LEN: 12
---
For removing partial primers .
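The ambiguity-code and mismatch handling described above can be illustrated with a small Python sketch. This is not the Biopieces implementation: the function name `find_primer` and the absolute `max_mismatches` count are assumptions made for illustration (remove_primers takes percentages and also handles insertions and deletions, which this sketch does not).

```python
# Hypothetical sketch: locate a primer containing IUPAC ambiguity codes,
# allowing a fixed number of mismatches (no insertions or deletions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(seq, primer, max_mismatches=0):
    """Return the 0-based position of the first match, or None if not found."""
    seq = seq.upper().replace("U", "T")
    primer = primer.upper()
    for pos in range(len(seq) - len(primer) + 1):
        mismatches = 0
        for p, s in zip(primer, seq[pos:pos + len(primer)]):
            # A sequence base matches if it is one of the bases the code allows.
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch limit.
            return pos
    return None
```

For example, `find_primer("tcagtACTGAGCTAGCAGCGGTGCGccgc", "RCTGAGCTAGYAGCGGTGCG")` locates the primer at position 5, because R covers A/G and Y covers C/T.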
... | remove_primers <-f primer> <-r primer> [options]
[-? | --help] # Print full usage description.
[-f <string> | --adaptor=<string>] # Forward primer to locate.
[-r <string> | --adaptor=<string>] # Reverse primer to locate.
[-m <uint> | --mismatches=<uint>] # Max mismatch percent allowed - Default=2
[-i <uint> | --insertions=<uint>] # Max insertion percent allowed - Default=1
[-d <uint> | --deletions=<uint>] # Max deletion percent allowed - Default=1
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entry in the file test.fna:
>test
tcagtACTGAGCTAGCAGCGGTGCGccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcggTGCTGGACTGGGTGGAGCACatcaa
To check for the primers:
FORWARD: ACTGAGCTAGCAGCGGTGCG
REVERSE: GTGCTCCACCCAGTCCAGCA
We need to reverse-complement the reverse primer:
REVERSE-RC: TGCTGGACTGGGTGGAGCAC
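The reverse-complement step can be done by hand or with a few lines of Python. The helper below is a minimal sketch, with `reverse_complement` being a name chosen here for illustration; it also complements IUPAC ambiguity codes and preserves case.

```python
# Complement table covering the four bases and the IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq[::-1].translate(_COMPLEMENT)


print(reverse_complement("GTGCTCCACCCAGTCCAGCA"))  # TGCTGGACTGGGTGGAGCAC
```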
Then we can do:
read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TGCTGGACTGGGTGGAGCAC
SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcg
SEQ_LEN: 48
FORWARD_POS: 6
FORWARD_LEN: 19
REVERSE_POS: 48
REVERSE_LEN: 21
---
Notice that the above removes an extra nucleotide at the 3' end of the remaining sequence because of the allowed insertions/deletions. Allowing zero mismatches, insertions, and deletions gives:
read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TGCTGGACTGGGTGGAGCAC -m 0 -i 0 -d 0
SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcgg
SEQ_LEN: 49
FORWARD_POS: 5
FORWARD_LEN: 20
REVERSE_POS: 49
REVERSE_LEN: 20
---
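The exact-match case above (zero mismatches, insertions, and deletions) is simple enough to sketch in Python. This is an illustration under assumptions, not the Biopieces code: `remove_primers_exact` is a hypothetical name, records are modelled as plain dictionaries, quality scores are ignored, and the reverse primer position is taken relative to the sequence after forward trimming, which is what the example output suggests.

```python
def remove_primers_exact(record, forward, reverse):
    """Trim exact, case-insensitive primer matches from record["SEQ"]."""
    rec = dict(record)
    seq = rec["SEQ"]

    # Forward primer: drop everything up to and including the primer.
    pos = seq.upper().find(forward.upper())
    if pos != -1:
        rec["FORWARD_POS"] = pos
        rec["FORWARD_LEN"] = len(forward)
        seq = seq[pos + len(forward):]

    # Reverse primer: searched in the already trimmed sequence;
    # drop the primer and everything after it.
    pos = seq.upper().find(reverse.upper())
    if pos != -1:
        rec["REVERSE_POS"] = pos
        rec["REVERSE_LEN"] = len(reverse)
        seq = seq[:pos]

    rec["SEQ"] = seq
    rec["SEQ_LEN"] = len(seq)
    return rec


record = {
    "SEQ_NAME": "test",
    "SEQ": "tcagtACTGAGCTAGCAGCGGTGCGccgcaaacgacggtgaccaggcgcaggcggcg"
           "agcaccgcattctgcggTGCTGGACTGGGTGGAGCACatcaa",
}
result = remove_primers_exact(record, "ACTGAGCTAGCAGCGGTGCG", "TGCTGGACTGGGTGGAGCAC")
for key, value in result.items():
    print(f"{key}: {value}")
```

If a primer is not found, the corresponding keys are simply not added and that end of the sequence is left untrimmed.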
If we look for a forward primer that is not found we get:
read_fasta -i test.fna | remove_primers -f TTTTTTTTTTTTTTTTTTTT -r TGCTGGACTGGGTGGAGCAC
SEQ_NAME: test
SEQ: tcagtACTGAGCTAGCAGCGGTGCGccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcg
SEQ_LEN: 73
REVERSE_POS: 73
REVERSE_LEN: 21
---
If we look for a reverse primer that is not found we get:
read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TTTTTTTTTTTTTTTTTTTT
SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcggTGCTGGACTGGGTGGAGCACatcaa
SEQ_LEN: 74
FORWARD_POS: 6
FORWARD_LEN: 19
---
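Because the FORWARD_* and REVERSE_* keys are only added when the primer is found, their presence can be used to tell which primers were located in each record. The Python sketch below assumes the text layout shown on this page (`KEY: value` lines with `---` ending each record); `parse_records` and `primer_status` are hypothetical helper names, not part of Biopieces.

```python
def parse_records(text):
    """Parse 'KEY: value' lines into dicts; '---' ends a record."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            if current:
                records.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    if current:
        # Keep a trailing record that lacks the closing '---'.
        records.append(current)
    return records


def primer_status(record):
    """Return (forward_found, reverse_found) for a parsed record."""
    return ("FORWARD_POS" in record, "REVERSE_POS" in record)
```

A record with only REVERSE_POS and REVERSE_LEN, as in the first of the two examples above, gives `(False, True)`.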
Martin Asser Hansen - Copyright (C) - All rights reserved.
November 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
remove_primers is part of the Biopieces framework.