Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: remove_primers

Description

Locating and removing primers from amplicon data sets (PCR-amplified regions, typically 16S rRNA) can be done with remove_primers, which locates the forward and reverse primers and removes them from the sequence if found. The primers may contain ambiguity codes, and a given amount of mismatches, insertions, and deletions can be allowed. If the forward primer is found, the FORWARD_POS and FORWARD_LEN keys are added to the record, and the primer plus everything upstream of it is trimmed from the sequence and scores. If the reverse primer is found, the REVERSE_POS and REVERSE_LEN keys are added to the record, and the primer plus everything downstream of it is trimmed from the sequence and scores. The resulting records look like this:

SEQ_NAME: test
SEQ: ACTGGGTGGAGCACATCAA
SEQ_LEN: 19
FORWARD_POS: 3
FORWARD_LEN: 11
REVERSE_POS: 66
REVERSE_LEN: 12
---

For removing partial primers, see find_adaptor.
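How remove_primers searches for the primers internally is not described on this page. As an illustration only, here is a minimal Python sketch of approximate primer matching with IUPAC ambiguity codes and separate budgets for mismatches, insertions, and deletions. The names `IUPAC`, `base_matches`, and `find_primer` are hypothetical, the budgets are absolute counts rather than the percentages the tool takes, and the search order (leftmost start; prefer a match, then a mismatch, then an insertion, then a deletion) is an assumption, so the alignment it picks need not be the one remove_primers reports.

```python
# Hypothetical sketch: approximate primer search with IUPAC ambiguity codes and
# separate budgets for mismatches, insertions and deletions. This is NOT the
# Biopieces implementation; the search order below is an illustrative choice.

# Bases each IUPAC code in the primer may match in the sequence.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(primer_base, seq_base):
    """True if the sequence base is one of the bases the primer code stands for."""
    return seq_base in IUPAC.get(primer_base, "")


def find_primer(seq, primer, max_mis=0, max_ins=0, max_del=0):
    """Return (pos, length) of the leftmost match in seq, or None if not found.

    pos is 0-based; length is the span matched in seq, which differs from
    len(primer) when insertions or deletions were used.
    """
    seq, primer = seq.upper(), primer.upper()

    def extend(i, j, mis, ins, dels):
        # Align primer[i:] against seq starting at j; return the end index in seq.
        if i == len(primer):
            return j
        if j < len(seq):
            if base_matches(primer[i], seq[j]):
                end = extend(i + 1, j + 1, mis, ins, dels)
            elif mis < max_mis:  # mismatch: consume one base from each
                end = extend(i + 1, j + 1, mis + 1, ins, dels)
            else:
                end = None
            if end is not None:
                return end
            # Insertion: an extra base in seq (not allowed before the first primer base).
            if i > 0 and ins < max_ins:
                end = extend(i, j + 1, mis, ins + 1, dels)
                if end is not None:
                    return end
        # Deletion: a primer base that is missing from seq.
        if dels < max_del:
            return extend(i + 1, j, mis, ins, dels + 1)
        return None

    for start in range(len(seq)):
        end = extend(0, start, 0, 0, 0)
        if end is not None:
            return start, end - start
    return None
```

For example, `find_primer("ggACGTtt", "ACNT")` returns `(2, 4)`, with the ambiguity code N matching the G. Backtracking with small fixed budgets keeps the sketch short; a dynamic-programming alignment would scale better to larger edit limits.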

Usage

... | remove_primers <-f primer> <-r primer> [options]

Options

[-?          | --help]               #  Print full usage description.
[-f <string> | --adaptor=<string>]   #  Forward primer to locate.
[-r <string> | --adaptor=<string>]   #  Reverse primer to locate.
[-m <uint>   | --mismatches=<uint>]  #  Max mismatch percent allowed   -  Default=2
[-i <uint>   | --insertions=<uint>]  #  Max insertion percent allowed  -  Default=1
[-d <uint>   | --deletions=<uint>]   #  Max deletion percent allowed   -  Default=1
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file    -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file    -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.
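The -m, -i, and -d limits are given as percentages, while a matcher needs whole numbers of edits. This page does not say how remove_primers converts one into the other. The first example below reports FORWARD_LEN 19 for a 20-nt primer under the default -i 1 and -d 1, so at least one indel is evidently permitted even though 1% of 20 is only 0.2, and rounding down would give 0. A minimal Python sketch that assumes rounding up (the name `allowed_edits` is hypothetical):

```python
def allowed_edits(primer_len, percent):
    """Convert a percent limit (as given to -m, -i or -d) into a whole number of edits.

    Assumption: the count is rounded up, so any non-zero percent allows at least one edit.
    """
    # Integer ceiling division, avoiding floating-point rounding surprises.
    return (primer_len * percent + 99) // 100
```

Under this assumption, a 20-nt primer with the defaults would be allowed 1 mismatch, 1 insertion, and 1 deletion, while -m 0 -i 0 -d 0 allows none.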

Examples

Consider the following FASTA entry in the file test.fna:

>test
tcagtACTGAGCTAGCAGCGGTGCGccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcggTGCTGGACTGGGTGGAGCACatcaa

The primers to locate are:

FORWARD: ACTGAGCTAGCAGCGGTGCG
REVERSE: GTGCTCCACCCAGTCCAGCA

The reverse primer is written on the opposite strand, so we need to reverse-complement it:

REVERSE-RC: TGCTGGACTGGGTGGAGCAC
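The reverse complement can also be computed outside Biopieces. A minimal Python sketch using `str.maketrans`/`str.translate`, which also complements IUPAC ambiguity codes (for example R = A/G pairs with Y = C/T) and preserves case; the name `reverse_complement` is illustrative:

```python
# Complement table for nucleotides and IUPAC ambiguity codes, upper and lower case.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(primer):
    """Complement every base, then reverse the string."""
    return primer.translate(COMPLEMENT)[::-1]


print(reverse_complement("GTGCTCCACCCAGTCCAGCA"))  # prints TGCTGGACTGGGTGGAGCAC
```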

Then we can do:

read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TGCTGGACTGGGTGGAGCAC

SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcg
SEQ_LEN: 48
FORWARD_POS: 6
FORWARD_LEN: 19
REVERSE_POS: 48
REVERSE_LEN: 21
---

Notice that the above trims one extra nucleotide at the 3' end (the g just before the reverse primer, hence REVERSE_LEN 21) because insertions and deletions are allowed by default. Allowing zero mismatches, insertions, and deletions gives:

read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TGCTGGACTGGGTGGAGCAC -m 0 -i 0 -d 0

SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcgg
SEQ_LEN: 49
FORWARD_POS: 5
FORWARD_LEN: 20
REVERSE_POS: 49
REVERSE_LEN: 20
---
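The keys are enough to reproduce the trimmed SEQ from the input. The outputs above are consistent with 0-based positions, with FORWARD_POS and FORWARD_LEN indexing the input sequence and REVERSE_POS indexing the sequence after the forward primer has been cut off. This reading is inferred from the examples, not stated by the documentation. A small Python check (`trim_from_keys` and `TEST_SEQ` are illustrative names):

```python
# Test sequence from test.fna above.
TEST_SEQ = ("tcagtACTGAGCTAGCAGCGGTGCG"
            "ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcgg"
            "TGCTGGACTGGGTGGAGCACatcaa")


def trim_from_keys(seq, forward_pos=None, forward_len=None, reverse_pos=None):
    """Apply the trimming described by FORWARD_POS/LEN and REVERSE_POS to seq."""
    if forward_pos is not None:
        # Drop everything up to and including the forward primer match.
        seq = seq[forward_pos + forward_len:]
    if reverse_pos is not None:
        # Assumption: REVERSE_POS indexes the forward-trimmed sequence.
        seq = seq[:reverse_pos]
    return seq


# Keys from the default run and from the -m 0 -i 0 -d 0 run above.
default_run = trim_from_keys(TEST_SEQ, 6, 19, 48)
strict_run = trim_from_keys(TEST_SEQ, 5, 20, 49)
print(len(default_run), len(strict_run))  # prints: 48 49
```

Both runs cut at the same 5' position (6 + 19 = 5 + 20 = 25); they differ only in the single g at the 3' end.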

If the forward primer is not found, no FORWARD_POS or FORWARD_LEN keys are added and the 5' end is left untrimmed:

read_fasta -i test.fna | remove_primers -f TTTTTTTTTTTTTTTTTTTT -r TGCTGGACTGGGTGGAGCAC

SEQ_NAME: test
SEQ: tcagtACTGAGCTAGCAGCGGTGCGccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcg
SEQ_LEN: 73
REVERSE_POS: 73
REVERSE_LEN: 21
---

If the reverse primer is not found, no REVERSE_POS or REVERSE_LEN keys are added and the 3' end is left untrimmed:

read_fasta -i test.fna | remove_primers -f ACTGAGCTAGCAGCGGTGCG -r TTTTTTTTTTTTTTTTTTTT

SEQ_NAME: test
SEQ: ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcggTGCTGGACTGGGTGGAGCACatcaa
SEQ_LEN: 74
FORWARD_POS: 6
FORWARD_LEN: 19
---
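The two runs above show that a key pair is added only when its primer is found, and that the corresponding end is otherwise left untouched. A minimal Python sketch of that record handling, modelling a record as a plain dict and using exact `str.find` matching as a stand-in for the tool's approximate search. Because of that swap, positions can differ from the tool's output: with exact matching the reverse primer is reported at 74 with length 20, not 73 and 21. `remove_primers_exact` is a hypothetical name, and `SCORES` as the quality-score key is an assumption.

```python
def remove_primers_exact(record, forward, reverse):
    """Hypothetical exact-match stand-in for how remove_primers updates a record."""
    seq = record["SEQ"]
    start, end = 0, len(seq)

    pos = seq.upper().find(forward.upper())
    if pos != -1:  # keys are only added when the primer is found
        record["FORWARD_POS"] = pos
        record["FORWARD_LEN"] = len(forward)
        start = pos + len(forward)

    pos = seq[start:].upper().find(reverse.upper())
    if pos != -1:
        record["REVERSE_POS"] = pos  # relative to the forward-trimmed sequence
        record["REVERSE_LEN"] = len(reverse)
        end = start + pos

    record["SEQ"] = seq[start:end]
    record["SEQ_LEN"] = end - start
    if "SCORES" in record:  # assumed key; scores are cut with the same slice
        record["SCORES"] = record["SCORES"][start:end]
    return record


TEST_SEQ = ("tcagtACTGAGCTAGCAGCGGTGCG"
            "ccgcaaacgacggtgaccaggcgcaggcggcgagcaccgcattctgcgg"
            "TGCTGGACTGGGTGGAGCACatcaa")

rec = remove_primers_exact({"SEQ_NAME": "test", "SEQ": TEST_SEQ},
                           "T" * 20, "TGCTGGACTGGGTGGAGCAC")
print(rec["SEQ_LEN"], rec["REVERSE_POS"], "FORWARD_POS" in rec)  # prints: 74 74 False
```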

See also

read_fasta

find_adaptor

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.


November 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

remove_primers is part of the Biopieces framework.

http://www.biopieces.org
