get_genome_seq

Biopiece: get_genome_seq

Description

get_genome_seq can be used to get subsequences from a specified genome that have been indexed with format_genome. The subsequence can be obtained explicitly or from BED/PSL/BLAST entries in the stream.

Use list_genomes to see available genome sequences.

Usage

get_genome_seq [options] <-g genome> <-c chr> <-b beg> <-e end|-l len>

or

... | get_genome_seq [options] <-g genome>

Options

[-?          | --help]               #  Print full usage description.
[-g <genome> | --genome=<genome>]    #  Genome to get subsequence from.
[-c <string> | --chr=<string>]       #  Chromosome with requested subsequence.
[-b <uint>   | --beg=<uint>]         #  Begin position of subsequence (first residue=1).
[-e <uint>   | --end=<uint>]         #  End position of subsequence.
[-l <uint>   | --len=<uint>]         #  Length of subsequence.
[-f <uint>   | --flank=<uint>]       #  Include flanking sequence.
[-m          | --mask]               #  Softmask non-exonic sequence.
[-s          | --splice]             #  Splice sequence BLOCKS in BED/PSL type records.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To get an explicit subsequence from the human genome (currently hg18) do:

get_genome_seq -g hg18 -c chrX -b 12000 -l 20

CHR_END: 12018
SEQ: GGTGCAGTAACACCTGCCGT
CHR_BEG: 11999
SEQ_LEN: 20
CHR: chrX
---

If you have a stream with BED, PSL or BLAST records, you obtain the subsequence simple by piping the stream through get_genome_seq and the sequence will be added to the record. Below is an example for a BED entry:

read_bed -i <BED file> -n 1 | get_genome_seq -g hg18

STRAND: +
CHR_END: 95127728
Q_ID: gi|108087685|gb|DQ594132.1|_Homo_sapiens_piRNA_piR-60244,_complete_sequence
CHR_BEG: 95127700
SCORE: 1
REC_TYPE: BED
BED_LEN: 29
CHR: chr15
SEQ: TTCACTTCTCCCATGTAGTTCCTGAGTGC
BED_COLS: 6
---

Now consider the following 12 column BED entry in the file test.bed:

chr4    70946   71196   AA699063        0       -       70946   71196   0       2       142,55, 0,195,

To obtain the dm3 sequence for this EST, do:

read_bed -i test.bed | get_genome_seq -g dm3 | write_fasta -xw 80

>AA699063
CTTTAGTAATGCTAGTGATCGCGCAAAGCATCAAAATCGAACACACAGTAATGAGGTAAGTCTTTCATCTATATAAAAGT
AATATTTTTCATTTATAGCTATTTTTAGAAACCGTACATTTGTAAAGCACCTGGATGCACAAAACGTTACACCGACCCGA
GCTCTTTGAGAAAACATGTAAAAACAGTCCATGGAGCTGAATTTTATGCAAATAAAAAACATAAGGGGTTGCCTCTAAAT
GACGCAAACT

Now use the -m switch to soft mask the intron so that exons are in uppercase sequence and the rest are in lowercase:

read_bed -i test.bed | get_genome_seq -mg dm3 | write_fasta -xw 80

>AA699063
CTTTAGTAATGCTAGTGATCGCGCAAAGCATCAAAATCGAACACACAGTAATGAGGTAAGTCTTTCATCTATATAAAAGT
AATATTTTTCATTTATAGCTATTTTTAGAAACCGTACATTTGTAAAGCACCTGGATGCACAAaacgttacaccgacccga
gctctttgagaaaacatgtaaaaacagtccatggaGCTGAATTTTATGCAAATAAAAAACATAAGGGGTTGCCTCTAAAT
GACGCAAACT

To splice the exon blocks use the -s switch:

read_bed -i test.bed | get_genome_seq -sg dm3 | write_fasta -xw 80

>AA699063
CTTTAGTAATGCTAGTGATCGCGCAAAGCATCAAAATCGAACACACAGTAATGAGGTAAGTCTTTCATCTATATAAAAGT
AATATTTTTCATTTATAGCTATTTTTAGAAACCGTACATTTGTAAAGCACCTGGATGCACAAGCTGAATTTTATGCAAAT
AAAAAACATAAGGGGTTGCCTCTAAATGACGCAAACT

Author

[email protected]

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

get_genome_seq is part of the Biopieces framework.

http://www.biopieces.org

get_genome_seq

Biopiece: get_genome_seq

Description

Usage

Options

Examples

See also

Author

License

Help

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!