This project, Gene Prediction by Similarity Approach (GPSA), aims to provide a graphical tool (built with Qt) that identifies genomic regions in an unknown DNA sequence by their similarity to well-known genes. The user selects an unknown DNA sequence and a template (the mRNA of a well-known gene), then specifies a scoring scheme (scores for a match, a mismatch, and a gap) and a threshold for the alignment score. A Local Alignment algorithm aligns the two sequences. A putative exon is an alignment found during Local Alignment whose score satisfies the given threshold. An Exon Chaining algorithm then finds the maximum (highest-scoring) chain of non-overlapping putative genomic regions.
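The alignment step can be illustrated with a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty. This is not GPSA's own code. The function names (`score_matrix`, `traceback`, `putative_exons`), the 1-based coordinates, and the rule that every matrix cell scoring at or above the threshold is reported as a putative exon are all assumptions made for illustration.

```python
def score_matrix(template, unknown, match=1, mismatch=-1, gap=-1):
    """Fill the local-alignment matrix H.

    H[i][j] is the best score of a local alignment that ends at
    template[i-1] and unknown[j-1]; scores never drop below 0.
    """
    rows, cols = len(template) + 1, len(unknown) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if template[i - 1] == unknown[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def traceback(H, template, unknown, i, j, match=1, mismatch=-1, gap=-1):
    """Walk back from cell (i, j) until a zero cell is reached.

    Returns the cell where the walk stopped and the two aligned strings.
    Ties are resolved in favor of the diagonal move.
    """
    a, b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if template[i - 1] == unknown[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            a.append(template[i - 1]); b.append(unknown[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(template[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(unknown[j - 1]); j -= 1
    return (i, j), "".join(reversed(a)), "".join(reversed(b))


def putative_exons(template, unknown, threshold, match=1, mismatch=-1, gap=-1):
    """Assumed rule: every cell with score >= threshold yields one putative exon.

    Positions are 1-based (template_index, unknown_index) pairs.
    """
    H = score_matrix(template, unknown, match, mismatch, gap)
    hits = []
    for i in range(1, len(template) + 1):
        for j in range(1, len(unknown) + 1):
            if H[i][j] >= threshold:
                (si, sj), a, b = traceback(H, template, unknown, i, j, match, mismatch, gap)
                hits.append({"start": (si + 1, sj + 1), "end": (i, j),
                             "score": H[i][j], "alignment": (a, b)})
    return hits


# Toy usage: the template occurs exactly once in the unknown sequence.
for hit in putative_exons("ACGT", "TTACGTTT", threshold=4):
    print(hit["start"], hit["end"], hit["score"])  # prints: (1, 3) (4, 6) 4
```

Reporting every cell above the threshold produces many overlapping candidates on real data, which is why a chaining step is needed afterwards to pick a consistent subset.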
- File paths: browse to select the unknown DNA sequence and the DNA template. The file dialog filters for *.txt and *.fasta files; lines that start with the character '>' are ignored.
- Scoring scheme:
  - Match: spin box, integer from 1 to 99
  - Mismatch and Gap: spin boxes, integers from -99 to -1
- Threshold: text field that accepts only integers
- View Alignment Scores: button to run the Local Alignment algorithm on the given sequences
- Find Exons: button to run the Exon Chaining algorithm on the result of Local Alignment
- Table: displays the results of Local Alignment and Exon Chaining
  - Start and End: formatted as index_of_template-index_of_unknown_sequence
  - Score: score of the alignment under the scoring scheme
  - View Alignment: double-click to display the alignment in the plain text field below
- Alignment: displays the selected alignment
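
The chaining performed by Find Exons can be sketched as weighted interval scheduling: sort the putative exons by end position, then decide for each one whether adding it to the best chain that ends before it starts beats skipping it. The sketch below is a hedged illustration, not GPSA's implementation. The name `chain_exons`, the `(start, end, score)` tuple layout, the inclusive coordinates, and the choice to measure overlap on the unknown sequence are assumptions.

```python
from bisect import bisect_left


def chain_exons(intervals):
    """Return (total_score, chain) for the best chain of non-overlapping intervals.

    intervals: list of (start, end, score) with inclusive coordinates
    (assumed here to be positions on the unknown sequence).
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # order by end position
    ends = [iv[1] for iv in ivs]
    # best[k] holds the best (score, chain) using only the first k sorted intervals.
    best = [(0, [])]
    for k, (start, end, score) in enumerate(ivs):
        # p = number of earlier intervals that end strictly before this one starts.
        p = bisect_left(ends, start, 0, k)
        take = best[p][0] + score
        if take > best[k][0]:
            best.append((take, best[p][1] + [(start, end, score)]))
        else:
            best.append(best[k])  # skipping this interval is at least as good
    return best[-1]


# Toy usage: (4, 8, 6) overlaps both of its neighbors and is left out.
total, chain = chain_exons([(1, 5, 5), (4, 8, 6), (6, 12, 10), (13, 15, 3)])
print(total, chain)  # prints: 18 [(1, 5, 5), (6, 12, 10), (13, 15, 3)]
```

Storing the whole chain in every table entry keeps the sketch short; for large inputs, storing back-pointers and reconstructing the chain at the end would use less memory.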
Demo: Download GPSA.zip and run GenePrediction.exe