Home
Robert J. Gifford
MRC-University of Glasgow Centre for Virus Research
BACKGROUND
Much of the content of genomes consists of poorly characterised 'dark matter', such as transposons, pseudogenes, endogenous viral elements (EVEs) and obscure non-coding DNA elements. These sequences, even when non-functional, contain a wealth of useful biological information that can be explored by using sequence similarity searches in combination with strategically chosen reference sequence datasets.
In database-integrated genome-screening (DIGS), the output of sequence similarity search-based genome 'screens' is captured in a relational database. This facilitates the implementation of automated screens that can be performed on a large scale. In addition, it allows for the interrogation and manipulation of output data using structured query language (SQL), and provides all the benefits of a relational database management system (RDBMS) with respect to features such as data recoverability, multi-user support and network access.
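As a rough illustration of the idea, the sketch below stores a few similarity-search hits in a relational table and summarises them with SQL. It is not the DIGS tool's own schema or code. The table name `hits`, its columns, and the sample rows are all hypothetical and invented for illustration, and Python's built-in `sqlite3` module stands in for a MySQL server so that the example runs without any setup.

```python
import sqlite3

# Made-up rows in the spirit of tabular similarity-search output.
# Columns (all hypothetical): genome, scaffold, probe, start, end, bitscore.
TOY_HITS = [
    ("genome_A", "scaffold_1", "probe_gag", 1200, 2100, 310.0),
    ("genome_A", "scaffold_7", "probe_pol", 500, 1900, 540.5),
    ("genome_B", "scaffold_2", "probe_pol", 80, 1400, 498.0),
    ("genome_B", "scaffold_2", "probe_env", 3000, 3600, 95.2),
]


def build_toy_screening_db(hits):
    """Create an in-memory database and load the hits into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE hits (
            target_genome TEXT NOT NULL,
            scaffold      TEXT NOT NULL,
            probe_name    TEXT NOT NULL,
            hit_start     INTEGER NOT NULL,
            hit_end       INTEGER NOT NULL,
            bitscore      REAL NOT NULL
        )
        """
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", hits)
    conn.commit()
    return conn


def count_hits_per_genome(conn, min_bitscore):
    """Count hits per genome, keeping only those at or above a bit score."""
    query = """
        SELECT target_genome, COUNT(*)
        FROM hits
        WHERE bitscore >= ?
        GROUP BY target_genome
        ORDER BY target_genome
    """
    return conn.execute(query, (min_bitscore,)).fetchall()


if __name__ == "__main__":
    conn = build_toy_screening_db(TOY_HITS)
    for genome, n_hits in count_hits_per_genome(conn, 100.0):
        print(genome, n_hits)
```

Once the hits are in a table, filtering, grouping and sorting become one-line queries rather than ad hoc parsing of search output files.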
The DIGS tool provides a computational framework for implementing DIGS. The tool is written in Perl. It uses the Basic Local Alignment Search Tool (BLAST) to perform sequence similarity searches, and the MySQL RDBMS to capture results and track progress.
I originally created the DIGS tool for my own research in the area of ‘paleovirology’, and I have primarily used it to screen genome sequence databases for sequences derived from viruses. However, I expect it may have broader utility, and it has been developed with the aim of saving others the effort of creating a similar system.
How to use this guide
This guide assumes basic familiarity with BLAST and command-line programs. I suggest reading the introduction and installation pages first, and installing programs as necessary. I recommend at least skimming section 2 before attempting one of the worked examples in section 3. The worked examples should provide enough familiarity for users to set up their own screening procedure, ideally alongside a more careful reading of section 2.
DISCLAIMER
This program probably contains bugs, some obvious and some less so. I do not accept responsibility for any problems that arise from use of this software. Use entirely at your own risk.