Robert J. Gifford edited this page Mar 14, 2016 · 96 revisions


Database-integrated genome-screening (DIGS)

Much of the content of genomes consists of poorly characterised 'dark matter', such as transposons, pseudogenes, endogenous viral elements (EVEs) and obscure non-coding DNA elements. These sequences, even when non-functional, contain a wealth of useful biological information that can be explored by using sequence similarity searches in combination with strategically chosen reference sequence datasets.

In database-integrated genome-screening (DIGS), the output of sequence similarity search-based genome 'screens' is captured in a relational database. This facilitates the implementation of automated screens that can be performed on a large scale. In addition, it allows for the interrogation and manipulation of output data using structured query language (SQL), and provides all the benefits of a relational database management system (RDBMS) with respect to features such as data recoverability, multi-user support and network access.
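As a rough illustration of the idea described above, the following sketch loads BLAST hits into a relational table and summarises them with SQL. It is not the DIGS tool's own code or schema: it is written in Python rather than Perl, it uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server, and the table name `blast_hits`, the probe names and the sample hits are all invented for the example. The only detail taken from BLAST itself is the standard 12-column tabular output format (`-outfmt 6`).

```python
import sqlite3

# Invented sample data in BLAST tabular format (-outfmt 6). The columns are:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE_HITS = (
    "probe_gag\tchr1\t78.5\t420\t80\t3\t1\t420\t10500\t10919\t1e-60\t230.0\n"
    "probe_pol\tchr1\t82.1\t900\t150\t5\t1\t900\t11000\t11899\t1e-150\t610.0\n"
    "probe_gag\tchr7\t65.0\t200\t65\t4\t10\t209\t500\t699\t1e-12\t70.5\n"
)


def parse_hits(text):
    """Convert tabular BLAST output into a list of typed tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        rows.append((
            f[0], f[1], float(f[2]),
            int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return rows


def load_hits(conn, rows):
    """Create the (hypothetical) blast_hits table and insert the parsed hits."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blast_hits (
               qseqid TEXT, sseqid TEXT, pident REAL,
               length INTEGER, mismatch INTEGER, gapopen INTEGER,
               qstart INTEGER, qend INTEGER, sstart INTEGER, send INTEGER,
               evalue REAL, bitscore REAL)"""
    )
    conn.executemany(
        "INSERT INTO blast_hits VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def hits_per_probe(conn, max_evalue):
    """Count hits and report the best bit score per probe, up to an e-value cutoff."""
    cursor = conn.execute(
        """SELECT qseqid, COUNT(*), MAX(bitscore)
           FROM blast_hits
           WHERE evalue <= ?
           GROUP BY qseqid
           ORDER BY qseqid""",
        (max_evalue,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stand-in for a MySQL screening database
load_hits(conn, parse_hits(SAMPLE_HITS))
summary = hits_per_probe(conn, 1e-20)
print(summary)
```

Once the hits are in a table, filtering by e-value, grouping by probe, or joining against other tables becomes a matter of writing a query rather than re-parsing BLAST output files.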

The DIGS tool

The DIGS tool provides a computational framework for implementing DIGS. The tool is written in Perl. It uses the Basic Local Alignment Search Tool (BLAST) to perform sequence similarity searches, and the MySQL RDBMS to capture results and track progress.

I originally created the DIGS tool for my own research in the area of ‘paleovirology’, and I have primarily used it to screen genome sequence databases for sequences derived from viruses. However, I expect it may have broader utility, and it has been developed with the aim of saving others the effort of creating a similar system.

How to use this guide

This guide presumes some basic familiarity with BLAST and command-line programs. I suggest reading the introduction and installation sections first, and installing programs as necessary. It is advisable to at least skim section 2 before attempting one of the worked examples in section 3. The worked examples should provide enough familiarity for users to set up their own screening procedure, in parallel with a more careful reading of section 2.

DISCLAIMER

This program may contain bugs, some obvious and some less so. I do not accept responsibility for any problems that arise from use of this software. Use entirely at your own risk.
