Efficient computation of A-specific strings w.r.t. a set {B,C,...,Z} of other long-read samples. An A-specific string is a string that occurs only in sample A and not in the others.
Note: This repository is now deprecated and maintained for historical reasons only. Please use SVDSS instead.
- compute strings specific to child w.r.t. parents
- compute strings specific to individual A from population PA w.r.t. individual B from population PB
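To make the definition of an A-specific string concrete, here is a naive brute-force sketch in Python. It is not how PingPong works internally (the tool searches a prebuilt index instead). It only enumerates substrings of the reads in A and keeps those found in no read of the other samples. The function name `specific_substrings`, the `max_len` cap, and the toy reads are assumptions made for illustration, and the sketch treats reads as plain strings, ignoring reverse complements.

```python
def specific_substrings(target_reads, other_samples, max_len=None):
    """Brute-force illustration: substrings of target reads absent from all other samples.

    target_reads:  list of read sequences from sample A
    other_samples: list of samples, each a list of read sequences (B, C, ...)
    max_len:       optional cap on substring length (hypothetical, keeps output small)
    """
    # Flatten every read of every other sample into one list to search against.
    other_reads = [read for sample in other_samples for read in sample]
    found = set()
    for read in target_reads:
        n = len(read)
        for start in range(n):
            limit = n if max_len is None else min(n, start + max_len)
            for end in range(start + 1, limit + 1):
                candidate = read[start:end]
                # Keep the candidate only if it occurs in no other read.
                if not any(candidate in other for other in other_reads):
                    found.add(candidate)
    return found


if __name__ == "__main__":
    sample_a = ["ACGTTA"]
    sample_b = ["ACGT"]
    sample_c = ["GTTC"]
    print(sorted(specific_substrings(sample_a, [sample_b, sample_c], max_len=3)))
```

This approach is quadratic in read length and linear in the total size of the other samples for every candidate, which is why an index-based search is needed for real long-read data.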
C++11-compliant compiler (GCC 8.2 or newer), ropebwt2 and htslib. For convenience, ropebwt2 and htslib are included in the repository.
git clone --recursive https://github.com/Parsoa/PingPong.git
cd PingPong
cd ropebwt2 ; make ; cd ..
cd htslib ; make ; cd ..
make
You can now run PingPong by adding the clone directory to PATH. Because the package uses an internal clone of htslib, the shared objects will be in non-standard locations and have to be manually specified before running:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/clone/dir/htslib
Let's assume we have 3 samples A, B, and C. To compute A-specific strings, we have to:
- Index samples B and C:
./PingPong index --binary --fastq /path/to/sample/B --index B.index.bin
./PingPong index --append B.index.bin --fastq /path/to/sample/C --index BC.index.fmd
- Search for A-specific strings in the index
./PingPong search --index BC.index.fmd --fastq /path/to/sample/A --threads [nthreads]
The algorithm will output multiple files named solution_batch_<i>.sfs with the list of A-specific strings. Each string is defined in terms of:
- identifier of the read it comes from (a * means "same identifier as previous SFS")
- sequence
- starting position on the read
- length
- number of occurrences (we note that from this first pass, this number is always set to 1)
- Convert the n .sfs files to FASTQ (output to stdout):
./PingPong convert --batches n > /path/to/all-sfs.fq
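The .sfs record layout listed in the search step can be read with a short script. The sketch below is only an illustration: it assumes one record per line, whitespace-separated fields, and the field order given in the list (identifier, sequence, start, length, occurrences), none of which is confirmed here. The names `SfsRecord` and `parse_sfs` are hypothetical. The one behaviour taken directly from the description is that a `*` identifier means "same identifier as the previous record".

```python
from typing import Iterable, Iterator, NamedTuple


class SfsRecord(NamedTuple):
    read_id: str
    sequence: str
    start: int
    length: int
    count: int


def parse_sfs(lines: Iterable[str]) -> Iterator[SfsRecord]:
    """Yield records from .sfs lines, expanding '*' to the previous read identifier.

    Assumed layout (not verified against the tool): five whitespace-separated
    fields per line, in the order id, sequence, start, length, count.
    """
    previous_id = None
    for line_number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_number}: expected 5 fields, got {len(fields)}")
        read_id, sequence, start, length, count = fields
        if read_id == "*":
            if previous_id is None:
                raise ValueError(f"line {line_number}: '*' appears before any read identifier")
            read_id = previous_id
        previous_id = read_id
        yield SfsRecord(read_id, sequence, int(start), int(length), int(count))
```

A file can be passed directly, for example `with open(path) as handle: records = list(parse_sfs(handle))`, where `path` points to one of the batch files written by the search step.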
Usage: PingPong index [--binary] [--append /path/to/binary/index] --fastq /path/to/fastq --index /path/to/output/index
Optional arguments:
-b, --binary output index in binary format
-a, --append append to existing index (must be stored in binary)
Usage: PingPong search --index /path/to/index/file --fastq /path/to/fastq [--threads threads]
Optional arguments:
--workdir create output files in this directory (default:.)
--overlap -1/0 run the exact algorithm (-1) or the relaxed one (0) (default:0)
-t, --threads number of threads (default:4)
Usage: PingPong convert --batches num_sfs_files
Optional arguments:
--workdir create output files in this directory (default:.)
- To append (-a) to an existing index, the existing index must be stored in binary format (-b option).
- An index built with --binary cannot be queried. Use --binary only for indices that are meant to be later appended to.
- The output file is created in the current directory (if --workdir is not set).
- Even when indexing a FASTA file, pass it with the --fastq option.
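The rules on binary and queryable indices can be turned into a small command planner. The sketch below only builds the argument lists; it does not run anything. It assumes that --binary and --append may be combined for intermediate indices when more than two reference samples are chained, which the usage line suggests but which is not confirmed here. The function name `plan_index_commands` and the intermediate file naming are hypothetical.

```python
def plan_index_commands(fastqs, final_index, binary_prefix="partial"):
    """Build PingPong index command lines for a chain of reference samples.

    Every index except the last is written with --binary so that it can be
    appended to; the last one is written without --binary so it can be queried.
    """
    if not fastqs:
        raise ValueError("at least one reference sample is required")
    commands = []
    previous_index = None
    for i, fastq in enumerate(fastqs):
        is_last = i == len(fastqs) - 1
        # Hypothetical naming scheme for intermediate binary indices.
        output = final_index if is_last else f"{binary_prefix}.{i}.bin"
        command = ["./PingPong", "index"]
        if not is_last:
            command.append("--binary")  # intermediate index: appendable, not queryable
        if previous_index is not None:
            command += ["--append", previous_index]
        command += ["--fastq", fastq, "--index", output]
        commands.append(command)
        previous_index = output
    return commands


if __name__ == "__main__":
    for command in plan_index_commands(["B.fq", "C.fq", "D.fq"], "BCD.index.fmd"):
        print(" ".join(command))
```

Each returned list could be handed to `subprocess.run(command, check=True)` once the plan has been reviewed.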
./PingPong index --binary --fastq example/father.fq --index example/father.fq.bin
./PingPong index --append example/father.fq.bin --fastq example/mother.fq --index example/index.fmd
./PingPong search --index example/index.fmd --fastq example/child.fq --overlap -1 --workdir example --threads 1
This will output strings that are specific to child.fq in example/solution_batch_0.sfs. To convert it to .fq, run:
./PingPong convert --workdir example --batches 1 > example/child-sfs.fq
For inquiries on this software please open an issue or contact either Parsoa Khorsand or Luca Denti.
PingPong is now published in Bioinformatics Advances.