Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: analyze_scores

Description

analyze_scores analyzes the ASCII-encoded Illumina quality scores in the stream and outputs basic statistics: the length and the minimum, maximum, mean and median score. For every record with a SCORES key, analyze_scores adds the following keys:

  • SCORES_LEN
  • SCORES_MIN
  • SCORES_MAX
  • SCORES_MEAN
  • SCORES_MEDIAN

Usage

... | analyze_scores [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to file         -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the file test.fq containing the single entry:

@ILLUMINA-52179E_0004:2:1:1040:5263#TTAGGC/1
TTCGGCATCGGCGGCGACGTTGGCGGCGGGGCCGGGCGGGTCGANNNCAT
+
GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################

To analyze the scores, read the file with read_fastq and pipe the stream to analyze_scores:

read_fastq -i test.fq | analyze_scores

SEQ_NAME: ILLUMINA-52179E_0004:2:1:1040:5263#TTAGGC/1
SEQ: TTCGGCATCGGCGGCGACGTTGGCGGCGGGGCCGGGCGGGTCGANNNCAT
SEQ_LEN: 50
SCORES: GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################
SCORES_LEN: 50
SCORES_MIN: 0
SCORES_MAX: 38
SCORES_MEAN: 17.86
SCORES_MEDIAN: 16
---
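The statistics above can be checked by hand. The following is a minimal Python sketch, not the analyze_scores implementation. It assumes the scores use a Phred+33 ASCII offset, which this page does not state but which is consistent with the output above ('!' decodes to 0 and 'G' to 38). The function name score_stats and the constant PHRED_OFFSET are hypothetical names chosen for illustration.

```python
from statistics import mean, median

# Assumption: Phred+33 encoding, inferred from the example output above.
PHRED_OFFSET = 33


def score_stats(scores, offset=PHRED_OFFSET):
    """Decode an ASCII quality string and return summary statistics."""
    # Each character encodes one score as (ASCII code - offset).
    values = [ord(char) - offset for char in scores]
    return {
        "SCORES_LEN": len(values),
        "SCORES_MIN": min(values),
        "SCORES_MAX": max(values),
        "SCORES_MEAN": round(mean(values), 2),
        "SCORES_MEDIAN": median(values),
    }


# The SCORES string from the test.fq record shown above.
scores = "GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################"
stats = score_stats(scores)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Under that assumption the sketch yields the same length, minimum, maximum, mean and median as the analyze_scores output for this record.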

To summarize the scores of multiple records, pipe the output of analyze_scores to analyze_vals:

read_fastq -i test_big.fq.bz2 -n 100 | analyze_scores | analyze_vals -x | write_tab -cpx

+---------------+------------+-------+-------+-------+---------+-------+
| KEY           | TYPE       | COUNT | MIN   | MAX   | SUM     | MEAN  |
+---------------+------------+-------+-------+-------+---------+-------+
| SEQ_NAME      | Alphabetic |   100 |    43 |    44 |    4358 | 43.58 |
| SEQ           | Alphabetic |   100 |    50 |    50 |    5000 | 50.00 |
| SEQ_LEN       | Numeric    |   100 |    50 |    50 |    5000 | 50.00 |
| SCORES        | Alphabetic |   100 |    50 |    50 |    5000 | 50.00 |
| SCORES_LEN    | Numeric    |   100 |    50 |    50 |    5000 | 50.00 |
| SCORES_MIN    | Numeric    |   100 |     0 |    32 |     925 |  9.25 |
| SCORES_MAX    | Numeric    |   100 |    34 |    38 |    3693 | 36.93 |
| SCORES_MEAN   | Numeric    |   100 | 15.30 | 36.84 | 3145.18 | 31.45 |
| SCORES_MEDIAN | Numeric    |   100 |     2 |    37 |    3371 | 33.71 |
+---------------+------------+-------+-------+-------+---------+-------+
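For the numeric keys, the COUNT, MIN, MAX, SUM and MEAN columns in the table are per-key aggregates over all records (for example, SCORES_MIN has SUM 925 over COUNT 100, giving MEAN 9.25). The Python sketch below illustrates that aggregation only; it is not how analyze_vals is implemented. The function name summarize_key is hypothetical, and the second record is made-up illustration data, not taken from test_big.fq.bz2.

```python
def summarize_key(records, key):
    """Aggregate one numeric key over all records that contain it."""
    values = [float(record[key]) for record in records if key in record]
    total = sum(values)
    return {
        "KEY": key,
        "COUNT": len(values),
        "MIN": min(values),
        "MAX": max(values),
        "SUM": total,
        "MEAN": round(total / len(values), 2),
    }


# First record: values from the single-entry example on this page.
# Second record: hypothetical values, for illustration only.
records = [
    {"SCORES_MIN": 0, "SCORES_MAX": 38, "SCORES_MEAN": 17.86},
    {"SCORES_MIN": 2, "SCORES_MAX": 36, "SCORES_MEAN": 30.50},
]

summary = summarize_key(records, "SCORES_MAX")
print(summary)
```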

See also

read_fastq

analyze_vals

analyze_seq

write_tab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

October 2013

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

analyze_scores is part of the Biopieces framework.

http://www.biopieces.org
