analyze_scores
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
analyze_scores analyzes the ASCII-encoded Illumina quality scores in the stream and outputs basic statistics: the length and the minimum, maximum, mean, and median score. analyze_scores adds the following keys to every record that has a SCORES key:
- SCORES_LEN
- SCORES_MIN
- SCORES_MAX
- SCORES_MEAN
- SCORES_MEDIAN
... | analyze_scores [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the file test.fq containing the single entry:
@ILLUMINA-52179E_0004:2:1:1040:5263#TTAGGC/1
TTCGGCATCGGCGGCGACGTTGGCGGCGGGGCCGGGCGGGTCGANNNCAT
+
GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################
To analyze the scores, read the file using read_fastq:
read_fastq -i test.fq | analyze_scores
SEQ_NAME: ILLUMINA-52179E_0004:2:1:1040:5263#TTAGGC/1
SEQ: TTCGGCATCGGCGGCGACGTTGGCGGCGGGGCCGGGCGGGTCGANNNCAT
SEQ_LEN: 50
SCORES: GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################
SCORES_LEN: 50
SCORES_MIN: 0
SCORES_MAX: 38
SCORES_MEAN: 17.86
SCORES_MEDIAN: 16
---
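The statistics in this record can be reproduced by hand. The following Python sketch is not the Biopieces implementation; it only illustrates the arithmetic. It assumes an ASCII offset of 33, which is inferred from the example output ('!' gives SCORES_MIN 0 and 'G' gives SCORES_MAX 38). The function name `add_score_stats` and the constant `PHRED_OFFSET` are hypothetical.

```python
from statistics import mean, median

# Assumption: offset 33, inferred from the example ('!' -> 0, 'G' -> 38).
PHRED_OFFSET = 33


def add_score_stats(record, offset=PHRED_OFFSET):
    """Return a copy of record with SCORES_* keys added.

    Records without a (non-empty) SCORES key are returned unchanged.
    """
    out = dict(record)
    encoded = record.get("SCORES")
    if not encoded:
        return out

    # Decode each ASCII character to its numeric quality score.
    scores = [ord(ch) - offset for ch in encoded]

    out["SCORES_LEN"] = len(scores)
    out["SCORES_MIN"] = min(scores)
    out["SCORES_MAX"] = max(scores)
    out["SCORES_MEAN"] = round(mean(scores), 2)
    out["SCORES_MEDIAN"] = median(scores)
    return out


record = {
    "SEQ_NAME": "ILLUMINA-52179E_0004:2:1:1040:5263#TTAGGC/1",
    "SEQ": "TTCGGCATCGGCGGCGACGTTGGCGGCGGGGCCGGGCGGGTCGANNNCAT",
    "SEQ_LEN": 50,
    "SCORES": "GGFBGGEADFAFFDDD,-5AC5?!C:)7?#####################",
}

result = add_score_stats(record)
for key in ("SCORES_LEN", "SCORES_MIN", "SCORES_MAX", "SCORES_MEAN", "SCORES_MEDIAN"):
    print(f"{key}: {result[key]}")
```

Under that assumption the decoded scores sum to 893 over 50 positions, giving the mean of 17.86. The median is the average of the 25th and 26th sorted values (12 and 20), which is 16.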
To summarize the scores of multiple records, pipe the output of analyze_scores through analyze_vals and write the result as a table:
read_fastq -i test_big.fq.bz2 -n 100 | analyze_scores | analyze_vals -x | write_tab -cpx
+---------------+------------+-------+-------+-------+---------+-------+
| KEY | TYPE | COUNT | MIN | MAX | SUM | MEAN |
+---------------+------------+-------+-------+-------+---------+-------+
| SEQ_NAME | Alphabetic | 100 | 43 | 44 | 4358 | 43.58 |
| SEQ | Alphabetic | 100 | 50 | 50 | 5000 | 50.00 |
| SEQ_LEN | Numeric | 100 | 50 | 50 | 5000 | 50.00 |
| SCORES | Alphabetic | 100 | 50 | 50 | 5000 | 50.00 |
| SCORES_LEN | Numeric | 100 | 50 | 50 | 5000 | 50.00 |
| SCORES_MIN | Numeric | 100 | 0 | 32 | 925 | 9.25 |
| SCORES_MAX | Numeric | 100 | 34 | 38 | 3693 | 36.93 |
| SCORES_MEAN | Numeric | 100 | 15.30 | 36.84 | 3145.18 | 31.45 |
| SCORES_MEDIAN | Numeric | 100 | 2 | 37 | 3371 | 33.71 |
+---------------+------------+-------+-------+-------+---------+-------+
Martin Asser Hansen - Copyright (C) - All rights reserved.
October 2013
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
analyze_scores is part of the Biopieces framework.