Add Coverage Statistics Calculation Script for Mosdepth #460

Open: 8 commits to merge into base branch main

bshifaw (Collaborator) commented Jul 26, 2024

This pull request introduces a new script to calculate coverage statistics from Mosdepth output files. The script processes coverage values, calculates various summary statistics, and outputs the results in a JSON file.

Argument Parsing:
Command-line arguments for input file, coverage column, output prefix, rounding precision, and debug mode.
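
The argument set described above can be sketched with `argparse`. The argument names are taken from the `Namespace(...)` line in the example log below; the defaults, help strings, and the `build_parser` helper name are assumptions for illustration and may differ from the script in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the arguments listed in the PR description."""
    parser = argparse.ArgumentParser(
        description="Summarize coverage values from a Mosdepth regions file."
    )
    # Positional input file (shown as mosdepth_regions in the example log).
    parser.add_argument("mosdepth_regions", help="Mosdepth output file to read")
    # Defaults below are assumed, not taken from the PR.
    parser.add_argument("--cov_col", type=int, default=4,
                        help="Column holding the coverage values")
    parser.add_argument("--output_prefix", default=None,
                        help="Prefix for the output JSON file")
    parser.add_argument("--round", type=int, default=2,
                        help="Number of decimal places to keep")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    return parser


# Parse the same command line used in the example log.
args = build_parser().parse_args([
    "--cov_col", "4",
    "--round", "2",
    "--output_prefix", "test_example.coverage_over_bed",
    "test_example.text",
])
print(args)
```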

Statistics Calculation:
Calculation of mean, quartiles (Q1, median, Q3), interquartile range, sample standard deviation, mean absolute deviation, share of coverage values above 4x, and evenness score. In the example log the 4x value is 1.0, so it appears to be reported as a fraction rather than a percentage.

JSON Output:
Write the calculated statistics to a JSON file.
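
As a rough illustration of the statistics and JSON steps, here is a minimal sketch using only the Python standard library. The dictionary keys and the `.cov_stat_summary.json` suffix come from the example log and output below; the function names `summarize_coverage` and `write_summary`, the quartile method (inclusive, linear interpolation), and the treatment of the 4x value as a fraction are assumptions. The evenness score is left out because the PR description does not state its formula.

```python
import json
import statistics


def summarize_coverage(values, ndigits=2, threshold=4):
    """Return rounded summary statistics for a list of coverage values.

    Needs at least two values (required by the quartile and stdev calls).
    """
    mean = statistics.fmean(values)
    # Assumed quartile method: inclusive, with linear interpolation.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Mean absolute deviation around the mean.
    mad = statistics.fmean([abs(v - mean) for v in values])
    # Fraction (not percent) of values strictly above the threshold.
    above = sum(1 for v in values if v > threshold) / len(values)
    stats = {
        "mean_cov": mean,
        "q1_cov": q1,
        "median_cov": median,
        "q3_cov": q3,
        "iqr_cov": q3 - q1,
        "sstdev_cov": statistics.stdev(values),  # sample standard deviation
        "mad_cov": mad,
        "percent_above_4x": above,
    }
    return {key: round(value, ndigits) for key, value in stats.items()}


def write_summary(stats, output_prefix):
    """Write the statistics to <output_prefix>.cov_stat_summary.json."""
    path = f"{output_prefix}.cov_stat_summary.json"
    with open(path, "w") as handle:
        json.dump(stats, handle)
    return path


# Made-up coverage values, for illustration only.
example = summarize_coverage([10, 12, 14, 16, 18, 20])
print(json.dumps(example))
```

Quartile definitions differ between libraries and methods, so values such as `q1_cov` and `q3_cov` from this sketch may not match the script's output exactly on small inputs.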

Changes:
Added docker/lr-mosdepth/coverage_stats.py with functions for argument parsing, file handling, and statistics calculation.
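
For the file handling part, one possible approach is sketched below. It assumes a tab-separated input, a 1-based `--cov_col`, and optional gzip compression detected from a `.gz` suffix; none of these details are confirmed by the PR description, and `read_coverage_column` is a hypothetical helper name.

```python
import csv
import gzip
import os
import tempfile


def read_coverage_column(path, cov_col=4):
    """Read one column of a tab-separated file as floats.

    cov_col is treated as 1-based (an assumption).
    """
    # Use gzip for .gz files, plain open otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    values = []
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            values.append(float(row[cov_col - 1]))
    return values


# Demo on a small temporary file with made-up values.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.regions.bed")
    with open(demo_path, "w") as handle:
        handle.write("chr1\t0\t100\t15.0\nchr1\t100\t200\t17.0\n")
    demo_values = read_coverage_column(demo_path)

print(demo_values)
```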

coverage_stats.py script
Example log:

> python /Users/longreadpipes/docker/lr-mosdepth/coverage_stats.py  --cov_col 4 --round 2 --output_prefix test_example.coverage_over_bed test_example.text  

INFO:root:Arguments: Namespace(mosdepth_regions='test_example.text', cov_col=4, output_prefix='test_example.coverage_over_bed', round=2, debug=False)
INFO:root:Calculating coverage statistics
INFO:root:Opened file: test_example.text
INFO:root:Percentage of coverage values greater than 4x: 1.0
INFO:root:Evenness score: 0.92
INFO:root:Summary statistics: {'mean_cov': 16.0, 'q1_cov': 15.0, 'median_cov': 16.0, 'q3_cov': 17.75, 'iqr_cov': 2.75, 'sstdev_cov': 3.69, 'mad_cov': 2.67, 'percent_above_4x': 1.0, 'evenness_score': 0.92}
INFO:root:Writing summary statistics to file: test_example.coverage_over_bed.cov_stat_summary.json

Example output file

> cat test_example.coverage_over_bed.cov_stat_summary.json  

{"mean_cov": 16.0, "q1_cov": 15.0, "median_cov": 16.0, "q3_cov": 17.75, "iqr_cov": 2.75, "sstdev_cov": 3.69, "mad_cov": 2.67, "percent_above_4x": 1.0, "evenness_score": 0.92}   
