Commit 2c2b8e3

Lots of small fixes
1 parent ec752c2 commit 2c2b8e3

31 files changed

+445
-420
lines changed

.github/dependabot.yml

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ updates:
77
directory: "/"
88
schedule:
99
# Check for updates to GitHub Actions every week
10-
interval: "weekly"
10+
interval: "weekly"

.github/workflows/build.yml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: RNAseq normalization CI
1+
name: RNA-seq normalization CI
22
on:
33
push:
44
branches:
@@ -13,14 +13,16 @@ jobs:
1313
strategy:
1414
fail-fast: false
1515
matrix:
16-
tox-env: [py38, py39, py310, docs, linters, package]
16+
tox-env: [py38, py39, py310, py311, docs, linters, package]
1717
include:
1818
- tox-env: py38
1919
python-version: 3.8
2020
- tox-env: py39
2121
python-version: 3.9
2222
- tox-env: py310
2323
python-version: "3.10"
24+
- tox-env: py311
25+
python-version: "3.11"
2426
- tox-env: docs
2527
python-version: "3.10"
2628
- tox-env: linters

MANIFEST.in

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,5 +2,5 @@ include tox.ini
22
include .readthedocs.yaml
33
recursive-include docs *.py *.rst
44
recursive-include tests *.py
5-
recursive-include src/rnanorm/files *.csv.gz *.gtf.gz
6-
recursive-include tests/files *.tsv
5+
recursive-include src/rnanorm/files *.csv.gz *.gtf.gz *.csv *.gtf
6+
recursive-include tests/files *.tsv

README.rst

Lines changed: 75 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
====================
2-
RNAseq normalization
3-
====================
1+
=====================
2+
RNA-seq normalization
3+
=====================
44

55
|build| |black| |docs| |pypi_version| |pypi_pyversions| |pypi_downloads|
66

@@ -29,7 +29,7 @@ RNAseq normalization
2929
:alt: Number of downloads from PyPI
3030

3131

32-
Python implementation of common RNAseq normalization methods:
32+
Python implementation of common RNA-seq normalization methods:
3333

3434
- CPM (Counts per million)
3535
- FPKM_ (Fragments per kilobase million)
@@ -39,22 +39,25 @@ Python implementation of common RNAseq normalization methods:
3939
- TMM_ (Trimmed mean of M-values)
4040
- CTF_ (Counts adjusted with TMM factors)
4141

42+
For an in-depth description of the methods, see the documentation_.
4243

4344
.. _FPKM: https://www.nature.com/articles/nmeth.1226
4445
.. _TPM: https://link.springer.com/article/10.1007/s12064-012-0162-3
4546
.. _UQ: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94
4647
.. _CUF: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02568-9/
4748
.. _TMM: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
4849
.. _CTF: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02568-9/
50+
.. _documentation: https://rnanorm.readthedocs.io/
51+
4952

5053
Features
5154
========
5255

5356
- Pure Python implementation (no need for R, etc.)
54-
- Scikit-learn_ compatible
57+
- Compatible with Scikit-learn_
5558
- Command line interface
56-
- Verbose documentation_ (at least we hope so...)
57-
- Tested methods
59+
- Verbose documentation_
60+
- Validated method implementation
5861

5962

6063
.. _Scikit-learn: https://scikit-learn.org/
@@ -72,37 +75,86 @@ We recommend installing RNAnorm with pip::
7275
Quick start
7376
===========
7477

75-
Implemented methods can be used from Python or from the command line.
78+
The implemented methods can be executed from Python or from the command line.
7679

7780
Normalize from Python
7881
---------------------
7982

80-
Most commonly normalization methods are run from Python. E.g.::
83+
The most common use case is to run normalization from Python::
8184

82-
>>> from rnanorm.datasets import load_rnaseq_toy
85+
>>> from rnanorm.datasets import load_toy_data
8386
>>> from rnanorm import FPKM
84-
>>> dataset = load_rnaseq_toy()
87+
>>> dataset = load_toy_data()
88+
>>> # Expressions need to have genes in columns and samples in rows
8589
>>> dataset.exp
86-
G1 G2 G3 G4 G5
87-
S1 200.0 300.0 500.0 2000.0 7000.0
88-
S2 400.0 600.0 1000.0 4000.0 14000.0
89-
S3 200.0 300.0 500.0 2000.0 17000.0
90-
S4 200.0 300.0 500.0 2000.0 2000.0
90+
Gene_1 Gene_2 Gene_3 Gene_4 Gene_5
91+
Sample_1 200 300 500 2000 7000
92+
Sample_2 400 600 1000 4000 14000
93+
Sample_3 200 300 500 2000 17000
94+
Sample_4 200 300 500 2000 2000
9195
>>> fpkm = FPKM(dataset.gtf_path).set_output(transform="pandas")
9296
>>> fpkm.fit_transform(dataset.exp)
93-
G1 G2 G3 G4 G5
94-
S1 100000.0 100000.0 100000.0 200000.0 700000.0
95-
S2 100000.0 100000.0 100000.0 200000.0 700000.0
96-
S3 50000.0 50000.0 50000.0 100000.0 850000.0
97-
S4 200000.0 200000.0 200000.0 400000.0 400000.0
97+
Gene_1 Gene_2 Gene_3 Gene_4 Gene_5
98+
Sample_1 100000.0 100000.0 100000.0 200000.0 700000.0
99+
Sample_2 100000.0 100000.0 100000.0 200000.0 700000.0
100+
Sample_3 50000.0 50000.0 50000.0 100000.0 850000.0
101+
Sample_4 200000.0 200000.0 200000.0 400000.0 400000.0
98102

99103

100104
Normalize from command line
101105
---------------------------
102106

103-
Often it is handy to do normalization from the command line::
107+
Normalization from the command line is also supported. To list available
108+
methods and general help::
109+
110+
rnanorm --help
111+
112+
Get info about a particular method, e.g., CPM::
113+
114+
rnanorm cpm --help
115+
116+
To normalize with CPM::
117+
118+
rnanorm cpm exp.csv --out exp_cpm.csv
119+
120+
File ``exp.csv`` needs to be a comma-separated file with genes in columns and
121+
samples in rows. Values should be raw counts. The output is saved to
122+
``exp_cpm.csv``. Example of input file::
123+
124+
cat exp.csv
125+
,Gene_1,Gene_2,Gene_3,Gene_4,Gene_5
126+
Sample_1,200,300,500,2000,7000
127+
Sample_2,400,600,1000,4000,14000
128+
Sample_3,200,300,500,2000,17000
129+
Sample_4,200,300,500,2000,2000
130+
131+
One can also provide input through standard input::
132+
133+
cat exp.csv | rnanorm cpm --out exp_cpm.csv
134+
135+
If the file specified with ``--out`` already exists, the command will fail. If you
136+
are sure that you wish to overwrite, use the ``--force`` flag::
137+
138+
cat exp.csv | rnanorm cpm --force --out exp_cpm.csv
139+
140+
If no file is specified with ``--out`` parameter, output is printed to standard
141+
output::
142+
143+
cat exp.csv | rnanorm cpm > exp_cpm.csv
144+
145+
Methods TPM and FPKM require gene lengths. These can be provided either with GTF_
146+
file or with a "gene lengths" file. The latter is a two-column file. The first
147+
column should include the genes in the header of ``exp.csv`` and the second
148+
column should contain gene lengths computed by union exon model::
149+
150+
# Use GTF file
151+
rnanorm tpm exp.csv --gtf annotations.gtf > exp_out.csv
152+
# Use gene lengths file
153+
rnanorm tpm exp.csv --gene-lengths lengths.csv > exp_out.csv
154+
155+
104156

105-
rnanorm fpkm exp.csv --gtf annotation.gtf --out exp_fpkm.csv
157+
.. _GTF: https://www.ensembl.org/info/website/upload/gff.html
106158

107159

108160
Contribute
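
As a cross-check of the README tables in this diff, the sketch below recomputes CPM, FPKM, and TPM for the toy matrix with plain pandas. It does not call RNAnorm. The gene lengths (200, 300, 500, 1000, and 1000 bp) are an assumption, back-calculated from the FPKM output shown above rather than read from the package's GTF file, and ``cpm``, ``fpkm``, and ``tpm`` are hypothetical helper names.

```python
import pandas as pd

# Toy counts from the README: samples in rows, genes in columns.
counts = pd.DataFrame(
    [
        [200, 300, 500, 2000, 7000],
        [400, 600, 1000, 4000, 14000],
        [200, 300, 500, 2000, 17000],
        [200, 300, 500, 2000, 2000],
    ],
    index=[f"Sample_{i}" for i in range(1, 5)],
    columns=[f"Gene_{i}" for i in range(1, 6)],
)

# ASSUMPTION: gene lengths in base pairs, inferred from the FPKM table in the
# README diff, not taken from the GTF file shipped with the package.
gene_lengths = pd.Series([200, 300, 500, 1000, 1000], index=counts.columns)


def cpm(exp):
    """Counts per million: scale each sample (row) to a total of 1e6."""
    return exp.div(exp.sum(axis=1), axis=0) * 1e6


def fpkm(exp, lengths):
    """CPM divided by gene length expressed in kilobases."""
    return cpm(exp).div(lengths / 1e3, axis=1)


def tpm(exp, lengths):
    """Divide counts by gene length first, then scale each sample to 1e6."""
    rate = exp.div(lengths, axis=1)
    return rate.div(rate.sum(axis=1), axis=0) * 1e6


fpkm_table = fpkm(counts, gene_lengths)
print(fpkm_table)
```

With these assumed lengths, ``fpkm_table`` matches the FPKM values listed in the README diff. The difference between FPKM and TPM is only the order of the two scaling steps, so every TPM row sums to one million while FPKM rows generally do not.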

docs/changelog.rst

Lines changed: 16 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -4,34 +4,25 @@ Change Log
44

55
All notable changes to this project are documented in this file.
66

7-
==========
8-
Unreleased
9-
==========
10-
11-
Added
12-
-----
13-
-
14-
15-
Fixed
16-
-----
17-
-
18-
19-
Changed
20-
-------
21-
-
227

238
==================
24-
0.0.1 - 2022-07-18
9+
2.0.0 - 2023-06-21
2510
==================
2611

2712
Added
2813
-----
29-
-
30-
31-
Fixed
32-
-----
33-
-
34-
35-
Changed
36-
-------
37-
-
14+
- Implementation of the following methods:
15+
16+
- CPM
17+
- FPKM
18+
- TPM
19+
- UQ
20+
- CUF
21+
- TMM
22+
- CTF
23+
24+
- Add a "toy" and GTEx dataset
25+
- Add command line interface for all of the above methods
26+
- Add tests
27+
- Support calculation of gene lengths from GTF or gene lengths file in TPM /
28+
FPKM

docs/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
# -- Project information -----------------------------------------------------
1212
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
1313

14-
project = "RNAseq normalization"
14+
project = "RNA-seq normalization"
1515
author = meta["Author"]
1616
release = meta["Version"]
1717
copyright = "2023, " + author

docs/contributing.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@ Preparing release
6161

6262

6363
Describe the new features in ``changelog.rst``. Replace the Unreleased heading
64-
with the new version, followed by the release date (e.g.
64+
with the new version, followed by the release date (e.g.,
6565
``13.2.0 - 2018-10-23``).
6666

6767
Add the new dependencies to ``pyproject.toml`` and update the package version.

docs/guide.rst

Lines changed: 0 additions & 5 deletions
This file was deleted.

docs/index.rst

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,10 @@
1-
Welcome to RNAseq normalization's documentation!
2-
================================================
1+
Welcome to RNA-seq normalization's documentation!
2+
=================================================
33

44
.. toctree::
55
:maxdepth: 2
66
:caption: Contents:
77

8-
guide
98
ref
109
changelog
1110
contributing

docs/ref.rst

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,6 @@ Normalization methods
1313
:nosignatures:
1414
:toctree: generated/
1515

16-
LibrarySize
1716
CPM
1817
FPKM
1918
TPM
@@ -32,5 +31,5 @@ Datasets
3231
:nosignatures:
3332
:toctree: generated/
3433

35-
datasets.load_rnaseq_toy
34+
datasets.load_toy_data
3635
datasets.load_gtex

pyproject.toml

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,15 +7,15 @@ build-backend = "setuptools.build_meta"
77

88
[project]
99
name = "rnanorm"
10-
description = "Common RNAseq normalization methods"
10+
description = "Common RNA-seq normalization methods"
1111
authors = [
1212
{name = "Genialis, Inc."},
1313
{email = "[email protected]"},
1414
]
1515
dynamic = ["version"]
1616
readme = "README.rst"
1717
license = {text = "Proprietary"}
18-
requires-python = ">=3.8, <3.11"
18+
requires-python = ">=3.8, <3.12"
1919
keywords = [
2020
"bio",
2121
"bioinformatics",
@@ -24,18 +24,22 @@ keywords = [
2424
"artificial intelligence",
2525
"python",
2626
"genialis",
27+
"rnaseq",
28+
"normalization",
2729
]
2830
classifiers = [
2931
"Development Status :: 4 - Beta",
3032
"Environment :: Console",
3133
"Intended Audience :: Developers",
3234
"Topic :: Software Development :: Libraries :: Python Modules",
35+
"License :: OSI Approved :: Apache Software License",
3336
"Operating System :: OS Independent",
3437
"Programming Language :: Python",
3538
"Programming Language :: Python :: 3",
3639
"Programming Language :: Python :: 3.8",
3740
"Programming Language :: Python :: 3.9",
3841
"Programming Language :: Python :: 3.10",
42+
"Programming Language :: Python :: 3.11",
3943
]
4044
dependencies = [
4145
"click",
@@ -74,7 +78,7 @@ rnanorm = "rnanorm.cli:main"
7478
[tool.setuptools_scm]
7579

7680
[tool.black]
77-
target-version = ["py38", "py39", "py310"]
81+
target-version = ["py38", "py39", "py310", "py311"]
7882
line-length = 99
7983

8084
[tool.isort]

src/rnanorm/annotation.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ def _gene_length(self, gtf_df: pd.DataFrame, gene_id_attr: str = "gene_id") -> p
7070
Group exon start & end coordinates by gene ID & chromosome &
7171
strand. Then perform merge and length calculation for each
7272
group separately. The latter is needed since ``gene_id_attr``
73-
is not unique in some annotations (e.g. RefSeq).
73+
is not unique in some annotations (e.g., RefSeq).
7474
"""
7575
gtf_df = gtf_df[gtf_df["feature_type"] == "exon"]
7676
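
The docstring in this hunk describes grouping exon coordinates by gene ID, chromosome, and strand, then merging and measuring each group separately. A minimal sketch of that idea follows. It is not the code in ``annotation.py``: the column names ``chrom``, ``strand``, ``start``, and ``end`` and both helper names are assumptions, and coordinates are assumed to be 1-based and inclusive, as in GTF. How the package combines per-group lengths into one value per gene is not visible in this diff, so the sketch stops at per-group lengths.

```python
import pandas as pd


def merged_length(intervals):
    """Total length of the union of 1-based, inclusive (start, end) intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the previous merged block, if any.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def union_exon_lengths(exons, gene_id_attr="gene_id"):
    """Return {(gene_id, chrom, strand): merged exon length} (hypothetical helper)."""
    keys = [gene_id_attr, "chrom", "strand"]
    return {
        key: merged_length(zip(group["start"], group["end"]))
        for key, group in exons.groupby(keys)
    }


# Hypothetical exon table: two overlapping exons plus one separate exon for G1.
exons = pd.DataFrame(
    {
        "gene_id": ["G1", "G1", "G1", "G2"],
        "chrom": ["chr1", "chr1", "chr1", "chr2"],
        "strand": ["+", "+", "+", "-"],
        "start": [1, 51, 201, 10],
        "end": [100, 150, 300, 19],
    }
)
lengths = union_exon_lengths(exons)
print(lengths)
```

Grouping on chromosome and strand as well as gene ID keeps exons of identically named genes at different loci from being merged into one interval set, which is the situation the docstring mentions for RefSeq.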
