Releases: reneshbedre/bioinfokit
Bioinformatics data analysis and visualization toolkit
- `analys.gff.gff_to_gtf` function updated to handle dot value for phase in CDS features
- Breast Cancer Wisconsin (Diagnostic) Data Set added
- `visuz.stat.roc` function added for visualizing the ROC
- `bartlett` and `levene` function added to `analys.stat` class for checking the ANOVA assumptions for datasets in stacked format
- `tukey_hsd` function updated for grouping order
- Pandas series added as input for `fasta.extract_seq` function
- `extract_seq` function moved to `fasta` class
- `extract_seq` function deprecated from `analys`
- visualization for single and multiple statistical bar charts updated for future releases
- Tukey HSD test updated for interaction effect. Pairwise comparison for interaction effect can be calculated.
- `gff_to_gtf` function updated for GFF3 files with non-coding RNA transcripts. GFF3 files with non-coding transcripts (e.g. from miRBase GFF3) can be converted to GTF
- genFam enrichment analysis function added (`bioinfokit.analys.genfam.fam_enrich`)
- genfam test added
- Tukey HSD test added to perform multiple pairwise comparisons (`bioinfokit.analys.stat.tukey_hsd`)
- new option `mrna_feature_name` added in `analys.gff.gff_to_gtf` for cases where the name of the feature (column 3 of GFF3 file) of protein coding mRNA is other than 'mRNA' or 'transcript' (e.g. some GFF3 files name this feature `protein_coding_gene`)
- `dim` option added to `visuz.cluster.screeplot`, `visuz.cluster.pcaplot` and `visuz.cluster.biplot` to control the figure size
- `seqcov` moved to `fastq` class
- `sra_db` function added under `fastq` class for batch download of FASTQ files from NCBI SRA database
- In t-test, the one sample t and paired t-test added
- Two sample t-test switched to class based method
- t-test function name changed to `ttest` from `ttsam`
- programmatic access to chi-squared independence test dataset added
- boxplot removed from t-test
- 'adjustText' module added in `setup.py` (issue #12)
- In chi-squared test, the sum of probabilities is rounded to 10 decimal places for exact sum in case of floats
- chi-squared goodness of fit test added under the `stat.chisq`
- chi-squared independence test updated for output as class attributes and mosaic plot removed
- `mergevcf` renamed to `concatvcf` to keep with conventional naming (issue #9)
- programmatic access to chi-squared independence test dataset added
- `marker.vcf_anot` function updated for tab-delimited text output
- The error message for volcano, inverted volcano, and MA plot updated when there are no significant or non-significant genes (issue #7)
- The `vcf_anot` function output updated for strand information
- The Manhattan plot updated to add the labels in sorted order for numerical strings
- The Manhattan plot updated to add `figname` option
- TPM normalization function added
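
The TPM normalization mentioned above can be illustrated with a minimal sketch in plain Python. This is not bioinfokit's implementation; the function name `tpm` and its arguments are hypothetical and chosen only for illustration. It assumes one sample, raw counts per gene, and gene lengths in base pairs.

```python
def tpm(counts, lengths_bp):
    """Return TPM values for one sample (hypothetical helper, not the bioinfokit API).

    counts: raw read counts per gene
    lengths_bp: gene lengths in base pairs, in the same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase, so longer genes are not favored
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("all counts are zero")
    # Step 2: scale so that the values of the sample sum to one million
    return [r / total * 1e6 for r in rpk]


counts = [10, 20, 30]
lengths = [1000, 2000, 3000]
values = tpm(counts, lengths)
```

In this toy input every gene has the same count per kilobase, so all three genes receive the same TPM value and the values sum to one million.
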
v0.9 has the following updates and changes (July 28, 2020)
- gene expression raw count normalization class added as 'analys.norm'
- CPM and RPKM normalization function added under 'analys.norm' class
- Sugarcane gene expression dataset added (Bedre et al., 2019)
- In `volcano`, `ma`, and `involcano` plots, checks for lfc_thr, counts, and pv_thr added
- legend labels, position, and figname parameters added in `volcano` plot
- utility to check the non-numeric values added for `ma`, `volcano` and `involcano`
- plotlegend parameter added to `ma`
- the parameter for log fold change threshold lines added in `ma` plot
- legend labels, position, and figname parameters added in `ma` plot
- `tsneplot` added for t-SNE visualization
- in `bardot` drop NA value function added to ignore missing values to plot dots
- scRNA-seq dataset added (PBMC and Arabidopsis root cells)
- `fasta_reader` and `rev_com` moved to newly created `fasta` class
- `tsneplot` and `vcf_anot` initialized for future release
- more parameters added in `biplot` (cluster coloring, datapoints)
- `figname` added in `hmap`
- `ma` function updated for absolute expression counts
- `svg` figures added
- `pca` function will be deprecated in future release
- 2D and 3D loadings plot, biplot and scree plot functions added under the `cluster` class for PCA
- programmatic access to iris and cotton dataset added
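
For readers unfamiliar with the CPM and RPKM normalizations listed for this release, the formulas can be sketched in plain Python as below. The helper names `cpm` and `rpkm` are hypothetical and do not reflect the `analys.norm` interface; the sketch assumes a single sample with raw counts and gene lengths in base pairs.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library size is zero")
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads (hypothetical helper)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("library size is zero")
    # count / (gene length in kb * library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


counts = [100, 300, 600]
lengths = [1000, 2000, 4000]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
```

CPM corrects only for sequencing depth, while RPKM additionally divides by gene length, which is why the second and third genes end up with equal RPKM here despite different raw counts.
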
v0.8 has the following updates and changes
- GFF3 to GTF file conversion utility added and updated under class `gff`
- In Manhattan plot (`visuz.marker.mhat`), the labeling issue with `markernames` parameter corrected (see issue #4 on GitHub for details)
- `gstyle` parameter added in Manhattan plot for box style annotation
- `splitvcf` function added for splitting VCF file into individual VCF files for each chromosome
- `mergevcf` moved to `analys.marker` class
- `reg_lin` function updated for multiple regression
- degree of freedom fixed for t-test for regression coefficients
- VIF calculation for MLR updated
- functions `fastq_reader` and `fqreadcounter` moved to `fastq` class
v0.7 has the following updates and changes
- `split_fastq` function added for splitting individual (left and right) paired-end fastq files from single interleaved paired-end file
- GFF3 to GTF file conversion utility added under class `gff`
- two-sample and Welch's t-test updated for CI and alpha parameter added
- module termcolor removed
- Programmatic access of dataset for `ttsam` added
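
The idea behind splitting an interleaved paired-end file can be sketched as follows. This is an illustrative assumption about the general technique, not the code behind `split_fastq`: the helper names are hypothetical, and the sketch assumes strict 4-line FASTQ records that alternate between read 1 and read 2.

```python
import io
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def split_interleaved(handle, left_out, right_out):
    """Write alternating records to left_out (read 1) and right_out (read 2)."""
    records = read_fastq_records(handle)
    pairs = 0
    for left in records:
        # The record that follows read 1 is assumed to be its mate
        right = next(records, None)
        if right is None:
            raise ValueError("odd number of records: input is not interleaved")
        left_out.write("\n".join(left) + "\n")
        right_out.write("\n".join(right) + "\n")
        pairs += 1
    return pairs


interleaved = io.StringIO(
    "@r1/1\nACGT\n+\nIIII\n"
    "@r1/2\nTGCA\n+\nIIII\n"
    "@r2/1\nGGCC\n+\nIIII\n"
    "@r2/2\nCCGG\n+\nIIII\n"
)
left, right = io.StringIO(), io.StringIO()
n_pairs = split_interleaved(interleaved, left, right)
```

The in-memory `io.StringIO` objects stand in for real files; any text file handles opened for reading and writing would work the same way.
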
v0.6 has the following updates and changes
- Programmatic access of dataset added (class `get_data`)
- More features for figures added (`figtype`, `axtickfontsize`, `axtickfontname`, `axxlabel`, `axylabel`, `xlm`, `ylm`, `yerrlw`, `yerrcw`)
- In volcano plot, the typo for xlabel corrected (-log2(FoldChange) to log2(FoldChange))
- `help` will be deprecated in future release
- VIF calculation for MLR updated
- adjustText removed
v0.5 has the following updates and changes
- Linear regression analysis added in `analys.stat` class
- `volcano`, `involcano`, `ma` and `heatmap` functions moved to new `visuz.gen_exp` class
- In `volcano`, parameters for new box type labeling and threshold grid lines added
- `corr_mat` updated for new colormaps and moved to stat class
- To visualize the graph in the console itself (e.g. Jupyter notebook), show parameter added
- Pandas dataframe input added for `volcano`, `involcano`, `corr_mat`, `ma`, `ttsam`, and `chisq`
- `ttsam` and `chisq` moved to `analys.stat` class
- graph control parameters added for `volcano`, `involcano`, `ma`, and `heatmap`
- documentation can also be accessed at https://reneshbedre.github.io/blog/howtoinstall.html
- `help` will be deprecated in a future release
- fixed the NumPy bug in `visuz.stat.bardot`. The `int` cast added to generate the number of samples, which does not accept float (See details of NumPy bug: numpy/numpy#15345)
v0.4 has the following updates and changes
- function `analys.format.fq_qual_var()` added for detecting the FASTQ quality encoding format
- help module added for command-line help message
- class `fastq` added for FASTQ related functions
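
Detecting the FASTQ quality encoding is commonly done by inspecting the ASCII range of the quality characters. The sketch below shows that general heuristic; it is not the logic of `fq_qual_var()`, and the function name `guess_quality_encoding` and the cutoff values (59 and 74) are assumptions based on the usual Phred+33 and Phred+64 character ranges.

```python
def guess_quality_encoding(quality_strings):
    """Guess the quality offset from the observed ASCII range (hypothetical helper)."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters given")
    lowest, highest = min(codes), max(codes)
    if lowest < 33 or highest > 126:
        raise ValueError("characters outside the printable FASTQ quality range")
    # Characters below ';' (ASCII 59) cannot occur in the 64-offset encodings
    if lowest < 59:
        return "Phred+33"
    # Characters above 'J' (ASCII 74) are unusual for Phred+33 data
    if highest > 74:
        return "Phred+64"
    # The observed range fits both encodings
    return "ambiguous"


sanger_like = guess_quality_encoding(["IIII!5"])
illumina_old = guess_quality_encoding(["hhhhfe"])
unclear = guess_quality_encoding(["@ABCDEFGHI"])
```

A heuristic of this kind can only be as good as the reads it samples, so an "ambiguous" result usually means more reads should be inspected.
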
v0.3 has the following updates and changes
- bar-dot plot function added
- command-line help message class added