Releases: akanz1/klib

v0.2.3

26 Dec 15:47

What's Changed

  • Use Poetry for environment management and for building/publishing
  • Add some tests
  • Restructure the package

Full Changelog: v0.2.2...v0.2.3

v0.2.2

17 Dec 15:42

Full Changelog: v0.2.1...v0.2.2

Fixes a bug in missingval_plot(): the y-axis did not display very small ratios of missing values. Thanks for pointing this out, @Abermal.

v0.2.1

28 Nov 11:35
42165c1

What's Changed

  • Update dependencies for Python 3.10 by @akanz1 in #11

Full Changelog: v0.2.0...v0.2.1

v0.2.0

23 Aug 16:32

Changelog:

This release comes with several small fixes and improvements to the code quality.

v0.1.5

17 Jan 15:10

Changelog:

Changes

Update dist_plot()

  • Update the implementation of dist_plot() to be compatible with the latest version of seaborn (0.11.1). The old implementation is deprecated and will be removed in future versions.
  • Introduce sampling for large datasets (10,000+ rows), which significantly speeds up plotting. Summary statistics are still based on the entire dataset, while the figures use 10,000 randomly sampled points.
  • Minor cosmetic changes
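
The sampling idea described above can be sketched roughly as follows. This is a minimal illustration, not klib's actual implementation: the helper name `sample_for_plot`, the `seed` parameter, and the exact cut-off logic are assumptions made for the example; only the 10,000-point figure comes from the release note.

```python
import numpy as np
import pandas as pd

MAX_POINTS = 10_000  # sample size mentioned in the release note


def sample_for_plot(series: pd.Series, max_points: int = MAX_POINTS, seed: int = 0) -> pd.Series:
    """Hypothetical helper: return at most `max_points` randomly chosen values."""
    if len(series) <= max_points:
        return series  # small data is plotted as-is
    return series.sample(n=max_points, random_state=seed)


rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=50_000))

stats = values.describe()          # summary statistics use every row
plotted = sample_for_plot(values)  # only the sample would be handed to the plot
```

Keeping the statistics on the full data means the printed numbers stay exact, while the drawn distribution is an approximation whose cost no longer grows with the dataset size.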

Several fixes and code quality improvements

v0.1.2

05 Nov 09:14

Adjustments & Fixes:

  • clean_column_names: adding additional cases to column name cleaning
  • data_cleaning: update the printout format, especially for large datasets with many duplicate rows
  • update and improve docstrings, code formatting and clarity

v0.1.1

07 Aug 14:43

Adjustments & Fixes:

  • dist_plot: avoid running into an error when the dataframe includes binary columns

  • dist_plot: update the colors and slightly improve runtime

  • cat_plot: fix hard-coded colors in the heatmap

klib v0.1.0

06 Aug 05:16

v0.0.91

01 Aug 18:05

Changelog:

Additions

  • clean_column_names():
    Cleans the column names of the provided pandas DataFrame and optionally provides hints on duplicate and long column names. This functionality is also added to data_cleaning() by default.
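
To give a feel for what this kind of column-name cleaning involves, here is a rough sketch. It does not reproduce klib's rules: the helpers `tidy_name` and `tidy_columns`, the snake_case convention, and the `max_len` parameter are all assumptions chosen for illustration.

```python
import re

import pandas as pd


def tidy_name(name: str) -> str:
    """Hypothetical rule set: camelCase and punctuation become snake_case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(name))  # split camelCase
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)                # other characters -> "_"
    return name.strip("_").lower()


def tidy_columns(df: pd.DataFrame, max_len: int = 25):
    """Return a renamed copy plus hints on duplicate and long column names."""
    out = df.copy()
    out.columns = [tidy_name(c) for c in df.columns]
    cols = pd.Index(out.columns)
    duplicates = sorted(set(cols[cols.duplicated()]))
    too_long = [c for c in cols if len(c) > max_len]
    return out, duplicates, too_long


df = pd.DataFrame(columns=["First Name", "firstName", "Total Revenue (USD)"])
cleaned, duplicates, too_long = tidy_columns(df, max_len=15)
```

In this example two differently spelled inputs collapse to the same cleaned name, which is exactly the situation where a hint about duplicates is useful.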

Changes

  • small fixes and refinements
    Revert from split = {None, 'pos', 'neg', 'above', 'below'} to split = {None, 'pos', 'neg', 'high', 'low'} for all correlation functions.

  • increase test coverage

  • update docstrings:
    Several updates to docstrings to improve clarity and conform with numpy style.

  • black formatting:
    Format the entire codebase with black.
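
The `split` options listed above could be pictured as filters on a correlation matrix. The sketch below is an assumption about their meaning, not klib's code: it takes 'pos' and 'neg' to select by sign and 'high' and 'low' to select by absolute value relative to a threshold. The helper `filter_corr` and its `threshold` default are hypothetical.

```python
import pandas as pd


def filter_corr(corr: pd.DataFrame, split=None, threshold: float = 0.3) -> pd.DataFrame:
    """Hypothetical helper: blank out (NaN) the cells excluded by `split`."""
    if split is None:
        return corr
    if split == "pos":
        mask = corr > 0                  # assumed: positive correlations only
    elif split == "neg":
        mask = corr < 0                  # assumed: negative correlations only
    elif split == "high":
        mask = corr.abs() >= threshold   # assumed: strong correlations
    elif split == "low":
        mask = corr.abs() <= threshold   # assumed: weak correlations
    else:
        raise ValueError(f"unknown split: {split!r}")
    return corr.where(mask)


data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
corr = data.corr()
pos = filter_corr(corr, "pos")
neg = filter_corr(corr, "neg")
```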

v0.0.86

20 Jun 18:00

Changelog:

Changes

  • data_cleaning():

    • Changed the default setting to do a shallow instead of a deep analysis of memory_usage.
    • Reduces function runtime by about 70-80% compared to the previous version.
  • missingval_plot():

    • Minor changes to font size and spacing to accommodate very large datasets (40+ columns)
  • update docstrings:

    • Several updates to the README, the examples, and the docstrings to improve clarity and formatting.
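
The shallow versus deep memory analysis mentioned for data_cleaning() corresponds to the `deep` flag of pandas' `DataFrame.memory_usage`. The snippet below only illustrates that pandas behaviour on a made-up dataframe; how klib calls it internally is not shown in these notes.

```python
import pandas as pd

# Made-up example data; object dtype is set explicitly so that the strings
# are stored as Python objects regardless of the pandas version.
df = pd.DataFrame(
    {
        "id": range(1000),
        "label": pd.Series(["some text value"] * 1000, dtype=object),
    }
)

shallow = df.memory_usage(deep=False).sum()  # counts only the 8-byte object pointers
deep = df.memory_usage(deep=True).sum()      # also inspects every Python string
```

The deep variant has to visit each object in the column, which is why it is more accurate for text-heavy data but noticeably slower on large dataframes.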