move data science tools wvu https://github.com/WinVector/wvu
Showing 37 changed files with 1,434 additions and 7,560 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,258 +1,9 @@ | ||
[wvpy](https://github.com/WinVector/wvpy) is a simple | ||
set of utilities for teaching data science and machine learning methods. | ||
They are not replacements for the obvious methods in sklearn. | ||
|
||
Some notes on the Jupyter sheet runner can be found [here](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/) | ||
wvpy tools for converting Jupyter notebooks to and from Python files. | ||
|
||
Text and video tutorials here: [https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/). | ||
|
||
```python | ||
import numpy.random | ||
import pandas | ||
import wvpy.util | ||
Many of the data science functions have been moved to wvu [https://github.com/WinVector/wvu](https://github.com/WinVector/wvu). | ||
|
||
wvpy.__version__ | ||
``` | ||
|
||
|
||
|
||
|
||
'0.2.7' | ||
|
||
|
||
|
||
Illustration of cross-method plan. | ||
|
||
|
||
```python | ||
wvpy.util.mk_cross_plan(10,2) | ||
``` | ||
|
||
|
||
|
||
|
||
[{'train': [1, 2, 3, 4, 9], 'test': [0, 5, 6, 7, 8]}, | ||
{'train': [0, 5, 6, 7, 8], 'test': [1, 2, 3, 4, 9]}] | ||
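The output above is a list of train/test index splits, one per fold. As a minimal sketch of how such a plan could be built (the helper name `sketch_cross_plan` and its `seed` argument are assumptions for illustration, not the wvpy implementation), each row is assigned to exactly one test fold and trains in all the others:

```python
import random


def sketch_cross_plan(n_rows, k_folds, seed=None):
    """Hypothetical helper: split row indices 0..n_rows-1 into k_folds train/test pairs."""
    rng = random.Random(seed)
    # Balanced fold labels (0, 1, ..., k_folds-1, 0, 1, ...), then shuffled.
    fold_of_row = [i % k_folds for i in range(n_rows)]
    rng.shuffle(fold_of_row)
    plan = []
    for fold in range(k_folds):
        # A row is in the test set of its own fold and in the train set of every other fold.
        test = [i for i in range(n_rows) if fold_of_row[i] == fold]
        train = [i for i in range(n_rows) if fold_of_row[i] != fold]
        plan.append({"train": train, "test": test})
    return plan


plan = sketch_cross_plan(10, 2, seed=0)
for split in plan:
    print(split)
```

Every row index appears in exactly one `test` list, so out-of-sample predictions can be collected for all rows.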
|
||
|
||
|
||
Plotting example | ||
|
||
|
||
```python | ||
help(wvpy.util.plot_roc) | ||
``` | ||
|
||
Help on function plot_roc in module wvpy.util: | ||
|
||
plot_roc(prediction, istrue, title='Receiver operating characteristic plot', *, truth_target=True, ideal_line_color=None, extra_points=None, show=True) | ||
Plot a ROC curve of numeric prediction against boolean istrue. | ||
:param prediction: column of numeric predictions | ||
:param istrue: column of items to predict | ||
:param title: plot title | ||
:param truth_target: value to consider target or true. | ||
:param ideal_line_color: if not None, color of ideal line | ||
:param extra_points: data frame of additional points to annotate graph, columns fpr, tpr, label | ||
:param show: logical, if True call matplotlib.pyplot.show() | ||
:return: calculated area under the curve, plot produced by call. | ||
Example: | ||
import pandas | ||
import wvpy.util | ||
d = pandas.DataFrame({ | ||
'x': [1, 2, 3, 4, 5], | ||
'y': [False, False, True, True, False] | ||
}) | ||
wvpy.util.plot_roc( | ||
prediction=d['x'], | ||
istrue=d['y'], | ||
ideal_line_color='lightgrey' | ||
) | ||
wvpy.util.plot_roc( | ||
prediction=d['x'], | ||
istrue=d['y'], | ||
extra_points=pandas.DataFrame({ | ||
'tpr': [0, 1], | ||
'fpr': [0, 1], | ||
'label': ['AAA', 'BBB'] | ||
}) | ||
) | ||
|
||
|
||
|
||
|
||
```python | ||
d = pandas.concat([ | ||
pandas.DataFrame({ | ||
'x': numpy.random.normal(size=1000), | ||
'y': numpy.random.choice([True, False], | ||
p=(0.02, 0.98), | ||
size=1000, | ||
replace=True)}), | ||
pandas.DataFrame({ | ||
'x': numpy.random.normal(size=200) + 5, | ||
'y': numpy.random.choice([True, False], | ||
size=200, | ||
replace=True)}), | ||
]) | ||
``` | ||
|
||
|
||
```python | ||
wvpy.util.plot_roc( | ||
prediction=d.x, | ||
istrue=d.y, | ||
ideal_line_color="DarkGrey", | ||
title='Example ROC plot') | ||
``` | ||
|
||
|
||
<Figure size 432x288 with 0 Axes> | ||
|
||
|
||
|
||
|
||
![png](output_7_1.png) | ||
|
||
|
||
|
||
|
||
|
||
|
||
0.903298366883511 | ||
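The number printed above is the area under the ROC curve. As a rough sketch of what that quantity measures (this is the standard pairwise-ranking formulation, not necessarily how wvpy computes it; `sketch_auc` is a hypothetical name), the AUC equals the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half:

```python
def sketch_auc(prediction, istrue, truth_target=True):
    """Hypothetical helper: AUC as the probability a positive outranks a negative."""
    pos = [p for p, t in zip(prediction, istrue) if t == truth_target]
    neg = [p for p, t in zip(prediction, istrue) if t != truth_target]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive correctly ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Same small data as the docstring example for plot_roc.
auc = sketch_auc([1, 2, 3, 4, 5], [False, False, True, True, False])
print(auc)
```

This quadratic-time version is only meant to show the definition; sorting-based methods compute the same value faster on large data.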
|
||
|
||
|
||
|
||
```python | ||
help(wvpy.util.threshold_plot) | ||
``` | ||
|
||
Help on function threshold_plot in module wvpy.util: | ||
|
||
threshold_plot(d: pandas.core.frame.DataFrame, pred_var, truth_var, truth_target=True, threshold_range=(-inf, inf), plotvars=('precision', 'recall'), title='Measures as a function of threshold', *, show=True) | ||
Produce multiple facet plot relating the performance of using a threshold greater than or equal to | ||
different values at predicting a truth target. | ||
:param d: pandas.DataFrame to plot | ||
:param pred_var: name of column of numeric predictions | ||
:param truth_var: name of column with reference truth | ||
:param truth_target: value considered true | ||
:param threshold_range: x-axis range to plot | ||
:param plotvars: list of metrics to plot, must come from ['threshold', 'count', 'fraction', 'precision', | ||
'true_positive_rate', 'false_positive_rate', 'true_negative_rate', 'false_negative_rate', | ||
'recall', 'sensitivity', 'specificity'] | ||
:param title: title for plot | ||
:param show: logical, if True call matplotlib.pyplot.show() | ||
:return: None, plot produced as a side effect | ||
Example: | ||
import pandas | ||
import wvpy.util | ||
d = pandas.DataFrame({ | ||
'x': [1, 2, 3, 4, 5], | ||
'y': [False, False, True, True, False] | ||
}) | ||
wvpy.util.threshold_plot( | ||
d, | ||
pred_var='x', | ||
truth_var='y', | ||
plotvars=("sensitivity", "specificity"), | ||
) | ||
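The help text above describes plotting measures for the rule "prediction greater than or equal to a threshold". A minimal sketch of the underlying table, assuming one row per distinct prediction value (`sketch_threshold_stats` is a hypothetical helper, not the wvpy code), could look like this:

```python
def sketch_threshold_stats(prediction, truth, truth_target=True):
    """Hypothetical helper: precision, recall, specificity for each threshold."""
    is_pos = [t == truth_target for t in truth]
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos
    rows = []
    for threshold in sorted(set(prediction)):
        # Select every row whose prediction meets or exceeds the threshold.
        selected = [p >= threshold for p in prediction]
        tp = sum(s and y for s, y in zip(selected, is_pos))
        fp = sum(s and not y for s, y in zip(selected, is_pos))
        rows.append({
            "threshold": threshold,
            # tp + fp >= 1 because the threshold is itself an observed prediction.
            "precision": tp / (tp + fp),
            # recall is the same quantity as sensitivity (true positive rate).
            "recall": tp / n_pos if n_pos else float("nan"),
            "specificity": (n_neg - fp) / n_neg if n_neg else float("nan"),
        })
    return rows


rows = sketch_threshold_stats([1, 2, 3, 4, 5], [False, False, True, True, False])
for row in rows:
    print(row)
```

Raising the threshold selects fewer rows, so recall can only fall while specificity can only rise; precision can move in either direction.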
|
||
|
||
|
||
|
||
```python | ||
wvpy.util.threshold_plot( | ||
d, | ||
pred_var='x', | ||
truth_var='y', | ||
plotvars=("sensitivity", "specificity"), | ||
title = "example plot" | ||
) | ||
``` | ||
|
||
|
||
|
||
![png](output_9_0.png) | ||
|
||
|
||
|
||
|
||
```python | ||
|
||
wvpy.util.threshold_plot( | ||
d, | ||
pred_var='x', | ||
truth_var='y', | ||
plotvars=("precision", "recall"), | ||
title = "example plot" | ||
) | ||
``` | ||
|
||
|
||
|
||
![png](output_10_0.png) | ||
|
||
|
||
|
||
|
||
```python | ||
help(wvpy.util.gain_curve_plot) | ||
``` | ||
|
||
Help on function gain_curve_plot in module wvpy.util: | ||
|
||
gain_curve_plot(prediction, outcome, title='Gain curve plot', *, show=True) | ||
plot cumulative outcome as a function of prediction order (descending) | ||
:param prediction: vector of numeric predictions | ||
:param outcome: vector of actual values | ||
:param title: plot title | ||
:param show: logical, if True call matplotlib.pyplot.show() | ||
:return: None | ||
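The docstring above describes cumulative outcome as a function of descending prediction order. A small sketch of that calculation, assuming both axes are normalized to fractions (an illustrative choice; `sketch_gain_curve` is a hypothetical helper, not the wvpy implementation):

```python
def sketch_gain_curve(prediction, outcome):
    """Hypothetical helper: (fraction of rows, fraction of total outcome) points."""
    # Row indices sorted from highest to lowest prediction.
    order = sorted(range(len(prediction)), key=lambda i: prediction[i], reverse=True)
    total = float(sum(outcome))  # booleans count as 0/1
    if total == 0.0:
        raise ValueError("outcome must have a non-zero total")
    points = []
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += outcome[i]
        points.append((rank / len(order), running / total))
    return points


points = sketch_gain_curve([1, 2, 3, 4, 5], [False, False, True, True, False])
print(points)
```

A model that orders rows well captures most of the outcome early, so its curve rises quickly toward 1.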
|
||
|
||
|
||
|
||
```python | ||
wvpy.util.gain_curve_plot( | ||
prediction=d['x'], | ||
outcome=d['y'], | ||
title = "gain curve plot" | ||
) | ||
``` | ||
|
||
|
||
|
||
![png](output_12_0.png) | ||
|
||
|
||
|
||
|
||
```python | ||
wvpy.util.lift_curve_plot( | ||
prediction=d['x'], | ||
outcome=d['y'], | ||
title = "lift curve plot" | ||
) | ||
``` | ||
|
||
|
||
|
||
![png](output_13_0.png) | ||
|
||
|
||
|
||
|
||
```python | ||
|
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,20 @@ | ||
============================= test session starts ============================== | ||
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 | ||
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0 | ||
rootdir: /Users/johnmount/Documents/work/wvpy/pkg | ||
plugins: anyio-3.5.0, cov-3.0.0 | ||
collected 20 items | ||
collected 4 items | ||
|
||
tests/test_cross_plan1.py . [ 5%] | ||
tests/test_cross_predict.py .. [ 15%] | ||
tests/test_deviance_calc.py . [ 20%] | ||
tests/test_eval_fn_pre_row.py . [ 25%] | ||
tests/test_match_auc.py . [ 30%] | ||
tests/test_nb_fns.py .... [ 50%] | ||
tests/test_onehot.py .. [ 60%] | ||
tests/test_perm_score_vars.py . [ 65%] | ||
tests/test_plots.py . [ 70%] | ||
tests/test_se.py . [ 75%] | ||
tests/test_search_grid.py .. [ 85%] | ||
tests/test_stats1.py . [ 90%] | ||
tests/test_threshold_stats.py . [ 95%] | ||
tests/test_typs_in_frame.py . [100%] | ||
tests/test_nb_fns.py .... [100%] | ||
|
||
---------- coverage: platform darwin, python 3.9.7-final-0 ----------- | ||
---------- coverage: platform darwin, python 3.9.12-final-0 ---------- | ||
Name Stmts Miss Cover | ||
--------------------------------------------- | ||
wvpy/__init__.py 3 0 100% | ||
wvpy/jtools.py 206 76 63% | ||
wvpy/pysheet.py 99 99 0% | ||
wvpy/render_workbook.py 54 54 0% | ||
wvpy/util.py 321 7 98% | ||
--------------------------------------------- | ||
TOTAL 683 236 65% | ||
TOTAL 362 229 37% | ||
|
||
|
||
============================= 20 passed in 12.71s ============================== | ||
============================== 4 passed in 8.92s =============================== |
Binary files not shown (5 files).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,9 @@ | ||
Win Vector LLC tools for doing and teaching data science in Python 3 | ||
Win Vector LLC tools for converting Python Jupyter notebooks to and from Python source files | ||
https://github.com/WinVector/wvpy | ||
|
||
Some notes can be found here: https://github.com/WinVector/wvpy | ||
and here: https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/ | ||
|
||
Many of the data science functions have been moved to wvu https://github.com/WinVector/wvu | ||
|
||
|