Commit f55ebc5

move data science tools wvu https://github.com/WinVector/wvu
1 parent 141a1e8 commit f55ebc5

37 files changed: +1434 -7560 lines changed

README.ipynb

Lines changed: 0 additions & 436 deletions
This file was deleted.

README.md

Lines changed: 3 additions & 252 deletions
@@ -1,258 +1,9 @@
-[wvpy](https://github.com/WinVector/wvpy) is a simple
-set of utilities for teaching data science and machine learning methods.
-They are not replacements for the obvious methods in sklearn.

-Some notes on the Jupyter sheet runner can be found [here](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/)
+wvpy tools for converting Jupyter notebooks to and from Python files.

+Text and video tutorials here: [https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/).

-```python
-import numpy.random
-import pandas
-import wvpy.util
+Many of the data science functions have been moved to wvu [https://github.com/WinVector/wvu](https://github.com/WinVector/wvu).

-wvpy.__version__
-```


-
-
-'0.2.7'
-
-
-
-Illustration of cross-method plan.
-
-
-```python
-wvpy.util.mk_cross_plan(10,2)
-```
-
-
-
-
-[{'train': [1, 2, 3, 4, 9], 'test': [0, 5, 6, 7, 8]},
-{'train': [0, 5, 6, 7, 8], 'test': [1, 2, 3, 4, 9]}]
-
-
-
-Plotting example
-
-
-```python
-help(wvpy.util.plot_roc)
-```
-
-Help on function plot_roc in module wvpy.util:
-
-plot_roc(prediction, istrue, title='Receiver operating characteristic plot', *, truth_target=True, ideal_line_color=None, extra_points=None, show=True)
-Plot a ROC curve of numeric prediction against boolean istrue.
-
-:param prediction: column of numeric predictions
-:param istrue: column of items to predict
-:param title: plot title
-:param truth_target: value to consider target or true.
-:param ideal_line_color: if not None, color of ideal line
-:param extra_points: data frame of additional point to annotate graph, columns fpr, tpr, label
-:param show: logical, if True call matplotlib.pyplot.show()
-:return: calculated area under the curve, plot produced by call.
-
-Example:
-
-import pandas
-import wvpy.util
-
-d = pandas.DataFrame({
-'x': [1, 2, 3, 4, 5],
-'y': [False, False, True, True, False]
-})
-
-wvpy.util.plot_roc(
-prediction=d['x'],
-istrue=d['y'],
-ideal_line_color='lightgrey'
-)
-
-wvpy.util.plot_roc(
-prediction=d['x'],
-istrue=d['y'],
-extra_points=pandas.DataFrame({
-'tpr': [0, 1],
-'fpr': [0, 1],
-'label': ['AAA', 'BBB']
-})
-)
-
-
-
-
-```python
-d = pandas.concat([
-pandas.DataFrame({
-'x': numpy.random.normal(size=1000),
-'y': numpy.random.choice([True, False],
-p=(0.02, 0.98),
-size=1000,
-replace=True)}),
-pandas.DataFrame({
-'x': numpy.random.normal(size=200) + 5,
-'y': numpy.random.choice([True, False],
-size=200,
-replace=True)}),
-])
-```
-
-
-```python
-wvpy.util.plot_roc(
-prediction=d.x,
-istrue=d.y,
-ideal_line_color="DarkGrey",
-title='Example ROC plot')
-```
-
-
-<Figure size 432x288 with 0 Axes>
-
-
-
-
-![png](output_7_1.png)
-
-
-
-
-
-
-0.903298366883511
-
-
-
-
-```python
-help(wvpy.util.threshold_plot)
-```
-
-Help on function threshold_plot in module wvpy.util:
-
-threshold_plot(d: pandas.core.frame.DataFrame, pred_var, truth_var, truth_target=True, threshold_range=(-inf, inf), plotvars=('precision', 'recall'), title='Measures as a function of threshold', *, show=True)
-Produce multiple facet plot relating the performance of using a threshold greater than or equal to
-different values at predicting a truth target.
-
-:param d: pandas.DataFrame to plot
-:param pred_var: name of column of numeric predictions
-:param truth_var: name of column with reference truth
-:param truth_target: value considered true
-:param threshold_range: x-axis range to plot
-:param plotvars: list of metrics to plot, must come from ['threshold', 'count', 'fraction', 'precision',
-'true_positive_rate', 'false_positive_rate', 'true_negative_rate', 'false_negative_rate',
-'recall', 'sensitivity', 'specificity']
-:param title: title for plot
-:param show: logical, if True call matplotlib.pyplot.show()
-:return: None, plot produced as a side effect
-
-Example:
-
-import pandas
-import wvpy.util
-
-d = pandas.DataFrame({
-'x': [1, 2, 3, 4, 5],
-'y': [False, False, True, True, False]
-})
-
-wvpy.util.threshold_plot(
-d,
-pred_var='x',
-truth_var='y',
-plotvars=("sensitivity", "specificity"),
-)
-
-
-
-
-```python
-wvpy.util.threshold_plot(
-d,
-pred_var='x',
-truth_var='y',
-plotvars=("sensitivity", "specificity"),
-title = "example plot"
-)
-```
-
-
-
-![png](output_9_0.png)
-
-
-
-
-```python
-
-wvpy.util.threshold_plot(
-d,
-pred_var='x',
-truth_var='y',
-plotvars=("precision", "recall"),
-title = "example plot"
-)
-```
-
-
-
-![png](output_10_0.png)
-
-
-
-
-```python
-help(wvpy.util.gain_curve_plot)
-```
-
-Help on function gain_curve_plot in module wvpy.util:
-
-gain_curve_plot(prediction, outcome, title='Gain curve plot', *, show=True)
-plot cumulative outcome as a function of prediction order (descending)
-
-:param prediction: vector of numeric predictions
-:param outcome: vector of actual values
-:param title: plot title
-:param show: logical, if True call matplotlib.pyplot.show()
-:return: None
-
-
-
-
-```python
-wvpy.util.gain_curve_plot(
-prediction=d['x'],
-outcome=d['y'],
-title = "gain curve plot"
-)
-```
-
-
-
-![png](output_12_0.png)
-
-
-
-
-```python
-wvpy.util.lift_curve_plot(
-prediction=d['x'],
-outcome=d['y'],
-title = "lift curve plot"
-)
-```
-
-
-
-![png](output_13_0.png)
-
-
-
-
-```python
-
-```
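The README text removed in this commit illustrated a cross-validation plan with `wvpy.util.mk_cross_plan(10,2)`, which returned a list of `{'train': [...], 'test': [...]}` index splits. As a rough illustration of what such a plan is, here is a minimal standard-library sketch. The function name `make_cross_plan_sketch` and its `seed` parameter are hypothetical, and this is not the wvpy or wvu implementation. It only reproduces the shape of the output shown in the removed README.

```python
import random
from typing import Dict, List


def make_cross_plan_sketch(
    n_rows: int, n_folds: int, seed: int = 0
) -> List[Dict[str, List[int]]]:
    """Hypothetical stand-in for a cross-validation plan builder.

    Returns one {'train': [...], 'test': [...]} dict per fold, where the
    test index sets partition range(n_rows).
    """
    if n_folds < 2 or n_folds > n_rows:
        raise ValueError("need 2 <= n_folds <= n_rows")
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled indices into folds round-robin, then sort each fold.
    folds = [sorted(indices[i::n_folds]) for i in range(n_folds)]
    plan = []
    for test in folds:
        held_out = set(test)
        # Every row not held out for testing is used for training.
        train = [i for i in range(n_rows) if i not in held_out]
        plan.append({"train": train, "test": test})
    return plan


plan = make_cross_plan_sketch(10, 2)
for fold in plan:
    print(fold)
```

Each row index appears in exactly one `test` list, so every row gets an out-of-sample prediction exactly once when a model is fit per fold.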

coverage.txt

Lines changed: 6 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,20 @@
11
============================= test session starts ==============================
2-
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
2+
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
33
rootdir: /Users/johnmount/Documents/work/wvpy/pkg
44
plugins: anyio-3.5.0, cov-3.0.0
5-
collected 20 items
5+
collected 4 items
66

7-
tests/test_cross_plan1.py . [ 5%]
8-
tests/test_cross_predict.py .. [ 15%]
9-
tests/test_deviance_calc.py . [ 20%]
10-
tests/test_eval_fn_pre_row.py . [ 25%]
11-
tests/test_match_auc.py . [ 30%]
12-
tests/test_nb_fns.py .... [ 50%]
13-
tests/test_onehot.py .. [ 60%]
14-
tests/test_perm_score_vars.py . [ 65%]
15-
tests/test_plots.py . [ 70%]
16-
tests/test_se.py . [ 75%]
17-
tests/test_search_grid.py .. [ 85%]
18-
tests/test_stats1.py . [ 90%]
19-
tests/test_threshold_stats.py . [ 95%]
20-
tests/test_typs_in_frame.py . [100%]
7+
tests/test_nb_fns.py .... [100%]
218

22-
---------- coverage: platform darwin, python 3.9.7-final-0 -----------
9+
---------- coverage: platform darwin, python 3.9.12-final-0 ----------
2310
Name Stmts Miss Cover
2411
---------------------------------------------
2512
wvpy/__init__.py 3 0 100%
2613
wvpy/jtools.py 206 76 63%
2714
wvpy/pysheet.py 99 99 0%
2815
wvpy/render_workbook.py 54 54 0%
29-
wvpy/util.py 321 7 98%
3016
---------------------------------------------
31-
TOTAL 683 236 65%
17+
TOTAL 362 229 37%
3218

3319

34-
============================= 20 passed in 12.71s ==============================
20+
============================== 4 passed in 8.92s ===============================
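
The drop in the TOTAL row from 65% to 37% follows from removing `wvpy/util.py`, which contributed 321 statements with only 7 missed. The short sketch below recomputes both totals from the per-module rows of the coverage table. The helper name `percent_covered` is illustrative and is not part of pytest-cov or coverage.py.

```python
# (statements, missed) per module, copied from the coverage table in this diff.
new_report = {
    "wvpy/__init__.py": (3, 0),
    "wvpy/jtools.py": (206, 76),
    "wvpy/pysheet.py": (99, 99),
    "wvpy/render_workbook.py": (54, 54),
}
# The old report additionally included wvpy/util.py.
old_report = dict(new_report, **{"wvpy/util.py": (321, 7)})


def percent_covered(stmts: int, miss: int) -> float:
    """Illustrative helper: percentage of statements that were executed."""
    return 100.0 * (stmts - miss) / stmts if stmts else 100.0


def totals(report):
    """Sum statements and misses over all modules in a report."""
    stmts = sum(s for s, _ in report.values())
    miss = sum(m for _, m in report.values())
    return stmts, miss, percent_covered(stmts, miss)


for label, report in (("old", old_report), ("new", new_report)):
    stmts, miss, pct = totals(report)
    print(f"{label}: stmts={stmts} miss={miss} cover={pct:.1f}%")
```

The lower percentage reflects the well-tested module moving out of the package. No existing module lost coverage, because the per-module figures for the remaining files are unchanged.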

output_10_0.png

-18.1 KB
Binary file not shown.

output_12_0.png

-17.2 KB
Binary file not shown.

output_13_0.png

-10.9 KB
Binary file not shown.

output_7_1.png

-15.9 KB
Binary file not shown.

output_9_0.png

-17.5 KB
Binary file not shown.

pkg/README.txt

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,9 @@
1-
Win Vector LLC tools for doing and teaching data science in Python 3
1+
Win Vector LLC tools for converting Python Jupyter to and from Python source files
22
https://github.com/WinVector/wvpy
33

44
Some notes can be found here: https://github.com/WinVector/wvpy
55
and here: https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/
66

7+
Many of the data science functions have been moved to wvu https://github.com/WinVector/wvu
8+
79
