|
1 | | -[wvpy](https://github.com/WinVector/wvpy) is a simple |
2 | | -set of utilities for teaching data science and machine learning methods. |
3 | | -They are not replacements for the obvious methods in sklearn. |
4 | 1 |
|
5 | | -Some notes on the Jupyter sheet runner can be found [here](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/) |
| 2 | +wvpy provides tools for converting Jupyter notebooks to and from Python files.
6 | 3 |
|
| 4 | +Text and video tutorials are here: [https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/).
7 | 5 |
|
8 | | -```python |
9 | | -import numpy.random |
10 | | -import pandas |
11 | | -import wvpy.util |
| 6 | +Many of the data science functions have been moved to wvu: [https://github.com/WinVector/wvu](https://github.com/WinVector/wvu).
12 | 7 |
|
13 | | -wvpy.__version__ |
14 | | -``` |
15 | 8 |
|
16 | 9 |
|
17 | | - |
18 | | - |
19 | | - '0.2.7' |
20 | | - |
21 | | - |
22 | | - |
23 | | -Illustration of cross-method plan. |
24 | | - |
25 | | - |
26 | | -```python |
27 | | -wvpy.util.mk_cross_plan(10,2) |
28 | | -``` |
29 | | - |
30 | | - |
31 | | - |
32 | | - |
33 | | - [{'train': [1, 2, 3, 4, 9], 'test': [0, 5, 6, 7, 8]}, |
34 | | - {'train': [0, 5, 6, 7, 8], 'test': [1, 2, 3, 4, 9]}] |
35 | | - |
36 | | - |
37 | | - |
38 | | -Plotting example |
39 | | - |
40 | | - |
41 | | -```python |
42 | | -help(wvpy.util.plot_roc) |
43 | | -``` |
44 | | - |
45 | | - Help on function plot_roc in module wvpy.util: |
46 | | - |
47 | | - plot_roc(prediction, istrue, title='Receiver operating characteristic plot', *, truth_target=True, ideal_line_color=None, extra_points=None, show=True) |
48 | | - Plot a ROC curve of numeric prediction against boolean istrue. |
49 | | - |
50 | | - :param prediction: column of numeric predictions |
51 | | - :param istrue: column of items to predict |
52 | | - :param title: plot title |
53 | | - :param truth_target: value to consider target or true. |
54 | | - :param ideal_line_color: if not None, color of ideal line |
55 | | - :param extra_points: data frame of additional point to annotate graph, columns fpr, tpr, label |
56 | | - :param show: logical, if True call matplotlib.pyplot.show() |
57 | | - :return: calculated area under the curve, plot produced by call. |
58 | | - |
59 | | - Example: |
60 | | - |
61 | | - import pandas |
62 | | - import wvpy.util |
63 | | - |
64 | | - d = pandas.DataFrame({ |
65 | | - 'x': [1, 2, 3, 4, 5], |
66 | | - 'y': [False, False, True, True, False] |
67 | | - }) |
68 | | - |
69 | | - wvpy.util.plot_roc( |
70 | | - prediction=d['x'], |
71 | | - istrue=d['y'], |
72 | | - ideal_line_color='lightgrey' |
73 | | - ) |
74 | | - |
75 | | - wvpy.util.plot_roc( |
76 | | - prediction=d['x'], |
77 | | - istrue=d['y'], |
78 | | - extra_points=pandas.DataFrame({ |
79 | | - 'tpr': [0, 1], |
80 | | - 'fpr': [0, 1], |
81 | | - 'label': ['AAA', 'BBB'] |
82 | | - }) |
83 | | - ) |
84 | | - |
85 | | - |
86 | | - |
87 | | - |
88 | | -```python |
89 | | -d = pandas.concat([ |
90 | | - pandas.DataFrame({ |
91 | | - 'x': numpy.random.normal(size=1000), |
92 | | - 'y': numpy.random.choice([True, False], |
93 | | - p=(0.02, 0.98), |
94 | | - size=1000, |
95 | | - replace=True)}), |
96 | | - pandas.DataFrame({ |
97 | | - 'x': numpy.random.normal(size=200) + 5, |
98 | | - 'y': numpy.random.choice([True, False], |
99 | | - size=200, |
100 | | - replace=True)}), |
101 | | -]) |
102 | | -``` |
103 | | - |
104 | | - |
105 | | -```python |
106 | | -wvpy.util.plot_roc( |
107 | | - prediction=d.x, |
108 | | - istrue=d.y, |
109 | | - ideal_line_color="DarkGrey", |
110 | | - title='Example ROC plot') |
111 | | -``` |
112 | | - |
113 | | - |
114 | | - <Figure size 432x288 with 0 Axes> |
115 | | - |
116 | | - |
117 | | - |
118 | | - |
119 | | - |
120 | | - |
121 | | - |
122 | | - |
123 | | - |
124 | | - |
125 | | - |
126 | | - 0.903298366883511 |
127 | | - |
128 | | - |
129 | | - |
130 | | - |
131 | | -```python |
132 | | -help(wvpy.util.threshold_plot) |
133 | | -``` |
134 | | - |
135 | | - Help on function threshold_plot in module wvpy.util: |
136 | | - |
137 | | - threshold_plot(d: pandas.core.frame.DataFrame, pred_var, truth_var, truth_target=True, threshold_range=(-inf, inf), plotvars=('precision', 'recall'), title='Measures as a function of threshold', *, show=True) |
138 | | - Produce multiple facet plot relating the performance of using a threshold greater than or equal to |
139 | | - different values at predicting a truth target. |
140 | | - |
141 | | - :param d: pandas.DataFrame to plot |
142 | | - :param pred_var: name of column of numeric predictions |
143 | | - :param truth_var: name of column with reference truth |
144 | | - :param truth_target: value considered true |
145 | | - :param threshold_range: x-axis range to plot |
146 | | - :param plotvars: list of metrics to plot, must come from ['threshold', 'count', 'fraction', 'precision', |
147 | | - 'true_positive_rate', 'false_positive_rate', 'true_negative_rate', 'false_negative_rate', |
148 | | - 'recall', 'sensitivity', 'specificity'] |
149 | | - :param title: title for plot |
150 | | - :param show: logical, if True call matplotlib.pyplot.show() |
151 | | - :return: None, plot produced as a side effect |
152 | | - |
153 | | - Example: |
154 | | - |
155 | | - import pandas |
156 | | - import wvpy.util |
157 | | - |
158 | | - d = pandas.DataFrame({ |
159 | | - 'x': [1, 2, 3, 4, 5], |
160 | | - 'y': [False, False, True, True, False] |
161 | | - }) |
162 | | - |
163 | | - wvpy.util.threshold_plot( |
164 | | - d, |
165 | | - pred_var='x', |
166 | | - truth_var='y', |
167 | | - plotvars=("sensitivity", "specificity"), |
168 | | - ) |
169 | | - |
170 | | - |
171 | | - |
172 | | - |
173 | | -```python |
174 | | -wvpy.util.threshold_plot( |
175 | | - d, |
176 | | - pred_var='x', |
177 | | - truth_var='y', |
178 | | - plotvars=("sensitivity", "specificity"), |
179 | | - title = "example plot" |
180 | | - ) |
181 | | -``` |
182 | | - |
183 | | - |
184 | | - |
185 | | - |
186 | | - |
187 | | - |
188 | | - |
189 | | - |
190 | | -```python |
191 | | - |
192 | | -wvpy.util.threshold_plot( |
193 | | - d, |
194 | | - pred_var='x', |
195 | | - truth_var='y', |
196 | | - plotvars=("precision", "recall"), |
197 | | - title = "example plot" |
198 | | - ) |
199 | | -``` |
200 | | - |
201 | | - |
202 | | - |
203 | | - |
204 | | - |
205 | | - |
206 | | - |
207 | | - |
208 | | -```python |
209 | | -help(wvpy.util.gain_curve_plot) |
210 | | -``` |
211 | | - |
212 | | - Help on function gain_curve_plot in module wvpy.util: |
213 | | - |
214 | | - gain_curve_plot(prediction, outcome, title='Gain curve plot', *, show=True) |
215 | | - plot cumulative outcome as a function of prediction order (descending) |
216 | | - |
217 | | - :param prediction: vector of numeric predictions |
218 | | - :param outcome: vector of actual values |
219 | | - :param title: plot title |
220 | | - :param show: logical, if True call matplotlib.pyplot.show() |
221 | | - :return: None |
222 | | - |
223 | | - |
224 | | - |
225 | | - |
226 | | -```python |
227 | | -wvpy.util.gain_curve_plot( |
228 | | - prediction=d['x'], |
229 | | - outcome=d['y'], |
230 | | - title = "gain curve plot" |
231 | | -) |
232 | | -``` |
233 | | - |
234 | | - |
235 | | - |
236 | | - |
237 | | - |
238 | | - |
239 | | - |
240 | | - |
241 | | -```python |
242 | | -wvpy.util.lift_curve_plot( |
243 | | - prediction=d['x'], |
244 | | - outcome=d['y'], |
245 | | - title = "lift curve plot" |
246 | | -) |
247 | | -``` |
248 | | - |
249 | | - |
250 | | - |
251 | | - |
252 | | - |
253 | | - |
254 | | - |
255 | | - |
256 | | -```python |
257 | | - |
258 | | -``` |
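
The new README text describes converting Jupyter notebooks to Python files. As a rough illustration of what such a conversion involves, here is a minimal sketch using only the Python standard library. It is not wvpy's implementation or API: the function name `notebook_json_to_py` is hypothetical, and the sketch assumes the notebook is nbformat 4 JSON and that markdown cells should become `#` comments.

```python
import json


def notebook_json_to_py(nb_text: str) -> str:
    """Hypothetical helper (not part of wvpy): turn notebook JSON into Python source.

    Assumes nbformat 4 layout: a top-level "cells" list whose entries carry
    "cell_type" and "source" (a string or a list of strings).
    """
    nb = json.loads(nb_text)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Code cells are copied through unchanged.
            chunks.append(source.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            # Markdown cells become comment lines so the text is not lost.
            chunks.append(
                "\n".join(("# " + line).rstrip() for line in source.splitlines())
            )
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook used only for this illustration.
example_nb = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})

py_text = notebook_json_to_py(example_nb)
print(py_text)
```

Cell outputs are dropped in this sketch, which suits version control of notebook sources. For wvpy's actual conversion functions, see the tutorial linked in the README.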