`This <https://github.com/WinVector/pyvtreat>`__ is the Python version of the vtreat data preparation system
(also available as an `R package <https://winvector.github.io/vtreat/>`__).

vtreat is a DataFrame processor/conditioner that prepares
real-world data for supervised machine learning or predictive modeling
in a statistically sound manner.

vtreat takes an input DataFrame
that has a specified column called "the outcome variable" (or "y"),
which is the quantity to be predicted and must not have missing
values. The other input columns are possible explanatory variables
(typically numeric or categorical/string-valued; these columns may
have missing values) that the user later wants to use to predict "y".
In practice, such an input DataFrame may not be immediately suitable
for machine learning procedures, which often expect only numeric
explanatory variables and may not tolerate missing values.

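As a rough illustration of the input contract and the missing-value problem described above, here is a minimal, stdlib-only sketch. It assumes a "DataFrame" represented as a plain dict of column lists; the helpers `check_outcome` and `treat_numeric` and the `_is_bad` column suffix are hypothetical names chosen for illustration, not vtreat's API. The sketch rejects an outcome column with missing values, then shows one common way to make a numeric column usable: fill missing entries with the observed mean and keep a 0/1 indicator recording where values were missing.

```python
import math
from typing import Dict, List, Optional


def check_outcome(frame: Dict[str, list], outcome_name: str) -> List[str]:
    """Reject missing outcome values; return the candidate explanatory columns."""
    for v in frame[outcome_name]:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            raise ValueError(f"outcome column {outcome_name!r} has missing values")
    return [col for col in frame if col != outcome_name]


def treat_numeric(values: List[Optional[float]], name: str) -> Dict[str, List[float]]:
    """Fill missing entries with the observed mean and record where they were."""
    # None and NaN both count as missing ("bad") values.
    is_bad = [v is None or math.isnan(v) for v in values]
    observed = [v for v, bad in zip(values, is_bad) if not bad]
    # Fall back to 0.0 if the column has no observed values at all.
    fill = sum(observed) / len(observed) if observed else 0.0
    return {
        name: [fill if bad else float(v) for v, bad in zip(values, is_bad)],
        name + "_is_bad": [1.0 if bad else 0.0 for bad in is_bad],
    }


frame = {"x": [1.0, None, 3.0, float("nan")], "y": [0.5, 1.0, 1.5, 2.0]}
print(check_outcome(frame, "y"))  # ['x']
print(treat_numeric(frame["x"], "x"))
# {'x': [1.0, 2.0, 3.0, 2.0], 'x_is_bad': [0.0, 1.0, 0.0, 1.0]}
```

Keeping the indicator column is the important design choice: the fact that a value was missing can itself be predictive, and the indicator preserves that information after imputation.
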
To solve this, vtreat builds a transformed DataFrame in which all
explanatory variable columns have been transformed into a number of
numeric explanatory variable columns without missing values. The
vtreat implementation produces derived numeric columns that capture
most of the information relating the explanatory columns to the
specified "y" or dependent/outcome column through a number of numeric
transforms (indicator variables, impact codes, prevalence codes, and
more). This transformed DataFrame is suitable for a wide range of
supervised learning methods, from linear regression through gradient
boosted machines.

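Three of the transform families named above can be sketched for a single categorical column with a numeric outcome. This is a simplified, hypothetical illustration: the function `treat_categorical` and the `_lev_`, `_prevalence`, and `_impact` column names are our own, not vtreat's. Indicator variables are one 0/1 column per level, the prevalence code is the fraction of rows sharing a row's level, and the impact code is the level's mean outcome minus the overall mean outcome. Computing impact codes on the same rows later used to train a model, as this sketch does, leaks outcome information; the referenced vtreat paper describes cross-validated estimates to avoid that, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, List, Optional


def treat_categorical(
    values: List[Optional[str]], y: List[float], name: str
) -> Dict[str, List[float]]:
    """Naive indicator, prevalence, and impact codes for one categorical column."""
    # Treat a missing value as its own level so every row gets a code.
    levels = ["_NA_" if v is None else v for v in values]
    n = len(levels)
    counts = Counter(levels)
    grand_mean = sum(y) / n
    level_sums: Dict[str, float] = {}
    for level, target in zip(levels, y):
        level_sums[level] = level_sums.get(level, 0.0) + target

    out: Dict[str, List[float]] = {}
    # Indicator variables: one 0/1 column per observed level.
    for level in sorted(counts):
        out[f"{name}_lev_{level}"] = [1.0 if v == level else 0.0 for v in levels]
    # Prevalence code: fraction of rows that share this row's level.
    out[f"{name}_prevalence"] = [counts[v] / n for v in levels]
    # Impact code: this level's mean outcome minus the overall mean outcome.
    # Not cross-validated, so it leaks y when applied to the training rows.
    out[f"{name}_impact"] = [level_sums[v] / counts[v] - grand_mean for v in levels]
    return out


coded = treat_categorical(["a", "b", "a", None], [1.0, 3.0, 3.0, 5.0], "c")
print(coded["c_impact"])  # [-1.0, 0.0, -1.0, 2.0]
```

In this example the grand mean is 3.0, so level `a` (mean 2.0) gets impact code -1.0, `b` gets 0.0, and the missing level (mean 5.0) gets 2.0.
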
The idea is that you can take a DataFrame of messy real-world data and
easily, faithfully, reliably, and repeatably prepare it for machine
learning with vtreat, using documented methods. Incorporating
vtreat into your machine learning workflow lets you work quickly
with very diverse structured data.

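Repeatability comes from learning the preparation once and then applying the same learned plan to any later data. The class below is a hypothetical, minimal sketch of that fit/transform split (`TinyTreatmentPlan` is not a vtreat class, and frames are again assumed to be dicts of column lists): `fit` records numeric fill values and per-level impact codes from training data, and `transform` applies them to new rows without needing the outcome, mapping levels never seen during fitting to 0.0 (no evidence either way). vtreat's own treatment classes expose `fit`, `transform`, and `fit_transform` methods, but handle many more cases than this sketch.

```python
import math
from typing import Dict, List


class TinyTreatmentPlan:
    """Hypothetical sketch: learn a preparation once, then reuse it on new data."""

    def __init__(self, outcome_name: str) -> None:
        self.outcome_name = outcome_name
        self.grand_mean = 0.0
        self.numeric_fill: Dict[str, float] = {}            # column -> fill value
        self.level_codes: Dict[str, Dict[str, float]] = {}  # column -> level -> impact

    def fit(self, frame: Dict[str, list]) -> "TinyTreatmentPlan":
        y = frame[self.outcome_name]
        self.grand_mean = sum(y) / len(y)
        for col, values in frame.items():
            if col == self.outcome_name:
                continue
            if all(v is None or isinstance(v, (int, float)) for v in values):
                # Numeric column: remember the mean of the observed values.
                observed = [v for v in values if v is not None and not math.isnan(v)]
                self.numeric_fill[col] = sum(observed) / len(observed) if observed else 0.0
            else:
                # Categorical column: remember each level's mean outcome minus the grand mean.
                sums: Dict[str, float] = {}
                counts: Dict[str, int] = {}
                for v, target in zip(values, y):
                    level = "_NA_" if v is None else v
                    sums[level] = sums.get(level, 0.0) + target
                    counts[level] = counts.get(level, 0) + 1
                self.level_codes[col] = {
                    level: sums[level] / counts[level] - self.grand_mean for level in sums
                }
        return self

    def transform(self, frame: Dict[str, list]) -> Dict[str, List[float]]:
        out: Dict[str, List[float]] = {}
        for col, fill in self.numeric_fill.items():
            bad = [v is None or math.isnan(v) for v in frame[col]]
            out[col] = [fill if b else float(v) for v, b in zip(frame[col], bad)]
            out[col + "_is_bad"] = [1.0 if b else 0.0 for b in bad]
        for col, codes in self.level_codes.items():
            # Levels never seen during fit get 0.0 (no evidence either way).
            out[col + "_impact"] = [
                codes.get("_NA_" if v is None else v, 0.0) for v in frame[col]
            ]
        return out


train = {"x": [1.0, None, 3.0], "c": ["a", "b", "a"], "y": [1.0, 4.0, 3.0]}
plan = TinyTreatmentPlan("y").fit(train)
print(plan.transform({"x": [None, 5.0], "c": ["b", "z"]})["x"])  # [2.0, 5.0]
```

Because `transform` reads only the learned fill values and codes, the same plan produces the same columns for training data, validation data, and future production data, even when the new rows contain missing values or unseen levels.
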
Worked examples can be found `here <https://github.com/WinVector/pyvtreat/tree/master/Examples>`__.

For more detail, please see `arXiv:1611.09477
stat.AP <https://arxiv.org/abs/1611.09477>`__ (the documentation describes the R version;
however, all of the examples can also be found worked in Python).

vtreat is available
as a `Python/Pandas package <https://github.com/WinVector/pyvtreat>`__,
and also as an `R package <https://github.com/WinVector/vtreat>`__.