-(also called "effects coded" in Jacob Cohen, Patricia Cohen,
-*Applied Multiple Regression/Correlation Analysis for the Behavioral
-Sciences*, 2nd edition, 1983). This allows principled use
+(also called "effects coded"). This allows principled use
 (including smoothing) of huge categorical variables (like zip-codes)
 when building models. This is critical for some libraries (such as
 'randomForest', which has hard limits on the number of
@@ -316,9 +314,9 @@ dTrainN %.>%
 
 Related work:
 
-*_Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences_, 2nd edition, 1983, Jacob Cohen, Patricia Cohen (called the concept “effects coded variables”).
-*["A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems"](http://dl.acm.org/citation.cfm?id=507538) Daniele Micci-Barreca, ACM SIGKDD Explorations, Volume 3 Issue 1, July 2001 Pages 27-32.
-*["Modeling Trick: Impact Coding of Categorical Variables with Many Levels"](http://www.win-vector.com/blog/2012/07/modeling-trick-impact-coding-of-categorical-variables-with-many-levels/) Nina Zumel, Win-Vector blog, 2012.
+*["A Transformation for Simplifying the Interpretation of Coefficients of Binary Variables in Regression Analysis"](https://www.jstor.org/stable/2683780), Robert E. Sweeney and Edwin F. Ulveling; The American Statistician, vol. 26, no. 5, pp. 30-32, 1972.
+*["A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems"](http://dl.acm.org/citation.cfm?id=507538) Daniele Micci-Barreca; ACM SIGKDD Explorations, Volume 3 Issue 1, July 2001 Pages 27-32.
+*["Modeling Trick: Impact Coding of Categorical Variables with Many Levels"](http://www.win-vector.com/blog/2012/07/modeling-trick-impact-coding-of-categorical-variables-with-many-levels/) Nina Zumel; Win-Vector blog, 2012.
 * "Big Learning Made Easy – with Counts!", Misha Bilenko, Cortana Intelligence and Machine Learning Blog, 2015.