confint with sparse factor levels breaks #49
Regarding your comment in #47: my use case is only partly related to this issue.
In my case, resampling was done on the level of correlated observations, i.e. on the subject level, with each subject having gone through every possible study setting. So I actually did not have to deal with sparse factor levels (and dropping levels should be fine for random effects?).
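For reference, a minimal sketch of what such subject-level resampling could look like (the data frame `d` and the column names `subject` and `setting` are hypothetical, not from the package): whole subjects are drawn with replacement, so each bootstrap sample keeps all the study settings a drawn subject went through.

```r
## subjects are drawn with replacement; every row (i.e. every study
## setting) of a drawn subject enters the bootstrap sample, so the
## within-subject correlation structure is preserved
resample_subjects <- function(d, id = "subject") {
  ids <- unique(as.character(d[[id]]))
  drawn <- sample(ids, length(ids), replace = TRUE)
  do.call(rbind, lapply(drawn, function(s)
    d[as.character(d[[id]]) == s, , drop = FALSE]))
}

## example: 10 subjects, each observed under all three settings
d <- expand.grid(subject = factor(1:10), setting = factor(c("A", "B", "C")))
d$y <- rnorm(nrow(d))
boot_sample <- resample_subjects(d)
```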
Just to understand you correctly: you were computing CIs for the fixed effects and were not interested in the random effects? In that case I would agree that dropping unused levels should not pose any problem. Regarding the second part of your answer, I have to rethink this. In a parametric setting, the CI would get rather big in that case, as the standard error gets large. By setting the estimate to zero I meant only the estimate on the current fold, which then becomes the basis for the CI. However, as you rightly point out, that isn't correct either. Currently the code just breaks. Perhaps we keep this behavior and simply throw a more informative error to let the user know that sparse categories hamper the computation of bootstrap CIs. Well, I have to check this in a small simulation...
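A minimal sketch of what such an informative error could look like (`check_sparse_levels` and its `min_count` threshold are hypothetical, not part of mboost): before bootstrapping, check whether any factor level occurs so rarely that resampled folds are likely to miss it entirely.

```r
## hypothetical pre-check: stop with an informative message if any factor
## in `data` has a level observed fewer than `min_count` times
check_sparse_levels <- function(data, min_count = 2) {
  for (v in names(data)) {
    if (is.factor(data[[v]])) {
      counts <- table(data[[v]])
      sparse <- names(counts)[counts < min_count]
      if (length(sparse) > 0)
        stop("Factor '", v, "' has levels with fewer than ", min_count,
             " observations (", paste(sparse, collapse = ", "),
             "); sparse categories hamper the computation of bootstrap CIs.",
             call. = FALSE)
    }
  }
  invisible(TRUE)
}

## the factor from the test case below: level 6 appears only once
z <- factor(c(sample(1:5, 100, replace = TRUE), 6), levels = 1:6)
check_sparse_levels(data.frame(z = z))
## Error: Factor 'z' has levels with fewer than 2 observations (6); ...
```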
Yes, exactly. Thanks for the response!
So did I. But I think precisely this procedure is problematic. For example, consider a model for the probability of suffering a stroke (Yes/No). If there is a factor variable "suffered_stroke_before", which is FALSE for most observations but highly predictive of Yes when TRUE, you certainly do not want to set its effect to zero for a large number of folds (though the corresponding confidence interval would probably just touch, and not cross, the value zero).
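To make the point concrete, here is a small simulation sketch of that scenario (the variable names follow the stroke example above and are purely illustrative): the rare level carries a large, clearly nonzero effect, which is exactly the signal that zeroing the estimate on sparse folds would discard.

```r
set.seed(1)
n <- 500
suffered_stroke_before <- rbinom(n, 1, 0.02)   # rare: roughly 2% TRUE
eta <- -3 + 4 * suffered_stroke_before         # strong effect when TRUE
stroke <- rbinom(n, 1, plogis(eta))
fit <- glm(stroke ~ suffered_stroke_before, family = binomial)
coef(fit)  # the rare predictor carries a large, clearly nonzero coefficient
```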
It's probably for the best. I would even go so far as to say that CIs based on bootstrapped (shrunken) boosting coefficients are a feature for advanced users (who are aware of the origin of those intervals), and throwing an error is in line with the actual purpose of the function: rather an "I'm aware that the intervals do not necessarily comply with the nominal level and are biased due to the shrinkage" function than a black-box interval function that always returns something.
Start for test (to be added to tests/regtest-inference.R):

```r
library("mboost")

### check confidence intervals for factors with very small level frequencies
z <- factor(c(sample(1:5, 100, replace = TRUE), 6), levels = 1:6)
y <- rnorm(101)
mod <- mboost(y ~ bols(z))
confint(mod)
```
see #47