Description
In the current mboost() implementation there is only one argument, weights, which is used both for data weights and for resampling weights.
Data weights can occur, for example, as survey weights or as integration weights for a functional response. Resampling weights are important because resampling is mostly used to find the optimal stopping iteration.
It is somewhat weird that mboost_fit() rescales the weights, so that they sum up to 1, only when the weights are not integers, cf. rescale_weights():
https://github.com/boost-R/mboost/blob/master/R/helpers.R#L29
So a single non-integer weight decides whether all weights are rescaled. Is the assumption here that resampling weights are integers and data weights are not?
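To make the branching concrete, here is a minimal base-R sketch of the kind of check that separates integer from non-integer weights. The helper name is_integer_valued() and its tolerance are assumptions for illustration; this is not the code in rescale_weights().

```r
## Hypothetical helper, not part of mboost: TRUE if all weights are whole
## numbers (up to a small numerical tolerance, an assumption of this sketch).
is_integer_valued <- function(w, tol = sqrt(.Machine$double.eps)) {
    all(abs(w - round(w)) < tol)
}

w_int <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
w_frac <- w_int
w_frac[1] <- 0.1   # a single fractional weight

is_integer_valued(w_int)    # integer weights: would be kept as specified
is_integer_valued(w_frac)   # one fractional weight: all weights would be rescaled
```

Changing one entry from 0 to 0.1 is enough to flip the check, which is why the two weighted fits in the example end up with differently scaled model.weights.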
One also has to be careful about what cv() does when creating the folds used in cvrisk() in the case that the model.weights of the fitted object are not all equal to 1, see:
library(mboost)
x <- sort(rnorm(10))
y <- x^2
y <- y - mean(y)
dat <- data.frame(y = y, x = x)
## model fit without weights
m <- mboost(y ~ bbs(x), data = dat)
## model fit with integer weights
myweights <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
m_int <- mboost(y ~ bbs(x), data = dat, weights = myweights)
## model fit with non-integer weights
myweights3 <- myweights
myweights3[1] <- 0.1
m_int3 <- mboost(y ~ bbs(x), data = dat, weights = myweights3)
## look at model.weights
model.weights(m)
model.weights(m_int) ## weights are as specified
model.weights(m_int3) ## weights are rescaled
####### look at cv() that generates resampling folds to be used in cvrisk()
## in cv() probability for each observation to enter the BS-fold is proportional to its weight
set.seed(123)
cv(weights = model.weights(m_int3), type = "bootstrap", B = 3)
## for cross-validation the folds are multiplied with the weights
cv(weights = model.weights(m_int3), type = "kfold", B = 3)
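
The two behaviours described in the comments above can be imitated in base R. This is only a sketch of what the comments describe, not the implementation of cv(): the number of bootstrap draws (n per fold) and the random fold assignment are assumptions, and the weights are used without any rescaling.

```r
## Base-R illustration only; this does not call or reproduce mboost::cv().
set.seed(123)
w <- c(0.1, 2, 3, 1, 0, 3, 2, 1, 1, 2)   # same values as myweights3
n <- length(w)
B <- 3

## Bootstrap: draw n observations per fold, with selection probability
## proportional to the weight (assumption: n draws per fold).
bs_folds <- rmultinom(B, size = n, prob = w / sum(w))

## k-fold: assign each observation to one of B folds at random ...
fold_id <- sample(rep(seq_len(B), length.out = n))
## ... build 0/1 indicators (0 = held out in that fold) ...
ind <- sapply(seq_len(B), function(b) as.numeric(fold_id != b))
## ... and multiply each column by the weights.
kfold_folds <- ind * w

bs_folds
kfold_folds
```

In this sketch the bootstrap folds are integer counts, while the k-fold folds inherit the non-integer scale of the weights, so the two resampling types would pass differently scaled weights on to the refits.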