
allow for resampling weights and for data weights #37

Open
@sbrockhaus

Description


In the current `mboost()` implementation there is only one argument, `weights`, which is used for both data weights and resampling weights.
Data weights arise, for example, as survey weights or as integration weights for functional responses. Resampling weights are important because resampling is mostly used to find the optimal stopping iteration.
It is somewhat odd that `mboost_fit()` rescales the weights so that they sum up to 1, but only when the weights are not integers; cf. `rescale_weights()`:
https://github.com/boost-R/mboost/blob/master/R/helpers.R#L29
In other words, the rescaling is triggered only by non-integer weights, which seems to assume that resampling weights are always integers and data weights never are?
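To make the heuristic concrete, here is a minimal base-R sketch of the rule as described above. The functions `is_integer_weights()` and `rescale_if_fractional()` are hypothetical names, and the normalisation to a sum of 1 follows the description in this issue rather than the mboost source, so the actual `rescale_weights()` may differ in detail. The last lines illustrate the concern under an assumption made only for this example: if survey-type data weights were combined multiplicatively with integer bootstrap counts, the product would no longer be integer, and the heuristic could not tell which part came from resampling.

```r
## Hypothetical sketch of the integer-check heuristic described above.
## NOTE: these are illustrative functions, not mboost code; the target sum
## of 1 follows the issue text and may differ from rescale_weights().
is_integer_weights <- function(w, tol = sqrt(.Machine$double.eps)) {
    ## TRUE if every weight is (numerically) a whole number
    all(abs(w - round(w)) < tol)
}

rescale_if_fractional <- function(w) {
    if (is_integer_weights(w))
        return(w)       # treated like resampling weights: left unchanged
    w / sum(w)          # treated like data weights: normalised to sum to 1
}

## integer weights pass through unchanged
w_int <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
rescale_if_fractional(w_int)

## a single non-integer entry switches the whole vector to rescaling
w_frac <- w_int
w_frac[1] <- 0.1
rescale_if_fractional(w_frac)

## assumed for illustration: survey weights times bootstrap counts are
## typically non-integer, so the heuristic cannot separate the two roles
survey_w <- c(1.5, 0.8, 1.2, 1, 1, 0.7, 1.3, 1, 0.9, 1.1)
is_integer_weights(survey_w * w_int)
```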

One also has to be careful about what `cv()` does when it creates the folds used in `cvrisk()` if the `model.weights` of the fitted object are not all equal to 1; see:

```r
library(mboost)

x <- sort(rnorm(10))
y <- x^2
y <- y - mean(y)
dat <- data.frame(y = y, x = x)

## model fit without weights
m <- mboost(y ~ bbs(x), data = dat)

## model fit with integer weights
myweights <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
m_int <- mboost(y ~ bbs(x), data = dat, weights = myweights)

## model fit with non-integer weights
myweights3 <- myweights
myweights3[1] <- 0.1
m_int3 <- mboost(y ~ bbs(x), data = dat, weights = myweights3)

## look at model.weights
model.weights(m)
model.weights(m_int)   ## weights are as specified
model.weights(m_int3)  ## weights are rescaled

####### look at cv(), which generates the resampling folds used in cvrisk()
## bootstrap: the probability of an observation entering a bootstrap sample
## is proportional to its weight
set.seed(123)
cv(weights = model.weights(m_int3), type = "bootstrap", B = 3)
## k-fold cross-validation: the folds are multiplied by the weights
cv(weights = model.weights(m_int3), type = "kfold", B = 3)
```
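The fold behaviour described in the comments of the example can be mimicked in a few lines of base R. This is a hypothetical sketch, not the source of `mboost::cv()`: `bootstrap_folds()` and `kfold_folds()` are illustrative names, and it is assumed (as stated in the comments) that bootstrap draws use probabilities proportional to the weights and that k-fold indicators are multiplied by the weights. Since `cvrisk()` uses each column of the fold matrix as the weights of a refit, the data weights end up either as sampling probabilities (bootstrap) or as multipliers on the fold indicators (k-fold), which is the mixing of the two roles that separate arguments would avoid.

```r
## Hypothetical sketch of the fold generation described above (base R only).
## NOTE: not mboost's cv() implementation; names are illustrative.
bootstrap_folds <- function(weights, B) {
    n <- length(weights)
    ## each column is one bootstrap sample: n draws with replacement,
    ## selection probability proportional to the observation's weight
    rmultinom(B, size = n, prob = weights / sum(weights))
}

kfold_folds <- function(weights, B) {
    n <- length(weights)
    ## assign observations to B folds at random
    fold_id <- sample(rep(seq_len(B), length.out = n))
    ## column b is 0 for the held-out fold b and 1 otherwise ...
    folds <- sapply(seq_len(B), function(b) as.numeric(fold_id != b))
    ## ... and is then multiplied row-wise by the original weights
    folds * weights
}

set.seed(123)
w <- c(0.1, 2, 3, 1, 0, 3, 2, 1, 1, 2)
bs <- bootstrap_folds(w, B = 3)
kf <- kfold_folds(w, B = 3)
bs
kf
```

In this sketch the zero-weight fifth observation can never be drawn in a bootstrap sample and contributes nothing to any k-fold column.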

@fabian-s
