feature request: Handling Inf in data #1694

Open
@wpetry

Description of current behavior

When Inf or -Inf is encountered in the data, brm passes these rows to Stan, which fails because it cannot evaluate the log probability (lp) at the initial values. I think the standard troubleshooting for this error is to specify init = ... and/or to use more informative priors, but Stan will fail with the same error regardless of the initial values or priors specified. The failure therefore looks like a fitting issue when in reality the source of the problem is in the data.

reprex:

library(brms)

x <- 0:100
mu <- 10 + 0.3 * x
y <- rnorm(length(x), mean = mu, sd = 2)  # the first argument of rnorm() is n, so mu must be passed as mean
dat <- data.frame(x, y)
dat$y[1] <- Inf

mod <- brm(y ~ 1 + x, data = dat)  # fails with Stan initialization error
mod2 <- lm(y ~ 1 + x, data = dat)  # base R regression gives a (somewhat) informative error in the same circumstance

Desired feature behavior

I think the best approach would be to stop the model fitting with an informative error instead of a warning. Infinite values are most likely artifacts of how a variable was computed (e.g., dividing by 0 or log-transforming 0) and warrant re-examination before fitting any model.
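
As a rough illustration of the stricter behavior, a check along these lines could run on the R side before anything is handed to Stan. This is only a sketch in base R: check_finite_data is a hypothetical name, not an existing brms function, and it assumes ordinary vector columns in a data frame.

```r
# Hypothetical helper (not part of brms): stop before fitting if any numeric
# column of `data` contains Inf or -Inf.
check_finite_data <- function(data) {
  # Flag numeric columns that hold at least one infinite value.
  # is.infinite() returns FALSE for NA, so missing values are not flagged here.
  has_inf <- vapply(
    data,
    function(col) is.numeric(col) && any(is.infinite(col)),
    logical(1)
  )
  if (any(has_inf)) {
    stop(
      "Infinite values found in column(s): ",
      paste(names(data)[has_inf], collapse = ", "),
      ". Check how these variables were computed before fitting.",
      call. = FALSE
    )
  }
  # Return the data unchanged (invisibly) when nothing is wrong.
  invisible(data)
}
```

With the reprex data, calling check_finite_data(dat) before brm() would stop with a message naming the y column, which points at the data rather than at the initial values or priors.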

A softer approach would be to drop rows containing infinite values with a warning on the R side, then pass the cleaned data to Stan for fitting. This mirrors the handling of rows containing NA (absent user-specified imputation with mi()). I don't favor this approach because I'm not able to think of cases when it's still reasonable to fit a model after learning that some of the variable values are infinite.
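
For comparison, the softer behavior could look roughly like the following. Again this is a base R sketch under the same assumptions: drop_infinite_rows is a hypothetical name, and the warning text is made up for illustration rather than taken from brms.

```r
# Hypothetical helper (not part of brms): drop rows with Inf or -Inf in any
# numeric column and warn about how many rows were removed.
drop_infinite_rows <- function(data) {
  num_cols <- vapply(data, is.numeric, logical(1))
  bad <- rep(FALSE, nrow(data))
  for (col in names(data)[num_cols]) {
    # is.infinite() is FALSE for NA, so rows with only NAs are kept here.
    bad <- bad | is.infinite(data[[col]])
  }
  if (any(bad)) {
    warning(
      sum(bad), " row(s) containing infinite values were excluded from the data.",
      call. = FALSE
    )
  }
  data[!bad, , drop = FALSE]
}
```

Rows with NA are deliberately left alone in this sketch so that the existing NA handling (or mi()) still applies to them afterwards.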
