-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
129 lines (102 loc) · 4.43 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sbcrs
<!-- badges: start -->
[![Travis build status](https://travis-ci.org/jasonmtroos/sbcrs.svg?branch=master)](https://travis-ci.org/jasonmtroos/sbcrs)
[![Codecov test coverage](https://codecov.io/gh/jasonmtroos/sbcrs/branch/master/graph/badge.svg)](https://codecov.io/gh/jasonmtroos/sbcrs?branch=master)
<!-- badges: end -->
This package provides tools to simplify the implementation of
simulation based calibration using rank statistics (Talts, Betancourt, Simpson, Vehtari, and Gelman, [arXiv:1804.06788](https://arxiv.org/abs/1804.06788)). It implements a very similar validation procedure to that in `rstan::sbc` but using a different workflow. In `rstan::sbc`, the Stan model must be modified by the user to generate rank statistics during sampling. In this package, the Stan model is left unmodified, and the code needed to calculate the rank statistics is written in R. This provides a potentially faster development workflow (Stan recompiles are not needed). It also allows SBC to be used in cases where generating parameters from their prior distributions, or the modeled data from the likelihood function, would be too complicated if written in Stan.
## Installation
<!-- You can install the released version of sbcrs from [CRAN](https://CRAN.R-project.org) with: -->
<!-- ``` r -->
<!-- install.packages("sbcrs") -->
<!-- ``` -->
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("jasonmtroos/sbcrs")
```
To build the package vignettes, install the package using:
``` r
devtools::install_github("jasonmtroos/sbcrs", build_vignettes = TRUE)
```
The package vignettes are a useful starting point for understanding what this package does:
* `intro-to-sbc` provides an overview to simulation-based calibration, and the features of the SBC package
* `funnel` shows how SBC identified sampling problems in Neal's funnel
* `comparison-to-rstan-sbc` implements the example from `rstan::sbc` using this package, shows the rank statistics are the same, and provides a basis for understanding the different design philosophies behind the two approaches.
It is also useful to understand why this package does what it does. For that, see: [Validating Bayesian Inference Algorithms with Simulation-Based Calibration, arXiv:1804.06788](https://arxiv.org/abs/1804.06788).
## A Simple Illustration of What This Thing Does
```{r eval = FALSE}
library(rstan)
my_model <- stan_model(model_code = "
data {
vector[100] y;
vector[100] x;
vector[100] w;
}
parameters {
real alpha;
real beta;
real<lower = 0> sigma;
}
model {
alpha ~ std_normal();
beta ~ std_normal();
sigma ~ exponential(1);
y ~ normal(alpha + beta * x + beta^2 * w, sigma);
}
")
```
How can we be sure this Stan model is able to sample from the posterior distribution of `alpha`, `beta`, and `sigma`? [Read the paper!](https://arxiv.org/abs/1804.06788)
Here's the code:
```{r github-example, eval = FALSE}
library(sbcrs)
doParallel::registerDoParallel(parallel::detectCores())
x <- sample.int(5, 100, replace = TRUE)
w <- rnorm(100)
my_sbc <- SBC$new(
data = function(seed) {
list(w = w, x = x)
},
params = function(seed, data) {
set.seed(seed + 1e6)
alpha <- rnorm(1)
beta <- rnorm(1)
sigma <- rexp(1)
list(alpha = alpha, beta = beta, sigma = sigma)
},
modeled_data = function(seed, data, params) {
set.seed(seed + 2e6)
list(y = rnorm(100, params$alpha +
params$beta * data$x +
params$beta^2 * data$w, params$sigma))
},
sampling = function(seed, data, params, modeled_data, iters) {
data_for_stan <- c(data, modeled_data)
rstan::sampling(my_model, data = data_for_stan, seed = seed,
chains = 1, iter = 2 * iters, warmup = iters,
refresh = 200)
}
)
my_sbc$calibrate(N = 128, L = 30, keep_stan_fit = FALSE)
my_sbc$plot()
```
![plot of my_sbc$plot()](man/figures/README-github-example-1.png)
Looks good!
## Contributing
Please note that the 'sbcrs' project is released with a
[Contributor Code of Conduct](CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.