Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chapter 10 and 11 #7

Merged
merged 4 commits into from
May 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
265 changes: 261 additions & 4 deletions 10_disease-risk-modeling.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,269 @@

**Learning objectives:**

- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
- how to model disease risk
- how to map disease risk


## Introduction

- Bayesian spatial models have been used to understand geographic patterns and risk factors of childhood overweight and obesity prevalence in Costa Rica (Gómez et al. 2023), and mosquito-borne diseases in Brazil (Pavani, Bastos, and Moraga 2023).

- Spatial methods can also be extended to analyze areal data that are both spatially and temporally referenced.


## Modeling of lung cancer risk in Pennsylvania

Bayesian spatial model to estimate the risk of lung cancer and assess its relationship with smoking in Pennsylvania, USA, in 2002.

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(SpatialEpi)
data(pennLC)
names(pennLC)
```


```{r message=FALSE, warning=FALSE}
head(pennLC$data)
```

```{r}
library(sf)
map <- st_as_sf(pennLC$spatial.polygon)
countynames <- sapply(slot(pennLC$spatial.polygon, "polygons"),
function(x){slot(x, "ID")})
map$county <- countynames
head(map)
```

```{r}
d <- group_by(pennLC$data, county) %>%
summarize(Y = sum(cases))
head(d)
```
```{r}
p1 <- map %>%
right_join(d, by = c("county" = "county")) %>%
ggplot()+
geom_sf(aes(fill=Y), color="navy")+
scale_fill_continuous(low="white", high="navy")+
geom_sf_text(aes(label=county,alpha=Y,color="orange"), size=2)+
coord_sf()+
labs(title = "Lung cancer cases",
subtitle = "Pennsylvania 1990-1994",
caption = "Source: PennLC data",
fill = "Cases")+
theme_minimal()
p1
```


### Expected cases

$$
E_i = \sum_{j=1}^m r_j^{(s)} n_j^{(s)}
$$

where $r_j^{(s)}$ is the rate of disease in stratum $j$ and $n_j^{(s)}$ is the population in stratum $j$.


```{r}
pennLC$data <- pennLC$data[order(pennLC$data$county,
pennLC$data$race,
pennLC$data$gender,
pennLC$data$age), ]

E <- expected(population = pennLC$data$population,
cases = pennLC$data$cases,
# 2 races, 2 genders, and 4 age groups (2×2×4 = 16)
n.strata = 16)
d$E <- E
head(d)
```

Let's visualize the difference between observed and expected cases.
```{r message=FALSE, warning=FALSE}
map %>%
right_join(d, by = c("county" = "county")) %>%
pivot_longer(cols = c("Y", "E"),
names_to = "type",
values_to = "value") %>%
mutate(value=round(value))%>%
ggplot()+
geom_sf(aes(fill=value), color="navy")+
scale_fill_continuous(low="white", high="navy")+
geom_sf_text(aes(label=value,
alpha=value,
color="orange"),
fontface="bold",
size=2)+
coord_sf()+
facet_wrap(~type)+
guides(fill = guide_colorbar(title = "Cases"),alpha="none",color="none")+
labs(title = "Lung cancer Observed and Expected cases",
subtitle = "Pennsylvania 1990-1994",
caption = "Source: PennLC data",
fill = "Type")+
theme_minimal()+
theme(axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "bottom")
```

### Standardized Mortality Ratios

$$
SMR_i = \frac{Y_i}{E_i}
$$

if $SMR_i > 1$, then the county has more cases than expected, and if $SMR_i < 1$, then the county has fewer cases than expected.

```{r}
# add the smoking data
d <- dplyr::left_join(d, pennLC$smoking, by = "county")
d$SMR <- d$Y/d$E
head(d)
```

```{r message=FALSE, warning=FALSE}
ggplot(d, aes(x=smoking, y=SMR))+
geom_point(aes(color=smoking))+
geom_smooth(se=FALSE)+
labs(title = "Lung cancer SMR vs Smoking",
subtitle = "Pennsylvania 1990-1994",
caption = "Source: PennLC data",
x = "Smoking",
y = "SMR")+
theme_minimal()
```

```{r}
map <- dplyr::left_join(map, d, by = "county")
```

```{r message=FALSE, warning=FALSE}
map %>%
ggplot()+
geom_sf(aes(fill=SMR), color="navy")+
scale_fill_continuous(low="white", high="navy")+
geom_sf_text(aes(label=county,
alpha=SMR,
color="orange"),
size=2)+
coord_sf()+
guides(fill = guide_colorbar(title = "Cases"),
alpha="none",
color="none")+
labs(title = "Lung cancer SMR",
subtitle = "Pennsylvania 1990-1994",
caption = "Source: PennLC data",
fill = "SMR")+
theme_minimal()
```


## Modeling diseased risk


$$
Y_i| \theta_i \sim Poisson(E_i \times \theta_i)
$$

where

- $\theta_i$ is the **relative risk** for county $i$,

- $u_i$ is the **structured random effect** for county $i$ modeled with an intrinsic conditional autoregressive (ICAR) model,

$$
u_i|u_{-i} \sim N(\bar{u}_{\delta_i} \frac{1}{\tau_u n_{\delta_i}})
$$


- $v_i$ is the random effect for stratum $i$

$$
v_i \sim N(0, \frac{1}{\tau_v})
$$


### Neighborhood structure


```{r message=FALSE, warning=FALSE}
library(spdep)
library(INLA)
nb <- poly2nb(map)
nb2INLA("map.adj", nb)
g <- inla.read.graph(filename = "map.adj")

```

### Model

```{r}
map$re_u <- 1:nrow(map)
map$re_v <- 1:nrow(map)
```


```{r}
formula <- Y ~ smoking +
f(re_u,
model = "besag",
graph = g,
scale.model = TRUE) +
f(re_v, model = "iid")
```


```{r eval=FALSE}
res <- inla(formula,
family = "poisson",
data = map,
E = E,
control.predictor = list(compute = TRUE),
control.compute = list(return.marginals.predictor = TRUE))
```


res$summary.fixed
mean sd 0.025quant 0.5quant
(Intercept) -0.3235 0.1498 -0.61925 -0.3233
smoking 1.1546 0.6226 -0.07569 1.1560
0.975quant mode kld
(Intercept) -0.02877 -0.3234 3.534e-08
smoking 2.37845 1.1563 3.545e-08


### Relative Risk

res$summary.fitted.values[1:3, ]
mean sd 0.025quant 0.5quant
fitted.Predictor.01 0.8781 0.05808 0.7648 0.8778
fitted.Predictor.02 1.0597 0.02750 1.0072 1.0592
fitted.Predictor.03 0.9646 0.05089 0.8604 0.9657
0.975quant mode
fitted.Predictor.01 0.9936 0.8778
fitted.Predictor.02 1.1150 1.0582
fitted.Predictor.03 1.0622 0.9681


# relative risk
map$RR <- res$summary.fitted.values[, "mean"]


# lower and upper limits 95% CI
map$LL <- res$summary.fitted.values[, "0.025quant"]
map$UL <- res$summary.fitted.values[, "0.975quant"]


See the map here:

<https://www.paulamoraga.com/book-spatial/disease-risk-modeling.html#mapping-smr>

## SLIDE 1 {-}

- ADD SLIDES AS SECTIONS (`##`).
- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.

## Meeting Videos {-}

Expand Down
17 changes: 13 additions & 4 deletions 11_areal-data-issues.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,21 @@

**Learning objectives:**

- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
- Type of Areal data issues


## Areal data issues

Areal data issues are common in spatial epidemiology. The most common are:


1. **Misaligned Data Problem (MIDP)**: which refers to a situation where the spatial data being analyzed are at a different scale than the one at which they were originally collected. This can lead to biased results because the data are not aligned with the scale of the analysis.

2. **Modifiable Areal Unit Problem (MAUP)**: The MAUP is a statistical bias that occurs when data are aggregated into different areal units. The MAUP can lead to different results depending on the scale of the analysis. For example, if you aggregate data at the county level, you may get different results than if you aggregate data at the state level.

3. **Special case of the MAUP**: **ecological fallacy**, which occurs when you make inferences about individuals based on group-level data. For example, if you find a correlation between income and disease at the county level, you may incorrectly assume that the same correlation exists at the individual level.

## SLIDE 1 {-}

- ADD SLIDES AS SECTIONS (`##`).
- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.

## Meeting Videos {-}

Expand Down
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Imports:
mapview,
rmarkdown,
sf,
SpatialEpi,
spData,
spdep,
tidyverse
Expand Down