Commit 5f7842d

Add sample annotation merging guidance (#145)
* Add sample annotation merging guidance
  - added `ex_clin_data` object with additional sample annotation fields `smoking_status` and `alcohol_use` to demonstrate merging to a `soma_adat` object
  - updated README and loading and wrangling vignette article with section including code to join this object to the example_data adat
1 parent 3440882 commit 5f7842d

File tree

7 files changed

+141
-8
lines changed

R/data.R

+6 -1
@@ -7,7 +7,7 @@
 #' group for studies or provide any metrics for SomaScan data in general.
 #'
 #' @name SomaScanObjects
-#' @aliases example_data ex_analytes ex_anno_tbl ex_target_names
+#' @aliases example_data ex_analytes ex_anno_tbl ex_target_names ex_clin_data
 #' @docType data
 #'
 #' @section Data Description:
@@ -64,6 +64,11 @@
 #' target names contained in `example_data`. This object (or one like it) is
 #' convenient at the console via auto-complete for labeling and/or creating
 #' plot titles on the fly.}
+#'
+#' \item{ex_clin_data}{A table containing `SampleId`, `smoking_status`, and
+#' `alcohol_use` fields for each clinical sample in `example_data` used to
+#' demonstrate how to merge sample annotation information to an existing
+#' `soma_adat` object.}
 #' }
 #'
 #' @source \url{https://github.com/SomaLogic/SomaLogic-Data}

README.Rmd

+30 -2
@@ -157,7 +157,7 @@ library(help = SomaDataIO)

 ## Objects and Data

-The `SomaDataIO` package comes with four (4) objects available
+The `SomaDataIO` package comes with five (5) objects available
 to users to run canned examples (or analyses). They can be accessed once
 `SomaDataIO` has been attached via `library()`. They are:

@@ -175,6 +175,9 @@ to users to run canned examples (or analyses). They can be accessed once
 * `ex_analytes`: the analyte (feature) variables in `example_data`
 * `ex_anno_tbl`: the annotations table associated with `example_data`
 * `ex_target_names`: a mapping object for analyte -> target
+* `ex_clin_data`: a table containing variables `SampleId`, `smoking_status` and
+  `alcohol_use` to demonstrate merging clinical sample annotation information
+  to a `soma_adat` object
 * See also `?SomaScanObjects`


@@ -212,7 +215,7 @@ adat_path

 # `adat_path` should be the elaborated path and file name of the *.adat file to
 # be loaded into the R workspace from your local file system
-# (e.g. file_path = "PATH_TO_ADAT/my_adat.adat")
+# (e.g. adat_path = "PATH_TO_ADAT/my_adat.adat")
 my_adat <- read_adat(file = adat_path)

 # test object class
@@ -238,6 +241,31 @@ S3 methods to the most popular
 methods(class = "soma_adat")
 ```

+#### Merging Sample Annotation Data
+
+The `example_data` object includes some sample annotation data built-in, with
+the variables `Age` and `Sex` included for clinical samples, but in practice
+ADAT files generally do not have any clinical or sample annotation data fields
+included.
+
+To merge sample annotation data into an existing `soma_adat` class object,
+use the `left_join()` method. Here, joining the `ex_clin_data` object adds in
+two additional clinical variables, `smoking_status` and `alcohol_use`:
+
+```{r merge-annotations}
+# `clin_path` should be the elaborated path and file name of the *.csv or
+# similar file to be loaded into the R workspace from your local file system
+# (e.g. clin_path = "PATH_TO_CLIN/clin_data.csv")
+# clin_data <- readr::read_csv(clin_path)
+
+merged_adat <- my_adat |>
+  dplyr::left_join(ex_clin_data, by = "SampleId")
+
+merged_adat |>
+  dplyr::select(SampleId, Age, Sex, smoking_status, alcohol_use) |>
+  head(n = 3)
+```
+
 Please see the article [Loading and Wrangling SomaScan](https://somalogic.github.io/SomaDataIO/articles/tips-loading-and-wrangling.html)
 for more details about available `soma_adat` methods.

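Note on the join added above: the README output shows `SampleId` as a character column, and a CSV reader may parse numeric-looking IDs as numbers, so the key types may need aligning before `left_join()` is called. The following is a minimal sketch of that pre-join check, not code from this commit. `adat_like` and `clin_like` are hypothetical plain data frames standing in for a `soma_adat` and a clinical table, and their values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins: a plain data frame in place of a `soma_adat`,
# and a clinical table whose key was parsed as numeric (as a CSV reader might)
adat_like <- data.frame(
  SampleId = c("1", "2", "3"),
  Age      = c(76L, 55L, 47L),
  stringsAsFactors = FALSE
)
clin_like <- data.frame(
  SampleId       = c(1, 2, 4),
  smoking_status = c("Never", "Past", "Current"),
  stringsAsFactors = FALSE
)

# Align the key type with the ADAT side before joining
clin_like$SampleId <- as.character(clin_like$SampleId)

# The right-hand key should be unique, otherwise the join duplicates rows
stopifnot(anyDuplicated(clin_like$SampleId) == 0)

# Samples with no clinical record; these get NA in the joined columns
unmatched <- setdiff(adat_like$SampleId, clin_like$SampleId)

# Left join keeps every row of the left table, matched or not
merged <- left_join(adat_like, clin_like, by = "SampleId")
```

Inspecting `unmatched` before the join makes missing annotations visible up front instead of surfacing later as unexplained `NA` values.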
README.md

+55 -3
@@ -6,7 +6,7 @@
 <!-- badges: start -->

 ![GitHub
-version](https://img.shields.io/badge/Version-6.1.0.9000-success.svg?style=flat&logo=github)
+version](https://img.shields.io/badge/Version-6.2.0.9000-success.svg?style=flat&logo=github)
 [![CRAN
 status](http://www.r-pkg.org/badges/version/SomaDataIO)](https://cran.r-project.org/package=SomaDataIO)
 [![Downloads](https://cranlogs.r-pkg.org/badges/SomaDataIO)](https://cran.r-project.org/package=SomaDataIO)
@@ -132,7 +132,7 @@ library(help = SomaDataIO)

 ## Objects and Data

-The `SomaDataIO` package comes with four (4) objects available to users
+The `SomaDataIO` package comes with five (5) objects available to users
 to run canned examples (or analyses). They can be accessed once
 `SomaDataIO` has been attached via `library()`. They are:

@@ -157,6 +157,10 @@ to run canned examples (or analyses). They can be accessed once

 - `ex_target_names`: a mapping object for analyte -\> target

+- `ex_clin_data`: a table containing variables `SampleId`,
+  `smoking_status` and `alcohol_use` to demonstrate merging clinical
+  sample annotation information to a `soma_adat` object
+
 - See also `?SomaScanObjects`

 ------------------------------------------------------------------------
@@ -192,7 +196,7 @@ adat_path

 # `adat_path` should be the elaborated path and file name of the *.adat file to
 # be loaded into the R workspace from your local file system
-# (e.g. file_path = "PATH_TO_ADAT/my_adat.adat")
+# (e.g. adat_path = "PATH_TO_ADAT/my_adat.adat")
 my_adat <- read_adat(file = adat_path)

 # test object class
@@ -260,6 +264,54 @@ methods(class = "soma_adat")
 #> see '?methods' for accessing help and source code
 ```

+#### Merging Sample Annotation Data
+
+The `example_data` object includes some sample annotation data built-in,
+with the variables `Age` and `Sex` included for clinical samples, but in
+practice ADAT files generally do not have any clinical or sample
+annotation data fields included.
+
+To merge sample annotation data into an existing `soma_adat` class
+object, use the `left_join()` method. Here, joining the `ex_clin_data`
+object adds in two additional clinical variables, `smoking_status` and
+`alcohol_use`:
+
+``` r
+# `clin_path` should be the elaborated path and file name of the *.csv or
+# similar file to be loaded into the R workspace from your local file system
+# (e.g. clin_path = "PATH_TO_CLIN/clin_data.csv")
+# clin_data <- readr::read_csv(clin_path)
+
+merged_adat <- my_adat |>
+  dplyr::left_join(ex_clin_data, by = "SampleId")
+
+merged_adat |>
+  dplyr::select(SampleId, Age, Sex, smoking_status, alcohol_use) |>
+  head(n = 3)
+#> ══ SomaScan Data ═══════════════════════════════════════════════════════════════
+#> SomaScan version V4 (5k)
+#> Signal Space 5k
+#> Attributes intact ✓
+#> Rows 3
+#> Columns 5
+#> Clinical Data 5
+#> Features 0
+#> ── Column Meta ─────────────────────────────────────────────────────────────────
+#> ℹ SeqId, SeqIdVersion, SomaId, TargetFullName, Target, UniProt, EntrezGeneID,
+#> ℹ EntrezGeneSymbol, Organism, Units, Type, Dilution, PlateScale_Reference,
+#> ℹ CalReference, Cal_Example_Adat_Set001, ColCheck,
+#> ℹ CalQcRatio_Example_Adat_Set001_170255, QcReference_170255,
+#> ℹ Cal_Example_Adat_Set002, CalQcRatio_Example_Adat_Set002_170255, Dilution2
+#> ── Tibble ──────────────────────────────────────────────────────────────────────
+#> # A tibble: 3 × 6
+#> row_names SampleId Age Sex smoking_status alcohol_use
+#> <chr> <chr> <int> <chr> <chr> <chr>
+#> 1 258495800012_3 1 76 F Never Yes
+#> 2 258495800004_7 2 55 F Never Yes
+#> 3 258495800010_8 3 47 M Never No
+#> ════════════════════════════════════════════════════════════════════════════════
+```
+
 Please see the article [Loading and Wrangling
 SomaScan](https://somalogic.github.io/SomaDataIO/articles/tips-loading-and-wrangling.html)
 for more details about available `soma_adat` methods.

data-raw/SomaScanObjects.R

+18 -2
@@ -5,13 +5,28 @@ data <- example_data
 x <- ex_analytes
 y <- ex_anno_tbl
 z <- ex_target_names
+zz <- ex_clin_data

 # 'new'
 example_data <- read_adat("example_data.adat") # download via wget
 ex_analytes <- getAnalytes(example_data)
 ex_anno_tbl <- getAnalyteInfo(example_data)
 ex_target_names <- getTargetNames(ex_anno_tbl)

+withr::with_seed(123, {
+  ex_clin_data <- example_data |>
+    dplyr::filter(SampleType == "Sample") |>
+    dplyr::mutate(
+      smoking_status = sample(c("Current", "Past", "Never"),
+                              size = 170, replace = TRUE),
+      alcohol_use = sample(c("Yes", "No"),
+                           size = 170, replace = TRUE)
+    ) |>
+    select(SampleId, smoking_status, alcohol_use) |>
+    as_tibble()
+})
+
+
 # 'save only if necessary'
 if ( !isTRUE(all.equal(data, example_data)) ) {
   save(example_data, file = "data/example_data.rda", compress = "xz")
@@ -20,6 +35,7 @@ if ( !isTRUE(all.equal(data, example_data)) ) {
 # 'save only if necessary'
 if ( !all(isTRUE(all.equal(x, ex_analytes)),
           isTRUE(all.equal(y, ex_anno_tbl)),
-          isTRUE(all.equal(z, ex_target_names))) ) {
-  save(ex_analytes, ex_anno_tbl, ex_target_names, file = "data/data_objects.rda", compress = "xz")
+          isTRUE(all.equal(z, ex_target_names)),
+          isTRUE(all.equal(zz, ex_clin_data))) ) {
+  save(ex_analytes, ex_anno_tbl, ex_target_names, ex_clin_data, file = "data/data_objects.rda", compress = "xz")
 }

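Note on the simulation step above: the script draws random `smoking_status` and `alcohol_use` values under a fixed seed and hard-codes `size = 170`. The sketch below shows the same idea with the sample size derived from the data. It is an illustration under stated assumptions, not the committed code: it uses base R `set.seed()` in place of `withr::with_seed()`, and `samples` is a hypothetical ten-row stand-in for the filtered `example_data`.

```r
# Hypothetical stand-in for the clinical rows of `example_data`
samples <- data.frame(SampleId = as.character(1:10), stringsAsFactors = FALSE)

# Fixed seed so the simulated annotations are reproducible
# (base R substitute for withr::with_seed(); it changes the global RNG state)
set.seed(123)

# Derive the size from the data rather than hard-coding it
n <- nrow(samples)

sim_clin <- data.frame(
  SampleId       = samples$SampleId,
  smoking_status = sample(c("Current", "Past", "Never"), size = n, replace = TRUE),
  alcohol_use    = sample(c("Yes", "No"), size = n, replace = TRUE),
  stringsAsFactors = FALSE
)
```

Tying `size` to `nrow()` keeps the simulated columns the same length as the table if the number of clinical samples ever changes.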
data/data_objects.rda

1.29 KB
Binary file not shown.

man/SomaScanObjects.Rd

+6
Generated file, not rendered by default.

vignettes/articles/tips-loading-and-wrangling.Rmd

+26
@@ -180,6 +180,32 @@ males |>
 ```


+#### Merging Sample Annotation Data
+
+The `example_data` object includes some sample annotation data built-in, with
+the variables `Age` and `Sex` included for clinical samples, but in practice
+ADAT files generally do not have any clinical or sample annotation data fields
+included.
+
+To merge sample annotation data into an existing `soma_adat` class object,
+use the `left_join()` method. Here, joining the `ex_clin_data` `tibble` object
+adds in two additional clinical variables, `smoking_status` and `alcohol_use`:
+
+```{r merge-annotations}
+# `clin_path` should be the elaborated path and file name of the *.csv or
+# similar file to be loaded into the R workspace from your local file system
+# (e.g. clin_path = "PATH_TO_CLIN/clin_data.csv")
+# clin_data <- readr::read_csv(clin_path)
+
+merged_adat <- my_adat |>
+  dplyr::left_join(ex_clin_data, by = "SampleId")
+
+merged_adat |>
+  dplyr::select(SampleId, Age, Sex, smoking_status, alcohol_use) |>
+  head(n = 3)
+```
+
+
 ### Available S3 Methods `soma_adat`

 ```{r methods}
