Added Analyte Annotations & Sample Descriptions README (#4) (#5)
- New Analyte annotations and Row/sample data field descriptions
  were added to the README.md file
- this relates to a question in the `SomaDataIO` package;
  issue SomaLogic/SomaDataIO#12

Co-authored-by: lcuddeback <[email protected]>
Co-authored-by: Alex Poole <[email protected]>
3 people authored Dec 20, 2021
1 parent 45251b6 commit ca0250b
Showing 1 changed file (`README.md`) with 93 additions and 19 deletions.

# SomaLogic-Data

## File: example_data.adat

### Overview

The `example_data.adat` is intended to provide existing and prospective
SomaLogic customers an example data file to enable analysis preparation prior
to receipt of SomaScan data, and also for those generally curious about the
SomaScan data deliverable. Data in this file is **not** intended for
biological analysis purposes or to provide any metrics for SomaScan data in
general.

### ADAT File Format

The ADAT file format is a SomaLogic-specific, tab-delimited text file designed
to store SomaScan study data. This format is intended to be flexible and
self-describing. The fields in this example file may be different than the
fields in the \*.adat file for your study. However, all \*.adat files consist
of four main sections arranged in the following order:

- `HEADER` - Study-level information about the SomaScan experiment and how the
data was processed.
- `COL_DATA` - Field names and types associated with the SOMAmer reagents
(columns).
- `ROW_DATA` - Field names and types associated with sample information (rows).
- `TABLE_BEGIN` - This section contains the experimental data organized into a
data matrix of SOMAmer reagents (columns) by samples (rows). SomaScan
measurements are in relative fluorescence units (RFU). The data block directly
above the measurement matrix describes the SOMAmer reagents, and the data block
to its left contains sample-specific (e.g. clinical) information.
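
To make the four-section layout concrete, here is a minimal Python sketch that groups ADAT-style, tab-delimited lines under their section names. It assumes each section starts on a line whose first field is the section name, optionally prefixed with `^` (e.g. `^HEADER`); that marker convention, the helper name `split_adat_sections`, and the toy input are illustrative assumptions rather than a specification of the format. For real files, prefer the parsers listed at the end of this README.

```python
from typing import Dict, List, Optional

# Section names in the order this README lists them.
SECTIONS = ("HEADER", "COL_DATA", "ROW_DATA", "TABLE_BEGIN")


def split_adat_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group tab-delimited lines under the most recent section marker.

    Assumption (illustrative only): a marker line's first field is a section
    name, optionally prefixed with '^'. Lines before the first marker and
    blank lines are skipped.
    """
    sections: Dict[str, List[List[str]]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        fields = line.split("\t")
        name = fields[0].lstrip("^")
        if name in SECTIONS:
            current = name
            sections[current] = []  # start a new, empty section
        elif current is not None and line.strip():
            sections[current].append(fields)
    return sections


# Hypothetical toy input; real ADAT content and field names will differ.
toy = "\n".join([
    "^HEADER",
    "Title\tToy study",
    "^COL_DATA",
    "Name\tSeqId\tTarget",
    "^ROW_DATA",
    "Name\tPlateId\tSampleType",
    "^TABLE_BEGIN",
    "P1\tSample\t1234.5",
])

parsed = split_adat_sections(toy)
print(list(parsed))  # ['HEADER', 'COL_DATA', 'ROW_DATA', 'TABLE_BEGIN']
```

Because Python dictionaries preserve insertion order, the section names come back in file order, which mirrors the fixed ordering described above.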

### Example File Description

This file, `example_data.adat`, contains a SomaScan V4 study from a set of
human samples. The RFU measurements and other identifiers have been altered to
protect personally identifiable information (PII) while retaining as much of
the underlying biological signal as possible. There are 192 EDTA-plasma
samples in total across two 96-well plate runs, broken down by the following
types:
* 170 clinical samples
* 10 calibrators (replicate controls for combining data across runs)
* 6 QC samples (replicate controls used to assess run quality)
* 6 Buffer samples (no protein controls)
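
The breakdown above adds up as 170 + 10 + 6 + 6 = 192 samples, exactly two full 96-well plates. The Python sketch below does the same tally with `collections.Counter`. The list of `SampleType` values is a stand-in built from the counts above rather than read from the file, and labelling clinical samples `Sample` and the controls `Calibrator`, `QC`, and `Buffer` is an assumption based on the example values in the sample annotations table.

```python
from collections import Counter

# Stand-in for the SampleType column of this file, built from the breakdown
# above. Assumption: clinical samples are labelled "Sample" and the controls
# "Calibrator", "QC" and "Buffer", as in the sample annotations table.
sample_types = ["Sample"] * 170 + ["Calibrator"] * 10 + ["QC"] * 6 + ["Buffer"] * 6

counts = Counter(sample_types)
n_total = sum(counts.values())
n_controls = n_total - counts["Sample"]  # calibrators + QC + buffers

print(n_total, n_controls)  # 192 22
print(n_total == 2 * 96)    # True: two full 96-well plates
```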

### Data Processing

The standard V4 data normalization procedure for EDTA-plasma samples was
applied to this dataset.

## Sample and Analyte Annotations

In a standard SomaLogic ADAT, the section of information that sits directly
above the measurement data (the RFU data matrix) is the column metadata, which
contains detailed information and annotations about the analytes (`SeqIds`)
and their targets. See the sections below for the available analyte and sample
fields and their descriptions.

### Analyte Annotations:
Information describing the *analytes* is found directly above
the data matrix in a standard SomaLogic ADAT. This information may
consist of any or all of the following:

| __Field__ | __Description__ | __Example__ |
| :----------------- | :------------------------------------ | :------------- |
| SeqId | SomaLogic sequence identifier | 2182-54\_1 |
| SeqIdVersion | Version of SOMAmer sequence | 2 |
| SomaId | Target identifier, of the form SLnnnnnn (8 characters in length) | SL000318 |
| TargetFullName | Target name curated for consistency with UniProt name | Complement C4b |
| Target | SomaLogic Target Name | C4b |
| UniProt | UniProt identifier(s) | P0C0L4 P0C0L5 |
| EntrezGeneID | Entrez Gene Identifier(s) | 720 721 |
| EntrezGeneSymbol | Entrez Gene Symbol names | C4A C4B |
| Organism | Protein Source Organism | Human |
| Units | Relative Fluorescence Units | RFU |
| Type | SOMAmer target type | Protein |
| Dilution | Dilution mix assignment | 0.01% |
| PlateScale\_Reference | PlateScale reference value | 1378.85 |
| CalReference | Calibration sample reference value | 1378.85 |
| medNormRef\_ReferenceRFU | Median normalization reference value | 490.342 |
| Cal\_V4\_\<YY\>\_\<SSS\>\_\<PPP\> | Calibration scale factor (for given Year\_Study\_Plate) | 0.64 |
| ColCheck | QC acceptance criteria across all plates/sets | PASS |
| QcReference\_\<LLLLL\> | QC sample reference value (for given QC lot) | PASS |
| CalQcRatio\_V4\_\<YY\>\_\<SSS\>\_\<PPP\> | Post calibration median QC ratio to reference (for given Year\_Study\_Plate) | 1.04 |
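
Several analyte fields have templated names whose suffix identifies a calibration set (Year\_Study\_Plate) or a QC lot, such as `Cal_V4_<YY>_<SSS>_<PPP>` and `QcReference_<LLLLL>`. The Python sketch below splits such column names into their components with regular expressions. The concrete names in the usage lines (e.g. `Cal_V4_18_004_001`) are hypothetical, and the patterns assume only that components are separated by underscores; they do not check digit counts.

```python
import re
from typing import Dict, Optional

# One pattern per templated field in the analyte annotations table.
# Assumption: each <...> placeholder is a run of characters without "_".
PATTERNS = {
    "Cal": re.compile(r"^Cal_V4_(?P<year>[^_]+)_(?P<study>[^_]+)_(?P<plate>[^_]+)$"),
    "CalQcRatio": re.compile(
        r"^CalQcRatio_V4_(?P<year>[^_]+)_(?P<study>[^_]+)_(?P<plate>[^_]+)$"
    ),
    "QcReference": re.compile(r"^QcReference_(?P<lot>[^_]+)$"),
}


def parse_templated_field(name: str) -> Optional[Dict[str, str]]:
    """Return the field family and its components, or None if nothing matches."""
    for family, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return {"family": family, **match.groupdict()}
    return None


# Hypothetical column names, for illustration only.
print(parse_templated_field("Cal_V4_18_004_001"))
print(parse_templated_field("QcReference_12345"))
print(parse_templated_field("SeqId"))  # None: not a templated field
```

Anchoring each pattern with `^...$` keeps `CalReference` and `CalQcRatio_...` from being mistaken for `Cal_V4_...` columns.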

### Sample Annotations:

Information describing the *samples* is typically found to the left of
the data matrix in a standard SomaLogic ADAT. This information may
consist of clinical information provided by the client, or run-specific
diagnostic information included for assay quality control. Below are
some examples of what may be present in this section:

| __Field__ | __Description__ | __Examples__ |
| :---------------- | :------------------------------------------------ | :------------- |
| PlateId | Plate identifier | V4-18-004\_001, V4-18-004\_002 |
| ScannerID | Scanner used to analyze slide | SG12064173, SG14374437 |
| PlatePosition | Location on 96-well plate (A1-H12) | A1, H12 |
| SlideId | Agilent slide barcode | 258495800001 |
| Subarray | Agilent subarray (1-8) | 1, 8 |
| SampleId | Subject identifier for clinical samples (1st form); calibrators and buffers use a 2nd form | 2031 |
| SampleType | `Sample` for clinical samples (1st form); control type otherwise (2nd form, as above) | Sample, QC, Calibrator, Buffer |
| PercentDilution | Highest concentration of the SOMAmer dilution groups | 20 |
| SampleMatrix | Sample matrix | Plasma-PPT |
| Barcode | 1D Barcode of aliquot | S622225 |
| Barcode2d | 2D Barcode of aliquot | 9876543210 |
| SampleNotes | Assay team sample observation | Cloudy, Low sample volume, Reddish |
| SampleDescription | Supplemental sample information | Plasma QC 1 |
| AssayNotes | Assay team run observation | Beads aspirated, Leak/Hole, Smear |
| TimePoint | Sample time point | Baseline |
| ExtIdentifier | Primary key for subarray | EXID40000000032037 |
| SsfExtId | Primary key for sample | EID102733 |
| SampleGroup | Sample group | A, B |
| SiteId | Collection site | SomaLogic |
| TubeUniqueID | Unique tube identifier | 2031 |
| CLI | Cohort definition identifier | CLI6006F001 |
| HybControlNormScale | Hybridization control scale factor | 0.948304 |
| RowCheck | Normalization acceptance criteria for all row scale factors | PASS, FLAG |
| NormScale\_0\_5 | Median signal normalization scale factor (0.5% mix) | 1.02718 |
| NormScale\_0\_005 | Median signal normalization scale factor (0.005% mix) | 1.119754 |
| NormScale\_20 | Median signal normalization scale factor (20% mix) | 0.996148 |
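
A common preparation step is to keep only clinical samples whose normalization checks passed. The Python sketch below applies that filter to rows represented as dictionaries keyed by the sample annotation fields above (`SampleType`, `RowCheck`). The row values are invented for illustration, and treating `RowCheck == "PASS"` as the inclusion criterion is an assumption about how you might use the flag, not a SomaLogic recommendation.

```python
from typing import Dict, List

Row = Dict[str, str]


def clinical_passing(rows: List[Row]) -> List[Row]:
    """Keep clinical samples (SampleType 'Sample') whose RowCheck is 'PASS'."""
    return [
        row
        for row in rows
        if row.get("SampleType") == "Sample" and row.get("RowCheck") == "PASS"
    ]


# Hypothetical rows; only the field names come from the table above.
rows = [
    {"SampleId": "2031", "SampleType": "Sample", "RowCheck": "PASS"},
    {"SampleId": "2032", "SampleType": "Sample", "RowCheck": "FLAG"},
    {"SampleId": "CAL1", "SampleType": "Calibrator", "RowCheck": "PASS"},
]

kept = clinical_passing(rows)
print([row["SampleId"] for row in kept])  # ['2031']
```

Using `row.get(...)` means a row missing either field is simply excluded rather than raising a `KeyError`.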


## Parsers and Programmatic Tools for \*.adat Files

- R package: [SomaDataIO](https://github.com/SomaLogic/SomaDataIO)
- Python module: [canopy.py](https://github.com/SomaLogic/Canopy)
