Commit b92ebc6

Merge pull request #57 from poseidon-framework/moreFor270
suggestions for additional changes in 2.7.0
2 parents 7898bb7 + 6133bcd commit b92ebc6

3 files changed, +37 -27 lines changed

README.md

Lines changed: 28 additions & 21 deletions
@@ -19,8 +19,9 @@ Every package should have the following files:

 It can also contain the following files:

-- A `README.txt` file for arbitrary context information
-- A `CHANGELOG.txt` file to document changes to the package
+- A `README.md` file for arbitrary context information
+- A `CHANGELOG.md` file to document changes to the package
+- A `.ssf` file with information on the underlying raw sequencing data

 Example:

@@ -30,9 +31,10 @@ Switzerland_LNBA_Roswita/Switzerland_LNBA.plink.bed
 Switzerland_LNBA_Roswita/Switzerland_LNBA.plink.bim
 Switzerland_LNBA_Roswita/Switzerland_LNBA.plink.fam
 Switzerland_LNBA_Roswita/Switzerland_LNBA.janno
+Switzerland_LNBA_Roswita/Switzerland_LNBA.ssf
 Switzerland_LNBA_Roswita/Switzerland_LNBA.bib
-Switzerland_LNBA_Roswita/README.txt
-Switzerland_LNBA_Roswita/CHANGELOG.txt
+Switzerland_LNBA_Roswita/README.md
+Switzerland_LNBA_Roswita/CHANGELOG.md
 ```

 ## The `POSEIDON.yml` file
@@ -47,7 +49,7 @@ Example:
 ```
 poseidonVersion: 2.5.0
 title: Switzerland_LNBA_Roswita
-description: LNBA Switzerland genetic data not yet published # optional
+description: LNBA Switzerland genetic data not yet published
 contributor:
   - name: Roswita Malone

@@ -58,18 +60,20 @@ lastModified: 2021-01-28
 genotypeData:
   format: PLINK
   genoFile: Switzerland_LNBA_Roswita.bed
-  genoFileChkSum: 95b093eefacc1d6499afcfe89b15d56c # optional
+  genoFileChkSum: 95b093eefacc1d6499afcfe89b15d56c
   snpFile: Switzerland_LNBA_Roswita.bim
-  snpFileChkSum: 6771d7c873219039ba3d5bdd96031ce3 # optional
+  snpFileChkSum: 6771d7c873219039ba3d5bdd96031ce3
   indFile: Switzerland_LNBA_Roswita.fam
-  indFileChkSum: f77dc756666dbfef3bb35191ae15a167 # optional
+  indFileChkSum: f77dc756666dbfef3bb35191ae15a167
   snpSet: 1240K
 jannoFile : Switzerland_LNBA_Roswita.janno
-jannoFileChkSum: 555d7733135ebcabd032d581381c5d6f # optional
-bibFile: sources.bib
-bibFileChkSum: 70cd3d5801cee8a93fc2eb40a99c63fa # optional
-readmeFile: README.txt # optional
-changelogFile: CHANGELOG.txt # optional
+jannoFileChkSum: 555d7733135ebcabd032d581381c5d6f
+sequencingSourceFile: Switzerland_LNBA_Roswita.ssf
+sequencingSourceFileChkSum: 19db1906240ee2f076e1a9659567dca4
+bibFile: Switzerland_LNBA_Roswita.bib
+bibFileChkSum: 70cd3d5801cee8a93fc2eb40a99c63fa
+readmeFile: README.md
+changelogFile: CHANGELOG.md
 ```

 When a package is modified in any way (e.g. updates of the context information in the `.janno` file), then the `packageVersion` field should be incremented and the `lastModified` field updated to the current date.
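
Reviewer note: the `...ChkSum` values in the hunk above are 32-digit hexadecimal strings, which is the shape of an MD5 digest. Assuming they are MD5 checksums of the referenced files, a minimal Python sketch for recomputing one and comparing it to the value in `POSEIDON.yml` could look like this. The helper names `md5_of_file` and `checksum_matches` are hypothetical and not part of any Poseidon tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large genotype files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare a file's MD5 digest with an expected hex string (assumed MD5)."""
    return md5_of_file(path) == expected.strip().lower()
```

A call such as `checksum_matches("Switzerland_LNBA_Roswita.ssf", "19db1906240ee2f076e1a9659567dca4")` would then tell whether the file on disk still corresponds to the recorded `sequencingSourceFileChkSum`.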
@@ -125,7 +129,7 @@ Example:
 }
 ```

-## The `README.txt` file
+## The `README.md` file

 Informal information accompanying the package.

@@ -135,19 +139,22 @@ Example:
 This package contains a rather interesting set of samples relevant for the peopling of the Territory of Christmas Island in the Indian Ocean. We consider this especially relevant, because ...
 ```

-## The `CHANGELOG.txt` file
+## The `CHANGELOG.md` file

 Documentation of important changes in the history of a package.

 Example:

 ```
-V 1.2.0: Fixed a spelling mistake in the site name "Hosenacker"->"Rosenacker"
-V 1.1.1: Added mtDNA contamination estimation to .janno file
-V 1.1.0: The authors of @Gassenhauer_2021 made some previously restricted samples for their publication available later and we added them
-V 1.0.0: Creation of the package
+- V 1.2.0: Fixed a spelling mistake in the site name "Hosenacker"->"Rosenacker"
+- V 1.1.1: Added mtDNA contamination estimation to .janno file
+- V 1.1.0: The authors of @Gassenhauer_2021 made some previously restricted samples for their publication available later and we added them
+- V 1.0.0: Creation of the package
 ```

-## The Sequencing Source file
+## The `.ssf` file
+
+Poseidon 2.7.0 added an option to specify sequencing source data. This is a tab-separated table, much like the `.janno` file, but following a different schema, specified in the file `ssf_columns.tsv`.
+
+Note that the primary entities in this table are sequencing entities (typically corresponding to DNA libraries or even multiple runs/lanes of the same library). The link to the Individuals listed in the `.janno`-file are made through a foreign-key relationship into `Poseidon_ID`.

-Poseidon 2.7.0 added an option to specify sequencing source data. This is a tab-separated table, much like the Janno file, but following a different schema, specified in the file `sequencingSourceFile_columns.tsv`. Note that the primary entities in this table are Sequencing entities (typically corresponding to DNA libraries or even multiple runs/lanes of the same library). The link to the Individuals listed in the Janno-file are made through a foreign-key relationship into `Poseidon_ID`.
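
Reviewer note: the foreign-key relationship described in the README text can be checked mechanically. The sketch below is a minimal illustration in Python, not the behaviour of any Poseidon tool. It assumes the `.janno` file has a `Poseidon_ID` column and that the `.ssf` file uses the `poseidon_IDs` column (multiple entries separated by `;`) as named in `ssf_columns.tsv` in this same commit. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def unmatched_ssf_ids(janno_rows, ssf_rows):
    """Map .ssf row index -> IDs that do not appear as Poseidon_ID in the .janno rows."""
    known = {row["Poseidon_ID"] for row in janno_rows}
    missing = {}
    for index, row in enumerate(ssf_rows):
        # One sequencing entity may point at several individuals, separated by ';'.
        ids = [part.strip() for part in row["poseidon_IDs"].split(";") if part.strip()]
        unknown = [i for i in ids if i not in known]
        if unknown:
            missing[index] = unknown
    return missing


# Made-up example data, only to show the shape of the check.
janno_text = "Poseidon_ID\tCountry\nIND001\tSwitzerland\nIND002\tSwitzerland\n"
ssf_text = "poseidon_IDs\tudg\tlibrary_built\nIND001\tminus\tds\nIND001;IND003\thalf\tss\n"

print(unmatched_ssf_ids(parse_tsv(janno_text), parse_tsv(ssf_text)))
# prints {1: ['IND003']}
```

An empty result means every sequencing entity resolves to at least known individuals only; individuals without any `.ssf` row are deliberately not reported here, since the README text does not say they must have one.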

janno_columns.tsv

Lines changed: 5 additions & 3 deletions
@@ -9,6 +9,7 @@ Relation_Type relationship type for relatives mentioned in Related_To as an arbi
 Relation_Note arbitrary comments about the relations of this individual String FALSE FALSE FALSE FALSE FALSE
 Collection_ID id as defined by the provider/owner of a sample (e.g. grave 40 skeleton 2) String FALSE FALSE FALSE FALSE FALSE
 Country present-day political country String FALSE FALSE FALSE FALSE FALSE
+Country_ISO present-day political country expressed in ISO 3166-1 alpha-2 country codes String FALSE FALSE FALSE FALSE FALSE
 Location unspecified location information like administrative or topographic region or mountains/rivers/lakes/cities nearby String FALSE FALSE FALSE FALSE FALSE
 Site site name String FALSE FALSE FALSE FALSE FALSE
 Latitude latitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -90 90 FALSE FALSE
@@ -23,11 +24,12 @@ Date_BC_AD_Stop upper (more recent) bound for the age, negative numbers for BC,
 Date_Note a free text field for arbitrary comments about the dating information String FALSE FALSE FALSE FALSE FALSE
 MT_Haplogroup mitochondrial haplogroup after phylotree.org as reported by Haplofind or Haplogrep String FALSE FALSE FALSE FALSE FALSE
 Y_Haplogroup Y-chromosome haplogroup reported as published, for internal data, please follow syntax with main branch + most terminal derived Y-SNP (e.g. R1b-P312) String FALSE FALSE FALSE FALSE FALSE
-Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; in case of multiple libraries String TRUE FALSE FALSE FALSE FALSE
+Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; String TRUE FALSE FALSE FALSE FALSE
 Nr_Libraries number of libraries Integer FALSE FALSE FALSE FALSE FALSE
+Library_Names identifiers of the libraries used to generate the genotype data, multiple values separated by ; String TRUE FALSE FALSE FALSE FALSE
 Capture_Type specifics of data generation method, multiple values separated by ; String TRUE TRUE FALSE Shotgun;1240K;ArborComplete;ArborPrimePlus;ArborAncestralPlus;TwistAncientDNA;OtherCapture;ReferenceGenome FALSE FALSE
-UDG “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE
-Library_Built “ds” for double stranded, “ss” for single stranded, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;other FALSE FALSE
+UDG UDG treatment, “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE
+Library_Built strandedness, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;mixed FALSE FALSE
 Genotype_Ploidy ploidy of the genotypes String FALSE TRUE FALSE diploid;haploid FALSE FALSE
 Data_Preparation_Pipeline_URL URL pointing to a description of the pipeline used to generate the genotype data from the source data String FALSE FALSE FALSE FALSE FALSE
 Endogenous % endogenous DNA as estimated from SG libraries (before capture), as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries report only the highest value Float FALSE FALSE TRUE 0 100 FALSE FALSE
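
Reviewer note: the choice lists touched in this file (for example `Library_Built` changing from `ds;ss;other` to `ds;ss;mixed`) and the new `Country_ISO` column lend themselves to a simple value check. The following Python sketch is only an illustration under stated assumptions. The option sets are copied from the rows above, `check_janno_value` is a hypothetical helper, and the `Country_ISO` test only verifies the two-uppercase-letter shape of an ISO 3166-1 alpha-2 code, not that the code is actually assigned.

```python
import re

# Choice options as listed in the janno_columns.tsv rows of this diff.
CHOICES = {
    "UDG": {"minus", "half", "plus", "mixed"},
    "Library_Built": {"ds", "ss", "mixed"},
    "Genotype_Ploidy": {"diploid", "haploid"},
    "Capture_Type": {
        "Shotgun", "1240K", "ArborComplete", "ArborPrimePlus",
        "ArborAncestralPlus", "TwistAncientDNA", "OtherCapture", "ReferenceGenome",
    },
}
# Columns whose cells may hold several values separated by ';'.
MULTI = {"Capture_Type"}
# Shape check only: two uppercase ASCII letters.
ALPHA2_SHAPE = re.compile(r"[A-Z]{2}")


def check_janno_value(column, value):
    """Return True if `value` is acceptable for `column` under the rules above."""
    if column == "Country_ISO":
        return ALPHA2_SHAPE.fullmatch(value) is not None
    if column in CHOICES:
        parts = value.split(";") if column in MULTI else [value]
        return all(part in CHOICES[column] for part in parts)
    return True  # columns without a rule in this sketch are not checked
```

With these rules, `check_janno_value("Library_Built", "other")` is rejected, which matches the change of the option list in this commit.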

sequencingSourceFile_columns.tsv renamed to ssf_columns.tsv

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,7 @@
 sequencingSourceFile_column_name description data_type multi choice range choice_options range_lower range_upper mandatory unique
-
-Poseidon_ID The Poseidon_ID field that this sequencing entity corresponds to, from the Janno-file. String FALSE FALSE FALSE TRUE FALSE
+poseidon_IDs The Poseidon_IDs this sequencing entity corresponds to, from the Janno-file, multiple entries separated by ; String TRUE FALSE FALSE TRUE FALSE
+udg UDG treatment String FALSE TRUE FALSE minus;half;plus FALSE FALSE
+library_built strandedness String FALSE TRUE FALSE ds;ss FALSE FALSE
 sample_accession The sample accession code as used in INSDC databases, including ENA and SRA (Example: SAMEA7050454) String FALSE FALSE FALSE TRUE TRUE
 study_accession The study accession code as used in INSDC databases, including ENA and SRA (Example: PRJEB39316) String FALSE FALSE FALSE FALSE FALSE
 run_accession The run accession code as used in INSDC databases, including ENA and SRA (Example: ERR4331996) String FALSE FALSE FALSE FALSE FALSE
@@ -19,4 +20,4 @@ fastq_aspera The Aspera-link (URL) to the FASTQ-file(s). (Example: fasp.sra.ebi.
 fastq_bytes The number of bytes of the FASTQ-file(s) in bytes Integer TRUE FALSE TRUE 0 Inf FALSE FALSE
 fastq_md5 The MD5 hash(es) of the FASTQ-file(s) String TRUE FALSE FALSE FALSE FALSE
 read_count The number of reads Integer FALSE FALSE TRUE 0 Inf FALSE FALSE
-submitted_ftp The URL(s) to the originally submitted file(s) before it got converted to FASTQ. This can sometimes be helpful for processing String TRUE FALSE FALSE FALSE FALSE
+submitted_ftp The URL(s) to the originally submitted file(s) before it got converted to FASTQ. This can sometimes be helpful for processing String TRUE FALSE FALSE FALSE FALSE
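
Reviewer note: `fastq_bytes` and `fastq_md5` are both marked as multi-valued in this schema, and `fastq_bytes` carries the range `0` to `Inf`. A minimal Python sketch for reading such a row follows. It assumes `;` is the separator (as stated for `poseidon_IDs`) and, as a further assumption that the schema itself does not state, that the n-th size belongs to the n-th hash. The function names and the example values are hypothetical.

```python
def split_multi(value):
    """Split a ';'-separated cell into trimmed, non-empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_fastq_fields(row):
    """Return (md5, size_in_bytes) pairs for one .ssf row given as a dict."""
    sizes = [int(part) for part in split_multi(row.get("fastq_bytes", ""))]
    hashes = split_multi(row.get("fastq_md5", ""))
    if any(size < 0 for size in sizes):
        raise ValueError("fastq_bytes must not be negative (schema range is 0 to Inf)")
    if len(sizes) != len(hashes):
        # Positional pairing is an assumption of this sketch, so refuse to guess.
        raise ValueError("fastq_bytes and fastq_md5 have different numbers of entries")
    return list(zip(hashes, sizes))
```

For a paired-end run with two FASTQ files, a row with `fastq_bytes` set to `1024;2048` and two hashes in `fastq_md5` would yield two pairs in file order.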