Merge pull request #32 from malloryfreeberg/feature/external-links-open-in-tabs

External links open in new tabs
malloryfreeberg authored Aug 15, 2022
2 parents a0e2ace + 7975f00 commit 68ac477
Showing 6 changed files with 96 additions and 94 deletions.
14 changes: 7 additions & 7 deletions docs/index.md
@@ -39,10 +39,10 @@ Summary of responsibilities of Central EGA nodes (EMBL-EBI & CRG) and Federated
 
 ## How do I start?
 
-The [ELIXIR Federated Human Data (FHD) Community](https://elixir-europe.org/communities/human-data) provides a framework for the secure submission, archival, dissemination, and analysis of sensitive human data across Europe, and wider. All Federated EGA communications are currently sent through the ELIXIR FHD open channels. Welcome to join us!
+The <a href="https://elixir-europe.org/communities/human-data" target="_blank">ELIXIR Federated Human Data (FHD) Community</a> provides a framework for the secure submission, archival, dissemination, and analysis of sensitive human data across Europe, and wider. All Federated EGA communications are currently sent through the ELIXIR FHD open channels. Welcome to join us!
 
-1. **Join the [ELIXIR FHD Mailing List](https://elixir-europe.org/intranet/join-groups)** (select "Human Data") to hear about the latest updates and events from the FHD community.
-1. **Attend the [ELIXIR FHD Community Calls](https://docs.google.com/document/d/10OwVvHbJ7i1gI1Iw4zmVsOs8kDrG077Y52juehiFcmU/edit)** to engage with the FEGA community. Meetings are the 1st Tuesday of the month @ 2pm CEST.
+1. **Join the <a href="https://elixir-europe.org/intranet/join-groups" target="_blank">ELIXIR FHD Mailing List</a>** (select "Human Data") to hear about the latest updates and events from the FHD community.
+1. **Attend the <a href="https://docs.google.com/document/d/10OwVvHbJ7i1gI1Iw4zmVsOs8kDrG077Y52juehiFcmU/edit" target="_blank">ELIXIR FHD Community Calls</a>** to engage with the FEGA community. Meetings are the 1st Tuesday of the month @ 2pm CEST.
 
 Now that you are connected to the Federated Human Data community, you can learn more about Federated EGA specifically. **Read about the areas that interest you the most** (in no particular order):
 
@@ -62,17 +62,17 @@ Displayed below are the steps a node must take to become a full member of the Fe
 
 The materials on this website guide you through onboarding information from the experiences of other nodes. Explore the topics using the navigation panel on the left. Take what you find useful to apply to your own node development.
 
-You can also use the [Federated EGA Maturity Model](https://inab.github.io/fega-mm/) to plan and drive your own node's development. The Maturity Model is divided into different domains, sub-domains, and maturity indicators which closely align with the topics outlined in these materials. [**Here**](https://ega-archive.github.io/FEGA-onboarding/topics/maturity-model/) you can read more about the FEGA Maturity Model.
+You can use the <a href="https://inab.github.io/fega-mm/" target="_blank">Federated EGA Maturity Model</a> to plan and drive your own node's development. The Maturity Model is divided into different domains, sub-domains, and maturity indicators which closely align with the topics outlined in these materials. You can [read more about how to interpret and use the FEGA Maturity Model](topics/maturity-model/).
 
 There is no time limit on establishing a Federated EGA Node. Onboarding will take more or less time depending on existing infrastructure and governance models, availability of funding and resources, user needs, and other factors.
 
 ## Acknowledgements
 
-We would like to thank all contributors, especially those mentioned in the [Contributors list](https://github.com/EGA-archive/FEGA-onboarding/blob/main/CONTRIBUTORS.yaml), the Federated EGA community for their support, and our funding partners.
+We would like to thank all contributors, especially those mentioned in the <a href="https://github.com/EGA-archive/FEGA-onboarding/blob/main/CONTRIBUTORS.yaml" target="_blank">Contributors list</a>, the Federated EGA community for their support, and our funding partners.
 
-Please see our [contributing guide](https://github.com/EGA-archive/FEGA-onboarding/blob/main/CONTRIBUTING.md) for information on how to contribute to the generation and maintenance of these materials. Thank you in advance for your contributions!
+Please see our <a href="https://github.com/EGA-archive/FEGA-onboarding/blob/main/CONTRIBUTING.md" target="_blank">contributing guide</a> for information on how to contribute to the generation and maintenance of these materials. Thank you in advance for your contributions!
 
 ## License
 
-The content of the FEGA onboarding materials and website are licensed under the [Creative Commons Attribution Share Alike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
+The content of the FEGA onboarding materials and website are licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative Commons Attribution Share Alike 4.0 International License</a>.

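Every change in `docs/index.md` above applies one rule. A Markdown link to an external `http(s)` URL becomes a raw HTML anchor with `target="_blank"`. The absolute link back into the onboarding site becomes a relative Markdown link instead. The commit also rewords that link's text, which this sketch does not attempt. Below is a minimal Python sketch of automating the rule. It is an illustration under stated assumptions, not the tooling used for this commit: the function name `external_links_to_new_tab` and the `site_root` parameter are hypothetical. The regular expression only handles simple inline links of the form `[text](url)`, with no link titles, no nested brackets in the text, and no spaces or parentheses in the URL.

```python
import html
import re
from typing import Optional

# Simple inline Markdown link: [text](url). Assumption: no nested brackets in
# the text, no spaces or parentheses in the URL, and no "title" part.
MD_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def external_links_to_new_tab(markdown: str, site_root: Optional[str] = None) -> str:
    """Rewrite external Markdown links as <a ... target="_blank"> anchors.

    - Links starting with site_root (hypothetical parameter) become relative
      Markdown links, so they keep opening in the same tab.
    - Other http(s) links become HTML anchors that open in a new tab.
    - Everything else (relative paths, #anchors, mailto:) is left unchanged.
    """

    def replace(match: re.Match) -> str:
        text, url = match.group(1), match.group(2)
        if site_root and url.startswith(site_root):
            relative = url[len(site_root):] or "./"
            return f"[{text}]({relative})"
        if url.startswith(("http://", "https://")):
            # Escape &, <, > and quotes so the URL is safe inside the attribute.
            href = html.escape(url, quote=True)
            return f'<a href="{href}" target="_blank">{text}</a>'
        return match.group(0)

    return MD_LINK.sub(replace, markdown)


if __name__ == "__main__":
    line = ("You can also use the [Federated EGA Maturity Model](https://inab.github.io/fega-mm/). "
            "[**Here**](https://ega-archive.github.io/FEGA-onboarding/topics/maturity-model/) you can read more.")
    print(external_links_to_new_tab(line, site_root="https://ega-archive.github.io/FEGA-onboarding/"))
```

The split mirrors the diff. Only off-site links open in a new tab, while links within the onboarding site stay as Markdown and open in the same tab.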
40 changes: 21 additions & 19 deletions docs/topics/data-metadata-management/index.md
@@ -19,52 +19,54 @@ Welcome! If you are involved in data or metadata standards aspects of establishi
 **By exploring these materials, you will be able to:**
 - Define data security and access management best practices
 - Understand the EGA metadata standard and its minimal requirements
-- Identify data models specific to a domain (e.g. COVID-19) and apply them when needed
+- Identify data models specific to a domain (*e.g.* COVID-19) and apply them when needed
 - Comprehend the (meta)data flow within the Federated EGA Network
 
 ## 1. Learn about data security best practices
 
-Central EGA has a [Security Strategy](https://ega-archive.org/files/European_Genome_phenome_Archive_Security_Overview.pdf) which defines best practices in ensuring data are stored securely. The EGA has helped develop the recommendations outlined in the [GA4GH Security Technology Infrastructure document](https://www.ga4gh.org/docs/ga4ghtoolkit/data-security/2016May10_REV_SecInfrastructure.pdf), which defines guidelines, best practices, and standards for building and operating an infrastructure that promotes responsible data sharing in accordance with the [GA4GH Privacy and Security Policy](https://www.ga4gh.org/docs/ga4ghtoolkit/data-security/Privacy-and-Security-Policy.pdf).
+Central EGA has a <a href="https://ega-archive.org/files/European_Genome_phenome_Archive_Security_Overview.pdf" target="_blank">Security Strategy</a> which defines best practices in ensuring data are stored securely. The EGA has helped develop the recommendations outlined in the <a href="https://www.ga4gh.org/docs/ga4ghtoolkit/data-security/2016May10_REV_SecInfrastructure.pdf" target="_blank">GA4GH Security Technology Infrastructure document</a>, which defines guidelines, best practices, and standards for building and operating an infrastructure that promotes responsible data sharing in accordance with the <a href="https://www.ga4gh.org/docs/ga4ghtoolkit/data-security/Privacy-and-Security-Policy.pdf" target="_blank">GA4GH Privacy and Security Policy</a>.
 
 Summary of best practices recommended for Federated EGA nodes:
 
 | Item | Description | Examples/Templates |
 |:---|:---|:---|
-| Breach response protocol | A protocol for addressing potential security breaches, including consideration of other FEGA nodes, CEGA, key contacts, and institutional/organisational policies. | TBD |
-| Risk register | A risk management tool used to track identified risks including information such as the nature of the risk, reference and owner, and mitigation measures. | TBD |
+| Breach response protocol | A protocol for addressing potential security breaches, including consideration of other FEGA nodes, Central EGA, key contacts, and institutional/organisational policies. | Coming soon! |
+| Risk register | A risk management tool used to track identified risks including information such as the nature of the risk, reference and owner, and mitigation measures. | Coming soon! |
 
 <br/><br/>
 
 ## 2. Explore implemented standards
 
-Central EGA largely adhere to [GA4GH standards](https://ega-archive.org/ga4gh). Specific standards already implemented are summarised below:
+Central EGA largely adhere to <a href="https://ega-archive.org/ga4gh" target="_blank">GA4GH standards</a>. Specific standards already implemented are summarised below:
 
 | Standard | Purpose | Specification Version | Supported Version | Implementation | Publication/Preprint |
 |:---|:---|:---|:---|:---|:---|
-| Beacon | Supports discovery of genomic variants, individuals, and individuals | v1.0.1 | v0.3 | [Specification](https://github.com/ga4gh-beacon/specification-v2), [Documentation](https://ega-archive.org/beacon/#/), [Endpoint](https://ega-archive.org/beacon-api/) | N/A |
-| Crypt4GH | Enables direct byte-level compatible random access to encrypted genetic data stored in community standards (e.g. CRAM, VCF) | v1.0 | v1.0 | [Specification](http://samtools.github.io/hts-specs/crypt4gh.pdf), [Documentation](https://github.com/EGA-archive/crypt4gh), [Endpoints](https://ega-archive.org/federated#) | [DOI](https://doi.org/10.1093/bioinformatics/btab087) |
-| Data Use Ontology (DUO) | Allows users to semantically tag datasets with usage restrictions so datasets can be automatically discoverable based on a researcher's authorization level or intended use. | 2021-02-23 | 2021-02-23 | [Specification](https://github.com/EBISPOT/DUO), [Documentation](https://ega-archive.org/data-use-conditions), [Endpoint](https://www.ebi.ac.uk/ols/api/ontologies) | [DOI](https://doi.org/10.1016/j.xgen.2021.100028) |
-| htsget | Enables secure, efficient, and reliable access to sequencing read and variation data. | v1.3.0 | v1.0.0 | [Specification](http://samtools.github.io/hts-specs/htsget.html), [Documentation](https://github.com/EGA-archive/ega-download-client), [Endpoint](https://ega.ebi.ac.uk:8052/elixir/tickets/tickets) | [DOI](https://doi.org/10.1093/bioinformatics/bty492)|
-| refget | Enables access to reference sequences using an identifier derived from the sequence itself. | v1.2.6 | N/A | [Specification](http://samtools.github.io/hts-specs/refget.html), Documentation, Endpoint | [DOI](https://doi.org/10.1093/bioinformatics/btab524) |
-| Researcher IDs (passport, visa) | Specifies the collection of researchers that may access a dataset at any given time, and the credentials they must supply. | v1.0.1 | v1.0.1 | [Specification](https://github.com/ga4gh-duri/ga4gh-duri.github.io/blob/master/researcher_ids/ga4gh_passport_v1.md), [Documentation](https://docs.google.com/document/d/1FTzUYAfV5d2a0zoDkbY9Iy_L5NbSAnHeWnmY2NIrY8M/edit), [Endpoint](https://ega.ebi.ac.uk:8443/ega-permissions/swagger-ui/index.html) | [DOI](https://doi.org/10.1016/j.xgen.2021.100030) |
+| Beacon | Supports discovery of genomic variants and individuals. | v1.0.1 | v0.3 | <a href="https://github.com/ga4gh-beacon/specification-v2" target="_blank">Specification</a>, <a href="https://ega-archive.org/beacon/#/" target="_blank">Documentation</a>, <a href="https://ega-archive.org/beacon-api/" target="_blank">Endpoint</a> | N/A |
+| Crypt4GH | Enables direct byte-level compatible random access to encrypted genetic data stored in community standards (*e.g.* CRAM, VCF). | v1.0 | v1.0 | <a href="http://samtools.github.io/hts-specs/crypt4gh.pdf" target="_blank">Specification</a>, <a href="https://github.com/EGA-archive/crypt4gh" target="_blank">Documentation</a>, Endpoint | <a href="https://doi.org/10.1093/bioinformatics/btab087" target="_blank">DOI</a> |
+| Data Use Ontology (DUO) | Allows users to semantically tag datasets with usage restrictions so datasets can be automatically discoverable based on a researcher's authorization level or intended use. | 2021-02-23 | 2021-02-23 | <a href="https://github.com/EBISPOT/DUO" target="_blank">Specification</a>, <a href="https://ega-archive.org/data-use-conditions" target="_blank">Documentation</a>, <a href="https://www.ebi.ac.uk/ols/api/ontologies" target="_blank">Endpoint</a> | <a href="https://doi.org/10.1016/j.xgen.2021.100028" target="_blank">DOI</a> |
+| htsget | Enables secure, efficient, and reliable access to sequencing read and variation data including specific genomic regions. | v1.3.0 | v1.0.0 | <a href="http://samtools.github.io/hts-specs/htsget.html" target="_blank">Specification</a>, <a href="https://github.com/EGA-archive/ega-download-client" target="_blank">Documentation</a>, <a href="https://ega.ebi.ac.uk:8052/elixir/tickets/tickets" target="_blank">Endpoint</a> | <a href="https://doi.org/10.1093/bioinformatics/bty492" target="_blank">DOI</a> |
+| refget | Enables access to reference sequences using an identifier derived from the sequence itself. | v1.2.6 | N/A | <a href="http://samtools.github.io/hts-specs/refget.html" target="_blank">Specification</a>, Documentation, Endpoint | <a href="https://doi.org/10.1093/bioinformatics/btab524" target="_blank">DOI</a> |
+| Researcher IDs (Passports and visas) | Specifies the collection of researchers that may access a dataset at any given time, and the credentials they must supply. | v1.0.1 | v1.0.1 | <a href="https://github.com/ga4gh-duri/ga4gh-duri.github.io/blob/master/researcher_ids/ga4gh_passport_v1.md" target="_blank">Specification</a>, <a href="https://docs.google.com/document/d/1FTzUYAfV5d2a0zoDkbY9Iy_L5NbSAnHeWnmY2NIrY8M/edit" target="_blank">Documentation</a>, <a href="https://ega.ebi.ac.uk:8443/ega-permissions/swagger-ui/index.html" target="_blank">Endpoint</a> | <a href="https://doi.org/10.1016/j.xgen.2021.100030" target="_blank">DOI</a> |
 
 <br/><br/>
 
 ### Data file standards
 
 Recommended file formats for:
-- Sequencing data (unaligned or aligned reads): [CRAM](http://samtools.github.io/hts-specs/CRAMv3.pdf), [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf)
-- Variant data: [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf)
-- Phenotype data: [Phenopackets](https://doi.org/10.1101/2021.11.27.21266944)
+- Sequencing data (unaligned or aligned reads): <a href="http://samtools.github.io/hts-specs/CRAMv3.pdf" target="_blank">CRAM</a>, <a href="https://samtools.github.io/hts-specs/SAMv1.pdf" target="_blank">BAM</a>
+- Variant data: <a href="https://samtools.github.io/hts-specs/VCFv4.2.pdf" target="_blank">VCF</a>
+- Phenotype/clinical data: <a href="https://doi.org/10.1101/2021.11.27.21266944" target="_blank">Phenopackets</a>
 
 ### Metadata standards
 
 The following resources represent EGA and community guidelines for submitted metadata:
-- [EGA metadata model](https://ega-archive.org/submission/sequence/programmatic_submissions/prepare_xml)
-- Ontologies: [Experimental Factor Ontology](https://www.ebi.ac.uk/efo/), [Data Use Ontology](https://github.com/EBISPOT/DUO)
+- <a href="https://ega-archive.org/submission/sequence/programmatic_submissions/prepare_xml" target="_blank">EGA metadata model</a>
+- Ontologies:
+- <a href="https://www.ebi.ac.uk/efo/" target="_blank">Experimental Factor Ontology</a>
+- <a href="https://github.com/EBISPOT/DUO" target="_blank">Data Use Ontology</a>
 - Community-specific standards:
-- [COVID-19 metadata mapping model across COVID-19 studies in Federated EGA (ELIXIR-CONVERGE)](https://zenodo.org/record/4893222)
-- [COVID-19 Host Genetics Initiative data dictionary](https://docs.google.com/spreadsheets/d/1RXrJIzHKkyB8qx5tHLQjcBioiDAOrQ3odAuqMS3pUUI/edit#gid=549383528)
+- <a href="https://zenodo.org/record/4893222" target="_blank">COVID-19 metadata mapping model across COVID-19 studies in Federated EGA (ELIXIR-CONVERGE)</a>
+- <a href="https://docs.google.com/spreadsheets/d/1RXrJIzHKkyB8qx5tHLQjcBioiDAOrQ3odAuqMS3pUUI/edit#gid=549383528" target="_blank">COVID-19 Host Genetics Initiative data dictionary</a>
 
 ### Quality control
 
@@ -79,7 +81,7 @@ As defined in the Federated EGA Collaboration Agreement, and in alignment with d
 - **Personal Metadata**. Information that describes or annotates research data to facilitate its interpretation or to describe the relationship between data elements that has the potential to identify a data subject. For example, demographic or ancestry information that can be used to identify individuals. Personal metadata are not made available through a public metadata catalogue and are not shared between EGA Central and the Node.
 - **Research Data**. Omics or other forms of genetic (according to Art. 4 Nr. 13 GDPR) and health data (according to Art. 4 Nr. 15 GDPR) that are used for scientific research purposes. This is considered to be special category personal data under Art. 9(1) in conjunction with Art. 4 Nr. 1 GDPR.
 
-Here you can view how the different types of [data flow within the Federated EGA network](https://docs.google.com/presentation/d/1IrU5jPJpGQ7n-WH-7WvJZjjH03ww9LfFMLK1kTBeAco/edit#slide=id.gcf2c0c3039_0_126).
+Here you can view how the different types of <a href="https://docs.google.com/presentation/d/1IrU5jPJpGQ7n-WH-7WvJZjjH03ww9LfFMLK1kTBeAco/edit#slide=id.gcf2c0c3039_0_126" target="_blank">data flow within the Federated EGA network</a>.
 
 ## 4. What's next?

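The Crypt4GH row in the standards table above says the format enables byte-level random access to encrypted data. A rough illustration of why this works: Crypt4GH v1 encrypts the payload in fixed 64 KiB plaintext segments. Each encrypted segment stores a 12-byte nonce and a 16-byte MAC around its ciphertext, so any plaintext byte range maps to a predictable ciphertext byte range. The helper below is a hypothetical sketch of that arithmetic only. It assumes the header length is already known from parsing the file, and it ignores edit lists and the decryption itself.

```python
from typing import NamedTuple

# Crypt4GH v1 data-block layout: plaintext is split into 64 KiB segments and
# each encrypted segment is nonce (12 B) + ciphertext + Poly1305 MAC (16 B).
SEGMENT_SIZE = 65536
NONCE_SIZE = 12
MAC_SIZE = 16
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE  # 65564 bytes


class CipherRange(NamedTuple):
    first_segment: int  # index of the first segment to decrypt
    cipher_start: int   # byte offset in the encrypted file (inclusive)
    cipher_end: int     # byte offset in the encrypted file (exclusive)
    skip: int           # plaintext bytes to drop from the first decrypted segment


def ciphertext_range(header_len: int, start: int, end: int) -> CipherRange:
    """Map a plaintext byte range [start, end) onto whole encrypted segments.

    header_len is assumed to be known: a real reader obtains it by parsing the
    Crypt4GH header packets. The final segment of a file may be shorter than
    64 KiB, so callers should clamp cipher_end to the actual file size.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    first = start // SEGMENT_SIZE
    last = (end - 1) // SEGMENT_SIZE
    return CipherRange(
        first_segment=first,
        cipher_start=header_len + first * CIPHER_SEGMENT_SIZE,
        cipher_end=header_len + (last + 1) * CIPHER_SEGMENT_SIZE,
        skip=start - first * SEGMENT_SIZE,
    )


# Plaintext bytes 70000..70099 fall entirely inside segment 1.
# The header length of 124 bytes is an arbitrary example value.
print(ciphertext_range(header_len=124, start=70000, end=70100))
# → CipherRange(first_segment=1, cipher_start=65688, cipher_end=131252, skip=4464)
```

Because only whole segments can be authenticated and decrypted, the sketch always rounds out to segment boundaries and reports how many leading plaintext bytes to discard.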

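The htsget row, which this commit extends with "including specific genomic regions", refers to a two-step protocol. A client first requests a JSON ticket for a region. It then retrieves, in order, the blocks that the ticket lists, some of which may be inlined as base64 `data:` URLs. The sketch below shows both steps offline, under stated assumptions. The server base URL and file ID are placeholders, not EGA's actual endpoint layout, and the helper names `htsget_reads_ticket_url` and `plan_ticket` are hypothetical. The query parameters `format`, `referenceName`, `start` (0-based, inclusive) and `end` (exclusive) follow the htsget specification's reads endpoint.

```python
import base64
import json
from typing import List, Tuple, Union
from urllib.parse import quote, urlencode

# A block is either inline bytes or a (url, headers) pair still to be fetched.
Block = Union[bytes, Tuple[str, dict]]


def htsget_reads_ticket_url(base: str, read_id: str, reference_name: str,
                            start: int, end: int, fmt: str = "BAM") -> str:
    """Build an htsget reads ticket URL for the region [start, end) (0-based)."""
    if not 0 <= start < end:
        raise ValueError("htsget expects 0 <= start < end")
    params = {"format": fmt, "referenceName": reference_name, "start": start, "end": end}
    return f"{base.rstrip('/')}/reads/{quote(read_id)}?{urlencode(params)}"


def plan_ticket(ticket_json: str) -> List[Block]:
    """Return the ticket's blocks in order.

    Inline base64 data: URLs are decoded to bytes; every other URL is returned
    as a (url, headers) pair that a client would fetch and concatenate in order.
    """
    blocks: List[Block] = []
    for entry in json.loads(ticket_json)["htsget"]["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            meta, payload = url.split(",", 1)
            if not meta.endswith(";base64"):
                raise ValueError("only base64 data: URLs are handled in this sketch")
            blocks.append(base64.b64decode(payload))
        else:
            blocks.append((url, entry.get("headers", {})))
    return blocks


# Hypothetical server and file ID, for illustration only.
print(htsget_reads_ticket_url("https://htsget.example.org", "EXAMPLE-FILE-1", "chr1", 0, 1000))
```

Keeping the ticket parsing separate from any HTTP client makes the ordering rule explicit: the output file is the concatenation of all blocks in ticket order, whether they arrive inline or via a ranged request.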