Australian Reference Genome Atlas (ARGA)

ARGA will allow researchers to easily discover and access genomic data for Australian species in one place, faciliatating research and informing decision making in areas including conservation, biosecurity and agriculture.

ARGA Data

This repo is for Python and related code for data ingestion and pre-ingestion munging, prior to loading DwCA data into the Pipelines workflow.

Setup

Set up can be initiated by being in the base directory and running deploy.py, which should create a virtual environment and add a link to the src folder required to run the scripts. Further are provided as part of that script after it completes.

Processing Data

The configuration files in dataSources/location/database are how processing understands the procedure. To add a new source, you'll need to create a new sourceConfig file. To access that source, you can call one of the source processing tools with the syntax location-database. For example, to process the genbankSummary data source which is part of the ncbi, your source is called ncbi-genbankSummary, which is case sensitive.

The tools currently available are:

listSources.py
newSource.py
purgeSource.py
download.py
process.py
convert.py
getFields.py

listSources shows you a list of currently available sources. newSource creates a new source folder and basic config to kickstart adding a new source. purgeSource deletes a source that is no longer required. download is for simply downloading the data, which will vary based on your database type. processing is for running initial processing on files outlined in the source config. convert will then convert the old file to the new file mappings, as well as enrich and augment if required. getFields will read the pre-conversion file and give examples of field names and how they'll be mapped with the appropriate mapping file.

Issues repository

List of issues

Name		Name	Last commit message	Last commit date
Latest commit History 764 Commits
.github/workflows		.github/workflows
dataSources		dataSources
jupyter		jupyter
metadata		metadata
src		src
.containerignore		.containerignore
.env.template		.env.template
.gitignore		.gitignore
Containerfile		Containerfile
LICENSE		LICENSE
README.md		README.md
config.toml		config.toml
deploy.py		deploy.py
reqs.txt		reqs.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Australian Reference Genome Atlas (ARGA)

ARGA Data

Setup

Processing Data

Issues repository

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

ARGA-Genomes/arga-data

Folders and files

Latest commit

History

Repository files navigation

Australian Reference Genome Atlas (ARGA)

ARGA Data

Setup

Processing Data

Issues repository

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages