Datahugger - Where DOI 👐 Data

Datahugger is a tool to download scientific datasets, software, and code from a large number of repositories based on their DOI or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive Command Line Interface (CLI).

Supported repositories

Datahugger offers support for more than 377 generic and specific (scientific) repositories (and more to come!).

We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.

Installation

Datahugger requires Python 3.6 or later.

pip install datahugger

Getting started

Datahugger with Python

Load a dataset (or any digital asset) from a repository with the datahugger.get() function. The first argument is the DOI or URL, and the second is the folder name to store the dataset (it will be created if it does not exist).

The following code loads dataset 10.5061/dryad.mj8m0 into the folder data.

import datahugger

# download the dataset to the folder "data"
datahugger.get("10.5061/dryad.mj8m0", "data")
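When a project depends on several datasets, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of Datahugger itself: the helper name `download_all`, the `dataset_<n>` folder naming, and the `fetch` parameter are assumptions made for illustration. Only `datahugger.get(identifier, folder)` comes from the usage shown above.

```python
from pathlib import Path


def download_all(identifiers, root="data", fetch=None):
    """Download each identifier into its own subfolder of `root`.

    `fetch` is any callable taking (identifier, folder). It defaults to
    datahugger.get and exists only so the helper can be tried offline.
    """
    if fetch is None:
        import datahugger  # imported lazily so the helper loads without it

        fetch = datahugger.get

    folders = {}
    for index, identifier in enumerate(identifiers):
        # Hypothetical naming scheme: data/dataset_0, data/dataset_1, ...
        folder = Path(root) / f"dataset_{index}"
        fetch(identifier, str(folder))
        folders[identifier] = folder
    return folders
```

Calling `download_all(["10.5061/dryad.mj8m0"])` would then place that dataset in `data/dataset_0`, assuming Datahugger is installed and the repository is reachable.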

For an example of how this can integrate with your work, see the example workflow notebook.

Datahugger with command line

The command line function datahugger provides an easy interface to download data. The first argument is the DOI or URL, and the second argument is the name of the folder to store the dataset (will be created if it does not exist).

datahugger 10.5061/dryad.mj8m0 data

% datahugger 10.5061/dryad.mj8m0 data
Collecting...
NestTemperatureData.csv            : 100%|████████████████████████████████████████| 607k/607k
README_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
ExternalTemps.csv                  : 100%|██████████████████████████████████████| 1.06k/1.06k
README_for_ExternalTemps.txt       : 100%|██████████████████████████████████████| 2.82k/2.82k
InternalEggTempData.csv            : 100%|██████████████████████████████████████████| 664/664
README_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
SoilSimulation_Output.csv          : 100%|████████████████████████████████████████| 229M/229M
README_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k
Dataset successfully downloaded.

Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.mj8m0" data.
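The command can also be launched from a Python script, where passing the arguments as a list avoids shell quoting altogether. This is a rough sketch under stated assumptions: the helper names `build_command` and `run_datahugger` are hypothetical, and it assumes the `datahugger` command is on your PATH and exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_command(identifier, folder):
    """Return the argument list for the datahugger CLI call shown above."""
    return ["datahugger", identifier, folder]


def run_datahugger(identifier, folder):
    """Run the CLI without a shell, so the identifier needs no quoting."""
    if shutil.which("datahugger") is None:
        raise RuntimeError("the datahugger command was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumes the CLI reports failures through its exit status).
    return subprocess.run(build_command(identifier, folder), check=True)
```

For example, `run_datahugger("10.5061/dryad.mj8m0", "data")` runs the same download as the command above.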

Tips and tricks

No need to struggle with DOIs versus DOI URLs. They both work (and more). Example: The values 10.5061/dryad.x3ffbg7m8, doi:10.5061/dryad.x3ffbg7m8, https://doi.org/10.5061/dryad.x3ffbg7m8, and https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8 all point to the same dataset.
Do not republish a downloaded dataset when you upload your own data to a scientific data repository. Those storage resources can be put to better use :)
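To see why the four values in the first tip are equivalent, it can help to reduce each one to its bare DOI. The snippet below is only an illustration and is not how Datahugger resolves identifiers internally. The name `extract_doi` and the regular expression are assumptions. The pattern covers common modern DOIs, not every valid one.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(value):
    """Return the bare DOI found in `value`, or None if there is none."""
    match = DOI_PATTERN.search(value)
    return match.group(0) if match else None


values = [
    "10.5061/dryad.x3ffbg7m8",
    "doi:10.5061/dryad.x3ffbg7m8",
    "https://doi.org/10.5061/dryad.x3ffbg7m8",
    "https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8",
]

# All four spellings reduce to one and the same DOI.
unique_dois = {extract_doi(value) for value in values}
```

You do not need this step before calling Datahugger, since it accepts all of these forms directly.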

Contact

Please feel free to reach out with questions, comments, and suggestions. The issue tracker is a good starting point. You can also email me at [email protected].
