downloaded datasets miss their metadata #32

Open
tilfischer opened this issue Dec 12, 2022 · 3 comments

@tilfischer

Downloaded analytical datasets contain a dataset_description.txt, which gives the dataset name, instrument and description, plus a list of the files in the dataset with their checksums. The sample's metadata, however, is not part of the downloaded ZIP file and has to be downloaded separately by hand.

As an enhancement, I suggest always adding the sample's metadata to the downloaded ZIP file of a dataset (e.g. 1H NMR spectroscopic data), packaged with BagIt. Besides the metadata in XML format, a rendered version (e.g. an HTML file) would be handy for human readers. Shipping the metadata with each download would also link a downloaded dataset to the other analytical datasets of the same sample, since those are already listed in the sample metadata as related identifiers.

If the datasets contained the metadata in XML format (and possibly rendered as HTML for better human readability), the dataset_description.txt could be omitted and the checksums listed in a separate text file.
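To make the proposal concrete, here is a minimal sketch of what such a download could look like as a BagIt 1.0 bag, written with the Python standard library only. It is not the repository's actual export code. The function name `build_bag`, the tag directory `metadata/` and the file name `sample_metadata.xml` are assumptions chosen for illustration; only `bagit.txt`, `data/`, `manifest-sha256.txt` and `tagmanifest-sha256.txt` come from the BagIt specification.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def manifest_lines(bag_dir: Path, files: list[Path]) -> str:
    """Format '<checksum>  <path relative to the bag>' lines."""
    return "".join(
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}\n" for p in files
    )


def build_bag(bag_dir: Path, payload: dict[str, bytes], sample_metadata_xml: bytes) -> Path:
    """Write a BagIt 1.0 layout into bag_dir and return the zipped bag's path."""
    # Payload: the analytical data files go under data/.
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)

    # Bag declaration required by the BagIt specification.
    bagit_txt = bag_dir / "bagit.txt"
    bagit_txt.write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Sample metadata as a tag file (location and name are hypothetical).
    meta_file = bag_dir / "metadata" / "sample_metadata.xml"
    meta_file.parent.mkdir(exist_ok=True)
    meta_file.write_bytes(sample_metadata_xml)

    # Payload manifest: this is the separate checksum file.
    manifest = bag_dir / "manifest-sha256.txt"
    payload_files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    manifest.write_text(manifest_lines(bag_dir, payload_files), encoding="utf-8")

    # Tag manifest: checksums of the non-payload files, including the metadata.
    tag_manifest = bag_dir / "tagmanifest-sha256.txt"
    tag_manifest.write_text(
        manifest_lines(bag_dir, [bagit_txt, manifest, meta_file]), encoding="utf-8"
    )

    # Serialize as a ZIP that unpacks into a single top-level bag directory.
    zip_path = bag_dir.parent / (bag_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(bag_dir.rglob("*")):
            if p.is_file():
                zf.write(p, f"{bag_dir.name}/{p.relative_to(bag_dir).as_posix()}")
    return zip_path
```

In this layout `manifest-sha256.txt` takes over the checksum list from dataset_description.txt, and a rendered HTML version of the metadata could be added next to the XML as another tag file.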

@nicolejung

needs discussion on format but the point is right

@tilfischer

tilfischer commented Feb 6, 2023

People to contact about this from other NFDI4Chem repositories are the RADAR(4Chem) and nmrXiv teams. Both have already implemented BagIt or will implement it.

Edit: Connected to: #10

@tilfischer

> needs discussion on format but the point is right

BagIt is a good starting point, since an RO-Crate can be added to a BagIt bag at a later point in time; see https://www.researchobject.org/ro-crate/1.1/appendix/implementation-notes.html#adding-ro-crate-to-bagit

Best,
Tillmann
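
As a rough illustration of that later step: the linked implementation notes place the RO-Crate inside the bag's payload directory, so `ro-crate-metadata.json` sits in `data/` next to the data files. The sketch below writes a minimal metadata file there. The function name `write_ro_crate_metadata` and its parameters are hypothetical, and the root dataset is deliberately incomplete (a full RO-Crate 1.1 root dataset would also carry a description, datePublished and license).

```python
import json
from pathlib import Path


def write_ro_crate_metadata(bag_dir: Path, name: str, files: list[str]) -> Path:
    """Write a minimal ro-crate-metadata.json into the bag's data/ directory."""
    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: the payload directory itself (incomplete on purpose,
        # see the note above about description, datePublished and license).
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    # One File entity per payload file, with paths relative to data/.
    graph += [{"@id": f, "@type": "File"} for f in files]

    doc = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}
    out = bag_dir / "data" / "ro-crate-metadata.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return out
```

Because the new file is part of the payload, the bag's payload manifest has to be regenerated after it is written.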
