@effigies
Collaborator

@effigies effigies commented Sep 29, 2025

The DataCite Metadata Schema (https://schema.datacite.org/meta/kernel-4/) is a fairly comprehensive structure for describing citable datasets, covering contributors (both people and organizations, with contributor roles), funders, licenses, and abstracts.

The DataCite Metadata Working Group publishes XSD schemas, and Invenio hosts JSON-Schema translations at https://github.com/inveniosoftware/datacite/tree/master/datacite/schemas.

Following the lead of CITATION.cff, we would use YAML, a superset of JSON, with the file's contents interpreted as a JSON-compatible object.

I would propose that we add the Invenio JSON-Schema documents to the jsr:@bids/schema package, similar to https://jsr.io/@bids/schema/1.1.0/citation/schema.json.
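
As a rough illustration of the "JSON-compatible object" idea, here is a minimal, stdlib-only sketch. It is not the validator's implementation: `json.loads` stands in for a YAML loader (the sample text is JSON, which is also valid YAML), and `ASSUMED_REQUIRED_KEYS` is a hypothetical stand-in for whatever the JSON-Schema documents and MUST/SHOULD checks end up requiring.

```python
import json

# Hypothetical subset of top-level keys to require. The real rules would come
# from the JSON-Schema documents and the MUST/SHOULD checks, not this tuple.
ASSUMED_REQUIRED_KEYS = ("creators", "titles", "publicationYear")

# JSON is valid YAML, so this text could be saved as datacite.yml unchanged.
DATACITE_TEXT = """
{
  "creators": [{"name": "Doe, Jane", "givenName": "Jane", "familyName": "Doe"}],
  "titles": [{"title": "Example dataset"}],
  "publicationYear": "2025"
}
"""


def missing_keys(metadata, required=ASSUMED_REQUIRED_KEYS):
    """Return the required top-level keys absent from a parsed metadata object."""
    return [key for key in required if key not in metadata]


# Once parsed, the file is an ordinary dict/list structure that any
# JSON-oriented check can consume.
metadata = json.loads(DATACITE_TEXT)
print(missing_keys(metadata))
```

In practice a YAML parser and a JSON-Schema validator would replace both stand-ins; the point is only that the checks operate on the parsed object, not on the YAML syntax.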

Closes #1955.

TODO:

  • Add schema checks for MUST/SHOULD statements.
  • Recommend against having both CITATION.cff and datacite.yml? They could get out of sync, but if someone were keeping them in sync, having both might make the dataset legible to more tools.

@codecov

codecov bot commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.81%. Comparing base (7073cfe) to head (fa80514).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2221   +/-   ##
=======================================
  Coverage   82.81%   82.81%           
=======================================
  Files          22       22           
  Lines        1693     1693           
=======================================
  Hits         1402     1402           
  Misses        291      291           


@effigies
Collaborator Author

Rebased on #2315.

Anybody have thoughts on this?

  • Recommend against having both CITATION.cff and datacite.yml? They could get out of sync, but if someone were keeping them in sync, having both might make the dataset legible to more tools.

@julia-pfarr
Member

Who would be the "someone" syncing them? The authors of the dataset?

@effigies
Collaborator Author

Yes, I was thinking an author might want to have both and use some tool to keep them in sync.

From a BIDS perspective, it's not ideal to allow things that can get out of sync without checking that they are synced, but I don't think we want to maintain crosswalks. So we can either say "pick one" or "you're on your own to keep them synced".

@effigies effigies marked this pull request as ready for review January 22, 2026 17:12
@effigies effigies requested a review from erdalkaraca as a code owner January 22, 2026 17:12
@yarikoptic
Collaborator

I have mixed feelings about it:

  • I understand the motivation, and I would like us to adopt some "standard"ish record instead of breeding our own
  • but I hate adding ambiguity: e.g., a simple Authors field would now need to come from CITATION.cff OR datacite.yml, so any tool interested in it would need to understand both formats, at least to a minimal degree.

I wonder if we should allow both but mandate that they be specified consistently with each other. (They serve different purposes, so if multiple are allowed I would not mandate one OR the other.) We would then implement validation of their consistency with each other AND with what we have specified in dataset_description.json, which would be our "home brewed" common harmonized record for the main information, at least for some base fields.
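
For illustration only, a minimal sketch of what one such consistency check might look like, comparing personal author names between already-parsed CITATION.cff and datacite.yml objects. The function names are hypothetical. The field names (`authors`, `family-names`, `given-names` for CFF; `creators`, `familyName`, `givenName` for DataCite JSON) are my reading of those formats and should be checked against the actual schemas. Organizations, name normalization, and dataset_description.json are deliberately left out.

```python
def cff_author_names(cff):
    """Collect (family, given) pairs for person authors in a parsed CITATION.cff."""
    return {
        (author["family-names"], author.get("given-names", ""))
        for author in cff.get("authors", [])
        if "family-names" in author  # skip entities, which use "name" instead
    }


def datacite_creator_names(datacite):
    """Collect (family, given) pairs for person creators in a parsed datacite.yml."""
    return {
        (creator["familyName"], creator.get("givenName", ""))
        for creator in datacite.get("creators", [])
        if "familyName" in creator  # skip organizational creators
    }


def author_mismatches(cff, datacite):
    """Return the names that appear in only one of the two files."""
    cff_names = cff_author_names(cff)
    datacite_names = datacite_creator_names(datacite)
    return {
        "only_in_cff": sorted(cff_names - datacite_names),
        "only_in_datacite": sorted(datacite_names - cff_names),
    }


# Made-up example data: the second CFF author is missing from datacite.yml.
cff = {
    "authors": [
        {"family-names": "Doe", "given-names": "Jane"},
        {"family-names": "Roe", "given-names": "Richard"},
    ]
}
datacite = {
    "creators": [{"name": "Doe, Jane", "familyName": "Doe", "givenName": "Jane"}]
}
print(author_mismatches(cff, datacite))
```

Even this narrow check amounts to a small crosswalk between the two formats, which is the maintenance burden under discussion.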

@effigies
Collaborator Author

I'm not sure where that leaves us. Do we need crosswalks and validation before allowing datacite.yml? Or do we disallow duplication until we have crosswalks?

Development

Successfully merging this pull request may close these issues.

Support some datacite.yml in addition/instead of CITATION.cff