Mint DOIs for pyspedas (all versions) and each individual release #1037

Open
jameswilburlewis opened this issue Oct 10, 2024 · 3 comments
Assignees
Labels
Admin General PySPEDAS administrative tasks or issues Documentation Examples, notebooks, installation guides, webinars, etc packaging python Issues involving Python and Python-related tools outside of pyspedas QA/Testing

Comments

@jameswilburlewis

TBD: Find some guidance on best practices for minting DOIs for software. Also, how would we include the DOI for a specific release inside that release itself (e.g. in an "about" or "version" routine)? I think you don't get the DOI until after uploading the files, and it's probably not good practice to modify the release after the DOI is minted.

@jameswilburlewis jameswilburlewis added Admin General PySPEDAS administrative tasks or issues Documentation Examples, notebooks, installation guides, webinars, etc labels Oct 10, 2024
@jameswilburlewis jameswilburlewis self-assigned this Feb 12, 2025
@jameswilburlewis

Rebecca R sent a link to a tutorial for the zenodo integration with GitHub:

https://emilio-berti.github.io/idiv-git-introduction/21-github_zenodo/index.html

Last couple of issues to sort out:

How can I reserve DOIs (all-versions and specific release) so I can embed it in the repo, before I do the actual release to Github/Zenodo?

Where/how to embed the DOIs in the project, so there is a single source of truth for the correct DOI in the metadata, documentation, etc.

@jameswilburlewis

Rebecca let me know about the citation.cff convention for citing GitHub repositories, and ChatGPT has some suggested workflows for auto-updating citation.cff, pyproject.toml, and README.md, and making Sphinx pull DOIs from citation.cff.

Probably best to try this out in a sandbox repo before putting it into production on PySPEDAS.

@jameswilburlewis

jameswilburlewis commented Feb 13, 2025

Notes on the process:

  1. I had previously linked my Zenodo and GitHub accounts so didn't need to do that again for this task.

  2. While logged into Zenodo using my Github account, the only repo that showed up for me was pytplot under the MAVENSDC organization. Nothing from the SPEDAS organization was listed. It wasn't obvious how to enable this: the instructions from Zenodo said to "enable third party access to the zenodo app" in the repo settings, but when I navigated to that settings page, several apps were listed and Zenodo was not among them. Here's what I had to do to solve it: under my PERSONAL Github account, navigate to the "settings" page, then find the "Integrations" section in the sidebar and click "applications", then select the "Authorized OAuth Apps" tab. Zenodo appeared in that list. I clicked on Zenodo and scrolled down to "Organization access". My other organizations, heliophysicsPy and MAVENSDC, were enabled, but SPEDAS was not, so I clicked to enable it.
    Then, back on the Zenodo Github page, I clicked "sync now" and was prompted to reload the page. Finally, the SPEDAS organization repositories showed up in the list, and I was able to toggle access to the zenodo_sandbox repository I'm working with.

https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-your-membership-in-organizations/requesting-organization-approval-for-oauth-apps

  3. I had already reserved a DOI (10.5281/zenodo.14862018) and started a (draft) deposit for the zenodo_sandbox software package. I added URLs to pyproject.toml under [project.urls] as:

"DOI (Latest)" = "https://doi.org/10.5281/zenodo.14862018"
"DOI (All Versions)" = "https://doi.org/10.5281/zenodo.14862018"

and pushed the commit to Github. Upon publishing as release 1.1.1, I checked Zenodo. As I expected, it didn't update my draft deposit under "zenodo_sandbox"; instead it created a new deposit, with different DOIs, named "spedas/zenodo_sandbox: Zenodo_sandbox v1.1.1".

  4. I made note of the new "All Versions" DOI from the Zenodo page for the Github release, and entered it into pyproject.toml. I started a new draft release with a reserved DOI: 10.5281/zenodo.14862210, entered that in pyproject.toml, bumped the release number to 1.1.2, pushed the commits, and did another Github release. It didn't show up right away, perhaps because I forgot to update the version number in the draft release to 1.1.2. I updated the draft version number, then thought about it for a while and decided to delete the draft, but I couldn't. Then I noticed that Zenodo now had my Github release 1.1.2. I don't know whether it was editing the version number in the draft release, or just waiting a while, that made it eventually show up. It did use the DOI I had previously reserved.

Day 2: Trying to automate reserving a DOI prior to actually publishing a release on Github

  1. ChatGPT suggests the basic idea of implementing a Github action that uses the Zenodo API to reserve a DOI for the next release in the series defined by the concept DOI. (So I think you pretty much have to do an initial release from Github to get Zenodo to generate a concept DOI for this repo).

The initial suggestion was to trigger the action via saving a Github draft release. But after hours of fussing and searching, I could never get the action to fire just by saving a draft. Many people have complained about it in the forums, but it doesn't seem to be supported yet.

So instead I used "on: workflow_dispatch" so that I could trigger the action manually.

The documentation on the Zenodo API is pretty sparse, without many usage examples. ChatGPT's suggestions were all sort of along the right lines, but the details took quite a bit of tweaking. I was finally able to retrieve a list of depositions associated with the given API key (this could be a problem later?). Then, by filtering for the ones that had the right concept DOI and taking the one with the latest creation date, I could get the Zenodo record ID for the latest release so far.
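A minimal sketch of that filter-and-pick step, in Python rather than the shell used in the actual workflow. It assumes each entry in the depositions list carries `conceptdoi` and `created` fields (an ISO 8601 timestamp); the function name and the sample values are hypothetical.

```python
from datetime import datetime


def latest_deposition(depositions, concept_doi):
    """Return the most recently created deposition under concept_doi, or None."""
    # Keep only depositions that belong to the requested concept DOI.
    matching = [d for d in depositions if d.get("conceptdoi") == concept_doi]
    if not matching:
        return None
    # Compare parsed timestamps rather than raw strings.
    return max(matching, key=lambda d: datetime.fromisoformat(d["created"]))


# Hypothetical sample shaped like a depositions listing (IDs and DOIs are made up).
sample = [
    {"id": 101, "conceptdoi": "10.5281/zenodo.100", "created": "2025-02-10T12:00:00+00:00"},
    {"id": 102, "conceptdoi": "10.5281/zenodo.100", "created": "2025-02-12T09:30:00+00:00"},
    {"id": 201, "conceptdoi": "10.5281/zenodo.200", "created": "2025-02-13T08:00:00+00:00"},
]

latest = latest_deposition(sample, "10.5281/zenodo.100")
print(latest["id"])
```

Because the listing only contains depositions owned by the account behind the API key, this approach stops working if releases are ever made from a different account.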

Next step: continue using the Zenodo API to make a new deposit, using the record ID of the most recent release, and extract the DOI it reserved for us.

ChatGPT says we want to find the latest deposition associated with the target concept DOI, then do a deposit using that same latest-release-doi, and Zenodo will reserve a new DOI which we can get our hands on before we actually publish the release.

Regarding API keys and user accounts: In the sandbox repo, I set up the integration using my personal Github and Zenodo identities. But what if my Zenodo account goes away at some point? (Retirement, hit by a bus, etc....). My understanding is that although the concept and versioned DOIs would remain accessible, an incoming maintainer won't be able to make new releases under the same concept DOI if the original zenodo account no longer exists. So anyone who is considering using the Zenodo Github integration to manage software release DOIs on a production repository should strongly consider creating role accounts for this purpose, and including them in a succession plan!

Day 3:

The zenodo public records API doesn't require an access token to look up releases by concept DOI. You can also do the date sorting and limit to 1 response right in the query, so no need to slog through the JSON response trying to pick out the right record. This simplifies the Github workflow a bit.
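A sketch of what such a query could look like, built in Python. The `q`, `sort`, and `size` parameters and the `conceptdoi` search field are my understanding of the public records API and should be checked against the current Zenodo documentation; the function names are hypothetical, and the concept DOI is the "All Versions" DOI from the sandbox repo.

```python
import json
from urllib.parse import urlencode

ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def latest_release_query(concept_doi):
    """Build a records-search URL asking for the single newest record under concept_doi."""
    params = {
        "q": f'conceptdoi:"{concept_doi}"',  # assumed search field name
        "sort": "mostrecent",                # newest first
        "size": 1,                           # only one hit is needed
    }
    return f"{ZENODO_RECORDS_URL}?{urlencode(params)}"


def extract_latest(response_text):
    """Pull (record_id, doi) out of a search response body, or None if there were no hits."""
    hits = json.loads(response_text)["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["id"], hits[0]["doi"]


print(latest_release_query("10.5281/zenodo.14862141"))
```

The URL can be fetched without an access token (for example with `curl` or `urllib.request.urlopen`), and `extract_latest` then only has to look at the first hit.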

Next step: we have the DOI of the latest published release. We'll use the record ID of the latest release and use that to create a new draft, then get the DOI for the draft.

This requires a Zenodo personal access token. Log into Zenodo with the role account you'll be using to manage releases, then click the dropdown menu next to your user email (upper right corner) and select "Applications". On the next page there will be a section for "Personal Access Tokens". Click on the "New token" control, give it a name, turn on the "deposit:write" and "deposit:actions" scopes, then click "Create". It will show the token string. Copy and paste this somewhere safe, like your password manager, because once you click off this screen, there is no way to see the token value again. Save the token in Zenodo.

To use this token in Github Actions without leaking it in log messages or other insecure output, you should add it as a repository secret. In your Github repository settings, scroll down to the "Security" settings, expand the "Secrets and variables" entry, and click "Actions". On the next page, in the "repository secrets" section, click on "New repository secret". Give it a name (e.g. ZENODO_API_TOKEN), paste the token string into the "Secret" field, then click "Add Secret".

We need to call the API endpoint /api/deposit/depositions/$LATEST_PUBLIC_RECORD_ID/actions/newversion?access_token=$ZENODO_API_TOKEN

It will only let you do this once, on the most recent record ID associated with the parent concept DOI. Once you do this, the draft record ID is now the latest, even if it's not visible in the public API. (So maybe we need to go back to doing this query in the private API?)
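A sketch of the newversion call in Python, under a few assumptions: the token is sent in an `Authorization: Bearer` header instead of the `access_token` query parameter (so it stays out of URLs that might get logged), the response carries the new draft's URL under `links.latest_draft`, and the draft's metadata exposes the reserved DOI as `prereserve_doi.doi`. The helper names are hypothetical, and nothing here sends a request until the `Request` object is passed to `urllib.request.urlopen`.

```python
import urllib.request


def newversion_request(record_id, token, base="https://zenodo.org"):
    """Prepare (but do not send) the POST that opens a new-version draft."""
    url = f"{base}/api/deposit/depositions/{record_id}/actions/newversion"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the action takes no payload
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def draft_id_from_response(body):
    """Extract the draft deposition ID from the tail of links.latest_draft (assumed field)."""
    return int(body["links"]["latest_draft"].rstrip("/").rsplit("/", 1)[-1])


def reserved_doi_from_draft(draft):
    """Read the pre-reserved DOI from a draft deposition (assumed metadata layout)."""
    return draft["metadata"]["prereserve_doi"]["doi"]
```

In a Github Actions job the token would come from the environment, e.g. `os.environ["ZENODO_API_TOKEN"]`, populated from the repository secret.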

My workflow successfully created the new version, and the updated DOI (or at least the record ID) was present in the response body, but there was a bug retrieving it. So it wasn't going to work until the draft deposit was published. I pushed the new release from Github to finalize the pending draft in Zenodo, and then it showed up in the public API with the expected versioned DOI.

Next step: replacing the versioned DOIs in the distribution with the ones from the draft deposit. At least the following files should be updated:

pyproject.toml
CITATION.cff

Sphinx documents should be able to pull the DOIs from pyproject.toml or CITATION.cff directly.
What about pyspedas.version() or similar tools?
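One possible approach for a version routine, sketched below: read the DOI links back out of the installed package metadata, where the `[project.urls]` entries appear as `Project-URL` fields of the form `label, url`. This assumes the URL labels start with "DOI", as in the sandbox repo; the function names are hypothetical and this is not an existing pyspedas API.

```python
from importlib.metadata import PackageNotFoundError, metadata


def parse_project_urls(entries):
    """Turn 'label, url' Project-URL entries into a {label: url} dict."""
    urls = {}
    for entry in entries:
        # Split at the first comma only; the URL itself is kept intact.
        label, _, url = entry.partition(",")
        urls[label.strip()] = url.strip()
    return urls


def package_dois(dist_name):
    """Return the DOI-labelled project URLs of an installed distribution, or {} if absent."""
    try:
        entries = metadata(dist_name).get_all("Project-URL") or []
    except PackageNotFoundError:
        return {}
    return {k: v for k, v in parse_project_urls(entries).items() if k.startswith("DOI")}


print(package_dois("pyspedas"))
```

With this, pyproject.toml stays the single place where the DOIs are written, and the installed package reports whatever was baked into its metadata at build time.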

I got my Github action to replace the versioned DOIs in pyproject.toml and CITATION.cff. pyproject.toml was hard to get right, because the DOIs look like this in the [project.urls] section:

[project.urls]
Homepage = "https://github.com/spedas/pyspedas"
Information = "https://spedas.org/wiki/"
Documentation = "https://pyspedas.readthedocs.io"
Issue_Tracker = "https://github.com/spedas/pyspedas/issues"
Source_Code = "https://github.com/spedas/pyspedas"
"DOI (Latest)" = "https://doi.org/10.5281/zenodo.14872810"
"DOI (All Versions)" = "https://doi.org/10.5281/zenodo.14862141"

The quotes and parens need to be part of the search pattern, and the replacement pattern also has quotes and parens. So you pretty much need to be a regex and shell language lawyer to get the escaping 100% right.

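# Match the whole "DOI (Latest)" line; with -E, parens must be escaped to be literal.
PATT='"DOI \(Latest\)".*'
# The replacement is plain text, so its quotes only need escaping for the shell.
REPL="\"DOI (Latest)\" = \"${NEW_DOI_URL}\""
# Use | as the sed delimiter so the slashes in the URL need no escaping.
sed -i -E "s|${PATT}|${REPL}|" pyproject.toml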

This seems fragile. I wonder if there's a better way to represent both DOIs cleanly in [project.urls] without needing so much escaping. Maybe "DOI_versioned" and "DOI_concept"?
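As one alternative to the sed approach, the same replacement can be sketched in Python, where `re.escape` takes care of the parentheses and no shell quoting is involved. The function name is hypothetical, and the new URL in the usage line is a placeholder, not a real DOI.

```python
import re


def set_project_url(toml_text, label, new_url):
    """Replace the URL of the quoted [project.urls] key `label`; require exactly one match."""
    # re.escape makes the parens (and any other metacharacters) in the label literal.
    pattern = re.compile(rf'^"{re.escape(label)}"\s*=\s*".*"[ \t]*$', re.MULTILINE)
    replacement = f'"{label}" = "{new_url}"'
    # A callable replacement is inserted verbatim, with no backslash processing.
    new_text, count = pattern.subn(lambda _match: replacement, toml_text)
    if count != 1:
        raise ValueError(f"expected exactly one {label!r} entry, found {count}")
    return new_text


sample = (
    "[project.urls]\n"
    'Source_Code = "https://github.com/spedas/pyspedas"\n'
    '"DOI (Latest)" = "https://doi.org/10.5281/zenodo.14872810"\n'
    '"DOI (All Versions)" = "https://doi.org/10.5281/zenodo.14862141"\n'
)
print(set_project_url(sample, "DOI (Latest)", "https://doi.org/10.5281/zenodo.0000000"))
```

Failing loudly when the label is missing or duplicated guards against a release going out with a stale DOI because the pattern silently matched nothing.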

I just noticed that in the 1.1.3 release I pushed out (that had previously been created as a draft via the API newversion endpoint), there were actually two zip files in it: the old one for 1.1.2, and the new one for 1.1.3. I guess the draft release automatically inherits the files from the previous release if you do it that way? I hope the API has a way to remove the old files!

I also just noticed that some text from CITATION.cff appears as "Notes" in the Zenodo landing page for the release:

If you use this software, please cite it using the metadata from this file.

This is coming from the "message:" field of the file:

message: >-
  If you use this software, please cite it using the
  metadata from this file.

This is the default text when you use the online tool to generate CITATION.cff. So it might pay to put something more specific in there, rather than "metadata from this file". (In the Github "cite this repository" message, it will tack on a full URL derived from the "doi:" tag in the file.)

Flash forward to day 4:

The reserve_doi script now checks whether a draft release exists. If so, it reuses that draft, otherwise creates a new draft from the latest published release. It now removes any old files in the draft. Pushing the release out via Github created a clean result on Zenodo with only the desired release files, and the correct versioned DOI I had previously reserved.
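The reuse-or-create decision and the file cleanup might look roughly like this in Python. This is not the actual reserve_doi script: it assumes deposition entries carry `conceptdoi`, a boolean `submitted` flag (false for drafts), and a `files` list whose items have `id` and `filename`; the function names and sample values are hypothetical.

```python
def pick_draft(depositions, concept_doi):
    """Return an existing unpublished draft under concept_doi, or None if one must be created."""
    for dep in depositions:
        if dep.get("conceptdoi") == concept_doi and not dep.get("submitted", False):
            return dep
    return None


def stale_file_ids(draft, keep_names=()):
    """IDs of files inherited from the previous release that should be deleted from the draft."""
    return [f["id"] for f in draft.get("files", []) if f["filename"] not in keep_names]


# Hypothetical listing: one published release and one pending draft that inherited its zip.
depositions = [
    {"id": 1, "conceptdoi": "10.5281/zenodo.100", "submitted": True, "files": []},
    {
        "id": 2,
        "conceptdoi": "10.5281/zenodo.100",
        "submitted": False,
        "files": [{"id": "abc", "filename": "zenodo_sandbox-1.1.2.zip"}],
    },
]

draft = pick_draft(depositions, "10.5281/zenodo.100")
print(draft["id"], stale_file_ids(draft))
```

Each returned file ID would then be removed with a DELETE request against the draft's files endpoint before the Github release is published.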

Looking at the "Creators" list of the different releases, it appears that the Github-Zenodo integration only populates that with the Github users who had a commit in that specific release. The initial release seemed to have everyone, the later ones are just me. UPDATE: This is actually because I added a CITATION.cff that only had me. I added a fictitious user "Joe Blow" and he showed up in the next release. So we could manually edit CITATION.cff to get the contributors and their ORCIDs into the Zenodo record. HOWEVER: This method doesn't let you distinguish between "Creators" and "Contributors". As far as I can tell, everyone in CITATION.cff is added as a "Creator", with the role left blank. So we could kluge this by adding everyone to CITATION.cff, but we'd have to settle for no roles and no "Contributors". Anything beyond that, we'd have to go through the Zenodo API (and at that point, is there any reason left to use the Github-Zenodo integration?)

ChatGPT pointed out that I can use a .zenodo.json file to have more control over the metadata, so that might help resolve some of the snags I've been hitting.

https://developers.zenodo.org/#github
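For experimenting with a pared-down .zenodo.json, a small Python generator like the following could help keep the file minimal and valid JSON. The field names (`title`, `upload_type`, `creators`, `contributors` with a `type`) follow my reading of the Zenodo deposit metadata; the people listed are placeholders, and the function name is hypothetical.

```python
import json


def minimal_zenodo_metadata(title, creators, contributors=()):
    """Build the text of a minimal .zenodo.json with separate creators and contributors."""
    meta = {
        "title": title,
        "upload_type": "software",
        "creators": list(creators),
    }
    if contributors:
        # Each contributor needs a "type" (role), which CITATION.cff cannot express.
        meta["contributors"] = list(contributors)
    return json.dumps(meta, indent=2)


text = minimal_zenodo_metadata(
    "zenodo_sandbox",
    creators=[{"name": "Doe, Jane", "affiliation": "Example University"}],
    contributors=[{"name": "Blow, Joe", "type": "Other"}],
)
print(text)
```

Starting from a file this small and adding one field per release makes it easier to see which entry, if any, upsets the Github integration.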

Still to do:

Do multiline pattern match and replacement of the "versioned DOI" stanza in CITATION.cff

Edit CITATION.cff so the note makes sense on Zenodo, and try adding back the original Github contributors.

Update the version number and release date in CITATION.cff. (This will entail extracting the release string from pyproject.toml)

Figure out where DOIs should go in the Sphinx docs and make sure they get updated in the reserve_doi workflow.

Build a zenodo_sandbox release and push it to test.pypi.org to see how the DOI metadata renders over there.

Add the DOIs to version.py (and/or maybe elsewhere?) and ensure they get updated in the workflow.
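For the CITATION.cff version and release-date item, a rough Python sketch follows. It only touches top-level keys (`version:`, `date-released:`), not the multiline versioned-DOI stanza, and it extracts the version with a simple regex that assumes the first `version = "..."` line at the start of a line in pyproject.toml is the project version. All function names are hypothetical.

```python
import re
from datetime import date


def extract_version(pyproject_text):
    """Return the first top-of-line version = "..." value found in pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def set_top_level_key(cff_text, key, value):
    """Replace a top-level 'key: ...' line in CITATION.cff, appending it if absent."""
    # Anchoring at column 0 skips indented keys and 'cff-version:'.
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    line = f"{key}: {value}"
    new_text, count = pattern.subn(lambda _match: line, cff_text)
    if count == 0:
        new_text = cff_text.rstrip("\n") + "\n" + line + "\n"
    return new_text


def update_cff(cff_text, version, released):
    """Set version (quoted, so YAML keeps it a string) and date-released (ISO date)."""
    cff_text = set_top_level_key(cff_text, "version", f'"{version}"')
    return set_top_level_key(cff_text, "date-released", released.isoformat())


cff = 'cff-version: 1.2.0\ntitle: zenodo_sandbox\nversion: "1.1.2"\ndate-released: 2025-02-12\n'
pyproject = '[project]\nname = "zenodo_sandbox"\nversion = "1.1.3"\n'
print(update_cff(cff, extract_version(pyproject), date(2025, 2, 13)))
```

A YAML-aware rewrite would be more robust, but line-based replacement leaves the rest of the file, including comments and key order, untouched.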

Day 5:

I tried adding a .zenodo.json file, with creators, contributors, funding, etc. cribbed from the SPEDAS 1.00 JSON export. Now everything seems broken. When I reserve a DOI and publish a new release in Github, it only shows up in Zenodo as a draft, with no files added, and with the title from the CITATION.cff file. Maybe there's a conflict between .zenodo.json and wherever else Github is trying to pull information from. I guess I can try paring down the JSON file to something absolutely minimal and see what happens.

Actually, I had a draft release in a "zombie" state from yesterday (before I added .zenodo.json). Somehow, releasing via Github wasn't putting it into the "published" state; it just kept re-creating it as a draft each time I deleted it from Zenodo. The only way I could think of to clear it was to add a dummy file and publication date to the Zenodo draft, and publish it. Now we'll see what happens with my revised .zenodo.json file on the next release attempt.

It seems like no matter what I do now, I can't get a github release pushed all the way to Zenodo if I run my doi reservation workflow first. I could swear it worked at least once, but it might be super sensitive to release titles exactly matching or something like that.

It's starting to look like it's easier and more reliable to skip the Github integration, manage the release drafts and publication from the Zenodo side (so manually uploading the Github zip file), and keep the github workflow just for the sake of automating the DOI updates for pyproject.toml and other affected files.

@jameswilburlewis jameswilburlewis added QA/Testing python Issues involving Python and Python-related tools outside of pyspedas packaging labels Feb 13, 2025