Mint DOIs for pyspedas (all versions) and each individual release #1037
Rebecca R sent a link to a tutorial for the Zenodo integration with GitHub: https://emilio-berti.github.io/idiv-git-introduction/21-github_zenodo/index.html
Last couple of issues to sort out:
- How can I reserve DOIs (all-versions and specific release) so I can embed them in the repo before I do the actual release to GitHub/Zenodo?
- Where/how should the DOIs be embedded in the project, so there is a single source of truth for the correct DOI in the metadata, documentation, etc.?
Rebecca let me know about the CITATION.cff convention for citing GitHub repositories, and ChatGPT has some suggested workflows for auto-updating CITATION.cff, pyproject.toml, and README.md, and for making Sphinx pull DOIs from CITATION.cff. Probably best to try this out in a sandbox repo before putting it into production on PySPEDAS.
Notes on the process:
"DOI (Latest)" = "https://doi.org/10.5281/zenodo.14862018" and pushed the commit to Github. Upon publishing as release 1.1.1, I checked zenodo, and as I expected, it didn't update my draft deposit under "zenodo_sandbox", but created a new one with different DOIs named spedas/zenodo_sandbox: Zenodo_sandbox v1.1.1
Day 2: Trying to automate reserving a DOI before actually publishing a release on GitHub
The initial suggestion was to trigger the action by saving a GitHub draft release. But after hours of fussing and searching, I could never get the action to fire just by saving a draft. Many people have complained about this in the forums, but it doesn't seem to be supported yet. So instead I used "on: workflow_dispatch" so that I could trigger the action manually.
The documentation on the Zenodo API is pretty sparse, without many usage examples. ChatGPT's suggestions were all more or less along the right lines, but the details took quite a bit of tweaking. I was finally able to retrieve a list of the depositions associated with the given API key (this could be a problem later?). Then, by filtering for the ones that had the right concept DOI and taking the one with the latest creation date, I could get the Zenodo record ID for the latest release so far.
Next step: continue using the Zenodo API to make a new deposit, using the record ID of the most recent release, and extract the DOI it reserved for us. ChatGPT says we want to find the latest deposition associated with the target concept DOI, then make a deposit based on that same latest release, and Zenodo will reserve a new DOI which we can get our hands on before we actually publish the release.
Regarding API keys and user accounts: In the sandbox repo, I set up the integration using my personal GitHub and Zenodo identities. But what if my Zenodo account goes away at some point? (Retirement, hit by a bus, etc.) My understanding is that although the concept and versioned DOIs would remain accessible, an incoming maintainer won't be able to make new releases under the same concept DOI if the original Zenodo account no longer exists. So anyone who is considering using the Zenodo GitHub integration to manage software release DOIs on a production repository should strongly consider creating role accounts for this purpose, and including them in a succession plan!
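The day 2 filtering step (keep only the depositions with the right concept DOI, then take the newest one) can be sketched in Python as below. This is a minimal sketch, not the actual workflow code: it assumes each deposition in the JSON list carries "conceptdoi", "created" and "id" fields, and the sample records and the 10.5072 DOIs are made up for illustration.

```python
from datetime import datetime

# Made-up sample of what a depositions listing might look like (assumed field names).
SAMPLE_DEPOSITIONS = [
    {"id": 101, "conceptdoi": "10.5072/zenodo.100", "created": "2025-02-10T09:00:00+00:00"},
    {"id": 102, "conceptdoi": "10.5072/zenodo.100", "created": "2025-02-12T15:30:00+00:00"},
    {"id": 201, "conceptdoi": "10.5072/zenodo.200", "created": "2025-02-13T08:00:00+00:00"},
]


def parse_created(stamp):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z'."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def latest_deposition(depositions, concept_doi):
    """Return the most recently created deposition for one concept DOI, or None."""
    matches = [d for d in depositions if d.get("conceptdoi") == concept_doi]
    if not matches:
        return None
    return max(matches, key=lambda d: parse_created(d["created"]))


latest = latest_deposition(SAMPLE_DEPOSITIONS, "10.5072/zenodo.100")
print(latest["id"] if latest else "no deposition found")
```

Keeping the selection in a pure function means it can be tested without touching Zenodo. The list itself would come from the authenticated depositions endpoint, so it only ever contains records owned by the account behind the API key.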
Day 3: The Zenodo public records API doesn't require an access token to look up releases by concept DOI. You can also do the date sorting and limit the result to one response right in the query, so there's no need to slog through the JSON response trying to pick out the right record. This simplifies the GitHub workflow a bit.
Next step: we have the DOI of the latest published release. We'll use the record ID of the latest release to create a new draft, then get the DOI for the draft. This requires a Zenodo personal access token.
Log into Zenodo with the role account you'll be using to manage releases, then click the dropdown menu next to your user email (upper right corner) and select "Applications". On the next page there will be a section for "Personal Access Tokens". Click on the "New token" control, give it a name, turn on the "deposit:write" and "deposit:actions" scopes, then click "Create". It will show the token string. Copy and paste this somewhere safe, like your password manager, because once you click off this screen, there is no way to see the token value again. Then save the token in Zenodo.
To use this token in GitHub Actions without leaking it in log messages or other insecure output, you should add it as a repository secret. In your GitHub repository settings, scroll down to the "Security" settings, expand the "Secrets and variables" entry, and click "Actions". On the next page, in the "Repository secrets" section, click on "New repository secret". Give it a name (e.g. ZENODO_API_TOKEN), paste the token string into the "Secret" field, then click "Add secret".
We need to call the API endpoint
/api/deposit/depositions/$LATEST_PUBLIC_RECORD_ID/actions/newversion?access_token=$ZENODO_API_TOKEN
It will only let you do this once, on the most recent record ID associated with the parent concept DOI. Once you do this, the draft record ID is now the latest, even if it's not visible in the public API. (So maybe we need to go back to doing this query in the private API?)
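The two day 3 calls (the token-free public lookup and the authenticated "newversion" action) might be put together roughly as follows. This is a sketch under stated assumptions, not the workflow as written: the "conceptdoi" search field, the "all_versions", "sort" and "size" parameters, and the "links.latest_draft" and "metadata.prereserve_doi.doi" response fields are my reading of the Zenodo API and should be checked against the sandbox. The function names are hypothetical. It also swaps the "?access_token=" query parameter for an "Authorization: Bearer" header, so the token never appears in a URL that might get logged.

```python
import urllib.parse
import urllib.request

ZENODO = "https://zenodo.org"  # https://sandbox.zenodo.org is the test instance


def latest_record_url(concept_doi, base=ZENODO):
    """Public search URL for the newest published version under a concept DOI."""
    params = {
        "q": f'conceptdoi:"{concept_doi}"',  # search field name is an assumption
        "all_versions": "true",              # search across every version
        "sort": "mostrecent",                # newest first
        "size": 1,                           # only the top hit
    }
    return f"{base}/api/records?{urllib.parse.urlencode(params)}"


def new_version_request(record_id, token, base=ZENODO):
    """Build (but do not send) the POST that opens a new-version draft."""
    url = f"{base}/api/deposit/depositions/{record_id}/actions/newversion"
    headers = {"Authorization": f"Bearer {token}"}  # keeps the token out of the URL
    return urllib.request.Request(url, method="POST", headers=headers)


def draft_url(newversion_response):
    """The response describes the ORIGINAL record; the draft is linked from it."""
    return newversion_response["links"]["latest_draft"]


def reserved_doi(draft):
    """Pull the pre-reserved DOI out of the draft deposition's metadata."""
    return draft["metadata"]["prereserve_doi"]["doi"]


print(latest_record_url("10.5072/zenodo.100"))  # made-up concept DOI
```

In a GitHub Actions step, the token would come from the ZENODO_API_TOKEN secret exposed as an environment variable, the request would be sent with urllib.request.urlopen, and the draft fetched from draft_url(...) with the same Authorization header before calling reserved_doi(...).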
My workflow successfully created the new version, and the updated DOI (or at least the record ID) was present in the response body, but there was a bug retrieving it. So it wasn't going to work until the draft deposit was published. I pushed the new release from GitHub to finalize the pending draft in Zenodo, and then it showed up in the public API with the expected versioned DOI.
Next step: replacing the versioned DOIs in the distribution with the ones from the draft deposit. At least pyproject.toml needs updating. Sphinx documents should be able to pull the DOIs from pyproject.toml or CITATION.cff directly.
I got my GitHub action to replace the versioned DOIs in pyproject.toml and CITATION.cff. pyproject.toml was hard to get right, because the DOIs look like this in the [project.urls] section:
The quotes and parens need to be part of the search pattern, and the replacement pattern also has quotes and parens. So you pretty much need to be a regex and shell language lawyer to get the escaping 100% right.
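One way to sidestep most of the shell quoting is to do the substitution in Python instead of sed. The sketch below assumes the entries follow the "DOI (Latest)" = "https://doi.org/..." shape seen earlier in these notes; the "DOI (This Version)" key and the 10.5072 DOIs are hypothetical. re.escape takes care of the parentheses in the key, and a function is used as the replacement so nothing in the new DOI gets interpreted as a backreference.

```python
import re

# Made-up [project.urls] fragment; only the "DOI (Latest)" line comes from the notes.
SAMPLE_TOML = '''[project.urls]
"DOI (Latest)" = "https://doi.org/10.5281/zenodo.14862018"
"DOI (This Version)" = "https://doi.org/10.5072/zenodo.111"
'''


def replace_doi_url(toml_text, key, new_doi):
    """Swap the DOI in the `"key" = "https://doi.org/..."` line, exactly once."""
    pattern = re.compile(
        r'^([ \t]*"' + re.escape(key) + r'"[ \t]*=[ \t]*")'  # group 1: up to the opening quote
        r'https://doi\.org/[^"]+'                            # the old DOI URL
        r'(")',                                              # group 2: closing quote
        flags=re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + "https://doi.org/" + new_doi + m.group(2),
        toml_text,
    )
    if count != 1:
        raise ValueError(f"expected exactly one {key!r} entry, found {count}")
    return new_text


print(replace_doi_url(SAMPLE_TOML, "DOI (This Version)", "10.5072/zenodo.222"))
```

Failing loudly when the match count is not exactly one is deliberate: a silent no-op would ship a release with a stale DOI.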
This seems fragile. I wonder if there's a better way to represent both DOIs cleanly.
I just noticed that in the 1.1.3 release I pushed out (which had previously been created as a draft via the API newversion endpoint), there were actually two zip files: the old one for 1.1.2, and the new one for 1.1.3. I guess the draft release automatically inherits the files from the previous release if you do it that way? I hope the API has a way to remove the old files!
I also just noticed that some text from CITATION.cff appears as "Notes" in the Zenodo landing page for the release:
This is coming from the "message:" field of the file:
This is the default text when you use the online tool to generate CITATION.cff. So it might pay to put something more specific in there, rather than "metadata from this file". (In the GitHub "cite this repository" message, it will tack on a full URL derived from the "doi:" tag in the file.)
Flash forward to day 4: The reserve_doi script now checks whether a draft release exists. If so, it reuses that draft; otherwise it creates a new draft from the latest published release. It now removes any old files in the draft. Pushing the release out via GitHub created a clean result on Zenodo with only the desired release files, and the correct versioned DOI I had previously reserved.
Looking at the "Creators" list of the different releases, it appears that the GitHub-Zenodo integration
ChatGPT pointed out that I can use a .zenodo.json file to have more control over the metadata, so that might help resolve some of the snags I've been hitting. https://developers.zenodo.org/#github
Still to do:
- Do a multiline pattern match and replacement of the "versioned DOI" stanza in CITATION.cff.
- Edit CITATION.cff so the note makes sense on Zenodo, and try adding back the original GitHub contributors.
- Update the version number and release date in CITATION.cff. (This will entail extracting the release string from pyproject.toml.)
- Figure out where DOIs should go in the Sphinx docs and make sure they get updated in the reserve_doi workflow.
- Build a zenodo_sandbox release and push it to test.pypi.org to see how the DOI metadata renders over there.
- Add the DOIs to version.py (and/or maybe elsewhere?) and ensure they get updated in the workflow.
Day 5: I tried adding a .zenodo.json file, with creators, contributors, funding, etc. cribbed from the SPEDAS 1.00 JSON export. Now everything seems broken. When I reserve a DOI and publish a new release in GitHub, it only shows up in Zenodo as a draft, with no files added, and with the title from the CITATION.cff file.
Maybe there's a conflict between .zenodo.json and wherever else GitHub is trying to pull information from. I guess I can try paring down the JSON file to something absolutely minimal and see what happens.
Actually, I had a draft release in a "zombie" state from yesterday (before I added .zenodo.json). Somehow, releasing via GitHub wasn't putting it into the "published" state; it just kept getting re-created as a draft as I kept deleting it from Zenodo. The only way I could think of to clear it was to add a dummy file and publication date to the Zenodo draft, and publish it. Now we'll see what happens with my revised .zenodo.json file on the next release attempt.
It seems like no matter what I do now, I can't get a GitHub release pushed all the way to Zenodo if I run my DOI reservation workflow first. I could swear it worked at least once, but it might be super sensitive to release titles exactly matching or something like that. It's starting to look like it's easier and more reliable to skip the GitHub integration, manage the release drafts and publication from the Zenodo side (so manually uploading the GitHub zip file), and keep the GitHub workflow just for the sake of automating the DOI updates for pyproject.toml and other affected files.
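Whichever side ends up managing the drafts, the day 4 decision logic (reuse an open draft if there is one, otherwise branch from the latest published release, then clear out inherited files) stays the same. Below is a minimal sketch of that logic only. It assumes depositions expose "conceptrecid", "submitted", "created", and a "files" list with "id" and "filename" entries; I'm matching on "conceptrecid" here because I'm not certain an unpublished draft carries a concept DOI field. All names and sample values are hypothetical.

```python
def plan_draft(depositions, concept_recid):
    """Return ("reuse", draft_id) or ("newversion", latest_published_id)."""
    mine = [d for d in depositions if str(d.get("conceptrecid")) == str(concept_recid)]
    if not mine:
        raise LookupError(f"no depositions found for concept record {concept_recid}")
    drafts = [d for d in mine if not d.get("submitted")]
    if drafts:
        return "reuse", drafts[0]["id"]
    # ISO 8601 timestamps in one consistent format sort correctly as strings.
    latest = max(mine, key=lambda d: d["created"])
    return "newversion", latest["id"]


def stale_file_ids(draft, keep=()):
    """IDs of files inherited from the previous release that should be deleted."""
    return [f["id"] for f in draft.get("files", []) if f["filename"] not in keep]


def delete_file_url(base, deposition_id, file_id):
    """Target for an authenticated DELETE request that removes one file from a draft."""
    return f"{base}/api/deposit/depositions/{deposition_id}/files/{file_id}"


# Made-up example: two published versions plus one open draft that inherited a zip.
PUBLISHED = [
    {"id": 101, "conceptrecid": "100", "submitted": True, "created": "2025-02-10T09:00:00+00:00"},
    {"id": 102, "conceptrecid": "100", "submitted": True, "created": "2025-02-12T15:30:00+00:00"},
]
DRAFT = {
    "id": 103, "conceptrecid": "100", "submitted": False,
    "created": "2025-02-13T08:00:00+00:00",
    "files": [{"id": "aaa", "filename": "zenodo_sandbox-1.1.2.zip"}],
}

print(plan_draft(PUBLISHED, 100))
print(plan_draft(PUBLISHED + [DRAFT], 100))
print(stale_file_ids(DRAFT))
```

Separating the decision from the HTTP calls makes the "zombie draft" situation easier to reason about: the plan can be printed and inspected before anything is changed on Zenodo.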
TBD: Find some guidance on best practices for minting DOIs for software. Also, how would we include the DOI for a specific release inside that release itself (e.g. in an "about" or "version" routine)? I think you don't get the DOI until after uploading the files, and it's probably not good practice to modify the release after the DOI is minted.
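If a DOI can be reserved before publication, as explored in the notes above, one option is to stamp it into files that ship inside the release before tagging. The sketch below does that for CITATION.cff, covering the version, the release date, and a multiline match on the "versioned DOI" stanza. The stanza layout (an "identifiers" entry with "type", "value" and "description" in that order, and a description containing the word "versioned") is an assumption, and the sample values are made up.

```python
import re

# Made-up CITATION.cff; the identifiers layout is an assumed convention.
SAMPLE_CFF = '''cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata from this file."
title: zenodo_sandbox
version: 1.1.2
date-released: 2025-02-12
identifiers:
  - type: doi
    value: 10.5072/zenodo.100
    description: Concept DOI (all versions)
  - type: doi
    value: 10.5072/zenodo.102
    description: Versioned DOI for this release
'''


def _sub_once(pattern, repl, text):
    """Apply one substitution and insist that it matched exactly once."""
    new_text, count = re.subn(pattern, repl, text, flags=re.MULTILINE)
    if count != 1:
        raise ValueError(f"expected one match for {pattern!r}, found {count}")
    return new_text


def stamp_citation(cff_text, version, date_released, versioned_doi):
    """Update version, date-released and the versioned-DOI value in CITATION.cff text."""
    text = _sub_once(r"^version:.*$", lambda m: f"version: {version}", cff_text)
    text = _sub_once(r"^date-released:.*$", lambda m: f"date-released: {date_released}", text)
    stanza = (
        r"(^[ \t]*-[ \t]+type:[ \t]*doi[ \t]*\n"         # "- type: doi" line
        r"[ \t]+value:[ \t]*)\S+"                        # the DOI to replace
        r"([ \t]*\n[ \t]+description:[^\n]*[Vv]ersioned)"  # only the versioned entry
    )
    return _sub_once(stanza, lambda m: m.group(1) + versioned_doi + m.group(2), text)


print(stamp_citation(SAMPLE_CFF, "1.1.3", "2025-02-13", "10.5072/zenodo.103"))
```

The same values could be written to version.py in the same workflow step, so that the tagged commit already contains its own DOI and nothing needs to change after the DOI is minted.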