Releases: bio-guoda/preston
0.10.14
Features
n/a
Improvements
- support SciELO DOI inference #344 using
| domain | doi prefix | example url | example doi |
|---|---|---|---|
| http://www.scielo.br | 10.1590 | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2236-89062014000200010 | https://doi.org/10.1590/s2236-89062014000200010 |
| http://www.scielo.org.mx | NA | NA | NA |
| http://www.scielo.cl | 10.4067 | https://www.scielo.cl/scielo.php?script=sci_pdf&pid=S0717-65382015000100003 | https://doi.org/10.4067/s0717-65382015000100003 |
| http://www.scielo.org.co | NA | NA | NA |
| http://www.scielo.org.ar | NA | NA | NA |
- update man preston-ris-stream documentation:
PRESTON-RIS-STREAM(1) Preston Manual PRESTON-RIS-STREAM(1)
NAME
preston-ris-stream - translates bibliographic citations from RIS format
into Zenodo metadata in JSON lines format
SYNOPSIS
preston ris-stream [--no-cache] [--no-progress] [--reuse-doi]
[-a=<hashType>] [-d=<depth>] [--data-dir=<dataDir>] [-l=<logMode>]
[-r=<provenanceAnchor>] [--tmp-dir=<tmpDir>]
[--community=<communities>[,<communities>...]]...
[--repos=<remotes>[,<remotes>...]]...
DESCRIPTION
Stream RIS records into line-json with Zenodo metadata
OPTIONS
-a, --algo, --hash-algorithm=<hashType>
Hash algorithm used to generate primary content identifiers.
Supported values: sha256, md5, sha1.
--community, --communities=<communities>[,<communities>...]
select which Zenodo communities to submit to. If community is known
(e.g., batlit, taxodros), default metadata is included.
-d, --depth=<depth>
folder depth of data dir
--data-dir=<dataDir>
Location of local content cache
-l, --log=<logMode>
Log format. Supported values: tsv, nquads.
--no-cache, --disable-cache
Disable local content cache
--no-progress
Disable progress monitor
-r, --anchor, --provenance-root, --provenance-anchor=<provenanceAnchor>
specify the provenance root/anchor of the command. By default, any
available data graph will be traversed up to its most recent
additions. If the provenance root is set, only specified provenance
signature and their origins are included in the scope.
--repos, --remote, --remotes, --include,
--repositories=<remotes>[,<remotes>...]
Included repository dependencies (e.g.,
https://linker.bio/,https://softwareheritage.org,https://wikimedia.org,https://dataone.org,https://zenodo.org)
--reuse-doi
use existing DOI in Zenodo deposit if available
--tmp-dir=<tmpDir>
Location of local tmp dir
EXAMPLES
1.
First, append the associated bhl pdf via:
preston track https://www.biodiversitylibrary.org/partpdf/326364
Next, generate a RIS record, record.ris:
cat > record.ris <<__EOL__
TY - BOOK
TI - Faber, Helen R May 5, 1913
T2 - Walter Deane correspondence
UR - https://www.biodiversitylibrary.org/part/326364
PY - 1913-05-05
AU - Faber, Helen R.,
ER -
__EOL__
Then, track record.ris with Preston using:
cat record.ris\
| preston track
Finally, generate Zenodo metadata record.json using:
preston head\
| preston cat\
| preston ris-stream\
> record.json
where record.json:
{
"metadata": {
"description": "(Uploaded by Plazi from the Biodiversity Heritage Library) No abstract provided.",
"communities": [],
"http://www.w3.org/ns/prov#wasDerivedFrom": "https://linker.bio/line:hash://sha256/5fd5944b52b22efc56f901d96ff53a64c42e1f2264763e2f1074ac2c589e47cf!/L1-L7",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "application/x-research-info-systems",
"title": "Faber, Helen R May 5, 1913",
"upload_type": "publication",
"publication_type": "other",
"journal_title": "Walter Deane correspondence",
"publication_date": "1913-05-05",
"referenceId": "https://www.biodiversitylibrary.org/part/326364",
"filename": "bhlpart326364.pdf",
"keywords": [
"Biodiversity",
"BHL-Corpus",
"Source: Biodiversity Heritage Library",
"Source: https://biodiversitylibrary.org",
"Source: BHL"
],
"creators": [
{
"name": "Faber, Helen R."
}
],
"related_identifiers": [
{
"relation": "isDerivedFrom",
"identifier": "https://linker.bio/line:hash://sha256/5fd5944b52b22efc56f901d96ff53a64c42e1f2264763e2f1074ac2c589e47cf!/L1-L7"
},
{
"relation": "isDerivedFrom",
"identifier": "https://www.biodiversitylibrary.org/part/326364"
},
{
"relation": "isAlternateIdentifier",
"identifier": "urn:lsid:biodiversitylibrary.org:part:326364"
},
{
"relation": "isPartOf",
"identifier": "hash://sha256/3983c9abbba981838de5d47a5dadf94c4afcea7df63486effb71d780e592ebe8"
},
{
"relation": "hasVersion",
"identifier": "hash://md5/7fddbf186c6bbddb0b49919fc340bb61"
},
{
"relation": "hasVersion",
"identifier": "hash://sha256/9b30af8f432b78e0d739b0457376dac998057a5b4b5fccd52b81560ec1f4f146"
}
]
}
}
2025-07-09 PRESTON-RIS-STREAM(1)
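The SciELO DOI inference listed under Improvements can be illustrated with a small sketch. This is not Preston's implementation: the function name `scielo_doi` is hypothetical, and the rule it applies (DOI prefix by domain, followed by the lower-cased `pid` query parameter) is read off the two populated table rows only. Domains listed as NA are deliberately left unhandled.

```shell
#!/bin/sh
# Hypothetical helper (not part of Preston): infer a DOI URL from a SciELO URL.
# Prefixes come from the table in the release notes; NA domains return non-zero.
scielo_doi() {
  url="$1"
  case "$url" in
    *://www.scielo.br/*) prefix="10.1590" ;;
    *://www.scielo.cl/*) prefix="10.4067" ;;
    *) return 1 ;;
  esac
  # extract the value of the pid query parameter
  pid=$(printf '%s\n' "$url" | sed -n 's/.*[?&]pid=\([^&]*\).*/\1/p')
  [ -n "$pid" ] || return 1
  # assumption: the DOI suffix is the lower-cased pid, as in the table examples
  suffix=$(printf '%s' "$pid" | tr '[:upper:]' '[:lower:]')
  printf 'https://doi.org/%s/%s\n' "$prefix" "$suffix"
}

scielo_doi 'http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2236-89062014000200010'
# prints https://doi.org/10.1590/s2236-89062014000200010
```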
Bugs
n/a
0.10.13
Features
n/a
Improvements
- include man pages for sub-commands #343; add example sections for man preston-track and man preston-cat
- follow multi-layer redirect A -> B -> C via alternateOf/seeAlso #336 - allow for tracing gbif dataset DOIs to their reported dwc endpoint
- add markdown export for google docs; remove epub and rtf
Example from man preston-track:
preston track\
https://doi.org/10.15468/w6hvhv\
| preston dwc-stream\
| head -1\
| jq .\
> specimen.json
Bugs
n/a
0.10.12
0.10.11
Features
n/a
Improvements
- bhl pdf endpoints cannot be inferred from bhl item part id for externally hosted pdfs; related to #339
- when encountering a sciELO resource with pdf request; check for javascript-redirects and register associated content; #336 fyi @myrmoteras
- favor alternate/seeAlso contentIds associated over content associated with original locations; related to #336
- support copy-paste tracking of GBIF dataset via their landing page https://www.gbif.org/dataset/e4d3fc77-1d94-495b-96ff-3fe8b8f7a3bd fyi @seltmann
Example:
preston track https://www.gbif.org/dataset/e4d3fc77-1d94-495b-96ff-3fe8b8f7a3bd\
| preston dwc-stream\
| head -1\
| jq .
yields
{
"http://www.w3.org/ns/prov#wasDerivedFrom": "line:zip:hash://sha256/c2e545a14943beb12878b57e02f0718d3e784151b692ffdb51ebf08ffbb73dfe!/occurrence.txt!/L2",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://rs.tdwg.org/dwc/terms/Occurrence",
"http://rs.tdwg.org/dwc/text/id": "861c9d4e-d8e1-11e2-99a2-0026552be7ea",
"http://rs.tdwg.org/dwc/terms/country": "MEXICO",
"http://rs.tdwg.org/dwc/terms/minimumDepthInMeters": null,
"http://rs.tdwg.org/dwc/terms/verbatimLongitude": "098 47 00 W",
"http://rs.tdwg.org/dwc/terms/datasetName": "AMNH Hymenoptera",
"http://rs.tdwg.org/dwc/terms/individualCount": "1",
"http://rs.tdwg.org/dwc/terms/associatedOrganisms": null,
"http://rs.tdwg.org/dwc/terms/stateProvince": "Tamaulipas",
"http://rs.tdwg.org/dwc/terms/basisOfRecord": "PreservedSpecimen",
"http://rs.tdwg.org/dwc/terms/infraspecificEpithet": null,
"http://rs.tdwg.org/dwc/terms/occurrenceID": "861c9d4e-d8e1-11e2-99a2-0026552be7ea",
"http://rs.tdwg.org/dwc/terms/municipality": null,
"http://rs.tdwg.org/dwc/terms/locality": "Padilla",
"http://rs.tdwg.org/dwc/terms/specificEpithet": "completa",
"http://rs.tdwg.org/dwc/terms/island": null,
"http://rs.tdwg.org/dwc/terms/family": "Apidae",
"http://rs.tdwg.org/dwc/terms/verbatimEventDate": "5/17/1952",
"http://rs.tdwg.org/dwc/terms/locationID": "0004be86-935b-4850-84fe-78ef6cbc2954",
"http://rs.tdwg.org/dwc/terms/minimumElevationInMeters": null,
"http://rs.tdwg.org/dwc/terms/phylum": "Arthropoda",
"http://rs.tdwg.org/dwc/terms/typeStatus": null,
"http://rs.tdwg.org/dwc/terms/class": "Insecta",
"http://purl.org/dc/terms/license": "Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/",
"http://rs.tdwg.org/dwc/terms/preparations": "Pinned",
"http://rs.tdwg.org/dwc/terms/county": null,
"http://rs.tdwg.org/dwc/terms/associatedOccurrences": null,
"http://rs.tdwg.org/dwc/terms/taxonID": "562d1f89-9d1b-4f4b-83a5-29f12f9e99f0",
"http://rs.tdwg.org/dwc/terms/order": "Hymenoptera",
"http://rs.tdwg.org/dwc/terms/genus": "Anthophorula",
"http://rs.tdwg.org/dwc/terms/catalogNumber": "AMNH_BEE 00198554",
"http://rs.tdwg.org/dwc/terms/institutionCode": "AMNH",
"http://rs.tdwg.org/dwc/terms/kingdom": "Animalia",
"http://rs.tdwg.org/dwc/terms/scientificName": "Anthophorula (Anthophorula) completa Cockerell, 1935",
"http://rs.tdwg.org/dwc/terms/recordedBy": "M. A. Cazier, W. J. Gertsch & R. Schrammel",
"http://rs.tdwg.org/dwc/terms/samplingProtocol": "Netting",
"http://rs.tdwg.org/dwc/terms/verbatimLatitude": "24 01 00 N",
"http://rs.tdwg.org/dwc/terms/subgenus": "Anthophorula",
"http://rs.tdwg.org/dwc/terms/waterBody": null,
"http://purl.org/dc/terms/rightsHolder": "American Museum of Natural History",
"http://rs.tdwg.org/dwc/terms/sex": "Male",
"http://rs.tdwg.org/dwc/terms/otherCatalogNumbers": null
}
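The wasDerivedFrom value in the record above appears to point back to a line of occurrence.txt inside a zip archive identified by its sha256 content id. As a minimal sketch, the content id can be recovered from such a reference with the same grep pattern used elsewhere in these notes. The helper name `content_id` is hypothetical and not part of Preston.

```shell
#!/bin/sh
# Hypothetical helper (not part of Preston): pull the first sha256 content id
# out of a provenance reference such as a wasDerivedFrom value.
content_id() {
  printf '%s\n' "$1" | grep -oE 'hash://sha256/[a-f0-9]{64}' | head -n 1
}

content_id 'line:zip:hash://sha256/c2e545a14943beb12878b57e02f0718d3e784151b692ffdb51ebf08ffbb73dfe!/occurrence.txt!/L2'
# prints hash://sha256/c2e545a14943beb12878b57e02f0718d3e784151b692ffdb51ebf08ffbb73dfe
```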
Bugs
0.10.10
0.10.9
0.10.8
0.10.7
Features
n/a
Improvements
- populate journal issue as discussed in #327 suggested by Plazi's @myrmoteras
- remove trailing comma for RIS author string; related to #326 suggested by Plazi's @myrmoteras
- add biodiversity keyword for bhl related RIS entries; related to #328 suggested by Plazi's @myrmoteras
- use constant for Zenodo relation types; related to #331
- upgrade of jetty following GHSA-q4rv-gq96-w7c5.
- add integration for zenodo license mapping; related to #325 as needed for Plazi's @myrmoteras BHL license normalization
- introduce [preston zenodo --explicit-license-only]; related to #325
- towards supporting preston rpm install #332 @alexlancaster
Example using explicit license and license mapping for Zenodo deposits:
export ZENODO_ENDPOINT=https://sandbox.zenodo.org
export ZENODO_TOKEN=[secret]
git clone https://github.com/jhpoelen/bhl-corpus-tracker
cd bhl-corpus-tracker
# generate test set up to 50 deposits
./sample.sh 50
# generate the license map
cd target/[sample uuid]
cat zenodo-sample.json\
| preston track
LICENSE_MAP_VERSION=$(../../ls-part-licenses.sh| preston track | grep hasVersion | tail -n1 | grep -oE "hash://sha256/[a-f0-9]{64}")
preston ls\
| preston zenodo --explicit-license-only --license ${LICENSE_MAP_VERSION}
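The trailing-comma cleanup for RIS author strings listed under Improvements can be pictured with a short sketch. It is an illustration only, not Preston's implementation: the function name `strip_author_comma` is hypothetical, and it assumes that only AU lines are affected.

```shell
#!/bin/sh
# Hypothetical illustration (not Preston's implementation): strip a trailing
# comma, and any whitespace after it, from RIS author (AU) lines only.
strip_author_comma() {
  sed '/^AU  *- /s/,[[:space:]]*$//'
}

printf 'AU - Faber, Helen R.,\n' | strip_author_comma
# prints AU - Faber, Helen R.
```

Lines with other RIS tags, such as TI, pass through unchanged.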
Bugs
n/a
0.10.6
Features
n/a
Improvements
- make taxodros pub info configurable; related to TaxoDros/TaxoDros.github.io#46
- add support for eol-style table dwc-a #319 fyi @jhammock @KatjaSchulz
- towards publishing a Zenodo deposit with multiple files; #322
- prune dependencies; related to #322
- refactor to make doi usage in RIS streamer configurable; related to #324
- wire up re-use doi configuration; related to zenodo/zenodo#2536
Bugs
n/a
0.10.5
Features
n/a
Improvements
- implement workaround for un-escaped characters in Zenodo file content url #317
instead of the url offered by the Zenodo API
https://zenodo.org/api/records/13505983/files/Thuiller%20et%20al.%20-%202006%20-%20INTERACTIONS%20BETWEEN%20ENVIRONMENT,%20SPECIES%20TRAITS,%20.]/content
the following url is constructed (note the escaped , and ])
https://zenodo.org/api/records/13505983/files/Thuiller%20et%20al.%20-%202006%20-%20INTERACTIONS%20BETWEEN%20ENVIRONMENT%2C%20SPECIES%20TRAITS%2C%20.%5D/content
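The effect of the workaround can be reproduced with a minimal sketch. It is not Preston's implementation: the function name `escape_zenodo_url` is hypothetical, and it percent-encodes only the two characters called out above, the comma and the closing bracket.

```shell
#!/bin/sh
# Hypothetical sketch (not Preston's implementation): percent-encode "," and "]"
# in a Zenodo file content URL; all other characters are left untouched.
escape_zenodo_url() {
  printf '%s\n' "$1" | sed -e 's/,/%2C/g' -e 's/]/%5D/g'
}

escape_zenodo_url 'https://zenodo.org/api/records/13505983/files/Thuiller%20et%20al.%20-%202006%20-%20INTERACTIONS%20BETWEEN%20ENVIRONMENT,%20SPECIES%20TRAITS,%20.]/content'
```

A general-purpose encoder would need to cover further reserved characters; this sketch is limited to the case described in #317.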
Bugs
n/a