Releases: bio-guoda/preston

0.9.0

26 Aug 18:55

Features

For example usage, see https://github.com/jhpoelen/bhl-corpus-tracker or the commands below.

# first track BHL item.txt and RIS metadata
preston track --algo md5\
 "https://biodiversitylibrary.org/data/part.txt"\
 "https://www.biodiversitylibrary.org/data/RIS/bhlpart.ris.zip"

# then track an associated pdf 
preston track --algo md5\
 https://www.biodiversitylibrary.org/partpdf/1

# then generate associated Zenodo metadata using
preston ls --algo md5\
 | preston ris-stream 

Improvements

Bugs

n/a

0.8.6

10 Jul 19:05

0.8.5

08 May 19:59

Features

ZOTERO_TOKEN=[SECRET] preston track https://www.zotero.org/groups/5435545/bat_literature_project

For more usage examples, see [1], https://github.com/bat-literature/bat-literature.github.io and https://bat-literature.github.io .

Example usage: track a Google Doc and copy its associated PDF to doc.pdf, with provenance data stored in the data/ folder:

preston track "https://docs.google.com/document/d/1LMnC0lUw_DGIQV5Pa4lZhe_7-SIR-otgHSHDNkAwD7Q/edit"\
 | grep pdf\
 | grep hasVersion\
 | preston cat\
 > doc.pdf 

Improvements

Bugs

n/a

References

[1] Sherman AC et al. (2024) Bat Literature Corpus v0.1. https://github.com/bat-literature/bat-literature.github.io https://bat-literature.github.io https://linker.bio/hash://sha256/6ba3d79cf1fd6349012cb4e527b6727b3e41e140489fa9c02f132e2cdd88d189

[2] Poelen, J. H. (2024). A biodiversity dataset graph: Biological Associations in TaxonWorks hash://sha256/e4a47c067d6c125da60c9a1b92b5eecdea539cb8666cd3aed99db347ae5b8ed0 hash://md5/686007de79cc2a49ab23fd3debe56e3f (0.3) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11151783

0.8.4

12 Mar 17:46

Features

n/a

Improvements

n/a

Bugs

  • prevent use of a malformed Zenodo self URI by percent-encoding whitespace. #279
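
The idea behind this fix can be illustrated with a minimal shell sketch. It is not preston's implementation: encode_spaces is a hypothetical helper, the URI is a made-up placeholder, and only the space character is handled (tabs and other reserved characters are left alone).

```shell
# Hypothetical helper (not part of preston): percent-encode spaces in a URI.
encode_spaces() {
  printf '%s\n' "$1" | sed 's/ /%20/g'
}

# The URI below is a made-up placeholder, not a real Zenodo record.
encode_spaces "https://example.org/records/1/files/Abd El-Halim et al., 2005.pdf"
```

A filename such as "Abd El-Halim et al., 2005.pdf" would otherwise leave literal spaces in the URI.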

0.8.3

29 Feb 17:16

Features

n/a

Improvements

  • improved support for streaming TaxoDros records in jsonl for Zenodo publication #275 (also related to TaxoDros/TaxoDros.github.io#18; fyi @myrmoteras @slint): added Zenodo keywords and biodiversity-related custom terms. Note the "keywords" and "custom" fields in the example metadata record for a TaxoDros item shown below.
{
  "metadata": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "taxodros-dros5",
    "keywords": [
      "Biodiversity",
      "Taxonomy",
      "fruit flies",
      "flies",
      "Animalia",
      "Arthropoda",
      "Insecta",
      "Diptera"
    ],
    "custom": {
      "dwc:kingdom": [
        "Animalia"
      ],
      "dwc:phylum": [
        "Arthropoda"
      ],
      "dwc:class": [
        "Insecta"
      ],
      "dwc:order": [
        "Diptera"
      ]
    },
    "referenceId": "abd el-halim et al., 2005",
    "related_identifiers": [
      {
        "relation": "isAlternateIdentifier",
        "identifier": "urn:lsid:taxodros.uzh.ch:id:abd%20el-halim%20et%20al.%2C%202005"
      },
      {
        "relation": "isDerivedFrom",
        "identifier": "https://linker.bio/line:hash://md5/ff86b940567d278e50fa00672cf96629!/L1-L10"
      },
      {
        "relation": "isDerivedFrom",
        "identifier": "10.5281/zenodo.10723540"
      },
      {
        "relation": "isPartOf",
        "identifier": "https://www.taxodros.uzh.ch"
      },
      {
        "relation": "isAlternateIdentifier",
        "identifier": "hash://md5/639988a4074ded5208a575b760a5dc5e"
      }
    ],
    "creators": [
      {
        "name": "Abd El-Halim, A.S."
      },
      {
        "name": "Mostafa, A.A."
      },
      {
        "name": "Allam, K.A.M.a."
      }
    ],
    "access_right": "restricted",
    "publication_date": "2005",
    "title": "Dipterous flies species and their densities in fourteen Egyptian governorates.",
    "publication_type": "article",
    "journal_title": "Journal of the Egyptian Society of Parasitology",
    "journal_volume": "35",
    "journal_pages": "351-362",
    "taxodros:method": "ocr",
    "http://www.w3.org/ns/prov#wasDerivedFrom": "line:hash://md5/ff86b940567d278e50fa00672cf96629!/L1-L10",
    "references": [
      "Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/26a67012dde325cf2a3a058cc2f9c1b8 hash://sha256/ca86d74b318a334bddbc7c6a387a09530a083b8617718f5369ad548744c602d3 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10723540"
    ],
    "filename": "Abd El-Halim et al., 2005.pdf",
    "upload_type": "publication",
    "communities": [
      {
        "identifier": "taxodros"
      },
      {
        "identifier": "biosyslit"
      }
    ],
    "description": "Uploaded by Plazi for TaxoDros. We do not have abstracts."
  }
}

Bugs

n/a

References

Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/26a67012dde325cf2a3a058cc2f9c1b8 hash://sha256/ca86d74b318a334bddbc7c6a387a09530a083b8617718f5369ad548744c602d3 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10723540

0.8.2

28 Feb 21:17

Features

  • added support for streaming metadata into Zenodo records, creating or updating records as needed. The Zenodo metadata must be provided as line-json: one json object per line.

Example Usage

cat metadata.json\
 | jq -c .\
 | preston track\
 | grep hasVersion\
 | preston zenodo\
 --endpoint https://sandbox.zenodo.org\
 --access-token [your access token]

Here, jq -c . ensures that the json is in line-json, preston track versions the piped json, grep hasVersion keeps only the tracked content versions (not their previous versions), and preston zenodo attempts to create or update Zenodo records from the versioned content.

Note that if metadata.json includes an alternate identifier that is a content id, that content is included in the Zenodo record as a file. Also, the preston zenodo command emits RDF statements describing the associated Zenodo record and its related identifiers.
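
Before piping metadata to preston track, a rough check that the input really is one object per line can help. The sketch below is a heuristic only, not a JSON parser: looks_like_line_json is a hypothetical helper that just tests whether every non-empty line starts with '{' and ends with '}'.

```shell
# Hypothetical helper: succeeds only if every non-empty line on stdin
# starts with '{' and ends with '}'. A heuristic, not a JSON validator.
looks_like_line_json() {
  ! grep -v '^[[:space:]]*$' | grep -qv '^{.*}$'
}

# Two compact records, one per line (illustrative values).
if printf '%s\n' '{"metadata":{"title":"a"}}' '{"metadata":{"title":"b"}}' | looks_like_line_json; then
  echo "looks like line-json"
fi
```

A pretty-printed file fails this check because its first line is a lone '{'.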

Here is a sample snippet:

<urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> <http://purl.org/dc/terms/description> "An activity that creates or updates Zenodo records."@en <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#wasDerivedFrom> <line:hash://sha256/cb94e7c16a617a56a55fbbd76c458333111053bc501d52ae34548b35967933b2!/L25> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#wasDerivedFrom> <https://linker.bio/line:hash://md5/ff86b940567d278e50fa00672cf96629!/L175241-L175251> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#wasDerivedFrom> <10.5281/zenodo.10593902> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#alternateOf> <hash://md5/96ee875c6d473e0095ccc6384fbebb1c> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#alternateOf> <urn:lsid:taxodros.uzh.ch:id:toda%2C%201985a> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://purl.org/pav/lastRefreshedOn> "2024-02-28T20:56:56.969Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
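
Statements like these can be post-processed with standard tools. As a minimal sketch, the hypothetical helper alternate_ids below prints the identifiers that a record is declared an alternate of. It assumes one statement per line and identifiers without embedded whitespace, as in the sample above.

```shell
# Hypothetical helper: print the objects of prov:alternateOf statements
# read from stdin, assuming whitespace-separated terms on each line.
alternate_ids() {
  awk '$2 == "<http://www.w3.org/ns/prov#alternateOf>" {
    gsub(/[<>]/, "", $3)   # strip the angle brackets around the identifier
    print $3
  }'
}

# Input copied from the sample statements above.
alternate_ids <<'EOF'
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#wasDerivedFrom> <10.5281/zenodo.10593902> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#alternateOf> <hash://md5/96ee875c6d473e0095ccc6384fbebb1c> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
<https://sandbox.zenodo.org/records/31836> <http://www.w3.org/ns/prov#alternateOf> <urn:lsid:taxodros.uzh.ch:id:toda%2C%201985a> <urn:uuid:190939b5-5d59-45f4-a913-9666037eac8d> .
EOF
```

For this input, the helper prints the md5 content id and the TaxoDros LSID, one per line.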

Example of a "pretty-printed" record from a metadata line-json file. Note the content id hash://md5/639988a4074ded5208a575b760a5dc5e and "filename": "Abd El-Halim et al., 2005.pdf".

{
  "metadata": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "taxodros-dros5",
    "referenceId": "abd el-halim et al., 2005",
    "related_identifiers": [
      {
        "relation": "isAlternateIdentifier",
        "identifier": "urn:lsid:taxodros.uzh.ch:id:abd%20el-halim%20et%20al.%2C%202005"
      },
      {
        "relation": "isDerivedFrom",
        "identifier": "https://linker.bio/line:hash://md5/ff86b940567d278e50fa00672cf96629!/L1-L10"
      },
      {
        "relation": "isDerivedFrom",
        "identifier": "10.5281/zenodo.10593902"
      },
      {
        "relation": "isPartOf",
        "identifier": "https://www.taxodros.uzh.ch"
      },
      {
        "relation": "isAlternateIdentifier",
        "identifier": "hash://md5/639988a4074ded5208a575b760a5dc5e"
      }
    ],
    "creators": [
      {
        "name": "Abd El-Halim, A.S."
      },
      {
        "name": "Mostafa, A.A."
      },
      {
        "name": "Allam, K.A.M.a."
      }
    ],
    "access_right": "restricted",
    "publication_date": "2005",
    "title": "Dipterous flies species and their densities in fourteen Egyptian governorates.",
    "publication_type": "article",
    "journal_title": "Journal of the Egyptian Society of Parasitology",
    "journal_volume": "35",
    "journal_pages": "351-362",
    "taxodros:method": "ocr",
    "http://www.w3.org/ns/prov#wasDerivedFrom": "line:hash://md5/ff86b940567d278e50fa00672cf96629!/L1-L10",
    "references": [
      "Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/4fa9eeed1c8cff2490483a48c718df02 hash://sha256/e05466f33c755f11bd1c2fa30eef2388bf24ff7989931bae1426daff0200af19 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10593902"
    ],
    "filename": "Abd El-Halim et al., 2005.pdf",
    "upload_type": "publication",
    "communities": [
      {
        "identifier": "taxodros"
      },
      {
        "identifier": "biosyslit"
      }
    ],
    "description": "Uploaded by Plazi for TaxoDros. We do not have abstracts."
  }
}

Improvements

Bugs

n/a

References

Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/26a67012dde325cf2a3a058cc2f9c1b8 hash://sha256/ca86d74b318a334bddbc7c6a387a09530a083b8617718f5369ad548744c602d3 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10723540

0.8.1

30 Jan 18:27

Features

Improvements

  • improved support for streaming TaxoDros records in jsonl #275 fyi @myrmoteras @slint @lnielsen
    • DOI extraction
    • publication type inference (e.g., book, article, collection)
    • parsing of publication volume, series, pages
    • DROS3 record support
      Example record includes:
{
  "id": "aboim, 1945",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "taxodros-dros3",
  "keywords": [
    "melanogaster 1",
    "devel",
    "egg",
    "hist",
    "fig"
  ],
  "http://www.w3.org/ns/prov#wasDerivedFrom": "line:hash://sha256/efbba5753be41ce7a7fda25819e6c1e83ad1de6c195fba34faf279d3775605f3!/L31-L38"
}

and

{
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "taxodros-dros5",
  "id": "aceituno et al., 2020",
  "authors": "Aceituno-Medina, M., Ordonez, A., Carrasco, M., Montoya, P., & Hernandez, E.,",
  "year": "2020",
  "title": "Mass Rearing, Quality Parameters, and Bioconversion in Drosophila suzukii (Diptera: Drosophilidae) for Sterile Insect Technique Purposes.",
  "type": "article",
  "journal": "J. econ. Ent.",
  "volume": "113",
  "pages": "1097Ð1104",
  "number": "3",
  "doi": "10.1093/jee/toaa022",
  "method": "ocr / doi:10.1093/jee/toaa022",
  "http://www.w3.org/ns/prov#wasDerivedFrom": "line:hash://sha256/54c249d040b1414380b8a509004b04781ef3c62a12715b627cfa8401829eae65!/L147-L157",
  "filename": "Aceituno et al., 2020.pdf"
}
  • introduce -f/--file option for providing lists of filenames/URLs to be tracked #277

Example usage:
preston track --file <(echo https://example.org)

where <(echo https://example.org) produces a file with a single line containing https://example.org
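
For more than one URL, a regular file may be more convenient than process substitution. The sketch below assumes that --file accepts any readable path with one entry per line; the release note only demonstrates a single-line list. The URLs are placeholders, and the preston call is skipped when preston is not installed.

```shell
# Build a list with one URL per line (placeholder URLs).
url_list="$(mktemp)"
printf '%s\n' "https://example.org" "https://example.net" > "$url_list"

# Track every entry in the list; skipped if preston is not on the PATH.
if command -v preston >/dev/null 2>&1; then
  preston track --file "$url_list"
fi
```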

Bugs

n/a

References

Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/4fa9eeed1c8cff2490483a48c718df02 hash://sha256/e05466f33c755f11bd1c2fa30eef2388bf24ff7989931bae1426daff0200af19 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10593902

0.8.0

26 Jan 00:16

Features

Example - select first DROS5.TEXT literature record with DOI from Bächli, G. (2024)

preston \
 cat hash://md5/1037a9c831005710dc9bf14ee9a2e053\
 --remote https://zenodo.org\
 --algo md5\
 | preston taxodros-stream\
 --remote https://zenodo.org\
 --algo md5\
 | grep DOI\
 | head -n1\
 | jq .

produces:

{
  "id": "abram et al., 2022",
  "authors": "Abram, P.K., et al.,",
  "year": "2022",
  "title": "A Coordinated Sampling and Identification Methodology for Larval Parasitoids of Spotted-Wing Drosophila.",
  "journal": "J. econ. Ent., 115(4):922Ð942.",
  "doi": "10.1093/jee/toab237",
  "method": "ocr++ / DOI:10.1093/jee/toab237",
  "filename": "Abram et al., 2022.pdf",
  "http://www.w3.org/ns/prov#wasDerivedFrom": "line:hash://md5/42be783197504a12172920a7edc7cbfd!/L120-L128",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "taxodros-flatfile"
}
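
In the record above, the DOI also appears inside the "method" value. As a minimal sketch of how a DOI could be pulled out of such a free-text value, the hypothetical helper extract_doi below uses a simple DOI-like pattern. This is an assumption for illustration and not how preston itself extracts DOIs.

```shell
# Hypothetical helper: print the first DOI-like token found in the argument.
# The pattern (10.<4-9 digits>/<non-space, non-quote characters>) is a common
# approximation, not a complete DOI grammar.
extract_doi() {
  printf '%s\n' "$1" | grep -oE '10\.[0-9]{4,9}/[^[:space:]"]+' | head -n 1
}

# Value taken from the "method" field of the record above.
extract_doi "ocr++ / DOI:10.1093/jee/toab237"
```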

Improvements

  • enable line selection in text files with Mac line endings #276, which makes commands like the following possible:
preston cat\
 'line:hash://md5/42be783197504a12172920a7edc7cbfd!/L120-L128'\
 --remote https://linker.bio,https://zenodo.org\
 | tr '\r' '\n'

producing

.TEXT;
abram et al., 2022
.A Abram, P.K., et al.,
.J 2022
.S A Coordinated Sampling and Identification Methodology
for Larval Parasitoids of Spotted-Wing Drosophila.
.Z J. econ. Ent., 115(4):922�942.
.K ocr++ / DOI:10.1093/jee/toab237
.P Abram et al., 2022.pdf

Similar results can be obtained by requesting the following URL in a browser:

https://linker.bio/line:hash://md5/42be783197504a12172920a7edc7cbfd!/L120-L128
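
To make the line selector concrete, here is a minimal sketch of what selecting L120-L128 from a file with Mac (carriage return) line endings amounts to. The helper select_lines is hypothetical and only mimics the idea with tr and sed; it is not how preston resolves line: identifiers.

```shell
# Split a selector into its line range using POSIX parameter expansion.
selector='line:hash://md5/42be783197504a12172920a7edc7cbfd!/L120-L128'
range="${selector##*!/}"      # L120-L128
first="${range%%-*}"          # L120
first="${first#L}"            # 120
last="${range##*-L}"          # 128
echo "lines $first to $last"

# Hypothetical helper: print lines $1..$2 of CR-delimited text on stdin.
select_lines() {
  tr '\r' '\n' | sed -n "${1},${2}p"
}

# Tiny CR-delimited sample; prints "two" and "three".
printf 'one\rtwo\rthree\rfour\r' | select_lines 2 3
```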

Bugs

n/a

References

Bächli, G. (2024). TaxoDros - The Database on Taxonomy of Drosophilidae hash://md5/d68c923002c43271cee07ba172c67b0b hash://sha256/3e41eec4c91598b8a2de96e1d1ed47d271a7560eb6ef350a17bc67cc61255302 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10565403

0.7.17

22 Jan 18:59

Features

n/a

Improvements

  • make provenance anchor for redirect (badge) service configurable; related to #199 .

Bugs

n/a

0.7.16

10 Jan 19:04

Features

n/a

Improvements

  • introduce "no-access" badge in addition to "FAIR" and "unknown" badges; related to #199 and #273 .

Bugs

n/a