This repository was archived by the owner on Jan 25, 2024. It is now read-only.

Mine For Specimens

Mike Caprio edited this page Nov 8, 2016 · 34 revisions

####Discover and Dynamically Create Reference Links Between Publications and Catalogued Specimens

##Background

Problem:

Currently there is no easy way to link AMNH science publications in our DSpace Digital Library (http://digitallibrary.amnh.org/handle/2246/5) to specimens (Latin name and catalog number) in our research collections. We would like a solution that dynamically creates a bibliography matching publication references to specimens (Latin name and catalog number) in our collection database (platform = KE EMu, http://www.kesoftware.com/).

There are also images of our specimens in our Scientific Publications. Could images that match specimen numbers be extracted and saved as jpegs?

Finally, we have images of specimen catalog cards that have meaningless filenames: the filenames do not contain the specimen name or catalog number, so there is no way to find the card that pertains to a specific specimen. Our goal for this part of the project is to extract the scientific name and catalog number from each card and record them in a spreadsheet alongside the filename, essentially creating an index for all the filenames. This would make it easier for Collections Managers to find specific cards and attach the jpeg of the card to the corresponding specimen record in the collection database.
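
As a rough illustration of the index described above, the sketch below writes one row per card image to a CSV file and looks up filenames by catalog number. It assumes the scientific name and catalog number have already been read from each card by some other step (OCR or manual entry). The column names, the function names `write_card_index` and `filenames_for`, and the sample name and number are hypothetical; only the filename pattern comes from this page.

```python
import csv
import io

# Hypothetical column names for the index spreadsheet.
FIELDS = ["filename", "scientific_name", "catalog_number"]


def write_card_index(records, out):
    """Write one CSV row per catalog card image."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def filenames_for(index_file, catalog_number):
    """Return the card image filenames recorded for a catalog number."""
    reader = csv.DictReader(index_file)
    return [row["filename"] for row in reader
            if row["catalog_number"] == catalog_number]


# Made-up sample data; real values would come from reading the cards.
records = [
    {"filename": "p0005_74101.jpg",
     "scientific_name": "Exemplum primum",
     "catalog_number": "12345"},
    {"filename": "p0006_74102.jpg",
     "scientific_name": "Exemplum secundum",
     "catalog_number": "67890"},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_card_index(records, buffer)
buffer.seek(0)
matches = filenames_for(buffer, "12345")
```

A plain CSV file opens directly in spreadsheet software, which is why it is used here instead of a native spreadsheet format.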


##Solutions

  • Extract specimen numbers/names from digitized Scientific Publications. Find known specimen numbers and names in the text of [scientific publications](http://digitallibrary.amnh.org/handle/2246/5), match them to corresponding specimens (Latin name and catalog number) in our collections database, and output a bibliography in XML format that could be imported into the bibliography module in KE EMu. The schema for the Bibliography Module and a list of specimens (Latin name and catalog number) in our collections database are both in the documents folder for this project: https://github.com/amnh/HackTheStacks/tree/master/challenges/Mine_For_Specimens/documents

  • Extract images from the Scientific Publications. Scientific publications that include our specimens (Latin name and catalog number) have images we would like you to extract and save as jpegs. The filename of each jpeg should correspond to the catalog number of the specimen pictured in the image.

  • Extract the scientific name and the catalog number from the jpegs of specimen catalog cards. For each card / catalog page, insert this data into a spreadsheet along with the name of the file in which it appears (e.g., p0005_74101.jpg). The images of the catalog cards will be provided to the team working on this challenge via an external drive.
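
The matching step in the first solution could start from something like the following minimal sketch. It assumes the publication text has already been extracted to a plain string and that catalog numbers appear in the text exactly as they are written in the specimen list. The `Specimen` type, the function `find_specimen_mentions`, and the sample names and numbers are all hypothetical, and real catalog number formats may need looser matching.

```python
import re
from typing import NamedTuple


class Specimen(NamedTuple):
    latin_name: str
    catalog_number: str


def find_specimen_mentions(text, specimens):
    """Return the specimens whose catalog number appears in the text."""
    # Collapse line breaks and repeated spaces left over from text extraction.
    normalized = re.sub(r"\s+", " ", text)
    found = []
    for specimen in specimens:
        # Require non-alphanumeric boundaries so that "AMNH 123"
        # does not match inside "AMNH 1234".
        pattern = (r"(?<![A-Za-z0-9])"
                   + re.escape(specimen.catalog_number)
                   + r"(?![A-Za-z0-9])")
        if re.search(pattern, normalized):
            found.append(specimen)
    return found


# Made-up specimen list and publication text for illustration only.
specimens = [
    Specimen("Exemplum primum", "AMNH 1234"),
    Specimen("Exemplum secundum", "AMNH 123"),
]
text = "The holotype AMNH\n1234 is figured in plate 2; see also AMNH 12399."
mentions = find_specimen_mentions(text, specimens)
```

Each match pairs a publication with a specimen record, which is the information a bibliography entry in the XML output would need. Generating the XML itself depends on the Bibliography Module schema and is not shown.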


##Resources
