write_citation_pairs with less human intervention #43

@jeanetteclark

This is all a bit of a mess; there is definitely a better way to do it.

write_citation_pairs takes a data frame with one column for article id and one for dataset id. It loops through each row and uses crossref::cr_cn to retrieve a full citation for the paper using the article id. We need information such as authors, title, etc. to send to the metrics service.

crossref::cr_cn returns the citation in bibtex format (it can optionally return json and other formats). That bibtex is then passed to bib2df::bib2df, which parses the text string into a data frame. Parsing this text string is somewhat of a nightmare though, and I ended up refactoring bib2df to accommodate single-line bibtex docs, which for some reason crossref::cr_cn started returning. So I did that here, but the method I had to use requires that you know what the fields of the bibtex entry are. Occasionally, a bibtex entry will come back with a really oddball field in it, and that field name has to be passed to the extra_fields argument of bib2df and the function run again to get the correct parsing; otherwise the rest of the document is thrown off. This is all especially frustrating because we only need certain fields to pass to the metrics service, but the ENTIRE doc needs to be processed correctly.
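To make the "we only need certain fields" point concrete, here is a rough sketch (in Python, not the R code in this repo) of pulling just the wanted fields out of a single-line bibtex entry without knowing every field name up front. It is not how bib2df works. The function name `extract_bibtex_fields` and the sample entry are made up for illustration, and it assumes every wanted value is wrapped in braces, which may not hold for all bibtex that comes back.

```python
import re


def extract_bibtex_fields(entry, wanted):
    """Return only the wanted fields from one BibTeX entry string.

    Hypothetical helper. Assumes brace-delimited values (name = {value});
    quoted or bare values are skipped rather than parsed.
    """
    found = {}
    for name in wanted:
        # The lookbehind stops "title" from matching inside "booktitle".
        pattern = r"(?<![A-Za-z_-])" + re.escape(name) + r"\s*=\s*\{"
        match = re.search(pattern, entry, flags=re.IGNORECASE)
        if match is None:
            continue
        # Walk forward, tracking brace depth, until the value's closing brace.
        depth = 1
        start = pos = match.end()
        while pos < len(entry) and depth > 0:
            if entry[pos] == "{":
                depth += 1
            elif entry[pos] == "}":
                depth -= 1
            pos += 1
        if depth == 0:
            found[name] = entry[start:pos - 1]
    return found


# Made-up single-line entry with an unexpected field in the middle.
SAMPLE_ENTRY = (
    "@article{Doe_2020, title={A {Nested} Title}, oddfield={x}, "
    "author={Doe, Jane and Roe, Rich}, year={2020}, DOI={10.1234/example}}"
)

fields = extract_bibtex_fields(SAMPLE_ENTRY, ["title", "author", "year"])
```

Unknown fields are simply never looked at, so an oddball field cannot throw off the rest. The obvious weakness is that a field name followed by `={` inside another field's value would be matched by mistake.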

So some options to make this require no human intervention:

  1. Capture the warning output from the first pass, parse it, feed the fields back in for a second pass
    • this seems ridiculous
  2. Have crossref::cr_cn just return the json, parse it, and extract what we need, bypassing bib2df entirely
  3. Find a more straightforward way to retrieve just the information we need, probably by querying the crossref API more directly
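Option 2 could look roughly like the sketch below (in Python rather than R, just to show the shape). It assumes the json is a CSL-JSON style record with `author`, `title`, `container-title`, `issued`, and `DOI` keys. The function name `extract_citation_fields`, the choice of output fields, and the sample record are all made up for illustration; the fields the metrics service actually needs may differ.

```python
import json


def extract_citation_fields(csl):
    """Pick a handful of fields out of a CSL-JSON style citation record.

    Hypothetical helper: the output field set is an assumption, not the
    metrics service's real schema.
    """
    def first(value):
        # Some responses wrap title/container-title in a list; accept both.
        if isinstance(value, list):
            return value[0] if value else None
        return value

    authors = []
    for person in csl.get("author", []):
        parts = [person.get("given"), person.get("family")]
        # Fall back to "name" for organizational authors.
        name = " ".join(p for p in parts if p) or person.get("name")
        if name:
            authors.append(name)

    date_parts = csl.get("issued", {}).get("date-parts", [[]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None

    return {
        "doi": csl.get("DOI"),
        "title": first(csl.get("title")),
        "journal": first(csl.get("container-title")),
        "year": year,
        "authors": authors,
    }


# Made-up record, including a field we never asked about.
SAMPLE = json.loads("""
{
  "DOI": "10.1234/example",
  "title": "An Example Article",
  "container-title": "Journal of Examples",
  "issued": {"date-parts": [[2020, 5, 1]]},
  "author": [
    {"given": "Jane", "family": "Doe"},
    {"given": "Rich", "family": "Roe"}
  ],
  "oddball-field": "ignored"
}
""")

record = extract_citation_fields(SAMPLE)
```

Because the lookup is by key, an unexpected field in the response is ignored instead of breaking the parse, and a missing field just comes back empty. No second pass is needed.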
