
Headnote missing and/or having wrong labels #1208

@ronny3

Description

Operating System and architecture (arm64, amd64, x86, etc.)

No response

What is your Java version

No response

Log and information

No response

Further information

I'm running the latest version (0.8.1) on this open-access (OA) article: https://www.sciencedirect.com/science/article/pii/S1386505620310650

After the first page, there are the typical headnotes: the author(s) on the left-hand side and the journal on the right-hand side.
According to the docs, they should be tagged with <note place="headnote">.
For this PDF (and others), they are either:

  1. missing completely. In the given example the journal headnote is missing everywhere, and the author headnote is missing on some pages.
  2. missing the note category (no place attribute), so they appear only as a bare <note> followed by <p>.
  3. often placed inside tables. See Table 3 in the example:
    <figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"> <head>Table 3</head> <label>3</label> <figDesc>(continued ) </figDesc> <table> <row> <cell>O. Fennelly et al.</cell>
  4. mixed up with references. This does not happen in this example, but in another (non-OA) article, with an issue similar to Table 3 above, the author headnote was extracted as a reference in the bibliography.
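To make the four symptoms above easy to compare across PDFs, the TEI output can be scanned for where a known running-head string (for example `O. Fennelly et al.`) ended up. This is a minimal sketch using only Python's standard library; the function name `headnote_diagnostics` and its result keys are hypothetical, and it assumes references are emitted as `<biblStruct>` elements inside `<listBibl>`.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace used in the TEI output
TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def _text(el: ET.Element) -> str:
    """All text inside an element, including its descendants."""
    return "".join(el.itertext())


def headnote_diagnostics(tei_xml: str, running_head: str) -> dict:
    """Hypothetical helper: report where a running-head string ended up."""
    root = ET.fromstring(tei_xml)
    notes = root.findall(".//tei:note", NS)
    # Symptom 1: notes labelled as headnotes (a low count suggests missing ones)
    headnote_notes = sum(1 for n in notes if n.get("place") == "headnote")
    # Symptom 2: <note> elements with no place attribute (candidate mislabels)
    notes_without_place = sum(1 for n in notes if n.get("place") is None)
    # Symptom 3: the running head absorbed into table cells
    table_cells = sum(
        1 for c in root.findall(".//tei:cell", NS) if running_head in _text(c)
    )
    # Symptom 4: the running head absorbed into bibliographical references.
    # Only <listBibl> is searched, so the paper's own header metadata is skipped.
    references = sum(
        1
        for lb in root.findall(".//tei:listBibl", NS)
        for b in lb.findall(".//tei:biblStruct", NS)
        if running_head in _text(b)
    )
    return {
        "headnote_notes": headnote_notes,
        "notes_without_place": notes_without_place,
        "table_cells": table_cells,
        "references": references,
    }
```

For example, `headnote_diagnostics(tei, "O. Fennelly et al.")` on the TEI produced for the article above would show at a glance whether the author headnote was labelled, left as a bare `<note>`, or pulled into a table cell or a reference.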

If I have understood Grobid correctly, the segmentation and fulltext models are responsible for this.
I am especially unsure which model to retrain for point 4).
Also, as I am new to Grobid, I wonder how many training examples I would need for this to improve. Should I go through your training files to check whether "headnote" is tagged correctly?
The headnotes themselves are not important information, but according to the docs they should be present, and they are causing issues, so I need some guidance!
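One quick way to approach the question about the existing training files is to count `<note place="headnote">` elements per file. This is a sketch under two assumptions to verify against your checkout: that the segmentation training corpus is a directory of TEI XML files (something like `grobid-trainer/resources/dataset/segmentation/corpus/tei/`), and that headnotes are annotated there as `<note place="headnote">`. The helper names `count_headnotes` and `survey` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so namespaced and plain files both match."""
    return tag.rsplit("}", 1)[-1]


def count_headnotes(path: Path) -> int:
    """Number of <note place="headnote"> elements in one training file."""
    root = ET.parse(path).getroot()
    return sum(
        1
        for el in root.iter()
        if local_name(el.tag) == "note" and el.get("place") == "headnote"
    )


def survey(corpus_dir: str) -> dict[str, int]:
    """Map each *.xml file in corpus_dir to its headnote count (-1 if unparsable)."""
    counts: dict[str, int] = {}
    for f in sorted(Path(corpus_dir).glob("*.xml")):
        try:
            counts[f.name] = count_headnotes(f)
        except ET.ParseError:
            counts[f.name] = -1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_headnotes.py <segmentation tei dir>
    results = survey(sys.argv[1])
    for name, n in results.items():
        print(f"{n:4d}  {name}")
    with_headnotes = sum(1 for n in results.values() if n > 0)
    print(f"{with_headnotes}/{len(results)} files contain at least one headnote")
```

A count alone does not prove the labels are correct, but files reporting zero headnotes whose source PDFs clearly have running heads would be the first ones worth inspecting by hand.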
