0.8.2

@lfoppiano released this 11 May 17:40
· 25 commits to master since this release

What's Changed

Added

  • New model specialization/variants (flavors) mechanism #1151
  • Lightweight specialization/variant process for scientific articles that do not follow the general segmentation schema (e.g., corrections, editorial letters) #1202
  • Additional training data covering cases where Data Availability statements span multiple pages #1200
  • New flag to output the raw copyright information in TEI #1181
  • New Docker container for running end-to-end evaluation #1255
  • New Grobid client in Go #1159
  • Make the start/end page for header processing customizable #282
  • Return configuration processing parameters in TEI XML response header #1274
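
A client could exercise the customizable header page range (#282) roughly as sketched below. This is a minimal sketch under stated assumptions, not the documented API: it assumes a Grobid server at `http://localhost:8070`, and it assumes the header endpoint accepts `start` and `end` form fields in the same way the full-text endpoint does. The helper name `build_header_request` is hypothetical. The function only describes the request and does not send it.

```python
from urllib.parse import urljoin


def build_header_request(base_url, pdf_path, start=None, end=None):
    """Describe a header-processing request without sending it.

    `start` and `end` are ASSUMED form-field names for the first and last
    page to consider; these release notes do not confirm them.
    """
    if start is not None and end is not None and start > end:
        raise ValueError("start page must not be greater than end page")

    # Only include the page bounds that the caller actually set.
    data = {}
    if start is not None:
        data["start"] = str(start)
    if end is not None:
        data["end"] = str(end)

    return {
        "url": urljoin(base_url.rstrip("/") + "/", "api/processHeaderDocument"),
        "pdf_path": pdf_path,
        "data": data,
    }


request = build_header_request("http://localhost:8070", "paper.pdf", start=1, end=3)
```

To send the request, the PDF would typically be posted as the multipart field `input` together with `request["data"]`; check the service documentation for the parameter names your Grobid version actually supports.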

Changed

  • Update PDFalto recognition of non-standard fonts #1216
  • Restore text that does not belong to graphics as paragraphs instead of dropping it #1266
  • Updated Grobid Lucene analyzers for CJK languages #1228

Fixed

  • Fix URL identification for certain edge cases #1190, #1191, #1185
  • Fix fulltext model training data #1107
  • Fix header model training data #1128
  • Updated the Docker image's packages to reduce vulnerabilities #1173
  • Fixed a bug in the handling of badly formatted figures/tables #1207
  • Correct replacement in the filenames of the fulltext generated files #1204
  • Fixed full-text block start #1203
  • Fix affiliation missing when using DL affiliation-address model #1166
  • Fixed various security vulnerabilities #1125, #1123, #1205
  • Avoid NPE when iterating over annotations that might have null bounding boxes #1194
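
The null bounding box fix (#1194) follows a common guard pattern, illustrated here in Python, where `None` plays the role of Java's `null`. This is only a sketch of the general idea: the `Annotation` class and the `annotations_with_boxes` helper are hypothetical and are not Grobid's actual classes or the code from the fix.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in for a PDF annotation; not Grobid's class."""
    destination: str
    # (x, y, width, height), or None when the PDF provides no box.
    bounding_box: Optional[Tuple[float, float, float, float]] = None


def annotations_with_boxes(annotations: List[Annotation]) -> List[Annotation]:
    """Return only the annotations that carry a bounding box."""
    return [a for a in annotations if a.bounding_box is not None]


annotations = [
    Annotation("https://example.org", (10.0, 20.0, 100.0, 12.0)),
    Annotation("https://example.com"),  # no bounding box
]
usable = annotations_with_boxes(annotations)
```

Filtering before the loop keeps later geometry code free of repeated `None` checks; skipping the annotation inside the loop would work equally well.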

New Contributors

Full Changelog: 0.8.1...0.8.2