What's Changed
Added
- New model specialization/variants (flavors) mechanism #1151
- Specialization/variant for lightweight processing of scientific articles that do not follow the general segmentation schema (e.g., corrections, editorial letters) #1202
- Additional training data covering cases where Data Availability statements span multiple pages #1200
- Added a flag to output the raw copyright information in the TEI #1181
- New Docker container for running end-to-end evaluation #1255
- New Grobid client in Go #1159
- Make the start/end page for header processing customizable #282
- Return the processing configuration parameters in the TEI XML response header #1274
Changed
- Updated PDFalto recognition of non-standard fonts #1216
- Text that does not belong to graphics is now restored as paragraphs instead of being dropped #1266
- Updated Grobid Lucene analyzers for CJK languages #1228
Fixed
- Fixed URL identification for certain edge cases #1190, #1191, #1185
- Fixed fulltext model training data #1107
- Fixed header model training data #1128
- Updated the Docker image's packages to reduce vulnerabilities #1173
- Fixed a bug in the handling of badly formatted figures/tables #1207
- Corrected the replacement in the filenames of the generated fulltext files #1204
- Fixed full-text block start #1203
- Fixed missing affiliations when using the DL affiliation-address model #1166
- Fixed various security vulnerabilities #1125, #1123, #1205
- Avoided an NPE when iterating over annotations that might have null bounding boxes #1194
New Contributors
- @annelhote made their first contribution in #1179
- @miku made their first contribution in #1159
- @Schroedi made their first contribution in #1107
Full Changelog: 0.8.1...0.8.2