Description
Since there is no documentation here for the training process and the training data, we have to make guesses.
The current model for (logical / whole-page) layout-analysis contains 21 classes:
['annotation', 'binding', 'chapter', 'colour_checker', 'contained_work', 'contents', 'cover', 'edge', 'endsheet', 'epicedia', 'illustration', 'index', 'musical_notation', 'page', 'paste_down', 'preface', 'provenance', 'section', 'sermon', 'table', 'title_page']
This is clearly inadequate: it mixes very specialised, rare types (`sermon`) with coarse, frequent ones (`page`). Moreover, it is very unlikely that such fine differentiation is feasible from visual classification of pages alone, independently of each other (i.e. without sequence context). For example, how could the hierarchy levels `chapter` and `section` be discerned reliably?
So IMO we should re-train this on a coarser set of types, say:
- `empty` (covering all non-text divs like `binding`, `colour_checker`, `cover`, `endsheet`)
- `title_page`
- `contents` (also including `index`)
- `page`
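To make the proposal concrete, here is a hypothetical mapping from the current 21 fine-grained classes to the suggested coarse set. The grouping (in particular which classes count as `empty`, and `page` as the fallback) is just this issue's suggestion, not anything implemented in the repository:

```python
# Hypothetical fine-to-coarse label mapping (sketch, not implemented anywhere).
# Class names are taken from the model's 21-class list above; the grouping
# itself is an assumption proposed in this issue.
FINE_TO_COARSE = {
    "binding": "empty",
    "colour_checker": "empty",
    "cover": "empty",
    "endsheet": "empty",
    "edge": "empty",
    "paste_down": "empty",
    "title_page": "title_page",
    "contents": "contents",
    "index": "contents",
}

def coarsen(label: str) -> str:
    """Map a fine-grained class to its coarse class.

    Everything not listed explicitly (chapter, section, sermon, ...)
    falls back to the generic 'page' class.
    """
    return FINE_TO_COARSE.get(label, "page")
```

Re-labelling an existing ground truth this way would let the old annotations be reused for training the coarser model.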
Perhaps additionally discerning `table`, `illustration` and `musical_notation` pages is doable, but that may well be considered part of physical / structural layout analysis (as these region types rarely occur alone on a page).
Going back through the commit history, it is evident that the model has been trained on (an older version of) `keras.applications.InceptionV3`:
ocrd_anybaseocr/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py
Lines 161 to 165 in 3e897af
So the input seems to be 600×500 px grayscale (1 channel), with a batch dimension in front.
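Under that assumption, preparing a page image for the model would look roughly like this. This is only a sketch with NumPy: the 600 (height) × 500 (width) ordering, the [0, 1] scaling, and the crude nearest-neighbour resize are guesses inferred from the input shape, not confirmed by any documentation:

```python
import numpy as np

def to_model_input(page: np.ndarray) -> np.ndarray:
    """Convert a 2-D grayscale page image (H, W) into the assumed
    model input shape (1, 600, 500, 1): batch, height, width, channel.

    Assumptions (not confirmed): 600 is height, 500 is width,
    and pixel values are scaled to [0, 1].
    """
    assert page.ndim == 2, "expected a single-channel grayscale image"
    h, w = page.shape
    # Nearest-neighbour resize via index sampling; a real pipeline
    # would use a proper image library (PIL, OpenCV, scikit-image).
    rows = (np.arange(600) * h // 600).clip(0, h - 1)
    cols = (np.arange(500) * w // 500).clip(0, w - 1)
    resized = page[np.ix_(rows, cols)]
    # Scale to [0, 1] and add batch (front) and channel (back) dimensions.
    scaled = resized.astype(np.float32) / 255.0
    return scaled[np.newaxis, ..., np.newaxis]
```

If the training data used a different size, aspect-ratio handling (padding vs. stretching) would also matter, which is another reason the original training setup should be documented.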
It would help to know what training data was previously used, though.
@n00blet could you please comment?