
layout-analysis: (re)train #114

@bertsky

Since there is no documentation here on the training process or the training data, we have to make guesses.

The current model for (logical / whole-page) layout-analysis contains 21 classes:

['annotation', 'binding', 'chapter', 'colour_checker', 'contained_work', 'contents', 'cover', 'edge', 'endsheet', 'epicedia', 'illustration', 'index', 'musical_notation', 'page', 'paste_down', 'preface', 'provenance', 'section', 'sermon', 'table', 'title_page']

This is clearly inadequate: it mixes very specialised, rare types (sermon) with coarse, frequent ones (page). Moreover, it is very unlikely that such fine differentiation is feasible from the visual classification of pages alone, independent of each other (i.e. without sequence context). For example, how could the hierarchy levels chapter and section be discerned reliably?

So IMO we should re-train this on a coarser set of types, say:

  • empty (covering all non-text divs like binding, colour_checker, cover, endsheet)
  • title_page
  • contents (also including index)
  • page.

Perhaps additionally discerning table, illustration and musical_notation pages is doable, but that may well be considered part of physical / structural layout analysis (as these region types rarely occur alone on a page).
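
To make the proposal concrete, here is a minimal sketch of how the existing fine-grained labels could be collapsed onto the coarser set. The particular assignment (and the FINE_TO_COARSE / coarsen names) is only an assumption for illustration, not an established mapping:

    # Hypothetical relabeling of the current 21 classes onto the proposed
    # coarse set; the assignment below is an assumption, open for discussion.
    FINE_TO_COARSE = {
        # non-text divs collapse into 'empty'
        'binding': 'empty', 'colour_checker': 'empty', 'cover': 'empty',
        'endsheet': 'empty', 'edge': 'empty', 'paste_down': 'empty',
        'title_page': 'title_page',
        # contents and index merge
        'contents': 'contents', 'index': 'contents',
        # all remaining (textual) types fall back to the generic 'page'
    }

    def coarsen(label):
        return FINE_TO_COARSE.get(label, 'page')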

Going back through the history, it is evident that the model has been trained on (an older version of) keras.applications.InceptionV3:

    import numpy as np
    from PIL import Image
    from keras.applications import inception_v3
    from keras.layers import Input
    import ocrolib

    def define_model(self, model='inception_v3', num_classes=34, input_size=(600, 500, 1)):
        input_dims = Input(shape=input_size)
        if model == "inception_v3":
            # InceptionV3 backbone with a freshly initialised classification head
            model = inception_v3.InceptionV3(include_top=True, weights=None,
                                             classes=num_classes, input_tensor=input_dims)
        return model

    def create_model(self, path, model_name='inception_v3', def_weights=True, num_classes=34, input_size=(600, 500, 1)):
        model = self.define_model(model_name, num_classes, input_size)
        model.load_weights(path)
        return model

    # prediction: resize to 500x600 (PIL takes width, height), grayscale array
    size = 600, 500
    img = Image.open(fname)
    img_array = ocrolib.pil2array(img.resize((500, 600), Image.ANTIALIAS))
    img_array = img_array[np.newaxis, :, :, np.newaxis]
    results = self.start_test(model, img_array, fname, class_indices)

    def start_test(self, model, img_array, filename, labels):
        # shape should be 1,600,500 for keras
        pred = model.predict(img_array)
        pred_classes = np.argmax(pred, axis=1)

So input seems to be 600x500px grayscale (1-channel), with a batch dimension in front. (Note that PIL's resize takes (width, height), so resize((500, 600)) indeed yields 600 rows by 500 columns.)
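
For reference, here is a minimal sketch of rebuilding the same architecture with the proposed coarser head on current tf.keras; the class list and the build_model / preprocess names as well as the normalisation are my assumptions, not the repo's actual code:

    # Sketch only: same InceptionV3 backbone and 600x500 grayscale input,
    # but a 4-class softmax head for the proposed coarse label set.
    import numpy as np
    from PIL import Image
    import tensorflow as tf

    COARSE_CLASSES = ['empty', 'title_page', 'contents', 'page']

    def build_model(num_classes=len(COARSE_CLASSES), input_size=(600, 500, 1)):
        inputs = tf.keras.Input(shape=input_size)
        return tf.keras.applications.InceptionV3(
            include_top=True, weights=None,
            classes=num_classes, input_tensor=inputs)

    def preprocess(fname):
        # grayscale, 600 rows x 500 columns, scaled to [0,1], batch dim in front
        img = Image.open(fname).convert('L').resize((500, 600), Image.LANCZOS)
        arr = np.asarray(img, dtype=np.float32) / 255.0
        return arr[np.newaxis, :, :, np.newaxis]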

It would help to know what training data was previously used, though.
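
In any case, once the old ground truth (or a relabeled version of it) is available, retraining on the coarse set could be as simple as the following; train_ds is a hypothetical tf.data pipeline yielding (image, coarse label index) pairs:

    # Minimal retraining sketch under the assumptions above.
    model = build_model()
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_ds, epochs=10)  # train_ds: hypothetical dataset pipeline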

@n00blet could you please comment?
