Description
Since there is no documentation here for the training process and the training data, we have to make guesses.
The current model for (logical / whole-page) layout-analysis contains 21 classes:
['annotation', 'binding', 'chapter', 'colour_checker', 'contained_work', 'contents', 'cover', 'edge', 'endsheet', 'epicedia', 'illustration', 'index', 'musical_notation', 'page', 'paste_down', 'preface', 'provenance', 'section', 'sermon', 'table', 'title_page']
This is clearly inadequate: it mixes very specialised, rare types (`sermon`) with coarse, frequent ones (`page`). Moreover, it is very unlikely that such fine differentiation is feasible from visual classification of pages alone, independently of each other (i.e. without sequence context). For example, how could the hierarchy levels `chapter` and `section` be discerned reliably?
So IMO we should re-train this on a coarser set of types, say:
- `empty` (covering all non-text divs like `binding`, `colour_checker`, `cover`, `endsheet`)
- `title_page`
- `contents` (also including `index`)
- `page`
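To make the proposal concrete, here is a hypothetical mapping from the current 21 fine-grained classes to the suggested coarse set. The grouping (in particular which classes count as `empty`, and `page` as the fallback) is just this issue's suggestion, not anything implemented in the repository:

```python
# Hypothetical fine-to-coarse label mapping (sketch, not implemented anywhere).
# Class names are taken from the model's 21-class list above; the grouping
# itself is an assumption proposed in this issue.
FINE_TO_COARSE = {
    "binding": "empty",
    "colour_checker": "empty",
    "cover": "empty",
    "endsheet": "empty",
    "edge": "empty",
    "paste_down": "empty",
    "title_page": "title_page",
    "contents": "contents",
    "index": "contents",
}

def coarsen(label: str) -> str:
    """Map a fine-grained class to its coarse class.

    Everything not listed explicitly (chapter, section, sermon, ...)
    falls back to the generic 'page' class.
    """
    return FINE_TO_COARSE.get(label, "page")
```

Re-labelling an existing ground truth this way would let the old annotations be reused for training the coarser model.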
Perhaps additionally discerning `table`, `illustration` and `musical_notation` pages is doable, but that may well be considered part of physical / structural layout analysis (as these region types rarely occur alone on a page).
Going back through the commit history, it is evident that the model has been trained on (an older version of) `keras.applications.InceptionV3`:
ocrd_anybaseocr/ocrd_anybaseocr/cli/ocrd_anybaseocr_layout_analysis.py
Lines 161 to 165 in 3e897af
So the input seems to be 600×500 px grayscale (1 channel), with a batch dimension in front.
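Under that assumption, preparing a page image for the model would look roughly like this. This is only a sketch with NumPy: the 600 (height) × 500 (width) ordering, the [0, 1] scaling, and the crude nearest-neighbour resize are guesses inferred from the input shape, not confirmed by any documentation:

```python
import numpy as np

def to_model_input(page: np.ndarray) -> np.ndarray:
    """Convert a 2-D grayscale page image (H, W) into the assumed
    model input shape (1, 600, 500, 1): batch, height, width, channel.

    Assumptions (not confirmed): 600 is height, 500 is width,
    and pixel values are scaled to [0, 1].
    """
    assert page.ndim == 2, "expected a single-channel grayscale image"
    h, w = page.shape
    # Nearest-neighbour resize via index sampling; a real pipeline
    # would use a proper image library (PIL, OpenCV, scikit-image).
    rows = (np.arange(600) * h // 600).clip(0, h - 1)
    cols = (np.arange(500) * w // 500).clip(0, w - 1)
    resized = page[np.ix_(rows, cols)]
    # Scale to [0, 1] and add batch (front) and channel (back) dimensions.
    scaled = resized.astype(np.float32) / 255.0
    return scaled[np.newaxis, ..., np.newaxis]
```

If the training data used a different size, aspect-ratio handling (padding vs. stretching) would also matter, which is another reason the original training setup should be documented.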
It would help to know what training data was previously used, though.
@n00blet could you please comment?