PAGE-XML features an implicit inheritance relation between various elements of the hierarchy:
Page/TextStyle → TextRegion*/TextStyle → TextLine/TextStyle → Word/TextStyle → Glyph/TextStyle
TextRegion*/@production → TextLine/@production → Word/@production → Glyph/@production
Page/@primaryScript → TextRegion*/@primaryScript → TextLine/@primaryScript → Word/@primaryScript → Glyph/@script
Page/@secondaryScript → TextRegion*/@secondaryScript → TextLine/@secondaryScript → Word/@secondaryScript → Glyph/@script
Page/@primaryLanguage → TextRegion*/@primaryLanguage → TextLine/@primaryLanguage → Word/@language
Page/@secondaryLanguage → TextRegion*/@secondaryLanguage → TextLine/@secondaryLanguage → Word/@language
Page/@readingDirection → TextRegion*/@readingDirection → TextLine/@readingDirection → Word/@readingDirection
Page/@textLineOrder → TextRegion*/@textLineOrder
These relations are only documented; they cannot be automatically implemented in a generated DOM. But their semantics are important, and writing processors would be much easier if they were implemented.
For example, if I want to know whether the current segment belongs to a certain script, I currently have to:
- check the element type to see which attribute name applies (`@script` or `@primaryScript`/`@secondaryScript`)
- check if that is set locally
- otherwise check the parent element's `@primaryScript` etc.
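The steps above can be sketched as a small lookup. This is only an illustration under simplifying assumptions: it uses the standard library's `xml.etree.ElementTree` on a namespace-free toy document instead of the generated DOM, it only follows the `@primaryScript` chain (not `@secondaryScript`), and the names `local_script` and `effective_script` are hypothetical helpers, not part of any existing API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for a PAGE document.
DOC = """
<Page primaryScript="Latn">
  <TextRegion primaryScript="Grek">
    <TextLine>
      <Word><Glyph/></Word>
      <Word><Glyph script="Cyrl"/></Word>
    </TextLine>
  </TextRegion>
</Page>
"""

def local_script(elem):
    """Return the script set on this element itself, or None."""
    # Step 1: Glyph uses @script, all higher levels use @primaryScript.
    name = "script" if elem.tag == "Glyph" else "primaryScript"
    # Step 2: check whether it is set locally.
    return elem.get(name)

def effective_script(elem, parent_of):
    """Walk up the hierarchy until some level sets a script."""
    while elem is not None:
        value = local_script(elem)
        if value is not None:
            return value
        # Step 3: otherwise fall back to the parent element.
        elem = parent_of.get(elem)
    return None

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
glyphs = root.findall(".//Glyph")
scripts = [effective_script(g, parent_of) for g in glyphs]
print(scripts)
```

Even in this reduced form, every processor would have to carry the same boilerplate for each of the inherited attributes.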
This is very hard to achieve with XPath (because disjunction/unions are only possible on nodesets, not on predicates). And with the DOM it requires a lot of code each time.
But we could facilitate this by simply propagating all inherited features during `.build()`, in a patched `ocrd_page_generateds`. We already have the user methods mechanism for patching, and we could simply use `buildChildren` to propagate all of the above attributes (as a bottom-up post-hook), because attributes of parents are built before those of children.
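To make the intended effect concrete, here is a rough sketch of such a propagation pass. It is not the proposed `buildChildren` patch itself: it runs as a separate recursive pass over a namespace-free `xml.etree.ElementTree` toy document, covers only three of the chains listed above, and `CHAINS` and `propagate` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# One dict per inheritance chain: element tag -> attribute name at that level.
# Only a subset of the chains is modelled here.
CHAINS = [
    {"TextRegion": "production", "TextLine": "production",
     "Word": "production", "Glyph": "production"},
    {"Page": "primaryScript", "TextRegion": "primaryScript",
     "TextLine": "primaryScript", "Word": "primaryScript", "Glyph": "script"},
    {"Page": "readingDirection", "TextRegion": "readingDirection",
     "TextLine": "readingDirection", "Word": "readingDirection"},
]

def propagate(elem, inherited=None):
    """Copy unset inheritable attributes from ancestors onto descendants."""
    inherited = inherited or [None] * len(CHAINS)
    current = []
    for chain, value in zip(CHAINS, inherited):
        name = chain.get(elem.tag)
        if name is not None:
            # A locally set value always wins over the inherited one.
            if elem.get(name) is None and value is not None:
                elem.set(name, value)
            value = elem.get(name, value)
        # Elements outside the chain just pass the value through.
        current.append(value)
    for child in elem:
        propagate(child, current)

root = ET.fromstring(
    '<Page primaryScript="Latn" readingDirection="left-to-right">'
    '<TextRegion production="printed">'
    '<TextLine readingDirection="right-to-left">'
    '<Word><Glyph/></Word>'
    '</TextLine></TextRegion></Page>'
)
propagate(root)
print(ET.tostring(root, encoding="unicode"))
```

After the pass, a processor can read the attribute directly at whatever level it is working on, without walking up the tree.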
But for `TextStyle`, it's more complicated: on all hierarchy levels except the `Page` level, `TextStyle` sorts after the logical children and thus is only built after they are built. Also, one would need to unify style attributes between levels (we usually have `True`, `False` and `None`; so true/false from parents replaces none in children).