grc letters with dot below #57

@nisbet-hubbard

This is relevant specifically to grc. Because modern books of Ancient Greek often have to mark uncertain letters in ancient sources, letters with a dot below are a common occurrence, but at present tesseract does not recognise them.

A fairly complete list of letters with dot below (except for the lunate sigma ϲ̣) can be found here: https://titus.uni-frankfurt.de/unicode/unicsel/grkkadd.htm
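
For reference, here is a minimal Python sketch of how these letters are encoded. As far as I know, Unicode has no precomposed Greek letters with dot below, so each one is a base letter followed by U+0323 COMBINING DOT BELOW. The names `GREEK_LOWER`, `with_dot_below` and `DOTTED` are only illustrative, and the letter set is my assumption (plain lowercase alphabet plus final and lunate sigma), not an official list.

```python
import unicodedata

# U+0323 COMBINING DOT BELOW, placed after the base letter.
COMBINING_DOT_BELOW = "\u0323"

# Assumed letter set: lowercase alphabet, final sigma, and lunate sigma (U+03F2).
GREEK_LOWER = "αβγδεζηθικλμνξοπρσςτυφχψω" + "\u03f2"


def with_dot_below(letter: str) -> str:
    """Return the letter followed by a combining dot below, in NFC form."""
    return unicodedata.normalize("NFC", letter + COMBINING_DOT_BELOW)


# Every entry stays two code points long, because NFC has nothing to compose.
DOTTED = [with_dot_below(c) for c in GREEK_LOWER]

if __name__ == "__main__":
    for s in DOTTED:
        print(s, [f"U+{ord(ch):04X}" for ch in s])
```

A list built this way could be compared against the page linked above to see which combinations are missing from the training text.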

I wonder whether recognising the dot below should be a feature behind a flag that has to be turned on manually, because it might also pick up stains in older books (which, however, tend not to have such dots and so don't require this feature). But a flag could make it difficult to deploy the feature in downstream projects like Internet Archive.
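
To make the flag idea concrete, here is a rough sketch of opt-in behaviour done purely as text post-processing. This is not an existing tesseract option: `postprocess` and `keep_dot_below` are hypothetical names, and the sketch assumes the recogniser already emits U+0323 after the base letter.

```python
# U+0323 COMBINING DOT BELOW, as emitted after the base letter.
COMBINING_DOT_BELOW = "\u0323"


def postprocess(text: str, keep_dot_below: bool = False) -> str:
    """Drop combining dots below unless the caller opts in (hypothetical flag)."""
    if keep_dot_below:
        return text
    # Default: remove the mark, so stains misread as dots do not reach the output.
    return text.replace(COMBINING_DOT_BELOW, "")


if __name__ == "__main__":
    sample = "α\u0323βγ"
    print(postprocess(sample))
    print(postprocess(sample, keep_dot_below=True))
```

With the default off, downstream users who never set the flag would get the same output as today, which is also why the feature would be harder to roll out for them.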
