hertelm/ocr-postcorrection

OCR postcorrection

Resources

ocr-errors.txt

A collection of isolated OCR errors from the cleaned ACL Anthology Reference Corpus.

ACL-benchmark

A benchmark of random paragraphs from the ACL corpus, with a manually corrected ground truth. The corpus was downloaded from Universität Heidelberg because it was not available through the original source. The benchmark is presented in our paper about Tokenization Repair (under review). Here, it is filtered to keep only OCR errors; whitespace errors and hyphenation errors are excluded.
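To illustrate what separating OCR errors from whitespace and hyphenation errors could look like, here is a minimal sketch. It is not the procedure used to build this benchmark: the function `classify_error`, its category names, and the sample pairs are all hypothetical, and the heuristic (comparing strings after removing whitespace and hyphens) is an assumption.

```python
def classify_error(corrupt: str, correct: str) -> str:
    """Hypothetical heuristic, not the repository's actual filter."""
    if corrupt == correct:
        return "none"

    def strip_ws(s: str) -> str:
        # Remove all whitespace characters.
        return "".join(s.split())

    # Whitespace error: the strings agree once whitespace is ignored.
    if strip_ws(corrupt) == strip_ws(correct):
        return "whitespace"

    # Hyphenation error: the strings agree once hyphens are ignored too.
    if strip_ws(corrupt).replace("-", "") == strip_ws(correct).replace("-", ""):
        return "hyphenation"

    # Anything else involves changed characters and is treated as an OCR error.
    return "ocr"


# Made-up (corrupt, correct) pairs for illustration only.
pairs = [
    ("tokeni zation", "tokenization"),   # whitespace error
    ("tokeni- zation", "tokenization"),  # hyphenation error
    ("tokenizatlon", "tokenization"),    # character-level OCR error
]

# Keep only the pairs that the heuristic labels as OCR errors.
ocr_only = [(c, g) for c, g in pairs if classify_error(c, g) == "ocr"]
```

One limitation of this heuristic is that it also labels the loss of a legitimate hyphen (as in a compound word) as a hyphenation error, which may or may not match the benchmark's definition.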

Related work by our group
