History

CorrectOCR is based on code created by:

Caitlin Richter (ricca@seas.upenn.edu)
Matthew Wickes (wickesm@seas.upenn.edu)
Deniz Beser (dbeser@seas.upenn.edu)
Mitchell Marcus (mitch@cis.upenn.edu)

For further details, see their article “Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation” (LREC-2018), available online at http://www.lrec-conf.org/proceedings/lrec2018/pdf/971.pdf

The original Python 2.7 code (see original-tag in the repository) is licensed under Creative Commons Attribution 4.0 (CC-BY-4.0; see also license.txt in the repository).

The code has since been ported to Python 3 and further expanded by Mikkel Eide Eriksen (mikkel.eriksen@gmail.com) for the Copenhagen City Archives. The changes are mainly structural; the algorithms are generally preserved as-is. Pull requests welcome!
