See their article “Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation” (LREC-2018) for further details; it is available online at http://www.lrec-conf.org/proceedings/lrec2018/pdf/971.pdf

The original Python 2.7 code (see the original tag in the repository) is licensed under Creative Commons Attribution 4.0 (CC-BY-4.0; see also license.txt in the repository).

The code has since been updated to Python 3 and further expanded by Mikkel Eide Eriksen (mikkel.eriksen@gmail.com) for the Copenhagen City Archives. The changes are mainly structural; the algorithms are generally preserved as-is. Pull requests welcome!

