CorrectOCR.aligner module

class CorrectOCR.aligner.Aligner

Bases: object

alignments(originalTokens, goldTokens)

Aligns the original tokens with the gold tokens to discover the corrections that have been made.

Parameters
  • originalTokens (List[str]) – List of original text strings

  • goldTokens (List[str]) – List of gold text strings

Returns

A tuple with three elements:

  • fullAlignments – A list of letter-by-letter alignments (2-element tuples)

  • wordAlignments

  • readCounts – A dictionary of counts of aligned reads for each character.
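To make the shape of the return value more concrete, here is a minimal sketch of a letter-by-letter alignment built on the standard library's difflib. It is not the CorrectOCR implementation. The helper names (align_pair, sketch_alignments), the positional pairing of tokens, the dictionary form chosen for wordAlignments, and the nested-counter form chosen for readCounts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def align_pair(original: str, gold: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: letter-by-letter (original, gold) pairs for one token pair."""
    pairs: List[Tuple[str, str]] = []
    matcher = SequenceMatcher(None, original, gold, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching runs are split into one pair per character.
            pairs.extend(zip(original[i1:i2], gold[j1:j2]))
        else:
            # Replaced, inserted, or deleted spans are kept as a single pair;
            # one side may be the empty string.
            pairs.append((original[i1:i2], gold[j1:j2]))
    return pairs


def sketch_alignments(
    original_tokens: List[str], gold_tokens: List[str]
) -> Tuple[List[Tuple[str, str]], Dict[str, str], Dict[str, Counter]]:
    """Hypothetical stand-in for Aligner.alignments; return shapes are assumed."""
    full_alignments: List[Tuple[str, str]] = []
    word_alignments: Dict[str, str] = {}
    read_counts: Dict[str, Counter] = defaultdict(Counter)

    # Assumption: tokens correspond one-to-one by position.
    for original, gold in zip(original_tokens, gold_tokens):
        pairs = align_pair(original, gold)
        full_alignments.extend(pairs)
        word_alignments[original] = gold
        for original_part, gold_part in pairs:
            read_counts[original_part][gold_part] += 1

    return full_alignments, word_alignments, read_counts


full, words, counts = sketch_alignments(["cat", "dog"], ["cot", "dog"])
```

In this sketch, counts["a"]["o"] records how often an original "a" was read where the gold text has "o", which is the kind of per-character statistic the readCounts description suggests.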