CorrectOCR.correcter module

Correction Interface

The annotator will be presented with the tokens that match a heuristic bin that was marked for annotation.

They may then enter a command. The commands correspond to the settings described above, plus a defer command that postpones the decision to a later time.

Prefixing the entered text with an exclamation point causes it to be considered the corrected version of the token. For example, if the token is “Wagor” and no suitable candidate is available, the annotator may enter !Wagon to correct the word.
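The exclamation-point prefix matches a documented behavior of Python's cmd.Cmd base class: a line beginning with ! is dispatched to do_shell when that method exists. The following is a minimal sketch of that dispatch only. TinyShell and its corrected attribute are hypothetical stand-ins and are not part of CorrectOCR.

```python
import cmd


class TinyShell(cmd.Cmd):
    """Hypothetical stand-in used only to illustrate '!' dispatch."""

    def __init__(self):
        # completekey=None disables readline tab completion for this demo.
        super().__init__(completekey=None)
        self.corrected = None

    def do_shell(self, arg):
        """cmd.Cmd routes any line that starts with '!' to this method."""
        self.corrected = arg.strip()


shell = TinyShell()
shell.onecmd("!Wagon")  # parsed by cmd.Cmd as "shell Wagon"
print(shell.corrected)  # Wagon
```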

Corrections are memoized, so the file need not be corrected fully in one session. To finish a session and save corrections, use the quit command.
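How the memoized corrections are stored is not described here. As a rough illustration of the idea, the sketch below assumes a plain mapping from original token to correction, persisted as JSON, so that a later session can skip tokens that were already handled. The file name and the helper functions load_memo, save_memo, and pending are hypothetical.

```python
import json
import os
import tempfile


def load_memo(path):
    """Return previously saved corrections, or an empty dict on first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_memo(path, memo):
    """Persist the corrections made so far (what a quit command might do)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(memo, f, ensure_ascii=False)


def pending(tokens, memo):
    """Tokens that still need a decision in this session."""
    return [t for t in tokens if t not in memo]


# Hypothetical location for the saved corrections.
path = os.path.join(tempfile.mkdtemp(), "memo.json")

# First session: correct one token, then save.
memo = load_memo(path)
memo["Wagor"] = "Wagon"
save_memo(path, memo)

# Later session: only the uncorrected token remains.
memo = load_memo(path)
print(pending(["Wagor", "tbe"], memo))  # ['tbe']
```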

A help command is available in the interface.

class CorrectOCR.correcter.CorrectionShell(tokens, dictionary, correctionTracking)

Bases: cmd.Cmd

Interactive shell for making corrections to a list of tokens. Assumes that the tokens are binned.

Inherited from cmd.Cmd: instantiate a line-oriented interpreter framework.

The optional argument ‘completekey’ is the readline name of a completion key; it defaults to the Tab key. If completekey is not None and the readline module is available, command completion is done automatically. The optional arguments stdin and stdout specify alternate input and output file objects; if not specified, sys.stdin and sys.stdout are used.

classmethod start(tokens, dictionary, correctionTracking, intro=None)
Parameters
  • tokens (TokenList) – A list of Tokens.

  • dictionary – A dictionary against which to check validity.

  • correctionTracking (dict) – TODO

  • intro (Optional[str]) – Optional introduction text.

do_original(_)

Choose original (abbreviation: o)

do_shell(arg)

Custom input to replace token

do_kbest(arg)

Choose k-best by number (abbreviation: just the number)
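One way a bare number can act as an abbreviation in a cmd.Cmd shell is to override default(), which cmd.Cmd calls for any line that matches no do_* method. The sketch below assumes that approach and 1-based candidate numbering. KBestShell, its candidates list, and its chosen attribute are hypothetical and may differ from how CorrectionShell actually handles this.

```python
import cmd


class KBestShell(cmd.Cmd):
    """Hypothetical shell illustrating a numeric shortcut for kbest."""

    def __init__(self, candidates):
        super().__init__(completekey=None)
        self.candidates = candidates  # ordered best-first (assumption)
        self.chosen = None

    def do_kbest(self, arg):
        """Choose a candidate by its 1-based rank."""
        k = int(arg)
        self.chosen = self.candidates[k - 1]

    def default(self, line):
        # Called by cmd.Cmd when no do_<command> method matches the input.
        if line.strip().isdigit():
            return self.do_kbest(line.strip())
        return super().default(line)


shell = KBestShell(["Wagon", "Wager", "Major"])
shell.onecmd("2")  # same effect as "kbest 2"
print(shell.chosen)  # Wager
```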

do_kdict(arg)

Choose the k-best candidate that is in the dictionary

do_memoized(arg)
do_error(arg)
do_linefeed(_)
do_defer(_)

Defer decision for another time.

do_quit(_)