CorrectOCR.dictionary module

class CorrectOCR.dictionary.Dictionary(path=None, ignoreCase=False)[source]

Bases: Set[str]

Set of words to use for determining correctness of Tokens and suggestions.

Note: A Dictionary “contains” all “words” that contain at most one alphabetic letter, such as 8,5 or (600) or A4.

Parameters
  • path (Optional[Path]) – A path for loading a previously saved dictionary.

  • ignoreCase (bool) – Whether to ignore case, making the dictionary case-insensitive.
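
The membership rule from the note above can be sketched as follows. This is a minimal stand-in, not the CorrectOCR implementation: the class name `SketchDictionary`, the `words` argument, and the lowercasing used to model `ignoreCase` are all assumptions made for illustration. Only the documented behavior is modeled, namely that words with at most one alphabetic letter always count as contained.

```python
class SketchDictionary(set):
    """Hypothetical stand-in illustrating the documented membership rule."""

    def __init__(self, words=(), ignoreCase=False):
        self.ignoreCase = ignoreCase
        super().__init__(self._normalize(w) for w in words)

    def _normalize(self, word):
        # Assumption: case-insensitivity is modeled by lowercasing.
        return word.lower() if self.ignoreCase else word

    def __contains__(self, word):
        # Words with at most one alphabetic letter are always "contained".
        if sum(1 for ch in word if ch.isalpha()) <= 1:
            return True
        return super().__contains__(self._normalize(word))
```

With this sketch, `"8,5" in SketchDictionary()` and `"A4" in SketchDictionary()` both hold even for an empty set, while an ordinary word such as `"xyzzy"` is only contained if it was stored.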

has_group(group)[source]

Return type

bool

clear()[source]

Remove all elements from this set.

add(group, word, nowarn=False, dirty=True)[source]

Add a new word (sans punctuation) to the dictionary. Silently drops non-alpha strings.

Parameters
  • word (str) – The word to add.

  • nowarn (bool) – Don’t warn about long words (>20 letters).
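
The behavior described for add can be approximated as below. This is a rough sketch under stated assumptions, not the library's code: the helper name `add_word`, the use of `string.punctuation` to strip punctuation, and the `warnings.warn` call are hypothetical. The `group` and `dirty` arguments are left out because their semantics are not documented here.

```python
import string
import warnings


def add_word(words, word, nowarn=False):
    """Hypothetical helper mirroring the documented add() behavior."""
    # Assumption: "sans punctuation" means stripping leading and trailing
    # ASCII punctuation and whitespace.
    cleaned = word.strip(string.punctuation + string.whitespace)
    # Silently drop anything that is not purely alphabetic.
    if not cleaned.isalpha():
        return False
    # Warn about long words (>20 letters) unless nowarn is set.
    if len(cleaned) > 20 and not nowarn:
        warnings.warn(f"Long word added: {cleaned}")
    words.add(cleaned)
    return True
```

The return value is an addition of this sketch so that callers can tell whether a string was dropped; the documented method does not state a return type.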

save_group(group)[source]

save(path=None)[source]

Save the dictionary.

Parameters
  • path (Optional[Path]) – Optional new path to save to.

clean(word)[source]

Return type

str