CorrectOCR.model module
-
class CorrectOCR.model.HMM(path, multichars=None, dictionary=None)
Bases: object
- Parameters
path (Path) – Path for loading and saving.
multichars – A dictionary of possible multicharacter substitutions (e.g. ‘cr’: ‘æ’ or vice versa).
dictionary (Optional[Dictionary]) – The dictionary against which to check validity.
-
property init
Initial probabilities.
- Return type
-
property tran
Transition probabilities.
- Return type
-
property emis
Emission probabilities.
- Return type
-
is_valid()
Verify that parameters are valid (i.e. the keys in init/tran/emis match).
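As an illustration of what "the keys in init/tran/emis match" could mean, the sketch below assumes that init maps each hidden-state character to a probability and that tran and emis are nested dictionaries whose outer keys are the same characters. The function name `keys_match` and this dictionary layout are assumptions made for the example, not CorrectOCR's actual implementation.

```python
from typing import Dict


def keys_match(init: Dict[str, float],
               tran: Dict[str, Dict[str, float]],
               emis: Dict[str, Dict[str, float]]) -> bool:
    """Hypothetical check: all three tables describe the same set of states."""
    states = set(init)
    # The outer keys of tran and emis must be exactly the states in init.
    if set(tran) != states or set(emis) != states:
        return False
    # Every transition row may only lead to known states.
    return all(set(row) <= states for row in tran.values())


# Toy two-state model with made-up numbers.
init = {"a": 0.6, "b": 0.4}
tran = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.5, "b": 0.5}}
emis = {"a": {"a": 0.9, "o": 0.1}, "b": {"b": 1.0}}

ok = keys_match(init, tran, emis)
broken = keys_match(init, {"a": tran["a"]}, emis)  # state "b" is missing from tran
```

Emission rows are deliberately not restricted to the state set here, since an observed character (such as "o" above) need not be a hidden state in this toy layout.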
- Return type
-
kbest_for_word(word, k)
Generates k-best correction candidates for a single word.
- Parameters
- Return type
DefaultDict[int, KBestItem]
- Returns
A dictionary with ranked candidates keyed by 1..k.
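The shape of the return value, a dictionary keyed by rank from 1 to k, can be sketched as follows. This is only an illustration: `Candidate` is a hypothetical stand-in for KBestItem (whose fields are not listed here), `rank_candidates` is an invented helper, and the scores are made up rather than produced by an HMM.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for KBestItem; the real fields may differ."""
    candidate: str = ""
    probability: float = 0.0


def rank_candidates(scores: Dict[str, float], k: int) -> DefaultDict[int, Candidate]:
    """Keep the k highest-scoring candidates, keyed by rank starting at 1."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    kbest: DefaultDict[int, Candidate] = defaultdict(Candidate)
    for rank, (text, prob) in enumerate(ranked, start=1):
        kbest[rank] = Candidate(text, prob)
    return kbest


# Made-up scores for the OCR token "tbe".
kbest = rank_candidates({"the": 0.81, "tbe": 0.12, "toe": 0.05, "tie": 0.02}, k=3)
```

Because the result is a defaultdict, looking up a rank beyond k in this sketch yields an empty `Candidate` instead of raising a KeyError.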
-
class CorrectOCR.model.HMMBuilder(dictionary, smoothingParameter, characterSet, readCounts, remove_chars, gold_words)
Bases: object
Calculates parameters for an HMM based on the input. They can be accessed via the three properties.
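The smoothingParameter argument of this class is documented as a lower bound for probabilities. One common way to apply such a bound when estimating probabilities from counts is sketched below. The helper `floored_probabilities`, its arguments, and the counts are all assumptions for illustration; the sketch does not claim to reproduce how HMMBuilder actually computes its tables.

```python
from collections import Counter
from typing import Dict, Iterable


def floored_probabilities(counts: Dict[str, int],
                          alphabet: Iterable[str],
                          floor: float) -> Dict[str, float]:
    """Relative frequencies over `alphabet`, floored at `floor`, then renormalized."""
    total = sum(counts.values()) or 1  # avoid division by zero on empty counts
    # Unseen or rare characters get at least `floor` before renormalization.
    raw = {ch: max(counts.get(ch, 0) / total, floor) for ch in alphabet}
    norm = sum(raw.values())
    # Renormalize so the row sums to 1; values may end up slightly below `floor`.
    return {ch: p / norm for ch, p in raw.items()}


# Made-up counts of what the OCR read where the gold text has "e".
read_as = Counter({"e": 95, "c": 4, "o": 1})
probs = floored_probabilities(read_as, alphabet="ceox", floor=0.001)
```

The point of the floor is that a character never observed in the counts ("x" here) still receives a small nonzero probability, so a later decoding step does not rule it out entirely.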
- Parameters
dictionary (Dictionary) – The dictionary to use for generating probabilities.
smoothingParameter (float) – Lower bound for probabilities.
characterSet – Set of required characters for the final HMM.
readCounts – See Aligner.
remove_chars (List[str]) – List of characters to remove from the final HMM.