CorrectOCR.model module
class CorrectOCR.model.HMM(path, multichars=None, dictionary=None)
Bases: object
- Parameters
  - path (Path) – Path for loading and saving.
  - multichars – A dictionary of possible multicharacter substitutions (e.g. ‘cr’: ‘æ’ or vice versa).
  - dictionary (Optional[Dictionary]) – The dictionary against which to check validity.
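To make the multichars parameter concrete, here is a minimal, self-contained sketch of how such a mapping could be used to produce alternative spellings of a word. The helper name `multichar_variants` and the one-substitution-at-a-time strategy are assumptions for illustration only; the documentation above does not say how HMM applies the mapping internally.

```python
from typing import Dict, Set


def multichar_variants(word: str, multichars: Dict[str, str]) -> Set[str]:
    """Return the word plus every variant made by one substitution.

    Hypothetical helper: each mapping entry is tried in both directions
    ("or vice versa"), at every position where it matches.
    """
    variants = {word}
    for src, dst in multichars.items():
        for old, new in ((src, dst), (dst, src)):
            start = word.find(old)
            while start != -1:
                variants.add(word[:start] + new + word[start + len(old):])
                start = word.find(old, start + 1)
    return variants


# OCR output "crble" may really be the Danish word "æble".
variants = multichar_variants("crble", {"cr": "æ"})
reverse = multichar_variants("æble", {"cr": "æ"})
```

Both calls yield the same two spellings, since the mapping is applied in either direction.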
property init
Initial probabilities.
property tran
Transition probabilities.
property emis
Emission probabilities.
is_valid()
Verify that parameters are valid (i.e. the keys in init/tran/emis match).
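The kind of check described for is_valid() can be sketched with plain dictionaries. This is not the library's implementation: the function name `params_are_consistent`, the nested-dict layout of the three tables, and the extra check on transition rows are all assumptions made for illustration.

```python
from typing import Dict


def params_are_consistent(
    init: Dict[str, float],
    tran: Dict[str, Dict[str, float]],
    emis: Dict[str, Dict[str, float]],
) -> bool:
    """Return True if all three tables describe the same hidden states."""
    states = set(init)
    if states != set(tran) or states != set(emis):
        return False
    # Assumption: every transition row should also cover exactly these states.
    return all(set(row) == states for row in tran.values())


init = {"a": 0.6, "b": 0.4}
tran = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emis = {"a": {"a": 0.9, "o": 0.1}, "b": {"b": 1.0}}

ok = params_are_consistent(init, tran, emis)
# Dropping state "b" from the emission table makes the keys disagree.
broken = params_are_consistent(init, tran, {"a": emis["a"]})
```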
kbest_for_word(word, k)
Generates k-best correction candidates for a single word.
- Return type
  DefaultDict[int, KBestItem]
- Returns
  A dictionary with ranked candidates keyed by 1..k.
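As an illustration of k-best decoding with results keyed by rank, the sketch below runs a list-Viterbi search over a toy character-level HMM. It is a sketch under stated assumptions, not the method used by kbest_for_word: the function `kbest_sketch`, the toy probability tables, and the use of plain `(probability, candidate)` tuples in place of KBestItem are all hypothetical. It also assumes a non-empty word.

```python
import heapq
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

Table = Dict[str, Dict[str, float]]


def kbest_sketch(
    word: str, k: int, init: Dict[str, float], tran: Table, emis: Table
) -> DefaultDict[int, Tuple[float, str]]:
    """Return the k most probable hidden strings for `word`, keyed 1..k."""
    states = list(init)
    # beams[s]: up to k best (probability, decoded string) pairs ending in s.
    beams = {s: [(init[s] * emis[s].get(word[0], 0.0), s)] for s in states}
    for ch in word[1:]:
        new_beams = {}
        for s in states:
            e = emis[s].get(ch, 0.0)
            candidates = [
                (p * tran[prev][s] * e, path + s)
                for prev in states
                for p, path in beams[prev]
            ]
            new_beams[s] = heapq.nlargest(k, candidates)
        beams = new_beams
    final = heapq.nlargest(k, (item for s in states for item in beams[s]))
    ranked: DefaultDict[int, Tuple[float, str]] = defaultdict(lambda: (0.0, ""))
    for rank, item in enumerate(final, start=1):
        ranked[rank] = item
    return ranked


# Toy model: hidden (true) characters "a"/"o", observed OCR characters "a"/"o".
init = {"a": 0.5, "o": 0.5}
tran = {"a": {"a": 0.5, "o": 0.5}, "o": {"a": 0.5, "o": 0.5}}
emis = {"a": {"a": 0.8, "o": 0.2}, "o": {"o": 0.9, "a": 0.1}}

ranked = kbest_sketch("oa", 2, init, tran, emis)
```

Keeping the k best partial paths per state at each position is enough to recover the k best complete paths, which is why each beam is truncated to k entries.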
class CorrectOCR.model.HMMBuilder(dictionary, smoothingParameter, characterSet, readCounts, remove_chars, gold_words)
Bases: object
Calculates parameters for an HMM based on the input. They can be accessed via the three properties.
- Parameters
  - dictionary (Dictionary) – The dictionary to use for generating probabilities.
  - smoothingParameter (float) – Lower bound for probabilities.
  - characterSet – Set of required characters for the final HMM.
  - readCounts – See Aligner.
  - remove_chars (List[str]) – List of characters to remove from the final HMM.