CorrectOCR.config module
When invoked, CorrectOCR looks for a file named CorrectOCR.ini in the
working directory. If found, it is loaded, and its entries are used as
the defaults for the corresponding options. The built-in defaults are:
[configuration]
characterSet = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
dehyphenate = true
loglevel = INFO
[workspace]
rootPath = ./
goldPath = gold/
originalPath = original/
trainingPath = training/
nheaderlines = 0
language = Danish
docInfoBaseURL =
combine_hyphenated_images = true
[resources]
resourceRootPath = ./resources/
correctionTrackingFile = correction_tracking.json
dictionaryPath = dictionary/
hmmParamsFile = hmm_parameters.json
memoizedCorrectionsFile = memoized_corrections.json
multiCharacterErrorFile = multicharacter_errors.json
reportFile = report.txt
heuristicSettingsFile = settings.json
[storage]
type = fs
db_driver =
db_host =
db_user =
db_pass =
db_name =
[server]
host = 127.0.0.1
profile = false
dynamic_images = true
redirect_hyphenated = true
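The layering described above (built-in defaults, overridden by an optional CorrectOCR.ini) can be sketched with the standard-library configparser. This is only an illustration: the helper name load_config and the trimmed DEFAULTS dictionary are assumptions made for the example, not CorrectOCR's actual loader.

```python
import configparser
from pathlib import Path

# Illustrative subset of the defaults listed above.
DEFAULTS = {
    "configuration": {"dehyphenate": "true", "loglevel": "INFO"},
    "workspace": {"rootPath": "./", "goldPath": "gold/", "nheaderlines": "0"},
    "storage": {"type": "fs"},
}


def load_config(workdir="."):
    """Hypothetical helper: defaults first, then CorrectOCR.ini if it exists."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive (rootPath rather than rootpath).
    parser.optionxform = str
    parser.read_dict(DEFAULTS)
    ini = Path(workdir) / "CorrectOCR.ini"
    if ini.is_file():
        # Values in the file replace the defaults loaded above.
        parser.read(ini, encoding="utf-8")
    return parser
```

With this sketch, `load_config(".")["workspace"]["rootPath"]` yields `./` unless a CorrectOCR.ini in the working directory sets a different value.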
By default, CorrectOCR requires 4 subdirectories in the working
directory, which will be used as the current Workspace:
original/
contains the original uncorrected files. If necessary, it can be configured with the --originalPath argument.
gold/
contains the known correct “gold” files. If necessary, it can be configured with the --goldPath argument.
training/
contains the various generated files used during training. If necessary, it can be configured with the --trainingPath argument.
Corresponding files in original/ and gold/ are named identically, and
the filename without extension is considered the file ID. The generated
files in training/ have suffixes according to their kind.
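As a rough illustration of the naming convention, the sketch below derives a file ID from a filename and lists the IDs that appear in both directories. The function names file_id and paired_ids are hypothetical and do not come from CorrectOCR itself.

```python
from pathlib import Path


def file_id(path):
    """File ID: the filename without its (last) extension."""
    return Path(path).stem


def paired_ids(original_dir, gold_dir):
    """Return the sorted file IDs that exist in both directories."""
    originals = {file_id(p) for p in Path(original_dir).iterdir() if p.is_file()}
    golds = {file_id(p) for p in Path(gold_dir).iterdir() if p.is_file()}
    return sorted(originals & golds)
```

Note that `Path.stem` strips only the final suffix, so a name with several dots keeps everything before the last one.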
If generated files exist, CorrectOCR will generally avoid doing
redundant calculations. The --force switch overrides this, forcing
CorrectOCR to create new files (after moving the existing ones out of
the way). Alternatively, one may delete a subset of the generated files
to recreate only those.
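The skip-unless-forced behaviour can be pictured with a small sketch. Everything here is an assumption made for illustration: the helper name prepare_output and the numbered backup naming are invented, and CorrectOCR may move existing files aside differently.

```python
from pathlib import Path


def prepare_output(path, force=False):
    """Return True if the caller should (re)generate `path`.

    Hypothetical helper: an existing file is kept unless `force` is set,
    in which case it is first renamed to a numbered backup.
    """
    path = Path(path)
    if not path.exists():
        return True   # nothing there yet, so generate it
    if not force:
        return False  # reuse the existing file, skip the work
    # Find the first unused backup name, e.g. report.txt.001 (assumed scheme).
    n = 1
    backup = path.with_name(f"{path.name}.{n:03d}")
    while backup.exists():
        n += 1
        backup = path.with_name(f"{path.name}.{n:03d}")
    path.rename(backup)
    return True
```

Deleting a generated file by hand has the same effect as the first branch: the next run finds nothing at that path and recreates it.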
The Workspace also has a ResourceManager (accessible in code via
.resources) that handles access to the dictionary, HMM parameter
files, etc.
Environment Variables
Environment variables follow the format CORRECTOCR_<section>_<name>
in uppercase. For example, the Workspace root path can be configured by
setting CORRECTOCR_WORKSPACE_ROOTPATH.
- class CorrectOCR.config.EnvOverride
Bases: configparser.BasicInterpolation
This class overrides the .ini file with environment variables if they exist.
They are checked according to this format: CORRECTOCR_<section>_<key>, all upper case.
Thus, to override the storage:db_host setting, set the CORRECTOCR_STORAGE_DB_HOST variable.
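One way such an override can be built on configparser.BasicInterpolation is sketched below. The class name EnvOverrideSketch is a stand-in chosen for this example; it follows the documented naming rule but is not the CorrectOCR source and may differ from it in detail.

```python
import configparser
import os


class EnvOverrideSketch(configparser.BasicInterpolation):
    """Illustrative stand-in: prefer CORRECTOCR_<SECTION>_<KEY> over the .ini value."""

    def before_get(self, parser, section, option, value, defaults):
        envvar = f"CORRECTOCR_{section}_{option}".upper()
        if envvar in os.environ:
            # The environment value is returned as-is, without interpolation.
            return os.environ[envvar]
        # Otherwise fall back to normal %(name)s interpolation of the file value.
        return super().before_get(parser, section, option, value, defaults)


# configparser expects an interpolation instance, not the class itself.
parser = configparser.ConfigParser(interpolation=EnvOverrideSketch())
parser.read_string("[workspace]\nrootPath = ./\n")
```

In this sketch the lookup only happens for options the parser already knows, because configparser calls before_get after it has found the option. A variable for a key missing from both the defaults and the file would therefore have no effect.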