When invoked, CorrectOCR looks for a file named CorrectOCR.ini in
the working directory. If found, it is loaded, and its entries are
used as the defaults for their corresponding options. These are the defaults:
[configuration]
characterSet = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
dehyphenate = true
loglevel = INFO

[workspace]
rootPath = ./
goldPath = gold/
originalPath = original/
trainingPath = training/
nheaderlines = 0
language = Danish
docInfoBaseURL =
combine_hyphenated_images = true

[resources]
resourceRootPath = ./resources/
correctionTrackingFile = correction_tracking.json
dictionaryPath = dictionary/
hmmParamsFile = hmm_parameters.json
memoizedCorrectionsFile = memoized_corrections.json
multiCharacterErrorFile = multicharacter_errors.json
reportFile = report.txt
heuristicSettingsFile = settings.json

[storage]
type = fs
db_driver =
db_host =
db_user =
db_pass =
db_name =

[server]
host = 127.0.0.1
profile = false
dynamic_images = true
redirect_hyphenated = true
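The layering of built-in defaults and file entries can be sketched with Python's standard configparser. This is only an illustration and not CorrectOCR's actual loading code: the helper name load_settings and the abbreviated DEFAULTS string are assumptions.

```python
import configparser

# Abbreviated copy of the defaults listed above (only a few keys, for illustration).
DEFAULTS = """
[configuration]
dehyphenate = true
loglevel = INFO

[workspace]
rootPath = ./
originalPath = original/
language = Danish
"""

def load_settings(ini_text=""):
    """Hypothetical helper: read built-in defaults, then layer the file's text on top."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (e.g. rootPath) instead of lowercasing
    parser.read_string(DEFAULTS)
    parser.read_string(ini_text)  # values read later replace earlier ones
    return parser

settings = load_settings("[workspace]\nlanguage = English\n")
print(settings["workspace"]["language"])      # value taken from the file text
print(settings["workspace"]["originalPath"])  # default left untouched
```

Reading the defaults first and the file second means a file only needs to list the keys it wants to change.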
By default, CorrectOCR requires 4 subdirectories in the working
directory, which will be used as the current Workspace:
original/ contains the original uncorrected files. If necessary, it can be configured with the originalPath option.
gold/ contains the known correct “gold” files. If necessary, it can be configured with the goldPath option.
training/ contains the various generated files used during training. If necessary, it can be configured with the trainingPath option.
Corresponding files in original and gold are named
identically, and the filename without extension is considered the file
ID. The generated files in
training/ have suffixes according to
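As a rough illustration of the file ID convention, the sketch below pairs files from two directories by their filename without extension. The helper names file_id and paired_ids are hypothetical and not part of CorrectOCR.

```python
from pathlib import Path
import tempfile

def file_id(path):
    """File ID = filename without its (last) extension."""
    return Path(path).stem

def paired_ids(original_dir, gold_dir):
    """Hypothetical helper: IDs that have a file in both directories."""
    originals = {file_id(p) for p in Path(original_dir).iterdir() if p.is_file()}
    golds = {file_id(p) for p in Path(gold_dir).iterdir() if p.is_file()}
    return sorted(originals & golds)

# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    original = Path(root, "original")
    gold = Path(root, "gold")
    original.mkdir()
    gold.mkdir()
    for name in ("doc1.txt", "doc2.txt"):
        (original / name).write_text("uncorrected text")
    (gold / "doc1.txt").write_text("corrected text")
    demo_ids = paired_ids(original, gold)

print(demo_ids)  # only doc1 has both an original and a gold file
```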
If generated files exist, CorrectOCR will generally avoid doing
redundant calculations. The
--force switch overrides this, forcing
CorrectOCR to create new files (after moving the existing ones out of
the way). Alternatively, one may delete a subset of the generated files so
that only those are recreated.
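The reuse-unless-forced behaviour described above could look roughly like the following. This is a sketch under assumptions, not CorrectOCR's implementation: the helper ensure_generated, the example filename, and the .bak backup naming are all invented for illustration.

```python
from pathlib import Path
import tempfile

def ensure_generated(path, generate, force=False):
    """Hypothetical helper: reuse an existing file unless force is set.

    With force=True, an existing file is first renamed with a '.bak'
    suffix (an assumed naming scheme) and then regenerated.
    """
    path = Path(path)
    if path.exists():
        if not force:
            return "reused"
        path.replace(path.with_name(path.name + ".bak"))  # move out of the way
    path.write_text(generate())
    return "generated"

with tempfile.TemporaryDirectory() as root:
    target = Path(root, "doc1.example.txt")  # hypothetical generated file
    results = [
        ensure_generated(target, lambda: "v1"),              # nothing there yet
        ensure_generated(target, lambda: "v2"),              # exists, so skipped
        ensure_generated(target, lambda: "v3", force=True),  # forced rebuild
    ]
    final_text = target.read_text()
    backup_text = Path(root, "doc1.example.txt.bak").read_text()

print(results, final_text, backup_text)
```

Deleting the target by hand before the second call would have the same effect as forcing, but only for that one file.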
Workspace also has a
ResourceManager (accessible in code via
.resources) that handles access to the dictionary, HMM parameter
Environment variables follow the format CORRECTOCR_<section>_<key>,
in uppercase. For example, the Workspace root path can be configured by
setting the CORRECTOCR_WORKSPACE_ROOTPATH variable.
- class CorrectOCR.config.EnvOverride
This class overrides the .ini file with environment variables if they exist.
They are checked according to this format: CORRECTOCR_<section>_<key>, all upper case.
Thus, to override the storage:db_host setting, set the CORRECTOCR_STORAGE_DB_HOST variable.
- before_get(parser, section, option, value, defaults)
- CorrectOCR.config.setup(args, configfiles=['CorrectOCR.ini'])
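The before_get signature above matches the hook on configparser.Interpolation in the Python standard library, so an environment override of this kind can be sketched as follows. The class EnvOverrideSketch is an illustrative stand-in and an assumption about the approach, not the source of CorrectOCR.config.EnvOverride; the path /data/ocr is an example value.

```python
import configparser
import os

class EnvOverrideSketch(configparser.Interpolation):
    """Illustrative stand-in: prefer CORRECTOCR_<SECTION>_<KEY> over the .ini value."""

    def before_get(self, parser, section, option, value, defaults):
        env_name = f"CORRECTOCR_{section}_{option}".upper()
        # Fall back to the value from the file when the variable is not set.
        return os.environ.get(env_name, value)

parser = configparser.ConfigParser(interpolation=EnvOverrideSketch())
parser.read_string("[workspace]\nrootPath = ./\nlanguage = Danish\n")

os.environ["CORRECTOCR_WORKSPACE_ROOTPATH"] = "/data/ocr"  # example value

root_path = parser.get("workspace", "rootPath")  # taken from the environment
language = parser.get("workspace", "language")   # no variable set, file value kept
print(root_path, language)
```

Because the lookup happens on every get, changing the environment variable takes effect without reloading the file.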