CorrectOCR.server module
Below are some examples for a possible frontend. Naturally, they are only suggestions and any workflow and interface may be used.
Example User Interface
The combo box would contain the k-best suggestions from the backend, allowing the user to accept the desired one or enter their own correction.
Showing the left and right tokens (i.e. tokens with index±1) enables the user to decide whether a token is part of a longer word that should be hyphenated.
Endpoint Documentation
Errors are specified according to RFC 7807 Problem Details for HTTP APIs.
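As a rough illustration of consuming such errors on the client side, the sketch below turns a problem document into a short message. The member names (`type`, `title`, `status`, `detail`) come from RFC 7807 itself; the sample body and the helper name `describe_problem` are hypothetical and not taken from the CorrectOCR server.

```python
import json

def describe_problem(body: str) -> str:
    """Turn an RFC 7807 problem document into a one-line message."""
    problem = json.loads(body)
    # All members are optional in RFC 7807, so fall back gracefully.
    title = problem.get("title", "Unknown error")
    status = problem.get("status")
    detail = problem.get("detail")
    message = f"{status} {title}" if status is not None else title
    return f"{message}: {detail}" if detail else message

# Hypothetical error body, for illustration only.
example = json.dumps({
    "type": "about:blank",
    "title": "Not Found",
    "status": 404,
    "detail": "No such document.",
})
print(describe_problem(example))
```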
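Resource | Operation | Description |
---|---|---|
1 Main | Get list of documents | GET / |
2 Documents | Get list of tokens in document | GET /(string: docid)/tokens.json |
3 Tokens | Get random token | GET /random |
| Get token | GET /(string: docid)/token-(int: index).json |
| Update token | POST /(string: docid)/token-(int: index).json |
| Get token image | GET /(string: docid)/token-(int: index).png |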
- GET /
Get an overview of the documents available for correction.
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    [
      {
        "docid": "<docid>",
        "url": "/<docid>/tokens.json",
        "count": 100,
        "corrected": 87
      }
    ]
- Response JSON Array of Objects
docid (string) – ID for the document.
url (string) – URL to list of Tokens in doc.
count (int) – Total number of Tokens.
corrected (int) – Number of corrected Tokens.
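A frontend might use this overview to show how far along each document is. The following is a minimal sketch that works on the already-decoded response; the helper name `correction_progress` is an assumption for illustration, and only the fields documented above are used.

```python
import json

def correction_progress(documents):
    """Map each docid to the fraction of its tokens already corrected."""
    progress = {}
    for doc in documents:
        total = doc["count"]
        # Guard against empty documents to avoid dividing by zero.
        progress[doc["docid"]] = doc["corrected"] / total if total else 0.0
    return progress

# Body of the example response shown above.
body = '[{"docid": "<docid>", "url": "/<docid>/tokens.json", "count": 100, "corrected": 87}]'
print(correction_progress(json.loads(body)))
```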
- GET /(string: docid)/token-(int: index).json
Get information about a specific Token.
Note: The data is not escaped; care must be taken when displaying in a browser.
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "1-best": "Jornben",
      "1-best prob.": 2.96675056066388e-08,
      "2-best": "Joreben",
      "2-best prob.": 7.41372275428713e-10,
      "3-best": "Jornhen",
      "3-best prob.": 6.17986300962785e-10,
      "4-best": "Joraben",
      "4-best prob.": 5.52540106969346e-10,
      "Bin": 2,
      "Decision": "annotator",
      "Doc ID": "7696",
      "Gold": "",
      "Heuristic": "a",
      "Hyphenated": false,
      "Discarded": false,
      "Index": 2676,
      "Original": "Jornben.",
      "Selection": [],
      "Token info": "...",
      "Token type": "PDFToken",
      "image_url": "/7696/token-2676.png"
    }
- Parameters
docid (string) – The ID of the requested document.
index (int) – The placement of the requested Token in the document.
- Return
A JSON dictionary of information about the requested Token. Relevant keys for frontend display are original (uncorrected OCR result), gold (corrected version, if available), TODO
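To connect this response with the combo box from the example interface, the sketch below collects the k-best suggestions from a token dictionary and escapes them before they are placed in HTML, since the data is returned unescaped. The key pattern (`"<k>-best"`, `"<k>-best prob."`) is read off the example response; the function names and the choice of `<option>` elements are assumptions for illustration.

```python
import html
import re

def k_best_options(token):
    """Return (suggestion, probability) pairs ordered from 1-best upward."""
    ranks = []
    for key in token:
        match = re.fullmatch(r"(\d+)-best", key)
        if match:
            ranks.append(int(match.group(1)))
    return [
        (token[f"{k}-best"], token.get(f"{k}-best prob."))
        for k in sorted(ranks)
    ]

def render_options(token):
    """Render the suggestions as escaped HTML <option> elements."""
    return [
        f"<option>{html.escape(candidate)}</option>"
        for candidate, _ in k_best_options(token)
    ]

# Shortened token dictionary; the second suggestion is made up to show escaping.
sample = {
    "2-best": "<b>",
    "2-best prob.": 1e-10,
    "1-best": "Jornben",
    "1-best prob.": 2.9e-08,
    "Original": "Jornben.",
}
print(render_options(sample))
```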
- POST /(string: docid)/token-(int: index).json
Update a given token with a gold transcription and/or hyphenation info.
- Parameters
docid (string) – The ID of the requested document.
index (int) – The placement of the requested Token in the document.
- Request JSON Object
gold (string) – Set new correction for this Token.
hyphenate (string) – Optionally hyphenate to the left or right.
- Return
A JSON dictionary of information about the updated Token.
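A client could assemble the update request roughly as follows. This is only a sketch using the standard library: the base URL `http://localhost:5000` and the helper name `build_update_request` are assumptions, and the request is built but not sent.

```python
import json
import urllib.request

def build_update_request(base_url, docid, index, gold, hyphenate=None):
    """Build (but do not send) a POST request that updates one token."""
    payload = {"gold": gold}
    if hyphenate is not None:
        # Only include the optional field when the caller sets it.
        payload["hyphenate"] = hyphenate
    return urllib.request.Request(
        f"{base_url}/{docid}/token-{index}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical server address and correction, for illustration only.
request = build_update_request("http://localhost:5000", "7696", 2676, "Jornben")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running server.
```

The exact strings accepted for `hyphenate` are not listed here beyond "left or right", so any concrete value passed for it should be checked against the server.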
- GET /(string: docid)/token-(int: index).png
Returns a snippet of the original document as an image, for comparing with the OCR result.
- Parameters
docid (string) – The ID of the requested document.
index (int) – The placement of the requested Token in the document.
- Query Parameters
leftmargin (int) – Optional left margin. See PDFToken.extract_image() for defaults. TODO
rightmargin (int) – Optional right margin.
topmargin (int) – Optional top margin.
bottommargin (int) – Optional bottom margin.
- Return
A PNG image of the requested Token.
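The image URL with optional margins could be put together along these lines. The four parameter names come from the list above; the helper name `token_image_url` and the margin values in the usage line are assumptions for illustration.

```python
from urllib.parse import urlencode

MARGINS = ("leftmargin", "rightmargin", "topmargin", "bottommargin")

def token_image_url(docid, index, **margins):
    """Build the path (plus optional margin query) for a token image."""
    unknown = set(margins) - set(MARGINS)
    if unknown:
        raise ValueError(f"unknown margin parameter(s): {sorted(unknown)}")
    path = f"/{docid}/token-{index}.png"
    # Emit parameters in a fixed order so the resulting URL is stable.
    query = urlencode([(name, margins[name]) for name in MARGINS if name in margins])
    return f"{path}?{query}" if query else path

print(token_image_url("7696", 2676, topmargin=5, leftmargin=10))
```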
- GET /(string: docid)/tokens.json
Get information about the Tokens in a given document.
- Parameters
docid (string) – The ID of the requested document.
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    [
      {
        "info_url": "/<docid>/token-0.json",
        "image_url": "/<docid>/token-0.png",
        "string": "Example",
        "is_corrected": true
      },
      {
        "info_url": "/<docid>/token-1.json",
        "image_url": "/<docid>/token-1.png",
        "string": "Exanpie",
        "is_corrected": false
      }
    ]
- Response JSON Array of Objects
info_url (string) – URL to Token info.
image_url (string) – URL to Token image.
string (string) – Current Token string.
is_corrected (bool) – Whether the Token has been corrected so far.
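One possible use of this list is to queue up the tokens that still need attention. The sketch below filters the decoded response on `is_corrected`; the helper name `uncorrected_tokens` is an assumption for illustration.

```python
def uncorrected_tokens(tokens):
    """Return the info URLs of tokens that have not been corrected yet."""
    return [token["info_url"] for token in tokens if not token["is_corrected"]]

# Decoded form of the example response shown above.
tokens = [
    {"info_url": "/<docid>/token-0.json", "image_url": "/<docid>/token-0.png",
     "string": "Example", "is_corrected": True},
    {"info_url": "/<docid>/token-1.json", "image_url": "/<docid>/token-1.png",
     "string": "Exanpie", "is_corrected": False},
]
print(uncorrected_tokens(tokens))
```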
- GET /random
Returns a 302 redirect to a random token from a random document. TODO: filter by needing annotator
Example response:
    HTTP/1.1 302 Found
    Location: /<docid>/token-<index>.json
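A client that handles the redirect itself may want to recover the document ID and token index from the `Location` header. The following sketch assumes the header follows the URL pattern shown above; the helper name `parse_token_location` and the sample values are hypothetical.

```python
import re
from urllib.parse import urlsplit

TOKEN_PATH = re.compile(r"/(?P<docid>[^/]+)/token-(?P<index>\d+)\.json")

def parse_token_location(location):
    """Extract (docid, index) from a redirect Location header."""
    # urlsplit lets this accept both relative paths and absolute URLs.
    match = TOKEN_PATH.fullmatch(urlsplit(location).path)
    if match is None:
        raise ValueError(f"not a token URL: {location!r}")
    return match["docid"], int(match["index"])

print(parse_token_location("/7696/token-2676.json"))
```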