Document OCR
Extract all text from any document.
Available models
Model | Availability | Release date | Description |
---|---|---|---|
ocr-2.2.1 | Latest | 2024-06-11 | Additional support for Japanese character set. |
ocr-2.1.1 | Deprecated | 2024-04-04 | Improved text detection for single characters and special characters. |
ocr-2.1.0 | Deprecated | 2024-02-28 | Additional support for Hanja, Hanzi and Kanji. Improved accuracy and performance. |
ocr-1.0.0 | Deprecated | 2023-04-10 | An OCR model specialized for English and Korean. Resilient against real-world images, including wrinkled papers and rotated text. |
Request
POST https://api.upstage.ai/v1/document-ai/ocr
Parameters
Request headers
Name | Type | Required |
---|---|---|
Authorization | string | Required |
Request body
Name | Type | Required |
---|---|---|
document | file | Required |
schema | string | Optional |
model | string | Optional |
Requirements
- Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
- Maximum file size: 50MB
- Maximum number of pages per file: 30 pages (For files exceeding 30 pages, the first 30 pages are processed)
- Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting to images at a standard of 300 DPI.
- Supported character sets: Alphanumeric, Hangul, and Hanja. Hanzi and Kanji are in beta: they are available but not fully supported.
- Text size: Optimized for text that occupies less than approximately 30% of the page size. Files that don't meet these requirements are considered bad examples and could result in a response error.
Hanja, Hanzi, and Kanji are writing systems based on Chinese characters, used in Korean, Chinese, and Japanese respectively. Despite their similarities, they differ in visual form, pronunciation, meaning, and usage conventions within each language. For more information, see this article.
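The file-level requirements above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the helper names (`check_upload`, `page_pixels`, `pages_processed`) are hypothetical, "50MB" is assumed to mean 50 MiB, and the extension set is an assumed mapping from the listed formats.

```python
from pathlib import PurePath
from typing import List

# Limits taken from the requirements list above.
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
MAX_PAGES = 30
MAX_PIXELS_PER_PAGE = 100_000_000
RENDER_DPI = 300

# Assumption: extensions mapped from the listed formats. ".jpg" and ".tif"
# are common aliases that the requirements do not mention explicitly.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
    ".heic", ".docx", ".pptx", ".xlsx",
}


def page_pixels(width_in: float, height_in: float, dpi: int = RENDER_DPI) -> int:
    """Pixel count of a page of the given size (in inches) rendered at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)


def pages_processed(total_pages: int) -> int:
    """Only the first 30 pages of a longer file are processed."""
    return min(total_pages, MAX_PAGES)


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    return problems
```

For instance, a US Letter page (8.5 x 11 in) rendered at 300 DPI is 2550 x 3300 = 8,415,000 pixels, well under the per-page limit, while a 36 x 48 in poster would exceed it.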
Response
Functionality overview
- Data hierarchy: The API currently supports the hierarchy of document, page, and words.
- Unknown characters: Characters that the model detects but cannot recognize are considered "unknown characters" and are marked by the character �.
- Response time: Files with fewer than 30 words take approximately two seconds. Longer documents can take up to tens of seconds.
- Timeout: There is a server-side 3-minute timeout for all requests.
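To illustrate the unknown-character behavior described above, the sketch below scans a parsed response for words that contain the marker. It assumes the marker is the Unicode replacement character U+FFFD, as it appears on this page; the function names are hypothetical and the response is treated as a plain dictionary.

```python
from typing import List, Tuple

# Assumption: the "unknown character" marker is U+FFFD (the replacement character).
UNKNOWN_CHAR = "\ufffd"


def words_with_unknown_chars(response: dict) -> List[Tuple[int, int, str]]:
    """Return (page id, word id, word text) for each word containing the marker."""
    hits = []
    for page in response.get("pages", []):
        for word in page.get("words", []):
            if UNKNOWN_CHAR in word.get("text", ""):
                hits.append((page["id"], word["id"], word["text"]))
    return hits


def unknown_char_ratio(text: str) -> float:
    """Share of non-whitespace characters in `text` that are unknown."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    return sum(c == UNKNOWN_CHAR for c in visible) / len(visible)
```

A high ratio on the top-level `text` field may suggest that the input falls outside the supported character sets, although the threshold for "high" is a judgment call for each application.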
Return values
Field | Type |
---|---|
apiVersion | string |
confidence | float |
mimeType | string |
modelVersion | string |
numBilledPages | integer |
pages | list |
pages[].confidence | float |
pages[].height | integer |
pages[].width | integer |
pages[].text | string |
pages[].words | list |
pages[].words[].text | string |
pages[].words[].confidence | float |
*.id | integer |
*.boundingBox | object |
*.vertices | list |
*.x | integer |
*.y | integer |
stored | boolean |
text | string |
metadata | object |
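The document, page, and word hierarchy in the return values maps naturally onto a few small types. The following is a sketch under the assumption that the JSON body has already been decoded into a dictionary; the class names and `parse_pages` are hypothetical and not part of any official client.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vertex:
    x: int
    y: int


@dataclass
class Word:
    id: int
    text: str
    confidence: float
    vertices: List[Vertex]  # from boundingBox.vertices


@dataclass
class Page:
    id: int
    width: int
    height: int
    text: str
    confidence: float
    words: List[Word]


def parse_pages(response: dict) -> List[Page]:
    """Convert the `pages` list of a decoded response into typed objects."""
    pages = []
    for p in response["pages"]:
        words = [
            Word(
                id=w["id"],
                text=w["text"],
                confidence=w["confidence"],
                vertices=[Vertex(v["x"], v["y"]) for v in w["boundingBox"]["vertices"]],
            )
            for w in p["words"]
        ]
        pages.append(
            Page(
                id=p["id"],
                width=p["width"],
                height=p["height"],
                text=p["text"],
                confidence=p["confidence"],
                words=words,
            )
        )
    return pages
```

Top-level fields such as `apiVersion`, `modelVersion`, and `numBilledPages` are left in the original dictionary here to keep the sketch short.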
Example
Request
curl -X POST https://api.upstage.ai/v1/document-ai/ocr \
  -H "Authorization: Bearer UPSTAGE_API_KEY" \
  -F "document=@hello.png"
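For readers who prefer Python, the same request can be sketched with the standard library alone. This is an illustrative equivalent of the curl call, not an official client: `build_ocr_request` and `run_ocr` are hypothetical names, the multipart body is assembled by hand, and the 200-second client timeout is an assumption chosen to sit slightly above the 3-minute server-side timeout.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

OCR_URL = "https://api.upstage.ai/v1/document-ai/ocr"


def build_ocr_request(
    filename: str, data: bytes, api_key: str, model: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the OCR endpoint."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if model is not None:
        # Optional `model` form field from the request body parameters.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="model"\r\n\r\n'
                f"{model}\r\n"
            ).encode()
        )
    # Required `document` form field carrying the file bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        OCR_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def run_ocr(path: str, api_key: str, timeout: float = 200.0) -> dict:
    """Send the file at `path` and return the decoded JSON response."""
    p = Path(path)
    request = build_ocr_request(p.name, p.read_bytes(), api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_ocr("hello.png", api_key)` with a valid key would perform the upload; separating request construction from sending keeps the first half easy to inspect without network access.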
Response
{
  "apiVersion": "1.1",
  "confidence": 0.9924988460974842,
  "metadata": {
    "pages": [
      {
        "height": 256,
        "page": 1,
        "width": 786
      }
    ]
  },
  "mimeType": "multipart/form-data",
  "modelVersion": "ocr-2.2.1",
  "numBilledPages": 1,
  "pages": [
    {
      "confidence": 0.9924988460974842,
      "height": 256,
      "id": 0,
      "text": "Print the words \nhello, world",
      "width": 786,
      "words": [
        {
          "boundingBox": {
            "vertices": [
              { "x": 65, "y": 52 },
              { "x": 221, "y": 55 },
              { "x": 221, "y": 104 },
              { "x": 64, "y": 101 }
            ]
          },
          "confidence": 0.9950619419121907,
          "id": 0,
          "text": "Print"
        },
        {
          "boundingBox": {
            "vertices": [
              { "x": 243, "y": 49 },
              { "x": 341, "y": 52 },
              { "x": 340, "y": 105 },
              { "x": 241, "y": 102 }
            ]
          },
          "confidence": 0.9989913157886589,
          "id": 1,
          "text": "the"
        },
        {
          "boundingBox": {
            "vertices": [
              { "x": 368, "y": 52 },
              { "x": 553, "y": 51 },
              { "x": 553, "y": 105 },
              { "x": 368, "y": 105 }
            ]
          },
          "confidence": 0.9890200556796326,
          "id": 2,
          "text": "words"
        },
        {
          "boundingBox": {
            "vertices": [
              { "x": 214, "y": 131 },
              { "x": 470, "y": 149 },
              { "x": 467, "y": 206 },
              { "x": 210, "y": 188 }
            ]
          },
          "confidence": 0.9933670202605895,
          "id": 3,
          "text": "hello,"
        },
        {
          "boundingBox": {
            "vertices": [
              { "x": 527, "y": 145 },
              { "x": 748, "y": 143 },
              { "x": 749, "y": 192 },
              { "x": 527, "y": 194 }
            ]
          },
          "confidence": 0.986053896846349,
          "id": 4,
          "text": "world"
        }
      ]
    }
  ],
  "stored": true,
  "text": "Print the words \nhello, world"
}
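Two common follow-up steps on a response like the one above are turning the four-vertex quadrilaterals into axis-aligned rectangles and flagging words whose confidence falls below a chosen threshold. The sketch below shows both on an abbreviated copy of the example data; the function names and the 0.99 threshold are illustrative assumptions, not part of the API.

```python
from typing import Dict, List, Tuple


def axis_aligned_box(vertices: List[Dict[str, int]]) -> Tuple[int, int, int, int]:
    """Smallest upright rectangle (left, top, right, bottom) covering the vertices."""
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def low_confidence_words(response: dict, threshold: float) -> List[str]:
    """Texts of all words whose confidence is below `threshold`, in reading order."""
    return [
        word["text"]
        for page in response["pages"]
        for word in page["words"]
        if word["confidence"] < threshold
    ]


# Vertices of the word "Print" from the example response.
print_vertices = [
    {"x": 65, "y": 52},
    {"x": 221, "y": 55},
    {"x": 221, "y": 104},
    {"x": 64, "y": 101},
]

# Abbreviated copy of the example response: texts and confidences only.
example = {
    "pages": [
        {
            "words": [
                {"text": "Print", "confidence": 0.9950619419121907},
                {"text": "the", "confidence": 0.9989913157886589},
                {"text": "words", "confidence": 0.9890200556796326},
                {"text": "hello,", "confidence": 0.9933670202605895},
                {"text": "world", "confidence": 0.986053896846349},
            ]
        }
    ]
}

print(axis_aligned_box(print_vertices))      # (64, 52, 221, 104)
print(low_confidence_words(example, 0.99))   # ['words', 'world']
```

Because the vertices describe a possibly rotated quadrilateral, the upright rectangle can be slightly larger than the detected region; it is convenient for cropping but loses the rotation information.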