Document OCR

Extract all text from any document.

Available models

Model     | Availability | Release date | Description
ocr-2.2.1 | Latest       | 2024-06-11   | Additional support for the Japanese character set.
ocr-2.1.1 | Deprecated   | 2024-04-04   | Improved text detection for single characters and special characters.
ocr-2.1.0 | Deprecated   | 2024-02-28   | Additional support for Hanja, Hanzi, and Kanji. Improved accuracy and performance.
ocr-1.0.0 | Deprecated   | 2023-04-10   | An OCR model specialized for English and Korean. Resilient against real-world images, including wrinkled paper and rotated text.

Request

POST https://api.upstage.ai/v1/document-ai/ocr

Parameters

Request headers

Authorization string Required
Authentication token, format: Bearer API_KEY

Request body

document file Required
The document file to be processed. Supported file formats are listed under Requirements below.

schema string Optional
An optional parameter that specifies the response format. If set, the output is converted to the response format of the corresponding OCR API. Valid values are "clova", "google", or None. In every case the results are produced by Upstage models; the named service providers are not involved.
Default value is None.

model string Optional
An optional parameter that specifies the model version to be used. Available models are listed in the Available models table above.
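The curl example below sends the file as a multipart/form-data field using -F. As a minimal standard-library Python sketch, the same request can be assembled by hand. The helper name build_ocr_request is hypothetical, and sending model and schema as plain form fields next to document is an assumption based on the parameter list above. The sketch only builds the request; it does not send it.

```python
import urllib.request
import uuid
from typing import Optional

OCR_URL = "https://api.upstage.ai/v1/document-ai/ocr"


def build_ocr_request(
    api_key: str,
    filename: str,
    data: bytes,
    model: Optional[str] = None,
    schema: Optional[str] = None,
) -> urllib.request.Request:
    """Assemble (but do not send) a multipart/form-data OCR request.

    Hypothetical helper: sending `model` and `schema` as plain form fields
    next to `document` is an assumption based on the parameter list.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Optional fields are omitted when None, leaving the server-side defaults in effect.
    for name, value in (("model", model), ("schema", schema)):
        if value is not None:
            field_text = (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            )
            parts.append(field_text.encode())

    # Required file field, mirroring curl's -F "document=@hello.png".
    # Using application/octet-stream as the part type is an assumption.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + data + b"\r\n")

    # The closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        OCR_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen with a timeout argument. Picking a value somewhat above the 3-minute server-side limit is a judgment call, not a documented recommendation.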

Requirements

  • Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
  • Maximum file size: 50MB
  • Maximum number of pages per file: 30 pages. For files exceeding 30 pages, only the first 30 pages are processed.
  • Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after rendering each page as an image at 300 DPI.
  • Supported character sets: Alphanumeric characters, Hangul, and Hanja are supported. Hanzi and Kanji support is in beta: available, but not fully supported.
  • Text size: Optimized for text smaller than approximately 30% of the page size. Inputs that exceed this may result in a response error.
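Several of these limits can be checked on the client before uploading. The sketch below is hypothetical: it reads "50MB" as 50 * 1024 * 1024 bytes and accepts .jpg and .tif as aliases for JPEG and TIFF, both assumptions not stated above. Per-page pixel counts are taken as input, because reading page sizes from a PDF or DOCX requires a separate library.

```python
import os
from dataclasses import dataclass, field
from typing import List

# Limits copied from the Requirements list. Assumptions: "50MB" is read as
# 50 * 1024 * 1024 bytes, and .jpg/.tif are accepted as JPEG/TIFF aliases.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
    ".heic", ".docx", ".pptx", ".xlsx",
}
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_PAGES = 30
MAX_PIXELS_PER_PAGE = 100_000_000
RENDER_DPI = 300


def pixels_at_dpi(width_in: float, height_in: float, dpi: int = RENDER_DPI) -> int:
    """Pixel count of a non-image page of the given size (inches) rendered at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)


@dataclass
class PreflightResult:
    errors: List[str] = field(default_factory=list)
    pages_processed: int = 0


def preflight(filename: str, size_bytes: int, page_pixels: List[int]) -> PreflightResult:
    """Check a file against the documented limits; `page_pixels` has one entry per page."""
    result = PreflightResult()
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        result.errors.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        result.errors.append("file exceeds 50MB")
    # Pages past the 30th are not an error; they are simply not processed.
    processed = page_pixels[:MAX_PAGES]
    result.pages_processed = len(processed)
    for number, pixels in enumerate(processed, start=1):
        if pixels > MAX_PIXELS_PER_PAGE:
            result.errors.append(f"page {number} exceeds 100,000,000 pixels")
    return result
```

For example, a US Letter page (8.5 by 11 inches) rendered at 300 DPI is 2550 by 3300 = 8,415,000 pixels, well under the per-page limit.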

Hanja, Hanzi, and Kanji are the Chinese characters as used in Korean, Chinese, and Japanese writing, respectively. Despite their similarities, they differ in visual form, pronunciation, meaning, and usage conventions within their respective languages. For more information, see this article.

Response

Functionality overview

  • Data hierarchy: The API currently organizes results into three levels: document, page, and word.
  • Unknown characters: Characters that the model detects but cannot recognize are considered "unknown characters" and are marked by the character .
  • Response time: Files with fewer than 30 words take approximately two seconds; longer documents can take tens of seconds.
  • Timeout: All requests are subject to a 3-minute server-side timeout.

Return values

apiVersion string
A string representing the version of the API being used. A bump in the major version indicates a backward-incompatible update, while a minor version increase signifies a backward-compatible update.
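A client can apply this versioning rule to decide whether a response is safe to parse with code written against an earlier version. A minimal sketch, assuming apiVersion uses the "major.minor" form shown in the example ("1.1"). The function names are hypothetical, and rejecting a lower minor version than the client expects is a conservative choice of this sketch rather than documented behavior.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split an apiVersion string such as "1.1" into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    # Treat a missing minor component as 0 (an assumption; the docs only show "1.1").
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_compatible(client_version: str, reported_version: str) -> bool:
    """Return True if a client written for `client_version` can read `reported_version`.

    Same major version: compatible, since minor bumps are backward-compatible.
    Different major version: assume breaking changes.
    Requiring reported minor >= client minor is a conservative extra check.
    """
    c_major, c_minor = parse_version(client_version)
    r_major, r_minor = parse_version(reported_version)
    return r_major == c_major and r_minor >= c_minor
```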

confidence float
A float value between 0 and 1, representing the overall confidence score for the entire document. A higher value indicates greater confidence in the accuracy of the document's content.

mimeType string
The MIME type of the input file (e.g., "multipart/form-data").

modelVersion string
A string representing the version of the model being used.

numBilledPages integer
The total count of pages in the input file that have been processed and are chargeable.

pages list
A list of page objects containing information about the words on each page and the overall confidence for the page.

pages[].confidence float
A float value between 0 and 1, representing the overall confidence score for the entire page. A higher value indicates greater confidence in the accuracy of the page's content.

pages[].height integer
The height of the page in pixels.

pages[].width integer
The width of the page in pixels.

pages[].text string
A string representing the text of the entire page, typically created by concatenating the individual words' serialized texts.

pages[].words list
A list of word objects within a page, each containing information about the word's text, confidence score, and bounding box.

pages[].words[].text string
A string representing the text of the word.

pages[].words[].confidence float
A float value between 0 and 1, representing the confidence score for the word's detection and recognition. A higher value indicates greater confidence in the accuracy of the word.

*.id integer
The unique identifier for an object.

*.boundingBox object
An object containing information about the bounding box of a word, defined by the vertices of the box.

*.vertices list
A list of vertices (x, y coordinates) that define the corners of the bounding box.

*.x integer
The x-coordinate of a vertex in the bounding box.

*.y integer
The y-coordinate of a vertex in the bounding box.
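The four vertices do not necessarily form an axis-aligned rectangle; in the example response below, the box for "hello," is visibly tilted. The hypothetical helpers below compute an enclosing axis-aligned rectangle and the polygon's area using the shoelace formula. They assume the vertices are listed in order around the box, as they appear to be in the example (top-left, top-right, bottom-right, bottom-left).

```python
from typing import Dict, List, Tuple

Vertex = Dict[str, int]


def axis_aligned_box(vertices: List[Vertex]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the smallest rectangle enclosing all vertices."""
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(vertices: List[Vertex]) -> float:
    """Area of the (possibly rotated) box via the shoelace formula.

    Assumes the vertices are ordered around the polygon, not shuffled.
    """
    n = len(vertices)
    twice_area = 0
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        twice_area += a["x"] * b["y"] - b["x"] * a["y"]
    return abs(twice_area) / 2
```

For the word "Print" in the example response, axis_aligned_box returns (64, 52, 221, 104) and polygon_area returns 7670.0.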

stored boolean
A boolean indicating whether the input was stored. If true, the data has been stored. If false, the data was discarded immediately.

text string
A string representing the text of the entire document, typically created by concatenating the pages' serialized texts.

metadata object
An object containing metadata about a document, such as page size and page numbers.
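Because words are nested inside pages, most post-processing walks the pages list and then each page's words list. A minimal sketch that flags words below a caller-chosen confidence threshold, for example to route them to manual review. The function names are hypothetical, and the API does not recommend any particular threshold.

```python
from typing import Iterator, List, Tuple


def iter_words(response: dict) -> Iterator[Tuple[int, dict]]:
    """Yield (page id, word object) for every word, page by page."""
    for page in response.get("pages", []):
        for word in page.get("words", []):
            yield page["id"], word


def low_confidence_words(
    response: dict, threshold: float
) -> List[Tuple[int, int, str, float]]:
    """Return (page id, word id, text, confidence) for words below `threshold`."""
    return [
        (page_id, word["id"], word["text"], word["confidence"])
        for page_id, word in iter_words(response)
        if word["confidence"] < threshold
    ]
```

Applied to the example response below with a threshold of 0.99, it flags "words" (about 0.989) and "world" (about 0.986).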

Example


Request

hello.png
curl -X POST https://api.upstage.ai/v1/document-ai/ocr \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@hello.png"

Response

{
    "apiVersion": "1.1",
    "confidence": 0.9924988460974842,
    "metadata": {
        "pages": [
            {
                "height": 256,
                "page": 1,
                "width": 786
            }
        ]
    },
    "mimeType": "multipart/form-data",
    "modelVersion": "ocr-2.2.1",
    "numBilledPages": 1,
    "pages": [
        {
            "confidence": 0.9924988460974842,
            "height": 256,
            "id": 0,
            "text": "Print the words \nhello, world",
            "width": 786,
            "words": [
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 65,
                                "y": 52
                            },
                            {
                                "x": 221,
                                "y": 55
                            },
                            {
                                "x": 221,
                                "y": 104
                            },
                            {
                                "x": 64,
                                "y": 101
                            }
                        ]
                    },
                    "confidence": 0.9950619419121907,
                    "id": 0,
                    "text": "Print"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 243,
                                "y": 49
                            },
                            {
                                "x": 341,
                                "y": 52
                            },
                            {
                                "x": 340,
                                "y": 105
                            },
                            {
                                "x": 241,
                                "y": 102
                            }
                        ]
                    },
                    "confidence": 0.9989913157886589,
                    "id": 1,
                    "text": "the"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 368,
                                "y": 52
                            },
                            {
                                "x": 553,
                                "y": 51
                            },
                            {
                                "x": 553,
                                "y": 105
                            },
                            {
                                "x": 368,
                                "y": 105
                            }
                        ]
                    },
                    "confidence": 0.9890200556796326,
                    "id": 2,
                    "text": "words"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 214,
                                "y": 131
                            },
                            {
                                "x": 470,
                                "y": 149
                            },
                            {
                                "x": 467,
                                "y": 206
                            },
                            {
                                "x": 210,
                                "y": 188
                            }
                        ]
                    },
                    "confidence": 0.9933670202605895,
                    "id": 3,
                    "text": "hello,"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 527,
                                "y": 145
                            },
                            {
                                "x": 748,
                                "y": 143
                            },
                            {
                                "x": 749,
                                "y": 192
                            },
                            {
                                "x": 527,
                                "y": 194
                            }
                        ]
                    },
                    "confidence": 0.986053896846349,
                    "id": 4,
                    "text": "world"
                }
            ]
        }
    ],
    "stored": true,
    "text": "Print the words \nhello, world"
}