Document Parse (Asynchronous)
How it works
Users can submit an inference request for an image or a PDF document of up to 1000 pages using the asynchronous Inference Request API in Document Parse. Upon receiving the request, the API immediately returns a request ID. After the input file is transferred and validated, it is divided into batches of 10 pages each, and inference is performed on each batch. Users can check the status of each batch in real time using the Inference Response API, and the inference results of each batch can be downloaded using the download_url provided in the response. The results remain available for download for 30 days from the time of the request.
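The full flow can be sketched end to end as follows. This is a minimal illustration in Python using the requests library; the file name, polling interval, and output file names are assumptions, and the terminal status values checked ("completed", "failed") follow the batch status values described later on this page.

```python
import time
import requests

API_KEY = "UPSTAGE_API_KEY"  # replace with your API key
BASE_URL = "https://api.upstage.ai/v1/document-ai"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Submit the document; the API returns a request ID immediately.
with open("sample.pdf", "rb") as f:  # illustrative file name
    response = requests.post(
        f"{BASE_URL}/async/document-parse", headers=headers, files={"document": f}
    )
response.raise_for_status()
request_id = response.json()["request_id"]

# 2. Poll the request until all batches have been processed.
while True:
    result = requests.get(f"{BASE_URL}/requests/{request_id}", headers=headers).json()
    if result["status"] in ("completed", "failed"):
        break
    time.sleep(5)  # illustrative polling interval

# 3. Download the inference result of each completed batch.
for batch in result["batches"]:
    if batch["status"] == "completed":
        content = requests.get(batch["download_url"]).content
        with open(f"batch_{batch['id']}.json", "wb") as out:  # illustrative output name
            out.write(content)
```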
Inference Request
POST https://api.upstage.ai/v1/document-ai/async/document-parse
Parameters
Request headers
name | type | required |
---|---|---|
Authorization | string | Required |
Request body
name | type | required |
---|---|---|
document | file | Required |
ocr | string | Optional |
coordinates | boolean | Optional |
output_formats | List of string | Optional |
model | string | Optional |
base64_encoding | List of string | Optional |
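For reference, a hedged sketch of a submission that sets the optional parameters above (Python with the requests library). The specific values, and the convention of passing list parameters as JSON-encoded strings in the multipart form, are assumptions to be checked against the parameter descriptions.

```python
import requests

url = "https://api.upstage.ai/v1/document-ai/async/document-parse"
headers = {"Authorization": "Bearer UPSTAGE_API_KEY"}
data = {
    "ocr": "auto",                             # illustrative value
    "coordinates": "true",
    "output_formats": '["html", "markdown"]',  # list parameter as a JSON string (assumption)
    "base64_encoding": '["table"]',            # illustrative value
    "model": "document-parse",
}

with open("invoice.png", "rb") as f:
    response = requests.post(url, headers=headers, files={"document": f}, data=data)

print(response.json())  # {"request_id": "..."}
```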
Requirements
- Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
- Maximum file size: 50MB
- Maximum number of pages per file: 1000 pages (for files exceeding 1000 pages, the API will return an error)
- Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting to images at a standard of 150 DPI.
- Supported character sets (for OCR inference): Alphanumeric, Hangul, and Hanja are supported. Hanzi and Kanji are in beta, which means they are available but not fully supported.
Hanja, Hanzi, and Kanji are writing systems based on Chinese characters used in Korean, Chinese, and Japanese writing systems. Despite sharing similarities, they possess distinct visual representations, pronunciations, meanings, and usage conventions within their respective linguistic contexts. For more information, see this article.
For best results, follow these guidelines (a pre-flight check sketch follows the list):
- Use high-resolution documents to ensure legibility of text.
- Ensure a minimum document width of 640 pixels.
- The performance of the model might vary depending on the text size. Ensure that the smallest text in the image is at least 2.5% of the image's height. For example, if the image is 640 pixels tall, the smallest text should be at least 16 pixels tall.
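A minimal pre-flight check sketch along these lines is shown below (Python). The 50MB, 1000-page, 100,000,000-pixel, and 640-pixel limits come from the lists above; the use of pypdf and Pillow to count pages and measure dimensions locally is an assumption about your tooling.

```python
import os
from pypdf import PdfReader  # local tooling assumption, for PDF page counts
from PIL import Image        # local tooling assumption, for image dimensions

# Supported formats from the requirements above (.jpg added as the common JPEG extension).
SUPPORTED = {".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".heic", ".docx", ".pptx", ".xlsx"}
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".bmp", ".tiff"}

def preflight(path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported format: {ext}")
    if os.path.getsize(path) > 50 * 1024 * 1024:
        raise ValueError("file exceeds the 50MB limit")
    if ext == ".pdf" and len(PdfReader(path).pages) > 1000:
        raise ValueError("file exceeds the 1000-page limit")
    if ext in IMAGE_EXTS:
        width, height = Image.open(path).size
        if width * height > 100_000_000:
            raise ValueError("page exceeds the 100,000,000-pixel limit")
        if width < 640:
            raise ValueError("document width is below the recommended 640 pixels")
```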
Returns
Request ID for the request.
{
"request_id": "e7b1b3b0-1b3b-4b3b-8b3b-1b3b3b3b3b3b"
}
Retrieve inference results
GET https://api.upstage.ai/v1/document-ai/requests/{request_id}
Parameters
Request headers
name | type | required |
---|---|---|
Authorization | string | Required |
Returns
field | type |
---|---|
id | string |
status | string |
model | string |
failure_message | string |
total_pages | integer |
completed_pages | integer |
batches | list |
batches[].id | string |
batches[].model | string |
batches[].status | string |
batches[].failure_message | string |
batches[].download_url | string |
batches[].start_page | integer |
batches[].end_page | integer |
batches[].requested_at | date |
batches[].updated_at | date |
requested_at | date |
updated_at | date |
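A sketch that reads these fields to report progress and download each batch as soon as it completes (Python with the requests library). The field names follow the list above; the polling interval and local output file names are illustrative.

```python
import time
import requests

def collect_results(request_id: str, api_key: str) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    url = f"https://api.upstage.ai/v1/document-ai/requests/{request_id}"
    downloaded = set()
    while True:
        result = requests.get(url, headers=headers).json()
        print(f"{result['completed_pages']}/{result['total_pages']} pages done "
              f"(status: {result['status']})")
        # Download each batch as soon as it completes; download_url is short-lived.
        for batch in result["batches"]:
            if batch["status"] == "completed" and batch["id"] not in downloaded:
                data = requests.get(batch["download_url"]).content
                name = f"pages_{batch['start_page']}-{batch['end_page']}.json"
                with open(name, "wb") as f:
                    f.write(data)
                downloaded.add(batch["id"])
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(5)
```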
Inference History
GET https://api.upstage.ai/v1/document-ai/requests
Parameters
Request headers
name | type | required |
---|---|---|
Authorization | string | Required |
Returns
field | type |
---|---|
id | string |
status | string |
model | string |
requested_at | date |
updated_at | date |
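A short sketch that lists recent requests from this endpoint (Python with the requests library; printing only these four fields is illustrative):

```python
import requests

headers = {"Authorization": "Bearer UPSTAGE_API_KEY"}
response = requests.get("https://api.upstage.ai/v1/document-ai/requests", headers=headers)

for req in response.json()["requests"]:
    print(req["id"], req["model"], req["status"], req["requested_at"])
```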
Error Handling
In asynchronous APIs, errors can occur in three different scenarios: request errors, batch-inference errors, or failures to retrieve the request result.
Request error
This error occurs when the input document cannot be handled by the API or when an error occurs during preprocessing, such as image conversion or page splitting. In case of a request failure, instead of returning a request ID, the API returns an object with an error code and message.
{
  "code": xxx,      # HTTP status code
  "message": "",    # detailed error message
}
The table below shows the error message and HTTP status code for typical error cases; a handling sketch follows the table.
error message | http status code |
---|---|
invalid model name | 400 |
no document in the request | 400 |
uploaded document is too large. max allowed size is 50MB | 413 |
unsupported document format | 415 |
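A hedged sketch of handling a failed submission based on the error object and the table above (Python with the requests library; branching on specific status codes is illustrative):

```python
import requests

with open("invoice.png", "rb") as f:
    response = requests.post(
        "https://api.upstage.ai/v1/document-ai/async/document-parse",
        headers={"Authorization": "Bearer UPSTAGE_API_KEY"},
        files={"document": f},
    )

body = response.json()
if "request_id" in body:
    print("accepted:", body["request_id"])
else:
    # On failure, the API returns an error object instead of a request ID.
    code, message = body.get("code"), body.get("message")
    if code == 413:
        print("document too large; compress or split it:", message)
    elif code == 415:
        print("unsupported format; convert it before uploading:", message)
    else:
        print(f"request failed ({code}): {message}")
```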
Batch inference error
Because the engine divides the input document into batches of 10 pages each, each batch may have a different status for model inference. As the input file has already been validated before inference, batch inference errors are most likely caused by internal server errors. When this occurs, the API response for retrieving the result shows a failed status in the batches field, along with a failure_message. The code block below shows the schema of the batches field in the API response for inference results; a handling sketch follows the schema.
"batches": {
"id": number;
"model": string;
"status": scheduled | started | completed | failed | retrying;
"failure_message": string;
"download_url": string;
"start_page": number;
"end_page": number;
"requested_at": date;
"updated_at": date;
}[]
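A sketch of surfacing failed batches from this structure (Python). Re-submitting the affected page range is mentioned only as one possible recovery strategy and is not an API feature.

```python
def report_failed_batches(result: dict) -> list[dict]:
    """Collect failed batches from an inference result and print their failure messages."""
    failed = [b for b in result["batches"] if b["status"] == "failed"]
    for batch in failed:
        print(f"batch {batch['id']} (pages {batch['start_page']}-{batch['end_page']}) "
              f"failed: {batch['failure_message']}")
    # The input was validated before inference, so failures are usually transient
    # server-side errors; re-submitting the affected page range is one option.
    return failed
```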
Failures to retrieve the request result
Normally, this error can occur when the download_url for each batch fails to generate. For security reasons, each download URL is valid for only 15 minutes, and a new one is generated every time the user retrieves the inference result. When a user submits a long document, the server needs to create a download URL for each batch, and it may occasionally struggle to create a large number of pre-signed URLs.
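Because each download_url expires after 15 minutes and a fresh one is generated on every retrieval, one hedged recovery pattern is to re-fetch the inference result and retry the download (Python with the requests library; the retry count is illustrative):

```python
import requests

def download_batch(request_id: str, batch_id: int, api_key: str, retries: int = 3) -> bytes:
    headers = {"Authorization": f"Bearer {api_key}"}
    url = f"https://api.upstage.ai/v1/document-ai/requests/{request_id}"
    for _ in range(retries):
        # Re-fetching the result regenerates a fresh, short-lived download_url per batch.
        result = requests.get(url, headers=headers).json()
        batch = next(b for b in result["batches"] if b["id"] == batch_id)
        response = requests.get(batch["download_url"])
        if response.ok:
            return response.content
    raise RuntimeError(f"could not download batch {batch_id} after {retries} attempts")
```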
Example
Inference Request
Request
curl -X POST https://api.upstage.ai/v1/document-ai/async/document-parse \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@invoice.png" \
-F "ocr=true"
Returns
{
"request_id": "e7b1b3b0-1b3b-4b3b-8b3b-1b3b3b3b3b3b"
}
Retrieve inference results
Request
curl https://api.upstage.ai/v1/document-ai/requests/REQUEST_ID \
-H "Authorization: Bearer UPSTAGE_API_KEY"
Returns
{
"id": "e7b1b3b0-1b3b-4b3b-8b3b-1b3b3b3b3b3b",
"status": "completed",
"model": "document-parse",
"failure_message": "",
"total_pages": 28,
"completed_pages": 28,
"batches": [
{
"id": 0,
"model": "document-parse-240910",
"status": "completed",
"failure_message": "",
"download_url": "https://download-url",
"start_page": 1,
"end_page": 10,
"requested_at": "2024-07-01T14:47:01.863880448Z",
"updated_at": "2024-07-01T14:47:15.901662097Z"
},
{
"id": 1,
"model": "document-parse-240910",
"status": "completed",
"failure_message": "",
"download_url": "https://download-url",
"start_page": 11,
"end_page": 20,
"requested_at": "2024-07-01T14:47:01.863880448Z",
"updated_at": "2024-07-01T14:47:13.59782266Z"
},
{
"id": 2,
"model": "document-parse-240910",
"status": "completed",
"failure_message": "",
"download_url": "https://download-url",
"start_page": 21,
"end_page": 28,
"requested_at": "2024-07-01T14:47:01.863880448Z",
"updated_at": "2024-07-01T14:47:37.061016766Z"
}
],
"requested_at": "2024-07-01T14:47:01.863880448Z",
"completed_at": "2024-07-01T14:47:43.023542045Z"
}
Inference History
Request
curl https://api.upstage.ai/v1/document-ai/requests \
-H "Authorization: Bearer UPSTAGE_API_KEY"
Returns
{
"requests": [
{
"id": "e7b1b3b0-1b3b-4b3b-8b3b-1b3b3b3b3b3b",
"status": "completed",
"model": "document-parse",
"requested_at": "2024-07-01T14:47:01.863880448Z",
"completed_at": "2024-07-01T14:47:43.023542045Z"
}
]
}