
Document Parse is the new name for Layout Analysis. Layout Analysis will be deprecated on November 10, 2024; see the migration guide for details.

Document Parse

Upstage Document Parse is an AI model that automatically converts documents into HTML. It detects layout elements such as paragraphs, tables, and images to determine the structure of the document, serializes those elements in reading order, and finally converts the document into HTML.

Available models

Model	Availability	Release date	Description
document-parse	Latest	2024-09-10	Major update to the API spec; the model name changed to `document-parse`. Support for Microsoft Word, Excel, and PowerPoint. Markdown output for tables and list items. Base64 encoding of extracted images for all requested layout categories. `document-parse` is an alias for our latest Document Parse model (currently `document-parse-240910`).
layout-analysis-0.4.0 `beta`	Available until Nov 10, 2024	2024-07-04	Improved table recognition accuracy. Added new layout elements: `heading1`, `list`, `index`, and `footnote`. Changed the default value of the `ocr` field to `false`.
layout-analysis-0.3.1 `beta`	Deprecated	2024-06-17	Fixed a bug where extracted text from table elements was truncated.
layout-analysis-0.3.0 `beta`	Deprecated	2024-06-11	Improved inference speed by 2x for born-digital PDF documents.
layout-analysis-0.2.1 `beta`	Deprecated	2024-05-02	Removed unnecessary `<thead>` tags from table elements and fixed bugs.
layout-analyzer-0.2.0 `beta`	Deprecated	2024-04-04	Improved table recognition accuracy and layout detection performance.
layout-analyzer-0.1.0 `beta`	Deprecated	2024-02-28	A layout analyzer model which detects elements within a document, recognizes tables, and serializes elements according to reading order.

Request

POST https://api.upstage.ai/v1/document-ai/document-parse

Parameters

Request headers

Authorization string Required
Authentication token, format: Bearer API_KEY

Request body

document file Required
The document file to be processed. Supported file formats are listed under Requirements.

ocr string Optional
A string value indicating whether to perform OCR on the document before layout detection. Possible values are auto and force; the default is auto. With auto, OCR is performed on image inputs only: for PDFs and other non-image documents, the engine extracts text and coordinates directly from the document without converting it to images. With force, the engine converts the input file to images and performs OCR before layout detection.

coordinates boolean Optional
A boolean value indicating whether to return the bounding box coordinates of each layout element. The default is true.

output_formats List of string Optional
A list of strings specifying the formats in which each layout element's content is returned. Possible values are text, html, and markdown. The default value is ["html"].

model string Optional
A string value indicating which model to use for inference. Unless a specific model version is specified, the API uses the latest model.

base64_encoding List of string Optional
A list of strings indicating which layout categories should be returned as base64-encoded images. All category names are listed in the Layout Categories and HTML tags section. This is useful when you want to crop layout elements from the original document image and store or reuse them. For example, ["table"] returns base64-encoded images of all tables in the input document. Any layout category can be specified.
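The request body is sent as multipart/form-data (the curl example under Examples uses -F). The Python sketch below builds such a request with only the standard library and does not send it. Several details are assumptions rather than documented behavior: list-valued fields (output_formats, base64_encoding) are encoded as a list string such as "['table']", booleans are sent as "true"/"false", and build_parse_request and send are hypothetical helper names.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.upstage.ai/v1/document-ai/document-parse"


def build_parse_request(path, api_key, *, ocr="auto", coordinates=True,
                        output_formats=("html",), model="document-parse",
                        base64_encoding=()):
    """Build (but do not send) a multipart/form-data Document Parse request.

    Hypothetical helper. Assumption: list fields are sent as strings like
    "['table']" and booleans as "true"/"false".
    """
    boundary = uuid.uuid4().hex
    fields = {
        "ocr": ocr,
        "coordinates": "true" if coordinates else "false",
        "output_formats": str(list(output_formats)),
        "model": model,
    }
    if base64_encoding:
        fields["base64_encoding"] = str(list(base64_encoding))

    body = b""
    for name, value in fields.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()

    # The file itself goes in the required "document" field.
    file_path = Path(path)
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; '
        f'filename="{file_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body += file_path.read_bytes() + b"\r\n"
    body += f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON response.

    The server enforces a 5-minute timeout, so a longer client timeout adds nothing.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as send(build_parse_request("invoice.png", "UPSTAGE_API_KEY", base64_encoding=["table"])) would mirror the curl example with one extra field. If the third-party requests library is available, requests.post(url, headers=..., files=..., data=...) handles the multipart encoding for you.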

Requirements

Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
Maximum file size: 50MB
Maximum number of pages per file: 100 (for files exceeding 100 pages, only the first 100 pages are processed; to process more pages, use the asynchronous API)
Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting each page to an image at 150 DPI.
Supported character sets (for OCR inference): alphanumeric characters, Hangul, and Hanja. Hanzi and Kanji are in beta: they are available but not yet fully supported.
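These limits can be checked on the client before uploading. The sketch below is a hypothetical pre-flight helper, not part of the API. Assumptions: the file extension identifies the format (with .jpg and .tif accepted as aliases), 50MB means 50 * 1024 * 1024 bytes, and the caller supplies per-page pixel counts; for non-image files, pixels_at_150_dpi estimates a page's pixel count from its size in inches.

```python
from pathlib import Path

# Limits from the Requirements list; the .jpg/.tif aliases and 1024-based MB are assumptions.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
                        ".heic", ".docx", ".pptx", ".xlsx"}
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_PAGES = 100
MAX_PIXELS_PER_PAGE = 100_000_000
CONVERSION_DPI = 150


def pixels_at_150_dpi(width_in, height_in):
    """Approximate pixel count of a non-image page rendered at 150 DPI."""
    return round(width_in * CONVERSION_DPI) * round(height_in * CONVERSION_DPI)


def check_requirements(filename, size_bytes, page_pixel_counts):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50MB")
    if len(page_pixel_counts) > MAX_PAGES:
        # Not a hard error: the synchronous API simply stops after page 100.
        problems.append(f"{len(page_pixel_counts)} pages: only the first "
                        f"{MAX_PAGES} are processed; use the asynchronous API")
    for page, pixels in enumerate(page_pixel_counts, start=1):
        if pixels > MAX_PIXELS_PER_PAGE:
            problems.append(f"page {page} exceeds {MAX_PIXELS_PER_PAGE:,} pixels")
    return problems
```

For example, a US Letter page (8.5 x 11 inches) gives pixels_at_150_dpi(8.5, 11) == 1275 * 1650 == 2,103,750 pixels, far below the per-page limit.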


Hanja, Hanzi, and Kanji are writing systems based on Chinese characters, used in Korean, Chinese, and Japanese respectively. Despite their similarities, they have distinct visual forms, pronunciations, meanings, and usage conventions in their respective languages. For more information, see this article.


For best results, follow these guidelines:

Use high-resolution documents so that text is legible.

Ensure a minimum document width of 640 pixels.

The performance of the model might vary depending on the text size. Ensure that the smallest text in the image is at least 2.5% of the image's height. For example, if the image is 640 pixels tall, the smallest text should be at least 16 pixels tall.
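Both numeric guidelines reduce to simple arithmetic: 2.5% of the height is height / 40, so a 640-pixel-tall image needs text at least 16 pixels tall. The sketch below is a hypothetical helper; it assumes the caller already knows the image dimensions and has measured the height of the smallest text.

```python
MIN_WIDTH_PX = 640


def min_text_height_px(image_height_px):
    """Smallest recommended text height: 2.5% (= 1/40) of the image height, rounded up."""
    # Integer ceiling division avoids floating-point error in 0.025 * height.
    return -(-image_height_px // 40)


def legibility_warnings(width_px, height_px, smallest_text_px):
    """Return guideline violations for an image; an empty list means none."""
    warnings = []
    if width_px < MIN_WIDTH_PX:
        warnings.append(f"width {width_px}px is below {MIN_WIDTH_PX}px")
    needed = min_text_height_px(height_px)
    if smallest_text_px < needed:
        warnings.append(f"smallest text {smallest_text_px}px is below {needed}px")
    return warnings
```

For instance, legibility_warnings(600, 640, 10) reports both a too-narrow image and text below the 16-pixel minimum.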

Response

Functionality overview

Unknown characters: Characters that the OCR model does not recognize are treated as "unknown characters" and marked with the character �.
Response time: Standard documents containing up to 200 words take approximately three seconds. Longer documents can take tens of seconds.
Timeout: All requests are subject to a 5-minute server-side timeout.
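Because unrecognized characters are replaced rather than dropped, their share of the output is a cheap quality signal. The sketch below assumes the marker is U+FFFD REPLACEMENT CHARACTER (which is how � normally displays) and uses a hypothetical helper name; what ratio counts as too high is up to you.

```python
UNKNOWN_CHAR = "\ufffd"  # assumed to be the "�" marker described above


def unknown_char_stats(text):
    """Return (count, positions, ratio) of unknown-character markers in text."""
    positions = [i for i, ch in enumerate(text) if ch == UNKNOWN_CHAR]
    ratio = len(positions) / len(text) if text else 0.0
    return len(positions), positions, ratio
```

To run this on content.text, include "text" in output_formats, since the default returns only HTML.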

Return values

api string
A string representing the version of the API being used. A bump in the major version indicates a backward-incompatible update, while a minor version increase signifies a backward-compatible update.

model string
A string representing the version of the model being used.

elements list
A list of element objects, each describing one layout element: its id, category, page, content, and bounding box coordinates.

elements[].id integer
The unique identifier for an element.

elements[].category string
The category of the element. One of: paragraph, table, figure, chart, header, footer, caption, equation, heading1, list, index, footnote.

elements[].page integer
The page number for an element. Starts from 1.

elements[].content.text string
A string representing the text within the element, typically the OCR result for that element.

elements[].content.html string
A string representing the text within an element, in HTML format. See the Layout Categories and HTML tags section for more information.

elements[].content.markdown string
The element's content in Markdown. For categories other than heading1, table, and list, this is the same as the text field.

elements[].coordinates list
A list of four (x, y) points defining the corners of the element's bounding box. Each value is a decimal between 0 and 1, given to four decimal places, that expresses the position relative to the page. To get the absolute position in pixels, multiply x by the width and y by the height of the page image.

elements[].base64_encoding string
A base64-encoded image of the element, cropped from the page image using the element's coordinates. It is included only for categories listed in the base64_encoding request parameter.

content.text string
A string representing the text of the entire document, typically created by concatenating the pages' serialized texts.

content.html string
A string representing the text of the entire document in HTML format, typically created by concatenating the pages' serialized HTML snippets.

content.markdown string
A Markdown string representing the text of the entire document, typically created by concatenating the pages' serialized Markdown texts.

usage.pages integer
The total count of pages in the input file that have been processed and are chargeable.
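Once the JSON is decoded, the fields above can be consumed directly. The sketch below groups element categories by page, keeping the order in which elements appear in the list, and decodes any base64_encoding values into raw bytes. The helper names are hypothetical, and because this page does not state the image format of the cropped images, the bytes are returned without choosing a file extension.

```python
import base64
from collections import defaultdict


def categories_by_page(response):
    """Map page number -> element categories, in the order the elements are listed."""
    pages = defaultdict(list)
    for element in response.get("elements", []):
        pages[element["page"]].append(element["category"])
    return dict(pages)


def element_images(response):
    """Map element id -> decoded image bytes for elements that include base64_encoding."""
    return {
        element["id"]: base64.b64decode(element["base64_encoding"])
        for element in response.get("elements", [])
        if element.get("base64_encoding")
    }
```

Applied to the invoice response under Examples, categories_by_page would return a single page whose list starts with heading1 entries and ends with paragraph, heading1, and list.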

Understanding model output

Layout Categories and HTML tags

Upstage Document Parse identifies layout elements in input documents and generates HTML that represents the input in digitized form. The layout categories form a predefined set; the example figure below shows every layout category the model can detect.

The table below lists the layout categories and their corresponding HTML tags. When a layout category has an HTML tag with the same meaning in both printed and HTML documents, that tag is used. When no suitable HTML tag exists, the output uses a <p> tag whose data-category attribute names the category detected by the model.

Category	HTML
table	`<table> .. </table>`
figure	`<img> .. </img>`
chart	`<img data-category="chart"> .. </img>`
heading1	`<h1> .. </h1>`
header	`<header> .. </header>`
footer	`<footer> .. </footer>`
caption	`<caption> .. </caption>`
paragraph	`<p data-category="paragraph">..</p>`
equation	`<p data-category="equation">..</p>`
list	`<p data-category="list">..</p>`
index	`<p data-category="index">..</p>`
footnote	`<p data-category="footnote"> .. </p>`
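In the example response, each element in content.html carries an id attribute, so the table can be applied in reverse to recover each element's category from the HTML alone. The sketch below does this with Python's standard html.parser. The TAG_TO_CATEGORY mapping is transcribed from the table above; treating every tag with an id attribute as a layout element is an assumption based on the example output, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Reverse of the table above, for tags that identify a category on their own.
# All other categories are named by a data-category attribute.
TAG_TO_CATEGORY = {
    "table": "table",
    "img": "figure",  # <img data-category="chart"> is a chart instead
    "h1": "heading1",
    "header": "header",
    "footer": "footer",
    "caption": "caption",
}


class ElementCollector(HTMLParser):
    """Collect (id, category) for every start tag that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            # data-category, when present, takes precedence over the tag name.
            category = attrs.get("data-category") or TAG_TO_CATEGORY.get(tag)
            self.elements.append((attrs["id"], category))


def categories_from_html(html):
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Tags without an id, such as the <br> separators in the example response, are skipped.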

Using Equation output

For the equation category, the model returns the recognized equation in LaTeX format inside the <p data-category="equation"> tag, so that it displays correctly in popular equation rendering engines such as MathJax. The output for a sample equation image looks like this:

{
  "category": "equation",
  "content": {
      "html": "<p id='3' data-category='equation'>$$a_{n}=\\sum_{k=1}^{n}{\\frac{2k+1}{\\,1^{2}+2^{2}+3^{2}+\\cdots+k^{2}}}$$</p>",
      "markdown": "$$a_{n}=\\sum_{k=1}^{n}{\\frac{2k+1}{\\,1^{2}+2^{2}+3^{2}+\\cdots+k^{2}}}$$",
      "text": "n 2k+1 \nan  \nk=1 12+22+32 + · · +k2"
  }
}

In the response, the content.html and content.markdown fields contain the recognized equation in LaTeX format, while the content.text field contains the OCR result. Because the OCR model does not currently support equation recognition, content.text may not contain the correct equation text; for the equation category, it is provided only for backward compatibility.

Rendering the content.html value directly in an HTML file may not work as expected, because the JSON response escapes each \ in the LaTeX as \\. One way to resolve this is to assign the value through a JavaScript string literal, which turns each \\ back into a single \. The example below imports the MathJax library and sets the equation this way so that it renders properly.

<body>
    <div id="equation"></div> <!-- placeholder for equation -->
    <script type="text/javascript">
        // Put the API response's content.html value into the innerHTML field.
        // Inside a JavaScript string literal, each \\ becomes a single backslash.
        document.getElementById('equation').innerHTML = "<p id='5' data-category='equation'>$$f(x)=a_{0}+\\sum_{n=1}^{\\infty}\\left(a_{n}\\cos{\\frac{n\\pi x}{L}}+b_{n}\\sin{\\frac{n\\pi x}{L}}\\right)$$</p>";
    </script>
    <!-- MathJax typesets the page once it has loaded -->
    <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</body>

You can also obtain a base64-encoded image of each equation, cropped from the original document, by specifying ["equation"] in the base64_encoding field of the request body.
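Outside the browser, the escaping issue disappears as soon as the response is decoded as JSON: the doubled backslashes exist only in the raw JSON text. The Python sketch below shows this with json.loads and pulls the LaTeX out of the $$ ... $$ delimiters seen in the sample output. The helper name and the regular expression are illustrative assumptions, not part of the API.

```python
import json
import re

# Matches the $$ ... $$ display-math delimiters seen in the equation examples above.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def latex_from_element(element):
    """Return the LaTeX strings inside an equation element's content.html."""
    if element.get("category") != "equation":
        return []
    return DISPLAY_MATH.findall(element["content"]["html"])


# Raw JSON text as it arrives over the wire: each LaTeX backslash is escaped as \\.
raw = r"""{"category": "equation",
 "content": {"html": "<p id='3' data-category='equation'>$$a_{n}=\\sum_{k=1}^{n}k$$</p>"}}"""
element = json.loads(raw)
print(latex_from_element(element)[0])  # prints a_{n}=\sum_{k=1}^{n}k
```

The extracted string contains single backslashes and can be passed to any LaTeX renderer as-is.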

Four-decimal format coordinates

Starting with Document Parse 1.0, coordinates are relative, so you can use them to draw bounding boxes or crop elements from input documents at any image size. To get absolute positions, multiply the x and y values of each point in the coordinates field by the width and height of the requested image, respectively.

"coordinates": [
	{
	  "x": 0.0276,
	  "y": 0.0178
	},
	{
	  "x": 0.1755,
	  "y": 0.0178
	},
	{
	  "x": 0.1755,
	  "y": 0.0641
	},
	{
	  "x": 0.0276,
	  "y": 0.0641
	}
],

Why relative coordinates? Users previously had difficulty drawing bounding boxes on non-image documents. The Document Parse engine internally converts input documents to images for visual feature detection and recognition. To balance image quality and speed, it selects different DPI settings depending on the input document size, so without knowing the size of the internally converted image, users could not draw bounding boxes accurately. Relative coordinates solve this: each bounding box point is given as a fraction of the page size, so it works with a converted image of any size, as long as the width-to-height ratio is preserved.
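Converting back to pixels is one multiplication per axis. The sketch below assumes width and height are the pixel dimensions of the page image you are drawing on (for non-image documents, a rendering with the original aspect ratio); the function names and the 1240 x 1754 page size are hypothetical. Values are rounded to the nearest pixel.

```python
def to_pixels(coordinates, width, height):
    """Convert relative {"x", "y"} points to integer pixel positions."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in coordinates]


def pixel_box(coordinates, width, height):
    """Return (left, top, right, bottom) enclosing the element's corner points."""
    points = to_pixels(coordinates, width, height)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# The coordinates from the example above, on a hypothetical 1240 x 1754 page image.
coords = [
    {"x": 0.0276, "y": 0.0178},
    {"x": 0.1755, "y": 0.0178},
    {"x": 0.1755, "y": 0.0641},
    {"x": 0.0276, "y": 0.0641},
]
print(pixel_box(coords, 1240, 1754))  # (34, 31, 218, 112)
```

The (left, top, right, bottom) order is also what Pillow's Image.crop expects, if you use Pillow to crop elements.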

Examples

Request

curl -X POST https://api.upstage.ai/v1/document-ai/document-parse \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@invoice.png"

Response

{
  "api": "2.0",
  "content": {
    "html": "<h1 id='0' style='font-size:22px'>INVOICE</h1>\n<h1 id='1' style='font-size:20px'>Company<br>Upstage</h1>\n<br><h1 id='2' style='font-size:18px'>Invoice ID</h1>\n<br><h1 id='3' style='font-size:14px'>휴 INV-AJ355548</h1>\n<h1 id='4' style='font-size:18px'>Invoice Date</h1>\n<br><h1 id='5' style='font-size:18px'>9/7/1992</h1>\n<h1 id='6' style='font-size:16px'>Mamo<br>Lucy Park</h1>\n<h1 id='7' style='font-size:18px'>Address</h1>\n<br><h1 id='8' style='font-size:16px'>7 Pepper Wood Street, 130 Stone Comer<br>Terrace<br>Wilkes Barre, Pennsylvania, 18768<br>United States</h1>\n<h1 id='9' style='font-size:16px'>Email</h1>\n<br><h1 id='10' style='font-size:16px'>Ikitchenman0@arizona.edu</h1>\n<br><h1 id='11' style='font-size:20px'>Service Details Form</h1>\n<h1 id='12' style='font-size:16px'>Name<br>Sung Kim</h1>\n<h1 id='13' style='font-size:16px'>260 'ess<br>Gwangovolungang:co 338, Gyeongg do.<br>Sanghyeon-dong, Sui-gu<br>Yongin-si, South Korea</h1>\n<h1 id='14' style='font-size:18px'>Additional Request</h1>\n<br><p id='15' data-category='paragraph' style='font-size:14px'>Vivamus vestibulum sagittis sapien. Cum sociis natoque<br>penatibus 항목 magnis dfs parturient montes, nascetur ridiculus<br>mus.</p>\n<h1 id='16' style='font-size:14px'>TERMS AND CONDITIONS</h1>\n<p id='17' data-category='list' style='font-size:14px'>L TM Seir that not be lable 1층 the Buyer drectly indirectly for any loun or damage sufflered by 전액 Buyer<br>2. The 별 www. the product for ore 과 관한 from the date 설 shipment.<br>3. Any ourchase order received by ~ sele - be interpreted 추가 accepting the offer Ma the 18% offer writing The buyer may<br>purchase 15 The offer My the Terms and Conditions the Seller included The offer</p>",
    "markdown": "",
    "text": ""
  },
  "elements": [
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='0' style='font-size:22px'>INVOICE</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0648,
          "y": 0.0517
        },
        {
          "x": 0.2405,
          "y": 0.0517
        },
        {
          "x": 0.2405,
          "y": 0.0953
        },
        {
          "x": 0.0648,
          "y": 0.0953
        }
      ],
      "id": 0,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='1' style='font-size:20px'>Company<br>Upstage</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0657,
          "y": 0.2651
        },
        {
          "x": 0.1606,
          "y": 0.2651
        },
        {
          "x": 0.1606,
          "y": 0.3168
        },
        {
          "x": 0.0657,
          "y": 0.3168
        }
      ],
      "id": 1,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='2' style='font-size:18px'>Invoice ID</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.5712,
          "y": 0.0748
        },
        {
          "x": 0.671,
          "y": 0.0748
        },
        {
          "x": 0.671,
          "y": 0.101
        },
        {
          "x": 0.5712,
          "y": 0.101
        }
      ],
      "id": 2,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='3' style='font-size:14px'>휴 INV-AJ355548</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.788,
          "y": 0.076
        },
        {
          "x": 0.9287,
          "y": 0.076
        },
        {
          "x": 0.9287,
          "y": 0.0972
        },
        {
          "x": 0.788,
          "y": 0.0972
        }
      ],
      "id": 3,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='4' style='font-size:18px'>Invoice Date</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.572,
          "y": 0.1232
        },
        {
          "x": 0.6941,
          "y": 0.1232
        },
        {
          "x": 0.6941,
          "y": 0.1484
        },
        {
          "x": 0.572,
          "y": 0.1484
        }
      ],
      "id": 4,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='5' style='font-size:18px'>9/7/1992</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.8525,
          "y": 0.1224
        },
        {
          "x": 0.9293,
          "y": 0.1224
        },
        {
          "x": 0.9293,
          "y": 0.1468
        },
        {
          "x": 0.8525,
          "y": 0.1468
        }
      ],
      "id": 5,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='6' style='font-size:16px'>Mamo<br>Lucy Park</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0658,
          "y": 0.3331
        },
        {
          "x": 0.15,
          "y": 0.3331
        },
        {
          "x": 0.15,
          "y": 0.3846
        },
        {
          "x": 0.0658,
          "y": 0.3846
        }
      ],
      "id": 6,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='7' style='font-size:18px'>Address</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0662,
          "y": 0.4061
        },
        {
          "x": 0.1482,
          "y": 0.4061
        },
        {
          "x": 0.1482,
          "y": 0.4286
        },
        {
          "x": 0.0662,
          "y": 0.4286
        }
      ],
      "id": 7,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='8' style='font-size:16px'>7 Pepper Wood Street, 130 Stone Comer<br>Terrace<br>Wilkes Barre, Pennsylvania, 18768<br>United States</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.067,
          "y": 0.4332
        },
        {
          "x": 0.3962,
          "y": 0.4332
        },
        {
          "x": 0.3962,
          "y": 0.5173
        },
        {
          "x": 0.067,
          "y": 0.5173
        }
      ],
      "id": 8,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='9' style='font-size:16px'>Email</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0656,
          "y": 0.5326
        },
        {
          "x": 0.1235,
          "y": 0.5326
        },
        {
          "x": 0.1235,
          "y": 0.5579
        },
        {
          "x": 0.0656,
          "y": 0.5579
        }
      ],
      "id": 9,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='10' style='font-size:16px'>Ikitchenman0@arizona.edu</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0654,
          "y": 0.5614
        },
        {
          "x": 0.2874,
          "y": 0.5614
        },
        {
          "x": 0.2874,
          "y": 0.5834
        },
        {
          "x": 0.0654,
          "y": 0.5834
        }
      ],
      "id": 10,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<br><h1 id='11' style='font-size:20px'>Service Details Form</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.5727,
          "y": 0.2112
        },
        {
          "x": 0.8149,
          "y": 0.2112
        },
        {
          "x": 0.8149,
          "y": 0.2417
        },
        {
          "x": 0.5727,
          "y": 0.2417
        }
      ],
      "id": 11,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='12' style='font-size:16px'>Name<br>Sung Kim</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.573,
          "y": 0.2657
        },
        {
          "x": 0.6563,
          "y": 0.2657
        },
        {
          "x": 0.6563,
          "y": 0.3177
        },
        {
          "x": 0.573,
          "y": 0.3177
        }
      ],
      "id": 12,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='13' style='font-size:16px'>260 'ess<br>Gwangovolungang:co 338, Gyeongg do.<br>Sanghyeon-dong, Sui-gu<br>Yongin-si, South Korea</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.5724,
          "y": 0.3401
        },
        {
          "x": 0.891,
          "y": 0.3401
        },
        {
          "x": 0.891,
          "y": 0.4232
        },
        {
          "x": 0.5724,
          "y": 0.4232
        }
      ],
      "id": 13,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='14' style='font-size:18px'>Additional Request</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0648,
          "y": 0.6681
        },
        {
          "x": 0.2482,
          "y": 0.6681
        },
        {
          "x": 0.2482,
          "y": 0.6962
        },
        {
          "x": 0.0648,
          "y": 0.6962
        }
      ],
      "id": 14,
      "page": 1
    },
    {
      "category": "paragraph",
      "content": {
        "html": "<br><p id='15' data-category='paragraph' style='font-size:14px'>Vivamus vestibulum sagittis sapien. Cum sociis natoque<br>penatibus 항목 magnis dfs parturient montes, nascetur ridiculus<br>mus.</p>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.4191,
          "y": 0.6684
        },
        {
          "x": 0.9132,
          "y": 0.6684
        },
        {
          "x": 0.9132,
          "y": 0.7332
        },
        {
          "x": 0.4191,
          "y": 0.7332
        }
      ],
      "id": 15,
      "page": 1
    },
    {
      "category": "heading1",
      "content": {
        "html": "<h1 id='16' style='font-size:14px'>TERMS AND CONDITIONS</h1>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0649,
          "y": 0.8303
        },
        {
          "x": 0.2506,
          "y": 0.8303
        },
        {
          "x": 0.2506,
          "y": 0.8523
        },
        {
          "x": 0.0649,
          "y": 0.8523
        }
      ],
      "id": 16,
      "page": 1
    },
    {
      "category": "list",
      "content": {
        "html": "<p id='17' data-category='list' style='font-size:14px'>L TM Seir that not be lable 1층 the Buyer drectly indirectly for any loun or damage sufflered by 전액 Buyer<br>2. The 별 www. the product for ore 과 관한 from the date 설 shipment.<br>3. Any ourchase order received by ~ sele - be interpreted 추가 accepting the offer Ma the 18% offer writing The buyer may<br>purchase 15 The offer My the Terms and Conditions the Seller included The offer</p>",
        "markdown": "",
        "text": ""
      },
      "coordinates": [
        {
          "x": 0.0679,
          "y": 0.8717
        },
        {
          "x": 0.9261,
          "y": 0.8717
        },
        {
          "x": 0.9261,
          "y": 0.9558
        },
        {
          "x": 0.0679,
          "y": 0.9558
        }
      ],
      "id": 17,
      "page": 1
    }
  ],
  "model": "document-parse-240910",
  "usage": {
    "pages": 1
  }
}
