Document Parse is the new name for Layout Analysis. Layout Analysis will be deprecated on Nov 10, 2024; see the migration guide.
Document Parse
Upstage Document Parse is a powerful AI Model designed to automatically convert any document to HTML. It detects layout elements such as paragraphs, tables, images, and more to determine the structure of the document. The API then serializes the elements according to reading order, and finally converts the document into HTML.
Available models
Model | Availability | Release date | Description |
---|---|---|---|
document-parse | Latest | 2024-09-10 | Major update to the API spec; the model name changed to document-parse . Support for Microsoft Word, Excel, and PowerPoint. Markdown output for tables and list items. Base64 encoding of extracted images for all requested layout categories. document-parse is an alias for our latest Document Parse model (currently document-parse-240910 ). |
layout-analysis-0.4.0 beta | Available until Nov 10, 2024 | 2024-07-04 | Improved the accuracy for table recognition. Added new layout elements: heading1 , list , index , and footnote . Changed the default value for ocr field to false |
layout-analysis-0.3.1 beta | Deprecated | 2024-06-17 | Fixed a bug where extracted text from table elements was truncated. |
layout-analysis-0.3.0 beta | Deprecated | 2024-06-11 | Improved the inference speed by 2x for digital-born PDF documents. |
layout-analysis-0.2.1 beta | Deprecated | 2024-05-02 | Removed unnecessary <thead> tags from table elements and fixed bugs. |
layout-analyzer-0.2.0 beta | Deprecated | 2024-04-04 | Improved the accuracy for table recognition and performance for layout detection. |
layout-analyzer-0.1.0 beta | Deprecated | 2024-02-28 | A layout analyzer model which detects elements within a document, recognizes tables, and serializes elements according to reading order. |
Request
POST https://api.upstage.ai/v1/document-ai/document-parse
Parameters
Request headers
Authorization string Required |
Request body
document file Required |
ocr string Optional |
coordinates boolean Optional |
output_formats List of string Optional |
model string Optional |
base64_encoding List of string Optional |
Requirements
- Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
- Maximum file size: 50MB
- Maximum number of pages per file: 100 pages (For files exceeding 100 pages, the first 100 pages are processed. If you want to process more than 100 pages, please use the asynchronous API)
- Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting to images at a standard of 150 DPI.
- Supported character sets (for OCR inference): Alphanumeric, Hangul, and Hanja are supported. Hanzi and Kanji are in beta, which means they are available but not fully supported.
Hanja, Hanzi, and Kanji are writing systems based on Chinese characters used in Korean, Chinese, and Japanese writing systems. Despite sharing similarities, they possess distinct visual representations, pronunciations, meanings, and usage conventions within their respective linguistic contexts. For more information, see this article.
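The limits above can be checked on the client before uploading. The following is a minimal sketch, assuming only the limits listed in this section. The helper names (`check_document`, `pixels_at_150_dpi`) are hypothetical and not part of the API; accepting the `.jpg` and `.tif` spellings and reading 50MB as 50 × 1024 × 1024 bytes are also assumptions.

```python
from pathlib import Path

# Limits copied from the Requirements list; the byte interpretation of
# "50MB" and the extra ".jpg"/".tif" spellings are assumptions.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
    ".heic", ".docx", ".pptx", ".xlsx",
}
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_PIXELS_PER_PAGE = 100_000_000
RENDER_DPI = 150  # non-image files are rasterized at 150 DPI


def pixels_at_150_dpi(width_inches: float, height_inches: float) -> int:
    """Estimate the pixel count of a non-image page rendered at 150 DPI."""
    return round(width_inches * RENDER_DPI) * round(height_inches * RENDER_DPI)


def check_document(filename: str, size_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {filename}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is larger than 50MB: {size_bytes} bytes")
    return problems
```

For example, a US Letter page (8.5 × 11 inches) rendered at 150 DPI is 1275 × 1650 pixels, far below the 100,000,000-pixel limit.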
For best results, follow these guidelines:
- Use high-resolution documents to ensure legibility of text.
- Ensure a minimum document width of 640 pixels.
- The performance of the model might vary depending on the text size. Ensure that the smallest text in the image is at least 2.5% of the image's height. For example, if the image is 640 pixels tall, the smallest text should be at least 16 pixels tall.
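The guidelines above reduce to simple arithmetic. The sketch below shows one way to check them, assuming you already know the image dimensions and the height of the smallest text in pixels; `min_text_height_px` and `meets_guidelines` are hypothetical helper names, not part of the API.

```python
MIN_WIDTH_PX = 640  # minimum recommended document width


def min_text_height_px(image_height_px: int) -> int:
    """Smallest recommended text height: 2.5% of the image height, rounded up.

    Integer arithmetic (25/1000) avoids floating-point rounding surprises.
    """
    return (image_height_px * 25 + 999) // 1000


def meets_guidelines(width_px: int, height_px: int, smallest_text_px: int) -> bool:
    """Check the width and text-size recommendations together."""
    wide_enough = width_px >= MIN_WIDTH_PX
    text_large_enough = smallest_text_px >= min_text_height_px(height_px)
    return wide_enough and text_large_enough
```

With a 640-pixel-tall image, `min_text_height_px(640)` gives 16, matching the example in the guideline.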
Response
Functionality overview
- Unknown characters: Characters that are not recognized by the OCR model are considered "unknown characters" and are marked by the character �.
- Response time: Standard documents containing up to 200 words take approximately three seconds. Longer documents can take up to tens of seconds.
- Timeout: There is a server-side timeout of 5 minutes for all requests.
Return values
api string |
model string |
elements list |
elements[].id integer |
elements[].category string |
elements[].page integer |
elements[].content.text string |
elements[].content.html string |
elements[].content.markdown string |
elements[].coordinates list |
elements[].base64_encoding string |
content.text string |
content.html string |
content.markdown string |
usage.pages integer |
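As an illustration of how these fields fit together, the sketch below groups the `elements` list by page and builds a compact outline. It assumes the response has already been parsed into a Python dict, and that sorting by `id` follows reading order (an assumption based on the increasing ids in the sample response). The function names are hypothetical.

```python
from collections import defaultdict


def elements_by_page(response: dict) -> dict:
    """Group elements by their page number, sorted by id within each page."""
    pages = defaultdict(list)
    for element in response.get("elements", []):
        pages[element["page"]].append(element)
    for items in pages.values():
        items.sort(key=lambda e: e["id"])
    return dict(pages)


def outline(response: dict) -> list:
    """Return (page, id, category) tuples for a quick overview of a document."""
    result = []
    for page, items in sorted(elements_by_page(response).items()):
        for element in items:
            result.append((page, element["id"], element["category"]))
    return result
```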
Understanding model output
Layout Categories and HTML tags
Upstage Document Parse identifies various layout elements in input documents and generates HTML to represent the input in a digitized format. The layout categories are a predefined set, and you can see detailed examples of all types of layout categories that the models can detect in the example figure below.
The table below explains layout categories and corresponding HTML tags.
The output uses the matching HTML tag when the same layout category exists in both printed and HTML documents.
If there is no suitable HTML tag for the layout category, it uses a <p> tag with a data-category attribute to identify the layout category detected by the model.
Category | HTML |
---|---|
table | <table> .. </table> |
figure | <img> .. </img> |
chart | <img data-category="chart"> .. </img> |
heading1 | <h1>... </h1> |
header | <header> .. </header> |
footer | <footer> .. </footer> |
caption | <caption> .. </caption> |
paragraph | <p data-category="paragraph">..</p> |
equation | <p data-category="equation">..</p> |
list | <p data-category="list">..</p> |
index | <p data-category="index">..</p> |
footnote | <p data-category="footnote"> </p> |
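The mapping in the table can be written down as a small lookup, which may be useful when post-processing the HTML output. This is only a sketch of the table above; `opening_tag` is a hypothetical helper, and actual API output also carries attributes such as `id` and `style` that are omitted here.

```python
# Categories that have a dedicated HTML tag, per the table above.
DEDICATED_TAGS = {
    "table": "table",
    "figure": "img",
    "heading1": "h1",
    "header": "header",
    "footer": "footer",
    "caption": "caption",
}

# Categories represented by a generic tag plus a data-category attribute.
DATA_CATEGORY_TAGS = {
    "chart": "img",
    "paragraph": "p",
    "equation": "p",
    "list": "p",
    "index": "p",
    "footnote": "p",
}


def opening_tag(category: str) -> str:
    """Return the opening HTML tag the table associates with a category."""
    if category in DEDICATED_TAGS:
        return f"<{DEDICATED_TAGS[category]}>"
    if category in DATA_CATEGORY_TAGS:
        return f'<{DATA_CATEGORY_TAGS[category]} data-category="{category}">'
    raise ValueError(f"unknown layout category: {category}")
```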
Using Equation output
For the equation category output, the model provides the recognized text in LaTeX format within the <p data-category="equation"> tag so that it displays properly with popular equation rendering engines such as MathJax.
The output for a sample equation image looks like this:
{
"category": "equation",
"content": {
"html": "<p id='3' data-category='equation'>$$a_{n}=\\sum_{k=1}^{n}{\\frac{2k+1}{\\,1^{2}+2^{2}+3^{2}+\\cdots+k^{2}}}$$</p>",
"markdown": "$$a_{n}=\\sum_{k=1}^{n}{\\frac{2k+1}{\\,1^{2}+2^{2}+3^{2}+\\cdots+k^{2}}}$$",
"text": "n 2k+1 \nan \nk=1 12+22+32 + · · +k2"
}
}
In the response, the content.html and content.markdown fields contain the recognized equation in LaTeX format, while the content.text field contains the OCR result text.
Because the OCR model does not currently support equation recognition, the content.text field may not contain the correct equation text. For the equation category, the OCR result in the content.text field is provided for backward compatibility.
Users may have difficulty rendering the content.html field value properly in their HTML file.
This is because the API response in JSON format escapes the \ character. To resolve this issue, they can use JavaScript to unescape the HTML.
The example code below demonstrates how to render the equation properly in an HTML file by importing the MathJax library and unescaping the API response.
<body>
<div id="equation"></div> <!-- placeholder for equation -->
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script type="text/javascript">
// Put the API response into the innerHTML field.
// The JavaScript string literal turns each escaped "\\" into a single "\".
document.getElementById('equation').innerHTML = "<p id='5' data-category='equation'>$$f(x)=a_{0}+\\sum_{n=1}^{\\infty}\\left(a_{n}\\cos{\\frac{n\\pi x}{L}}+b_{n}\\sin{\\frac{n\\pi x}{L}}\\right)$$</p>";
</script>
</body>
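The same unescaping happens automatically whenever the raw response body is parsed with a JSON parser rather than copied as text. The Python sketch below illustrates this with a shortened, made-up equation payload (not actual API output); `equation_html` is a hypothetical helper name.

```python
import json

# Raw response text as it travels over the wire: backslashes are doubled
# because JSON escapes the "\" character. Shortened sample for illustration.
RAW_BODY = r'''{"category": "equation", "content": {"html": "<p id='3' data-category='equation'>$$\\frac{2k+1}{n}$$</p>"}}'''


def equation_html(raw_json: str) -> str:
    """Parse the JSON text and return content.html with backslashes unescaped."""
    return json.loads(raw_json)["content"]["html"]


if __name__ == "__main__":
    # After parsing, the LaTeX contains single backslashes and can be
    # written into an HTML page for MathJax to render.
    print(equation_html(RAW_BODY))
```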
Please note that users can also obtain the base64 encoded image of the equation from the original document by specifying ["equation"] in the base64_encoding field of the request body.
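Once an element carries a `base64_encoding` value, it can be turned back into image bytes with the standard library. This is a minimal sketch; `decode_element_image` is a hypothetical helper, and because this page does not state the image format of the decoded bytes, the sketch makes no assumption about it.

```python
import base64


def decode_element_image(element: dict) -> bytes:
    """Decode the base64_encoding field of a response element into raw bytes."""
    encoded = element.get("base64_encoding")
    if not encoded:
        raise ValueError("element has no base64_encoding value")
    return base64.b64decode(encoded)


def save_element_image(element: dict, path: str) -> int:
    """Write the decoded bytes to a file and return the number of bytes written."""
    data = decode_element_image(element)
    with open(path, "wb") as f:
        return f.write(data)
```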
Four-decimal format coordinates
From Document Parse 1.0, users can employ relative coordinates when drawing bounding boxes or cropping elements from input documents. The absolute position can be calculated by multiplying the x and y values in the coordinates field of each item by the width and height of the requested image, respectively.
"coordinates": [
{
"x": 0.0276,
"y": 0.0178
},
{
"x": 0.1755,
"y": 0.0178
},
{
"x": 0.1755,
"y": 0.0641
},
{
"x": 0.0276,
"y": 0.0641
}
],
Why relative coordinates? Users previously had difficulty rendering bounding boxes on non-image documents. The Document Parse engine internally converts input documents to images for visual feature detection and recognition. During conversion, it balances image quality against speed and therefore selects different DPI settings depending on the input document size. Without the size of the internally converted image, users could not draw bounding boxes accurately on their documents. To address this issue, the Document Parse engine returns the relative position of each bounding box point, which works with a converted image of any size as long as the width-to-height ratio is maintained.
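The conversion described above can be sketched in a few lines of Python. The function names are hypothetical, and the width and height are whatever pixel dimensions you render or crop the page at; rounding to whole pixels is a choice made for this sketch.

```python
def to_absolute(coordinates: list, width: int, height: int) -> list:
    """Scale relative (0..1) points to pixel positions for a given image size."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in coordinates]


def bounding_box(coordinates: list, width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) in pixels, e.g. for cropping."""
    points = to_absolute(coordinates, width, height)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# The four corner points from the example above.
SAMPLE = [
    {"x": 0.0276, "y": 0.0178},
    {"x": 0.1755, "y": 0.0178},
    {"x": 0.1755, "y": 0.0641},
    {"x": 0.0276, "y": 0.0641},
]
```

The resulting tuple uses the common left, top, right, bottom order, so it can be handed to an image library's crop routine once the page has been rendered at the chosen size.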
Examples
Request
curl -X POST https://api.upstage.ai/v1/document-ai/document-parse \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@invoice.png"
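An equivalent request can be sketched in Python. This assumes the third-party `requests` package is installed; the function names are hypothetical, and the 300-second client timeout is a choice made here to mirror the 5-minute server-side timeout, not an API requirement. As in the curl example, `UPSTAGE_API_KEY` stands for your own key.

```python
API_URL = "https://api.upstage.ai/v1/document-ai/document-parse"


def build_headers(api_key: str) -> dict:
    """Build the Authorization header used by the curl example."""
    return {"Authorization": f"Bearer {api_key}"}


def parse_document(path: str, api_key: str, timeout_seconds: float = 300.0) -> dict:
    """Upload a file as the multipart 'document' field and return the parsed JSON."""
    import requests  # third-party; imported here so build_headers works without it

    with open(path, "rb") as f:
        response = requests.post(
            API_URL,
            headers=build_headers(api_key),
            files={"document": f},
            timeout=timeout_seconds,
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


if __name__ == "__main__":
    result = parse_document("invoice.png", "UPSTAGE_API_KEY")
    print(result["model"], result["usage"]["pages"])
```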
Response
{
"api": "2.0",
"content": {
"html": "<h1 id='0' style='font-size:22px'>INVOICE</h1>\n<h1 id='1' style='font-size:20px'>Company<br>Upstage</h1>\n<br><h1 id='2' style='font-size:18px'>Invoice ID</h1>\n<br><h1 id='3' style='font-size:14px'>휴 INV-AJ355548</h1>\n<h1 id='4' style='font-size:18px'>Invoice Date</h1>\n<br><h1 id='5' style='font-size:18px'>9/7/1992</h1>\n<h1 id='6' style='font-size:16px'>Mamo<br>Lucy Park</h1>\n<h1 id='7' style='font-size:18px'>Address</h1>\n<br><h1 id='8' style='font-size:16px'>7 Pepper Wood Street, 130 Stone Comer<br>Terrace<br>Wilkes Barre, Pennsylvania, 18768<br>United States</h1>\n<h1 id='9' style='font-size:16px'>Email</h1>\n<br><h1 id='10' style='font-size:16px'>Ikitchenman0@arizona.edu</h1>\n<br><h1 id='11' style='font-size:20px'>Service Details Form</h1>\n<h1 id='12' style='font-size:16px'>Name<br>Sung Kim</h1>\n<h1 id='13' style='font-size:16px'>260 'ess<br>Gwangovolungang:co 338, Gyeongg do.<br>Sanghyeon-dong, Sui-gu<br>Yongin-si, South Korea</h1>\n<h1 id='14' style='font-size:18px'>Additional Request</h1>\n<br><p id='15' data-category='paragraph' style='font-size:14px'>Vivamus vestibulum sagittis sapien. Cum sociis natoque<br>penatibus 항목 magnis dfs parturient montes, nascetur ridiculus<br>mus.</p>\n<h1 id='16' style='font-size:14px'>TERMS AND CONDITIONS</h1>\n<p id='17' data-category='list' style='font-size:14px'>L TM Seir that not be lable 1층 the Buyer drectly indirectly for any loun or damage sufflered by 전액 Buyer<br>2. The 별 www. the product for ore 과 관한 from the date 설 shipment.<br>3. Any ourchase order received by ~ sele - be interpreted 추가 accepting the offer Ma the 18% offer writing The buyer may<br>purchase 15 The offer My the Terms and Conditions the Seller included The offer</p>",
"markdown": "",
"text": ""
},
"elements": [
{
"category": "heading1",
"content": {
"html": "<h1 id='0' style='font-size:22px'>INVOICE</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0648,
"y": 0.0517
},
{
"x": 0.2405,
"y": 0.0517
},
{
"x": 0.2405,
"y": 0.0953
},
{
"x": 0.0648,
"y": 0.0953
}
],
"id": 0,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='1' style='font-size:20px'>Company<br>Upstage</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0657,
"y": 0.2651
},
{
"x": 0.1606,
"y": 0.2651
},
{
"x": 0.1606,
"y": 0.3168
},
{
"x": 0.0657,
"y": 0.3168
}
],
"id": 1,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='2' style='font-size:18px'>Invoice ID</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.5712,
"y": 0.0748
},
{
"x": 0.671,
"y": 0.0748
},
{
"x": 0.671,
"y": 0.101
},
{
"x": 0.5712,
"y": 0.101
}
],
"id": 2,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='3' style='font-size:14px'>휴 INV-AJ355548</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.788,
"y": 0.076
},
{
"x": 0.9287,
"y": 0.076
},
{
"x": 0.9287,
"y": 0.0972
},
{
"x": 0.788,
"y": 0.0972
}
],
"id": 3,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='4' style='font-size:18px'>Invoice Date</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.572,
"y": 0.1232
},
{
"x": 0.6941,
"y": 0.1232
},
{
"x": 0.6941,
"y": 0.1484
},
{
"x": 0.572,
"y": 0.1484
}
],
"id": 4,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='5' style='font-size:18px'>9/7/1992</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.8525,
"y": 0.1224
},
{
"x": 0.9293,
"y": 0.1224
},
{
"x": 0.9293,
"y": 0.1468
},
{
"x": 0.8525,
"y": 0.1468
}
],
"id": 5,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='6' style='font-size:16px'>Mamo<br>Lucy Park</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0658,
"y": 0.3331
},
{
"x": 0.15,
"y": 0.3331
},
{
"x": 0.15,
"y": 0.3846
},
{
"x": 0.0658,
"y": 0.3846
}
],
"id": 6,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='7' style='font-size:18px'>Address</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0662,
"y": 0.4061
},
{
"x": 0.1482,
"y": 0.4061
},
{
"x": 0.1482,
"y": 0.4286
},
{
"x": 0.0662,
"y": 0.4286
}
],
"id": 7,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='8' style='font-size:16px'>7 Pepper Wood Street, 130 Stone Comer<br>Terrace<br>Wilkes Barre, Pennsylvania, 18768<br>United States</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.067,
"y": 0.4332
},
{
"x": 0.3962,
"y": 0.4332
},
{
"x": 0.3962,
"y": 0.5173
},
{
"x": 0.067,
"y": 0.5173
}
],
"id": 8,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='9' style='font-size:16px'>Email</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0656,
"y": 0.5326
},
{
"x": 0.1235,
"y": 0.5326
},
{
"x": 0.1235,
"y": 0.5579
},
{
"x": 0.0656,
"y": 0.5579
}
],
"id": 9,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='10' style='font-size:16px'>Ikitchenman0@arizona.edu</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0654,
"y": 0.5614
},
{
"x": 0.2874,
"y": 0.5614
},
{
"x": 0.2874,
"y": 0.5834
},
{
"x": 0.0654,
"y": 0.5834
}
],
"id": 10,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<br><h1 id='11' style='font-size:20px'>Service Details Form</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.5727,
"y": 0.2112
},
{
"x": 0.8149,
"y": 0.2112
},
{
"x": 0.8149,
"y": 0.2417
},
{
"x": 0.5727,
"y": 0.2417
}
],
"id": 11,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='12' style='font-size:16px'>Name<br>Sung Kim</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.573,
"y": 0.2657
},
{
"x": 0.6563,
"y": 0.2657
},
{
"x": 0.6563,
"y": 0.3177
},
{
"x": 0.573,
"y": 0.3177
}
],
"id": 12,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='13' style='font-size:16px'>260 'ess<br>Gwangovolungang:co 338, Gyeongg do.<br>Sanghyeon-dong, Sui-gu<br>Yongin-si, South Korea</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.5724,
"y": 0.3401
},
{
"x": 0.891,
"y": 0.3401
},
{
"x": 0.891,
"y": 0.4232
},
{
"x": 0.5724,
"y": 0.4232
}
],
"id": 13,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='14' style='font-size:18px'>Additional Request</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0648,
"y": 0.6681
},
{
"x": 0.2482,
"y": 0.6681
},
{
"x": 0.2482,
"y": 0.6962
},
{
"x": 0.0648,
"y": 0.6962
}
],
"id": 14,
"page": 1
},
{
"category": "paragraph",
"content": {
"html": "<br><p id='15' data-category='paragraph' style='font-size:14px'>Vivamus vestibulum sagittis sapien. Cum sociis natoque<br>penatibus 항목 magnis dfs parturient montes, nascetur ridiculus<br>mus.</p>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.4191,
"y": 0.6684
},
{
"x": 0.9132,
"y": 0.6684
},
{
"x": 0.9132,
"y": 0.7332
},
{
"x": 0.4191,
"y": 0.7332
}
],
"id": 15,
"page": 1
},
{
"category": "heading1",
"content": {
"html": "<h1 id='16' style='font-size:14px'>TERMS AND CONDITIONS</h1>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0649,
"y": 0.8303
},
{
"x": 0.2506,
"y": 0.8303
},
{
"x": 0.2506,
"y": 0.8523
},
{
"x": 0.0649,
"y": 0.8523
}
],
"id": 16,
"page": 1
},
{
"category": "list",
"content": {
"html": "<p id='17' data-category='list' style='font-size:14px'>L TM Seir that not be lable 1층 the Buyer drectly indirectly for any loun or damage sufflered by 전액 Buyer<br>2. The 별 www. the product for ore 과 관한 from the date 설 shipment.<br>3. Any ourchase order received by ~ sele - be interpreted 추가 accepting the offer Ma the 18% offer writing The buyer may<br>purchase 15 The offer My the Terms and Conditions the Seller included The offer</p>",
"markdown": "",
"text": ""
},
"coordinates": [
{
"x": 0.0679,
"y": 0.8717
},
{
"x": 0.9261,
"y": 0.8717
},
{
"x": 0.9261,
"y": 0.9558
},
{
"x": 0.0679,
"y": 0.9558
}
],
"id": 17,
"page": 1
}
],
"model": "document-parse-240910",
"usage": {
"pages": 1
}
}
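In the sample response above, only the html fields are populated, while markdown and text are empty strings. If plain text is needed from such a response, one option is to strip the tags from each element's HTML. The sketch below uses only the Python standard library; `html_to_text` is a hypothetical helper, and treating `<br>` as a line break is a choice made for this sketch.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes and turn <br> tags into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an element's content.html value."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def element_texts(response: dict) -> list:
    """Plain text for every element, in the order the API returned them."""
    return [html_to_text(e["content"]["html"]) for e in response.get("elements", [])]
```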