Migrate to Document Parse from Layout Analysis

We have launched Document Parse to replace Layout Analysis! Document Parse supports more document types, markdown output, chart detection, equation recognition, and more features to come. The last version of Layout Analysis, layout-analysis-0.4.0, will be discontinued by November 10, 2024.

This major update changes both the request and response formats. This guide explains how to migrate from Layout Analysis to Document Parse.

API Path Change

Because the service has been renamed to Document Parse, the API path has changed as well.

Layout Analysis 🌙

POST https://api.upstage.ai/v1/document-ai/layout-analysis

Document Parse ☀️

POST https://api.upstage.ai/v1/document-ai/document-parse

Request Format Change

The ocr parameter has been updated to a string type with possible values of force or auto. The default value is auto, which performs OCR inference on image inputs and not on PDF and other types of documents. Users processing non-image documents and requiring OCR before layout detection should set ocr="force". Otherwise, the ocr parameter can be removed from the request body.

Users who consume both the text and html results together, for example HTML for tables and text for other elements, must now pass output_formats="['html', 'text']" in the request body to receive both values. Otherwise, only the html value is returned.

Layout Analysis 🌙

import requests
 
api_key = "UPSTAGE_API_KEY"
filename = "invoice.png"
 
url = "https://api.upstage.ai/v1/document-ai/layout-analysis"
headers = {"Authorization": f"Bearer {api_key}"}
files = {"document": open(filename, "rb")}
response = requests.post(url, headers=headers, files=files)
print(response.json())

Document Parse ☀️

import requests
 
api_key = "UPSTAGE_API_KEY"
filename = "invoice.png"
 
url = "https://api.upstage.ai/v1/document-ai/document-parse"
headers = {"Authorization": f"Bearer {api_key}"}
files = {"document": open(filename, "rb")}
data = {"output_formats": "['html', 'text']"} # in case you need both text and html
response = requests.post(url, headers=headers, files=files, data=data)
print(response.json())
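
The request changes described above can be collected into a small helper. The following is a minimal sketch, not part of any Upstage SDK: build_form_fields is a hypothetical name, and it only assembles the two form fields discussed in this guide (ocr and output_formats) without sending a request.

```python
def build_form_fields(force_ocr=False, output_formats=None):
    """Assemble the form fields for a Document Parse request.

    Hypothetical helper for illustration only. It follows the rules in
    this guide: omit `ocr` to keep the default ("auto"), and pass
    `output_formats` as a string-encoded list.
    """
    fields = {}
    if force_ocr:
        # Needed only for non-image documents that require OCR
        fields["ocr"] = "force"
    if output_formats:
        # Encoded the same way as in the example above: "['html', 'text']"
        fields["output_formats"] = str(list(output_formats))
    return fields


# Image input, html only: no extra fields are needed
print(build_form_fields())

# PDF that needs OCR, with both html and text in the response
print(build_form_fields(force_ocr=True, output_formats=["html", "text"]))
```

The returned dictionary can be passed as the data argument of requests.post, in the same way as the data variable in the Document Parse example above.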

Response Format Change

In the response format, there are two major changes for Layout Analysis users: coordinates and content.

In Document Parse, text, html, and markdown are grouped under the content field, so code that refers to text or html must now refer to content.text and content.html respectively. This applies at both the element level and the document level.

Layout Analysis 🌙

{
    "html" : "<p id='0'>This is example<p>",
    "text" : "This is example",
    "elements": [{
        "category": "paragraph",
        "html": "<p id='0'>This is example</p>",
        "id": 0,
        "page": 1,
        "text": "This is example"
   }]
}

Document Parse ☀️

{
    "content": {
        "html" : "<p id='0'>This is example<p>",
        "text" : "This is example",
    },
    "elements": [{
        "category": "paragraph",
        "content": {
            "html": "<p id='0'>This is example</p>",
            "text": "This is example",
            "markdown": "" // can be empty depending on output_formats parameter
        },
        "id": 0,
        "page": 1,
   }]
}
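
While migrating, code may need to read both response shapes for a time. The following is a minimal sketch of such an accessor, assuming only the field layout shown in the two samples above; get_content is a hypothetical helper name, not part of the API.

```python
def get_content(node, fmt="text"):
    """Return the text, html, or markdown value of a response or element.

    Works with both shapes shown above: Document Parse nests the values
    under "content", while Layout Analysis keeps them at the top level.
    Returns an empty string if the requested format is absent.
    """
    content = node.get("content")
    if isinstance(content, dict):
        return content.get(fmt, "")
    return node.get(fmt, "")


# Trimmed-down copies of the two sample responses
legacy_response = {
    "html": "<p id='0'>This is example</p>",
    "text": "This is example",
    "elements": [{"id": 0, "text": "This is example"}],
}
parse_response = {
    "content": {"html": "<p id='0'>This is example</p>", "text": "This is example"},
    "elements": [{"id": 0, "content": {"text": "This is example", "markdown": ""}}],
}

for response in (legacy_response, parse_response):
    print(get_content(response, "html"))
    for element in response["elements"]:
        print(get_content(element))
```

The same function works at the document level and the element level, because both levels use the same content grouping.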

In Document Parse, the bounding_box field has been replaced with coordinates. The coordinates field is still a list of four points with x and y values, but each value is now a floating-point number with 4 decimal places, expressed relative to the image width and height (relative coordinates). This change ensures that bounding boxes render correctly regardless of the image size, as long as the width-to-height ratio is maintained.

Layout Analysis 🌙

{
    "elements": [{
        "bounding_box": [
            {
                "x": 86,
                "y": 423
            },
            {
                "x": 194,
                "y": 423
            },
            {
                "x": 194,
                "y": 489
            },
            {
                "x": 86,
                "y": 489
            }
        ],
        ...
   }]
}

Document Parse ☀️

{
    "elements": [{
        "coordinates": [
            {
                "x": 0.0648,
                "y": 0.0517
            },
            {
                "x": 0.2405,
                "y": 0.0517
            },
            {
                "x": 0.2405,
                "y": 0.0953
            },
            {
                "x": 0.0648,
                "y": 0.0953
            }
        ],
        ...
   }]
}

Users can recover the same pixel bounding_box values by multiplying the coordinates by the image's width and height.

import json
from PIL import Image
 
# Sample JSON data
data = '''
{
    "coordinates": [
        {"x": 0.0276, "y": 0.0178},
        {"x": 0.1755, "y": 0.0178},
        {"x": 0.1755, "y": 0.0641},
        {"x": 0.0276, "y": 0.0641}
    ]
}
'''
# Load JSON data
coordinates = json.loads(data)['coordinates']
 
# Open the image
with Image.open('your_image.jpg') as img:
    width, height = img.size
 
    # Calculate bounding box
    bounding_box = [{"x": int(coord['x'] * width), "y": int(coord['y'] * height)} for coord in coordinates]
 
print("Bounding Box:", bounding_box)
