Migrate to Document Parse from Layout Analysis
We have launched Document Parse to replace Layout Analysis! Document Parse supports more document types, markdown output, chart detection, equation recognition, and more features to come.
The last version of Layout Analysis, layout-analysis-0.4.0, will be discontinued by November 10, 2024.
This major update introduces changes in both request and response formats. The following document provides guidance on how users can migrate from Layout Analysis to Document Parse.
API Path Change
Because the service has been renamed to Document Parse, the API path has changed as well. Users can still reach the model through the Layout Analysis API path until November 10, 2024.
Layout Analysis 🌙
POST https://api.upstage.ai/v1/document-ai/layout-analysis
Document Parse ☀️
POST https://api.upstage.ai/v1/document-ai/document-parse
Request Format Change
The ocr parameter is now a string that accepts force or auto. The default value, auto, runs OCR on image inputs but not on PDFs and other document types. Users who process non-image documents and need OCR before layout detection should set ocr="force". Otherwise, the ocr parameter can be removed from the request body.
Users who consume both the text and html results together, for example HTML for tables and text for other elements, must now pass output_formats=['html', 'text'] in the request body to receive both values. Otherwise, only the html value is returned.
Layout Analysis 🌙
curl -X POST https://api.upstage.ai/v1/document-ai/layout-analysis \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@invoice.png"
Document Parse ☀️
curl -X POST https://api.upstage.ai/v1/document-ai/document-parse \
-H "Authorization: Bearer UPSTAGE_API_KEY" \
-F "document=@invoice.png"
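The request changes described above can also be sketched in Python. This is a minimal illustration rather than an official client: the helper name build_parse_request is hypothetical, and sending output_formats as the literal string "['html', 'text']" in a form field is an assumption based on the parameter value shown above.

```python
# Hypothetical helper: assembles the pieces of a Document Parse request
# without sending it, so the request changes are easy to inspect.
DOCUMENT_PARSE_URL = "https://api.upstage.ai/v1/document-ai/document-parse"


def build_parse_request(api_key, force_ocr=False, want_text=False):
    """Return (url, headers, form_fields) for a Document Parse call."""
    headers = {"Authorization": f"Bearer {api_key}"}
    fields = {}
    if force_ocr:
        # Under the default "auto", OCR runs only on image inputs.
        fields["ocr"] = "force"
    if want_text:
        # Assumption: the list is passed as a plain string form field.
        fields["output_formats"] = "['html', 'text']"
    return DOCUMENT_PARSE_URL, headers, fields


url, headers, fields = build_parse_request(
    "UPSTAGE_API_KEY", force_ocr=True, want_text=True
)
print(url)
print(fields)
```

With the requests library, the result could then be sent as requests.post(url, headers=headers, data=fields, files={"document": open("invoice.png", "rb")}), using the same file as the curl examples.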
Response Format Change
The response format has two major changes for Layout Analysis users: coordinates and content.
In Document Parse, text, html, and markdown are grouped under the content field, so code that refers to text or html must now use content.text and content.html respectively. This applies at both the element level and the document level.
Layout Analysis 🌙
{
  "html": "<p id='0'>This is example</p>",
  "text": "This is example",
  "elements": [{
    "category": "paragraph",
    "html": "<p id='0'>This is example</p>",
    "id": 0,
    "page": 1,
    "text": "This is example"
  }]
}
Document Parse ☀️
{
  "content": {
    "html": "<p id='0'>This is example</p>",
    "text": "This is example"
  },
  "elements": [{
    "category": "paragraph",
    "content": {
      "html": "<p id='0'>This is example</p>",
      "text": "This is example",
      "markdown": "" // can be empty depending on output_formats parameter
    },
    "id": 0,
    "page": 1
  }]
}
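During the migration, code may need to read responses in either shape. The following is a minimal sketch of a hypothetical helper, get_content, that returns text or html from both the old flat layout and the new content grouping. It assumes only the fields shown in the examples above.

```python
def get_content(node, fmt):
    """Read `fmt` ("text", "html" or "markdown") from a response or element.

    Prefers the Document Parse shape (values grouped under "content") and
    falls back to the Layout Analysis shape (values at the top level).
    """
    content = node.get("content")
    if isinstance(content, dict):
        return content.get(fmt, "")
    return node.get(fmt, "")


# Trimmed copies of the two example responses shown above.
old_response = {
    "html": "<p id='0'>This is example</p>",
    "text": "This is example",
    "elements": [{"category": "paragraph", "id": 0, "page": 1,
                  "html": "<p id='0'>This is example</p>",
                  "text": "This is example"}],
}
new_response = {
    "content": {"html": "<p id='0'>This is example</p>",
                "text": "This is example"},
    "elements": [{"category": "paragraph", "id": 0, "page": 1,
                  "content": {"html": "<p id='0'>This is example</p>",
                              "text": "This is example",
                              "markdown": ""}}],
}

for response in (old_response, new_response):
    element_html = [get_content(e, "html") for e in response["elements"]]
    print(get_content(response, "text"), element_html)
```

The same function works at the document level and the element level because both levels use the same grouping.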
In Document Parse, the bounding_box field has been replaced with coordinates. The coordinates field keeps the same list of four x and y pairs, but the values are now decimal numbers with 4 decimal places, referred to as relative coordinates: each value is a fraction of the image width (for x) or height (for y). This change ensures that bounding boxes are rendered properly regardless of the image size, as long as the width-to-height ratio is maintained.
Layout Analysis 🌙
{
  "elements": [{
    "bounding_box": [
      { "x": 86, "y": 423 },
      { "x": 194, "y": 423 },
      { "x": 194, "y": 489 },
      { "x": 86, "y": 489 }
    ],
    ...
  }]
}
Document Parse ☀️
{
  "elements": [{
    "coordinates": [
      { "x": 0.0648, "y": 0.0517 },
      { "x": 0.2405, "y": 0.0517 },
      { "x": 0.2405, "y": 0.0953 },
      { "x": 0.0648, "y": 0.0953 }
    ],
    ...
  }]
}
Users can recover the pixel-based bounding_box values by multiplying the coordinates by the image's width and height.
import json
from PIL import Image
# Sample JSON data
data = '''
{
"coordinates": [
{"x": 0.0276, "y": 0.0178},
{"x": 0.1755, "y": 0.0178},
{"x": 0.1755, "y": 0.0641},
{"x": 0.0276, "y": 0.0641}
]
}
'''
# Load JSON data
coordinates = json.loads(data)['coordinates']
# Open the image to read its pixel dimensions
with Image.open('your_image.jpg') as img:
    width, height = img.size
# Scale each relative coordinate back to pixel values
bounding_box = [{"x": int(coord['x'] * width), "y": int(coord['y'] * height)} for coord in coordinates]
print("Bounding Box:", bounding_box)
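The reverse direction, from stored pixel-based bounding_box values to relative coordinates, can be sketched the same way. This assumes that a relative coordinate is the pixel value divided by the image width (for x) or height (for y), rounded to 4 decimal places. The function name to_relative and the 1000 x 2000 image size are hypothetical.

```python
def to_relative(bounding_box, width, height):
    """Convert pixel points to relative coordinates with 4 decimal places."""
    return [
        {"x": round(point["x"] / width, 4), "y": round(point["y"] / height, 4)}
        for point in bounding_box
    ]


# Pixel values taken from the Layout Analysis example above.
pixel_box = [
    {"x": 86, "y": 423},
    {"x": 194, "y": 423},
    {"x": 194, "y": 489},
    {"x": 86, "y": 489},
]

# Hypothetical image size; use the real dimensions of the source image.
relative_box = to_relative(pixel_box, width=1000, height=2000)
print("Coordinates:", relative_box)
```

Because the relative values are rounded, converting them back to pixels may differ from the original values by a pixel.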