Digitize documents

This tutorial shows how to digitize a document by extracting its text while preserving the original reading order. Follow the steps below.

Get an API key

Prepare your API key to call the Upstage Layout Analysis API. If you don't have a key, generate one by following the directions in the quick start guide.
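
To keep the key out of source code, one option is to read it from an environment variable. The sketch below is an illustration only: the variable name UPSTAGE_API_KEY and the helper load_api_key are assumptions made for this example, not something the API requires.

```python
import os


def load_api_key(env_var: str = "UPSTAGE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned value can then be used wherever the tutorial code expects api_key.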

Prepare a document

Prepare the document you want to extract text from. It can be either an image file or a multi-page PDF file. This tutorial uses a single-page image file.


invoice.png

Make requests

Run the Python code below to make your first API request. Make sure to replace UPSTAGE_API_KEY with your secret API key.

import requests

api_key = "UPSTAGE_API_KEY"  # replace with your secret API key
input_filename = "invoice.png"

url = "https://api.upstage.ai/v1/document-ai/layout-analysis"
headers = {"Authorization": f"Bearer {api_key}"}
data = {"ocr": True}

# Open the document in binary mode; the with block closes it after the upload.
with open(input_filename, "rb") as f:
    files = {"document": f}
    response = requests.post(url, headers=headers, files=files, data=data).json()

print(response)
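
The request above parses whatever body comes back, including error replies. A slightly more defensive variant is sketched below, using the same URL, header, and form fields as the tutorial code. The function name analyze_layout, the 60-second timeout, and the post parameter are assumptions introduced for this example; post exists only so the function can be tried with a stand-in and no network access.

```python
from typing import Any, Callable, Optional

API_URL = "https://api.upstage.ai/v1/document-ai/layout-analysis"


def analyze_layout(
    path: str,
    api_key: str,
    post: Optional[Callable[..., Any]] = None,
    timeout: float = 60.0,  # assumed value; tune for large PDFs
) -> dict:
    """Upload one document and return the decoded JSON response."""
    if post is None:
        import requests  # third-party dependency, imported on first use

        post = requests.post

    headers = {"Authorization": f"Bearer {api_key}"}
    # The with block guarantees the file is closed even if the upload fails.
    with open(path, "rb") as f:
        reply = post(
            API_URL,
            headers=headers,
            files={"document": f},
            data={"ocr": True},
            timeout=timeout,
        )
    # Raise an exception on 4xx/5xx replies instead of parsing an error body.
    reply.raise_for_status()
    return reply.json()
```

Calling analyze_layout("invoice.png", api_key) should then return the same dictionary that the tutorial code prints.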

Write HTML to file

Finally, read the html value from the response and write it to a file.

output_filename = "invoice.html"
html = response["html"]

# Write as UTF-8 so non-ASCII characters survive on every platform.
with open(output_filename, "w", encoding="utf-8") as f:
    f.write(html)
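
If the request failed, response["html"] raises a bare KeyError. The sketch below wraps the same step in a hypothetical helper, save_html, that checks for the field first and writes the file as UTF-8. It assumes that a failed request yields a response without an "html" string.

```python
from pathlib import Path


def save_html(response: dict, output_filename: str) -> Path:
    """Write response["html"] to output_filename as UTF-8 and return the path."""
    html = response.get("html")
    if not isinstance(html, str):
        # Assumption: a reply without an "html" string means the request failed.
        raise ValueError('response has no "html" string; check the API reply')
    path = Path(output_filename)
    path.write_text(html, encoding="utf-8")
    return path
```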

The output HTML file will contain the following contents:

invoice.html
<p id='0' style='font-size:22px'>INVOICE</p>
<p id='1' style='font-size:20px'>Company<br>Upstage</p>
<br>
<p id='2' style='font-size:14px'># INV-AJ355548</p>
<p id='3' style='font-size:16px'>Name<br>Lucy Park</p>
<p id='4' style='font-size:16px'>Address</p>
<br>
<p id='5' style='font-size:16px'>Invoice ID</p>
<p id='6' style='font-size:14px'>7 Pepper Wood Street, 130 Stone Corner<br>Terrace
<br>Wilkes Barre, Pennsylvania, 18768<br>United States</p>
<br>
<p id='7' style='font-size:18px'>Invoice Date 9/7/1992</p>
<p id='8' style='font-size:18px'>Email<br>Ikitchenman0@arizona.edu</p>
<br>
<p id='9' style='font-size:20px'>Service Details Form</p>
<p id='10' style='font-size:16px'>Name<br>Sung Kim</p>
<p id='11' style='font-size:18px'>Address<br>Gwanggyojungang-ro 338, Gyeonggi-do,
<br>Sanghyeon-dong, Suji-gu<br>Yongin-si, South Korea</p>
<p id='12' style='font-size:16px'>Additional Request Vivamus vestibulum sagittis sapien.
Cum sociis natoque<br>penatibus et magnis dis parturient montes, nascetur ridiculus<br>mus.</p>
<p id='13' style='font-size:14px'>TERMS AND CONDITIONS</p>
<p id='14' style='font-size:14px'>1. The Seller shall not be liable to the Buyer directly or 
indirectly for any loss or damage suffered by the Buyer.<br>2. The Seller warrants the product
for one (1) year from the date of shipment.<br>3. Any purchase order received by the seller 
will be interpreted as accepting this offer and the sale offer in writing. The buyer may
<br>purchase the product in this offer only under the Terms and Conditions of the Seller
included in this offer.</p>
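
Because the output is a flat sequence of p elements in reading order, plain text can be recovered with the standard library alone. The sketch below assumes the output always has the shape shown above (p elements with br line breaks); the names ParagraphText and html_to_paragraphs are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._parts = None  # text pieces of the <p> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []
        elif tag == "br" and self._parts is not None:
            # A <br> inside a paragraph becomes a newline in the text.
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            self.paragraphs.append("".join(self._parts).strip())
            self._parts = None

    def handle_data(self, data):
        # Text outside any <p> (such as whitespace between elements) is ignored.
        if self._parts is not None:
            self._parts.append(data)


def html_to_paragraphs(html: str) -> list:
    """Return one string per <p> element, preserving reading order."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

For the first two paragraphs of the sample output, html_to_paragraphs returns ["INVOICE", "Company\nUpstage"].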

View in a web browser

You can open the final invoice.html in a web browser to view the result.
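
The file can also be opened from Python with the standard webbrowser module. This is a small sketch; the helper names file_uri and open_in_browser are assumptions made for this example.

```python
import webbrowser
from pathlib import Path


def file_uri(filename: str) -> str:
    """Return a file:// URI for a local file, resolved to an absolute path."""
    return Path(filename).resolve().as_uri()


def open_in_browser(filename: str) -> bool:
    """Ask the default browser to open the file; returns webbrowser.open's result."""
    return webbrowser.open(file_uri(filename))
```

Calling open_in_browser("invoice.html") from the directory that contains the file should show the page in the default browser.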


invoice.html