Document QA
Use the DocVision API to submit a document image and ask a question about it. The DocVision API excels at answering open-ended questions about documents and extracting information from them. It is suited to processing simple documents easily and at low cost.
Available models
Model | Release date | Context Length | Description |
---|---|---|---|
solar-docvision (preview) | 2024-09-10 | 8192 | A model specialized for Document Visual Question Answering. solar-docvision supports English only at this time. solar-docvision is an alias for our latest Solar DocVision model (currently solar-docvision-preview-240910). |
Capabilities
Solar DocVision is trained to perform question-answering tasks on documents by extracting relevant information. Our model supports two main functionalities:
- Extractive Question Answering (Extractive QA)
- Key Information Extraction (KIE)
Extractive Question Answering (Extractive QA)
Extractive QA is a task that involves extracting appropriate answers from documents based on given questions.
For example, when presented with a business card and asked "What is the phone number?", Solar DocVision can extract the correct phone number from the document.
Key Information Extraction (KIE)
In addition to Extractive QA, Solar DocVision can perform basic Key Information Extraction tasks.
Using specific prompts, Solar DocVision can extract structured data from documents and present it in JSON format.
For instance, given a business card image, you could use the following prompt:
Extract the information from the business card. Format the output as JSON.
Solar DocVision would produce a response like this:
{
"name": "John Smith",
"phone": "123-456-7890",
"email": "john.smith@example.com",
"company": "Tech Innovations Inc."
}
This capability allows for efficient extraction of multiple pieces of information from a single document.
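Because the JSON arrives as text inside the model's reply, the caller still has to parse it. The following is a minimal sketch of that step, not part of the API: `parse_kie_json` is a hypothetical helper, and it assumes the reply contains a single JSON object that may be surrounded by whitespace or stray text.

```python
import json


def parse_kie_json(content: str) -> dict:
    """Parse one JSON object out of a model reply (hypothetical helper).

    Assumption: the reply holds a single JSON object, possibly with
    leading or trailing whitespace or text around it.
    """
    start = content.find("{")
    end = content.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    # Keep only the span from the first "{" to the last "}".
    return json.loads(content[start:end + 1])


# The business card response shown above, as a raw string.
reply = """
{
"name": "John Smith",
"phone": "123-456-7890",
"email": "john.smith@example.com",
"company": "Tech Innovations Inc."
}
"""
card = parse_kie_json(reply)
print(card["phone"])  # 123-456-7890
```

Since the output is generated text, it is prudent to treat a `ValueError` or `json.JSONDecodeError` as a normal outcome and handle it rather than assume every reply parses.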
Limitations
Solar DocVision does not currently support summarization, reasoning, or chat-based interactions.
Request
POST https://api.upstage.ai/v1/solar/chat/completions
Parameters
The messages parameter is a list of message objects. Each message object has a role (which must be "user" for the DocVision model) and content. Currently, the model accepts only one message, with the "user" role.
A "user" message is where you place your question and document image. The message.content list contains both the question and the image. For details, see the parameter list and the example below.
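The rules above can be checked on the client before a request is sent. This is a minimal sketch, assuming the two content part types used in the example on this page ("text" and "image_url"); `validate_messages` is a hypothetical helper, not something the API provides.

```python
def validate_messages(messages: list) -> None:
    """Raise ValueError if `messages` breaks the rules described above."""
    if len(messages) != 1:
        raise ValueError("exactly one message is accepted")
    message = messages[0]
    if message.get("role") != "user":
        raise ValueError('role must be "user"')
    content = message.get("content")
    if not isinstance(content, list) or not content:
        raise ValueError("content must be a non-empty list")
    for part in content:
        kind = part.get("type")
        if kind == "text":
            if not isinstance(part.get("text"), str):
                raise ValueError("a text part needs a string 'text'")
        elif kind == "image_url":
            image = part.get("image_url")
            url = image.get("url") if isinstance(image, dict) else None
            if not isinstance(url, str):
                raise ValueError("an image_url part needs 'image_url.url'")
        else:
            raise ValueError(f"unsupported content type: {kind!r}")


# One "user" message carrying an image part and a text part.
validate_messages([
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/card.png"}},
            {"type": "text", "text": "What is the phone number?"},
        ],
    }
])
```

The server remains the authority on what is accepted; a check like this only catches obvious mistakes early.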
Request headers
Header | Type | Required |
---|---|---|
Authorization | string | Required |
Request body
Parameter | Type | Required |
---|---|---|
messages | list | Required |
messages[].content | list | Required |
messages[].content[].type | string | Required |
messages[].content[].text | string | Optional |
messages[].content[].image_url | object | Optional |
messages[].content[].image_url.url | string | Optional |
messages[].role | string | Required |
model | string | Required |
max_tokens | integer | Optional |
stream | boolean | Optional |
temperature | float | Optional |
top_p | float | Optional |
Requirements
- Supported image formats: JPEG, PNG
- Maximum image size: 16MB
- Maximum image dimensions: 4096 pixels for both width and height
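These limits can be checked locally before uploading. The sketch below uses only the Python standard library and reads the format and dimensions straight from the file header. It rests on two assumptions: "16MB" is read as 16 × 1024 × 1024 bytes, and the JPEG parser only handles the common marker layout. All function names are hypothetical.

```python
import struct

MAX_BYTES = 16 * 1024 * 1024  # assumption: "16MB" interpreted as 16 MiB
MAX_SIDE = 4096
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk of a PNG."""
    if len(data) < 24:
        raise ValueError("truncated PNG header")
    # Width and height are big-endian uint32 values at bytes 16..24.
    return struct.unpack(">II", data[16:24])


def jpeg_size(data: bytes):
    """Return (width, height) from the first start-of-frame segment."""
    pos = 2  # skip the SOI marker (FF D8)
    while pos + 9 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG segment")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # marker bytes plus segment length
    raise ValueError("no JPEG frame header found")


def check_image(data: bytes) -> list:
    """Return a list of requirement violations (empty if none)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 16MB")
    if data.startswith(PNG_MAGIC):
        width, height = png_size(data)
    elif data.startswith(b"\xff\xd8"):
        width, height = jpeg_size(data)
    else:
        return problems + ["not a JPEG or PNG file"]
    if width > MAX_SIDE or height > MAX_SIDE:
        problems.append(f"{width}x{height} exceeds 4096 pixels per side")
    return problems
```

A typical call would read the file in binary mode and pass the bytes to `check_image`, uploading only when the returned list is empty.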
Response
Return values
Returns a chat.completion object, or a streamed sequence of chat.completion.chunk objects if the request is streamed.
The chat completion object
Field | Type |
---|---|
id | string |
object | string |
created | integer |
model | string |
system_fingerprint | null |
choices | list |
choices[].finish_reason | string |
choices[].index | integer |
choices[].message | object |
choices[].message.content | string |
choices[].message.role | string |
choices[].logprobs | null |
usage | object |
usage.completion_tokens | integer |
usage.prompt_tokens | integer |
usage.total_tokens | integer |
The chat completion chunk object
Field | Type |
---|---|
id | string |
object | string |
created | integer |
model | string |
system_fingerprint | null |
choices | list |
choices[].finish_reason | string |
choices[].index | integer |
choices[].delta | object |
choices[].delta.content | string |
choices[].delta.role | string or null |
choices[].logprobs | null |
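When a request is streamed, the full answer has to be reassembled from the choices[].delta.content field of each chunk. The sketch below assumes the chunks have already been decoded into Python dictionaries; the wire format of the stream is not described on this page. `join_stream` is a hypothetical helper, and the sample chunks are hand-written for illustration, not captured from the API.

```python
def join_stream(chunks) -> str:
    """Concatenate delta.content across chat.completion.chunk objects."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:  # skip chunks that carry only a role or a finish_reason
                parts.append(text)
    return "".join(parts)


# Hand-written chunks for illustration only.
sample_chunks = [
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": " 4"}, "finish_reason": None}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": ".50"}, "finish_reason": None}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
print(repr(join_stream(sample_chunks)))  # ' 4.50'
```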
Example
Request
curl --location 'https://api.upstage.ai/v1/solar/chat/completions' \
--header 'Authorization: Bearer UPSTAGE_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"model": "solar-docvision",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/ReceiptSwiss.jpg/340px-ReceiptSwiss.jpg"
}
},
{
"type": "text",
"text": "How much is Latte Macchiato?"
}
]
}
]
}'
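For readers working in Python, the same request can be sketched with the standard library's urllib. The endpoint, headers, and body mirror the curl command above; the function names `build_request` and `ask` are hypothetical, and the API key is assumed to be supplied by the caller.

```python
import json
import urllib.request

API_URL = "https://api.upstage.ai/v1/solar/chat/completions"


def build_request(api_key, image_url, question, model="solar-docvision"):
    """Build the POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_key, image_url, question):
    """Send the request and return the decoded chat.completion object."""
    request = build_request(api_key, image_url, question)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `ask` with a valid key, the receipt image URL, and the question "How much is Latte Macchiato?" would issue the same request as the curl command. Keeping `build_request` separate makes the payload easy to inspect without network access.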
Response
{
"id": "b3773198-1280-4bc4-ba8c-ea5d907fdff9",
"object": "chat.completion",
"created": 1725432431,
"model": "solar-docvision-preview-240910",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " 4.50\n\n"
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1907,
"completion_tokens": 8,
"total_tokens": 1915
},
"system_fingerprint": null
}
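Note that the answer in choices[].message.content carries surrounding whitespace (" 4.50\n\n"). A minimal sketch of reading the answer out of the response above follows; `extract_answer` is a hypothetical helper, and stripping whitespace is a client-side choice rather than something the API requires.

```python
def extract_answer(completion: dict) -> str:
    """Return the first choice's text with surrounding whitespace removed."""
    return completion["choices"][0]["message"]["content"].strip()


# The relevant fields of the response shown above.
completion = {
    "object": "chat.completion",
    "model": "solar-docvision-preview-240910",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " 4.50\n\n"},
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 1907, "completion_tokens": 8, "total_tokens": 1915},
}

print(extract_answer(completion))  # 4.50
```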