Models

Upstage develops a suite of AI models tailored to diverse business needs, including Solar LLM and Document AI, in pursuit of its mission to achieve AGI (Artificial General Intelligence) for work.

Solar LLM

Upstage Solar is a compact yet powerful large language model (LLM).

| Model | Release date | Context length | Description |
| --- | --- | --- | --- |
| solar-1-mini-chat | 2024-05-02 beta | 32768 | A compact LLM offering superior performance to GPT-3.5, with robust multilingual capabilities for both English and Korean, delivering high efficiency in a smaller package. solar-1-mini-chat is an alias for our latest solar-1-mini-chat model (currently solar-1-mini-chat-240502). |
| solar-1-mini-embedding-query | 2024-03-12 beta | 4096 | Solar-based query embedding model with a 4k context limit. This model is optimized for embedding a user's question in information-seeking tasks such as retrieval and reranking. |
| solar-1-mini-embedding-passage | 2024-03-12 beta | 4096 | Solar-based passage embedding model with a 4k context limit. This model is optimized for embedding the documents or texts to be searched. |
| solar-1-mini-translate-enko | 2024-02-22 beta | 32768 | English-to-Korean translation model based on solar-mini. Maximum context length is 32k tokens. |
| solar-1-mini-translate-koen | 2024-02-22 beta | 32768 | Korean-to-English translation model based on solar-mini. Maximum context length is 32k tokens. |
| solar-1-mini-groundedness-check | 2024-05-02 beta | 32768 | Solar-based groundedness check model with a 32k context limit. solar-1-mini-groundedness-check is an alias for our latest solar-1-mini-groundedness-check model (currently solar-1-mini-groundedness-check-240502). |

For details about the model architecture, see this paper.
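
The model names, context lengths, and aliases listed above can be captured in a small lookup helper on the client side. The following is a minimal sketch, not part of any Upstage SDK: the `SolarModel` class and the functions `pinned_name`, `embedding_model_for`, and `fits_context` are hypothetical names introduced for illustration, and the token count is assumed to come from whatever tokenizer the caller already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolarModel:
    name: str
    context_length: int  # maximum tokens, copied from the Solar LLM table


# Hypothetical client-side registry; values are taken from the table above.
SOLAR_MODELS = {
    m.name: m
    for m in [
        SolarModel("solar-1-mini-chat", 32768),
        SolarModel("solar-1-mini-embedding-query", 4096),
        SolarModel("solar-1-mini-embedding-passage", 4096),
        SolarModel("solar-1-mini-translate-enko", 32768),
        SolarModel("solar-1-mini-translate-koen", 32768),
        SolarModel("solar-1-mini-groundedness-check", 32768),
    ]
}

# Aliases and the dated versions they currently point to, per the table notes.
ALIASES = {
    "solar-1-mini-chat": "solar-1-mini-chat-240502",
    "solar-1-mini-groundedness-check": "solar-1-mini-groundedness-check-240502",
}


def pinned_name(model: str) -> str:
    """Return the dated version an alias points to, or the name unchanged."""
    return ALIASES.get(model, model)


def embedding_model_for(role: str) -> str:
    """Pick the embedding model for a 'query' or a 'passage'."""
    if role not in ("query", "passage"):
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    return f"solar-1-mini-embedding-{role}"


def fits_context(model: str, token_count: int) -> bool:
    """Check a token count against the model's documented context length."""
    return token_count <= SOLAR_MODELS[model].context_length
```

Because an alias can move to a newer dated version over time, logging `pinned_name("solar-1-mini-chat")` alongside results is one way to keep them reproducible. The `ALIASES` mapping would need a manual update whenever the alias changes.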


Document OCR

Extract all text from any document.

| Model | Availability | Release date | Description |
| --- | --- | --- | --- |
| ocr-2.1.1 | Latest | 2024-04-04 | Improved text detection for single characters and special characters. |
| ocr-2.1.0 | Deprecated | 2024-02-28 | Additional support for Hanja, Hanzi and Kanji. Improved accuracy and performance. |
| ocr-1.0.0 | Deprecated | 2023-04-10 | An OCR model specialized for English and Korean. Resilient against real-world images, including wrinkled papers and rotated text. |

Layout Analysis

Extract tables and figures from any document.

| Model | Availability | Release date | Description |
| --- | --- | --- | --- |
| layout-analysis-0.2.1 | Latest | 2024-05-02 beta | Removed unnecessary <thead> tags from table elements and fixed bugs. |
| layout-analyzer-0.2.0 | Deprecated | 2024-04-04 beta | Improved the accuracy for table recognition and performance for layout detection. |
| layout-analyzer-0.1.0 | Deprecated | 2024-02-28 beta | A layout analyzer model which detects elements within a document, recognizes tables, and serializes elements according to reading order. |

Key Information Extraction

Extract key information from target documents.

| Model | Availability | Release date | Description |
| --- | --- | --- | --- |
| receipt-extraction-3.2.0 | Latest | 2024-04-11 | Additional support for English. Improved accuracy and performance. |
| receipt-extractor-1.0.0 | Deprecated | 2023-04-11 | An extractor model for paper receipts that include store descriptions and lists of items. Works best for Korean receipts. |
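
Since each Document AI family lists one Latest model and several Deprecated ones, a client configuration can guard against pinning a deprecated version. The sketch below assumes the availability values shown in the tables above. The family keys (`"ocr"`, `"layout"`, `"receipt"`) and the helper names are hypothetical and do not come from any Upstage API.

```python
# Availability status copied from the Document OCR, Layout Analysis, and
# Key Information Extraction tables. Family keys are illustrative only.
DOCUMENT_MODELS = {
    "ocr": {
        "ocr-2.1.1": "Latest",
        "ocr-2.1.0": "Deprecated",
        "ocr-1.0.0": "Deprecated",
    },
    "layout": {
        "layout-analysis-0.2.1": "Latest",
        "layout-analyzer-0.2.0": "Deprecated",
        "layout-analyzer-0.1.0": "Deprecated",
    },
    "receipt": {
        "receipt-extraction-3.2.0": "Latest",
        "receipt-extractor-1.0.0": "Deprecated",
    },
}


def latest_model(family: str) -> str:
    """Return the model marked 'Latest' for the given family."""
    for name, availability in DOCUMENT_MODELS[family].items():
        if availability == "Latest":
            return name
    raise LookupError(f"no model marked Latest in family {family!r}")


def is_deprecated(model: str) -> bool:
    """Report whether a listed model is marked 'Deprecated'."""
    for models in DOCUMENT_MODELS.values():
        if model in models:
            return models[model] == "Deprecated"
    raise KeyError(f"unknown model {model!r}")
```

A startup check such as `if is_deprecated(configured_model): ...` gives an early warning when a pinned version falls behind. The mapping is static, so it has to be refreshed by hand whenever the tables change.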