Models
Upstage develops a suite of AI models for diverse business needs, including Solar LLM and Document AI, in pursuit of its mission to achieve AGI (artificial general intelligence) for work.
Solar LLM
Upstage Solar is a compact yet powerful large-language model (LLM).
Model | Release date | Context length (tokens) | Description |
---|---|---|---|
solar-1-mini-chat | 2024-05-02 beta | 32768 | A compact LLM offering superior performance to GPT-3.5, with robust multilingual capabilities in both English and Korean, delivering high efficiency in a smaller package. solar-1-mini-chat is an alias for the latest solar-1-mini-chat model (currently solar-1-mini-chat-240502). |
solar-1-mini-embedding-query | 2024-03-12 beta | 4096 | Solar-based query embedding model with a 4k context limit. This model is optimized for embedding a user's question in information-seeking tasks such as retrieval and reranking. |
solar-1-mini-embedding-passage | 2024-03-12 beta | 4096 | Solar-based passage embedding model with a 4k context limit. This model is optimized for embedding the documents or texts to be searched. |
solar-1-mini-translate-enko | 2024-02-22 beta | 32768 | Specialized English-to-Korean translation model based on solar-mini. Maximum context length is 32k tokens. |
solar-1-mini-translate-koen | 2024-02-22 beta | 32768 | Specialized Korean-to-English translation model based on solar-mini. Maximum context length is 32k tokens. |
solar-1-mini-groundedness-check | 2024-05-02 beta | 32768 | Solar-based groundedness check model with a 32k context limit. solar-1-mini-groundedness-check is an alias for the latest solar-1-mini-groundedness-check model (currently solar-1-mini-groundedness-check-240502). |
For details about the model architecture, see this paper.
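The Solar table can be mirrored in client code so that alias resolution, the query/passage embedding split, and context-length checks happen before a request is sent. The following is a minimal sketch, not part of any Upstage SDK: the model names, aliases, and limits are copied from the table, while the helper names (`resolve_model`, `embedding_model_for`, `fits_context`) are hypothetical, and the token count is assumed to come from whatever tokenizer you already use.

```python
# Context lengths in tokens, copied from the Solar LLM table.
CONTEXT_LENGTH = {
    "solar-1-mini-chat": 32768,
    "solar-1-mini-embedding-query": 4096,
    "solar-1-mini-embedding-passage": 4096,
    "solar-1-mini-translate-enko": 32768,
    "solar-1-mini-translate-koen": 32768,
    "solar-1-mini-groundedness-check": 32768,
}

# Aliases and the dated versions they currently point to, per the table.
ALIASES = {
    "solar-1-mini-chat": "solar-1-mini-chat-240502",
    "solar-1-mini-groundedness-check": "solar-1-mini-groundedness-check-240502",
}


def resolve_model(name: str) -> str:
    """Return the dated model behind an alias, or the name unchanged."""
    return ALIASES.get(name, name)


def embedding_model_for(role: str) -> str:
    """Pick the embedding model for a 'query' or a 'passage' (hypothetical helper)."""
    if role not in ("query", "passage"):
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    return f"solar-1-mini-embedding-{role}"


def fits_context(model: str, token_count: int) -> bool:
    """Check a precomputed token count against the model's listed context length."""
    return token_count <= CONTEXT_LENGTH[model]


if __name__ == "__main__":
    print(resolve_model("solar-1-mini-chat"))        # solar-1-mini-chat-240502
    print(embedding_model_for("query"))              # solar-1-mini-embedding-query
    print(fits_context("solar-1-mini-embedding-passage", 5000))  # False
```

Pinning the dated name returned by `resolve_model` keeps behavior stable if the alias later moves to a newer release; using the alias directly always tracks the latest model.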
Document OCR
Extract all text from any document.
Model | Availability | Release date | Description |
---|---|---|---|
ocr-2.1.1 | Latest | 2024-04-04 | Improved text detection for single characters and special characters. |
ocr-2.1.0 | Deprecated | 2024-02-28 | Additional support for Hanja, Hanzi and Kanji. Improved accuracy and performance. |
ocr-1.0.0 | Deprecated | 2023-04-10 | An OCR model specialized for English and Korean. Resilient against real-world images, including wrinkled papers and rotated text. |
Layout Analysis
Extract tables and figures from any document.
Model | Availability | Release date | Description |
---|---|---|---|
layout-analysis-0.2.1 | Latest | 2024-05-02 beta | Removed unnecessary <thead> tags from table elements and fixed bugs. |
layout-analyzer-0.2.0 | Deprecated | 2024-04-04 beta | Improved table recognition accuracy and layout detection performance. |
layout-analyzer-0.1.0 | Deprecated | 2024-02-28 beta | A layout analyzer model that detects elements within a document, recognizes tables, and serializes elements according to reading order. |
Key Information Extraction
Extract key information from target documents.
Model | Availability | Release date | Description |
---|---|---|---|
receipt-extraction-3.2.0 | Latest | 2024-04-11 | Additional support for English. Improved accuracy and performance. |
receipt-extractor-1.0.0 | Deprecated | 2023-04-11 | An extractor model for paper receipts that include store descriptions and lists of items. Works best for Korean receipts. |
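The Availability column in the Document OCR, Layout Analysis, and Key Information Extraction tables can be captured as a small registry, so that code selects the version marked Latest and flags deprecated ones. This is only a sketch: the model names, statuses, and dates are taken from the tables, but the family keys (`"ocr"`, `"layout-analysis"`, `"receipt-extraction"`) and the helper functions are hypothetical labels chosen for illustration.

```python
from typing import NamedTuple


class ModelVersion(NamedTuple):
    name: str
    availability: str   # "Latest" or "Deprecated", as listed in the tables
    release_date: str   # ISO date from the tables, without the "beta" suffix


# Family keys are illustrative labels, not API identifiers.
DOCUMENT_MODELS = {
    "ocr": [
        ModelVersion("ocr-2.1.1", "Latest", "2024-04-04"),
        ModelVersion("ocr-2.1.0", "Deprecated", "2024-02-28"),
        ModelVersion("ocr-1.0.0", "Deprecated", "2023-04-10"),
    ],
    "layout-analysis": [
        ModelVersion("layout-analysis-0.2.1", "Latest", "2024-05-02"),
        ModelVersion("layout-analyzer-0.2.0", "Deprecated", "2024-04-04"),
        ModelVersion("layout-analyzer-0.1.0", "Deprecated", "2024-02-28"),
    ],
    "receipt-extraction": [
        ModelVersion("receipt-extraction-3.2.0", "Latest", "2024-04-11"),
        ModelVersion("receipt-extractor-1.0.0", "Deprecated", "2023-04-11"),
    ],
}


def latest(family: str) -> str:
    """Return the name of the version marked 'Latest' for a model family."""
    for version in DOCUMENT_MODELS[family]:
        if version.availability == "Latest":
            return version.name
    raise LookupError(f"no version marked Latest for {family!r}")


def is_deprecated(name: str) -> bool:
    """Report whether a listed model version is marked 'Deprecated'."""
    for versions in DOCUMENT_MODELS.values():
        for version in versions:
            if version.name == name:
                return version.availability == "Deprecated"
    raise KeyError(name)


if __name__ == "__main__":
    print(latest("ocr"))                           # ocr-2.1.1
    print(is_deprecated("layout-analyzer-0.2.0"))  # True
```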