Counting tokens

On this page, you will learn what tokens and tokenizers are, and how to count tokens in your requests to Solar models.

Tokens and tokenizers

A token is a piece of text, such as a word, a subword, or even a single character, that the model processes individually. Tokenization is the process of breaking text down into tokens, and a tokenizer is the tool or algorithm that performs this task: it converts raw text into a sequence of tokens, which the model can then use for training or inference. The tokenizer ensures that the input text is broken down in a way that balances efficiency and accuracy, enabling the LLM to generate meaningful and coherent responses.

Context length of the model

Your request to a Solar model is encoded into tokens and fed to the model to generate output. Solar models have a limit on the number of tokens they can handle at a time, known as the context length. Refer to the context length column on the Models page to check these limits.

For text generation models (Chat, Translation, Groundedness Check, Document QA), the context length includes both input and output tokens. For example, if the model's context length is 4096 and the input is 3096 tokens long, the model can generate up to 1000 tokens. To get the output you want, request an appropriate number of tokens at a time: if the sum of the input tokens and max_tokens exceeds the model's context length, you will receive a 400 status code.

For embedding models, the context length only includes input tokens. The number of input tokens should be less than or equal to the model's context length.
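As a quick sanity check, the budget arithmetic for both cases can be written out directly. This is a minimal sketch with placeholder context lengths; look up the actual values for your model on the Models page.

# A minimal sketch of the context-length rules above. The limits below are
# placeholders; check the Models page for your model's actual context length.
CHAT_CONTEXT_LENGTH = 4096
EMBEDDING_CONTEXT_LENGTH = 4096

input_tokens = 3096  # counted with a tokenizer, as shown below

# Text generation: input tokens + max_tokens must fit within the context
# length, otherwise the API returns a 400 status code.
max_tokens_budget = CHAT_CONTEXT_LENGTH - input_tokens
print("max_tokens can be at most:", max_tokens_budget)  # 1000

# Embedding: only the input counts, and it must not exceed the context length.
assert input_tokens <= EMBEDDING_CONTEXT_LENGTH, "Input too long to embed."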

Depending on the model and your request, additional tokens might be appended to your request before it is fed to the model. Check out this notebook for more information.
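For chat requests, one way to count the tokens of the fully formatted prompt is to apply the model's chat template. The following is a sketch, assuming the tokenizer repository ships a chat template usable through the transformers library (not verified here):

# Sketch: count the tokens of a formatted chat request, assuming the
# tokenizer repository provides a chat template.
from transformers import AutoTokenizer

chat_tokenizer = AutoTokenizer.from_pretrained("upstage/solar-pro-preview-tokenizer")
messages = [{"role": "user", "content": "Hi, how are you?"}]

# apply_chat_template returns the token ids of the formatted conversation,
# including any special tokens the template appends.
ids = chat_tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print("Formatted prompt tokens:", len(ids))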

How to count tokens

You can use the HuggingFace Tokenizers Python library to count the tokens in your request.

Install the library

pip install tokenizers

Count tokens

from tokenizers import Tokenizer

# Load the tokenizer that matches your model (see the table below).
tokenizer = Tokenizer.from_pretrained("upstage/solar-pro-preview-tokenizer")
text = "Hi, how are you?"

# Encode the text into an Encoding object that holds the token ids.
enc = tokenizer.encode(text)
print("Encoded input:")
print(enc)

# Map each token id back to its token string via the inverted vocabulary.
inv_vocab = {v: k for k, v in tokenizer.get_vocab().items()}
tokens = [inv_vocab[token_id] for token_id in enc.ids]
print("Tokens:")
print(tokens)

# The number of tokens is the length of the id sequence.
number_of_tokens = len(enc.ids)
print("Number of tokens:", number_of_tokens)

Here is a list of tokenizers and their respective supported models.

Tokenizer                                    Supported models
upstage/solar-pro-preview-tokenizer          • solar-pro
upstage/solar-docvision-preview-tokenizer    • solar-docvision
upstage/solar-1-mini-tokenizer               • solar-1-mini-chat
                                             • solar-1-mini-chat-ja
                                             • solar-1-mini-translate-enko
                                             • solar-1-mini-translate-koen
                                             • solar-1-mini-groundedness-check
                                             • solar-embedding-1-large-query
                                             • solar-embedding-1-large-passage
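The same counting code works with any tokenizer in the table; only the repository name changes. For example, a minimal sketch for the embedding models:

# Count tokens for the embedding models using their tokenizer from the table.
from tokenizers import Tokenizer

emb_tokenizer = Tokenizer.from_pretrained("upstage/solar-1-mini-tokenizer")
print(len(emb_tokenizer.encode("A passage to embed.").ids))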

How to count tokens for images

For models that accept image input (e.g., solar-docvision), the image also contributes to the input tokens. Token usage varies with the image's size and aspect ratio, and an image can contribute at most 4032 tokens.

You can use the following Python code to count the tokens for an image.

# =============================================================================
# This code includes parts derived from or inspired by the "mm_utils.py" file
# from the LLaVA project (https://github.com/haotian-liu/LLaVA).
#
# The original code is licensed under the Apache License 2.0. You may obtain
# a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
#
# The code is used here under the terms of the Apache License 2.0.
#
# Copyright 2024 Upstage
# =============================================================================
 
 
import argparse
from transformers.image_processing_utils import select_best_resolution
 
 
def get_anyres_image_grid_shape(image_size, grid_pinpoints, patch_size):
    """
    Calculate the shape of the image patch grid after the preprocessing for images of any resolution.
    Args:
        image_size (`tuple`):
            The size of the input image in the format (height, width).
        grid_pinpoints (`List`):
            A list containing possible resolutions. Each item in the list should be a tuple or list
            of the form `(height, width)`.
        patch_size (`int`):
            The size of each image patch.
    Returns:
        tuple: The shape of the image patch grid in the format (height, width).
    """
    if not isinstance(grid_pinpoints, list):
        raise ValueError("grid_pinpoints should be a list of tuples or lists")

    # select_best_resolution expects and returns sizes as (height, width).
    height, width = select_best_resolution(image_size, grid_pinpoints)
    return height // patch_size, width // patch_size
 
def count_image_tokens(width, height):
    # Sizes are handled as (height, width) throughout, matching the
    # convention of select_best_resolution in transformers.
    input_size = (height, width)

    base_size = 784
    pinpoints = [[base_size * 1, base_size * 4], [base_size * 4, base_size * 1], [base_size * 2, base_size * 2]]
    patch_size = 14
    downsampling_ratio = 2
    special_tokens = 11

    # The image is tiled into base_size x base_size tiles, so base_size (not
    # patch_size) is the grid unit here.
    num_patch_height, num_patch_width = get_anyres_image_grid_shape(input_size, pinpoints, base_size)

    original_height, original_width = input_size
    # Each tile yields base_size // patch_size // downsampling_ratio = 28
    # feature cells per side.
    current_height, current_width = num_patch_height * (base_size // patch_size // downsampling_ratio), num_patch_width * (base_size // patch_size // downsampling_ratio)

    original_aspect_ratio = original_width / original_height
    current_aspect_ratio = current_width / current_height

    # Exclude the rows or columns of padding added to preserve the aspect ratio.
    if original_aspect_ratio > current_aspect_ratio:
        scale_factor = current_width / original_width
        new_height = int(original_height * scale_factor)
        padding = (current_height - new_height) // 2
        tokens = current_width * (current_height - padding * 2 + 1)
    else:
        scale_factor = current_height / original_height
        new_width = int(original_width * scale_factor)
        padding = (current_width - new_width) // 2
        tokens = (current_height + 1) * (current_width - padding * 2)

    # Add the cells of the downsampled base image plus the model's special tokens.
    final_tokens = tokens + (base_size // patch_size // downsampling_ratio) ** 2 + special_tokens
    return final_tokens
 
def main():
    parser = argparse.ArgumentParser(description="Count the number of input tokens an image uses with solar-docvision.")
    parser.add_argument("width", type=int, help="The width of the input image.")
    parser.add_argument("height", type=int, help="The height of the input image.")
    args = parser.parse_args()

    # argparse already enforces the argument count; only validate the values.
    if args.width <= 0 or args.height <= 0:
        raise ValueError("The width and height must be positive integers.")

    tokens = count_image_tokens(args.width, args.height)
    print(tokens)
 
if __name__ == "__main__":
    main()
 

To run the code, use the following command:

# usage: python count_image_tokens.py <width> <height>
python count_image_tokens.py 1920 1080
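If you save the script above as count_image_tokens.py (the name used in the command), you can also call the function directly from Python:

# Assumes the script above was saved as count_image_tokens.py.
from count_image_tokens import count_image_tokens

print(count_image_tokens(1920, 1080))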