Tokenization: How Tokens Shape AI Efficiency and Cost
[Cover image: abstract symbols of various languages and a glowing AI model against a gradient blue background.]




"Not all tokenizers are created equal thus enter discussion"


In the diverse landscape of generative AI, understanding tokens and their influence on AI models and pricing is crucial. This article aims to shed some light on the concept of tokens, explore how they vary across languages and models, and show how they affect the cost of using AI systems.


What is a Token?

In the context of generative AI, a token is a unit of text—a word, part of a word, or even a single character—used to break natural language down into manageable "chunks".

Here is an example of what tokens look like (you can try it yourself in OpenAI's tokenizer):

https://platform.openai.com/tokenizer

We are often told that, for English, 1 token is normally about 0.75 of a word; you might also have heard that 1 token is roughly 4 characters. The rules of thumb below, and the quick check right after them, make this concrete; clearer examples follow as we move forward.

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ of a word
  • 100 tokens ~= 75 words
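
As a quick sanity check of these rules of thumb, here is a short Python snippet using OpenAI's open-source tiktoken library. The cl100k_base encoding is just one example I chose; other models' tokenizers will give somewhat different ratios.

```python
# A quick sanity check of the rules of thumb above, using OpenAI's open-source
# tiktoken library (pip install tiktoken). cl100k_base is one specific encoding;
# other models' tokenizers will give somewhat different numbers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization shapes both the efficiency and the cost of generative AI."
tokens = enc.encode(text)

print("characters:", len(text))
print("words:     ", len(text.split()))
print("tokens:    ", len(tokens))
print("chars per token:", round(len(text) / len(tokens), 2))
print("words per token:", round(len(text.split()) / len(tokens), 2))
```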


When you input text into an AI model, the tokenizer breaks down the text into these manageable pieces, allowing the model to process and analyze the information more effectively and predict the next token.


You might want to see what we mean by next-token prediction:
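
Since an interactive demo cannot be embedded here, the sketch below uses the small, public GPT-2 model from Hugging Face (my own stand-in, chosen only because it is freely downloadable) to show the model's top candidates for the next token:

```python
# A minimal sketch of next-token prediction, using the small public GPT-2 model
# purely for illustration; larger proprietary models work on the same principle.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The last position holds the model's score for every possible *next* token.
next_token_scores = logits[0, -1]
top = torch.topk(next_token_scores, k=5)

for token_id, score in zip(top.indices.tolist(), top.values.tolist()):
    print(repr(tokenizer.decode(token_id)), round(score, 2))
```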


Tokenizer in an AI Model

A tokenizer is a tool within AI models that splits text into tokens. This process is essential because it transforms raw text into a structured format that the AI can understand. Different AI models may use different tokenization methods depending on their architecture and the tasks they are designed to perform. For example, some models might tokenize at the word level, while others focus on subwords or characters, impacting how the AI interprets the input.
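
To make the word-level / subword / character distinction concrete, here is a small sketch; GPT-2's tokenizer is used only as a readily available example of a subword (BPE) tokenizer:

```python
# Illustrating the three granularities mentioned above on a single word.
# GPT-2's BPE tokenizer stands in for "subword"; word and character splits are plain Python.
from transformers import AutoTokenizer

word = "tokenization"

print("characters:", list(word))          # character-level: one token per character

bpe = AutoTokenizer.from_pretrained("gpt2")
print("subwords:  ", bpe.tokenize(word))  # subword-level: learned fragments

print("words:     ", word.split())        # word-level: the whole word is one token
```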


Here is an example comparing three models from AI21 Labs, Amazon, and Meta. I used the same prompt for all three: "Hello, could you provide me with a short overview of Amazon's history?". The input token count across the three models varies from 10 to 29 tokens.


https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/chat-playground
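
You can reproduce the spirit of this comparison with openly available tokenizers. The models below are my own stand-ins, not the three Bedrock models above, so treat the counts as illustrative of the spread rather than as the exact numbers:

```python
# Counting the same prompt with a few publicly available tokenizers.
# These are NOT the tokenizers behind the Bedrock models in the screenshot,
# so the absolute numbers will differ, but the spread makes the same point.
from transformers import AutoTokenizer

prompt = "Hello, could you provide me with a short overview of Amazon's history?"

for name in ["gpt2", "bert-base-uncased", "t5-small"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name:20s} {len(tok.encode(prompt))} tokens")
```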


Now, I wanted to understand the impact of the umlaut ä on a word and on the token count in German. I tried the word Häuser, meaning houses, and also tried Hauser to understand the impact of the ä. The token count changed by +1, but the context and the response were also way off, since the umlaut provides relevant context.



In German, you can substitute "ä" with "ae." This is a common practice when special characters like umlauts (ä, ö, ü) are not available on a keyboard, or in situations where only ASCII characters are permitted. In this case the token count stays close to the umlaut version, but the response again lost all context.



Now the interesting part: instead of Häuser, I use the English word Houses. Both have six characters, yet the token count decreases as the language changes.

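If you want to reproduce this outside the Bedrock console, a tokenizer you can run locally shows the same effect. The cl100k_base encoding here is my stand-in, not the tokenizer of the models in the screenshots:

```python
# Token counts for the variants discussed above. tiktoken's cl100k_base encoding
# is used as a stand-in: the Bedrock models use their own tokenizers, so the
# absolute counts will differ, but the pattern (umlauts cost extra) is typical.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["Häuser", "Hauser", "Haeuser", "Houses"]:
    ids = enc.encode(word)
    print(f"{word:8s} -> {len(ids)} token(s): {ids}")
```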


Token Differentiation in Languages or Using Special Characters

Tokenization becomes particularly interesting when dealing with different languages or special characters. For instance, the word "über" in German is tokenized differently than "uber" in English due to the special character "ü." In some models, "über" might be split into more tokens than "uber," reflecting the complexity added by special characters or accents. This differentiation is crucial in multilingual AI systems, as it affects the model's ability to understand and generate text accurately across different languages.
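
A quick way to inspect this splitting in practice, again using tiktoken's cl100k_base encoding as an example rather than any particular product's tokenizer:

```python
# Peeking at the actual pieces a BPE tokenizer produces for "über" vs "uber".
# decode_single_token_bytes shows the raw bytes behind each token id.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["über", "uber"]:
    ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(i) for i in ids]
    print(word, "->", ids, pieces)
```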


Understanding Input and Output in AI Models

In discussions about generative AI models, terms like "context window", "input max window" or "output max window" frequently come up, referring to the maximum number of tokens a model can process in a single request or produce as output. This token count is crucial for determining how much text the model can handle at once. However, the efficiency of the tokenizer plays a significant role in how these numbers translate into actual capacity. A highly efficient tokenizer might break text down into fewer tokens, making a 28k-token window nearly equivalent in functional capacity to a 32k window paired with a less efficient tokenizer. Thus, when comparing models, it's important to consider not just the maximum token count but also how the tokenizer handles different types of text and language, as this can greatly affect the effective context window.
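
A tiny back-of-the-envelope calculation makes this concrete; the window sizes and tokens-per-word ratios below are invented for illustration only:

```python
# Back-of-the-envelope comparison of two hypothetical models. The window sizes
# and tokens-per-word ratios are illustrative assumptions, not measured values.
models = {
    "A: 32k window, less efficient tokenizer": {"window": 32_000, "tokens_per_word": 1.45},
    "B: 28k window, more efficient tokenizer": {"window": 28_000, "tokens_per_word": 1.25},
}

for name, m in models.items():
    usable_words = m["window"] / m["tokens_per_word"]
    print(f"{name}: ~{usable_words:,.0f} words of usable context")
```

With these illustrative numbers, model B's smaller window actually holds slightly more text than model A's bigger one.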


The Influence of Tokenization on Price

The cost of using a generative AI model often depends on the number of tokens processed. This pricing model means that the way text is tokenized can directly impact the cost of an operation. More complex tokenization that results in a higher token count could lead to higher costs for the same text. For businesses and developers, understanding this relationship is essential for optimizing expenses, especially when processing large volumes of text.
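
Here is a small sketch of how this adds up per request; the prices are placeholders I made up, so substitute the real per-1K-token rates of your provider:

```python
# Rough per-request cost estimate. The prices are placeholders, not real list prices;
# plug in the actual per-1K-token rates of whatever model you use.
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_1k=0.0005, price_out_per_1k=0.0015):
    """Cost in USD for one request, given per-1K-token prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# The same prompt tokenized into 10 vs 29 input tokens (as in the Bedrock example)
# costs almost 3x more on the input side before the model even answers.
print(f"{estimate_cost(10, 200):.6f} USD")
print(f"{estimate_cost(29, 200):.6f} USD")
```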


Here is an example of per-token prices across models - but tokenizer efficiency is never mentioned, even though it is exactly what would make these prices truly comparable.

https://medium.com/@daniellefranca96/commercial-models-price-comparison-dc5837acc7b6


Takeaway: Test Different Models, Simplify Input

To effectively manage costs and performance, it's beneficial to experiment with different AI models to see how they tokenize text. Simplifying the input text - standardizing characters, removing special characters, removing stop words, or summarizing the input - will reduce token counts and, consequently, costs, but you might lose context, as we saw previously with the Häuser and Hauser example.
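
Here is a rough sketch of one such simplification, dropping a hand-picked list of stop words; it is deliberately crude, which is exactly why the caveat about lost context matters:

```python
# A crude illustration of "simplify the input": stripping a small, hand-picked set
# of stop words lowers the token count, but it can also strip meaning, exactly as
# the Häuser vs Hauser example showed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
STOP_WORDS = {"a", "an", "the", "of", "with", "could", "you", "me", "please"}

prompt = "Hello, could you provide me with a short overview of Amazon's history?"
simplified = " ".join(
    w for w in prompt.split() if w.lower().strip(",.?!") not in STOP_WORDS
)

print(len(enc.encode(prompt)), "tokens:", prompt)
print(len(enc.encode(simplified)), "tokens:", simplified)
```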


You can also compress prompts, for example with LLMLingua:
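
A minimal sketch of what that looks like with the llmlingua package; I am going from the project's README, so treat the exact parameter names and result keys as assumptions and verify against the current documentation:

```python
# A minimal sketch following LLMLingua's documented usage (pip install llmlingua).
# The parameter names and result keys below reflect the project's README at the
# time of writing and are assumptions on my part; check the current docs.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # downloads its default compression model on first use

long_prompt = "..."  # your long context goes here
result = compressor.compress_prompt(long_prompt, target_token=200)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```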


Or the model itself can be improved, achieving better results by shipping with a "better" tokenizer, as Meta describes for Llama 3:

... Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which leads to substantially improved model performance...
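
You can check the quoted claim yourself by comparing the two tokenizers. The sketch below assumes you have access to Meta's gated Hugging Face repos (the exact repo names may change), and the vocabulary size is simply what the tokenizer reports:

```python
# Comparing the Llama 2 and Llama 3 tokenizers from the Hugging Face Hub.
# Both repos are gated, so this assumes you have accepted Meta's license and are
# logged in with `huggingface-cli login`.
from transformers import AutoTokenizer

sentence = "Tokenization efficiency directly affects context windows and cost."

for name in ["meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: vocab size {len(tok)}, {len(tok.encode(sentence))} tokens")
```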


Jonathon Croydon

Insurance - Product Design - MIC Global

6 months ago

If you use agents and are calling the OpenAI API, look at the cost: work out what you should be paying from the OpenAI cost summary, make 10 API requests, and see how much they actually cost you. You may be paying extra tokens for agents.

Basant Kumar

SEO Specialist | Driving Online Visibility

7 months ago

I love diving deep into the nuances of tokenization methods. Let's keep exploring these fascinating differences. #AI #MachineLearning

Exciting insights. Can't wait to dive into the details of tokenization models.

Vincent Valentine

CEO at Cognitive.Ai | Building Next-Generation AI Services | Available for Podcast Interviews | Partnering with Top-Tier Brands to Shape the Future

7 months ago

Tokenization varies across models, creating intriguing differences. A token efficiency ratio could indeed enhance model comparison. What's your take on this fascinating subject? Matias Undurraga Breitling
