How the economics of LLM-based classifiers for external data have flipped.
Over the last 15 years, NLP-based classifiers have been the gold standard for adding context, classification, and information to external data.
In my world, the world of real-time processing of external data, classifiers are essential components!
The philosophy for NLP-based classifiers has been to run as few as possible to extract the extra information needed for insights. They are expensive! They were, however, the cheapest option. That has now changed.
Getting up to speed on LLM- and NLP-based classifiers
This section is a compact refresher for those newer to LLMs, NLP, and classifiers. Feel free to skip it if you already know all this!
A classifier is the name we give to a capability that reads a piece of content and detects specific elements within it. A classifier is used to add critical metadata to content; sentiment classification, for example, can help us filter to content where people are unhappy. It is the outsourcing of some of our human reasoning to technology, so that systems can scale where people cannot.
Natural Language Processing (NLP) uses a number of techniques (computational linguistics, machine learning, rule-based modelling, etc.) to consume data like a human would. This is perfect for simple classification! It is a great technology for handling simple text with clear and straightforward features. To build an NLP-based classifier, you would start with an available library, train and tweak until you reach the desired accuracy, and then host it. Changes and expansions to the classifier (such as a new language) would rely on either a separate classifier or retraining. Note: whole books could be written on the economics and optimizations of NLP-based classifiers.
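As a rough sketch of that "pick a library, train, tweak, host" workflow, here is a minimal sentiment classifier using scikit-learn (one common library choice; the toy training data and labels are invented purely for illustration):

```python
# Minimal NLP-classifier sketch: TF-IDF features + logistic regression.
# Library choice (scikit-learn) and the toy data are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love this brand, amazing product",
    "great service and wonderful support",
    "this is terrible, worst purchase ever",
    "awful quality, very disappointed",
]
train_labels = ["positive", "positive", "negative", "negative"]

# "Train and tweak": fit the pipeline, then iterate on features and
# hyperparameters until accuracy on a held-out set is acceptable.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

prediction = clf.predict(["wonderful support, I love it"])[0]
```

In practice the training set would be thousands of labelled examples per language, which is exactly where the months of effort discussed later come from.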
LLM-based classifiers use a Large Language Model such as ChatGPT, Gemini, LaMDA, etc. These are deep neural networks that are pre-trained on massive amounts of data. They are more versatile and adaptable than NLP solutions, require little to no training, and can go far beyond the average NLP-based classifier. They also require significant technology resources ($$$) and rely on significant pre-processing and post-processing.
Let’s build a product. (Analogy time!)
To properly compare the economics of NLP-based and LLM-based classifiers, we need to look from a product perspective.
Let’s say your product provides a dashboard of global sentiment towards different brands, divided by country. It also alerts if an influencer or publication is modifying that sentiment in real time. A very powerful product!
To go from real-time social and news feeds to the dashboard's source data, we need an intensive process with many classifiers and operations running. Let's refer to this as the Data Pipeline.
Here are the parts required for our data pipeline:
I know that I have oversimplified massively: you would need multiple classifiers for each item to handle the various languages, speech patterns, and brands that do not share common product names globally. Then there is routing, filtering, event merging, spam detection, etc.
However, for the purposes of this article, let’s simplify it a bit to 5 classifiers.
External data is part of the problem
At Datastreamer, many of our customers also add a few other elements to their pipelines. They often start with "Jobs", which handle ingesting and filtering the data streams down to core keywords/terms/patterns before any classifiers run. This drops the total content being classified to 4-5 million pieces of content per day, reducing the "classifier-eligible" content to 1.3% of the raw feed volume. Let's assume this example product already has those pipeline elements in place, along with queue management, recovery, orchestration, etc.
In short, the Pipelines needed to support the classifiers are not simple one-step items.
Let's, however, remove the pipeline elements from the equation and keep the focus on the economics.
Within the pipeline that we are thinking through, we have the following elements:
The Economics of NLP-based classifiers
Within the world of NLP-based classifiers, there is always the battle of in-house vs. 3rd-party. Economically they often come out close in cost; the costs are simply spread around differently.
In-house NLP Classifier
You first look into the in-house option, and you have a team that can build this classifier. If after 3-4 months you have reached a high accuracy level with an average classification speed of ~80 milliseconds, consider yourself lucky!
Most classifiers take 3-4 months each to reach an initial (rocky) release. If you were looking to start with a single language and later expand to more, you could easily spend months of effort on the first 10 languages (reminder: you are going global). Assuming you are on AWS, let's use a Trn1.xlarge with a smaller GPU capacity and a cost-savings commitment in place.
80 milliseconds × 5,000,000 pieces of content per day × 5 classifiers = 2,000,000 inference-seconds per day.
With only 86,400 seconds in a day, the workload would require 24 (23.15, rounded up) of those wonderful servers, or $17,377.91/month (not including training, support, pipeline, or data-source costs).
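The server math above can be sketched in a few lines (the monthly dollar figure is taken from the committed-pricing estimate above, not recomputed here):

```python
import math

# Inputs from the scenario above.
MS_PER_CLASSIFICATION = 80
CONTENT_PER_DAY = 5_000_000
NUM_CLASSIFIERS = 5
SECONDS_PER_DAY = 86_400

# Total inference time needed per day, in seconds.
inference_seconds = (MS_PER_CLASSIFICATION / 1000) * CONTENT_PER_DAY * NUM_CLASSIFIERS

# Each server provides at most 86,400 inference-seconds per day,
# so round the required count up to whole servers.
servers = math.ceil(inference_seconds / SECONDS_PER_DAY)
```

This yields 2,000,000 inference-seconds per day and 24 servers, matching the figures above.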
3rd Party NLP Classifier (Like AWS Comprehend)
There are many companies that also offer pre-built, pre-trained classifiers. One such is AWS Comprehend.
AWS Comprehend is a set of NLP products. These are limited in nature and can cover only 2-3 of the classifiers we need, but let's assume it covered all 5 and look at the math. (One AWS Comprehend "Inference Unit" (IU) covers 100 characters; assuming 600 characters per piece of content, that is 6 IUs.)
5,000,000 pieces of content per day × 5 classifiers × 6 IUs = 150,000,000 IUs per day.
Since AWS Comprehend costs $0.000025 per IU at the highest discount tier available, you are left with a daily bill of $3,750, or $112,500 per month.
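The same arithmetic in code, using the prices and volumes from the figures above:

```python
# AWS Comprehend cost sketch, using the scenario's figures.
CONTENT_PER_DAY = 5_000_000
NUM_CLASSIFIERS = 5
IUS_PER_CONTENT = 6          # 600 characters / 100 characters per IU
PRICE_PER_IU = 0.000025      # highest discount tier

ius_per_day = CONTENT_PER_DAY * NUM_CLASSIFIERS * IUS_PER_CONTENT
daily_cost = ius_per_day * PRICE_PER_IU
monthly_cost = daily_cost * 30
```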
Whichever way you look at it, the economics lead to the philosophy of running as few classifiers as possible, on as few sources as possible. To deliver our product, we had a choice:
Economic Ranking (5 Classifiers):
The Economics of LLM-based classifiers
Classifiers based on LLMs change the economics for a few reasons. The best options are externally hosted (Gemini, ChatGPT); some can be self-hosted, but those are not comparable in many ways. LLMs require no training, can handle complex requirements, and are very adaptable. They suffer, however, from their own success: they were built to interact in a conversational manner, which requires more pipeline support to operate at high volume and in real time.
To be effective, they require a lot of pre-processing (prompt construction, feeding specific data and context, micro-batching, etc.) and post-processing (converting responses into data fields, forcing data consistency, merging data back into the core document, etc.). Our data science lead has written an amazing article on the effort required here.
In our case above, which parallels many Datastreamer customers, the requirements are high complexity + high volume + low cost, all surrounded by the requirements of speed and hands-free operation. This additional processing is actually part of Datastreamer's origin story, which I am happy to tell over a coffee.
Turning to the economics: LLMs can process in batch (cheaper) or per request (faster). To meet the requirements, let's look at per-request pricing and use pre-processing and post-processing to reduce the cost and find a middle ground.
LLMs bill on input tokens and output tokens. Across the 30+ LLM classifiers running in the Datastreamer platform, we generally see a 7:2 ratio of input to output tokens. They are also generally billed per million tokens. A token is never a fixed number of characters, but tends to average ~4.2 characters.
Using this and our above math from AWS Comprehend:
600 characters (per piece of content) × 5,000,000 pieces of content per day ÷ 4.2 characters per token = ~714 million input tokens per day.
For output tokens, we can use the ratio above: ~700 million input tokens (rounding down) at a 7:2 ratio = 200 million output tokens.
At GPT-4o pricing (as of August: gpt-4o), $5 per million input tokens and $15 per million output tokens, that gives us:
(700 × $5) input tokens daily + (200 × $15) output tokens daily, × 5 classifiers × 30 days = $975,000 per month.
Here is what changes everything: the recent rise of the "mini" versions. At GPT-4o mini pricing (as of August: gpt-4o-mini), $0.15 per million input tokens and $0.60 per million output tokens, we get:
(700 × $0.15) input tokens daily + (200 × $0.60) output tokens daily, × 5 classifiers × 30 days = $33,750 per month.
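Putting the token math and both price points into one sketch (using the rounded 700M/200M daily token figures from above):

```python
def monthly_llm_cost(input_m_per_day, output_m_per_day,
                     price_in, price_out, classifiers=5, days=30):
    """Monthly bill given millions of tokens per day and $/million-token prices."""
    daily = input_m_per_day * price_in + output_m_per_day * price_out
    return daily * classifiers * days

# 600 chars * 5M docs / 4.2 chars-per-token ~= 714M input tokens/day (rounded
# to 700M); the 7:2 input:output ratio then gives ~200M output tokens/day.
gpt4o_monthly = monthly_llm_cost(700, 200, 5.00, 15.00)
gpt4o_mini_monthly = monthly_llm_cost(700, 200, 0.15, 0.60)
```

Swapping the per-million prices is all it takes to compare any other model under the same volume assumptions.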
If we look purely at the cost of the classifiers themselves, the economics now go bonkers. For context, here is where we stand, in ranked format. Remember that we are making some broad assumptions.
Economic Ranking (5 Classifiers):
The Other Costs: Beyond the Classifier
As the ranking above shows, before the advent of the "mini" LLMs, using an LLM classifier was simply not viable at scale for these real-time use cases. The mini LLM classifiers are now climbing the ranking.
The average in-house NLP classifier can take 6+ weeks of data science effort to operate on a very specialized use case. Expansions to cover other languages and optimizations average an additional 4 weeks each. As we are designing a global product, let's assume 2 months for creation (don't forget your training data!), 1 month of fine-tuning, and 3 weeks per language for an additional 20 languages (trying to avoid different alphabets). That leaves you with a tidy ~16 months of effort.
You also have 5 of these classifiers. Eeek!
Team costs vary wildly around the globe, as does team composition. For the sake of a number, let's assume the human maintenance (data science, engineering, QA, training data, etc.) of a classifier costs $12-15k per month of effort. Take that cost, the 16 months of effort, and a requirement to be live within a year. Let's hope nothing new lands on the roadmap for the next 8 quarters!
16 months of effort × $13.5k per "effort month" ÷ 12 months in a year (gotta move fast!) × 5 classifiers = $90k added per month.
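The same back-of-envelope in code:

```python
# Human-effort cost of building and expanding the in-house classifiers.
EFFORT_MONTHS = 16            # build + fine-tune + 20 languages
COST_PER_EFFORT_MONTH_K = 13.5  # midpoint of the $12-15k range, in $k
DEADLINE_MONTHS = 12          # must be live within a year
CLASSIFIERS = 5

# Compressing 16 effort-months into 12 calendar months, for 5 classifiers.
added_monthly_k = (EFFORT_MONTHS * COST_PER_EFFORT_MONTH_K
                   / DEADLINE_MONTHS * CLASSIFIERS)
```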
This raises the cost of our in-house NLP classifiers by a significant amount. The 3rd-party classifiers suddenly no longer look ridiculously priced! Here is the flip I described. When did it happen? July 18, 2024.
Economic Ranking (5 Classifiers):
Turning our eyes back to the LLM classifiers, there is a surprising amount of processing required, and prompts are the easiest part! The response data needs heavy processing, and everything must be geared toward minimizing prompt costs and token consumption as much as possible.
The LLM-classifier components at Datastreamer need to micro-batch the content, strip it to the essential elements, and carefully and dynamically "spoon-feed" the LLM classifier. A 10% increase in output tokens turns into a $4,500 difference within a month.
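A hedged sketch of the micro-batching idea: pack several stripped-down posts into one prompt so the fixed instruction tokens are amortized across the batch. (The prompt wording and the 600-character cap are illustrative assumptions, not Datastreamer's actual implementation.)

```python
def build_batch_prompt(posts, max_chars=600):
    """Pack several posts into a single classification prompt (sketch)."""
    instructions = (
        "Classify the sentiment of each numbered post as positive, "
        "negative, or neutral. Reply with a JSON object mapping each "
        "post number to its label, and nothing else."
    )
    # Strip each post to the essentials to keep input tokens down.
    trimmed = (p[:max_chars].strip() for p in posts)
    body = "\n".join(f"{i}. {text}" for i, text in enumerate(trimmed, start=1))
    return f"{instructions}\n\n{body}"

prompt = build_batch_prompt(["Loving the new phone!", "Worst update ever."])
```

Asking for compact JSON (rather than conversational prose) is also what keeps the output-token side of the bill down.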
The LLM classifier will often respond in a format that needs further processing to standardize and strip down into the needed metadata. Generally, all this pre- and post-processing (the pipeline) raises the cost of the classifier by 100%-150%. Let's go with 100%.
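A sketch of the post-processing side: pull the JSON out of a possibly chatty reply and force it into consistent fields. (The field names, allowed labels, and fallback values are illustrative assumptions.)

```python
import json
import re

ALLOWED = {"positive", "negative", "neutral"}

def normalize_llm_reply(raw):
    """Extract the first JSON object from an LLM reply and standardize it."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return {"sentiment": "unknown", "confidence": 0.0}
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return {"sentiment": "unknown", "confidence": 0.0}
    # Force data consistency: lowercase the label, reject unexpected values.
    sentiment = str(data.get("sentiment", "unknown")).lower()
    if sentiment not in ALLOWED:
        sentiment = "unknown"
    return {"sentiment": sentiment, "confidence": float(data.get("confidence", 0.0))}

result = normalize_llm_reply('Sure! {"sentiment": "Positive", "confidence": 0.92}')
```

The normalized dictionary is what gets merged back into the core document as metadata.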
$33.75k for the mini-LLM classifiers × 2 = $67.5k/month
Economic Ranking (5 Classifiers):
The Final Calculations
Over the last few months, the world of real-time data classification has changed drastically. The raw costs have flipped to favor LLM-based classifiers. Let's take our example of 5 classifiers and break the numbers down per classifier.
If you have the right infrastructure, streaming pipeline technology, data science team, and pre- & post-processing knowledge, then switching from NLP-based to LLM-based classifiers saves 56%, removes over a year from your time to market, and adds almost limitless capabilities. A no-brainer since July 2024.
*As we have seen with many NLP providers and the examples above, this may only cover 40-50% of use cases, meaning you will need to supplement with one of the other solutions anyway.
** Assuming you have streaming data pipeline technology in place; if you don't, you should talk to our team at Datastreamer.
BONUS: LLM Flexibility
Here are 20 of the LLM classifiers we run in the Datastreamer platform, to give an idea of the possibilities. I hope it provides some inspiration for your own!