Selling with Data #44 - The cost of creating AI
Warning: This article is longer and geekier than my past articles, but I am trying something different. Please let me know what you think.
We are entering a pivotal moment for AI. Many companies are coming off the sidelines and greenlighting AI projects, especially generative AI. A buddy just told me that his leadership team asked him to create a website page outlining all the AI the company is doing. He was struggling, since most of his AI projects were still in labs or hidden in small pockets of the business. Showcasing AI has gone from leading to keeping up. This focus on accelerating AI makes this an amazing time to be a seller of AI.
AI investments are happening in two areas: companies that are building AI and companies that are using AI. I mostly see companies using AI to improve digital labor, intelligent automation, advanced security, and customer care. These are great uses, and I love seeing them.
In this article, I will focus on the other category, the cost of building AI models.
The cost to build AI models
According to IDC, worldwide spending on AI is set to grow 19.6% year-over-year in 2022 to $432.8 billion, well on its way to breaching the $500 billion mark by 2024 (source). So far, 88% of that spend has been on AI software. In the next several years, the share spent on AI hardware is expected to grow faster than other categories.
The need for more horsepower to run large language models (LLMs) requires faster and more complex infrastructure at a higher cost. To provide an example, consider the average cost of running a basic ML model versus running an LLM.
Example 1 - A basic ML solution costs about $25,000 / year - $15,750 in labor and $9,000 in infrastructure (source). Here's how that breaks down:
- Infrastructure: A single machine in the cloud without load management - ~$9,000 / year
- Data Support: Labor to pull data - ~$6,750
- Engineering / Deployment: Labor to move the model from the data scientist's workstation to the cloud - ~$9,000
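The breakdown above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using the example's illustrative figures, not real pricing:

```python
# Hypothetical annual-cost breakdown for a basic ML solution,
# using the article's example figures (all values are illustrative).
COSTS = {
    "infrastructure": 9_000,   # single cloud machine, no load management
    "data_support": 6_750,     # labor to pull and prepare data
    "engineering": 9_000,      # labor to deploy the model to the cloud
}

labor = COSTS["data_support"] + COSTS["engineering"]
total = sum(COSTS.values())

print(f"Labor: ${labor:,}/year")            # Labor: $15,750/year
print(f"Infrastructure: ${COSTS['infrastructure']:,}/year")
print(f"Total: ${total:,}/year")            # Total: $24,750/year
```

The labor subtotal reproduces the $15,750 figure, and the total lands at roughly the $25,000 per year cited above.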
Example 2 - ChatGPT and other large models like Meta's LLaMA can cost $2M-$4M+ to build and run. CNBC's article, "ChatGPT and generative AI are booming, but the costs can be extraordinary," says "Analysts and technologists estimate that the critical process of training a large language model such as GPT-3 could cost over $4 million." Another source says a single training run of an OpenAI GPT model can cost $12 million, and that OpenAI requires ~3,617 HGX A100 servers (28,936 GPUs) to serve ChatGPT (source: Hitchhiker's Guide to ML Training Infrastructure). Meta's largest model, LLaMA, is estimated to have cost $2.4M to train: 2,048 Nvidia A100 GPUs, 1.4 trillion tokens, and 1 million GPU hours (source: ChatGPT and generative AI are booming, but the costs can be extraordinary).
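The LLaMA estimate can be cross-checked from the GPU-hour figure. This is a rough sketch; the ~$2.40/GPU-hour rate is an assumed cloud price chosen to match the cited estimate, not a quoted figure:

```python
def training_cost(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    """Rough training cost: total GPU hours times the hourly GPU rate."""
    return gpu_hours * rate_per_gpu_hour

# LLaMA figures from the article: ~1 million A100 GPU hours.
# An assumed rate of ~$2.40/GPU-hour reproduces the ~$2.4M estimate.
cost = training_cost(gpu_hours=1_000_000, rate_per_gpu_hour=2.40)
print(f"Estimated training cost: ${cost:,.0f}")  # Estimated training cost: $2,400,000
```

The same back-of-the-envelope formula works for any model once you know the GPU hours and your effective hourly rate.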
Granted, comparing a simple ML model to Meta's largest model isn't an apples-to-apples comparison, and some ML models can be very large. Still, it is safe to assume that an LLM will cost more to run than a simple ML model. As LLMs take on more and more, they grow bigger and more complex, and the cost to train and run them keeps climbing.
AI and business models for business-to-consumer companies
Many of the largest foundation models already in production are concentrated in a handful of big companies that can afford to pay. Business-to-consumer (B2C) companies, like Microsoft and OpenAI, are offering these models to users and often not yet charging for the AI services. They are free to users because the companies are following Google's advertisement-driven business model. Google offers search to users at no charge, spends billions on infrastructure, data management, and labor behind the scenes, and makes billions in advertising revenue from optimized ad placement based on user search history. In many cases, large AI B2C companies are willing to take a loss on the cost of infrastructure and model development so that millions of users engage with and train their models. When so many users are improving and training the models at scale, the user data gains long-term monetization potential.
As the saying goes, "If you are not paying for something, you're not the customer; you're the product." Every prompt on ChatGPT is a user training the model, sharing a little bit about themselves, and in turn adding value to the product.
This business model comes at a cost. The Information article, "OpenAI's Losses Doubled to $540 Million as It Developed ChatGPT," says OpenAI's $540 million loss "reflects the steep costs of training its machine-learning models during the period before it started selling access to the chatbot." OpenAI isn't alone. Financial analysts estimate Microsoft's Bing AI chatbot, which is powered by an OpenAI ChatGPT model, needs at least $4 billion of infrastructure to serve responses to all Bing users.
Model costs - training and inference costs
As more enterprise companies ban open models like ChatGPT over fears about the quality of the training data and the privacy of their employees' searches (link), many are looking for substitutes to general-use models - options that provide governance, traceability, and trust. For enterprise companies, providing their own version of these models isn't cheap. The cost falls into two main categories: training costs and inference costs.
Training costs are the costs of the resources, computational power, and data required to train a model. Training adjusts the model to the information in the training dataset, improving the model's accuracy and minimizing errors.
There are differing approaches to model training.
Training is typically charged per compute hour, which makes picking the right training approach important.
Prompt-tuning uses embedding models to perform prompt engineering automatically, replacing a human manually and randomly experimenting with different prompts. Picture a person using ChatGPT through trial and error, changing the prompt with more detailed and different terms until the output starts to match their expectations. By using embedding models, prompt-tuning can identify the right prompts in fewer iterations, resulting in lower inference costs. More information from IBM Research on prompt-tuning can be found here.
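The core idea - keep the big model frozen and optimize only a small learned prompt - can be sketched in a toy form. Below, a fixed matrix stands in for a frozen pretrained model, and gradient descent updates only the prompt vector. This is an illustrative sketch of the concept, not IBM's or anyone's actual prompt-tuning implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "model": a fixed linear map standing in for a pretrained LLM.
W = 0.5 * rng.normal(size=(4, 4))
x = rng.normal(size=4)        # fixed task input
target = rng.normal(size=4)   # desired output for this task

# Only the soft prompt is trainable; W is never updated.
prompt = np.zeros(4)
lr = 0.05

def loss(p):
    err = W @ (x + p) - target
    return float(err @ err)

start = loss(prompt)
for _ in range(300):
    err = W @ (x + prompt) - target
    grad = 2 * W.T @ err      # gradient w.r.t. the prompt only
    prompt -= lr * grad

print(f"loss before: {start:.4f}, after: {loss(prompt):.4f}")
```

Because only the tiny prompt vector is trained, the compute bill is a fraction of what retraining or fine-tuning the full model would cost - which is exactly the economic appeal described above.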
The best approach to model training is the one that balances output quality against the lowest cost.
I am already seeing signs that smaller models trained on cleansed data can outperform larger models on specific tasks - for example, coding models, financial models, and other use-case-specific models. Companies are weighing model costs against the return of the use case, and there is a strong sense of urgency: using a smaller model can speed up time to production. Using the right tuning approach can make models accessible to more companies.
A recent report by OpenAI found that the cost of training large AI models is expected to rise from $100 million to $500 million by 2030 (source), suggesting that only the wealthiest companies will be able to afford to develop and use AI technology. Before reacting, it is worth considering the source of the information: some companies, like OpenAI, benefit from framing model creation as prohibitively expensive, both to create a moat and to persuade companies to use their LLMs rather than creating smaller models or using lower-cost approaches like prompt-tuning. Working through the various training options is important.
Inference costs are the resources required to deploy and run the trained AI model on new data after the training process is complete.
Once a model is trained, using it is referred to as inference. Before the model can be used, it needs to be hosted or deployed somewhere that allows API queries to call the model. Every time a query is made, server compute runs the query through the model and generates the output. The bigger the model, the more compute required. This is why inference pricing is typically quoted per model in tokens (1K tokens ≈ 750 words). For example, OpenAI has seven different inference rates depending on which model is used and the compute that model needs - https://openai.com/pricing.
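Per-token pricing makes per-query cost a simple calculation. The rate below is a hypothetical placeholder, not an actual OpenAI price - check the pricing page for real figures:

```python
def inference_cost(tokens: int, rate_per_1k: float) -> float:
    """Cost of one query: tokens used times the per-1K-token rate."""
    return tokens / 1000 * rate_per_1k

# Illustrative only -- see https://openai.com/pricing for real rates.
# Assume a prompt plus completion of 1,500 tokens (~1,100 words) at a
# hypothetical $0.002 per 1K tokens.
cost = inference_cost(tokens=1500, rate_per_1k=0.002)
print(f"${cost:.4f} per query")  # $0.0030 per query
```

Fractions of a cent per query sounds cheap, but multiplied by millions of daily queries it is exactly how the billion-dollar serving estimates above come about.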
An alternative is for customers to host the model themselves. When self-hosting, customers pay for the compute infrastructure instead of paying a small price per query. For companies in regulated businesses, or with sensitive data that isn't allowed on the public cloud, self-hosting may be the best option. This approach requires an upfront infrastructure investment, and a drawback is that it can't easily leverage the elastic capacity offered by the cloud.
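The self-host versus per-query trade-off reduces to a break-even calculation. Both figures below are assumptions for illustration, not real prices:

```python
def breakeven_queries(monthly_infra: float, cost_per_query: float) -> float:
    """Monthly query volume above which fixed self-hosting costs
    beat pay-per-query API pricing."""
    return monthly_infra / cost_per_query

# Hypothetical numbers: $20,000/month for self-hosted GPU servers vs.
# $0.003 per hosted-API query (both figures are assumptions).
q = breakeven_queries(monthly_infra=20_000, cost_per_query=0.003)
print(f"Break-even at ~{q:,.0f} queries/month")
```

Below the break-even volume, pay-per-query is cheaper; above it, the fixed infrastructure starts to pay for itself - assuming, of course, that compliance doesn't force the self-hosting decision regardless of cost.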
According to OpenAI's report from 2018, most compute used for deep learning is spent not on training but on inference. Inference costs far exceed training costs when deploying a model at any reasonable scale.
Despite the cost, these models often produce a positive ROI, with advances in insights driving automation that increases revenue, reduces cost, and extends reach into new markets.
If AI models are expensive, what should enterprise companies do?
What should enterprise sellers who want to stay ahead of the curve do?
What do you think is the biggest obstacle or opportunity for enterprise companies deploying AI, and what is the best way enterprise sellers can help?
A special thanks to Maryam Ashoori, PhD and Carlo Appugliese for the help on this article.
Good selling.
Comments

General Manager, Technical Community and Client Engineering
Good follow-up article - https://www.malaymail.com/amp/news/tech-gadgets/2023/05/21/dark-cloud-over-chatgpt-revolution-the-cost/70245

Founder and President, Farland Group
Another great article Ayal Steinberg - I appreciate the depth you provided here. The space is moving fast and you've provided excellent insight into areas that enterprise CI/TOs are trying to get their arms around. We've started to hear about an increasing focus on recruiting for prompt engineers. If I'm reading and understanding prompt-tuning correctly (which I may not be!), it seems like this explosion of prompt engineers may not be a long-term trend. Will tuning replace this role? Keep up the great insights. I learn with every read!

Senior Sales & Business Leader | Driving Multi-Million Dollar Revenue Growth in AI, Cloud, & SaaS | Expertise in APAC & Global Markets
Great one Ayal! And yes, it was a long read :-) Would this be a good summary?
- The cost of building a foundational model is super high, and only a few companies today have the resources to invest.
- The cost of providing these models is high, falling into two main categories: training and inference costs.
- Current foundational models are focused on B2C on a freemium model, more to outsource training the model than to train it for specific industry needs.
- Enterprises are fearful due to the quality of the training data and the privacy of their employee searches, and are looking for substitutes to general-use models that provide governance, traceability, and trust.
- The real market (who will pay for this) will be enterprises, but they need someone to build models specific to their industry, that are regulated, and that the industry can trust. Or they would spend on building their own models (high upfront cost - might work out in the long run).
Anything I missed?

I love the warning.

General Manager, IBM Power Systems
Great read, thanks Ayal.