On ChatGPT, LLMs, and ML Infra
Recently there have been many discussions about ChatGPT, LLMs (Large Language Models), and how they will change the way we use data, models, and infrastructure in the industry. There have been both great insights and confusion. I want to share some of my thoughts on this topic based on my experience.
I’ll put the conclusions out first: LLMs won’t replace models built for other tasks such as recommendation, but they can help those models work better, and the infrastructure supporting both will continue to be needed.
The ML task and model landscape
Let’s begin with the lay of the land of ML tasks and typical models. I’ll use the diagram below to illustrate the general landscape; it makes it easy to see where things like Deep Learning and language models fit on the chart.
As seen in the diagram, the majority of ML tasks in the industry fall into two main categories: behavior prediction and content understanding/generation. For example, recommending content to a user (by predicting whether the user will like it) belongs to the former, while generating a picture from a prompt belongs to the latter. This categorization is high-level and not absolute.
In both of these categories, models have evolved over multiple generations, from shallow algorithms to deep ones, over the last couple of years. The entire upper half of the plane uses Deep Learning models. With this context, let’s talk about a few interesting topics.
LLMs aren’t the only models that use Deep Learning; other models can be deep too
Language and vision models are frequently discussed in connection with Deep Learning, which has created the false impression that they (especially LLMs) are the only, or primary, models that use DL. In fact, DL is already widely used in many models outside the language/vision domain.
While language models have been getting bigger over the last couple of years, recommendation models have been evolving with DL at the same time. Today the recommendation models built by tech companies are also large and deep, and they are the primary models being monetized by the industry.
LLMs don’t replace other models
Language models are created to perform a specific set of language-related ML tasks. They are not designed to perform other tasks such as recognizing pictures, recommending videos, or detecting fraudulent accounts.
LLMs are more powerful versions of language models, but that doesn’t change the fact that they are not designed for every ML task. The majority of ML tasks outside the language domain will continue to be handled by models designed for those purposes, and the industry’s need for these tasks will certainly continue to grow. And like language models, these non-language models will evolve as well.
What LLMs will replace are language models with lower ROI. Note that ROI is the key here: it only makes sense to use a model when the return outweighs the cost, and LLMs are very expensive to train and run today.
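To make the ROI point concrete, here is a back-of-envelope sketch. Every number in it is a purely illustrative assumption (not a measured or published cost); the point is only the structure of the comparison: the LLM is worth deploying only if the extra value it produces exceeds the cost gap.

```python
# Back-of-envelope ROI comparison. All dollar figures below are
# purely illustrative assumptions, not real measured costs.
requests_per_day = 1_000_000

llm_cost_per_1k = 0.50    # assumed $/1k requests for a large LLM
small_cost_per_1k = 0.01  # assumed $/1k requests for a task-specific model

llm_daily = requests_per_day / 1000 * llm_cost_per_1k      # $500/day
small_daily = requests_per_day / 1000 * small_cost_per_1k  # $10/day

# The LLM only wins if its extra daily value exceeds this cost gap.
cost_gap = llm_daily - small_daily
print(f"Extra value needed per day to justify the LLM: ${cost_gap:.0f}")
```

Plugging in your own traffic and pricing numbers is what turns this from a sketch into an actual deployment decision.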
However, one thing LLMs can potentially do is work with other models on complex tasks and outperform legacy systems that don’t use LLMs. Read on.
LLMs can help other models work better
Language and vision models are not only capable of working standalone; they are also increasingly used to enhance other models in recommendation tasks, for example by pre-processing data for downstream models to consume. Given their enhanced ability to interpret and generate information, LLMs could unlock new potential to boost the performance of other models in these tasks by working upstream or downstream of them.
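As a rough sketch of this upstream pattern, the snippet below uses a toy hashing function as a stand-in for a real LLM text encoder (a production system would call an actual embedding model; the stand-in just keeps the example self-contained and runnable), then feeds the resulting text features, together with behavioral signals, into a toy linear scorer standing in for a large recommendation model:

```python
import numpy as np

def llm_embed(texts, dim=8):
    """Stand-in for a real LLM text encoder. A deterministic
    token-hashing trick replaces the actual model here."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            out[i, sum(tok.encode()) % dim] += 1.0
    # L2-normalize rows, as typical embedding models do
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-9)

# Upstream: the "LLM" turns unstructured item text into dense features.
item_texts = ["funny cat video", "in-depth ML infra tutorial"]
text_feats = llm_embed(item_texts)

# Downstream: the recommendation model consumes those features
# alongside behavioral signals (e.g. past clicks, dwell time).
behavior_feats = np.array([[12.0, 0.3], [3.0, 0.9]])
features = np.concatenate([text_feats, behavior_feats], axis=1)

# A random linear scorer stands in for the large recommendation model.
rng = np.random.default_rng(0)
weights = rng.normal(size=features.shape[1])
scores = 1.0 / (1.0 + np.exp(-features @ weights))  # shape (2,), one per item
print(scores)
```

The design point is the interface: the LLM's output is just another feature column from the recommendation model's perspective, so the two models can be trained, scaled, and served independently.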
Although I don’t have information on the internal design of the new Bing search experience, I suspect it’s one such example, where a large recommendation model works in conjunction with an LLM to perform the complex task of search recommendation plus conversation, covering both behavior prediction and content understanding/generation. Based on the preview results, it has certainly brought the search experience to a new level. Such architectures will become more prevalent going forward.
What all this means for ML infra
As discussed above, LLMs are advanced versions of smaller language models, but they won’t replace models built for other tasks such as recommendation. In addition, LLMs can enhance how other models are used and help them perform more complex tasks. As LLMs evolve, non-language/vision models such as recommendation models are evolving too.
This means the infrastructure that supports both language models and non-language models (such as recommendation models) will continue to be needed. Furthermore, with the rapid increase in both model size and hybrid usage of models, there will be growing demand for infrastructure that scales well and supports different model types working together seamlessly.
What are your thoughts? Let me know and I'm happy to discuss.