Taking Large Language Models To The Next Level
Bill Franks
Internationally recognized chief analytics officer who is a thought leader, speaker, consultant, and author focused on analytics, data science, and AI
In recent weeks, I’ve written several blogs related to the limitations and misunderstandings of popular large language models (LLMs) like ChatGPT. I’ve talked about common misunderstandings as well as areas where today’s tools can be expected to perform better (or worse). Here, I’m going to outline an approach that I believe represents the future of LLMs in terms of how to make them more useful, accurate, and impactful. I am already seeing the approach being implemented and expect the trend to accelerate. Let’s dive in!
Ensemble Models – Proven For Machine Learning, Coming To LLM Applications
One of the approaches that helped increase the power of machine learning models, as well as classic statistical models, is ensemble modeling. Once processing costs came down sufficiently, it became possible to execute a wide range of modeling methodologies against a dataset to see what works best. In addition, practitioners found that, as with the well-documented concept of the wisdom of crowds, the best predictions often came not from the best individual model, but from averaging many different predictions from many different models.
Each modeling methodology has strengths and weaknesses, and none will be perfect. However, taking the predictions from many models jointly into account can yield strong results that converge, on average, to a better answer than any individual model provides.
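To make the averaging idea concrete, here is a minimal Python sketch. The model names and numbers are invented purely for illustration; they are not results from any real dataset. In this made-up case the individual errors happen to cancel, which will not be true of every ensemble.

```python
from statistics import mean

# Hypothetical predictions of the same quantity from four different models.
# The true value and every prediction below are made-up illustration data.
true_value = 100.0
model_predictions = {
    "linear_regression": 92.0,
    "decision_tree": 109.0,
    "neural_net": 97.0,
    "nearest_neighbors": 104.0,
}

# The simplest ensemble: an unweighted average of all predictions.
ensemble_prediction = mean(list(model_predictions.values()))

# Absolute error of each individual model and of the ensemble.
individual_errors = {
    name: abs(pred - true_value) for name, pred in model_predictions.items()
}
ensemble_error = abs(ensemble_prediction - true_value)

print(f"ensemble prediction: {ensemble_prediction}")
print(f"best single-model error: {min(individual_errors.values())}")
print(f"ensemble error: {ensemble_error}")
```

Because some models overshoot and others undershoot, the average lands closer to the true value than any single model does here.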
Let’s set aside this concept for a moment to introduce another concept that we need before we can get to the main point.
Applications Versus Models – They Are Not The Same!
The next concept to understand is the difference between a given LLM (or any type of model) and an application that lets users interact with that model. This may sound at first like a minor distinction, but it is not! For example, marketing mix models have been used for years to assess and allocate marketing spend. The ability to actually drive value from marketing mix models skyrocketed when they were embedded in enterprise marketing applications that allowed users to tweak settings, simulate the associated impacts, and then submit an action to be operationalized.
While the marketing mix models supply the engine that drives the process, the application is like the steering wheel and gas pedal that allow a user to make use of the underlying models effectively. LLMs themselves aren’t user-ready when built, as they are effectively a massive collection of weights. When we say we’re “using ChatGPT” or another LLM today, what we’re really doing is interacting with an application that sits on top of the underlying LLM. That application enables the model to be put to practical use.
Now let’s tie the last two themes together to get to the point…
Taking LLMs To The Next Level
The future of LLMs, in my opinion, lies in bringing the prior two concepts together. To make LLMs truly useful, accurate, and easy to interact with, it will be necessary to build sophisticated application layers on top that use an ensemble approach to get users the answers they desire. What does that mean? Let’s dig deeper.
If I ask a traditional search engine and an LLM the same question, I may get very similar or very different answers, depending on a variety of factors. However, each answer likely has some truth and usefulness that can be extracted. Next-level LLM applications will develop methods for getting results from an LLM, a traditional search engine, and possibly other sources, and then use those results to compare, contrast, and fact-check each other. The final output returned to the user will then be a “best” combination of the various outputs along with an assessment of how reliable the answer is deemed to be.
In other words, if an LLM and a search engine provide almost the same answer, there is a good chance it is mostly accurate. If the answers differ greatly and those differences can’t be explained, the LLM may be hallucinating. In that case, the application can warn us that confidence is low and that we should manually check the information.
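As a rough illustration of that agreement check, the Python sketch below scores how much two answers overlap and attaches a confidence label. Everything in it is an assumption made for illustration: the function names, the 0.6 threshold, and especially the use of simple word overlap (Jaccard similarity), which is a crude stand-in for the much harder job of comparing meaning.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def agreement(answer_a, answer_b):
    """Jaccard similarity of the two answers' word sets (0.0 to 1.0)."""
    words_a, words_b = tokens(answer_a), tokens(answer_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def assess(llm_answer, search_answer, threshold=0.6):
    """Return (score, label); the threshold is an arbitrary illustrative value."""
    score = agreement(llm_answer, search_answer)
    if score >= threshold:
        return score, "high confidence"
    return score, "low confidence: verify manually"

# Hypothetical answers from an LLM and a search engine to the same question.
agree = assess("Paris is the capital of France",
               "The capital of France is Paris")
disagree = assess("Paris is the capital of France",
                  "Lyon hosts the national government")

print(agree)     # the two answers use exactly the same words
print(disagree)  # the two answers share only the word "the"
```

Word overlap will miss paraphrases and can be fooled by answers that differ in a single critical word, so a real application would need a far more capable comparison step.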
Adding Additional Engines To The Mix
My envisioned ensemble approach will make use of a range of specialized engines as well. For example, Wolfram|Alpha has a plugin that lets ChatGPT pass computational tasks off to it. This is important because ChatGPT is not a computation engine and is notoriously bad at computations. By passing computational tasks off to an engine meant for computation, the final answer generated by the LLM application will be superior to the answer generated without such an engine.
In time, LLM applications will evolve to use a wide range of specialized engines, each handling a specific type of computation. There might be engines that handle questions related to specific scientific disciplines, such as genetics or chemistry, that are specially trained for the computations and content associated with those disciplines. The common thread will be the text-based prompts we feed the application, which it can then parse and pass around to the various engines before combining all the answers received, synthesizing a blended answer, and returning it to us.
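The routing step can be sketched in a few lines of Python. This is only a toy under stated assumptions: the engine functions, the regular-expression routing rule, and the placeholder LLM response are all hypothetical, and the arithmetic evaluator merely stands in for a real computation engine. It does not call Wolfram|Alpha, ChatGPT, or any other real service.

```python
import ast
import operator
import re

# Arithmetic operators the toy math engine supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def math_engine(expression):
    """Evaluate +, -, *, / arithmetic by walking the parsed syntax tree."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return str(_eval(ast.parse(expression, mode="eval").body))

def llm_engine(prompt):
    """Placeholder for a call to an underlying LLM (hypothetical)."""
    return f"[LLM draft answer for: {prompt}]"

# Crude routing rule: prompts made only of digits and arithmetic symbols.
_ARITHMETIC = re.compile(r"^[\d\s.+\-*/()]+$")

def route(prompt):
    """Send arithmetic to the math engine and everything else to the LLM."""
    text = prompt.strip()
    if _ARITHMETIC.match(text):
        try:
            return math_engine(text)
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # fall back to the LLM if the math engine cannot handle it
    return llm_engine(prompt)

print(route("12 * (3 + 4)"))
print(route("What is ensemble modeling?"))
```

A production router would need to recognize the intent of natural-language prompts rather than match symbols, which is a large part of why the application layer is hard to build.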
It is important to note that the process of blending the ensemble of answers together is itself a huge problem that is likely even more complex than any of the underlying models. So, it will take some time to realize the potential of the approach.
Winning with LLM Ensemble Applications
Over time, it is easy to imagine an LLM application that passes prompts to multiple underlying LLMs (an ensemble of LLMs), as well as a range of specialized engines for specific types of content (an ensemble of specialized engines), before consolidating all the results into a cohesive answer (an ensemble of ensembles, if you will!). In other words, a successful LLM application will go far beyond simply passing a prompt to an underlying LLM for processing.
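The simplest possible version of that consolidation step is a majority vote, sketched below in Python. The source names and answers are hypothetical, and a plain vote is an assumption chosen for brevity; as noted earlier, blending answers well is likely to be far more complex than this.

```python
from collections import Counter

def consolidate(answers):
    """Return the most common answer and the share of sources that gave it.

    `answers` maps a source name to that source's answer text.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    # Normalize lightly so trivial formatting differences do not split votes.
    normalized = [text.strip().lower() for text in answers.values()]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)

# Hypothetical answers from three LLMs and one specialized engine.
answers = {
    "llm_a": "42",
    "llm_b": " 42 ",
    "llm_c": "41",
    "math_engine": "42",
}

best_answer, agreement_share = consolidate(answers)
print(best_answer, agreement_share)
```

The agreement share doubles as a rough reliability signal: the smaller the share of sources behind the winning answer, the more reason to check it manually.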
I believe that LLMs themselves are already quickly becoming commoditized. The money and the future aren’t in providing a better LLM at this point (though improvements will continue to come) as much as in providing better applications. These applications will make use of an ensemble approach to take advantage of various available LLMs alongside other specialized models and engines that handle specific types of computations and content. The result will be a powerful set of solutions that help AI reach its potential.
President, Advanced Analytics ● Wharton, Senior Fellow, Lecturer, UC Berkeley ● Analytics and Data Science Expert ● Insights | Strategy | Results
1 yr: Reconciling search engine results with LLMs is an astute insight. Thanks for sharing the blog.
Ecosystem Architecture - Modern Hybrid Infrastructure and Data Architecture Designed for Analytics
1 yr: So many threads. All fascinating. Thanks for bringing these topics to the table. I still see Dr. Chandra as a startup name - the Original prompt engineer.
Chief Revenue Officer I Analytics & Strategy Leader I 3AI Thought Leader I Fintech Enthusiast
1 yr: Bill, your approach makes sense. Coming from the banking analytics perspective, I can see LLMs being trained specifically on proprietary trading or compliance data (text or numeric) and then being ensembled with more generic LLMs (PaLM, GPT-4, etc.) to produce more stable, robust, and accurate responses that bankers can trust 99.99999% of the time. Thanks for sharing your approach.
Bill Franks Thanks for sharing!