Driving Value From LLMs – The Winning Formula
Microsoft Copilot


I have observed a pattern in the recent evolution of LLM-based applications that appears to be a winning formula. The pattern combines the best of multiple approaches and technologies. It provides value to users and is an effective way to get accurate results with contextual narratives – all from a single prompt. The pattern also takes advantage of the capabilities of LLMs beyond content generation, with a heavy dose of interpretation and summarization. Read on to learn about it!


The Early Days Of Generative AI (only 18 – 24 months ago!)

In the early days, almost all of the focus with generative AI and LLMs was on generating answers to user questions. Of course, people quickly realized that the generated answers were often inconsistent, if not outright wrong. It turns out that hallucinations are a feature, not a bug, of generative models. Every answer is a probabilistic creation, whether or not the underlying training data contains an exact answer! Confidence in this plain-vanilla generation approach waned quickly.

In response, people started fact-checking generated answers before presenting them to users, then providing both an updated answer and an indication of how confident the user could be that the answer was correct. This approach is effectively "let's make something up, then try to clean up the errors." That's not very satisfying because it still doesn't guarantee a good answer. If the answer exists within the underlying training data, why not pull it out directly instead of trying to guess our way to it probabilistically? By utilizing a form of ensemble approach, recent offerings are achieving much better results.
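
To make the contrast concrete, here is a minimal sketch of the "generate first, then clean up" pattern described above. The functions generate_answer and fact_check are hypothetical placeholders for illustration, not any specific product's API.

def generate_answer(question: str) -> str:
    # Plain-vanilla generation: the model produces a fluent answer
    # probabilistically, whether or not the training data contains one.
    return "A fluent, but possibly hallucinated, answer."

def fact_check(answer: str) -> float:
    # Estimate how confident the user can be that the answer is correct.
    return 0.6  # placeholder confidence score

question = "What year was the company founded?"
answer = generate_answer(question)
confidence = fact_check(answer)
# The user receives both the answer and a confidence score, but there
# is still no guarantee that the answer is right.
print(f"{answer} (confidence: {confidence:.0%})")

Notice that the confidence score is bolted on after the fact. The retrieval-first pattern described next avoids the problem by grounding the answer before generation ever happens.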


Flipping The Script

Today, the winning approach is all about first finding facts and then organizing them. Techniques such as Retrieval-Augmented Generation (RAG) are helping to rein in errors while providing stronger answers. This approach has been so popular that Google has even begun rolling out a massive change to its search engine interface that leads with generative AI instead of traditional search results. You can see an example of the offering in the image below (from this article). The approach relies on a variation of traditional search techniques, plus the interpretation and summarization capabilities of LLMs, more than on an LLM's generation capabilities.


Ron Amadeo / Google via Ars Technica

The key to these new methods is that they start by finding sources of information related to a user request via a more traditional search / lookup process. Then, after identifying those sources, the LLM summarizes and organizes the information within them into a narrative instead of just a listing of links. This saves the user the trouble of reading through multiple links and creating their own synthesis. For example, instead of reading through five articles listed in a traditional search result and summarizing them mentally, users receive an AI-generated summary of those five articles along with the links. Often, that summary is all that's needed.
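
As a rough illustration of that two-step flow, here is a minimal sketch in Python. The functions search_documents and call_llm are hypothetical stand-ins; a real system would use an actual search index (keyword, vector, or hybrid) and an LLM provider's client.

def search_documents(query: str, top_k: int = 5) -> list[str]:
    # Traditional search / lookup step: return the most relevant documents.
    # In practice this would query a search engine or vector store.
    corpus = [
        "Article A discussing the question topic in depth...",
        "Article B with related background details...",
    ]
    return corpus[:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder for a call to any LLM provider's API.
    return "A narrative summary grounded in the retrieved sources."

def answer_with_rag(question: str) -> str:
    # Step 1: find sources first, via search rather than generation.
    sources = search_documents(question)
    # Step 2: have the LLM interpret and summarize ONLY those sources,
    # producing a narrative instead of a listing of links.
    context = "\n\n".join(f"Source {i + 1}: {s}" for i, s in enumerate(sources))
    prompt = (
        "Using only the sources below, write a concise summary that answers "
        f"the question, citing sources by number.\n\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

print(answer_with_rag("What is the winning formula for LLM applications?"))

The important design choice is the order of operations: retrieval constrains the prompt before generation happens, so the model is asked to interpret and summarize rather than recall.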


It Isn’t Perfect

The approach isn’t without weaknesses and risks, of course. Even though RAG and similar processes look up “facts”, they are essentially retrieving information from documents. Further, the processes will tend to focus on the most popular documents or sources. As we all know, there are plenty of popular “facts” on the internet that simply aren’t true. As a result, there are cases of popular parody articles being taken as factual, and of harmful recommendations being passed along because bad advice appeared in the documents identified as relevant. You can see an example below from an article on the topic.


Google / The Conversation via Tech Xplore

In other words, while these techniques are powerful, they are only as good as the sources that feed them. If the sources are suspect, then the results will be too. Just as you wouldn’t take links to articles or blogs seriously without sanity checking the validity of the sources, don’t take your AI summary of those same sources seriously without a critical review.
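
One way to sanity-check sources programmatically is to filter retrieved documents against an allowlist before the summarization step. The sketch below is illustrative only; the TRUSTED_DOMAINS list and the document format are assumptions for this example, not a description of how any particular product works.

from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would maintain its own.
TRUSTED_DOMAINS = {"example-journal.org", "docs.internal.example.com"}

def filter_by_source(documents: list[dict]) -> list[dict]:
    # Keep only documents whose domain is on the allowlist, so the
    # summarization step never sees unvetted material.
    return [
        doc for doc in documents
        if urlparse(doc["url"]).netloc in TRUSTED_DOMAINS
    ]

docs = [
    {"url": "https://example-journal.org/rag-study", "text": "Peer-reviewed findings..."},
    {"url": "https://parody-news.example.com/satire", "text": "Satirical advice..."},
]
print(filter_by_source(docs))  # only the vetted source survives

An allowlist is crude; production systems might instead score sources on provenance or recency. But the principle is the same: vet before you summarize.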

Note that this concern is largely mitigated when a company uses RAG or similar techniques on internal documentation and vetted sources. In such cases, the base documents the model references are known to be valid, making the outputs generally trustworthy. Private, proprietary applications using this technique will therefore perform much better than public, general-purpose applications. Companies should consider these approaches for internal purposes.


Why This Is The Winning Formula

Nothing will ever be perfect. However, based on the options available today, approaches like RAG and offerings like Google’s AI Overviews are likely to have the right balance of robustness, accuracy, and performance to dominate the landscape for the foreseeable future. Especially for proprietary systems where the input documents are vetted and trusted, users can expect highly accurate answers along with help synthesizing the core themes, consistencies, and differences across sources.

With a little practice at both structuring an initial prompt and using follow-up prompts to tune the initial response, users should be able to find the information they require more rapidly. For now, I’m calling this approach the winning formula – until I see something else come along that can beat it!
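
For instance, a tuning exchange might look like the following sketch, where call_llm_with_history is a hypothetical chat-style client used purely for illustration.

def call_llm_with_history(messages: list[dict]) -> str:
    # Hypothetical chat-style LLM call; swap in your provider's client.
    return "LLM response..."

conversation = [
    {"role": "user",
     "content": "Summarize the top sources on RAG accuracy, citing each source."},
]
first_pass = call_llm_with_history(conversation)

# A follow-up prompt tunes the initial response instead of starting over,
# preserving the retrieved context from the first exchange.
conversation += [
    {"role": "assistant", "content": first_pass},
    {"role": "user",
     "content": "Narrow that to enterprise use cases and flag where sources disagree."},
]
print(call_llm_with_history(conversation))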

Steve VanWieren

Award-Winning Data & Analytics Professional | Strategy | Team Formation & Development | AI/ML Solutions

6 months ago

It really has been quite a fascinating 24 months, hasn't it? The evolution from AI hitting the mainstream to now has created a lot of confusion, in my opinion. Where do you see SLMs fitting into this mix?

JP Snow

Customer Analytics & Strategy Leader | Empowering Growth Through Customer Insight and Transformative Scale | Fractional CAO/CCO/CRO | Author & Advisor

6 months ago

An opportunity I see in this is how RAG offers more potential for the algorithms to learn how to find and evaluate the original sources. So much online content is replicating the same things, often verbatim. A basic form of LLM would weight a fact based on how much it's replicated instead of weighting the credibility of the source. RAGs are more easily enhanced to evaluate wisely.

Larry Scheurich

Senior Healthcare Data Architect helping healthcare organizations transform patient outcomes through innovative data architecture and analytics solutions.

6 months ago

Great article! As an experienced Data Engineer, I can say this triggered a lot of thoughts, especially when it comes to unstructured data. There are facts, misrepresented facts, opinions, and sentiment. Trying to get to the "best" answer across all of those is daunting. My experience tells me that there are far fewer facts than there are misrepresented facts and opinions. If algorithms/techniques want to improve their accuracy, that is the nut to crack. Unfortunately, I fear this is too big and elusive.

Julia Bardmesser

Accelerate the Business Value of Your Data & Make it an Organizational Priority | ex-CDO advising CDOs at Data4Real | Keynote Speaker & Bestselling Author | Drove Data at Citi, Deutsche Bank, Voya and FINRA

6 months ago

Great points, Bill Franks! It all goes back to having useful and usable information and data, doesn't it?
