Co-pilots are like Multivitamin Pills for Olympians
[Image: Very lame picture produced by Microsoft "Designer"]

There has been a recent spate of stories along the lines of "GenAI not as expected", or "too costly, too little", etc. Much of this is muddled thinking, or an absence of any thinking at all. Ironically, this gives us little hope for adopting tools that seemingly ask us to think less, not more. However, these conclusions are false, or should at least be treated with caution, as I hope to lay out.

As ever, we have to stand back and take stock.

On the one hand, LLMs are truly astounding. They can do things previously thought impossible and that would have required PhD-level researchers. If you cannot extract value from this technology, given the pivotal role of language in business, then you are probably not doing it right. This is a generalization, but also a truism.

Whilst LLM benchmarks do NOT translate into KPI gains 1:1, they do tell us something. It is worth familiarizing oneself with what they are and then doing the work to figure out how to translate them into KPI gains. I will give an example shortly.

With LLM adoption, the same rules of innovation still apply, as they always will in competitive arenas. There is no magic wand of sprinkling LLMs onto a business and supercharging productivity. This is like a jogger taking a multivitamin pill and expecting to become an Olympian.

Investment in technology ought to produce an s-curve of benefits. The timing of that curve and extraction of value over time is part of strategy. Many seem to expect rapid acceleration from day zero, moreover via the worst of all possible deployments: giving everyone a co-pilot -- aka, the multivitamin pill.

In many cases, the multivitamin pill might get a boost from a Vitamin B-2 pill: retrieval-augmented generation (RAG) used to build an "answer bot". Again, marginal gains, and mostly only if your initial process was deficient.
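To make the "answer bot" concrete, here is a minimal sketch of the RAG pattern: retrieve the most relevant document, then place it in the prompt. The toy corpus, bag-of-words scoring, and prompt wording are all illustrative assumptions, not a production recipe:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    """Return the document most similar to the query (toy retriever)."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Refunds are processed within 5 business days of approval.",
    "Shipping to Europe takes 7 to 10 business days.",
]
query = "How long do refunds take?"
context = retrieve(query, docs)
prompt = f"Answer using only this context:\n{context}\nQuestion: {query}"
print(context)
```

Real systems swap the bag-of-words scorer for dense embeddings and a vector index, but the shape of the pattern is the same -- which is exactly why "out of the box" RAG is so easy to demo and so hard to turn into real gains.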

Then again, that might not be true either, because the old adage still applies: garbage in, garbage out. If your source docs are deficient, the LLM could make matters worse from both a point-solution and a holistic perspective.

However, many existing processes are not that deficient, thanks to years of digital transformation. And here's the issue: in many cases, unless the LLM can replace the entire process, it likely achieves only marginal gains -- the gains are still swamped by the process's non-LLM dependencies. Worse still, if improperly deployed (see below), the inevitable need to check results can nullify the intended gains. Often this hits a tipping point at which users drop the LLM altogether and return to the familiar territory of a spreadsheet, CRM, dashboard, or even just common sense. No one pays attention to this in advance, too enamoured by an initial demo of apparently magical proportions.

This is the counterintuitive aspect to adopting GenAI. If you focus on "low-hanging fruit" -- some lazy RAG implementation -- then you are probably not going to get much and the astounding power of LLMs will seem unremarkable. (Actually, take a look at folks like Snorkel who produce good data about performance of "out of the box" RAG vs. a holistically tuned system -- in their case via weak supervision and labeling functions.)

Understandably, the hype is hard to avoid, made worse by letting underqualified folks lead the GenAI charge. They produce their own kind of hype -- everyone wants to be, or claims to be, an AI expert these days.

Don't be fooled by impressive demos. Crossing the demo-to-realization gap is where 90% of the expertise is typically needed, besides the ability to identify strategic use cases vs. "low-hanging fruit".

Or, put another way: the default out-of-the-box RAG, or similar, will produce seemingly impressive results if all you are measuring is how well the agent seems to answer questions. But when you take a step back, the productivity gains are minimal. Moreover, it becomes apparent that to get to the next level of performance, the following two realities emerge:

  1. Every additional feature becomes a project in its own right, wherein it grows obvious that "letting the LLM do all the work" is at best partially true, and often completely false.
  2. Once the libraries and models stop working -- i.e. you hit some brick wall in performance or features -- solving it requires deeper technical expertise than anticipated.

Returning to benchmarks. One impressive score is the radical improvement on GSM8K -- the ability to solve complex verbal math problems. This might seem irrelevant, since no business asks its workers to solve math puzzles per se. But this core LLM ability has the potential to translate into complex reasoning about various numerical problems within your domain. The challenge is not to find such problems in the organization as currently configured, but to reimagine or reconfigure processes or product flows such that the ability to do such reasoning becomes a competitive advantage.
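For flavor, GSM8K problems are multi-step verbal arithmetic. Here is a made-up, business-flavored analog (all numbers invented) showing the chain of arithmetic a model must get right:

```python
# Invented example: a line produces 120 units/hour, 5% fail QC,
# and the shift is 8 hours. How many sellable units?
units_per_hour = 120
defect_rate = 0.05
shift_hours = 8

produced = units_per_hour * shift_hours  # 960 units produced
sellable = produced * (1 - defect_rate)  # 912.0 units pass QC
print(sellable)
```

Trivial as a script, but getting an LLM to reliably chain such steps from a verbal description -- and your own domain's messy phrasing of it -- is precisely what the GSM8K gains measure.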

As an example, consider complex manufacturing systems. Many parts of the puzzle are already well optimized thanks to decades of IT investment and product iterations -- core project management, the world of CAD, etc., where Autodesk alone has invested billions in innovation. You're not going to sprinkle LLM "magic dust" and make any of these tools 10x better. Start-ups pitching the pattern "a better mousetrap via LLMs" should -- and increasingly do -- receive $0 investment. (In one of my roles as a start-up filter, I've seen tons of these pitches.)

However, much of the "In Real Life" (IRL) aspects of manufacturing management take place via "Dark data" or "Shadow IT", including, say, unstructured reports shared via messaging apps and so on. These parts have been hard to optimize or penetrate, for a number of reasons, some of them potentially irreducible to easy LLM-friendly problems.

The better approach is to ask how the well-optimized process steps might be augmented using "classic" AI/ML such that the LLM can produce benefits via its special abilities, like solving math puzzles.

In the rush to adopt GenAI, classical AI has been overlooked -- wrongly. Well, it's worse than that: many uninformed leaders have stopped investing in hardcore data science and AI/ML capabilities because they believe that LLMs will now do everything.

Non-LLM AI/ML has massive untapped potential. What many leaders (and their non-expert advisors) have missed is that the entire field has been boosted for the same reasons LLMs became possible: an abundance of data and computation. Even now, recent techniques in, say, anomaly detection have been made possible by GPUs (e.g. the Matrix Profile).
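To illustrate the idea behind the Matrix Profile (not the fast GPU-scale implementations), here is a naive sketch in plain NumPy: for every window of a time series, find the distance to its nearest non-trivial neighbour; a window with no close match anywhere -- a "discord" -- is an anomaly. The signal and window length are invented for illustration:

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive z-normalized matrix profile: for each length-m window,
    the distance to its nearest neighbour outside an exclusion zone."""
    n = len(ts) - m + 1
    windows = np.array([ts[i:i + m] for i in range(n)])
    z = (windows - windows.mean(axis=1, keepdims=True)) \
        / windows.std(axis=1, keepdims=True)
    excl = m // 2  # skip trivial matches with overlapping neighbours
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        mp[i] = d.min()
    return mp

# A repeating signal with one injected anomaly: the discord is the
# window where the matrix profile peaks.
t = np.linspace(0, 8 * np.pi, 400)
ts = np.sin(t)
ts[200:210] += 2.0  # the injected anomaly
mp = matrix_profile(ts, m=20)
print(int(np.argmax(mp)))  # window index near the injected anomaly
```

The production versions replace the quadratic loop with FFT-based tricks and GPU kernels, which is what makes the technique viable on industrial-scale telemetry.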

So, how should we think about combining classic AI with LLMs through the lens of benchmarks like GSM8K?

When we ask an LLM to reason about a business task, the task is clearly not posed as some arbitrary math problem about ladders of length beta leaning against houses at angle theta. Think of classic AI/ML as pre-processing and extracting the meaningful data that can be fed into the "math reasoning machine" in order for it to become a "business reasoning machine".
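As a hypothetical sketch of that hand-off (the asset name and feature names are invented), classic ML extracts precise signals, and the prompt then puts them in front of the LLM as explicit quantities to reason over:

```python
def build_reasoning_prompt(asset_id, features):
    """Turn ML-extracted measurements into a structured prompt, so the
    LLM reasons over explicit numbers rather than raw logs."""
    lines = [f"Asset {asset_id} -- extracted diagnostics:"]
    for name, value in sorted(features.items()):
        lines.append(f"- {name}: {value}")
    lines.append(
        "Using only the measurements above, estimate the risk of failure "
        "in the next 24 hours and show your numerical reasoning."
    )
    return "\n".join(lines)

# Invented feature values, e.g. from an anomaly detector:
features = {"vibration_discord_score": 4.7, "temp_drift_C_per_hr": 0.9}
print(build_reasoning_prompt("PUMP-17", features))
```

The point of the design is that the LLM never sees the raw sensor stream; the classic-ML layer does the "x-ray" work and hands over numbers worth reasoning about.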

As an analogy, think of the Classic AI/ML as like a pair of x-ray spectacles that can peer into the engine of a car to see causal events versus relying solely upon the diagnostic codes from some instrument.

It is this "bottom-up" data probing and processing ability that will lead to the sci-fi "Oracle-like" situational awareness of some uber-enterprise-bot, not the LLM alone! If the LLM is the "bionic brain", then we still need to give it bionic limbs, eyes, and so on -- aka, "classic" AI/ML.

Traditional AI can give the LLM more insightful data to work with -- data that it hitherto couldn't see. But now we have the next issue -- the dreaded "hallucinations". The best way to think of hallucinations is to realize that the LLM has been trained on a large swathe of information that influences every result.

It is a mistake to think of this as "general knowledge". An LLM doesn't merely know about Shakespeare, say; it knows the many criticisms of Shakespeare, perhaps even the various mystical accounts of his sonnets, along with all manner of rhyme and meter insights. This isn't "general knowledge"; it is vast amounts of "expert knowledge".

The problem with this pre-trained expert knowledge existing inside the vanilla LLM is that it acts like a center of mass with enough gravity to pull answers towards its own pre-existing "understanding" of the world -- the many examples seen in training -- versus your particular example, the actual project at hand. (For the technically inclined, see the paper "Embers of Autoregression" for an 80-page discussion.)

This is what I call the "probabilistic distortion", similar to lenticular distortion. Well, the analogy is probably poor, but you get the idea. (Actually, it's probably like a "reality distortion field", although not the one Steve Jobs had in mind.)

In a nutshell, LLMs struggle to pay attention to your details versus the many it has in its powerful memory bank (training weights). There are two ways to ameliorate this tendency:

  1. Use classic AI/ML to give the LLM precise "x-ray insights" to reason with -- think of this as both providing actionable data and providing a super-massive gravitational field to counteract the LLM's own gravitational pull.
  2. Adapt the "math reasoning" (and other forms of LLM "reasoning") to find useful ways to bridge the unstructured "shadow IT" data with the x-ray data. This would fall under the topics of holistic tuning (including graph-based systems), but in ways that generally require AI expertise vs. merely dumping data into a RAG, or whatever.

In conclusion, sprinkling LLM magic dust onto business problems in a naive way is like taking a multivitamin pill and expecting Olympian-level results. Rather, you have to dig deep to build a set of core skills and strengths. One of these still comes from "classic" AI/ML, a field that has advanced significantly, both in performance and in availability (thanks to the abundance of open-source libraries and learning materials). In this rubric, think of the LLM as an orchestrator of many specialist ("bionic") AIs. With this approach, it becomes possible to rethink business processes via some of the reasoning advances of LLMs, as seen in benchmarks like GSM8K math-solving.
