How to leverage Gen-AI in the enterprise and avoid the pitfalls

Every company appears eager to dive into the world of generative AI and implement it for an initial use case. While there are numerous promising applications for LLMs, there are just as many ill-advised choices. It's crucial to delve deeper into LLMs and grasp the distinguishing characteristics that separate the effective use cases from those that could become cumbersome burdens or outright disasters.

One of the first things to understand about LLMs is that they are ideally applied to conversational situations where a person is interacting with a digital application.

LLMs are less adept at handling logical operations, the type of tasks typically managed by rule-based coding. The hype suggests LLMs are transforming fields like supply chain, finance, or logistics, but this isn't entirely accurate. While these sectors might occasionally present suitable LLM use cases, they aren't where LLMs are expected to excel. As of now, LLMs haven't sparked a revolution in any particular field. We are still in the process of exploring their potential industrial applications, so a bit of patience is required. Don’t get caught up in the breathless hype.

There are a few characteristics to look for when evaluating potential LLM use cases. I’ll list them here and then go into each of them.

These are the characteristics it must have:

  • Involves a qualitative task, not a quantitative or logical one
  • Is tolerant of occasional errors

It should probably have some of these characteristics:

  • Makes use of written language
  • Helps a person digest or make sense of a large or confusing body of information
  • Is used as a second line of defense in preventing mistakes, not the primary one
  • Helps people make use of a complex control system, user interface or domain specific language
  • Is used for learning or employee training or mentoring

Let’s go through a few of these, and then I’ll give some examples of actual use cases that would be good ones, followed by some use cases that are suspect.

Qualitative, not quantitative

AI and machine learning are often utilized for quantitative tasks, but generative AI isn't usually the best fit for such purposes. Specifically considering LLMs, it's important to recognize that they make educated guesses based on patterns they've observed, aiming to mimic an "expert" response to a question. Their goal is to sound knowledgeable, regardless of whether they truly are experts. They strive to provide accurate information based on what they've learned. However, when faced with gaps in information, they may "hallucinate," or interpolate, to fill in the blanks. Techniques like RAG (Retrieval Augmented Generation), which feeds the model relevant retrieved text along with the question, offer a straightforward but crude way to reduce this. Still, the tendency is inherent and will always remain to some degree. LLMs aren't primarily designed for reliability or precision. It was somewhat of a happy accident that they were found to be useful for this at all.

LLMs are unlikely to excel in mathematical or logical tasks. Their design aims for the appearance of logic rather than actual execution of it. Essentially, they're crafted to emulate human-like reasoning, functioning more as tricksters or impersonators. Companies may release LLMs that appear adept at logic and reasoning, but this will largely remain an illusion. They might perform logically in familiar scenarios, yet the illusion dissipates when faced with truly novel challenges. The truth is, we don't necessarily require LLMs for logic tasks; traditional software methods usually handle this well. At best, LLMs for reasoning might find niche applications or serve as backups when standard programming approaches hit their limits.

Error tolerance

The issue of hallucination is widely acknowledged, frequently discussed, and at times, exaggerated. It's easy to imagine worst-case scenarios where LLMs give advice leading to tragic outcomes, raising significant concerns among lawyers. In reality, companies are cautious and usually over-cautious. It's fairly evident that LLMs are best suited for situations where a margin of error is acceptable. We're not on the brink of replacing doctors with LLMs or having them manage aircraft. Those interacting with LLMs must understand that they're not infallible when it comes to retrieving information, treating them more like knowledgeable employees who occasionally make mistakes.

However, the underlying issue runs a bit deeper. There's a risk of humanizing LLMs too much, leading to incorrect assumptions about how they fail. While they do make mistakes, these errors often differ from those made by humans. They lack common sense and context, and sometimes struggle to gauge the significance of the information being discussed. Unlike people, who might withhold a response when unsure of the answer but aware of its importance, LLMs tend to forge ahead. For instance, imagine an employee drafting a letter asks, "What is the name of the judge?" A seasoned attorney who was uncertain wouldn't casually answer. They understand the gravity of accuracy in such details and would refrain from guessing. An LLM, on the other hand, is more likely to take a stab and suggest "Judge Scalia" simply because it's a recognizable name that vaguely fits the context.

So the applications should be error tolerant. A music application that gives users information about a band, song or album might be a fun application for a company like Spotify. If it occasionally says Robert Plant was the lead singer of Pink Floyd, well… no one is going to die. As long as this kind of thing is not too common (it’s a perfect RAG problem, so keeping such errors rare should be easy), the application can still create a lot of value.
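To make the RAG idea concrete for the music example, here is a minimal sketch of the retrieval step. It assumes a tiny, hand-written fact list (`FACTS`) and a naive word-overlap score standing in for the embedding search a production system would more likely use. `retrieve` and `build_prompt` are hypothetical helpers, and the actual LLM call is left out.

```python
import re

# Hypothetical knowledge base for the music app; a real system would index far more text.
FACTS = [
    "Robert Plant was the lead singer of Led Zeppelin.",
    "Pink Floyd's lead vocals were shared mainly by Roger Waters and David Gilmour.",
    "Freddie Mercury was the lead singer of Queen.",
]

# Common words ignored when scoring overlap.
STOPWORDS = {"a", "an", "and", "by", "is", "of", "the", "was", "were", "who", "what"}

def tokenize(text: str) -> set[str]:
    """Lowercase content words; a crude stand-in for embedding similarity."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def retrieve(question: str, facts: list[str], k: int = 2) -> list[str]:
    """Return the k facts sharing the most content words with the question."""
    q = tokenize(question)
    return sorted(facts, key=lambda f: len(q & tokenize(f)), reverse=True)[:k]

def build_prompt(question: str, facts: list[str]) -> str:
    """Ground the model by asking it to answer only from the retrieved facts."""
    context = "\n".join(f"- {f}" for f in retrieve(question, facts))
    return (
        "Answer using only the facts below. If they do not contain the answer, "
        "say you don't know.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Who was the lead singer of Pink Floyd?", FACTS))
```

Telling the model to answer only from the supplied facts, or admit it doesn't know, makes errors like the Robert Plant one rarer; it does not eliminate them.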

LLMs are most effective when they assist with tasks whose results you can subsequently verify. For instance, if you ask ChatGPT how to enlarge the thumbnails on your desktop, it will likely provide a correct answer, and you'll promptly discern whether it is. Sometimes, you can inform it of its mistake, and it will apologize and attempt to provide a correct response, possibly succeeding on the second try. If not, maybe you just Google it and read through some promising-looking documents just like everyone did in the old days (i.e. a year ago). So not only is the thumbnail resize problem non-critical, it is also one where you can verify correctness after making use of the information. In these situations, at worst, the LLM can be unhelpful. It will never, however, lead to disastrous outcomes.

Second line of defense

One approach to mitigating risks is to employ an LLM as a secondary layer of defense. Taking an extreme example, an LLM (or similar tool) could potentially check air traffic control plans for safety. I'm not suggesting this becomes the primary safety mechanism, but rather an additional layer of protection. This setup might catch certain catastrophic errors. Picture this scenario: the LLM runs, detects what it interprets as a potential danger, and alerts air traffic controllers accordingly. As discussed earlier, this process is verifiable, since the controllers can check each alert. If the LLM doesn't produce many false positive alerts, it proves useful whenever it identifies a real danger that was previously overlooked. At worst, it stays silent and misses the same dangers the primary system missed. The crucial point here is that it either adds positive value or remains neutral; it cannot introduce negative value, other than wasted investment.
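The key property of a second line of defense is structural: the secondary check can raise alerts for a human, but it can never override or block the primary decision. A minimal sketch, assuming a hypothetical deterministic `rule_check` as the primary mechanism and a stub `stub_llm_flagger` standing in for an LLM that reviews free-text notes (no real air traffic rules are implied):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

@dataclass
class Review:
    approved: bool                                   # set by the primary check only
    alerts: list[str] = field(default_factory=list)  # advisory notes for a human

def review(item: Any,
           primary: Callable[[Any], bool],
           secondary: Callable[[Any], Iterable[str]]) -> Review:
    """The primary check decides; the secondary layer (e.g. an LLM) can only add alerts."""
    approved = primary(item)
    try:
        alerts = list(secondary(item))
    except Exception:
        # A failing secondary layer degrades to "no alerts", never to a changed decision.
        alerts = []
    return Review(approved=approved, alerts=alerts)

# Hypothetical primary rule: a minimum separation threshold.
def rule_check(plan: dict) -> bool:
    return plan["separation_ft"] >= 1000

# Stub standing in for an LLM that reads free-text notes and flags concerns.
def stub_llm_flagger(plan: dict):
    if "thunderstorm" in plan["notes"].lower():
        yield "Notes mention a thunderstorm near the route; please review."

plan = {"separation_ft": 2000, "notes": "Thunderstorm cells reported near waypoint B"}
print(review(plan, rule_check, stub_llm_flagger))
```

Because the alerts never feed back into `approved`, the secondary layer can only add information or stay quiet, which is the "positive or neutral" property described above.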

In reality, we might not begin with such an extreme example involving matters of life or death. This approach could be applied to detect certain types of fraud, such as phishing attempts, phone scams, and the like. Consider an app that runs on a phone, analyzing phone conversations in real time to flag potential scams. Scams can target anyone but are particularly effective with older adults who may have diminished cognitive abilities. For instance, the app might recognize a scam where someone claims to be from the bank and needs the user's password to check on suspicious transactions. It could then end the call and send the user a message alerting them that the call is likely a scam. Often, when someone draws attention to a probable scam, the victim recognizes it for what it is. There might be some security concerns to deal with there, but they are likely solvable.

In fact, I suspect there are many applications of LLMs for dementia patients, perhaps as an assistant that can not only help the patient with simple things, including lost memories, but also identify aberrant behavior and call a guardian to intervene. Dementia patients often go through cycles of high and low functioning, and identifying low-functioning periods quickly could prevent some types of harm. This seems like an excellent area for research.

Document manipulation

LLM use cases abound when it comes to helping people navigate large quantities of text. Legal search is an example. Imagine you are given five years' worth of text messages and want to identify as quickly as possible when discussions of drug trafficking were occurring. I’m pretty sure drug dealers don’t say things like, “Be sure to use the untraceable phone before calling the cocaine dealers. The drug mules are going to be here tomorrow so we can give them the drugs then.” They use sly lingo like, “Use the burner for calling the bricklayer, homes. Runners here tomorrow.” (Bricks are slang for kilos of drugs and runners are drug mules. ChatGPT told me this.)

Sometimes you would like a summary of a large document, which may tell you whether you actually need to read it in more detail. Another common use case is loading a large number of documents into an LLM to create a query tool. Such tools are best paired with more traditional document search tools that can produce relevant excerpts, which can be used to support and verify the answer given by the LLM. There are a number of techniques for making this efficient and accurate. New variants of LLMs will soon be optimized for this very common and particular task. Expect to see some products very soon. As usual, you can count on products popping up to solve problems that are universal across industries, so don’t invest too much in those areas. Invest in your niche, company-specific areas.
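One simple way to use source documents to verify an LLM-backed query tool is to check that every passage the model quotes actually appears in them. A minimal sketch, assuming the model's answer has already been parsed into a list of quoted passages; the document IDs and text are made up for illustration, and the helper names are hypothetical:

```python
import re
from typing import Optional

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_support(quote: str, documents: dict[str, str]) -> Optional[str]:
    """Return the ID of the first document containing the quote verbatim, else None."""
    needle = normalize(quote)
    for doc_id, text in documents.items():
        if needle in normalize(text):
            return doc_id
    return None

def check_quotes(quotes: list[str], documents: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each quote the model cited to its supporting document (None means unsupported)."""
    return {q: find_support(q, documents) for q in quotes}

# Made-up corpus and quotes, purely for illustration.
docs = {
    "memo-017": "The vendor contract renews on March 1 unless cancelled in writing.",
    "email-042": "Please send the signed   copy to Legal by Friday.",
}
quotes = ["renews on March 1", "send the signed copy to legal", "renews automatically every year"]
for quote, source in check_quotes(quotes, docs).items():
    print(f"{quote!r}: {source or 'NOT FOUND, check by hand'}")
```

Exact substring matching is deliberately strict. An unsupported quote is a signal to read the excerpts yourself, not proof that the answer is wrong.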

Learning and training

We have only scratched the surface of ways that LLMs can be used for education and training. Expect considerable advances here as educators learn how best to put LLMs to work educating the next generation, new employees and, well... all of us. Many of us who frequently use ChatGPT and similar LLMs treat them the same way we treated our parents when we were kids. “Dad, is there snow on the moon? What kind of dinosaurs lived in our neighborhood? Why do dogs spin around before lying down to sleep? Can cats eat eggs?” or perhaps somewhat more grown-up questions.

ChatGPT is very useful when you have a question but lack the natural keywords that would make a search engine helpful. The other day, for example, I asked:

“Is there some kind of solid material that will create constant pressure on a membrane when it is squeezed the same way that a liquid will?”. It answered: “Yes, there are solid materials that can exhibit behavior similar to liquids when subjected to pressure. One common example is silicone gel or elastomers.”

I also asked, “Are there other materials which act like wet cornstarch where they move like a liquid if moved slowly but act like a solid under more force?”. It answered:

“Yes, there are materials known as non-Newtonian fluids that exhibit the same behavior as wet cornstarch (also known as oobleck). These materials behave like a liquid when handled gently or moved slowly, but they act like a solid under sudden or forceful impact. …”

Note that neither question is one where Google (a keyword indexing search engine) would likely have been helpful. In the second case, ChatGPT gave me the useful search term “non-Newtonian fluids”. Once I have that, traditional searching becomes more productive. This makes ChatGPT a very good tool when you are learning something so new to you that you lack the lexicon. Learning the lexicon of a subject is often a task for beginners in a new area, and LLMs are very helpful for that. Questions like “What are those triangular support beams called that you sometimes see on a bridge?” (they are called trusses) might be embarrassing for a new engineering student to ask a professor or even fellow students. They won’t be embarrassed to ask ChatGPT. It’s like being five again and having Mom and Dad around, except it never gets tired of answering.

Cue Sarah Connor: "Watching John with the machine, it was suddenly so clear. The Terminator would never stop. It would never leave him..."

Natural language to domain specific language

Another important use case for LLMs is the translation of natural language into smaller domain specific languages, including the operation of controls or user interfaces. This is a broad area of application. LLMs can translate natural language into computer code or query languages like SQL. They can help non-technical users make analytical charts, e.g., “Make me a bar chart showing total sales of ball-peen hammers by month, going back to 2016. Make separate bars for the US, Mexico and Canada.”
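Translating the hammer request into SQL is only useful if the result can be checked. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `sales` table and showing the kind of query an LLM might generate (not output from any particular model). The query runs with SQLite's `query_only` pragma enabled, so a wrong generated statement cannot modify data; the sample rows are made up.

```python
import sqlite3

# Illustrative SQL of the kind an LLM might generate for the hammer request.
# The table and column names are assumptions, not a real schema.
GENERATED_SQL = """
SELECT strftime('%Y-%m', sale_date) AS month,
       country,
       SUM(amount) AS total_sales
FROM sales
WHERE product = 'ball-peen hammer'
  AND sale_date >= '2016-01-01'
  AND country IN ('US', 'Mexico', 'Canada')
GROUP BY month, country
ORDER BY month, country
"""

def run_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run generated SQL with writes disabled so a bad query cannot change data."""
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [  # made-up sample rows for checking the query by eye
        ("2016-01-05", "ball-peen hammer", "US", 120.0),
        ("2016-01-20", "ball-peen hammer", "US", 80.0),
        ("2016-01-11", "ball-peen hammer", "Canada", 45.0),
        ("2016-02-03", "claw hammer", "Mexico", 60.0),
    ],
)
conn.commit()

for row in run_read_only(conn, GENERATED_SQL):
    print(row)  # one (month, country, total_sales) row per bar in the chart
```

Running generated SQL against a small, known sample like this is one way to verify it before trusting it on real data.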

People can and do learn how to write code and SQL and how to use tools like Tableau to create charts, and those people will likely be more powerful users than those relying on natural language. Still, this opens up a lot of possibilities (and danger) for people who only occasionally need to do these things and won't remember the commands or techniques they used six months ago. Everything said before holds true here as well: the best use cases are ones with error tolerance and the ability to verify correctness.

For example, even power-users of graphics languages like myself will often benefit from asking ChatGPT to write code for us: “Write a python program using the Bokeh library to make a hexbin heatmap with a rainbow color scale”.

This will cause it to spit out a bunch of Python code.

It may or may not run (this one will), but the goal is really just to get a bunch of lines I can copy and paste to speed myself up. It will usually get the imports right, name the right functions to use, such as “hexbin”, and show that “hex_tile” is the right method to call on the figure object. I’ll have to make changes anyway to connect it to my data.

Unless I frequently make hexbin heat maps, I am probably going to forget what these functions and methods are called. I know other ways to get this information, such as the Bokeh documentation page or another codebase of mine, but this is faster, at least if it gets it right. If it doesn’t get it right, I will soon know, so it’s verifiable. There is little danger that I will do the wrong thing because I understand the language. LLMs specialized for coding will get better and better at this, just as we will get better at prompting. This simply accelerates the ability of experts to make use of their domain specific languages, including entire programming languages. It's also probably the place where Gen-AI has proven most valuable.

Operating graphical UIs is really just the same thing. You might still use Tableau, but ask an LLM these same types of questions and it will tell you what you need to do. You will probably be able to tell when it has told you the wrong thing. LLMs will likely be ubiquitous as helper agents in complex UI products. Think Clippy, but smarter and hopefully less annoying. If you have such software, consider adding this kind of feature. Even if it is limited in ability now, it will likely improve as LLM tech and your own capability progress, so it’s a good area to invest in.

There is still more danger when non-experts use these low-code approaches to solve problems. They have their place but will inevitably be used inappropriately. Low-code often has similar characteristics to DIY surgery. I once tried to lance a blackened toenail with a red-hot needle. It didn’t go well, despite the clear directions on the internet. This is a space that needs to be handled with care, but there are places where it makes sense.

Poor use cases

I've mentioned a couple of good use cases. Here are a few poor ones.

An example of a poor use case would be using Gen-AI to suggest supply chain decisions. You can train Gen-AI on supply chain decisions or plans and it will spit out new ones, but these are not going to be very good. For one, it mostly focuses on getting the form correct, that is, on cosmetic features. To an LLM, $46,000 and $406,000 look very much the same; they differ by only one character. But as numbers, they are obviously vastly different. Gen-AI is not going to be able to make use of complex quantitative models or reasoning such as linear programming. Math is not its strong suit.
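A quick way to see the $46,000 versus $406,000 point: as strings, the two values are a single edit apart, while as numbers one is almost nine times the other. This only illustrates text-level similarity (real LLM tokenizers split numbers in various ways), using a standard Levenshtein edit distance; `dollars` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]

def dollars(s: str) -> int:
    """Parse a string like '$46,000' into an integer amount."""
    return int(s.replace("$", "").replace(",", ""))

a, b = "$46,000", "$406,000"
print(levenshtein(a, b))        # 1: one inserted character apart as text
print(dollars(b) / dollars(a))  # but almost 9x apart as numbers
```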

Any plan for daisy-chaining LLMs together into complex graphs should be suspect. This is an active area of research and it may be fruitful someday, but currently there is no solid knowledge about what will come of it. Whether LLMs can self-organize into helpful communities or spin off into a death cult is completely unknown. Don't get ahead of yourself. Keep the connections between LLMs simple. Remember Skynet. Don't let them organize.

Anything that relies on logic can be dangerous. LLMs simply don't do logic. They can talk about doing logic but that isn't the same. If you ask them to explain how they reached a decision, they might give a reasonable sounding answer. It's just that it wasn't what it actually did. It has no more insight into explaining its own decision making than we do when explaining how we knew a picture was a cat and not a dog. We don't make that decision through logic. We use pattern matching. Gen-AI uses pattern matching for everything. It has no other mode of operation.

People often claim or assume that Gen-AI will be very important in healthcare. People always seem to think that every technology will be important in healthcare, but I think that might simply be because healthcare needs so much help. Something had better help it, right?

Healthcare is tricky, though, for a few reasons. One is that people's health is critical and mistakes can have very bad outcomes. Healthcare is also heavily regulated, so doing anything in healthcare is difficult, especially when you might be putting patients at risk. Privacy is a further concern, and it is regulated as well. For these reasons, people are always excited about applying new tech to healthcare until they actually try, and then they go away screaming.

Gen-AI is likely to be used in healthcare, but it's probably an area for phase two, when we really know what we are doing and have had plenty of time to think about where it is best applied. The second line of defense approach might be a good place to start. Imagine an LLM that runs after a doctor's manual diagnosis and produces suggestions the doctor is required to at least view. In some cases, it might point out a better diagnosis, which the doctor will sometimes find valuable. The doctor still makes the diagnosis but uses the LLM to reduce mistakes.

Beware the pied pipers

Generative AI is most certainly in the early stages of a speculative bubble. If you’ve been around the block, you should know how this goes. If you haven’t seen as many of these cycles, please use caution and common sense. Everyone selling in this space has every incentive to exaggerate, use carnival-barker sales techniques, and claim that Gen-AI is the cure-all you have been looking for. Here are a few tips.

First, be on the lookout for FOMO, Fear of Missing Out. Nothing gets adopted quickly and effectively by industry. Sorry, it just doesn’t. It takes a decade or more for anything significantly different, as this is. Your competitors are unlikely to put you out of business because you haven’t converted the whole business to Gen-AI by year end. The FOMO tactic is a common, effective but disingenuous sales approach. The first round of Gen-AI applications are likely to be unsuccessful anyway so you won’t be too far behind if you can learn from the mistakes of others. If you missed the whole Hadoop fad but got started with Spark four years later, you probably came out ahead.

Crawl before you walk and walk before you run. Your first Gen-AI project should not be a network of LLM agents to handle your supply chain planning. Nobody knows how to do that, and you don’t need to be the research guinea pig. Start with an example like the ones I described above. Those seem to be easy enough and in most cases will demonstrate value. Like any software project, you want fast feedback cycles where you can learn from users whether you are on the right track or whether you are overthinking it. Perhaps you are overly eager just to use a new technology that isn’t really needed.


I’ve described some common characteristics of good Gen-AI use cases, which will hopefully spark some ideas about what might work for you. Keep a healthy dose of skepticism. The tech bros are never as smart as they want you to believe. No one knows what is coming next. Few people have even demonstrated clear successes in using Gen-AI to solve real problems in industry, and some such claims are currently under review.

But Gen-AI is an important advance. It does solve some real problems that were not previously solvable, and that’s not going to disappear. There are going to be important use cases in every industry. What is needed now is experimentation and better product thinking on how to adopt it. It may take some outside-the-box thinking to find the sources of greatest value. The more obvious use cases may also be the ones of limited value, even if they are the best places to get started.

Don’t get hung up on the idea of LLMs as cheaper employees. LLMs are best used when interfacing with people. They usually don’t make good replacements for people. In the few places where they do, it is in jobs that are robotic in nature anyway, areas that have always been targeted for automation. Instead, think of things you can’t do now and perhaps haven’t even dreamed of doing.

Today’s paradigm of AI is pattern matching. When there are rich patterns of behavior in your business, there is the possibility of automation through AI, whether Gen-AI or more direct approaches. Gen-AI can model any probability distribution in any parameter space; it isn't limited to text and images. Stay open-minded and clear-headed, and enjoy your journey into Gen-AI.

More articles by David Johnston

  • Are we ready for the Long Winter?

    Are we ready for the Long Winter?

    Ah, my sweet summer child. You never care for the tales of the AI revolution, preferring instead the dark stories of AI…

    1 条评论
  • LLMs can pass math tests but can't do math

    LLMs can pass math tests but can't do math

    People who are marketing LLMs and want to impress others are spending a lot of time trying to get their LLM to perform…

    30 条评论
  • How to think about Large Language Models

    How to think about Large Language Models

    Large Language Models are truly amazing things. There is no denying the importance of this breakthrough.

    28 条评论
  • Why great developers should make great business executives

    Why great developers should make great business executives

    I've often thought that there should be reasons why great software developers should make great business executives. Of…

    1 条评论
  • Why mirrors confuse us

    Why mirrors confuse us

    People are often under the impression that mirrors swap left and right. But that seems weird when you think about it a…

  • Design of information systems in the age of AI

    Design of information systems in the age of AI

    Many enterprises are facing a very similar problem these days. This is the problem of how to use AI to open up a freer…

  • Don't go to college

    Don't go to college

    If you’ve followed my writing you might have noticed a theme. I write about a wide variety of topics including some…

    13 条评论
  • Fed up with gerrymandering

    Fed up with gerrymandering

    A judge is overseeing a case about gerrymandering between the two main political parties in a State. Both parties, when…

  • My AI Writing Compendium

    My AI Writing Compendium

    I decided to make a collated list of the major AI articles I have written; 10 so far. If someone really wants to know…

    1 条评论
  • Early Transcript from Bluetooth Design Committee (Dec 7, 1997)

    Early Transcript from Bluetooth Design Committee (Dec 7, 1997)

    Breaking News: A transcript has been found from the key meetings where the Bluetooth technology was developed. Chief…

