Mastering GenAI in Business: Productivity, Applications & Market Dynamics
Philipp Masefield
Head Beyond Services @ AXA | Leadership in Business, People & Digital Transformations - hands-on & advisory | Insurance IT Executive (4yrs), Project Manager (PMP, 10+yrs) | Early-Stage Investor
TL;DR
The article explores the business implications and potential applications of large language models (LLMs) in three areas: market dynamics, the future of knowledge work, and working with LLMs. It highlights the need for productive, value-generating applications to drive returns in the AI ecosystem and identifies two competitive AI fronts: high-performance, general-purpose models and highly-specialized models tailored for specific use cases. The article also introduces a third position: good-enough models that offer sufficient performance for many use-cases at lower costs. The future of knowledge work section discusses the potential impact of generative AI on knowledge work and jobs, emphasizing the importance of focusing on automating tasks, not entire jobs. Finally, the working with LLMs section offers practical advice for leveraging large language models effectively, emphasizing the need to embrace the probabilistic nature of LLMs, experiment to discover capabilities and limitations, guide LLMs with practical wisdom, and craft prompts iteratively. [^1]
This is the second piece in my three-part series on my generative AI (GenAI) learning journey since ChatGPT's viral launch just over a year ago. In the first article, “Charting GenAI's Course with a Tech-First, Multidisciplinary Approach”, I had:
The final article in this series will explore the Societal perspective.
If 2023 was the year of 'tech fascination' with the new Generative AI (GenAI) phenomenon, will 2024 be the year of finding productive, value-generating applications?
One driver for this shift towards value will be the economic pressure to generate returns. As Andrew Ng clearly points out, for the AI ecosystem to be sustainable, there need to be real value-generating applications, whether consumer- or business-focused, that generate enough revenue to support the companies developing these large language models. So far, the space has largely been funded by investors, but for it to move beyond fascination into true productivity, monetizable applications using this technology will likely have to emerge over the next year to close that revenue gap. [^Cent]
In the following sections, we'll explore the business implications of Large Language Models (LLMs) across three key aspects: market dynamics, the future of knowledge work, and working with LLMs. [^Cent]
Market Dynamics
Providers of LLMs
In November 2023, I shared my sketch of how I understood the market dynamics playing out around LLM capabilities, illustrating a two-dimensional landscape of performance and specificity. I concluded that two competitive AI fronts existed:
(1) A very small number of high-performance, general-purpose models that continually push the boundaries of what's possible. Dominated by a small number of players with substantial resources - OpenAI, Anthropic, Google, maybe one or two more players? - this space is as competitive as it is innovative.
(2) A broader area for highly-specialized and much smaller models that are tailored to their specific use cases. Because of their smaller size, they are far more economical to build and operate, and they can even outperform the largest general-purpose models in terms of quality within their specific niche.
The basic logic still applies, and I still see these as the overall market dynamics. Yet, recently, through personal experimentation, I have come to appreciate that there might be a third position:
(3) Good-enough performance models, often open-source, offering sufficient performance for many use-cases at lower costs, enabling new business models (think freemium) or large-scale processing (for a viable cost).
This is a view that is also reflected in a recent analysis concluding that the high-end models will likely capture most of the long-term value. However, it's also evident that second-tier models, which strike a balance between quality and cost, will create a significant market niche worth billions of dollars, especially when optimized. Illustrating this, here are sample prices for different models (as of writing, available on OpenRouter, indicated per 1 million tokens input / output):
- Category 1: Context >100k: [^2]
- Category 3: Use-cases with a 4k context window:
Clearly, depending on the specific use case and business model, alternatives to the default OpenAI model should be explored. Generally, there's a perception that the “AI industry is experiencing a pivotal shift as inference costs plummet”, with highly competitive pricing for GPT-3.5 class models. This segment has become commoditized. In such a market, with a clear leader (and only a few close followers) and a large group of second-tier providers, only a few “can actually make money off these models”:
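To make these cost differences tangible, here is a minimal sketch in Python that estimates what a batch-processing workload would cost on models from each tier. The per-token prices and the workload figures are illustrative assumptions of my own, not the OpenRouter quotes referenced above.

```python
# Minimal sketch: estimating workload cost per model tier.
# The prices below are illustrative assumptions (USD per 1M tokens,
# input / output), NOT the exact OpenRouter quotes from the article.
ASSUMED_PRICES = {
    "tier-1 frontier model":  {"input": 10.00, "output": 30.00},
    "tier-2 mid-range model": {"input": 1.00,  "output": 3.00},
    "good-enough open model": {"input": 0.20,  "output": 0.20},
}

def estimate_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * prices["input"] + \
           (output_tokens / 1_000_000) * prices["output"]

# Example workload: classify 100,000 documents,
# ~1,500 input tokens and ~100 output tokens each.
requests, tokens_in, tokens_out = 100_000, 1_500, 100

for tier, prices in ASSUMED_PRICES.items():
    total = requests * estimate_cost(tokens_in, tokens_out, prices)
    print(f"{tier:>24}: ~${total:,.0f}")
```

Run against this assumed workload, the tiers come out roughly at $1,800, $180, and $32 respectively; it is this order-of-magnitude gap that makes the 'good-enough' position viable for large-scale processing.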
AI (-Inside) Products Landscape
The makers of these Large Language Models, first and foremost OpenAI, have ushered in a new era of products and services. Over this last year, we've seen an explosion in end-user products that leverage this technology. I would classify these products into two main categories: those that offer AI-enhanced features for existing products, and those where AI is the primary offering.
In the case of existing products, AI-enhanced features provide value when they introduce new capabilities that align with the core offering or enhance existing functionality. Notion serves as a good example of this. However, it is worth noting that many existing products simply add AI features to capitalize on the current hype surrounding AI.
The second category comprises new end-user products that are primarily built on and around generative AI. This category represents the multitude of AI tools that have popped up for specialized use cases. Examples include the countless AI copywriters. These products(*) often merely abstract technical settings with use-case-specific business logic that guides and constrains users, all while providing a user-friendly interface. Technologically, these products can be explained quite easily as a combination of:
(*): There certainly are noteworthy exceptions: products built on GenAI that are far more than that. One such example that I use frequently is Perplexity, which combines an LLM in the role of a “reasoning engine” with a search index as the “knowledge engine” to create their “answer engine” (excellent interview on Exponential View).
Shifting our focus, what do the rapid advances in AI capabilities mean for the way we work?
Future of Knowledge Work
Early 2023 saw economy-level estimates predicting that generative AI could significantly impact work, particularly white-collar jobs. For instance, Goldman Sachs estimated an AI automation potential of 25-50% across nearly two-thirds of US occupations. In April 2023, initial studies highlighted GenAI's potential to boost productivity in office work. Specifically, LLMs demonstrated a skill-leveling effect for certain call center tasks, disproportionately benefiting lower-skilled workers and raising average skill levels overall (see my visualization attempt). Although many office jobs differ from call center work, and it was only a relatively small study, this early evidence remains noteworthy.
So is AI really such a big deal for the future of work? In September 2023, Ethan Mollick answered: “We have a new paper that strongly suggests the answer is YES”. As a co-author, he was referring to their very robust study with BCG consultants that confirmed early productivity impact indications, and provided some insights on integrating human and AI capabilities. Based on this study, I wrote about my personal takeaways for approaching and using LLMs.
In all these studies, Andrew Ng's insight is key: focus on automating tasks, not entire jobs. As he highlights, the strategy is to discern which tasks within a job's spectrum are suitable for automation. While businesses consistently pursue improvement, GenAI introduces a significant leap in capabilities. It's not solely about core tasks; automating the supporting tasks can also lead to marked efficiency improvements.
Interestingly, a similar yet distinct leveling dynamic can be observed in the entertainment industry: Amateur creators benefit, while mid-tier artists struggle against the sheer volume enabled by AI. However, top stars leverage the technology to extend their reach even further. (The Economist provides a good read on this: Now AI can write, sing and act …).
Given the various studies (and several others not mentioned), I believe we should all be motivated to engage with generative AI and its potential. So let’s get more practical.
Working With LLMs
In working with Large Language Models, remember that perfection isn't attainable by merely handing off tasks. There are different ways to think about this - and the most important message might actually be that we need to actively and consciously think about it.
According to MIT's Thomas Malone (in a 2019 article, so in a time of Machine Learning before Generative AI), “it’s more useful to think of AI in terms of humans and computers complementing one another within the context of smart groups”. During these collaborations, AI can assume various roles: tools, assistants, peers, or managers. In an MIT course on AI, Professor Malone elaborates on these roles:
I consider this a reasonably useful way of thinking about working with LLMs, as it can guide the way we interact and what we expect to receive.
Microsoft’s analogy of the “Copilot” in everything is instantly appealing. But obviously, the analogy is imperfect at best, and at times even misleading - at least at the current state of the technology. Yes, it is correct that the ultimate authority and accountability lie with the captain of a plane rather than the copilot (so the human user rather than the AI). But as the captain, I don't think today's AIs are ready for “your controls”, handling the entire flight and landing on their own ...
I am partial to Reid Hoffman's notion of seeing GenAI as “human amplification” (e.g., on his “Possible” podcast series), with the key point that we as humans remain in the driver's seat. A good example of this is the process of writing, where LLMs can play an equalizing role, allowing people who struggle with writing to overcome that barrier. Think about it this way: the writing process can be broken down into the input (or prompt formulation), text generation, and editing. While LLMs excel at rapidly generating text, humans still provide the creative spark through interesting inputs in the prompts and then apply judgment in editing. This allows more people to contribute their ideas by utilizing AI for the text-generation step. In that sense, LLMs can act as an equalizer, amplifying input from a broader range of thinkers.
LLMs Are Not Like Software
Unlike deterministic software, LLMs represent a paradigm shift that needs to be experienced firsthand. Tellingly, ongoing empirical research aims to understand LLMs' behaviors, capabilities, and limits (see e.g., EmotionPrompt, or the finding that just adding a single sentence makes Claude overcome its reluctance to answer). It is remarkable that LLMs exhibit emergent behaviors not readily apparent from analyzing their code.
So the software analogy for understanding LLMs does not work. The most effective approach may be to anthropomorphize LLMs, treating their differing capabilities as distinct personalities rather than software functions. So to “treat AI as people [might be] pragmatically, the most effective way to use the AIs available to us today”, and sometimes encouragement can unlock an LLM's hidden potential (as this illustrative example shows). As such, I have found it useful to keep in mind what Ethan Mollick states in On-boarding your AI Intern:
They are weird, somewhat alien interns that work infinitely fast and sometimes lie to make you happy (..) Just like any new worker, you are going to have to learn its strengths and weaknesses; you are going to have to learn to train and work with it; and you are going to have to get a sense of where it is useful and where it is just annoying.
The key takeaway: firsthand experimentation - grounded in a conceptual understanding of the technology [^3] - is essential to develop an intuitive grasp of LLMs' capabilities, limitations, and therefore potential business applications.
Prompt Crafting As The Enabling Technique
I address prompt crafting from a business rather than technology perspective because, the way I see it, it is part of the business-driven interface to the technology. To create value, we need to approach the technology from a business-needs perspective, with technology as the enabler rather than the driver (i.e., avoiding 'tech in search of a problem'). And if LLMs are to become our new paradigm for interacting with information, then by extension natural language is the user interface.
For much of the first half of 2023, LinkedIn was full of listicles of one-line prompts with headlines in the style of 'XX prompts to save you YYY hours' that received hundreds or even thousands of likes. Luckily, as early as March or April I had discovered Dave Birss's CREATE prompting formula as a kind of antidote:
This formula quickly helped me get far more useful ChatGPT responses than those simplistic one-liners ever did. I'm glad these simplistic prompt listicles now seem to have all but disappeared.
Even though “AI prompt engineering isn’t the future”, it is at least currently still a skill that is a differentiator for the value you can get from using GenAI tools. I’ve reflected before on the difference the quality of a prompt can make and have since continued to experiment with prompt crafting, and learn from others' insights. Clearly, there are "strategies and tactics for getting better results from large language models", and this still applies to GPT-4 as well. So how you approach and then craft your prompts does matter. From OpenAI's guide, I find that in particular the strategy of “write clear instructions” always applies. In a sense, you as the human actually need to take the effort of clearly articulating your thoughts before you can expect even an AI to understand. The tactics listed in the guide are also a good way to achieve this:
There are many more points that can be considered in crafting prompts. In the end, it is down to each one of us to experiment and find what works best for our own needs.
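As an illustration, here is one way such a prompt could look. This example is my own construction, not taken from OpenAI's guide, but it combines several of the tactics mentioned there: adopting a persona, using delimiters, spelling out the steps, and constraining the output format and length.

```
You are an experienced insurance product manager.

Summarize the customer feedback below for an executive audience.
Work in two steps: (1) group the feedback into at most four themes,
(2) for each theme, pick one representative quote.

Output format: a Markdown table with the columns
"Theme", "Representative quote", "Suggested action".
Keep the whole answer under 200 words.

Customer feedback:
"""
<paste the raw feedback here>
"""
```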
Getting Hands-On
Since GenAI was launched as a consumer service rather than first as an enterprise product, it remains first and foremost a personal productivity tool, and with that, the 'burden' of learning about its use cases and practical application falls on each one of us.
Exploration and How I Think About Use Cases
In the rapidly evolving landscape of GenAI capabilities, hands-on experimentation is not just beneficial, it's imperative for understanding its potential use. As Nathan Warren points out, the best way to understand these models really is to personally spend time experimenting. Or, as Ethan Mollick emphasizes, only “using AI will teach you how to use AI” in your specific domain, and you need to be “using AI a lot until you figure out what it is good and bad at”.
I would argue that experimenting with and using GenAI needs to happen in two different modes: open-ended explorative learning, and focused application to practical use cases.
As an example of explorative learning, I noticed that while GPT-4 Turbo was overall good at helping me craft sophisticated prompts, it did get confused about the interaction of the different roles. For example, when asking GPT-4 to write a prompt for an interaction between an AI assistant and a user, the part on 'who asks whom' got rather confusing, ending up with the AI telling the human to perform all kinds of writing tasks. Based on this learning, I then adapted my approach to role-based use cases.
Another practical takeaway is that I often reset chats. The main reason is that the LLM retains the full conversation in its memory as part of its context window. This has two practical effects: first, it increases the cost of inference (by operating on far more tokens); and second, at least sometimes, the LLM may begin losing focus on the task at hand.
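A minimal sketch makes this concrete. With a stateless chat API, every new turn re-sends the entire conversation history as input, so the tokens processed (and billed) grow roughly with the square of the conversation length. The average message size and the price per million tokens below are assumptions for illustration only.

```python
# Sketch: why long chat sessions get expensive.
# Each turn re-sends the full history as input tokens, so the total
# processed grows quadratically with the number of turns.
# Token counts and the price are illustrative assumptions.

AVG_TOKENS_PER_MESSAGE = 200        # assumed average per user or AI message
PRICE_PER_1M_INPUT_TOKENS = 1.00    # assumed price in USD

def tokens_processed(turns: int) -> int:
    """Total input tokens sent across a session of `turns` user/AI exchanges."""
    total, history = 0, 0
    for _ in range(turns):
        history += AVG_TOKENS_PER_MESSAGE   # user message added to history
        total += history                     # full history sent as input
        history += AVG_TOKENS_PER_MESSAGE   # AI reply added to history
    return total

for turns in (5, 20, 50):
    t = tokens_processed(turns)
    cost = t / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS
    print(f"{turns:>3} turns: {t:>9,} input tokens (~${cost:.2f})")
```

Resetting the chat starts the history, and therefore the cost per turn, back at zero, which is exactly the effect described above, and it also brings the model's attention back to a single, focused task.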
Based on the understanding gained through explorative learning, we can then approach practical use cases. When considering potential use cases, I ask myself a few key questions:
To address my use cases, I find it helpful to think in terms of Jobs to be Done, and then to think about what professionals I would like to have on my team to help me achieve the different parts or tasks of this job. Based on this, I view GenAI as empowering my multidisciplinary team, with each professional represented by an AI Persona who will be entrusted with certain tasks. And if a task is defined generically enough, a single persona can be hired to contribute in several jobs.
Use Cases Within Corporate Role
In corporate environments, playful exploration and practical applications face valid constraints. However, they remain possible and essential, with two primary considerations:
Recently, at least within my corporate setting, the second caveat has been mitigated: just shy of the one-year anniversary of ChatGPT's public debut, an internal SecureGPT (GPT-3.5 based, chat interface) has been introduced. This development opens up many new use cases that I will consider exploring.
Earlier in 2023, I discovered a few practical use cases that were value-adding to me – here are two illustrative examples:
Bypass Steep Learning Curve of a New Software Syntax
In May 2023, I was setting up the data preprocessing in an ETL scenario (Extract, Transform, Load) and opted for Power Query in Excel for various reasons. Despite my familiarity with its potential, I had never used it before. Instead of digging through countless documentation pages and forum entries to learn the specific formula syntax of the Power Query Editor, I turned to Bing, which was powered by an early GPT-4 version with internet access. I described my goals conceptually and had Bing generate the syntax for me. Although not every formula worked perfectly at first, Bing proved to be a capable debugging assistant. This approach enabled me to accomplish in a few hours what would have otherwise taken me several days of learning and struggling.
The ability to simply write prompts detailing what you want to achieve, and then have the AI provide the specifics, is a real enhancement that allows you to focus on the value-creating part of a job. The LLM (Bing, on my iPad, in this case) closed a capability gap for me.
Thinking Partner
GenAI can serve as a thinking partner to iteratively work through a problem. AI in this context takes on more the role of a peer rather than of a tool. An example for this usage was thinking about a potential collaboration contract. I had a few high-level ideas and some alternative approaches I wanted to think through. What worked for me was the following approach:
In conclusion, the AI was not able to do something I wasn't capable of, and more importantly, it didn't just hand me some perfect solution. The point is that the AI facilitated my own thinking, letting me efficiently explore different ideas while keeping me engaged with sometimes novel or unexpected contributions. It was similar to asking a group of colleagues to drop whatever they were doing and help me collaboratively think through my topic.
Personal Pursuit Use Cases
Unconstrained by a corporate setting in my personal pursuits, I have been using the power of LLMs in a value-adding manner for various use-cases throughout much of 2023.
The absence of constraints, however, implies the lack of common built-in safeguards found in enterprise settings. Consequently, as a user you must adopt a more informed approach to data protection and consent, considering your specific use cases. In practice, this actually means reading the terms of use for any service you subscribe to (you may be surprised by what you accept as legitimate data use) and determining if you trust the provider to adhere to these terms.
In short, the business realities still apply in the shiny new world of GenAI: there is ‘no free lunch’ and either you pay for the product or else you are the product.
My Multidisciplinary AI Team
Within the context of my personal pursuits, I have built a multidisciplinary team that allows us, together, to produce better results than any one of us could achieve alone [^4]. As a practical illustration, here is a snapshot of my current team:
These AI personas have proven to be valuable, time-saving team members in some of my personal pursuits.
And then, obviously, there are many more prompts that I use for singular tasks, less linked to a persona and applied in more diverse settings. An example could be generating a quick summary of a text. Over time, though, I've noticed that I tend to move these prompts into more specialized settings, generally with better results. At that point I often create a persona that encapsulates many of the expectations for the output, so that a more straightforward prompt suffices. In the example of the summary prompt, this persona might be a journalist, or an academic, each producing a distinct output.
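To illustrate the persona idea, here is a small sketch of how such a specialization could be wired up in practice. The persona texts and names are hypothetical examples of mine, not the actual members of my AI team; the point is that the reusable system instruction carries the output expectations, so the per-task prompt can stay short and generic.

```python
# Illustrative sketch of the persona idea: the system instruction carries
# the expectations about the output, so the per-task prompt stays short.
# Persona names and texts are hypothetical examples, not the author's team.
PERSONAS = {
    "journalist": (
        "You are a seasoned news journalist. Summaries you write lead with "
        "the most newsworthy fact, stay under 100 words, and avoid jargon."
    ),
    "academic": (
        "You are an academic reviewer. Summaries you write state the research "
        "question, method, and key finding, and note any limitations."
    ),
}

def build_messages(persona: str, text: str) -> list[dict]:
    """Combine a reusable persona with a short, generic task prompt."""
    return [
        {"role": "system", "content": PERSONAS[persona]},
        {"role": "user", "content": f"Summarize the following text:\n\n{text}"},
    ]

# The same one-line task prompt yields distinctly different summaries
# depending on which persona is selected:
messages = build_messages("journalist", "<paste article text here>")
```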
How I Use LLMs Practically
Over this past year, my approach to using large language models (LLMs) has evolved significantly. Initially, I, like probably almost everyone, started off simply using the ChatGPT chat interface. As my explorations expanded, I started experimenting with both ChatGPT and Claude, selecting the best tool for each specific task, or switching when hitting usage limits.
The next phase in my LLM journey involved transitioning to paid API-based usage. There are two primary options: the all-in-one, closed system, as offered by OpenAI, and more open systems that provide access to diverse models from various model makers.
OpenAI offers several alternatives beyond the simple chat interface. For instance, the paid ChatGPT Plus subscription grants access to the latest model and features like GPTs. Another useful method is accessing models through an API on a pre-paid, pay-per-use basis. Additionally, the OpenAI Playground provides a testing environment to experiment with different models, system instructions, and hyperparameters for conversations. This is an effective way to understand how your application would interact with the API, though it's worth noting that using the Playground consumes your prepaid credit (ha!). The key takeaway is that, depending on your use case, this is an easy way to experiment and learn about the significant potential that lies in steering model outputs with advanced methods like priming with system instructions and adjusting hyperparameters. This approach can also serve as an alternative way to access GPT-4 or GPT-4 Turbo without subscribing to Plus (though Bing would be a free alternative).
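For readers who want to see what the Playground is a front-end for, here is a minimal sketch of such an API call using the openai Python package. The model name, system instruction, and parameter values are illustrative assumptions, not recommendations, and you would need your own prepaid API key.

```python
# Minimal sketch of what the Playground exposes: a chat completion call
# with a system instruction ("priming") and a couple of hyperparameters.
# Requires the `openai` package and an OPENAI_API_KEY environment variable;
# the model name and parameter values are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # pick whichever model you have access to
    messages=[
        {"role": "system", "content": "You are a concise business analyst."},
        {"role": "user", "content": "List three risks of relying on a single LLM provider."},
    ],
    temperature=0.3,   # lower = more deterministic output
    max_tokens=300,    # cap the length (and cost) of the answer
)

print(response.choices[0].message.content)
```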
Personally, I've gravitated towards more open systems. During my exploration of applications, I've come to appreciate TypingMind, which offers three key benefits. First, it provides API-based access to a wide array of models, both proprietary and open source, without having to deal with the API from a technical perspective. Second, it offers a library for saving characters (essentially the system instructions for an AI persona) and prompts. Lastly, the interface allows for efficient model selection and persona utilization, along with the ability to choose prompts from the library or input new ones during interactions. This proves to be an invaluable interface for leveraging various LLMs in a chat mode while supporting efficient reusability of proven prompts.
On a side note, since OpenAI introduced GPTs last November, I regularly reconsider whether I should switch back to the OpenAI world and subscribe to ChatGPT Plus, primarily for the integration ecosystem it is becoming. Shifting towards a workflow-focused approach, as opposed to the current task-centric one, will likely be the next 'evolutionary' step in my GenAI journey. [^Cent]
Practical Advice for Leveraging LLMs
As we wrap up our exploration of the business implications and practical uses of large language models, I am convinced that now is the moment to adopt a pragmatic, hands-on approach to harnessing these high-potential tools.
To support your efforts in this (in case you have not yet spent the necessary hours of explorative learning), here is a bit of orientation that should not only serve as actionable advice but also demonstrate the practical use of LLMs in a real-world scenario, with explanations added in the 'Behind the Scenes' section below. And if you have spent the tens of hours necessary to gain this intuitive understanding, I would love to hear your thoughts or additions to this practical advice.
Practical Advice for Leveraging Large Language Models (LLMs)
By following these practical tips, you can collaborate effectively with LLMs and harness their power to generate valuable insights and solutions. Remember, a hands-on approach is essential for success in this rapidly evolving field.
Behind the Scenes of Writing this Advice
How I (with AI-support) generated this practical advice section:
For the Ghostwriter task I used Mistral's Medium model, with a total cost of $0.00662 for 1,062 input and 460 output tokens.
For the third and final article in my series, I'll be transitioning our focus from the practical applications we've discussed, to a broader, societal perspective on the implications of these developments.
Endnotes:
Note: Throughout the writing process, I have utilized LLMs to varying degrees, though any significant contributions are explicitly noted.
[^0]: I used GPT-4 Turbo to craft an image description through several iterations, using the summarized article (see [1]) as basis in the initial prompt. Also, I significantly increased the temperature setting compared to e.g., content editing tasks. Once satisfied with the prompt, I then instructed GPT-4 Turbo to generate the image, triggering the DALL-E 3 plugin responsible for image creation, with:
Generate a 16:9 landscape-oriented image depicting a chessboard with uniquely designed chess pieces of varying sizes to symbolize different AI models. The backdrop should feature a digital sunrise that signifies the rise of AI in knowledge work. In the foreground, a human hand should be fine-tuning a gear that seamlessly integrates into a translucent brain composed of network mesh, representing the strategic development and iterative process of enhancing large language models.
Total cost for the entire process, using a series of models (Mistral’s Medium for summarizing the extensive article, then GPT-4 Turbo to craft an essence-capturing visual description, and DALL-E 3 to actually generate the image), amounted to $0.12 and took me about 10 minutes for thinking and instructing.
[^1]: Article summarized by Mistral's Medium model; unedited.
[^Cent]: Written in a ‘Centaur’ mode: I provide my notes, then my AI ‘Ghostwriter’ persona drafts these rough notes into a coherent text, and finally I do the quality control and necessary edits myself.
[^2]: Google’s Gemini Pro (preview) with 131k context for $0.25 / $0.50 per 1M tokens would be in this tier 1 class, but I have not included it for two reasons: firstly, it is offered as a loss-leader for this preview phase and not for production use, and secondly, my informal non-systematic testing doesn’t have it perform at a tier 1 level, not even comparing favorably with some lower-tier models.
[^3]: I tried to provide a primer on this conceptual understanding of technology that I advocate in my first article of this series.
[^4]: Yes, I admit that this might be my decades of managing and leading teams in a professional setting that shines through ;-) - I firmly believe a team of strong contributors will achieve more than one single star.
[^5]: To get started, I would recommend exploring - beyond the obvious OpenAI ChatGPT - Microsoft's Bing for free access to GPT-4 (though with somewhat modified behavior), and Anthropic's Claude (if accessible) to evaluate its text generation capabilities. By experimenting with these tools, you not only discover the best fit for your needs but also develop a more intuitive understanding of each tool's unique features. This hands-on experience enables you to make informed decisions and leverage AI effectively in your content creation processes.