Charting GenAI's Course with a Tech-First, Multidisciplinary Approach
Image description by Claude 2.1 based on article text; image generated with midjourneyai.ai [^0]


TL;DR

The article chronicles the author's learning journey with generative AI over the year since ChatGPT's viral launch and emphasizes the need to approach GenAI as a multifaceted phenomenon through the three lenses of Technology, Business, and Society. It provides background on the decades-long development of AI, explaining how large language models represent a new "fourth wave", and outlines the differences between concepts like AI, machine learning, LLMs, and ChatGPT. It details the factors driving rapid advancements - algorithms, data, and compute power - and emphasizes the need to conceptually grasp the technology when identifying business opportunities. Key technical concepts like models, prompts, and hyperparameters are introduced.[^1]

The next article in this series will cover the Business perspective.


Buckle up—the pace of change shows no signs of slowing after the launch of ChatGPT in late 2022 marked an inflection point for public understanding of artificial intelligence's potential. ChatGPT achieved viral user growth, becoming the fastest consumer app to reach 100M users, in just two months. And a year in, it is still going strong with some 100M weekly active users.

I was not one of the first (few million) to get on board with ChatGPT; it was around year-end 2022 that this ‘viral thing’ caught my attention in earnest. Viewing the initial hype through Amara’s Law—whereby people tend to overestimate short-term effects and underestimate long-term impacts—I approached this new technology with cautious optimism. The understanding I had gained from dabbling in the business application of Data Analytics and Machine Learning (or what some now call “classic AI”) in 2018/2019 [^2] equipped me to recognize that ChatGPT was more than just a fun tool or a tech fad. I perceived ChatGPT as potentially transformative—a technology ripe with value-creation potential—prompting me to delve into it as an extended learning journey.

Why am I writing this article and sharing insights from this first-year journey with generative AI? I am convinced that articulating what you have learned—not merely accumulating information—helps further solidify understanding, which is then refined through your reactions and discussions.

I have always considered myself a generalist, with a multi-disciplinary approach to new topics and to connecting the dots. With this ‘bias’, I strongly advocate approaching GenAI as a multifaceted phenomenon and with a willingness for ongoing learning, especially for a topic that is evolving as rapidly as this one. To understand GenAI and its implications, I like to apply the three lenses of Technology, Business, and Society, or - to use Design Thinking terminology - Feasibility, Viability, and Desirability respectively.

Philipp Masefield (Nov. 2023)

The exploration of these three lenses is an interdependent and iterative process, yet there's a logical sequence to navigate:

  1. Technology: Conceptually grasping its essence and potential allows us to infer implications for...
  2. Business opportunities: Identifying how the technology's strengths can be leveraged in practical experimentation and use cases, while also weighing potential weaknesses and risks, leads to...
  3. Society: Evaluating the technology's societal impact fosters an informed dialogue about its desirability, acceptability, and required safeguards.

I am also structuring my writing about my learning journey according to these lenses, starting with Technology for the rest of this article.


Technology

With the intention of creating business value, which generally is my point of view, I see the need for a conceptual understanding of the technology rather than in-depth technical expertise. This is necessary to identify the opportunities that the technological advances enable, to understand the challenges, to be realistic about the limitations, and — in a technology as rapidly evolving as GenAI — to have an appreciation of the advances that might bring new opportunities and shift or overcome current limitations. Or, to put it another way: "While not everyone needs to know the technical details, they should understand what the technology does and what it can and cannot do".


What Is GenAI?

Defining Large Language Models

NVIDIA, clearly one of the big winners in the AI race so far, defines Large Language Models (LLMs) as “deep learning algorithms that can recognize, summarize, translate, predict, and generate content using very large datasets”. Gartner’s glossary explains LLMs as “a specialized type of artificial intelligence (AI) that has been trained on vast amounts of text to understand existing content and generate original content”. Both are maybe a bit too short to be informative — AI might do better: asking Perplexity.ai (with this prompt) for a definition of LLMs yields a more illuminating answer:

“Large Language Models (LLMs) are advanced artificial intelligence systems that have been trained on extensive text data to understand and generate human-like text. These models, which can have billions of parameters, are capable of remarkable generalization abilities, allowing them to provide plausible responses to a wide array of prompts, even in a zero-shot context where they haven't been specifically trained for the task at hand. They can be used in various applications, from answering questions and writing essays to aiding in planning problems. However, despite their impressive capabilities, LLMs also have limitations, such as the potential for generating inaccurate or misleading information, also known as "hallucination". Recent research has focused on improving these models, including methods for editing their behavior within specific domains, and techniques for controlling their outputs”

GenAI, LLM, GPT - Not All The Same Thing

AI, GenAI, LLM, ChatGPT, and GPT-4 are not all the same thing. Sometimes it just helps (me, at least) to sketch things out:

Artificial Intelligence (AI) is the broadest category, even covering almost forgotten approaches such as expert systems. MIT’s Professor Malone provides an intuitive definition of AI: “machines acting in ways that seem intelligent”.

Machine Learning (ML) refers to computer programs that learn from experience rather than relying solely on explicit programming to produce solutions. The main types of machine learning are Supervised Learning (which uses labeled data to map inputs to outputs), Unsupervised Learning (which identifies patterns for uses like recommendation systems or customer segmentation), and Reinforcement Learning (which relies on reward feedback).

Deep Learning is based on neural network architectures that loosely imitate brain structures, with ‘deep’ denoting the many layers within these networks.

Generative AI (GenAI), enabled by Foundation Models that allow broad application beyond narrow machine learning tasks, encompasses multi-modal generation of new content, whether sound, images, or language. Well-known image-generation products include Stable Diffusion, Midjourney, and DALL-E.

Large Language Models (LLMs) are a type of Generative AI specifically for Natural Language Processing tasks. The dominant architecture is the Transformer, introduced in Google’s groundbreaking "Attention is All You Need" paper. And then of course there is ChatGPT (the product), which was the viral awakening of us all to the world of GenAI. ChatGPT is based on the GPT-3.5 model series, or on GPT-4 with the Plus subscription. Beyond OpenAI's GPT-x models, there are other proprietary LLMs like Anthropic's Claude (currently at Claude 2.1) or Google's Gemini series. There are also open source options such as Meta's LLaMA or Mistral's Mixtral - just to name two of the most prominent models.


Throughout this article series, I will (try to) be deliberate in my terminology—generative AI, large language models, and ChatGPT are not interchangeable concepts. As a disclaimer, over this past year my focus has been almost exclusively on LLMs, as I see the most potential and relevance in that specific area of GenAI both for my work and interests.


Generative AI Is Not An Overnight Success

Generative AI has recently captivated far more than just the tech community, surpassing many expectations with its rapid advancements. Yet it is important to understand that it represents not a sudden leap but the culmination of decades of development. The current capabilities of AI might even be seen as a somewhat predictable development, in particular considering the roughly 10x year-on-year increase in available compute over nearly a decade.

The field of Artificial Intelligence really started in the 1950s, with the introduction of the Turing Test, the coining of the term ‘artificial intelligence’, and the first (very primitive) machine learning from data with artificial neural networks. An MIT Sloan course on Artificial Intelligence delineates the subsequent development of AI in waves:

MIT’s late Professor Winston talked about a fourth wave that would go beyond the impressive machine learning advances of perception and recognition, and be about systems that would be more like us, with cognition and reasoning. Large Language Models (LLMs) could be seen as this fourth wave of AI.

Mustafa Suleyman outlines and explains this development in his book The Coming Wave: In the mid-2010s, AI's leap forward was fueled by supervised deep learning, which relies on models learning from labeled data, so the accuracy of AI predictions is often contingent on the quality of these labels. Large language models, however, represent a paradigm shift by successfully training on unstructured, real-world text, rendering the vast corpus of internet text a valuable resource.

The 2017 Google research paper "Attention is All You Need" proposed the Transformer network architecture with attention mechanisms that laid the foundation for the revolution in LLMs. Since then, Transformers have been the driving force behind the rapid advancements. The acronym "GPT" stands for "Generative Pre-Trained Transformer", and OpenAI has been at the forefront of this development, setting key milestones with its releases:

  • GPT-1 emerged in mid-2018 as the inaugural Generative Pre-Trained Transformer model, boasting 117 million parameters. It demonstrated the potential of unsupervised learning to tackle complex language understanding tasks.
  • GPT-2 followed in early 2019 with a massively larger 1.5 billion parameter model, significantly enhancing its text generation prowess.
  • GPT-3, released in mid-2020, was another leap, with 175 billion parameters enabling far more advanced text generation. Suleyman reflects on the release of GPT-3 as a pivotal moment when “people started to truly grasp the magnitude of what was happening”. The subsequent public release of ChatGPT (based on GPT-3.5) in November 2022 brought LLMs into mainstream awareness as ChatGPT achieved viral status.
  • The launch of GPT-4 in March 2023 showed yet another impressive advance in capabilities; it is rumored to have 1.7 trillion parameters, though with an architectural change of using a combination of models (a “Mixture of Experts”) rather than one single “dense” model.

Within a few years, there has been an explosion in the capabilities of Large Language Models, which is remarkable considering that, as Suleyman notes, “it wasn’t long ago that processing natural language seemed too complex, too varied, too nuanced for modern AI.”


How Do LLMs Work?

There are countless explanations of how Large Language Models (LLMs) work, from the overly succinct “next word prediction” to highly technical and lengthy in-depth explanations. Here is an attempt to provide a useful summary explanation [^Cent],[^3]:

LLMs are a class of deep learning models that have been trained on vast datasets of textual data, allowing them to develop a statistical understanding of language.

At their core, LLMs convert text into numeric representations that capture semantic meaning. Text is split into "tokens", common groups of characters that frequently appear together, such as words, word fragments, or punctuation. Each token is then assigned a vector representation, with tokens of similar meaning placed close together in a high-dimensional "word embedding" space. This allows LLMs to discern linguistic relationships and nuances.
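To make this concrete, here is a minimal tokenization sketch using OpenAI's open-source tiktoken library (an assumption on my part that it is installed; other model families use different tokenizers, but the principle is the same):

```python
# Minimal tokenization sketch, assuming the tiktoken package is installed
# (pip install tiktoken). Other model families use different tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-class models
text = "Large language models convert text into tokens."

token_ids = enc.encode(text)                           # integer IDs the model actually sees
token_strings = [enc.decode([t]) for t in token_ids]   # each ID mapped back to its text chunk

print(token_ids)       # a list of integers
print(token_strings)   # word and sub-word chunks, often with leading spaces
```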

LLMs build on the transformer architecture, originally composed of encoder and decoder components (many modern LLMs, including the GPT family, use decoder-only variants). Input text is first mapped into the model's word vector space. Layers within the model then update these representations by exchanging contextually relevant information between tokens using an attention mechanism. This allows the model to construct meaning from the entire context rather than processing inputs strictly sequentially.
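As an illustration of the attention idea, here is a toy numpy sketch under simplifying assumptions (a single head, no learned projection matrices, no masking), not a faithful reproduction of any production model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (num_tokens, dim) matrices, one row per token.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # relevance of every token to every other token
    weights = softmax(scores, axis=-1)   # attention weights per token sum to 1
    return weights @ V                   # each token becomes a weighted mix of the others

# Toy self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
updated = scaled_dot_product_attention(x, x, x)
print(updated.shape)  # (4, 8): same shape, but now context-aware representations
```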

After processing the input, LLMs are able to generate new text autoregressively, predicting the most likely next token at each step based on the previous tokens. With sufficient data and compute power, this statistical language modeling allows LLMs to reach high levels of coherence and even display emergent abilities like reasoning and summarization.
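A toy sketch of that autoregressive loop follows; the "model" here is a made-up stand-in returning random scores, purely to show the mechanics of next-token sampling:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["the", "cat", "sat", "on", "mat", "."]  # tiny illustrative vocabulary

def fake_next_token_logits(context):
    # Stand-in for a real model: a real LLM would compute these scores
    # from the whole context using its transformer layers.
    return rng.normal(size=len(vocab))

def sample_next_token(logits, temperature=0.8):
    probs = np.exp(logits / temperature)  # lower temperature sharpens the distribution
    probs /= probs.sum()
    return rng.choice(len(vocab), p=probs)

context = ["the"]
for _ in range(6):                        # generate six more tokens, one at a time
    logits = fake_next_token_logits(context)
    context.append(vocab[sample_next_token(logits)])
print(" ".join(context))
```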

During training, LLMs are fed vast datasets of text and tasked with predicting withheld words (for GPT-style models, the next token in a sequence), learning associations between related concepts that may be distant within passages. Models are then fine-tuned on specialized datasets to optimize performance on specific tasks like translation or question answering.
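The training signal behind this is conceptually simple; here is a minimal sketch of the next-token loss (my own illustrative numpy code, not taken from any particular framework):

```python
import numpy as np

def next_token_loss(logits, target_id):
    # Cross-entropy: low when the model assigns high probability to the true next token.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[target_id])

vocab = {"the": 0, "cat": 1, "sat": 2}
logits = np.array([0.5, 2.0, 0.1])            # model strongly favors "cat" as the next token
print(next_token_loss(logits, vocab["cat"]))  # small loss: good prediction
print(next_token_loss(logits, vocab["sat"]))  # larger loss: training would adjust weights to fix this
```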


Ongoing Advances

As I have pointed out in a previous post, the rapidly improving performance of LLMs has been and continues to be driven by a combination of three factors:

Algorithm or model sophistication is driven by talent, primarily in industry labs. There is also a counterintuitive reality around algorithms, as an MIT AI course reveals: as algorithms grow in size and complexity, their abilities expand. Deep learning algorithms, wielding tens of millions of parameters, defy expectations by becoming more proficient learners as they grow more complex. These sophisticated architectures also enable a standardized way of processing varied data types, by transforming inputs (whether words, images, or other formats) into vector representations, which simplifies moving information across forms and highlights the increasing versatility of machine learning models.[^Cent]

Data, which means harvesting vast amounts of internet data, enriched by specialized datasets. A development that has led to this abundance of data is, as Mustafa Suleyman points out in The Coming Wave, that “software has eaten the world”, meaning there is now data on almost anything, which can serve to train and improve AI systems.

Compute, requiring access to the most advanced computational resources and the deep pockets to finance them, motivated by the promise of outsized economic returns. An important consideration here is that access to compute can become a limiting factor for ongoing innovation. This concern was raised some time ago, for example that even an “exceptionally endowed university like Stanford can’t afford” the access needed to make significant contributions to the research agenda.

OpenAI's 2020 paper reveals a power-law relationship between language model performance and scale, with test loss improving predictably as model size, dataset size, and compute resources increase. This trend held true over seven orders of magnitude, suggesting that ‘the larger, the better’. And considering the rapid AI advancements demonstrated over the past year, we seem to be in the steep part of the S-curve, with "a couple to a few more years of the exponential phase left to run". What further technological breakthroughs might we anticipate at this rate?
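For the mathematically inclined, the paper's headline result can be summarized in this approximate form (the notation is mine, and the exact fitted constants depend on the experimental setup, so treat this as a sketch rather than the definitive statement):

```latex
% L = test loss, N = number of parameters, D = dataset size in tokens, C = training compute.
% Each factor, when not bottlenecked by the others, follows a power law with a small exponent
% (the paper's fits put the exponents roughly in the 0.05-0.1 range).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```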

Even if AI advancements stagnate at the current state of the art, Ethan Mollick asserts that "there is a lot of juice left in GPT-4". This implies significant untapped business value even from today's available capabilities.


Key Concepts To Understand

There are a few technical concepts or terms that need to be understood to work effectively with LLMs as a business person. Some of the terms I’ve encountered in my ongoing experimentation, and which make a difference for certain use cases, are the following (illustrated in the code sketch after this list):

  • Models, as mentioned in the previous section, are different ‘brains’ you can use, some potentially smarter or better at specific tasks than others. Beyond OpenAI’s offerings (GPT-3.5, GPT-4, GPT-4 Turbo, etc.), there are options such as Anthropic’s Claude, Google’s Gemini, and numerous open-source alternatives. There are a few things to consider here, especially from a business perspective (more on that later).
  • Prompts are requests given to an LLM and serve as the primary mode of interaction. When crafting prompts, it can be beneficial to distinguish between general user prompts and initial system instructions, which serve to prime the model.
  • Hyperparameters, of which “temperature” is arguably the most significant, act as “a creativity or randomness input slider". A low temperature results in a more predictable output, while a high temperature fosters creativity or randomness. OpenAI and other providers expose additional parameters such as "top p", "frequency penalty", and "presence penalty". In essence, these hyperparameters shape the model's text generation behavior.
  • Tokens are a measure of text quantity, where approximately 100 tokens represent 75 words. Tokens determine the price paid for inference. It’s worth noting that various model providers price input and output tokens differently. Input tokens represent data fed into the model, while output tokens are the model-generated output.
  • Context window, finally, is like the memory or ‘mental capacity’ of a model. A standard 4k context window allows roughly 4,000 tokens to be held in memory, covering the entire chat interaction: the initial input, the model’s response, and any further interactions. Working with extensive text over multiple interactions with the LLM may lead to the model “forgetting” earlier queries or deviating from the topic, indicating an exceeded context window.
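To tie these concepts together, here is a minimal sketch of a single LLM call using the OpenAI Python client and tiktoken; the model name, parameter values, and prompts are assumptions chosen for illustration, and other providers expose very similar options:

```python
# A minimal sketch using the OpenAI Python client (openai >= 1.0) and tiktoken.
# Model names, prices, and defaults change over time; treat specifics as illustrative.
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

system_prompt = "You are a concise assistant for insurance professionals."  # system instructions prime the model
user_prompt = "Explain what a context window is, in two sentences."

response = client.chat.completions.create(
    model="gpt-4-turbo",    # the 'brain' you pick; other providers expose similar choices
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.2,        # low = predictable, high = more creative/random
    top_p=1.0,              # an alternative way of constraining the sampling
    max_tokens=200,         # cap on output tokens (input and output tokens are priced separately)
)
print(response.choices[0].message.content)

# Rough token accounting: the context window must hold the prompts,
# prior turns, and the model's responses combined.
enc = tiktoken.get_encoding("cl100k_base")
prompt_tokens = len(enc.encode(system_prompt)) + len(enc.encode(user_prompt))
print(f"~{prompt_tokens} prompt tokens sent to the model")
```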


In my second article, I'll explore the business perspective, sharing my views on market dynamics, the impact of GenAI on knowledge work, and the significance of practical, hands-on experimentation with use cases to create value - building on the conceptual understanding covered in the current article. My third article will delve into societal considerations, shedding light on some broader implications.



Endnotes:

Throughout the writing process, I have utilized LLMs to varying degrees, though any significant contributions are explicitly noted.

[^0]: Image description suggested by Anthropic’s Claude 2.1 based on the article’s text: "A brain with lightbulbs flashing above it, symbolizing the statistical understanding of language that enables large language models to generate human-like text. Technical concepts like neural networks and computational power are subtly indicated around the brain to capture the technological essence."

[^1]: Article summarized by Anthropic’s Claude 2.1, integrating an additional aspect with Mistral’s Mixtral 8x7B, and with some final edits by me.

[^2]: While taking a course or two in Data Science, my curiosity shifted towards Machine Learning, particularly after completing @Andrew Ng's "AI for Everyone" course and reading Machine Learning Yearning. With this newly gained understanding, I became intrigued by the potential of leveraging advanced analytics and Machine Learning to address a significant business challenge. My hypothesis was that the migration of a legacy Life insurance book could be reframed as a data problem, solvable more efficiently through cutting-edge analytics and Machine Learning, as opposed to the traditional, hard-coded big-bang ETL approach. And yes, I managed to substantiate this hypothesis with a consulting partner through a successful Proof of Concept in 2020 (and just recently this massive migration has been completed).

[^3]: Based on my notes from the following sources:

[^Cent]: Written in a ‘Centaur’ mode: I provide my notes, then my AI ‘Ghostwriter’ persona drafts these rough notes into a coherent text, and finally I do the quality control and minor edits myself. (More on this in my next article.)
