We need to talk about LLM fine-tuning

Why solving the right problem really matters

In the pursuit of self-driving car technology, we have often framed the problem too narrowly, focusing on the technical challenges of building software that can autonomously navigate a vehicle through crowded streets. However, as we reflect on the slow progress over the past 20 years, it becomes clear that a broader perspective is necessary. The true underlying need is not to build autonomous driving software; it is to develop cost-effective mechanisms that can safely and reliably transport more people and goods between places, freeing humans from the grunt work of operating those transport systems.

When we reframe the problem in this way, it becomes apparent that we already have a fantastic starting point: trains. Trains have been transporting people and goods autonomously for decades, utilizing well-developed technologies and infrastructure. They offer a level of automation, cost-effectiveness, and safety that surpasses what has been achieved with research into self-driving cars thus far.

By raising our focus to the fundamental need of reducing human involvement in the complexities of operating transport machines, rather than solely on the technical challenges of autonomous vehicles, new and potentially more promising and cost-effective solutions begin to emerge in the minds of product designers.

Moreover, this approach encourages us to think beyond the narrow confines of individual vehicle automation and explore systemic solutions. For example, a comprehensive public transportation network that seamlessly connects trains, buses, and other modes could provide a more holistic and effective solution.

The lesson here is that the way we frame a problem can significantly impact the solutions that emerge. By focusing on the high-level outcome rather than the technical challenges, we can unlock new possibilities and avoid becoming trapped in a narrow, technology-centric mindset.

In the fast-moving world of generative AI, I’ve witnessed a growing fascination with the process of fine-tuning large language models. This technique, which involves adapting a pre-trained model to a specific task or domain, has captivated the imaginations of many AI practitioners. However, there is a risk that this focus on how to perform fine-tuning may be distracting us from the more fundamental challenge of developing AI systems that truly specialize in addressing real-world problems.

Fine-tuning, in its essence, is akin to the classic software customization process. Just as software developers would tweak and modify code to tailor an application to a client's needs, the fine-tuning of language models is an attempt to mold a general-purpose AI system to a specific use case. It's a way to leverage the vast knowledge and capabilities inherent in these large models and refine them for a particular task.

The appeal of fine-tuning is understandable. It offers a paradigm for creating specialized AI applications by leveraging the foundations laid by the original model developers, much like software customization delivers tailored solutions to end-users. After all, no two companies are identical, and in enterprise software there's no one-size-fits-all.

However, this focus on the mechanics of fine-tuning is analogous to the focus on the mechanics of a self-driving car. Precious human bandwidth gets sunk into incredibly difficult technical challenges that multiply as they overlap. Meanwhile, our attention drifts away from the desired outcomes for our users.

Specialization goes beyond mere fine-tuning; it involves a deeper understanding of the problem domain and the incorporation of domain-specific knowledge and jargon. This mode of specialization is crucial for delivering truly robust, reliable, and effective AI solutions. I hear this regularly from prospects: "Yeah, ChatGPT is clever, but it doesn't know how my business works!"

In the world of generative AI, the emphasis on fine-tuning large language models may have inadvertently diverted attention from the more general need for AI systems that exhibit specialized knowledge. When an LLM is fine-tuned too extensively or with insufficient care, it can become hyper-specialized, losing its ability to generalize and adapt to new, unforeseen situations. This overfitting can lead to a model that performs exceptionally well on the specific task it was trained for but quickly breaks down when faced with even minor deviations.

Imagine a fine-tuned LLM designed to generate highly specialized management reports. While it may produce impressive results within the confines of the training data, the model may struggle to adapt when presented with new data formats, terminologies, or contextual information. This lack of robustness can lead to catastrophic failures, where the model generates nonsensical or even harmful output, undermining the very purpose it was designed to serve.

Building software that creates management reports is not difficult. We already know how to extract data from databases (SQL) and process it (modeling, scripting, arithmetic/statistics). Just as the best approach to moving people autonomously is not to fixate on the mechanics of self-driving cars, the best approach to democratizing data analytics is not to fixate on the mechanics of fine-tuning. The real problem is that we don't have enough humans who know SQL, advanced statistics and Python syntax! We should remember this genuine need and integrate tried and trusted techniques in smart ways to make the most of the incredible usability innovations now available with generative AI.
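To underline how well-trodden this ground is, here is a minimal sketch of that "hard" part of a management report, using a hypothetical in-memory sales table (all names and figures are illustrative, not from any real system):

```python
import sqlite3
import statistics

# Hypothetical example: a tiny in-memory sales table standing in for
# an enterprise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 200.0), ("APAC", 150.0)],
)

# Extracting the data is a solved problem: SQL.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Processing it is a solved problem too: ordinary statistics, no model required.
totals = {region: total for region, total in rows}
print(totals)                            # {'APAC': 350.0, 'EMEA': 200.0}
print(statistics.mean(totals.values()))  # 275.0
```

The scarce resource is not this code; it is the people comfortable writing it.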

In the case of generative AI applied to analytical workloads, the innovative part isn't the production of the actual analysis. We don't need to reinvent statistics or databases! The benefit is enabling non-technologists and non-data scientists to do that work by having the AI do the grunt work: finding the right data in your systems, writing the Python to process it, and interpreting the results in the context of what was asked. These are fundamentally UX enhancements, not raw analytics enhancements.
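One way to picture this division of labor is the sketch below. The model call is a hard-coded stub (the function name, table and data are all hypothetical); the point is that the AI proposes the query while trusted, deterministic engines do the actual computation:

```python
import sqlite3

def llm_translate_to_sql(question: str) -> str:
    """Stand-in for an LLM call. In a real assistant, this is where the
    model does the UX grunt work: finding the right table and writing
    the query. Hard-coded here purely for illustration."""
    return "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Hypothetical in-memory database standing in for enterprise systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100.0), ("APAC", 250.0)],
)

# The model proposes the query; the database, not the model, does the math.
sql = llm_translate_to_sql("What were total sales by region?")
for region, total in conn.execute(sql):
    print(region, total)
```

Because the arithmetic never passes through the model, the numbers are exactly as trustworthy as the database that produced them.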

Nobody ever said that large language models are good at calculating statistics reliably or at reporting on your company's data. But we already know how to do these things! AI assistants can automate these tasks now.

The scale of the training content and costs invested in building the latest generation of foundation language models is jaw-dropping, and the rate of new releases is dizzying. Even if fine-tuning your own private LLM with your structured or unstructured data were a sensible thing to do (which I believe it almost never will be), the idea that companies will be able to keep pace with the foundation model providers is fanciful. Even the mighty Bloomberg, which built BloombergGPT on the open-source BLOOM architecture after investing 1.3 million hours of Nvidia A100 GPU time and over $10M in specialization costs, has not released a new version between April 2023 and the arrival of GPT-4 and Claude 3. Meanwhile, researchers at Queen's University in Canada studied the performance of GPT-4 in September 2023 against domain-specific fine-tuned models in finance (BloombergGPT, FinBERT and FinQANet). They concluded that GPT-4 outperformed all three models on stock market analysis, financial news and investment strategy tasks.

Forget about fine-tuning. The best approach to getting AI to help humans do more and better data analytics work is to provide mechanisms for experts to apply their knowledge, to maintain existing enterprise data governance policies so we don't lose control of our data to AI, and to make the process and results explainable. This means a fundamental rethink of the whole user experience to leverage the staggering advances in human-computer interaction that generative AI provides.





