Almost Timely News: A Deep Dive into Fine Tuning Models (2023-08-27)
Content Authenticity Statement
100% of this newsletter was generated by me, the human. No AI was used to generate the content of this issue.
Watch This Newsletter On YouTube
What’s On My Mind: A Deep Dive into Fine Tuning Models
Today, let’s talk about tuning and customizing large language models in generative AI, since I’ve had quite a few calls this past week about this topic, about how to customize large language models for your business. We’ll discuss it in general and talk about your options.
Before we begin, let’s establish one key fact: there is no non-technical way presently to fine-tune a model. I’m sure there are plenty of vendors who will say they have a flexible, scalable, turnkey system that’s reassuringly expensive, but the reality is that the process from beginning to end is inherently technical in nature. The process of fine-tuning has gotten much easier in the last few years, but it’s by no means as easy as, say, editing a Spotify playlist.
Let me put it in cooking terms. First, what is fine-tuning a model? Fine-tuning is basically modifying a previously cooked dish to better suit your needs. Say you ordered a pepperoni pizza but you got a sausage pizza. You have to figure out a way to remove the sausage and add pepperoni. There is no way to do so that does not involve cooking in some capacity. Sure, some tasks like adding more spices don’t require a LOT of cooking, but you’re still cooking if you’re modifying a cooked dish. In fact, we’ll be using cooking analogies (huge surprise) throughout to explain the fine-tuning process.
There’s a reason why there’s no non-technical way to tune a model, and the reason is pretty simple: when you’re fine-tuning a model, you’re customizing it based on your data, and your data is inherently unique. There are all kinds of gotchas in your data that are not shared by other companies, and thus it’s very difficult to establish a one-size-fits-all or even one-size-fits-most process for fine-tuning.
Think about something like HubSpot. Maybe two companies each have a HubSpot instance. Each still has its own customizations: custom fields, custom this, that, the other thing. So there’s no one way to say, we’ll just take the standard HubSpot fields, and we’ll use that to train a model.
That’s not going to work out very well for you because of all those customizations; even the way you use certain data, like UTM tracking codes, is going to differ from company to company. So you can’t build one-size-fits-all, which means you can’t build a turnkey, non-technical way to do it.
Why would you want to fine-tune a model? The short answer is that you want a large language model that knows about YOU specifically - your data, your information. The use cases for such a model are fairly obvious - you want something that delivers results that are very specific to you. Asking ChatGPT about your company, depending on its size and public footprint, can be a very unsatisfying experience. Asking a tuned model about your company should deliver the results you want.
The applications of fine-tuned models are also fairly obvious. If you’re building a customer chatbot, for example, you would want it to discuss topics that your customers are specifically asking about. You would want that chatbot to have domain knowledge at a level of depth a public model might not have, or perhaps perspectives derived from your proprietary data that public models simply wouldn’t have.
The first thing we have to think through is the intended outcome, because that will determine the technical approach you take. The key question to ask is whether your large language model implementation needs perfect memory. Here’s what this means: there are use cases where you want the model itself to know all the information about a domain, where you want it to be an expert in that domain.
In a cooking analogy, you’d want the model to be able to generate pepperoni pizzas of every kind. At any given time, it should have full, complete knowledge of pepperoni pizza without the need to bring in any additional help. It’s the perfect pepperoni pizza baking machine. That’s the perfect memory example.
An example of a good use case for a perfect memory model is an accounting company. You would want that model to have perfect memory of every accounting regulation and what GAAP is and all these things without needing to rely on any outside data. It should just be an accounting whiz. You don’t care if it knows or doesn’t know Johnny Cash lyrics, right? You care that it knows every possible piece of accounting information inside it.
There are other use cases where the model just needs to be able to generate language intelligently, but connect to other data sources - essentially a language interpretation system. This is how Microsoft has done its implementation of GPT-4 with the Bing search engine; when you ask Bing questions through Bing Chat, it’s not asking the model for the knowledge. It’s asking the model to translate your conversation into formatted search queries, then it retrieves the results from the Bing engine and hands them back to the GPT-4 model to format as a conversational response.
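That Bing-style flow can be sketched as a simple pipeline. To be clear, this is a toy illustration of the pattern, not Microsoft’s actual implementation; every function here is a hypothetical stand-in for a real model or search service.

```python
# A toy sketch of a "language interpretation" pipeline: the model never
# answers from its own memory; it translates the conversation into a
# search query, then formats the retrieved results as a reply.
# All components here are hypothetical stand-ins for real services.

def llm_to_query(conversation: str) -> str:
    """Stand-in for asking a model to rewrite a conversation as a search query."""
    # A real system would call a language model here.
    return conversation.lower().replace("what is", "").strip(" ?")

def search_engine(query: str) -> list[str]:
    """Stand-in for a search index lookup (e.g., the Bing engine)."""
    fake_index = {
        "the trust insights website": ["TrustInsights.ai is the Trust Insights domain."],
    }
    return fake_index.get(query, ["No results found."])

def llm_format_answer(conversation: str, results: list[str]) -> str:
    """Stand-in for asking the model to turn raw results into a conversational reply."""
    return f"Here's what I found: {results[0]}"

def answer(conversation: str) -> str:
    query = llm_to_query(conversation)
    results = search_engine(query)
    return llm_format_answer(conversation, results)

print(answer("What is the Trust Insights website?"))
```

The important design point is that the model’s knowledge is never the source of truth; the search index is, which is why the data stays fresh without retraining.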
Why would you choose one over another? A perfect memory system is self-contained; you have the model and the interface to the model, and that’s it. It doesn’t need much infrastructure beyond that. This situation is good for answering questions that are conceptual in nature and for facts that are firmly fixed. Let’s say you’re an insurance company, and you train a foundation model on all the questions and answers that customers normally ask about your policies. That’s a great use case for a perfect memory model, because your policies probably don’t change from day to day.
A language interpretation system is useful for when you have a lot of data flowing into a system that’s rapidly changing. It needs a lot more infrastructure around it, but its data is fresh and the foundation model doesn’t need nearly as much training to succeed in its tasks. A good example of this would be a system that answered questions about stock prices, weather, or other fast-changing data.
There are advantages and disadvantages to each. Perfect memory models have higher compute costs up front, but lower compute costs in operation. However, they take longer to get up and running, and the information in them gets stale pretty quickly. Again, for stuff that doesn’t change often, that’s okay. Language interpretation systems have lower compute costs up front because you’re not changing much of the foundation model, but they have higher compute costs in the long run as they require more horsepower to connect and process data. They have bigger infrastructure footprints, too, and the operational cost of constantly bringing in new data.
So, once you have a general idea of what kind of model and system you’re going to need, the next step is to start laying out the system architecture. One of the biggest mistakes I see vendors make is not having any kind of abstraction layer in their software. What is an abstraction layer? It’s a layer of technology that you create so that the underlying model is insulated from the rest of your infrastructure. Why? Because language models are evolving so quickly that tying yourself to one specific model creates substantial risk, risk that the model you build directly on becomes outdated immediately.
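One way to build such an abstraction layer is to define a thin interface that the rest of your application talks to, with each model behind its own adapter. This is a minimal sketch of the idea; the class and method names are my own invention, not any vendor’s API.

```python
# A minimal model-abstraction layer: the application depends only on
# the LanguageModel interface, so swapping GPT-3 for GPT-4 (or any
# other model) means writing one new adapter, not rebuilding the product.
from abc import ABC, abstractmethod

class LanguageModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class GPT3Adapter(LanguageModel):
    def complete(self, prompt: str) -> str:
        # A real adapter would call the vendor's API here.
        return f"[gpt-3] {prompt}"

class GPT4Adapter(LanguageModel):
    def complete(self, prompt: str) -> str:
        return f"[gpt-4] {prompt}"

class Application:
    def __init__(self, model: LanguageModel):
        self.model = model  # the app never touches a vendor SDK directly

    def ask(self, question: str) -> str:
        return self.model.complete(question)

# Upgrading the model is a one-line change at the composition root:
app = Application(GPT4Adapter())
print(app.ask("Summarize our Q3 results."))
```

The vendors stuck on GPT-3 skipped exactly this layer: their business logic calls the model directly, so replacing the model means rewriting the product.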
I was at the MAICON conference about a month ago in Cleveland and asked some of the vendors about their backend architecture. Once the beer started flowing, people admitted, “Yeah, we built on GPT-3.” That’s a three-year-old model that is nowhere near best in class anymore for many of these tasks. But they had spent so much time and effort building directly on the model, instead of creating an abstraction layer, that they cannot simply pull out GPT-3 and drop in GPT-4. They can’t do it. As a result, they’re stuck. Their products are stuck. They have aged out really quickly and cannot keep up with more agile competitors.
After you’ve figured out the system architecture, you now have to tackle what is the most difficult, time-consuming, challenging, and arduous part of fine-tuning a language model: your data. You see, you can’t just gather up a pile of random documents and put them into a model any more than you can just take big piles of random ingredients, drop them into a stand mixer, and hope you end up with pizza dough. That’s literally a recipe for failure.
The same is true for large language model tuning. With perfect memory systems, you have to build your datasets in a compatible fine-tuning format (there are a number of different standards based on the model you use). Here’s an example of what that sort of data tends to look like:
Prompt: What are the names of the Trust Insights founders?
Response: Katie Robbert and Christopher Penn

Prompt: What year was Trust Insights founded?
Response: 2017

Prompt: What is the Trust Insights website domain?
Response: TrustInsights.ai
You can see that it’s basically questions and answers, at least for a basic training set for a chat-style model. Now, consider how much data you have that you’d want to train a model on, and the effort it will take to create the necessary training data, and you start to understand why this is such a herculean task, why it takes so long to build a fine-tuning dataset.
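As a concrete illustration, many fine-tuning pipelines expect pairs like these in a JSON Lines file, one prompt/response object per line. The exact keys vary by model and vendor - the “prompt”/“response” schema below is one common convention, not a universal standard, so check your model’s documentation.

```python
# Convert question/answer pairs into a JSON Lines fine-tuning file.
# The "prompt"/"response" keys are one common convention; the schema
# your model expects may differ.
import json

pairs = [
    ("What are the names of the Trust Insights founders?",
     "Katie Robbert and Christopher Penn"),
    ("What year was Trust Insights founded?", "2017"),
    ("What is the Trust Insights website domain?", "TrustInsights.ai"),
]

def to_jsonl(qa_pairs) -> str:
    """One JSON object per line, the typical shape of a fine-tuning dataset."""
    return "\n".join(json.dumps({"prompt": q, "response": a}) for q, a in qa_pairs)

print(to_jsonl(pairs))
```

The code is trivial; the work is in producing thousands of high-quality pairs to feed it, which is why building the dataset is the herculean part.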
If you’re using a language interpretation system, then you need to take the same training data and format it for the underlying database that powers language interpretation systems. These specialized databases, known as vector databases, have their own data formats which necessitate converting your training data.
Finally, we can start to talk about the fine tuning process. There are a variety of ways to implement the fine-tuning system. A full tune is where you take your data and re-weight the entire model with it. Think of this like ordering a pizza and it’s the wrong flavor, has the wrong toppings. You’d go back into the kitchen with the right ingredients and essentially make a new pizza from scratch. This is the old-fashioned process that isn’t used much these days for model tuning (though it is for doing things like model merges, which is a topic for another time).
There are advanced fine-tuning methods like low-rank adaptation, or LoRA, which add a layer of new model weights on top of a foundation model. Think of LoRA like ordering a pizza, and it’s got the wrong toppings. Instead of sending the pizza back, you get out a fork and you scrape off the cheese and toppings, then put the toppings you want on the pizza, some replacement cheese, and you pop it in the oven for a couple of minutes. That’s effectively what LoRA does - it lets you replace some of the data in a model with the weights of your choice.
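The arithmetic behind LoRA’s efficiency is easy to check: instead of updating a full d×d weight matrix, you train two thin matrices of shapes d×r and r×d, where the rank r is tiny compared to d. Here’s a back-of-the-envelope sketch; the dimensions are illustrative, not taken from any particular model.

```python
# Parameter counts for a full update vs. a LoRA update of one weight
# matrix. Dimensions are illustrative; real models contain many such
# matrices, and the savings apply to each one.
d = 4096   # hidden dimension, typical of a transformer layer
r = 8      # LoRA rank, much smaller than d

full_update = d * d          # retraining the entire matrix
lora_update = d * r + r * d  # two low-rank matrices, B (d x r) and A (r x d)

print(full_update)                # 16777216 trainable parameters
print(lora_update)                # 65536 trainable parameters
print(full_update / lora_update)  # the full update is 256x larger
```

That factor-of-256 reduction per matrix is why LoRA tuning fits on a single GPU where a full tune would not.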
Finally, for the language interpretation system, you’ll need to install a specialized vector database like Weaviate, ChromaDB, or Pinecone, then convert your data into the database’s embeddings format. Once you’ve done that, you connect to your database through a utility system like LangChain, and you can begin to converse with your data as it streams into the database.
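Under the hood, a vector database retrieves documents by comparing embedding vectors, typically with cosine similarity. Here’s a dependency-free toy sketch of that retrieval step; real systems use a trained embedding model and an approximate nearest-neighbor index rather than the word-count vectors used here, and the function names are my own.

```python
# A toy vector search: embed documents as word-count vectors, then
# retrieve the document whose vector is most similar (cosine) to the
# query. Real vector databases (Weaviate, ChromaDB, Pinecone) use
# learned embeddings and specialized indexes instead.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query: str, documents: list[str]) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(documents, key=lambda doc: cosine(q, embed(doc)))

docs = [
    "Trust Insights was founded in 2017",
    "The pizza oven needs to preheat for an hour",
]
print(search("when was trust insights founded", docs))
```

The retrieved document is then handed to the language model as context, which is what lets you “converse with your data” without retraining the model.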
As I said at the beginning of this note, there’s no non-technical way to do this. Every approach requires some level of technical skill, along with a fair amount of infrastructure. Despite all the technobabble about implementation, the hardest part really is gathering and formatting the data you want to use to fine-tune a model, because most of the time, the data in our organizations is a hot mess. Without the necessary ingredients, the technical parts don’t matter.
Got a Question? Hit Reply
I do actually read the replies.
Share With a Friend or Colleague
If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:
ICYMI: In Case You Missed it
Besides the newly-refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I recommend the bakeoff we did with five generative AI large language models this week - Claude 2, ChatGPT with GPT-4, Microsoft Bing, Google Bard, and LM Studio with the MythoMax L2 model.
Skill Up With Classes
These are just a few of the classes I have available over at the Trust Insights website that you can take.
Premium
Free
Advertisement: Bring My AI Talk To Your Company
I’ve been lecturing a lot on large language models and generative AI (think ChatGPT) lately, and inevitably, there’s far more material than time permits at a regular conference keynote. There’s a lot more value to be unlocked - and that value can be unlocked by bringing me in to speak at your company. In a customized version of my AI keynote talk, delivered either in-person or virtually, we’ll cover all the high points of the talk, but specific to your industry, and critically, offer a ton of time to answer your specific questions that you might not feel comfortable asking in a public forum.
Here’s what one participant said after a working session at one of the world’s biggest consulting firms:
“No kidding, this was the best hour of learning or knowledge-sharing I’ve had in my years at the Firm. Chris’ expertise and context-setting was super-thought provoking and perfectly delivered. I was side-slacking teammates throughout the session to share insights and ideas. Very energizing and highly practical! Thanks so much for putting it together!”
Pricing begins at US$7,500 and will vary significantly based on whether it’s in person or not, and how much time you need to get the most value from the experience.
Get Back to Work
Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.
What I’m Reading: Your Stuff
Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.
Social Media Marketing
Media and Content
SEO, Google, and Paid Media
Advertisement: Business Cameos
If you’re familiar with the Cameo system - where people hire well-known folks for short video clips - then you’ll totally get Thinkers One. Created by my friend Mitch Joel, Thinkers One lets you connect with the biggest thinkers for short videos on topics you care about. I’ve got a whole slew of Thinkers One Cameo-style topics for video clips you can use at internal company meetings, events, or even just for yourself. Want me to tell your boss that you need to be paying attention to generative AI right now?
Tools, Machine Learning, and AI
Analytics, Stats, and Data Science
All Things IBM
Dealer’s Choice : Random Stuff
How to Stay in Touch
Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:
Advertisement: Ukraine 🇺🇦 Humanitarian Fund
The war to free Ukraine continues. If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.
Events I’ll Be At
Here’s where I’m speaking and attending. Say hi if you’re at an event also:
Events marked with a physical location may become virtual if conditions and safety warrant it.
If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.
Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.
Required Disclosures
Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.
Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.
My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.
Thank You
Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.
See you next week,
Christopher S. Penn
Chief Strategy Officer (CSO) / SVP of AI