Alpaca's Game-Changer: Democratizing AI, Unleashing Innovation, and Redefining the Tech Landscape
The week surrounding Pi Day 2023 will be remembered as the pivotal moment when AI took flight, igniting a new era of groundbreaking advancements and transformative applications that changed the way we live, work, and interact with technology. Hopefully everyone's heard about GPT-4's multimodal capabilities, Google's generative AI integration in Workspace, Microsoft's AI-driven enhancements to Microsoft 365, Anthropic's eagerly awaited Claude, and Midjourney's release of version 5 of its text-to-image model. But what likely flew under the radar was the most exciting and game-changing announcement of all: Stanford's Alpaca.
Alpaca is a powerful, instruction-following language model, similar to GPT-3.5 (the model behind ChatGPT), that can be run on consumer devices (think laptops or your phone). But that's not even the interesting part! What's causing the industry to collectively poop its shorts is the remarkable speed and efficiency with which it was trained: Stanford produced a model comparable to GPT-3.5 for around $600 in about 3 hours!
There's a whole bunch to unpack to know what the devil is going on here. A few weeks ago, Meta released a series of foundation language models named LLaMA, ranging in size from 7B to 65B parameters, that exhibit capabilities comparable to the 175B-parameter GPT-3 models. It's been known for about a year ("forever" at today's pace!) that smaller models trained on more (and better) data can outperform much larger models; this was demonstrated by DeepMind's Chinchilla. So it's not unexpected that LLaMA's smaller models could be very capable. Small, capable models are super exciting since they can run on constrained hardware such as laptops or phones. Imagine a world in which you have a private ChatGPT that has access to all of your personal data and runs entirely in the privacy of your phone.
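To make "runs on your laptop" a little more concrete, here's a minimal sketch of loading a 7B-class model with the Hugging Face transformers library. The checkpoint path is a placeholder (LLaMA weights are distributed to researchers, not via a public download), and 8-bit loading via bitsandbytes is just one common trick for fitting the model into consumer-grade memory.

```python
# Minimal sketch: loading a ~7B-parameter model on consumer hardware.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed,
# and that you have a local copy of the weights (the path is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-7b"  # placeholder: wherever your 7B checkpoint lives

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # 8-bit quantization keeps memory use laptop-friendly
    device_map="auto",   # spread layers across whatever hardware is available
)

prompt = "Summarize the notes from my last meeting:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```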
Foundation models are generic models trained on vast amounts of unlabelled data (think: Wikipedia and books) in an unsupervised way, and can later be fine-tuned to perform a broad array of specific tasks. They're incredibly expensive to train in terms of both time and money. For example, it's estimated that GPT-3 cost around $5M and took about 4 months to train. (The rate at which this is dropping is dizzying. ARK Invest estimates that "the cost of AI training is improving at 50x the speed of Moore’s Law". They state that the "training costs of a large language model similar to GPT-3 level performance have plummeted from $4.6 million in 2020 to $450,000 in 2022" and estimate that it will cost around $100 by 2030.) Meta released LLaMA to researchers so they don't have to start from scratch and train their own foundation models.
But just because you have a very capable foundation model doesn't mean that you can do something useful with it. There are at least a dozen very capable language models out there today (Cohere, AI21 Labs, BLOOM, etc.), but using them in a manner similar to how you might use ChatGPT is a very frustrating experience. OpenAI in particular has spent the past few years fine-tuning its models with feedback from humans in order to make them (IMHO) useful as a chatbot. All of this additional fine-tuning training data takes a ton of time and human resources to obtain and collate, and it represents a significant competitive advantage. But what if you could use the machine to train the machine? In other words, what if you used an already-fine-tuned model to generate your fine-tuning data?
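Here's a heavily simplified sketch of that idea. The real Alpaca pipeline follows the "self-instruct" recipe and is more elaborate, but the core move is the same: show an already instruction-tuned model a few seed pairs and ask it to produce more in the same format. The prompt template and seed examples below are illustrative, not Stanford's actual code; the example uses the pre-1.0 openai Python client with text-davinci-003.

```python
# Simplified sketch of machine-generated fine-tuning data (self-instruct style).
# Assumes the pre-1.0 `openai` client with OPENAI_API_KEY set in the environment;
# the prompt template and seed pairs are illustrative, not Stanford's pipeline.
import openai

seed_examples = [
    {"instruction": "Give three tips for staying healthy.",
     "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well."},
    {"instruction": "Translate 'good morning' into French.",
     "output": "Bonjour."},
]

def build_prompt(examples):
    """Show the model a few seed pairs and ask it to continue the pattern."""
    prompt = "Generate new instruction/output pairs in the same style:\n\n"
    for ex in examples:
        prompt += f"Instruction: {ex['instruction']}\nOutput: {ex['output']}\n\n"
    prompt += "Instruction:"
    return prompt

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=build_prompt(seed_examples),
    max_tokens=256,
    temperature=1.0,  # higher temperature encourages varied new examples
)

print(response.choices[0].text)  # new pairs to clean up and add to the dataset
```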
That's exactly what Stanford did with Alpaca. They started with only 175 human-written instruction-output pairs and fed them into GPT-3.5, which generated 52k entries (and cost them about $500). They then took this 52k-entry training set and fine-tuned the 7B-parameter LLaMA model in 3 hours for around $100. The result was a model that is effectively indistinguishable from GPT-3.5! Stanford was able to take off-the-shelf parts (a foundation model from Meta and a set of training data generated by OpenAI's model) and fine-tune a competitive model with almost no time or cost.
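For a rough picture of that fine-tuning step (this is not Stanford's training script, which shards the model across multiple A100s), here's a compact sketch: fold each instruction-output pair into a single prompt-plus-response string, tokenize, and run a standard causal-language-modeling fine-tune with the Hugging Face Trainer. The paths, prompt template, and hyperparameters are all placeholders.

```python
# Rough sketch of supervised fine-tuning on machine-generated instruction data.
# Not Stanford's training script; paths and hyperparameters are placeholders.
import json
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_path = "./llama-7b"          # placeholder base model
data_path = "./alpaca_data.json"   # the 52k generated instruction pairs

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

def format_example(ex):
    # Fold instruction and output into one training string.
    return {"text": f"### Instruction:\n{ex['instruction']}\n\n"
                    f"### Response:\n{ex['output']}{tokenizer.eos_token}"}

with open(data_path) as f:
    records = json.load(f)

dataset = Dataset.from_list(records).map(format_example)
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names,  # keep only the tokenized fields
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./alpaca-sketch",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```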
(Important aside: Meta's LLaMA is not available for commercial use and OpenAI expressly prohibits you from using its tools for training a competitive model. There are other foundation models such as BLOOM that have more permissive licenses.)
(Unimportant aside: Stanford provided us with a delicious pun in their choice of naming. Alpacas are smaller and often considered more approachable than llamas, just as the Alpaca model is more accessible than the LLaMA foundation models.)
It's difficult to get one's head around what this means for the AI landscape, but here's an initial thought:
Alpaca represents a paradigm shift in AI accessibility, paving the way for more decentralized and personalized applications that can be quickly developed and deployed. It's likely that the implications of this research won't be fully understood for some time, but one thing is certain: the AI landscape is about to get a lot more exciting!
Get your teams together and start brainstorming because the world's about to change ... again!
(1,392 tokens)