The best LLM? The platform
The best LLM today no longer matters.
ChatGPT went viral in November 2022 and captured the world’s imagination. OpenAI had kick-started a rush to harness Generative AI (GenAI) and Large Language Models (LLMs).
Organisations want the best. Whether they prioritised quality, price or latency, OpenAI was seen as the model provider to work with. Whilst others (like Anthropic’s Claude and Google’s Gemini) quickly narrowed the gap - some might argue pulled ahead - their similarly proprietary approach meant many organisations saw insufficient benefit in changing tack.
Meta’s Llama 3 changed everything.
For the first time, an open-weight model became a top contender. Even now, Meta’s top model (Llama 3.1 405B) is riding high and proves Meta’s ability to keep pace.
For all Llama’s technical accomplishments, its biggest impact was shattering the collective narrative of OpenAI’s continuing dominance. Organisations worldwide began questioning whether unconditionally hitching themselves to OpenAI’s mast was wise in this new world of multiple top-tier LLMs with alternative commercial models.
Yet one thing did not shift. Organisations still want the best - whether quality, price or latency. But the best LLM now changes with nearly every release. Almost month to month.
That change frequency is critical. Organisations don’t just need the best model today. They need the ability to change to the new best model tomorrow. Quickly and efficiently.
LangChain has emerged as the leading framework for LLM development (although there are others, for example LlamaIndex). It abstracts calls to specific LLMs, allowing (in theory) rapid swapping of one LLM for another without reimplementing all the API calls to the LLM throughout your application.
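To make that concrete, here is a minimal sketch (not code from the article) of how an application can compose against LangChain’s common chat-model interface rather than a specific provider. It assumes the langchain-openai and langchain-anthropic partner packages are installed and API keys are configured; the model names are purely illustrative.

```python
# Minimal sketch: swap the underlying LLM behind LangChain's common interface.
# Assumes langchain-openai and langchain-anthropic are installed and API keys set.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

def build_chain(provider: str):
    """Return the same prompt pipeline, bound to a different model provider."""
    if provider == "openai":
        llm = ChatOpenAI(model="gpt-4o")  # illustrative model name
    elif provider == "anthropic":
        llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")  # illustrative model name
    else:
        raise ValueError(f"Unknown provider: {provider}")

    prompt = ChatPromptTemplate.from_messages(
        [("system", "You are a concise assistant."), ("human", "{question}")]
    )
    # The rest of the application composes against the chain, not the provider.
    return prompt | llm

# Switching providers becomes a one-line configuration change, not a rewrite.
chain = build_chain("anthropic")
print(chain.invoke({"question": "Summarise LLMOps in one sentence."}).content)
```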
Using a framework like this has some drawbacks. First, initial prototyping may be slower - as developers can just call the native LLM APIs directly. But without a framework, changing LLM later - as mentioned - becomes much harder. Not desirable in a deployed application. Second, any framework adds overhead - usually impacting performance. However, LLM latency is still relatively high (hundreds of milliseconds even for first chunks) compared with code. So the proportionate impact should be negligible (for a well-implemented framework). Overall, this loose coupling is the preferred approach.
It is early days for LLM frameworks. An abstraction standard may yet emerge, with LLM providers exposing both a proprietary API (perhaps via an SDK) and a standardised driver. There is precedent: databases offer both direct access (to expose all functionality) and standard JDBC drivers. LangChain already co-maintains packages with LLM partners, which (whilst not an official standard) demonstrates a trend in that direction.
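Purely as a thought experiment (nothing here is a real standard, and all names are hypothetical), such a driver might amount to little more than a shared interface that every provider implements over its own SDK:

```python
# Hypothetical sketch only: a "standard driver" contract for LLM providers,
# by analogy with JDBC. The Protocol and the vendor class are illustrative.
from typing import Protocol

class ChatDriver(Protocol):
    """Provider-agnostic contract every vendor driver would implement."""
    def generate(self, prompt: str, *, temperature: float = 0.0) -> str: ...

class ExampleVendorDriver:
    """Illustrative vendor driver wrapping a proprietary SDK call."""
    def __init__(self, client):
        self._client = client  # the vendor's own (hypothetical) SDK client

    def generate(self, prompt: str, *, temperature: float = 0.0) -> str:
        # The vendor maps the standard call onto its proprietary API here.
        return self._client.complete(prompt=prompt, temperature=temperature)

def answer(driver: ChatDriver, question: str) -> str:
    # Application code depends only on the standard interface, so swapping
    # vendors means swapping drivers, not rewriting call sites.
    return driver.generate(question, temperature=0.2)
```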
But making it easy to change LLM is far from sufficient. That change must be refined, evaluated and deployed. Fortunately, expertise from DevOps, which evolved into MLOps and now encompasses MLOps for GenAI (LLMOps), is at hand.
I won’t cover LLMOps in detail. For sophisticated organisations, it includes: LLM selection, generating embeddings & grounding, fine-tuning, prompt engineering & management, reranking, model evaluation, model registration & versioning, cost analysis, model deployment, continuous monitoring (topic drift, embedding drift) and more. All focussed on getting beneficial changes into production quickly and safely.
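As a hedged illustration of what “quickly and safely” can mean in practice (this is not any particular platform’s API; the evaluate, register and deploy helpers are hypothetical placeholders), a promotion gate might codify the bar every candidate model has to clear before it replaces the one in production:

```python
# Illustrative LLMOps "promotion gate": only register and deploy a candidate
# model when it beats the production model on evaluation, latency and cost.
from dataclasses import dataclass

@dataclass
class EvalResult:
    win_rate: float          # e.g. side-by-side autorater preference vs. production
    p95_latency_ms: float
    cost_per_1k_tokens: float

def should_promote(candidate: EvalResult, *, min_win_rate: float = 0.55,
                   max_p95_latency_ms: float = 1500.0,
                   max_cost_per_1k: float = 0.02) -> bool:
    """Codify the promotion criteria so every new model faces the same bar."""
    return (candidate.win_rate >= min_win_rate
            and candidate.p95_latency_ms <= max_p95_latency_ms
            and candidate.cost_per_1k_tokens <= max_cost_per_1k)

# Typical flow (hypothetical helpers): evaluate -> gate -> register -> deploy -> monitor.
# result = evaluate(candidate_model, eval_dataset)
# if should_promote(result):
#     version = register_model(candidate_model)
#     deploy(version, traffic_split={"candidate": 10})  # canary, then ramp up
```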
Organisations need appropriate tooling for LLMOps. Given I work at Google, this may seem like an advert for Vertex AI (our platform for LLMOps and more). It’s not. To blow our own trumpet, we have innovative capabilities, like AutoSxS for LLM evaluations using the autorater technique leveraged within Google. But you should harness whichever LLMOps platform is appropriate for your organisation.
For developers playing around with LLMs, an LLMOps platform isn’t necessary. But when deploying LLMs into production - particularly for high-impact purposes - an LLMOps platform becomes essential. Sure, your first deployment might be slightly delayed whilst establishing your strong foundation. But every release, iteration and change - including the latest and best LLMs - from then on will be better, quicker and safer.
And if your existing MLOps platform does not handle LLMOps, then it’s time for a change. After all, whilst the pace of change in LLMs is breathtaking, your choice of LLMOps platform should remain comparably constant. Without one, you are likely constraining yourself to your existing LLMs, due to an inability to handle rapid productionisation of better options.
To summarise: if you believe (like me) that better models will emerge with high frequency and that their improvements will meaningfully benefit your organisation, you must invest in an LLMOps platform (and probably an LLM framework) to deploy those better models. If not, others will, and your state-of-the-art application will soon become obsolete, forever lagging behind faster-moving alternatives.
The best LLM today no longer matters. Speed of adoption of the best LLM tomorrow does.