AI Model Management and Lock-In Potentials
Entering the realm of Artificial Intelligence for the first time can be utterly chaotic, especially for today's entrepreneurs! On one hand, foundation model development is fragmenting: one fork pursues ever higher parameter counts even as (by some claims) the informational resources available on the web dwindle (to say nothing of the crusade against IP protection), while the other branch focuses on accelerating compute and accepting slower model inference response times to permit chain of thought (CoT) and other logical reasoning techniques that bolster a model's ability to handle complex tasks in a world fascinated by agentic AI. Wait, yes, there is yet another framework: Agentic AI architectures are spinning up that displace traditional automated tasks with natural language processing capabilities, problem decomposition, and multi-agent processing.
Phew! Where to begin? That's the exact question I was prompted with as I spoke on a panel discussing the good, the bad, and the ugly of Artificial Intelligence at the inaugural Sarasota.tech conference. Joined by a blockchain x AI enthusiast and an IP attorney specializing in AI, I enthusiastically shared my perspective on how entrepreneurs can get started with Artificial Intelligence and build businesses using the technology. As I frequently do in these articles, I referenced many of the topics outlined in "Claiming AI Advantages — Signals of Successful AI Companies," but one issue I failed to discuss was simply model strategy. No, I am not referencing the three-layered approach to AI models, which I frequently highlight as the grounding (no pun intended) of my personal perception of AI intelligence; rather, one fundamental issue I've faced throughout my career: a model's lifespan.
As I referenced at the beginning of the piece, entering the world of AI can be daunting, almost like getting thrown into the fray. In a world where models are commodities, consumers are still wavering on foundation model vendor selection: "Do I go with GPT, the tried and true model, or move to Claude, where the AI enthusiasts are going?" "Do I need a small language model or a large one?" "Which version of the model?" "When will the next model be released?" These are all completely valid questions, and ones I anticipate every creator will face when making the initial decision to start building an AI product. The position one is building from may dictate the path ahead of time. The phrase "no one ever got fired for buying {company}" surely holds true and speaks to corporate IT policies and overall company vendor selections; but what if one is starting from scratch, with no loyalties or policies to abide by?
Arguably, making this decision might be a significant one that sets the path for the long-term technical debt incurred within your product. Yes, I uttered those words: "technical debt." Why? Let's start with a hypothetical. An entrepreneur selects GPT-4o as the foundation model for their product (keeping the layered approach in mind, and knowing that one could simply go to HuggingFace and get a tuned model, which arguably runs into the same pitfall I will arrive at momentarily). They hire data scientists to build a corpus of industry data to fine tune the model for their specific industry and target use case, and maybe even employ some Reinforcement Learning on top for the finishing touches. The product gets launched and customers flood in! The model is a success! Fast forward maybe a year, and OpenAI (or an alternative company) releases GPT-5, which absolutely blows away 4o. What happens? You've already spent significant capital on fine tuning and reinforcement learning, yet the upgrades to the foundation model might be significant enough to warrant upgrading. What happens next? Well, you might as well budget for a new training cycle. Could it be easier to train and require less tuning? Possibly, by reusing some of your data or employing transfer learning, but maybe not. When customers moved from text-to-speech synthesis models that employed unit selection algorithms to end-to-end neural network generation, the required training set was vastly different by comparison, yet demanded less training effort as well. Quite nuanced!
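To make that retraining cost concrete, here is a minimal sketch using the OpenAI Python SDK. The file ID and model snapshot names below are hypothetical placeholders, not real artifacts; the point is simply that a fine-tuning job is pinned to one base model snapshot, so a new model generation means a new job (and a new bill), even when the training data itself is reusable.

```python
# Minimal sketch, assuming the OpenAI Python SDK (v1.x).
# File IDs and model snapshot names are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# A fine-tuning job is tied to one specific base model snapshot.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",        # hypothetical uploaded JSONL corpus
    model="gpt-4o-mini-2024-07-18",     # the snapshot the tuned weights belong to
)

# When a newer base model ships, the tuned checkpoint does not carry over;
# the job must be rerun (and re-evaluated) against the new snapshot.
new_job = client.fine_tuning.jobs.create(
    training_file="file-abc123",        # the data may be reusable; the weights are not
    model="gpt-5-placeholder",          # hypothetical future snapshot name
)
```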
History sure repeats itself. I've spent the last four years working with a constrained speech recognition engine that takes grammar files and constrains inputted speech to only what should be recognized. This AI engine underpinned early Interactive Voice Response systems for decades. Even when Conversational AI capabilities for real-time speech recognition arrived, the talent pool and deployed systems remained steadfast in continuing to use the constrained speech system. Only when faced with a singularity moment forcing them off the engine will they consider a change. Yes, I take this perspective from the enterprise world, but I would argue that AI models function the same way.
Vendor lock-in occurs when one builds a complex system on a stack of capabilities; the technology at the bottom is the stickiest link in the chain. The truth of the matter is that the more feedback pumped into improving the engine's capabilities, the tighter the lock-in, as the solution becomes more and more tailored to your needs. That might not necessarily be a bad outcome, but it should present some inkling of caution when making an initial architectural decision around AI. Behind these large language models are billions of parameters whose latent space becomes specialized as fine tuning is applied on top. Moving from, say, GPT-3 to GPT-4 is not "forward compatible": transferring weights from one generation of a model to the next for free simply does not exist in the world of AI today.
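One common mitigation, sketched below under assumed names (the ChatModel interface and adapter class are hypothetical, not a library API), is to keep vendor SDK calls behind a thin seam so the durable assets, namely your data, prompts, and evaluations, stay decoupled from whichever provider sits underneath.

```python
# A minimal sketch of one lock-in mitigation: the rest of the product
# depends on this interface, never on a vendor SDK directly.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Provider-agnostic seam; swapping vendors touches one adapter, not the product."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIChatModel(ChatModel):
    """Adapter confining the OpenAI SDK to one class (model name is illustrative)."""

    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI   # vendor import stays inside the adapter
        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

# Application code depends only on the interface, so a model (or vendor)
# swap does not ripple through the product.
def answer(model: ChatModel, question: str) -> str:
    return model.complete(question)
```

This does not eliminate the retraining cost described above, but it keeps the migration surface area small when the day comes.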
As an AI practitioner, there are multiple routes I might take, and I will offer my advice accordingly.
The linchpin that actually forces a model migration is a change in the underlying logic and techniques. Yes, moving from GPT-4 to 4o was a paradigm shift and would completely render previous model training useless. Then again, one might see these models from a new perspective and find completely new use cases for the new capabilities, running two solutions side by side instead of migrating one (a rough sketch of that routing follows below). I, personally, am simply sharing anecdotes and presenting readers with tradeoff decisions. I'd love to hear from practitioners faced with this decision right now!
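As a rough illustration of that "two solutions" route, here is a hedged sketch, again assuming the OpenAI Python SDK: the route table, the tuned-model identifier, and the task labels are all invented for illustration, not prescriptions.

```python
# Hypothetical sketch: rather than migrating, run both model generations
# side by side and route by use case. All names below are invented.
from openai import OpenAI

client = OpenAI()

ROUTES = {
    "extraction": "ft:gpt-4o-mini-2024-07-18:acme:custom:abc123",  # hypothetical tuned snapshot
    "reasoning": "o1-mini",                                        # newer reasoning-oriented model
}

def ask(task: str, prompt: str) -> str:
    """Send the prompt to whichever model generation suits the task."""
    response = client.chat.completions.create(
        model=ROUTES.get(task, ROUTES["extraction"]),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```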