From gen AI to next-gen AI
For all its successes, both hallucinated and real, gen AI is still architected on foundations with unresolved gaps, design flaws, and inherent weaknesses. A deeper understanding of these flaws will help us design and build next-gen AI that's more reliable, more accessible, more manageable – and a better investment for clients.
From one perspective, there are four main families of issues with gen AI: some receive lots of attention and are well-known, while others are buried in the rubble of the foundations themselves. A more detailed list of issues with LLMs is covered here.
Engineering Problems
In spite of the growing availability of tools for leveraging existing LLMs, building broad-coverage LLMs themselves continues to be a huge challenge, accessible only to the largest organizations. Pre-training time, cost, and infrastructure requirements are shockingly high. Large-scale batch pre-training leads to issues of data freshness. Fine-tuning existing models and hosting them for low-latency deployments also require lots of development time and costly infrastructure. And because the technology is so new, trial-and-error approaches to development lead to even longer deployment timelines, along with much higher costs.
One key underlying issue is that the usual gen AI approach depends crucially on models from a very high-dimensional design space.
Models in gen AI are conceived of in token space: each input token or subtoken, along with its n-way interactions with other tokens, is tracked and modeled.
Even if we reduce the token space to the 50,000 most frequent tokens (and ignore all the others – a common practice), naïve tracking of only the pairwise co-occurrences would require 2.5 billion parameters. Three-way, four-way, and higher co-occurrences quickly multiply that to astronomical values.
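To make the arithmetic concrete, here is a back-of-the-envelope calculation in Python. It only restates the 50,000-token assumption above and counts ordered k-tuples of tokens.

```python
# Back-of-the-envelope: how fast naive co-occurrence tracking grows.
# Assumes the 50,000-token vocabulary mentioned above; counts ordered k-tuples.
VOCAB = 50_000

for k in range(2, 5):
    print(f"{k}-way co-occurrences: {VOCAB ** k:.2e} parameters")

# 2-way: 2.50e+09  (the 2.5 billion figure above)
# 3-way: 1.25e+14
# 4-way: 6.25e+18
```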
For these reasons, among others, a significant change is in the works.
Next-gen AI is likely to shift to a smaller, more streamlined concept space for designing and training models, instead of a token space.
Think of it as the creation of a layer of AI "above" LLMs – a cognitive or conceptual layer. One crucial difference is that in a concept space, systems will model abstract concepts – not the hundreds or thousands of nearly synonymous observable string tokens that we might use to communicate each concept. Rather than ingesting random raw text on a mind-boggling scale, we will ingest pre-structured, information-dense concepts defined in knowledge graphs, ontologies, taxonomies, dictionaries, encyclopedias, and glossaries on a much more manageable scale – most likely as a separate "modality" from text – as we do already with images, audio, and video for multimodal LLMs. In this approach, a language model will continue to be an essential and effective add-on that powers the API, rather than trying to execute core reasoning processes.
This shift will yield something like a 1000x reduction in the design space, since we can easily find 1000 synonyms or translations (often more!) for the same concept in a large multilingual corpus of text tokens. Using the same machinery that is available now, we will be able to predict the next concept instead of the next token. At this more compact scale, interactive (even real-time) fine-tuning becomes much more feasible, as does detailed domain adaptation – which helps drive down development time and infrastructure costs. This shift, in turn, will also have major consequences for the other families of issues that plague gen AI, as sketched below.
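As a toy illustration of what "predicting the next concept" could look like, here is a minimal Python sketch. The lexicon, concept IDs, and corpus below are invented for this example; a real system would draw them from knowledge graphs or ontologies, as described above.

```python
# Minimal sketch (not a real implementation): predicting the next *concept*
# instead of the next token. The lexicon and concept IDs are hypothetical.
from collections import defaultdict

# Hypothetical lexicon: many surface tokens collapse onto one concept ID.
LEXICON = {
    "car": "C:automobile", "automobile": "C:automobile", "voiture": "C:automobile",
    "buy": "C:purchase", "purchase": "C:purchase", "acheter": "C:purchase",
    "cheap": "C:inexpensive", "inexpensive": "C:inexpensive",
}

def to_concepts(tokens):
    """Map observable tokens onto abstract concept IDs, dropping unknowns."""
    return [LEXICON[t] for t in tokens if t in LEXICON]

# Train a toy bigram model in concept space instead of token space.
bigrams = defaultdict(lambda: defaultdict(int))
corpus = [["buy", "cheap", "car"],
          ["acheter", "voiture"],
          ["purchase", "inexpensive", "automobile"]]
for sentence in corpus:
    concepts = to_concepts(sentence)
    for prev, nxt in zip(concepts, concepts[1:]):
        bigrams[prev][nxt] += 1

def predict_next_concept(prev):
    """Return the most frequent concept following `prev` in the toy corpus."""
    followers = bigrams.get(prev)
    return max(followers, key=followers.get) if followers else None

print(predict_next_concept("C:purchase"))  # -> C:inexpensive
```

The point of the sketch is only the change of scale: the model's vocabulary is a handful of concept IDs rather than every synonym and translation that could express them.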
Quality Management Problems
Current LLMs – and the systems based on them – face nearly insurmountable challenges of quality management.
We have few, if any, reliable tools and processes for tracking and troubleshooting issues in training data, in output quality, in compliance, in rights management, in version control, and in modeling dependencies. Relying on human ground truth, i.e., checking only tiny samples by hand, is clearly not enough. We've barely begun to deploy gen AI and the news is already full of stories of LLMs generating responses that create unwanted financial obligations for their creators, of hallucinations that damage their brands, of litigation over rights to training data – and more legislation on the horizon will create a broad range of additional challenges for quality and provenance management.
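As one small illustration of what more systematic provenance tracking could look like, here is a hypothetical record structure in Python. The fields and values are my own assumptions, not an existing tool or standard.

```python
# Hypothetical provenance record for a training document or generated output.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    item_id: str                 # ID of the training document or generated output
    source: str                  # where the text came from
    license: str                 # rights / licensing status
    version: str                 # dataset or model version it belongs to
    checks_passed: List[str] = field(default_factory=list)  # quality checks applied

record = ProvenanceRecord(
    item_id="doc-00042",
    source="internal product manual",
    license="company-owned",
    version="dataset-v3.1",
    checks_passed=["deduplicated", "PII-scrubbed"],
)
print(record)
```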
On top of that, published training runs and experimental results on gen AI systems are most often not replicable – neither within an organization nor across companies. No one seems to know how reliable gen AIs are or why they work when they do. This ill-defined and poorly documented trial-and-error development method leads directly to skyrocketing development costs and unpredictable deployment timelines. These are key reasons – mostly tied to unclear ROI – why many companies hesitate to embrace gen AI.
Market pressures will relentlessly force next-gen AI to shift to more transparent and structured quality management – away from current black-box models – with explicit, automatable evaluation criteria and significantly more detailed documentation.
This change will enable more systematic, predictable development processes and much more reliable "ready for production" decisions. A renewed emphasis on data quality and the shift to concept space models will both enable and accelerate this change by improving transparency, reducing model size, and creating a well-documented conceptual vocabulary for evaluation – and perhaps by making the legal link to copyrighted text less contentious.
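To make "explicit, automatable evaluation criteria" more concrete, here is a minimal Python sketch. The specific criteria and thresholds are hypothetical; the point is that each check is named, automatable, and auditable.

```python
# Illustrative only: explicit, automatable evaluation criteria for model output.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]   # returns True if the output passes

def evaluate(output: str, criteria: List[Criterion]) -> dict:
    """Run every criterion and return an auditable per-criterion report."""
    return {c.name: c.check(output) for c in criteria}

criteria = [
    Criterion("mentions_required_concept", lambda o: "refund policy" in o.lower()),
    Criterion("no_unsupported_commitment", lambda o: "we guarantee" not in o.lower()),
    Criterion("within_length_budget",      lambda o: len(o.split()) <= 150),
]

model_output = "Our refund policy allows returns within 30 days."
report = evaluate(model_output, criteria)
print(report)                 # {'mentions_required_concept': True, ...}
print(all(report.values()))   # a documented, repeatable "ready for production" signal
```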
User-Control Problems
LLMs are the ultimate API for humans to "drive" AIs – they already do a great job of enabling interaction through messy, variable human language.
But the prompts that elicit LLM output are very brittle – relying on token models means that LLMs will often produce wildly different responses for queries that are synonymous or functionally equivalent for humans.
And relying on frequency of token mentions for training means that LLM performance is degraded – by design – for "smaller" languages and less-frequently-discussed topics. The pervasive need for extensive prompt engineering is evidence of these weaknesses; it increases development costs, delays the rollout of reliable systems – and, I suspect, is a throwaway effort that will see very little later reuse. News of the death of prompt engineering may be exaggerated, but it is already clear to many that it is by no means a sustainable approach.
Next-gen AI will likely be coerced into leveraging concept-space models like those in knowledge graphs and ontologies to "translate" prompts into a more abstract, more reliable form – rather than into a region of similar-token space as in vector databases.
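Here is a deliberately simplified Python sketch of that kind of "translation": mapping prompt mentions onto ontology concept IDs instead of embedding the prompt into a similar-token vector space. The mini-ontology and concept IDs are invented for illustration.

```python
# Sketch: normalize a prompt into concept space via a (tiny, hypothetical) ontology.
ONTOLOGY = {
    "laptop": "Q:portable_computer", "notebook": "Q:portable_computer",
    "battery life": "Q:battery_endurance", "runtime": "Q:battery_endurance",
    "improve": "Q:increase", "extend": "Q:increase",
}

def to_concept_query(prompt: str):
    """Return the set of concept IDs mentioned in the prompt (longest match first)."""
    text, found = prompt.lower(), set()
    for mention in sorted(ONTOLOGY, key=len, reverse=True):
        if mention in text:
            found.add(ONTOLOGY[mention])
            text = text.replace(mention, " ")
    return found

# Synonymous prompts map to the same abstract query, so downstream behavior is stable.
print(to_concept_query("How can I improve my laptop's battery life?"))
print(to_concept_query("How do I extend the runtime of my notebook?"))
# Both print: {'Q:portable_computer', 'Q:battery_endurance', 'Q:increase'}
```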
There is already considerable hard evidence that Knowledge Graphs and ontologies improve every step of the LLM development and deployment process.
Moving to concept spaces – semantic or cognitive layers on top of LLMs – will make LLMs less dependent on frequency of token mentions and on web-scale corpora during training, will enable more transparent quality management, and will ensure less ambiguous, more streamlined user-system communication.
Conceptual Problems
Today's LLMs are very new additions to our cognitive repertoire, so we haven't yet had many opportunities to think about them carefully and systematically. As we do consider their foundational assumptions in more detail, we inevitably find mismatches between the marketing hype, the engineers' naïve assumptions, and the carefully curated concepts from other domains. These mismatches inflate expectations, confuse consumers, disappoint investors, and block the interdisciplinary collaboration that would otherwise accelerate progress.
One of these mismatches is the use of terms like understanding and reasoning to describe the internal operations of LLMs. LLMs were designed and developed to generate natural-sounding sentences and paragraphs – NOT to understand sentences or to reason about the world. And in fact they do a spectacular job of generating sentences. But naturally, they do not perform well on a wide range of reasoning tasks – in fact, it's surprising that they produce correct responses for many of them at all, accidentally rather than by design – which is why some people talk about emergent (in this case, unexpected) properties of LLMs and call that "reasoning".
Among researchers who focus on processes like understanding and reasoning in humans, i.e. cognitive psychologists or cognitive scientists like me, these processes don't happen or even exist without significant involvement of not-directly-observable concepts that are separate from (and have very different characteristics from) the directly observable tokens or words that we use to communicate them. Understanding, on this view, is the process of creating a mapping from visible tokens (or gestures, or other things) to unobservable concepts – without concepts, there is no understanding. And reasoning, on this view, is the process of manipulating concepts without consideration of the tokens that we might use to communicate them. Reasoning abstracts away from words to relate concepts directly, compare them, add details or relations, evaluate their coherence or evidential base, etc. Because LLMs manipulate only tokens and have no identifiable representations of concepts or types of concepts – that is, no ostensible semantics – claims about how well LLMs "understand" or "reason" create conceptual mismatches that confuse developers, investors, and clients alike. This is not to say that LLMs aren't useful; they clearly are – but most of them don't understand or reason in any technical sense – just as clocks don't "know" the time and calculators don't "know" math.
Next-gen AI is likely to capitalize on the shift to concept space models to alleviate these conceptual mismatches and make the capabilities and value-add of AI systems more transparent for both developers and other stakeholders. Once AI systems map reliably between token spaces and concept spaces (a long-standing focus of many natural language understanding researchers), then the parallels with human understanding and reasoning become much clearer. Current efforts to build multimodal LLMs are making good progress in this direction: image generators like Stable Diffusion and Midjourney model language tokens separately (a token space), pixels separately (a concept space, with concepts represented as patterns of pixels), and the mappings between them, as well.
The already significant impact of explicit concept stores like knowledge graphs and ontologies at every step in the development and deployment of LLMs is an important indicator of systems to come.