Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings

Read the full article here, with Python code and free access to the new embedding tables, including the code and documentation to produce them.

Introduction

The new generation of RAG / LLM architecture is moving away from the original monolithic and generic OpenAI model, towards a collection of decentralized and specialized LLMs jointly organized and governed via multi-agent systems.

The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, and better results, all with much lower GPU consumption. The number of tokens and weights is dramatically reduced. If you charge customers by the token, as many vendors do, this is another competitive advantage. It also enables local implementations and secure enterprise solutions augmented with external sources.

My own product, xLLM, is the pioneering solution that ignited this new trend. It offers additional benefits: it is self-tuning and user-customized, and it uses no neural networks, making it even faster and more frugal in terms of GPU usage. Embeddings are just one of the many backend tables (one set per LLM), and not even the most important one. In particular, xLLM relies heavily on the structure reconstructed from the crawled repository, especially the taxonomy and related items. The user can select a specific LLM in addition to entering a standard prompt. A future version will also integrate user prompts as input data for some of the backend tables. In contrast to deep neural networks, a core feature of xLLM is explainable AI.
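The routing idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the names `SUB_LLM_TABLES` and `route_query`, and the table contents, are invented for this sketch and are not xLLM's actual API. It shows how a user-selected category maps to one specialized sub-LLM's small backend tables, here a taxonomy table of related items.

```python
# Hypothetical sketch: one small set of backend tables per specialized
# sub-LLM, selected by the category the user picks alongside the prompt.
# Names and table contents are illustrative, not part of xLLM.

SUB_LLM_TABLES = {
    "statistics": {"taxonomy": {"regression": ["least squares", "lasso", "ridge"]}},
    "nlp": {"taxonomy": {"embedding": ["token", "context window"]}},
}

def route_query(category: str, keyword: str) -> list:
    """Look up related items in the chosen sub-LLM's taxonomy table."""
    tables = SUB_LLM_TABLES.get(category, {})
    return tables.get("taxonomy", {}).get(keyword, [])

print(route_query("statistics", "regression"))
```

Because each sub-LLM only holds the tables for its own domain, a lookup touches a small structure rather than one monolithic model, which is where the latency and GPU savings come from.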

So far, nothing new. It has been available as open source with full Python code, written from scratch and well documented, for quite some time: see here. An enterprise version for a Fortune 100 company is currently being tested, and some advertisers are interested in blending sponsored results with the organic output delivered to user queries. The parent company is funded and operated by the author of this article.

Multi-token embeddings

The new feature is the introduction, for the first time to my knowledge, of embeddings consisting of multi-token words, rather than single tokens. As one would expect, it leads to better results for the output section based on embeddings. However, the initial goal was to further improve, create, or update the taxonomy tables. It is especially useful when augmenting the corpus with external sources that lack an obvious, easy-to-detect structure.

Dealing with words rather than tokens leads to a combinatorial explosion in the size and number of multi-token embeddings, called x-embeddings. In order to keep these new tables as small as possible while still bringing extra value, special mechanisms are needed.

Interestingly, the very first attempt produced massive backend tables, reminiscent of standard LLMs. There was a lot of noise, indeed mostly noise: useless text elements that are never fetched when creating the output to a user prompt. This noise can potentially result in hallucinations. I mention this because I believe the same issue is still present today in standard LLMs based on trillions of weights. I have since solved this problem: xLLM tables are short again, even those that store the x-embeddings.
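One simple way to picture the pruning mechanism described above is to build candidate multi-token keys as word n-grams and drop the rare ones before they ever enter a table. This is a generic sketch under stated assumptions, not xLLM's actual algorithm: the function name `multi_token_keys` and the parameters `max_len` and `min_count` are invented for illustration.

```python
from collections import Counter

def multi_token_keys(text: str, max_len: int = 3, min_count: int = 2) -> dict:
    """Build candidate multi-token keys (word n-grams up to max_len words)
    and prune those seen fewer than min_count times, a simple way to keep
    x-embedding-style tables from exploding combinatorially."""
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Pruning step: rare n-grams are treated as noise and discarded.
    return {key: c for key, c in counts.items() if c >= min_count}

keys = multi_token_keys("machine learning is fun machine learning is hard")
print(sorted(keys))
```

On this toy corpus, singletons like "fun" and "is hard" are pruned, while the recurring phrases "machine learning" and "machine learning is" survive as multi-token keys. In practice the threshold, and smarter criteria than raw frequency, control the trade-off between table size and coverage.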

Learn more here.

To not miss future updates on this topic and GenAI in general, sign up for my newsletter here. Upon signing up, you will get a code to access member-only content. There is no cost. The same code gives you a 20% discount on all my eBooks in my eStore, here.

Mangesh Nijasure

Software Development / Consulting

6 months

Quite interesting. Thanks for sharing, Vincent Granville.

Carl W J Davidson

My opinion is I'm smart; you should pay attention. Whether you do, that's your opinion.

6 months

Embeddings and multi-token words. Embeddings are a way of converting words into a form that a computer can understand. Think of them as a sort of translation from human language to "computer language." Multi-token embeddings: traditional models break sentences down into single words or pieces (tokens), but xLLM uses multi-token words, meaning it keeps some words or phrases together. This can improve the model's understanding and outputs, especially when integrating information from various sources.

Backend tables. These are like databases for the model, where it stores and retrieves the information needed to answer queries. xLLM uses multiple such tables, each corresponding to a different specialized model.

The concept here is to make AI more efficient, cost-effective, and adaptable to specific tasks, while also being easier to manage and understand. This approach could be particularly beneficial for businesses that need secure and efficient AI solutions tailored to their specific needs.

Carl W J Davidson

My opinion is I'm smart; you should pay attention. Whether you do, that's your opinion.

6 months

A decentralized system called xLLM. Let's break it down.

Decentralized and specialized models: traditional large language models (LLMs), like those developed by OpenAI, are centralized and quite large, which means they require a lot of computing power. The new system, xLLM, moves away from this by using a collection of smaller, specialized models that work together. This approach can be faster, use less energy, and be more efficient because each small model handles a specific task.

Benefits of xLLM. Low latency: it responds to queries faster because it doesn't have to process as much information at once. Energy and cost efficiency: since it uses less computing power (GPU), it's cheaper to run, especially since some companies charge by the amount of data processed (tokens). Local implementations: it can be set up within a company's own infrastructure, making it more secure and potentially more customizable. Explainable AI: unlike some deep learning models, often described as "black boxes" because it's hard to understand how they arrive at certain outputs, xLLM is designed to be more transparent in how it works.

Amin Zayeromali

Full Stack Data Scientist @ Factually Health | Data-Driven Leadership | NLP | DL | Python | Django | Revenue Growth | Problem Solver

6 months

Great! Thanks, Vincent Granville.

Ken Morimoto

Director Centific | MP at MVP & Leading Edge Ventures | ex-Scale, Amazon | ML Finetuning & Grounding | Community Builder | Infinite Learner & Investor

6 months
