Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings

Read the full article here, with Python code and free access to the new embedding tables, including the code and documentation to produce them.

Introduction

The new generation of RAG / LLM architecture is moving away from the original monolithic and generic OpenAI model, towards a collection of decentralized and specialized LLMs jointly organized and governed via multi-agent systems.

The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, and better results, all with much lower GPU consumption. The number of tokens and weights is dramatically reduced. If you charge customers by the token, as many vendors do, this is another competitive advantage. It also enables local implementations and secure enterprise solutions augmented with external sources.

My own product, xLLM, is the pioneering solution that ignited this new trend. It offers additional benefits: it is self-tuning and user-customized, and it uses no neural networks, making it even faster and more frugal in terms of GPU usage. Embeddings are just one of the many backend tables (one set per LLM), and not even the most important one. In particular, xLLM relies heavily on the structure reconstructed from the crawled repository, especially the taxonomy and related items. The user can select a specific LLM in addition to entering a standard prompt. A future version will also integrate user prompts as input data for some of the backend tables. In contrast to deep neural networks, a core feature of xLLM is explainable AI.
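The routing idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the names `SUB_LLM_TABLES` and `route_query`, and the table contents, are invented for this sketch and are not xLLM's actual API. It shows how a user-selected category maps to one specialized sub-LLM's small backend tables, here a taxonomy table of related items.

```python
# Hypothetical sketch: one small set of backend tables per specialized
# sub-LLM, selected by the category the user picks alongside the prompt.
# Names and table contents are illustrative, not part of xLLM.

SUB_LLM_TABLES = {
    "statistics": {"taxonomy": {"regression": ["least squares", "lasso", "ridge"]}},
    "nlp": {"taxonomy": {"embedding": ["token", "context window"]}},
}

def route_query(category: str, keyword: str) -> list:
    """Look up related items in the chosen sub-LLM's taxonomy table."""
    tables = SUB_LLM_TABLES.get(category, {})
    return tables.get("taxonomy", {}).get(keyword, [])

print(route_query("statistics", "regression"))
```

Because each sub-LLM only holds the tables for its own domain, a lookup touches a small structure rather than one monolithic model, which is where the latency and GPU savings come from.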

So far, nothing new. It has been available as open source with full Python code, written from scratch and well documented, for quite some time: see here. An enterprise version for a Fortune 100 company is currently being tested, and some advertisers are interested in blending sponsored results with the organic output delivered to user queries. The parent company is funded and operated by the author of this article.

Multi-token embeddings

The new feature is the introduction, for the first time to my knowledge, of embeddings consisting of multi-token words, rather than single tokens. As one would expect, it leads to better results for the output section based on embeddings. However, the initial goal was to further improve, create, or update the taxonomy tables. It is especially useful when augmenting the corpus with external sources that lack an obvious, easy-to-detect structure.

Dealing with words rather than tokens leads to a combinatorial explosion in the size and number of multi-token embeddings, called x-embeddings. In order to keep these new tables as small as possible while still bringing extra value, special mechanisms are needed.

Interestingly, the very first attempt produced massive backend tables, reminiscent of standard LLMs. There was a lot of noise, indeed mostly noise: useless text elements that are never fetched when creating the output to a user prompt. This noise can potentially result in hallucinations. I mention this because I believe the same issue is still present today in standard LLMs based on trillions of weights. I have since solved this problem: xLLM tables are short again, even those that store the x-embeddings.
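One simple way to picture the pruning mechanism described above is to build candidate multi-token keys as word n-grams and drop the rare ones before they ever enter a table. This is a generic sketch under stated assumptions, not xLLM's actual algorithm: the function name `multi_token_keys` and the parameters `max_len` and `min_count` are invented for illustration.

```python
from collections import Counter

def multi_token_keys(text: str, max_len: int = 3, min_count: int = 2) -> dict:
    """Build candidate multi-token keys (word n-grams up to max_len words)
    and prune those seen fewer than min_count times, a simple way to keep
    x-embedding-style tables from exploding combinatorially."""
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Pruning step: rare n-grams are treated as noise and discarded.
    return {key: c for key, c in counts.items() if c >= min_count}

keys = multi_token_keys("machine learning is fun machine learning is hard")
print(sorted(keys))
```

On this toy corpus, singletons like "fun" and "is hard" are pruned, while the recurring phrases "machine learning" and "machine learning is" survive as multi-token keys. In practice the threshold, and smarter criteria than raw frequency, control the trade-off between table size and coverage.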

Learn more here.

To not miss future updates on this topic and GenAI in general, sign up for my newsletter here. Upon signing up, you will get a code to access member-only content. There is no cost. The same code gives you a 20% discount on all my eBooks in my eStore, here.

Mangesh Nijasure

Software Development / Consulting

6 months

Quite interesting. Thanks for sharing, Vincent Granville.

Carl W J Davidson

My opinion is I'm smart; you should pay attention. Whether you do, that's your opinion.

6 months

Embeddings and multi-token words. Embeddings are a way of converting words into a form that a computer can understand. Think of them as a sort of translation from human language to "computer language." Multi-token embeddings: traditional models break sentences down into single words or pieces (tokens), but xLLM uses multi-token words, meaning it keeps some words or phrases together. This can improve the model's understanding and outputs, especially when integrating information from various sources.

Backend tables. These are like databases for the model, where it stores and retrieves the information needed to answer queries. xLLM uses multiple such tables, each corresponding to a different specialized model.

The concept here is to make AI more efficient, cost-effective, and adaptable to specific tasks, while also being easier to manage and understand. This approach could be particularly beneficial for businesses that need secure and efficient AI solutions tailored to their specific needs.

Carl W J Davidson

My opinion is I'm smart; you should pay attention. Whether you do, that's your opinion.

6 months

A decentralized system called xLLM. Let's break it down.

Decentralized and specialized models: traditional large language models (LLMs), like those developed by OpenAI, are centralized and quite large, which means they require a lot of computing power. The new system, xLLM, moves away from this by using a collection of smaller, specialized models that work together. This approach can be faster, use less energy, and be more efficient because each small model handles a specific task.

Benefits of xLLM. Low latency: it responds to queries faster because it doesn't have to process as much information at once. Energy and cost efficiency: since it uses less computing power (GPU), it's cheaper to run, especially since some companies charge by the amount of data processed (tokens). Local implementations: it can be set up within a company's own infrastructure, making it more secure and potentially more customizable. Explainable AI: unlike some deep learning models, often described as "black boxes" because it's hard to understand how they arrive at certain outputs, xLLM is designed to be more transparent in how it works.

Amin Zayeromali

Full Stack Data Scientist @ Factually Health | Data-Driven Leadership | NLP | DL | Python | Django | Revenue Growth | Problem Solver

6 months

Great! Thanks, Vincent Granville.

Ken Morimoto

Director Centific | MP at MVP & Leading Edge Ventures | ex-Scale, Amazon | ML Finetuning & Grounding | Community Builder | Infinite Learner & Investor

6 months
