Generative AI, LLMs and Vectors - a Primer
Like most sensible people in tech, I’ve been researching and playing with new generative AI tools and approaches. There’s a lot of confusion out there, so I’m writing up this simple primer to clarify a few things.
It’s all new, and I’m not an expert - some of this may even be wrong - but I’ve distilled some core concepts and ideas to help us all learn.
First, some definitions
What do LLMs do, and how do they work?
LLMs are often described as auto-complete on steroids. That is insightful and true, but also misleading. If you use your phone’s auto-complete, start with a single letter, and keep accepting autocomplete recommendations, you’ll get text. I did this starting with “LLM” on my own phone, and I got:
“LLMs are statistical analysis of the addressee and may contain legally required information from the sender”
What’s important here is that every group of 2-3 words makes sense, but the overall sentence is gibberish. Auto-complete has no sense of the context, and only generates words based on the prior word (or two). LLMs fix this by looking at a much broader context, and by identifying abstract concepts that guide the overall response. An LLM can even generate a good overall structure for an entire paper or document, because its ability to capture abstract concepts and big-picture relationships among text sections is so good. I won’t get into the complex technical processes much, but know that “feed-forward” and “self-attention” are two underlying AI techniques that enable LLMs to use broader context to get things right.
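To make “self-attention” a bit more concrete, here is a minimal sketch of the scaled dot-product attention at its core, in plain NumPy. Real transformers add learned query/key/value projections and stack many such layers on GPUs; the function name and the tiny random inputs below are mine, purely for illustration.

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention.

    X: (seq_len, d) matrix, one embedding vector per token.
    Each output row is a weighted mix of ALL rows of X, so every
    token's representation is informed by the whole context --
    this is what lets an LLM look beyond the previous word or two.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # relevance of token j to token i
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                              # context-aware representations

# Three "tokens", each a 4-dimensional embedding (made-up numbers).
X = np.random.default_rng(0).normal(size=(3, 4))
print(self_attention(X).shape)  # (3, 4): same shape, now context-aware
```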
Encoding, Decoding and Generation
LLMs would not exist without GPUs. GPUs are vector and matrix-based computing chips that perform millions of operations in parallel. But they need vector and matrix data to perform at their best.
Encoding is the process of taking some input (such as a question or prompt) and converting it into the vector data formats that GPUs need to do their magic.
Decoding then takes the vector outputs of those early LLM processing phases (or other AI processes) and turns them into new vectors that represent the structure and likely words of the output or answer.
LLMs, in particular, decode by generating “output probability distribution” vectors that represent certain words (and perhaps certain concepts) and the probabilities of each being part of a good answer. Intuitively, you or I do something similar when considering what to say before we speak - certain words and word senses are top of mind depending on what we are thinking about, and what we are about to say. Decoding in LLMs also uses feed-forward and self-attention to generate words that have good overall structure, rather than gibberish like my phone produced above using simplistic type-ahead.
Generation finally completes with token-to-word conversion, where the vectors of likely words from decoding are turned into human-readable text.
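As a toy illustration of that last step, here is how a vector of raw scores might become an “output probability distribution” over a vocabulary, from which the next word is then sampled. The five-word vocabulary and the scores are invented for the example:

```python
import numpy as np

vocab = ["cat", "dog", "rabbit", "chased", "the"]   # toy vocabulary
logits = np.array([2.1, 0.3, -1.0, 1.5, 0.2])       # made-up raw scores

# Softmax: turn raw scores into an output probability distribution.
probs = np.exp(logits) / np.exp(logits).sum()

# Sample the next word according to those probabilities.
rng = np.random.default_rng()
next_word = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_word)
```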
Vectors
Encoding produces vectors, and can produce a single vector representing a large, complex thing such as a book, a question, or an image. As a result, vectors are suddenly far more plentiful and easier to get than they were before this phase of the AI revolution.
We can use these vectors to figure out - very quickly on any computer, and lightning fast on GPUs - what is similar to what. If two vectors are numerically similar (e.g. their dot product or cosine similarity is high), then the items the vectors represent are probably similar too.
So we can take books or furniture descriptions or recipes or people, encode them all, and find similar items fast. The encoding is pretty fast, and the retrieval is even faster. The training is slow and expensive, but even that is much better now thanks to GPUs.
Vector databases specialize in storing all these vectors and quickly returning the set of stored vectors most similar to any query vector.
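At small scale, the core operation a vector database performs is easy to sketch: below is a brute-force version using cosine similarity. The “recipe” embeddings are made-up numbers, and a real vector database would use approximate nearest-neighbor indexes to do this over billions of vectors:

```python
import numpy as np

def top_k_similar(query, vectors, k=2):
    """Return indices of the k stored vectors most similar to `query`
    by cosine similarity (dot product of unit-length vectors)."""
    q = query / np.linalg.norm(query)
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ q                        # one dot product per stored vector
    return np.argsort(sims)[::-1][:k]   # highest-similarity indices first

# Pretend these are encodings of four recipes (made-up numbers).
recipes = np.array([[0.9, 0.1, 0.0],
                    [0.8, 0.2, 0.1],
                    [0.0, 0.9, 0.4],
                    [0.1, 0.8, 0.5]])
query = np.array([0.85, 0.15, 0.05])    # encoding of a new recipe
print(top_k_similar(query, recipes))    # -> the two most similar recipes
```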
Knowledge graphs and graph databases
I work in the field of graph databases at Dgraph Labs, so maybe I’m a little biased. LLMs produce a new kind of graph, with a new(ish) kind of relationship: the fuzzy, unreliable relationship. As covered below, LLMs can hallucinate, but are also usually right, and derive or predict statistically likely information. So LLMs are producing vast amounts of unreliable, yet immensely useful, information. Kind of like many people you know.
Databases generally, and knowledge graphs in particular, used to store almost exclusively reliable, curated data. Soon, they will be augmented with statistically likely or “fuzzy” presumptions. The explosion of such data means that dealing with similarity and uncertainty is now vitally important, and data modeling and systems need to distinguish between what is absolutely true and what was inferred or predicted by AI.
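One simple way to make that distinction is to store a provenance flag and a confidence score on every relationship, so queries can separate curated facts from AI-inferred ones. This is a minimal sketch of the idea, not any particular database’s schema:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    source: str        # "curated" or "llm_inferred"
    confidence: float  # 1.0 for curated facts; the model's estimate otherwise

graph = [
    Edge("PartX", "compatible_with", "AssemblyY", "curated", 1.0),
    Edge("PartX", "similar_to", "PartZ", "llm_inferred", 0.82),
]

# Queries can then filter on provenance, e.g. keep only trusted facts:
trusted = [e for e in graph if e.source == "curated" or e.confidence > 0.9]
print(trusted)
```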
Free, shared and downloadable “models”
One great thing about AI generally, and LLMs in particular, is that a few large companies can train models using massive computing resources, and then everyone can use those models to embed information into vectors, or to perform text generation, on far less expensive servers.
For instance, Facebook (Meta, technically) open-sourced one of its pre-trained models for anyone to download and use. The full version is 240GB, and less accurate versions are a few GB.
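Using such a pre-trained model is now just a few lines of code. The sketch below uses the sentence-transformers library and a small, freely downloadable embedding model (not Meta’s model, which typically requires other tooling) to turn text into vectors:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# A small, freely downloadable pre-trained model (~80MB) that someone
# else spent the GPU budget to train; we just use it for encoding.
model = SentenceTransformer("all-MiniLM-L6-v2")

vectors = model.encode(["a big dog chases a cat",
                        "a hound pursues a kitten",
                        "quarterly insurance claims report"])
print(vectors.shape)  # (3, 384): one 384-dimensional vector per sentence
```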
Some tech details on how LLMs work
You can skip this if you want, but here’s a pretty famous diagram (below) illustrating the vocabulary and processes I covered above. It’s from “Attention is All You Need,” and while a bit complex, it shows the architecture of a typical LLM and puts it all in context.
[Diagram: the Transformer architecture from “Attention is All You Need,” with each of the concepts above highlighted in its own color.]
I have not dug that deeply into the paper laying out this architecture, so I’m sure there are subtle errors in all this, but at this high level, I hope this clarifies what is happening inside LLMs.
LangChain and other frameworks - a notional example
Often, LLMs and other AI don’t do exactly what people want out of the box. New tools - perhaps chiefly LangChain - help people string together a bunch of steps to accomplish larger tasks using many AI components.
Consider writing a children’s story with some accompanying images. The prompt could be: “Write a story about a big dog that chases a cat, but then the cat is saved by a rabbit.”
One might want to:
- personalize the story’s themes to each user’s tastes
- keep the vocabulary and length right for the reader
- illustrate the story with a generated image
To do this, you can create a vector representing a user’s history, and use that to look up similar users. Then use a knowledge graph of themes and concepts (stored in a graph database) to find the topics and themes in those users’ favorite stories.
Then create a larger prompt using this template, which will ensure proper length and reading level:
“Use 3rd grade vocabulary. Emphasize the themes of ${themes for favorite stories}. Keep content length to 350-400 words. Write a story about ${original user prompt}.”
Etc.
The point is that this requires creating a vector for every user (an embedding of users from a graph database), finding similar users (a similarity search in a vector DB), creating a custom prompt, generating a story, summarizing the story, and generating an image. That’s where frameworks like LangChain come in, as in the sketch below.
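Here is a plain-Python sketch of that whole chain. Every helper below is a made-up stand-in for a real component (vector DB, graph DB, LLM, image model); frameworks like LangChain exist precisely to provide tested versions of these pieces and wire them together:

```python
# Stand-ins for the real components; in a real pipeline each would be a
# vector DB lookup, a graph DB query, an LLM call, or an image-model call.
def get_user_vector(user_id): return [0.1, 0.9, 0.3]
def find_similar_users(vec, k=10): return ["u42", "u17"]
def query_favorite_themes(users): return ["friendship", "bravery"]
def llm_generate(prompt): return f"<LLM output for: {prompt[:48]}...>"
def generate_image(caption): return f"<image for: {caption[:48]}...>"

def write_personalized_story(user_id, user_prompt):
    user_vec = get_user_vector(user_id)           # embed the user
    similar = find_similar_users(user_vec)        # similarity search (vector DB)
    themes = query_favorite_themes(similar)       # knowledge-graph lookup
    prompt = (f"Use 3rd grade vocabulary. Emphasize the themes of "
              f"{', '.join(themes)}. Keep content length to 350-400 words. "
              f"Write a story about {user_prompt}.")
    story = llm_generate(prompt)                  # generate the story
    summary = llm_generate(f"Summarize in one sentence: {story}")
    return story, generate_image(summary)         # illustrate from the summary

print(write_personalized_story("u1", "a big dog that chases a cat"))
```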
Benefits and limitations
Whatever I say here is going to be so very, very wrong in six months. This field is evolving very fast, and nobody can predict where it is going to go. But I’ll do it anyway.
LLMs and Generative AI can summarize and recombine existing ideas, but not generate new ideas. They are fundamentally statistical approaches to predicting a likely answer based on billions of existing inputs. They tell you what a large group of humans might have said if you surveyed them and consolidated the results.
That’s why LLMs tend to be biased, or even outright bigoted. “Garbage in, garbage out,” as they say. This bias is perhaps the most distasteful example of how existing mistakes or biases encoded in the inputs will show up in the outputs too. Again, LLMs don’t reason; they statistically summarize.
LLMs sometimes make up facts or “hallucinate.” Because LLMs produce what is statistically likely given the inputs, they can provide an answer of how things might have been, rather than how they actually are.
LLMs don’t access reliable data or knowledge that well. If you ask for the average high temperature over the last week, or even for a week back when the LLM was still being trained, it won’t know. It can’t access a database to figure that out on its own. It also can’t tell you how many insurance claims your company processed last month. If you have a product catalog with various categories, the model would need to be re-trained (incrementally) to know that part X is compatible with assembly Y, or that an extended warranty is available for an item for $12.
LLMs can be used to help generate queries against databases that hold this current, reliable data, if given the schema to look at, but you still need that actual data and knowledge for most business processes.
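A minimal sketch of that pattern: give the model the schema, not the data, and run the SQL it drafts against the live database. The schema, question, and helper names here are all invented for illustration:

```python
schema = """
CREATE TABLE claims (
    id INT, filed_date DATE, amount DECIMAL, status TEXT
);
"""
question = "How many insurance claims did we process last month?"

# The LLM never sees the data itself -- only the schema. It drafts the
# query; the database (the reliable source) actually answers it.
prompt = (f"Given this SQL schema:\n{schema}\n"
          f"Write one SQL query that answers: {question}")
print(prompt)

# sql = llm_generate(prompt)   # any LLM API call (hypothetical helper)
# result = db.execute(sql)     # the trustworthy answer comes from the DB
```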
LLMs are slow to incorporate new information
It takes a very long time and a lot of money to re-train an LLM. And water - training GPT-3 used up about half an acre-foot of clean water to cool the machines! This is one reason ChatGPT’s training data stopped in 2021 (as of June 2023).
But these issues aside, LLMs are going to be unimaginably transformative. The barrier between humans, who use natural language, and machines, which now also use natural language, has been obliterated.