Generative AI, LLMs and Vectors - a Primer

Like most sensible people in tech, I’ve been researching and playing with new generative AI tools and approaches. There’s a lot of confusion out there, so I’m writing up this simple primer to clarify a few things.

It’s all new, and I’m not an expert - some of this may even be wrong - but I’ve distilled some core concepts and ideas to help us all learn.

First, some definitions

  • Generative AI is artificial intelligence that generates responses, rather than simply consuming data. So figuring out which web pages talk about politics (tagging/categorization) is not generative. But creating a web page summarizing political news for the day is generative.
  • Large Language Models (LLMs), this year’s big innovation, are a subcategory of generative AI, and include ChatGPT. These models “understand” (encode and use) natural language questions and generate natural language responses. They are trained on massive amounts of text data (using massive amounts of GPU compute), so they are expensive to build but remarkably cheap to run.
  • Vector Databases are very hot right now, because LLMs’ internal layers, and other AI techniques, produce vectors. With all these vectors, you need somewhere to store them and a way to query for similar vectors.
  • Embedding is the process of converting a chunk of text (or a single word, or an image, or a sub-graph in a graph database) into a numeric vector that represents it. LLMs do this internally, and you can also ask some of them to do this explicitly - that is, an LLM or its underlying engine can produce a vector representing an input or output, rather than generating a natural language answer (see the sketch after this list).
  • Text Embeddings are now typically “Semantic,” meaning they represent concepts and meaning in the text, rather than words.
  • A Model is, roughly, the set of weights used in one of these seemingly-magical AI systems like LLMs. These weights control neural networks, and (imperfectly) distill and represent all the information in the vast set of training inputs (books, articles, etc.) that the AI was trained on. Even LLM models are not that big, and once trained, they can be shared easily and used (“run”) at low cost. It’s the training that is hard and expensive.
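
Since embeddings come up constantly below, here’s a minimal sketch of producing one explicitly, using the open-source sentence-transformers library and a small pre-trained model (my choice of library and model here is just one convenient option, not the only way):

```python
# A minimal sketch of explicit embedding, assuming the open-source
# sentence-transformers package is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small pre-trained embedding model

sentences = [
    "The senator introduced a new bill today.",
    "Congress debated the proposed legislation.",
    "My cat knocked a glass off the table.",
]
vectors = model.encode(sentences)  # one fixed-length numeric vector per sentence

print(vectors.shape)  # (3, 384): three sentences, 384 numbers each
```

Note that the two politics sentences come out with similar vectors even though they share almost no words - that is the “semantic” part.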


What do LLMs do, and how do they work?

LLMs are often described as auto-complete on steroids. That is insightful and true, but also misleading. If you use your phone’s auto-complete, start with a single letter, and keep accepting autocomplete recommendations, you’ll get text. I did this starting with “LLM” on my own phone, and I got:

“LLMs are statistical analysis of the addressee and may contain legally required information from the sender”

What’s important here is that every group of 2-3 words makes sense, but the overall sentence is gibberish. Auto-complete has no sense of the context, and only generates words based on the prior word (or two). LLMs fix this by looking at a much broader context, and by identifying abstract concepts that guide the overall response. An LLM even generates a good overall structure for an entire paper or document, because its ability to capture abstract concepts and big-picture relationships among text sections is so good. I won’t get into complex technical processes much, but know that “feed-forward” and “self-attention” are two underlying AI techniques that enable LLMs to use broader context to get things right.
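
To make the contrast concrete, here is a toy sketch of what phone-style autocomplete is doing; the tiny corpus is made up for illustration, and the “model” sees only the single previous word:

```python
# Toy "autocomplete": the next word is chosen using only the single previous
# word, with no wider context -- which is why long outputs drift into gibberish.
import random
from collections import defaultdict

corpus = "the cat sat on the mat the dog sat on the rug the cat ran".split()

# Count which words have been seen following each word (a bigram table).
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

word, output = "the", ["the"]
for _ in range(8):
    candidates = following.get(word)
    if not candidates:  # dead end: this word was never seen mid-sentence
        break
    word = random.choice(candidates)  # context = one word, nothing more
    output.append(word)

print(" ".join(output))  # locally plausible pairs, no overall plan
```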


Encoding, Decoding and Generation

LLMs would not exist without GPUs. GPUs are vector and matrix-based computing chips that perform millions of operations in parallel. But they need vector and matrix data to perform at their best.

Encoding is the process of taking some input (such as a question or prompt) and converting it into the vector data formats that GPUs need to do their magic.

Decoding is then taking these vector outputs of early LLM processing phases (or other AI processes) and turning them into more vectors that represent the structure and likely words in the output or answer.


LLMs, in particular, decode by generating “output probability distribution” vectors that represent certain words and perhaps certain concepts, and the probabilities of them being part of a good answer. Intuitively, you or I do something similar when considering what to say before we speak - certain words and word senses are top of mind depending on what we are thinking about, and what we are about to say. Decoding in LLMs also uses feed-forward and self-attention to generate words that have good overall structure, rather than gibberish like my phone produced above using simplistic type-ahead.

Generation, finally, completes with token-to-word conversion, where the vectors of likely words from the decoding are turned into human-readable text.
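
Here is a heavily simplified sketch of that last step, with a made-up five-word vocabulary standing in for the tens of thousands of tokens a real LLM scores:

```python
# Heavily simplified final step of decoding/generation: turn a vector of raw
# scores (logits) over the vocabulary into probabilities, then pick a word.
import numpy as np

vocabulary = ["cat", "dog", "ran", "sat", "quickly"]
logits = np.array([2.1, 0.3, 1.7, 0.2, -1.0])  # made-up decoder output scores

probs = np.exp(logits - logits.max())  # softmax (max subtracted for stability)
probs /= probs.sum()

greedy_word = vocabulary[int(np.argmax(probs))]       # most likely next word
sampled_word = np.random.choice(vocabulary, p=probs)  # or sample for variety

print(dict(zip(vocabulary, probs.round(3))))
print(greedy_word, sampled_word)
```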


Vectors

Encoding produces vectors, and can produce a single vector representing a large, complex thing such as a book, a question, or an image. Vectors are therefore suddenly far more plentiful and easier to get than they were before this phase of the AI revolution.

We can use these vectors to figure out - very quickly on any computer, and lightning fast using GPUs - what is similar to what. If two vectors are numerically similar (e.g. their dot product is high) then the items the vectors represent are also probably similar.
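
As a quick sketch, here is cosine similarity (a length-normalized dot product) over made-up vectors:

```python
# Cosine similarity: the dot product of two vectors after normalizing their
# lengths. Values near 1 mean "very similar," near 0 mean "unrelated."
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 4-dimensional embeddings (real ones have hundreds of dimensions).
politics_article = np.array([0.9, 0.1, 0.0, 0.3])
election_article = np.array([0.8, 0.2, 0.1, 0.4])
cookie_recipe    = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(politics_article, election_article))  # high: similar
print(cosine_similarity(politics_article, cookie_recipe))     # low: dissimilar
```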

So we can look at books or furniture descriptions or recipes or people, somehow encode them all, and find similar items fast. The encoding is pretty fast, and the retrieval is even faster. The training is slow and expensive, but even that is much better now due to GPUs.

Vector databases are specialized to store all these vectors and quickly return the set of vectors most similar to any given vector.
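
A toy, brute-force version of what a vector database does is below; real systems use approximate-nearest-neighbor indexes to scale to millions of vectors, and this class is illustrative, not any particular product’s API:

```python
# A toy "vector database": store vectors, return the k most similar to a query.
import numpy as np

class ToyVectorStore:
    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(np.asarray(vector, dtype=float))

    def query(self, vector, k=2):
        matrix = np.stack(self.vectors)
        # Normalize rows and the query so dot products become cosine similarities.
        matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
        q = np.asarray(vector, dtype=float)
        q = q / np.linalg.norm(q)
        scores = matrix @ q
        best = np.argsort(scores)[::-1][:k]  # indexes of the k highest scores
        return [(self.ids[i], float(scores[i])) for i in best]

store = ToyVectorStore()
store.add("mystery-novel", [0.9, 0.1, 0.2])
store.add("thriller",      [0.8, 0.3, 0.1])
store.add("cookbook",      [0.1, 0.9, 0.7])
print(store.query([0.85, 0.2, 0.15]))  # nearest: the two similar books
```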


Knowledge graphs and graph databases

I work in the field of graph databases at Dgraph Labs, so maybe I’m a little biased. LLMs produce a new kind of graph, with a new(ish) kind of relationship: the fuzzy, unreliable relationship. As covered below, LLMs can hallucinate, but are also usually right, and derive or predict statistically likely information. So LLMs are producing vast amounts of unreliable, yet immensely useful, information. Kind of like many people you know.

Databases generally, and knowledge graphs in particular, used to store almost only reliable, curated data. Soon, they will be augmented with statistically likely or “fuzzy” presumptions. The explosion of such data means that dealing with similarity and uncertainty is now vitally important, and data modeling and systems need to distinguish between what is absolutely true, and what was inferred or predicted by AI.
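
One minimal way to model that distinction is to attach provenance and confidence to every relationship; the field names here are hypothetical, just to illustrate the idea:

```python
# A sketch of distinguishing curated facts from AI-inferred ones in a graph:
# every edge carries its provenance and a confidence score.
from dataclasses import dataclass

@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    source: str        # "curated" vs. "llm_inferred"
    confidence: float  # 1.0 for asserted facts, <1.0 for predictions

graph = [
    Edge("PartX", "compatible_with",  "AssemblyY", source="curated",      confidence=1.0),
    Edge("PartX", "often_bought_with", "PartZ",    source="llm_inferred", confidence=0.72),
]

# Queries can then choose: only hard facts, or include likely-but-fuzzy edges.
hard_facts = [e for e in graph if e.source == "curated"]
plausible  = [e for e in graph if e.confidence >= 0.7]
print(hard_facts, plausible, sep="\n")
```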


Free, shared and downloadable “models”

One great thing about AI generally and LLMs in particular is that a few large companies can train models using massive computing resources, and then everyone can use those models to embed information into vectors, or perform text generation, using far less expensive servers.

For instance, Facebook (Meta, technically) open-sourced one of its pre-trained models for anyone to download and use. The full version is 240GB, and less accurate versions are a few GB.
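
As a sketch of this download-once, run-cheaply pattern, here is the Hugging Face transformers library running the small open GPT-2 model (a stand-in here; downloading and running other open models follows the same shape):

```python
# Download a pre-trained open model once, then generate text locally.
# Assumes `pip install transformers torch`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # fetches pre-trained weights

result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```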


Some tech details on how LLMs work

You can skip this if you want, but here’s a pretty famous diagram (below) illustrating the vocabulary and processes I covered above. It’s from “Attention is All You Need,” and while a bit complex, it shows the architecture of a typical LLM and puts it all in context.


Here is where the concepts are shown in the diagram, and in what color:

  1. Embeddings immediately turn text into vectors (pink “Input Embedding” boxes)
  2. Self-attention is used to represent context from the entire text input, not just one word at a time (orange “Multi-Head Attention” boxes).
  3. Feed forward takes some vectorized intermediate results, which likely represent higher-level concepts that guide the processing, and uses those abstract concepts to influence later, more refined steps (blue “Feed Forward” boxes).
  4. Training vs generation. The lower few steps on the right-hand side process “outputs.” It’s weird to see “outputs” fed in as inputs! These are the desired outputs supplied with the items to learn from during the training process. We have not really talked about training, so you can ignore the lower-right section of the diagram for this article.
  5. Encoding and Decoding. The boxes on the left are encoding: turning the input into vectors with multiple layers of neural network processing. The boxes on the right are decoding: turning the vectorized input into a vectorized result that is not yet consumable as text.
  6. Generation. The “Linear” and “Softmax” boxes up top represent how the vectors representing the output are turned into text by choosing likely words that go together well in the output.

[Diagram from “Attention is All You Need”: steps and flows within an LLM’s processing]

I have not dug that deeply into the paper laying out this architecture, so I’m sure there are subtle errors in all this, but at this high level, I hope this clarifies what is happening inside LLMs.
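
For the curious, here is a bare-bones numeric sketch of the scaled dot-product attention those orange boxes compute; real models add learned projection matrices, multiple heads, and many stacked layers:

```python
# Scaled dot-product attention: each position scores every other position,
# then takes a weighted blend of the whole input.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how much each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # context-aware mix of the value vectors

# Four "words", each represented by a made-up 3-dimensional vector.
X = np.array([
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.3],
])

# Self-attention: the sequence attends to itself (Q = K = V = X).
print(attention(X, X, X))  # each output row blends context from similar rows
```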


Langchain and other frameworks - a notional example

Often, LLMs and other AI don’t do exactly what people want out of the box. New tools - perhaps chiefly LangChain - help people string together a bunch of steps to accomplish larger tasks using many AI components.

Consider writing a children’s story that includes some images. The prompt could be: “Write a story about a big dog that chases a cat, but then the cat is saved by a rabbit.”

One might want to:

  1. Find some prompt templates that will help, based on the user’s original prompt alone, or incorporating the user’s profile and history.
  2. Embed the original prompt in a larger prompt.
  3. Generate the story.
  4. Generate a 10-word summary of the story.
  5. Feed the 10-word summary to an image generator to get an illustration.

To do this, you can create a vector representing a user’s history, and use that to look up similar users. Then use a graph database holding a knowledge graph of themes and concepts to find the topics and themes of those users’ favorite stories.

Then create a larger prompt using this template, which ensures proper length and reading level:

Use 3rd grade vocabulary. Emphasize the themes of ${themes for favorite stories}. Keep content length to 350-400 words. Write a story about ${original user prompt}.

Etc.

The point is that this requires creating a vector for every user (an embedding of users from a graph database), finding similar users (a similarity search in a vector DB), creating a custom prompt, generating a story, summarizing the story, and generating an image. That’s where frameworks like LangChain come in.
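
Here is a plain-Python sketch of that chain; every helper function is a hypothetical stub standing in for a real service call, and this illustrates the chaining idea rather than the actual LangChain API:

```python
from string import Template

PROMPT = Template(
    "Use 3rd grade vocabulary. Emphasize the themes of $themes. "
    "Keep content length to 350-400 words. Write a story about $prompt."
)

# Stubs standing in for real services (graph DB, vector DB, LLM, image model).
def embed_user_history(user_id): return [0.1, 0.8, 0.3]
def find_similar_users(vector): return ["user42", "user99"]
def favorite_themes(users): return ["friendship", "bravery"]
def llm_generate(prompt): return f"<LLM output for: {prompt[:60]}...>"
def generate_image(description): return f"<image for: {description[:40]}>"

def write_illustrated_story(user_id, user_prompt):
    similar = find_similar_users(embed_user_history(user_id))  # vector DB step
    themes = favorite_themes(similar)                          # knowledge graph step
    full_prompt = PROMPT.substitute(themes=", ".join(themes), prompt=user_prompt)
    story = llm_generate(full_prompt)                          # LLM call #1
    summary = llm_generate("Summarize in 10 words: " + story)  # LLM call #2
    return story, generate_image(summary)                      # image generator call

story, image = write_illustrated_story(
    "user1", "a big dog that chases a cat, but the cat is saved by a rabbit")
print(story)
print(image)
```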


Benefits and limitations

Whatever I say here is going to be so very, very wrong in six months. This field is evolving very fast, and nobody can predict where it is going to go. But I’ll do it anyway.

LLMs and Generative AI can summarize and recombine existing ideas, but not generate new ideas. They are fundamentally statistical approaches to predicting a likely answer based on billions of existing inputs. They tell you what a large group of humans might have said if you surveyed them and consolidated the results.

That’s why LLMs tend to be biased, or even outright bigoted. “Garbage in, garbage out,” as they say. This bias is perhaps the most distasteful example of how existing mistakes or biases encoded in the inputs will show up in the outputs too. Again, LLMs don’t reason; they statistically summarize.


LLMs sometimes make up facts or “hallucinate.” Because LLMs produce what is statistically likely given the inputs, they can describe how things might plausibly be, rather than how they actually are.


LLMs don’t access reliable data or knowledge that well. If you ask for the average high temperature over the last week, or even for a week back when the LLM was still being trained, it won’t know. It can’t access a database to figure that out on its own. It also can’t tell you how many insurance claims your company processed last month. If you have a product catalog with various categories, it would need to be re-trained (incrementally) to know that Part X is compatible with Assembly Y, or that an extended warranty is available for an item for $12.

LLMs can be used to help generate queries to databases with this current, reliable data, if given the schema to look at, but you still need that actual data and knowledge for most business processes.
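
A sketch of that pattern: hand the LLM a schema and a question, and ask it to draft the query. The schema and the llm() call here are hypothetical:

```python
# Build a prompt that gives the model a schema so it can draft a SQL query.
SCHEMA = """
TABLE claims    (id INT, customer_id INT, amount DECIMAL, filed_on DATE)
TABLE customers (id INT, name TEXT, region TEXT)
"""

question = "How many insurance claims did we process last month?"

prompt = (
    "You write SQL. Using only this schema:\n"
    + SCHEMA
    + f"\nWrite one SQL query that answers: {question}"
)

# result = llm(prompt)  # hypothetical call to any chat-completion API; the
# model drafts something like:
#   SELECT COUNT(*) FROM claims
#   WHERE filed_on >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
#     AND filed_on <  date_trunc('month', CURRENT_DATE);
print(prompt)
```

The query still runs against your database; the LLM only helps write it, which is why you still need that actual data and knowledge.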


LLMs are slow to incorporate new information

It takes a very long time and a lot of money to re-train an LLM. And water - training GPT-3 used up about half an acre-foot of clean water to cool the machines! This is one reason ChatGPT’s training data stops in 2021 (as of June 2023).


But these issues aside, LLMs are going to be unimaginably transformative. The barrier between humans, who use natural language, and machines, which now also use natural language, has been obliterated.



Ed Shkriba

Vice President of Sales at Evolve Squads | I'm helping our customers find the best software engineers throughout Central/Eastern Europe & South America and India as well.

6 months

Damon, really interesting!

Mohamed Elattma

Engineering @ a16z

1 year

Great read :) perhaps I missed it in the article but vectors + embeddings + KGs are especially important when doing RAG (instead of fine tuning an LLM) which can help severely reduce hallucinations!

Sreenivasulu M

UiPath and AutomationEdge Expert Streamlining Operations and Enhancing Productivity as an RPA Developer at eAlliance Corp

1 year

Thanks for posting this wonderful information!

Dag Holmboe

Founder, AI, Data Science

1 year

Damon, this is really good. Thanks for posting.
