LLMs and Hallucinations
I'm a bit surprised that the term "hallucinations" has been used to describe the output of LLMs when they're performing out-of-sample predictions. Using the word hallucinations to describe this phenomenon obfuscates what really happens inside these neural networks.
Conceptually, an LLM is an autoencoder; to be more specific, a type of Variational Autoencoder (VAE). Essentially, an autoencoder is a network that finds a lower-dimensional representation of some input dataset, in this case a corpus of text or a series of images, that can be used to recreate the original dataset with some level of fidelity. Generally, the hope is that this lower-dimensional representation is a bit more amenable to the different tasks we have downstream.
One of the interesting things about using a neural network, and more specifically an autoencoder, is that you can obtain a lower-dimensional representation of your data and then use the decoder portion of the network to reconstruct a reasonable facsimile of the original data. In some ways this can be thought of as a lossy compression technique.
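To make that concrete, here is a minimal sketch of the kind of autoencoder that would produce a 25-dimensional embedding like the one used later in this post. The actual encoder architecture isn't reproduced here; the layer sizes and activations below are illustrative.

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Flattened 28x28 MNIST images (784 pixels) compressed to a 25-dimensional code.
input_img = Input(shape=(784,))
code = Dense(25, activation='elu')(input_img)             # encoder / bottleneck
reconstruction = Dense(784, activation='sigmoid')(code)   # decoder

autoencoder = Model(input_img, reconstruction)
embedding = Model(input_img, code)   # encoder half, reused below as `embedding`

autoencoder.compile(loss='binary_crossentropy', optimizer='rmsprop')
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=32,
#                 validation_data=(x_test, x_test))

The decoder half is what lets you go from a point in the 25-dimensional space back to something that looks like an image, which is why the whole thing behaves like lossy compression.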
The second, more important part that gives LLMs their flavor of "magic" is taking the autoencoder and applying Reinforcement Learning from Human Feedback (RLHF), which allows the embedding to be adjusted so that similar topics appear together. This is what allows code generation to work: two sets of text that are not associated with each other under a naive embedding get tweaked within the network so that they will be.
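As a toy illustration of that "tweaking" (this is not RLHF itself, just a stand-in that pulls two embeddings together with a simple distance loss on a made-up encoder and two made-up documents):

import tensorflow as tf

# Stand-in encoder and two random "documents" we want to associate.
encoder = tf.keras.Sequential([tf.keras.layers.Dense(25, activation='elu')])
doc_a = tf.random.normal((1, 100))   # pretend vectorized text A
doc_b = tf.random.normal((1, 100))   # pretend vectorized text B

optimizer = tf.keras.optimizers.Adam(1e-3)
for _ in range(200):
    with tf.GradientTape() as tape:
        # Loss shrinks as the two embeddings move toward each other.
        loss = tf.reduce_mean(tf.square(encoder(doc_a) - encoder(doc_b)))
    grads = tape.gradient(loss, encoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, encoder.trainable_variables))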
The reason I believe the hallucination problem is "baked into" LLMs comes down to the interplay of a few factors.
To illustrate this, I trained a neural network to predict the digits 0-9 from the MNIST dataset. The model was > 99% accurate on MNIST with a loss < 6.7x10^-5, so at least on its original task it's about as accurate as the data allows.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

epochs = 50
batch_size = 32

# Classifier that maps the 25-dimensional embedding to the 10 digit classes.
input_layer = Input(shape=(25,))
x = Dense(30, activation='elu')(input_layer)
x = Dense(20, activation='elu')(x)
output_layer = Dense(10, activation='softmax')(x)

prediction_model = Model(input_layer, output_layer)
prediction_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
prediction_model.summary()

# Train on the encoder's 25-dimensional representation of the images rather than raw pixels.
x_train_embedding = embedding.predict(x_train)
x_test_embedding = embedding.predict(x_test)
prediction_model.fit(x_train_embedding, y_train, batch_size=batch_size, epochs=epochs,
                     validation_data=(x_test_embedding, y_test))
Rather than using an autoencoder, I just played with the input. I tried to find the pixels that would give the highest probability of being predicted as the digit "7". I did this by using a pass-through layer that fed 1's into the network and optimizing the weights of that pass-through layer to maximize the probability, so that the optimized weights themselves become an image. The resulting image looks nothing like a 7, but when fed through the original MNIST classifier shown above it returns with very high probability that it is a 7 and nothing else.
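A minimal sketch of that pass-through trick, assuming `embedding` maps flattened 28x28 images into the 25-dimensional space used by `prediction_model` and that pixel values are normalized to [0, 1] (the sigmoid activation and optimizer settings below are assumptions, not the exact configuration used for the figure):

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Freeze the trained encoder and classifier; only the pass-through weights learn.
embedding.trainable = False
prediction_model.trainable = False

# Pass-through layer: fed a constant 1, its (sigmoid-squashed) weights act as
# the 784 "pixels" we are optimizing.
one_input = Input(shape=(1,))
pixels = Dense(784, activation='sigmoid', use_bias=False)(one_input)
output = prediction_model(embedding(pixels))

adversarial_model = Model(one_input, output)
adversarial_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Ask the optimizer to find pixels that the network calls a "7".
target = np.zeros((1, 10))
target[0, 7] = 1.0
adversarial_model.fit(np.ones((1, 1)), target, epochs=2000, verbose=0)

# Recover the optimized pixels and their embedding for the plots below.
pixel_model = Model(one_input, pixels)
adversarial_image = pixel_model.predict(np.ones((1, 1))).reshape(28, 28)
adversarial_point = embedding.predict(adversarial_image.reshape(1, 784))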
To better understand why this is the case, we can plot the output of the embedding layer. The test point isn't wildly off in the middle of nowhere; it lies close to the decision boundary, but it is also fairly far from the "central" cluster of 7's, or of any digit for that matter. Nothing in the model, however, says that data can or can't lie there.
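A rough sketch of how such a plot can be produced, squeezing the 25-dimensional embeddings down to 2D with PCA (any 2D projection would do; this may not match the exact projection behind the original figure):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the 25-dimensional embeddings down to 2D for plotting.
pca = PCA(n_components=2)
train_2d = pca.fit_transform(x_train_embedding)
point_2d = pca.transform(adversarial_point)   # the optimized "fake 7" from above

labels = np.argmax(y_train, axis=1)
plt.scatter(train_2d[:, 0], train_2d[:, 1], c=labels, s=2, cmap='tab10', alpha=0.3)
plt.scatter(point_2d[:, 0], point_2d[:, 1], c='black', marker='x', s=100)
plt.title('Training embeddings (colored by digit) vs. the optimized point')
plt.show()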
This indirectly suggests why Retrieval Augmented Generation (RAG) works so well: rather than just generating whatever output satisfies your network, you anchor the result near a document of "real data" instead of taking an arbitrary point in your embedding space. In my example, the test point, though it sits within a classification boundary and will get you a 7, is far from the central tendency of all the points that actually are 7's.
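A toy version of that anchoring idea in the MNIST setup, reusing the variables from the sketches above:

import numpy as np

# Instead of trusting an arbitrary point in embedding space, snap it to the
# nearest embedding of an actual training example and use that example instead.
def nearest_real_point(query_embedding, real_embeddings):
    # Euclidean distance from the query to every known-good embedding.
    distances = np.linalg.norm(real_embeddings - query_embedding, axis=1)
    return np.argmin(distances)

idx = nearest_real_point(adversarial_point[0], x_train_embedding)
print('Nearest real example is a', np.argmax(y_train[idx]))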
RAG, or finding the closest point in your embedding that gives you a 7, is one way of dealing with this problem. Other approaches include fitting a Gaussian mixture model on the embedding layer so you can define a boundary around your known data, and/or training with random garbage data that is not supposed to classify to anything.
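A rough sketch of the Gaussian mixture idea, using scikit-learn on the embedding layer (the choice of 10 components and the 1% threshold are arbitrary):

import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a Gaussian mixture to the training embeddings so we can score how
# plausible a new point is under the distribution of known data.
gmm = GaussianMixture(n_components=10).fit(x_train_embedding)

train_scores = gmm.score_samples(x_train_embedding)
threshold = np.percentile(train_scores, 1)   # flag the lowest 1% as suspect

point_score = gmm.score_samples(adversarial_point)[0]
if point_score < threshold:
    print("Point lies outside the region of known data; treat the '7' with suspicion.")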
However, the pernicious part of the problem is that the decision boundaries within a neural network need not be convex, so you don't know at which points they break down. While the approaches above might make the problem less apparent, they don't fully solve it. It's entirely possible that your underlying embedding looks like a Swiss roll, where a second point that is arbitrarily close in the ambient space represents a completely different topic.
Now, this realization doesn't mean that I don't believe the LLM hype. I do think LLMs represent a significant leap forward in our ability to do NLP and, as such, are incredibly useful tools. On the flip side, thinking through the problem, I do have concerns about the training data that goes into these LLMs, and perhaps, rather than focusing on ever bigger models, the competitive advantage will come down to who has curated their data better.