Is the Large Language Model revolution just getting started or are we closer to the end?

People commonly assume that a breakthrough marks the beginning of a series of similar advances. Sometimes it does, but there are many cases where the opposite occurs, and it is not always easy to tell in advance which kind of breakthrough we are looking at. That said, I think we are seeing signs that LLM development, despite seeming brand new, is actually closer to the end.

Examples of just getting started and almost being done

Let’s give some examples. We can point back to 1905, Einstein’s miracle year, which was indeed the start of enormous progress in understanding both quantum mechanics and relativity, leading to the revolution of modern physics over the next century. In contrast, Einstein’s publication of General Relativity in 1916 wasn’t the beginning of major advances in understanding gravity; it mostly marked the end of the process. There are still attempts to unify gravity with quantum mechanics but, to a first approximation, they have not progressed, and gravitational theory remains where Einstein left it.

The Chinese can perhaps claim the invention of the rocket in 1232, but modern rocketry really began in 1898, when the Russian schoolteacher Konstantin Tsiolkovsky suggested that rockets using chemical combustion could be used for space exploration. Public attention on rocketry arguably peaked in 1969, when the first human set foot on the Moon. The expectation at the time was that rocketry was just getting started and that rockets were the future of human transportation: we would all be spending summer vacations on the Moon or Mars, presumably thanks to continued breakthroughs in making rockets more efficient and cheap. Obviously that didn’t happen. There have been only incremental changes in the past 50 years. For the most part, when Armstrong stepped onto the Moon, it marked the point where the significant rocketry breakthroughs would subside.

It’s difficult to say whether AI as a whole is in this process, especially since AI is so hard to define. Personally, I feel there are quite a few breakthroughs in AI still ahead. But I’d rather focus on a simpler question: are we at the beginning or closer to the end with Large Language Models (LLMs), or perhaps generative AI more generally? On this question I will take the more contrarian view that, with LLM breakthroughs, we are now closer to the end.

Why LLM research is about to saturate

It’s important to understand what generative AI actually is. Gen-AI is an unsupervised ML technique: it attempts to model the distribution of the data as closely as possible. Once that is done, we make inferences from it by applying conditioning operations, sometimes called prompts.
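
To make the idea concrete, here is a minimal toy sketch, not how real LLMs work (they use deep networks over tokens, not character counts): a character bigram model fit to a tiny invented corpus, then "prompted" by conditioning on a prefix.

```python
# Toy generative model: learn a distribution over text, then condition on
# a prompt. A character bigram model stands in for an LLM here.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat. the dog sat on the rug."

# Model the data distribution: counts approximating P(next char | current char).
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def sample_next(ch):
    """Sample the next character from the learned conditional distribution."""
    chars, weights = zip(*counts[ch].items())
    return random.choices(chars, weights=weights)[0]

# "Prompting" is conditioning: fix a prefix, then sample probable continuations.
text = "the c"
for _ in range(20):
    text += sample_next(text[-1])
print(text)
```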

Human-written language data lives in a space of enormous dimension. If there are 256 characters in the character set, then a 1000-word “essay” is roughly 6000 characters, and the number of possible character strings of that length is something like 256^6000. Of course, most of those “essays” are simply gibberish and do not even form recognizable words. So the space of intelligible essays, while still huge, is a subspace of this larger space, which we might call a lower-dimensional surface or manifold. The picture below, for example, is a representation of a 2-dimensional manifold in a 3-dimensional space (projected down again to the 2D screen).


Language can be thought of in a similar way, even though the dimensions involved are so large that we can’t begin to visualize them. But again, what we are attempting to do is approximate that language manifold, and we do that using deep learning neural networks.
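
As a quick sanity check on the scale of the 256^6000 figure above, here is a back-of-envelope calculation (the six-characters-per-word figure is an assumption):

```python
import math

# Number of raw 6000-character strings over a 256-symbol alphabet:
# 256**6000. Far too large to print, but its magnitude is easy to get:
digits = 6000 * math.log10(256)  # log10(256**6000)
print(f"256**6000 has about {digits:.0f} digits")  # ~14449
```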

Now let’s take the above surface as an example. We might try to approximate it with some familiar functions such as polynomials, rational functions, sinusoids, etc. Perhaps an approximation looks like this.

This approximation can be quite a bit off and still be useful, but there is value in finding better, more complex functions and methods for training their parameters. The closer we get to the true manifold, the better our inferences will be when we condition on it through prompts.
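
To make the analogy concrete, here is a sketch of that kind of approximation: a low-degree polynomial fit by least squares to samples from a stand-in "true" surface (the target function, sample count, and degree are all invented for illustration):

```python
# Sketch: approximate an unknown surface z = f(x, y) with a degree-3
# polynomial fit by least squares.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(-2, 2, (2, 500))  # 500 sample points
z = np.sin(x) * np.cos(y)            # an arbitrary stand-in "true" surface

def features(x, y, degree=3):
    """Design matrix of polynomial terms x**i * y**j up to total degree."""
    cols = [x**i * y**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=-1)

A = features(x, y)
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)

# Evaluate the approximation at a held-out point.
print(features(np.array([0.5]), np.array([0.5])) @ coeffs)
# roughly sin(0.5)*cos(0.5) ≈ 0.42
```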

But this is only true up to a point. First of all, the data that defines an approximate surface may be large, but it is still finite. We can’t really know what the real manifold is doing at higher frequencies than our data samples. For example, with the surface above, we could not reconstruct it very well with just 100 points; our reconstruction might look more like the second surface.

As we get more and more points the reconstruction will get closer to the real surface, but at some number of points this improvement will begin to saturate. For example, the surface above can probably be modeled very well with 100,000 points. Going to a million points will give only a small improvement. Continuing to a billion or a trillion points will lead to almost no improvement at all.
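
A sketch of this saturation effect in one dimension (the curve, model class, and sample counts are all illustrative):

```python
# Fit error vs. number of training samples: it drops quickly at first,
# then flattens once the model class, not the data, is the bottleneck.
import numpy as np

rng = np.random.default_rng(1)

def fit_error(n_samples, degree=5):
    x = rng.uniform(-2, 2, n_samples)
    coeffs = np.polyfit(x, np.sin(2 * x), degree)  # fit the "true" curve
    x_test = np.linspace(-2, 2, 1000)
    return np.abs(np.polyval(coeffs, x_test) - np.sin(2 * x_test)).mean()

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(n, fit_error(n))  # improvement saturates after the first few steps
```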

To make matters worse, the data usually contains what we might call noise. In terms of LLM training data, not everything humans have written is correct or useful. So even if you could reconstruct the exact training-sample answers to a question like “Why is America so wealthy?”, it doesn’t mean you will get something that can be considered absolutely true or maximally useful. To quote Jeffrey Lebowski: “That’s just, like, your opinion, man.” Fitting this noise exactly is what we call overfitting. We want models to learn a smoother, more generalizable model than the one literally represented by the data, so that when our prompt falls between actual data points, the model can interpolate and give a sensible answer.
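
A minimal sketch of that trade-off: an unregularized high-degree fit chases the noise, while a ridge-regularized fit of the same features stays smoother and interpolates more sensibly between samples (the signal, noise level, and penalty are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(0, 0.3, x.shape)  # signal plus "opinion" noise

# High-degree unregularized fit chases the noise ...
loose = np.polyfit(x, y, deg=15)
# ... while a ridge penalty on the same polynomial features stays smooth.
A = np.vander(x, 16)
ridge = np.linalg.solve(A.T @ A + 1e-2 * np.eye(16), A.T @ y)

x_new = 0.05  # a "prompt" that falls between the data points
print(np.polyval(loose, x_new), np.polyval(ridge, x_new), np.sin(3 * x_new))
```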

For this reason, there is going to be a point where models are complex enough to fit the given data accurately, and there may be almost nothing to gain from coming up with “better” models. In addition, it may be that we have, for the most part, already used all the useful data: if new data is not very different from what your model already predicts, it is essentially redundant and adds no new information.

Are we there yet with LLMs?

Let’s try to answer the question of whether LLMs are getting close to this point of saturation. If they were, there are certain signs we would expect to see. The most obvious is that it would become hard to make them perform better, because they are already close to the exact distribution of the data. Multiple researchers building different models would find that their models start giving identical answers, or at least answers that are more or less equivalent.

Is this what we see? What actually inspired me to write this article now is a comment, allegedly made by an OpenAI employee, that has been going around on social media. I guess the source is here. I’ll take for granted that it is legitimate. Here are some snippets.

“It’s becoming awfully clear to me that these models are truly approximating their datasets to an incredible degree … pretty much every model with enough weights and training time converges to the same point. …

This is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else.”

As I have explained, this is actually not a surprising thing; or at least, it is what you would expect to happen eventually. The fact that it is happening now might be surprising, since LLMs seem quite new. There is other evidence of this besides this alleged quote.

The limiting characteristic of LLMs

If the insight expressed by this OpenAI employee is correct, we are beginning to saturate: we have enough model complexity and training ability that we are close to being done modeling the human language manifold. If so, we are close to the end.

If we can accurately model the language manifold, then what we can do with it is limited by the intrinsic tightness of the probability distribution. That is, when we prompt an LLM, we ask it to sample points from the conditional language manifold. We interpret these as “answers” to a question, but technically they are probable continuations based on the written data the model has been trained on.

It can give us an unlimited number of “answers”, but many of them will be redundant. Other times we might recognize that an answer is coherent but contradicts another answer it gave before to the same prompt. We shouldn’t be surprised by this: people have different opinions. A question like “Why is America so rich?” is controversial, and that will be reflected in the data as many different, conflicting answers. Since the approximated distribution is likely to be smoother than the data, it will tend to give a kind of consensus answer more often than not; an LLM will not be as opinionated as a typical person responding to the question.

But if there are conflicting answers even across successive samples from the very same model, which of them is right? The model cannot answer that. It is simply telling us about the diversity of opinions people hold, and perhaps averaging them together to some degree.
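
A toy numeric analogue of this: when the data contains two conflicting "answers", successive samples contradict one another, and the average is a consensus that almost no individual sample actually expresses.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two conflicting "opinions" centered at -1 and +1, equally common in the data.
n = 10_000
opinions = np.where(rng.random(n) < 0.5,
                    rng.normal(-1, 0.1, n),
                    rng.normal(+1, 0.1, n))

print(opinions[:5])     # successive samples disagree with each other
print(opinions.mean())  # ~0.0: a "consensus" held by almost nobody
```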

Goodbye LLM research, we hardly knew ye

LLM research only started a few years ago. Am I saying that we are basically done? Well, not exactly. I’m saying that models are not likely to improve substantially through clever model redesigns or by finding more of the same kind of data to include. But there is still plenty to do to turn LLMs into more useful products.

There are so many things that can be done that I can’t possibly list them all, or even think of them all. We might, for example, make improvements through better prompting. What better prompting actually means is that we are injecting more of our own knowledge (or preferences) as prior information; this is a common way to get more out of Bayesian models. For example, if we want an essay on “Why America is rich” from a traditional Western capitalist view, we can specify that. Or we can instead take a Marxist view, or base the essay on America’s fortunate geography or its luck in avoiding the worst of the World Wars. Such essays, regardless of whether they are “right” or “wrong”, are likely to be more coherent and less vague and indecisive.
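
As a sketch of what injecting that prior information can look like with the OpenAI Python client (the model name is a placeholder and an API key is assumed to be configured; the same idea applies to any chat-style LLM API):

```python
# Injecting prior information through the prompt: the system and user
# messages pin down a perspective the bare question leaves open.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are an economic historian writing from a "
                    "traditional Western capitalist perspective."},
        {"role": "user",
         "content": "Write a short essay: why is America so wealthy?"},
    ],
)
print(response.choices[0].message.content)
```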

Likewise, we can get more useful LLMs for a given purpose by limiting the training data, or perhaps weighting it appropriately. We can make improvements by removing what we believe to be incorrect or (in our opinion) “biased” writing samples. I wrote one of my humorous pieces about a jailbroken ChatGPT that had waded into the low-quality information landscapes of the world, such as Reddit conspiracy forums and Tinder chats, and turned into a belligerent, foul-mouthed monster.
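
What weighting the data might look like during training is sketched below in PyTorch; the weights, shapes, and the idea of a per-source trust score are hypothetical illustration, not anyone's actual training recipe.

```python
# Per-sample weighting of the training loss, e.g. down-weighting
# low-quality sources and up-weighting curated writing.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)  # outputs: 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))             # next-token targets (toy sizes)
weights = torch.tensor([1.0, 1.0, 0.2, 0.2])     # hypothetical trust per source

per_sample = F.cross_entropy(logits, targets, reduction="none")
loss = (weights * per_sample).mean()  # low-trust samples pull less on the model
loss.backward()
```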

To be honest, we have hardly begun to figure out how to use LLMs in actual applications. ChatGPT is a useful general-purpose tool for information retrieval and other tasks. ChatGPT and similar tools are already reducing the time required to write software, particularly less original kinds of software (which, unfortunately or fortunately, describes a lot of current business software). However, there is a larger hope of incorporating these tools into more common business workflows. While I expect this will be possible in some cases, I have more reservations than most people about whether it will have broad success, based on the fact that I actually do this for a living.

Gen-AI in general?

I am less confident saying that Gen-AI in general is close to the end. We have modeled language well, and we have also modeled the visual manifold well: tools like Midjourney do an amazing job creating art and graphics. The biggest gains there will be similar to those for LLMs; we need to improve how we inject prior information and preferences, and develop a better API for gaining more control over the output. Currently in Midjourney, that control is limited (last I checked) to changing the prompt or hitting buttons to create different variations. We aren’t yet at the point where we can say “Keep that the same but make the monkey’s hat green” or “That’s good, but make it a sunny day instead of cloudy”. That is still a job for manual work in Photoshop, or for fiddling with prompt changes and dealing with other unexpected variations. But we can’t be far away. AI-created art and graphics will soon be more of a skill than a game of persistence, and companies will create some excellent products around it.

But there is more to Gen-AI than language and art generation. Gen-AI can be used on any data. It can be used to come up with new molecules or materials, some of which can be actualized and used to create real value. There are likely to be applications we can’t even think of right now. If we haven’t even decided on the data sets, we can’t claim it is close to being done.

Conclusion

I think there is a very good chance, as strange as it sounds, that LLM research has already made its major breakthrough, and that future progress will be incremental rather than revolutionary. AI, however, is a much bigger field, and I expect continued research and progress on all sorts of things. The current paradigm, pattern recognition, will continue to be the leading methodology for perhaps another decade. After that, if there is much progress, I expect it to come from other areas of AI, some of which have fallen out of favor or just briefly taken a back seat (e.g. reinforcement learning). Or perhaps we will finally figure out how to combine multiple paradigms into a larger system, similar to how the human brain appears to work. But LLMs are probably approaching the point of being a solved problem, and the focus will shift away from making them more “intelligent” and toward making them more useful.

Comments

Some good insights and sound thinking on limits here. Good writeup! And the illustrations of manifolds are really helpful. Any visual model other than the ever-present "porcelain Terminator" is, really. Subbarao Kambhampati has (what look to this layman like) excellent thoughts on architecture, putting language in relation to other components of synthetic intelligence. As for "real value" -- really? Isn't that saying that what you just wrote lacks real value...?

Philippe Racicot
Data scientist at Beneva

Good! We might need a break to catch up :)

Riza C. Berkan, Ph.D
Artificial Intelligence, Physics, Nuclear Eng, Startups, Investments

Statistical linguistics has already shown this up-and-down popularity curve in the past. We are reliving the same false prospect thanks to large tech corporations senselessly marketing their cloud services.

Thomas W. Dinsmore
I write about machine learning tools and software

Maybe somewhere in the middle.

Thomas Wiecki, PhD
Solving business problems using Bayes @ PyMC Labs

What I find very surprising is that just by modeling statistical regularities in language and text we get something that looks surprisingly intelligent. But it is indeed a good question whether the intelligence achievable through modeling language is bounded.
