Transforming AI

Historically, AI has always been a subject of research. If we look back at the origins, the field essentially started with Alan Turing publishing "Computing Machinery and Intelligence" in 1950. Since then, it has been a continuous field of research without necessarily having a link to application. Key milestones include Arthur Samuel developing the first successful checkers-playing program and Joseph Weizenbaum developing ELIZA, a program simulating a Rogerian psychotherapist, in 1966. Then, in 1969, Marvin Minsky and Seymour Papert published "Perceptrons," which highlighted the limitations of simple, single-layer neural networks and shaped the field's direction for years. That was the early stage of research development. It continued right up to 1997, when IBM's Deep Blue chess program defeated Garry Kasparov. In between, John McCarthy published the paper "The Turning Point in Artificial Intelligence," and much more happened besides. But the point we want to make is that, historically, artificial intelligence has mostly been researched rather than applied. It is only after 2000 that we enter what could be called the basic chatbot era, and the early 2000s also marked the start of agents, with the DARPA Grand Challenge for autonomous vehicles (first held in 2004).

From 2001 to 2018, there were multiple innovations on the application front of AI. For example, in 2010 IBM developed Watson, a question-answering computer system, to compete on the show Jeopardy!. That was unique to IBM's research, and, following IBM, Apple released Siri in 2011. There was a transition from basic chatbots and pure research to conversational agents that required a combination of AI and machine learning. Looking back, we also find that in 2012 Geoff Hinton, Ilya Sutskever (a co-founder of OpenAI), and Alex Krizhevsky published their paper "ImageNet Classification with Deep Convolutional Neural Networks," which demonstrated the effectiveness of convolutional neural networks for image classification. That paper was seminal in many ways and also set the foundation for the work at DeepMind, which Google acquired in 2014 and whose AlphaGo defeated Lee Sedol, the world champion Go player, in 2016. This era saw a lot of application-oriented research that ended in real-world applications, but still, many of these innovations stayed in the lab and required time.


[0.0074, 0.0030, -0.0105, 0.0742, 0.0765, -0.0011, 0.0265, 0.0106, 0.0191, 0.0038, -0.0468, -0.0212, 0.0091, 0.0030, -0.0563, -0.0396, -0.0998, -0.0796, …, 0.0002]

What you see here is the word "cat," represented by about 3,000 of these numbers in an array, which is how the language model reads it. This is called a vector: a long list of numbers that is one way to represent the word "cat." Because words can be complex, language models store them as sets of numbers in a space with far more dimensions than the human mind can envision. What the model does is take words, give each word meaning by assigning it a set of numbers arranged as a vector, and then arrange these vectors relative to one another. For example, "I," "we," and "they" will be clustered together; "buy" and "two" in another group; "run," "walk," "swim," and "go" in yet another. Essentially, it captures context by giving each word its own vector, and then it keeps grouping these words together across those dimensions so that it can assign context; that is how it makes sense of a sentence. The placement and clustering of the vectors is determined by a neural network that has been trained on heaps and heaps of language. For example, such a network has seen the entirety of Wikipedia, and it has come across "Barack" and "Obama" together. It does not know who "Barack" or "Obama" is; rather, it has seen "Barack" followed by "Obama." That is how it clubs these words together, and whenever "Barack" appears, it reaches for "Obama." That is how it clusters words, and hence vectors act as good building blocks.
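To make the clustering idea concrete, here is a minimal, self-contained sketch in Python. The four-dimensional vectors are invented purely for illustration (real embeddings use hundreds or thousands of dimensions, as with the "cat" vector above), and cosine similarity is one common way to measure how close two word vectors are:

```python
# A toy sketch, not the author's actual model: invented word vectors and
# cosine similarity, showing how "nearby" vectors form the clusters described above.
import numpy as np

toy_vectors = {
    "i":    np.array([0.9, 0.1, 0.0, 0.1]),
    "we":   np.array([0.8, 0.2, 0.1, 0.1]),
    "they": np.array([0.85, 0.15, 0.05, 0.1]),
    "run":  np.array([0.1, 0.9, 0.2, 0.0]),
    "walk": np.array([0.15, 0.85, 0.25, 0.05]),
    "swim": np.array([0.1, 0.8, 0.3, 0.1]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pronouns score high against pronouns, verbs against verbs:
# this is the "clubbing together" of words described in the paragraph above.
print(cosine_similarity(toy_vectors["i"], toy_vectors["we"]))   # high (~0.98)
print(cosine_similarity(toy_vectors["i"], toy_vectors["run"]))  # much lower (~0.21)
```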

So, I would like to quote Ars Technica,

"If a language model learns something about a cat, for example, if it sometimes goes to the vet, the same thing is likely to be true for a kitten or a dog. If a model learns something about the relationship between Paris and France, for example, they share a language, there's a good chance that the same will be true for Berlin and Germany or for Rome or Italy or for Delhi and India."


Now that we understand a little about vectors and how language models represent words, we need to understand that the architecture behind today's language models was introduced by Google researchers in the seminal 2017 paper "Attention Is All You Need," which introduced the transformer. What a transformer does is take these word vectors and pass them through a stack of transformation layers, each of which enriches the vectors with context, for instance, the context that "Barack" tends to come before "Obama." Context is added layer by layer. To take one example among many, consider the words "wants" and "cash": each can be either a verb or a noun, and the layers keep adding context to these words, which are presented to the network as vectors, to resolve such ambiguities. That is how the model predicts the next word that will follow. The most powerful version up until 2020, GPT-3, had 96 of these layers working to find the next word, and its word vectors had 12,288 dimensions; that is, a single word is represented by a list of 12,288 numbers, about 20 times more than Google's word2vec scheme. These extra dimensions give the transformer model more room to capture context and predict a better next word, as in the "Barack"/"Obama" example we discussed. This transformer model changed the way AI is applied, and it led to the wave of applied AI, or rather applied gen AI, companies that we are currently studying.
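As a rough illustration of what one of these layers does, here is a minimal sketch of scaled dot-product attention, the core operation introduced in "Attention Is All You Need." This is not GPT-3 itself: real models stack many such layers (GPT-3 uses 96) with learned weights, multiple attention heads, and 12,288-dimensional vectors, whereas the sizes and random weights below are toy values chosen only to show the mechanics:

```python
# A toy sketch of scaled dot-product attention; dimensions and weights are
# illustrative only, not a real trained model.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each word's query is compared with every word's key; the resulting weights
    decide how much of every word's value gets mixed into its new, context-aware vector."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # relevance of each word to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the sentence
    return weights @ V                                          # context-enriched vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                      # e.g. five toy word vectors for one short sentence
x = rng.normal(size=(seq_len, d_model))       # stand-in word vectors (random here)

# Projection matrices are learned during training in a real model; random here.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)   # (5, 16): one context-aware vector per word
```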



Lastly, if we study indifference curves and the Leontief utility function, we usually see an explosion of a good due to innovation, constrained by its capital requirement. Over time, the good declines in value due to saturation and availability, and its complements (this is where the utility function comes in) see an increase in demand. The parallel we want to draw is with foundational LLMs and how they will continue to pave the way for better and cheaper AI infrastructure, leading to penetration and usage. If we were to look back, a perfect example of this would be the internet: as people began to browse it, the problem became search and discovery, which Google enabled. Hence, in the context of AI as well, we believe that following the transformer architecture, LLMs are now cheaper and faster to train, and the resulting models perform better. This combination, without compromising performance, enables better innovation across a variety of use cases: copywriting, legal aid, writing, meeting summaries, research, translation, medical aid, and drug discovery. These are the areas where transformer technology could be applied to significant effect, and these complementing categories are where we will need to build.
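For readers who want the formula, the Leontief (perfect-complements) utility function referenced above is usually written as below; the proportion parameters a_1 and a_2 are standard textbook notation, not figures from this article:

```latex
% Leontief (perfect-complements) utility over two goods x_1 and x_2,
% e.g. foundational model capacity and the application layer built on top of it.
U(x_1, x_2) = \min\!\left(\frac{x_1}{a_1}, \frac{x_2}{a_2}\right)
```

Because utility rises only when both goods grow in the right proportion, cheaper and more abundant foundational models create value only insofar as the complementary application layer grows with them, which is the argument for building in those categories.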


References:

  1. "Attention is All You Need" - Link
  2. Ars Technica - Link
  3. "The billion-dollar question: differentiating AI from SaaS" - Link
  4. "Will AI kill vertical SaaS?" - Link
  5. "AI-enabled SaaS vs. Moatless AI" - Link

Baba Prasad Nath

VC at WaterBridge Ventures | Ex-Ankur Capital, Lok Capital | Ashoka University
