NVIDIA CEO Jensen Huang Keynote at COMPUTEX 2024

It is great to be back, I suppose, here. Last time I was here, I received a degree from NTU and I gave a speech. Today we have a lot to cover, so I can't walk, I must run. We have a lot to cover.

I'm very happy to be here in Taiwan. Taiwan is the home of our treasured partners. This is, in fact, where everything we do begins; our partners and ourselves take it to the world. Taiwan and our partnership have created the world's AI infrastructure.

Today, I want to talk about a few things: what is happening, and the meaning of the work that we do together; what generative AI is, and its impact on our industry and on every industry; a blueprint for how to go forward and engage this incredible opportunity; and what's coming next. Generative AI and its impact, our blueprint, and what comes next: it's a really, really exciting time, a restart of our computer industry, an industry that you have formed, an industry that you have created, and now you're prepared for the next major journey.

But before we start, NVIDIA lives at the intersection of computer graphics, simulation, and artificial intelligence. This is our soul. Everything I show you today is simulation. It's math, it's science, it's computer science, it's amazing computer architecture. None of it is animated, and it's all homemade. This is our soul, and we put it all into this virtual world we call Omniverse. Please enjoy.

I want to speak to you in Chinese, but I have so much to tell you, and I have to think too hard to speak Chinese, so I have to speak to you in English. The foundation of everything you just saw is two fundamental technologies: accelerated computing and artificial intelligence, running inside Omniverse. Those two fundamental forces of computing are going to reshape the computer industry.

The computer industry is now 60 years old. In a lot of ways, everything that we do today was invented the year after my birth, in 1964. The IBM System/360 introduced central processing units, general-purpose computing, the separation of hardware and software through an operating system, multitasking, I/O subsystems, DMA, all kinds of technologies that we use today; architectural compatibility, backwards compatibility, family compatibility. All of the things we know today about computing were largely described by 1964. Of course, the PC revolution democratized computing and put it in the hands, and in the houses, of everybody. And then, in 2007, the iPhone introduced mobile computing and put the computer in our pocket; ever since, everything is connected and running all the time through the mobile cloud. In those 60 years, we saw just a few, not that many, two or three major technology shifts, two or three tectonic shifts in computing where everything changed, and we're about to see that happen again. There are two fundamental things happening. The first is that the processor, the engine by which the computer industry runs, the central processing unit, has slowed tremendously in performance scaling, and yet the amount of computation we have to do is still doubling very quickly, exponentially.

If processing requirements and the data we need to process continue to scale exponentially but performance does not, we will experience computation inflation. And in fact, we're seeing that right now, as we speak. The amount of data center power used all over the world is growing quite substantially, the cost of computing is growing: we are seeing computation inflation.

This, of course, cannot continue. The data is going to continue to increase exponentially, and CPU performance scaling will never return. There is a better way. For almost two decades now, we've been working on accelerated computing. CUDA augments a CPU, offloading and accelerating the work that a specialized processor can do much better. In fact, the performance is so extraordinary that it is very clear now, as CPU scaling has slowed and, as I mentioned, substantially stopped, that we should accelerate everything. I predict that every application that is processing-intensive will be accelerated, and surely every data center will be accelerated in the near future. Accelerated computing is very sensible; it's very common sense. If you take a look at an application, and here the 100T means 100 units of time, it could be 100 seconds, it could be 100 hours, and in many cases, as you know, we're working on artificial intelligence applications that run for 100 days.

The 1T is code that requires sequential processing, where single-threaded CPUs are really quite essential: operating systems, control logic. It is really essential to have one instruction executed after another instruction. However, there are many algorithms you can operate completely in parallel. Computer graphics is one: image processing, physics simulations, combinatorial optimizations, graph processing, database processing, and of course the very famous linear algebra of deep learning. There are many types of algorithms that are very conducive to acceleration through parallel processing. So we invented an architecture to do that: by adding the GPU to the CPU, the specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast. And because the two processors can work side by side, both autonomous, both separate and independent, we can accelerate what used to take 100 units of time down to one unit of time. Well, the speedup is incredible. It almost sounds unbelievable, but today I will demonstrate many examples for you. The benefit is quite extraordinary: a 100 times speedup while you only increase the power by about a factor of three, and you increase the cost by only about 50%. We do this all the time in the PC industry. We add a $500 GeForce GPU to a $1,000 PC, and the performance increases tremendously. We do this in a data center: to a billion-dollar data center, we add $500 million of GPUs, and all of a sudden it becomes an AI factory. This is happening all over the world today. Well, the savings are quite extraordinary: 100 times speedup while you only increase your power by 3x, 100 times speedup while you only increase your cost by 1.5x. The savings are incredible.
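As a rough sanity check on those figures, here is the arithmetic as a minimal sketch; the 99%-parallelizable split is our illustrative assumption, not a figure from the talk:

```python
# Illustrative arithmetic for the accelerated-computing claim above.
# Assumption: the application spends 99 of its 100 time units in
# parallelizable work that the GPU absorbs, leaving 1 unit on the CPU.
total_time = 100.0       # "100T" units of time on the CPU alone
accelerated_time = 1.0   # after offloading the parallel 99T to the GPU
speedup = total_time / accelerated_time   # 100x

power_factor = 3.0       # system draws ~3x the power with GPUs added
cost_factor = 1.5        # system costs ~1.5x with GPUs added

print(f"speedup:            {speedup:.0f}x")
print(f"perf per watt:      {speedup / power_factor:.1f}x")   # ~33x
print(f"perf per dollar:    {speedup / cost_factor:.1f}x")    # ~67x
print(f"cost per unit work: {100 * cost_factor / speedup:.1f}% of before")
```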

The savings are measured in dollars. It's very clear that many, many companies spend hundreds of millions of dollars processing data in the cloud. If it was accelerated, it is not unexpected that you could save hundreds of millions of dollars. Now why is that? Well, the reason is very clear: we've been experiencing inflation for so long in general-purpose computing. Now that we've finally determined to accelerate, there's an enormous amount of captured loss that can now be reclaimed, a great deal of retained waste that we can now relieve from the system. And that will translate into savings, savings in money and savings in energy, and that's the reason why you've heard me say: the more you buy, the more you save. And now I've shown you the mathematics. It is not accurate, but it is correct. Okay, that's called CEO math. CEO math is not accurate, but it is correct: the more you buy, the more you save.

Well, accelerated computing does deliver extraordinary results, but it is not easy. Why is it that it saves so much money, yet people haven't done it for so long? The reason is that it's incredibly hard. There is no such thing as software that you can just run through a C compiler and all of a sudden that application runs 100 times faster. That is not even logical. If it were possible to do that, they would have just changed the CPU to do it. You in fact have to rewrite the software. That's the hard part. The software has to be completely rewritten so that you can refactor, re-express the algorithms that were written for a CPU so that they can be offloaded, accelerated, and run in parallel. That computer science exercise is insanely hard. Well, we've made it easy for the world over the last 20 years. There's, of course, the very famous cuDNN, the deep learning library that processes neural networks. We have a library for AI physics that you can use for fluid dynamics and other applications where the neural network has to obey the laws of physics. We have a great new library called Aerial, a CUDA-accelerated 5G radio, so that we can software-define and accelerate the telecommunications networks the way we software-defined the world's networking, the internet; the ability to accelerate that lets us turn telecom into essentially the same kind of platform, a computing platform. There's cuLitho, a computational lithography platform that lets us process the most computationally intensive part of chip manufacturing: making the masks. TSMC is putting cuLitho into production, saving enormous amounts of energy and enormous amounts of money; the goal for TSMC is to accelerate their stack so they're prepared for even further advances in algorithms and the computation needed for deeper and narrower transistors. Parabricks, our gene-sequencing library, is the highest-throughput library in the world for gene sequencing. cuOpt is an incredible library for combinatorial optimization, route-planning optimization, the traveling salesman problem, so incredibly complicated that scientists largely concluded you would need a quantum computer to solve it. We created an algorithm that runs on accelerated computing and is lightning fast: 23 world records; we hold every single major world record today. And cuQuantum is an emulation system for quantum computers.
If you want to design a quantum computer, you need a simulator to do so; if you want to design quantum algorithms, you need a quantum emulator to do so. How would you do that? How would you create these quantum algorithms if the quantum computer doesn't exist yet? You use the fastest computer in the world that exists today, and we call it, of course, cuQuantum: an emulator that simulates quantum circuits. It is used by several hundred thousand researchers around the world, it is integrated into all the leading frameworks for quantum computing, and it's used in scientific supercomputing centers all around the world.

cuDF is an unbelievable library for data processing. Data processing consumes the vast majority of cloud spend today; all of it should be accelerated. cuDF accelerates the major libraries in use: Spark, which many of you probably use in your companies; pandas; a new one called Polars; and of course NetworkX, which is a graph-processing and graph-database library. So these are just some examples. There are so many more, and each one of them had to be created so that we can enable the ecosystem to take advantage of accelerated computing. If we hadn't created cuDNN, CUDA alone wouldn't have made it possible for all the deep learning scientists around the world to use it, because the gap between CUDA and the algorithms used in TensorFlow and PyTorch, the deep learning algorithms, is too far apart. It's almost like trying to do computer graphics without OpenGL. It's almost like doing data processing without SQL.

These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries are what it takes, and what has made it possible, for us to open so many markets. I'll show you some other examples today. Well, just last week, Google announced that they put cuDF in the cloud and accelerated pandas. Pandas is the most popular data science library, and many of you in here probably already use it. It's used by 10 million data scientists, downloaded 170 million times each month. It is the Excel, the spreadsheet, of data scientists. Well, with just one click, you can now use pandas in Colab, which is Google's cloud data science platform, accelerated by CUDA. The speedup is really incredible, isn't it? That was a great demo, right? When you accelerate data processing that fast, demos don't take long.
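For a sense of what that one click does, here is a minimal sketch of the zero-code-change pandas acceleration, assuming a CUDA-capable GPU and the RAPIDS cuDF package installed (in a notebook such as Colab, you would run "%load_ext cudf.pandas" instead):

```python
# Zero-code-change pandas acceleration via the cuDF pandas accelerator.
# install() must run before pandas is imported.
import cudf.pandas
cudf.pandas.install()   # transparently routes pandas calls to the GPU

import pandas as pd     # now GPU-accelerated where possible

df = pd.DataFrame({"store": ["a", "b", "a", "c"], "sales": [10, 20, 30, 40]})
# Runs on the GPU when supported, silently falling back to CPU pandas if not.
print(df.groupby("store").sales.sum())
```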

Okay, well, CUDA has now achieved what people call a tipping point, but it's even better than that: CUDA has now achieved a virtuous cycle. This rarely happens. If you look at history, at all the computing architectures and computing platforms: in the case of the microprocessor, the CPU, it has been here for 60 years, and this way of doing computing has not changed, at this level, for 60 years. Creating a new platform is extremely hard, because it's a chicken-and-egg problem. If there are no developers that use your platform, then of course there will be no users. But if there are no users, there's no installed base; and if there's no installed base, developers aren't interested in it. Developers want to write software for a large installed base, but a large installed base requires a lot of applications, so that users will create that installed base. This chicken-and-egg problem has rarely been broken, and it's taken us 20 years now, one domain library after another, one acceleration library after another, and now we have 5 million developers around the world.

We serve every single industry, from healthcare, financial services, of course the computer industry, the automotive industry, just about every major industry in the world, just about every field of science. Because there are so many customers for our architecture, OEMs and cloud service providers are interested in building our systems; system makers, amazing system makers like the ones here in Taiwan, are interested in building our systems, which then offers more systems to the market, which of course creates greater opportunity for us, which allows us to increase our scale, our R&D scale, which speeds up the applications even more. Every single time we speed up the application, the cost of computing goes down. This is that slide I was showing you earlier: a 100x speedup translates to 96%, 97%, 98% savings. So when we go from a 100x speedup to a 200x speedup to a 1,000x speedup, the marginal cost of computing continues to fall. Well, of course, we believe that by reducing the cost of computing incredibly, the market, developers, scientists, inventors will continue to discover new algorithms that consume more and more and more computing, so that one day a phase shift happens: the marginal cost of computing is so low that a new way of using computers emerges. In fact, that's what we're seeing now. Over the years, we've driven down the marginal cost of computing, in the last 10 years, for one particular algorithm, by a million times. Well, as a result, it is now very logical, and very common sense, to train large language models with all of the data on the internet. Nobody thinks twice.

This idea, that you could create a computer that processes so much data that it writes its own software, the emergence of artificial intelligence, was made possible because of this complete belief that if we make computing cheaper and cheaper and cheaper, somebody's going to find a great use. Well, today, CUDA has achieved a virtuous cycle: the installed base is growing, computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand.

And now we're at the beginning of something very, very important. But before I show you that, I want to show you what would not be possible if not for the fact that we created CUDA and the modern version of AI, the modern Big Bang of AI, generative AI. What I'm about to show you would not be possible without it. This is Earth-2: the idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet, to better avert disasters, to better understand the impact of climate change, so that we can adapt better, so that we could change our habits now. This digital twin of Earth is probably one of the most ambitious projects ever undertaken, and we're taking large steps every single year, and I'll show you results every single year. This year, we made some great breakthroughs. Let's take a look.

Someday in the near future, we will have continuous weather prediction at every square kilometer on the planet. You will always know what the climate is going to be; you will always know. And this will run continuously, because we trained the AI, and the AI requires so little energy. It's just an incredible achievement. I hope you enjoyed it.

Because of our dedication to continuously improve performance and drive down the cost, researchers discovered CUDA in 2012. That was NVIDIA's first contact with AI. This was a very important day. We had the good wisdom to work with the scientists to make it possible for deep learning to happen, and AlexNet achieved, of course, a tremendous computer vision breakthrough. But the greater wisdom was to take a step back and understand: what was the background, what is the foundation of deep learning, what is its long-term impact, what is its potential? We realized that this technology has great potential to scale. An algorithm that was invented and discovered decades ago, all of a sudden, because of more data, larger networks, and, very importantly, a lot more compute, was able to achieve what no human-designed algorithm could. Now imagine if we were to scale up the architecture even more: larger networks, more data, and more compute. What could be possible? So we dedicated ourselves to reinvent everything after 2012. We changed the architecture of our GPUs and added Tensor Cores. We invented NVLink, that was 10 years ago now, cuDNN, TensorRT; we bought Mellanox; TensorRT-LLM, the Triton inference server. And it all came together in a brand-new computer that nobody understood, nobody asked for. Nobody understood it, and in fact I was certain nobody wanted to buy it. So we announced it at GTC, and OpenAI, a small company in San Francisco, saw it and asked me to deliver one. I delivered the first DGX, the world's first AI supercomputer, to OpenAI in 2016. Well, after that, we continued to scale: from one AI supercomputer, one AI appliance, we scaled up to large supercomputers, then even larger. By 2017, the world discovered Transformers, so that we could train on enormous amounts of data and recognize and learn patterns that are sequential over large spans of time. It became possible for us to train these models to understand and achieve a breakthrough in natural language understanding. And we kept going after that; we built even larger ones. And then, in November 2022, trained on thousands, tens of thousands, of NVIDIA GPUs in a very large AI supercomputer, OpenAI announced ChatGPT.
One million users after five days, one million users in five days, and 100 million after two months: the fastest-growing application in history. And the reason is very simple: it was just so easy to use, and it was so magical to be able to interact with a computer like it's human. Instead of having to be precise about what you want, it's like the computer understands your meaning; it understands your intention.

I think there's a night market close by, you know, and the night market is very important to me.

So when I was young, I think I was four and a half years old, I used to love going to the night market, because I just loved watching people. And so we went; my parents would take us. And one day (you guys might notice that I have a large scar on my face) my face was cut by somebody who was washing their knife. I was a little kid, and my memories of the night market are so deep because of that. I still love going to the night market. And I'll just tell you: this night market is really good, because there's a lady who has been working there for 43 years, she's the fruit lady, and she's in the middle of the market. Go find her, okay? She's really terrific. I think it'd be funny if, after this, all of you went to see her. Every year she's doing better, and her cart hasn't changed. I just love watching her succeed.

Anyway, ChatGPT came along, and something very important is in this slide. Here, let me show you something. The fundamental difference is this: until ChatGPT revealed it to the world, AI was all about perception, natural language understanding, computer vision, speech recognition. It was all about perception and detection. This was the first time the world saw a generative AI. It produced tokens, one token at a time, and those tokens were words. Some of the tokens, of course, can now be images, or charts, or tables, songs, words, speech, videos. Those tokens can be anything, anything whose meaning you can learn: tokens of chemicals, tokens of proteins, genes. You saw earlier, in Earth-2, we were generating tokens of the weather. We can learn physics: if you can learn physics, you can teach an AI model physics; the AI model can learn the meaning of physics, and it can generate physics. We were scaling down to one kilometer not by using filtering; it was generated.

And so we can use this method to generate tokens for almost anything.

Almost anything of value. We can generate steering-wheel control for a car. We can generate articulation for a robotic arm. Everything that we can learn, we can now generate. We have now arrived, not at the AI era, but at the generative AI era. But what's really important is this: this computer that started out as a supercomputer has now evolved into a data center, and it produces one thing. It produces tokens. It's an AI factory. This AI factory is generating, creating, producing something of great value: a new commodity. In the late 1890s, Nikola Tesla invented an AC generator. We invented an AI generator. The AC generator generated electrons; NVIDIA's AI generator generates tokens. Both of these things have large market opportunities. It's completely fungible in almost every industry, and that's why it's a new Industrial Revolution. We now have a new factory, producing a new commodity for every industry, of extraordinary value. And the methodology for doing this is quite scalable, and the methodology for doing this is quite repeatable. Notice how quickly so many different AI models, generative AI models, are being invented, literally daily. Every single industry is piling on. For the very first time, the IT industry, which is a $3 trillion industry, is about to create something that can directly serve $100 trillion of industry: no longer just an instrument for information storage or data processing, but a factory for generating intelligence for every industry. This is going to be a manufacturing industry; not a manufacturing industry of computers, but using the computers in manufacturing. This has never happened before. It's quite an extraordinary thing. What started with accelerated computing led to AI, led to generative AI, and now to an industrial revolution. And the impact on our own industry is also quite significant.

Of course, we can create a new commodity, a new product we call tokens, for many industries. But the impact on our industry is also quite profound. For the very first time, as I was saying earlier, in 60 years, every single layer of computing has changed: from CPUs, general-purpose computing, to accelerated GPU computing; from computers that process instructions to computers that process LLMs, large language models, AI models. And whereas the computing model of the past was retrieval-based, where almost every time you touch your phone some pre-recorded text or pre-recorded image or pre-recorded video is retrieved for you and recomposed, based on a recommender system, according to your habits, in the future your computer will generate as much as possible and retrieve only what's necessary. The reason is that generated data requires less energy than going to fetch information, and generated data is also more contextually relevant. It will encode knowledge; it will encode its understanding of you. Instead of "get that information for me" or "get that file for me," you just ask for an answer. And instead of your computer being a tool that we use, the computer will now generate skills. It performs tasks. And instead of an industry producing software, which was a revolutionary idea in the early '90s (remember, the idea that Microsoft created of packaging software revolutionized the industry; without packaged software, what would we have used a PC to do?),

that idea drove this industry. And now we have a new factory, a new computer, and what we will run on top of it is a new type of software. We call it NIMs: NVIDIA Inference Microservices. The NIM runs inside this factory, and the NIM is a pre-trained model. It's an AI. The AI itself is, of course, quite complex, but the computing stack that runs AIs is insanely complex. When you go and use ChatGPT, underneath the experience is a whole bunch of software, a ton of software, and it's incredibly complex, because the models are large: billions to trillions of parameters. It doesn't run on just one computer; it runs on multiple computers. It has to distribute the workload across multiple GPUs: tensor parallelism, pipeline parallelism, data parallelism,

expert parallelism, all kinds of parallelism, distributing the workload across multiple GPUs and processing it as fast as possible. Because if you run a factory, your throughput directly correlates with your revenues, your throughput directly correlates with your quality of service, and your throughput directly correlates with the number of people who can use your service. We are now in a world where data center throughput utilization is vitally important. It was important in the past, but not vitally important; it was important in the past, but people didn't measure it. Today, every parameter is measured: start time, uptime, utilization, throughput, idle time, you name it, because it's a factory, and when something's a factory, its operations directly correlate with the financial performance of the company. And so we realized that this is incredibly complex for most companies to do. So what we did was create this AI-in-a-box. The container holds CUDA, cuDNN, TensorRT, Triton for inference services. It's cloud-native, so you can auto-scale in a Kubernetes environment; it has management services and hooks so that you can monitor your AIs; and it has common APIs, standard APIs, so that you can literally chat with this box. You download the NIM, and you can talk to it, so long as you have CUDA on your computer, which is now, of course, everywhere: it's in every cloud, it's available from every computer maker, it's available in hundreds of millions of PCs. When you download this, you have an AI that you can chat with. All of the software is now integrated, 400 dependencies, all integrated into one, and we tested each of these pre-trained models across our whole installed base in the cloud: all the versions of Pascal and Ampere and Hopper. It's an incredible invention. This is one of my favorites. And of course, we now have the ability to create many kinds: we have all of these various versions, whether they're language-based or vision-based or imaging-based; we have versions that are available for healthcare, digital biology; we have versions that are digital humans.
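Returning to the parallelism list above, here is a toy illustration of tensor parallelism: one matrix multiply split column-wise across two hypothetical "GPUs" (here just two array slices), showing how shards of a layer's work are computed independently and then gathered:

```python
# Toy tensor parallelism: split one matmul column-wise across two workers.
import numpy as np

x = np.random.randn(4, 8)        # activations
W = np.random.randn(8, 6)        # weight matrix to be sharded

W0, W1 = W[:, :3], W[:, 3:]      # each "GPU" holds half the columns
y0, y1 = x @ W0, x @ W1          # each computes its own shard
y = np.concatenate([y0, y1], 1)  # all-gather the shards into one result

assert np.allclose(y, x @ W)     # matches the single-device computation
```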

That's the way you use AI. And today, we just posted on Hugging Face the Llama 3 NIM, fully optimized; it's available there for you to try, and you can even take it with you. It's available to you for free. You can run it in the cloud, run it in any cloud, or you can download this container, put it into your own data center, host it, and make it available to your customers. We have, as I mentioned, all kinds of different versions: there's semantic retrieval, called RAGs; there are vision language models; all sorts of language models. And the way you use them is by connecting these microservices into large applications. One of the most important applications in the coming future is customer service agents. Customer service agents are necessary in just about every single industry; they represent trillions of dollars of customer service around the world.
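Here is a minimal sketch of what talking to one of these downloaded NIMs can look like. Our assumptions: a NIM container is already running locally and exposing its OpenAI-style chat endpoint on port 8000, and the model name below is illustrative of whatever the container serves:

```python
# Chatting with a locally hosted NIM over its OpenAI-compatible HTTP API.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # assumption: the model this container serves
        "messages": [{"role": "user", "content": "Summarize what a NIM is."}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```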

Nurses are customer service agents in some ways; the non-prescription, non-diagnostic parts of nursing are essentially customer service. Customer service in retail, in quick-service food, in financial services, in insurance: tens and tens of millions of customer service agents can now be augmented by language models, augmented by AI. So these boxes that you see are basically NIMs. Some of the NIMs are reasoning agents: given a task, they figure out what the mission is and break it down into a plan. Some of the NIMs retrieve information; some of the NIMs might go do search; some of the NIMs might use a tool like cuOpt, which I was talking about earlier; some might use a tool that runs on SAP, so they have to learn a particular language called ABAP; maybe some NIMs have to do SQL queries. So all of these NIMs are experts that are now assembled as a team. So what's happening? The application layer has been changed. What used to be applications written with instructions are now applications that are assembled teams, teams of AIs. Very few people know how to write programs; almost everybody knows how to break down a problem and assemble teams. Every company, I believe, will in the future have a large collection of NIMs, and you would bring down the experts that you want and connect them into a team.
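As a rough sketch of that "team of experts" pattern: a leader breaks a mission into subtasks and routes each one to an expert. The experts here are plain functions standing in for NIMs, and every name in this sketch is hypothetical:

```python
# Toy "team of experts" orchestration: a planner routes subtasks to experts.
from typing import Callable

def retrieval_expert(task: str) -> str:
    # Stand-in for a retrieval NIM (e.g., a RAG service).
    return f"[docs relevant to: {task}]"

def sql_expert(task: str) -> str:
    # Stand-in for a NIM that writes and runs SQL queries.
    return f"[rows from: SELECT ... /* {task} */]"

def planner(mission: str) -> list[tuple[str, str]]:
    # A real leader would be an LLM; here we hard-code a two-step plan.
    return [("retrieve", mission), ("sql", mission)]

EXPERTS: dict[str, Callable[[str], str]] = {
    "retrieve": retrieval_expert,
    "sql": sql_expert,
}

def run_mission(mission: str) -> str:
    results = [EXPERTS[kind](task) for kind, task in planner(mission)]
    return " | ".join(results)  # the leader would reason over these results

print(run_mission("monthly refunds by region"))
```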

You don't even have to figure out exactly how to connect them. You just give the mission to an agent, a leader, and it figures out how to break the task down and whom to give it to. That agent, the central leader of the application, if you will, the leader of the team, breaks down the task and gives it to the various team members. The team members perform their tasks and bring the results back to the team leader; the team leader reasons about them and presents the information back to you. That's how applications are going to work in the future. Now, of course, we can interact with these AI services with text prompts and speech prompts. However, there are many applications where we would like to interact with what is otherwise a human-like form. We call them digital humans. NVIDIA has been working on digital human technology for some time. Digital humans have the potential to be great interactive agents: much more engaging, and potentially much more empathetic. Of course, we have to cross this incredible chasm, the uncanny valley of realism, so that digital humans appear much more natural. This is, of course, our vision, a vision of where we would love to go. Let me show you where we are.

Before I head out to the night market, let's dive into some exciting frontiers of digital humans. Imagine a future where computers interact with us just like humans can. "Hi, my name is Sophie, and I'm a digital human brand ambassador." This is the incredible reality of digital humans. Digital humans will revolutionize industries, from customer service to advertising and gaming. The possibilities for digital humans are endless. There will be AI interior designers, helping generate beautiful, photorealistic suggestions and sourcing the materials and furniture: "Using the scans you took of your current kitchen with your phone, we have generated several design options that you can choose from." There will also be AI customer service agents, making interactions more engaging and personal, and digital healthcare workers will check on patients, providing timely, personalized care. "I did forget to mention to the doctor that I am allergic to penicillin. Is it still okay to take the medications?" "The antibiotics you've been prescribed, ciprofloxacin and metronidazole, don't contain penicillin, so it's perfectly safe for you to take them." And there will even be digital brand ambassadors, setting the next marketing and advertising trends, like Japan's first virtual model. New breakthroughs in generative AI and computer graphics let digital humans see, understand, and interact with us in human-like ways. "From what I can see, it looks like you're in some kind of recording or production setup." The foundation of digital humans is AI models built on multilingual speech recognition and synthesis, and LLMs that understand and generate conversation. These AIs are connected to another generative AI that dynamically animates a lifelike 3D mesh of a face, and finally to AI models that reproduce lifelike appearances, enabling real-time path tracing with subsurface scattering to simulate the way light penetrates the skin, scatters, and exits at various points, giving skin its soft and translucent appearance. NVIDIA ACE is a suite of digital human technologies packaged as easy-to-deploy, fully optimized microservices, or NIMs. Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences:
NeMo SLM and LLM NIMs to understand our intent and orchestrate other models; Riva speech NIMs for interactive speech and translation; Audio2Face and gesture NIMs for facial and body animation; and Omniverse RTX with DLSS for neural rendering of skin and hair. ACE NIMs run on NVIDIA GDN, a global network of NVIDIA-accelerated infrastructure that delivers low-latency digital human processing to over 100 regions.

Pretty incredible. Well, ACE runs in the cloud, but it also runs on PCs. We had the good wisdom of including Tensor Core GPUs in all of RTX, so we've been shipping AI GPUs for some time, preparing ourselves for this day. The reason is very simple: we always knew that in order to create a new computing platform, you need an installed base first; eventually the applications will come. If you don't create the installed base, how could the applications come? If you build it, they might not come; but if you don't build it, they cannot come. And so we built Tensor Core processing into every single RTX GPU, and now we have 100 million GeForce RTX AI PCs in the world, with 200 more designs shipping. This Computex, we're featuring four new amazing laptops, all of them able to run AI. Your future laptop, your future PC, will become an AI. It will be constantly helping you, assisting you in the background. The PC will also run applications that are enhanced by AI: of course, your photo editing, your writing, your tools, all the things you use will be enhanced by AI. And your PC will also host applications with digital humans that are AIs. So there are different ways that AIs will manifest themselves and become used in PCs, but the PC will become a very important AI platform.

So where do we go from here? I spoke earlier about the scaling of our data centers, and every single time we scaled, we found a new phase change. When we scaled from DGX into large AI supercomputers, we enabled Transformers to train on enormously large data sets. In the beginning, the data was human-supervised; it required human labeling to train the AIs. Unfortunately, there is only so much you can human-label. Transformers made it possible for unsupervised learning to happen: now Transformers just look at an enormous amount of data, an enormous amount of video and images, and can learn, from studying all of it, to find the patterns and relationships themselves. Well, the next generation of AI needs to be physically based. Most of the AIs today don't understand the laws of physics; they're not grounded in the physical world. In order for us to generate images and videos and 3D graphics, and many physics phenomena, we need AIs that are physically based and understand the laws of physics. The ways you can do that: learning from video is one source; another way is synthetic data, simulation data; and another way is computers learning from each other. This is really no different from AlphaGo: having AlphaGo play itself, self-play, two versions of the same capability playing each other for a very long time, so they emerge even smarter. You're going to start to see this type of AI emerge. Well, if the AI data is synthetically generated, and combined with reinforcement learning, it stands to reason that the rate of data generation will continue to advance, and every time data generation grows, the amount of computation that we have to offer needs to grow with it.
We are about to enter a phase where AIs can learn the laws of physics and understand and be grounded in the physical world. So we expect models to continue to grow, and we need larger GPUs. Blackwell was designed for this generation. This is Blackwell, and it has several very important technologies. One, of course, is just the size of the chip. We took two of the largest chips, each as large as a chip can be made at TSMC, and connected the two of them together with a 10 terabyte-per-second link between the world's most advanced SerDes. We then put two of them on a computer node, connected with a Grace CPU. The Grace CPU can be used for several things: in the training situation, it can be used for fast checkpointing and restart; in the case of inference and generation, it can be used for storing context memory, so that the AI has memory and understands the context of the conversation. This is our second-generation Transformer Engine; the Transformer Engine allows us to adapt dynamically to lower precision based on the precision and range necessary for each layer of computation. This is our second-generation GPU with secure AI, so that you can ask your service provider to protect your AI from theft or tampering. This is our fifth-generation NVLink; NVLink allows us to connect multiple GPUs together, and I'll show you more of that in a second. And this is also our first generation with a reliability and availability engine, RAS, which allows us to test every single transistor, flip-flop, on-chip memory, and off-chip memory, so that we can determine in the field whether a particular chip is failing. The MTBF, the mean time between failures, of a supercomputer with 10,000 GPUs is measured in hours; the mean time between failures of a supercomputer with 100,000 GPUs is measured in minutes. So the ability for a supercomputer to run for a long period of time and train a model that can take several months is practically impossible if we don't invent technologies to enhance its reliability. Reliability, of course, enhances uptime, which directly affects the cost. And lastly, the decompression engine: data processing is one of the most important things we have to do, and we added a data decompression engine so that we can pull data out of storage 20 times faster than what's possible today. Well, all of this represents Blackwell, and I think we have one here that's in production. During GTC, I showed you Blackwell in a prototype state. This is why we practice. Blackwell is in production. Incredible amounts of technology. This is our production board. This is the most complex, highest-performance computer the world has ever made. This is the Grace CPU, and these, you can see, are the Blackwell dies, two of them connected together. You see that? It is the largest die, the largest chip the world makes, and we connect two of them together with a 10 terabyte-per-second link, and that makes the Blackwell computer. And the performance is incredible.
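A back-of-envelope check of that MTBF claim, as a minimal sketch; the single-GPU MTBF figure is our illustrative assumption, not a number from the talk:

```python
# Fleet MTBF shrinks with fleet size: the first failure anywhere stops the job.
# Assumption (ours): each GPU fails independently with an MTBF of ~50,000 hours.
gpu_mtbf_hours = 50_000.0

for n_gpus in (10_000, 100_000):
    fleet_mtbf_hours = gpu_mtbf_hours / n_gpus
    print(f"{n_gpus:>7} GPUs -> fleet MTBF ~ {fleet_mtbf_hours * 60:.0f} minutes")
# 10,000 GPUs  -> ~300 minutes (a few hours)
# 100,000 GPUs -> ~30 minutes
```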

So you see, our computational flops, the AI flops, have increased by 1,000 times in eight years. Moore's Law over eight years is something along the lines of, I don't know, maybe 40x to 60x, and in the last eight years it has delivered a lot less. So the amount of computation growth is incredible, even compared to Moore's Law in its best of times. And whenever we bring the computation up, the cost goes down. I'll show you: by increasing the computational capability, the energy used to train a GPT-4-class model, 2 trillion parameters on 8 trillion tokens, has gone down by 350 times. On Pascal, it would have taken 1,000 gigawatt-hours. 1,000 gigawatt-hours means it would take a gigawatt data center, and the world doesn't have a gigawatt data center, but if you had one, it would take a month. If you had a 100-megawatt data center, it would take about a year. And so nobody would, of course, create such a thing; that's why these large language models were not possible only eight years ago. By driving up performance and energy efficiency along the way, we've now taken, with Blackwell, what used to be 1,000 gigawatt-hours down to an incredible three gigawatt-hours. With 10,000 GPUs, for example, it would take, I guess, a few days, ten days or so. The amount of advance in just eight years is incredible. And this is for inference, for token generation: our token-generation performance has made it possible for us to drive the energy down by 45,000 times. 17,000 joules per token, that was Pascal; 17,000 joules per token is kind of like two light bulbs running for two days. It would take that amount of energy to generate one word of GPT-4.
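Checking the data center arithmetic in that training-energy claim with a quick sketch (1,000 GWh of training energy divided by facility power):

```python
# 1,000 GWh of training energy at different facility power levels.
energy_gwh = 1_000.0

for power_gw, label in ((1.0, "1 GW data center"), (0.1, "100 MW data center")):
    hours = energy_gwh / power_gw
    print(f"{label}: {hours:,.0f} hours ~ {hours / 24 / 30:.1f} months")
# 1 GW   -> 1,000 hours, roughly a month of continuous operation
# 100 MW -> 10,000 hours, roughly 14 months, about a year
```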

It takes about three tokens to generate one word.

So the amount of energy necessary for Pascal to generate tokens for GPT-4, to deliver a ChatGPT experience, was practically impossible. But now we use only 0.4 joules per token, and we can generate tokens at incredible rates with very little energy. Okay, so Blackwell is just an enormous leap. Even so, it's not big enough, so we have to build even larger machines. The way we build them is called DGX. This is our Blackwell chip, and it goes into DGX systems. This is a DGX Blackwell. It's air-cooled, and it has eight of these GPUs inside. Look at the size of the heat sinks on these GPUs. It's about 15 kilowatts, 15,000 watts, and completely air-cooled. This version supports x86, and it goes into the same infrastructure we've been shipping Hoppers into. However, if you would like liquid cooling, we have a new system, based on this board, and we call it MGX; it stands for modular. And this modular system, here. Can you see this? Okay?

So this is the MGX, and here are the two Blackwell boards. This one node has four Blackwell chips, and these four Blackwell chips are liquid-cooled. Seventy-two of these GPUs are then connected together with a new NVLink: this is the NVLink switch, fifth generation. The NVLink switch is a technology miracle. This is the most advanced switch ever made; the data rate is insane. These switches connect every single one of these Blackwells to each other, so that we have one giant, 72-GPU system. The benefit of this is that in one domain, one GPU domain, it now looks like one giant GPU: 72 versus the 8 of the last generation, so we increased it by nine times; the amount of bandwidth we've increased by 18 times; the AI flops we've increased by 45 times; and yet the amount of power is only 10 times more, 100 kilowatts versus 10 kilowatts. Of course, you can always connect more of these together, and I'll show you how to do that in a second. But what's the miracle? It's this chip, the NVLink chip. People are starting to awaken to the importance of the NVLink chip, because it connects all these GPUs together. The large language models are so large that they don't fit on just one GPU; they don't fit on just one node. It's going to take an entire rack, like this new DGX I was just standing next to, to hold a large language model with tens of trillions of parameters. The NVLink switch is in itself a technology miracle: 50 billion transistors, 74 ports at 400 gigabits per second each, a cross-sectional bandwidth of 7.2 terabytes per second. And one of the important things is that it has mathematics inside the switch, so that we can do reductions, which is really important in deep learning. So this is what a DGX looks like now. And a lot of people ask us, they say, there's this confusion about what NVIDIA does: how is it possible that NVIDIA became so big building GPUs?

And so there's an impression that this is what a GPU looks like. This is a GPU, one of the most advanced GPUs in the world, but this is a gamer's GPU. You and I know that this is what a GPU looks like now. This is one GPU, ladies and gentlemen: the DGX. And the back of this GPU is the NVLink spine. The NVLink spine is 5,000 wires, two miles of them, and it's right here. This is an NVLink spine, and it connects 72 GPUs to each other.

This is an electrical and mechanical miracle. The transceivers make it possible for us to drive the entire length in copper, and as a result, by driving the NVLink spine in copper, the NVLink switches save 20 kilowatts in one rack; 20 kilowatts that can now be used for processing. It's an incredible machine. So this is the spine. But even this is not big enough; it's not big enough for an AI factory, so we have to connect it all together with very high-speed networking. Well, we have two types of networking. We have InfiniBand, which has been used in supercomputing and AI factories around the world and is growing incredibly fast for us. However, not every data center can handle InfiniBand, because they've already invested their ecosystem in Ethernet for too long, and it does take some specialty and expertise to manage InfiniBand networks. So what we've done is bring the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard. The reason is this: Ethernet was designed for high average throughput, because every single node, every single computer, is connected to a different person on the internet, and most of the communication is with somebody on the other side of the network. However, in deep learning and AI factories, the GPUs are not communicating with people on the internet; mostly, they're communicating with each other. They're communicating with each other because they're all collecting partial products, and they have to reduce them and then redistribute chunks of the partial products: reduction, redistribution. That traffic is incredibly bursty, and it is not the average throughput that matters; it's the last arrival that matters, because if you're reducing, collecting partial products from everybody, the step isn't done until the last GPU delivers its piece. It's not the average throughput; it's whoever gives me the answer last. Ethernet has no provision for that. So there are several things that we created. We created an end-to-end architecture, so that the NIC and the switch can communicate, and we applied four different technologies to make this possible. Number one: NVIDIA has the world's most advanced RDMA, so now we have the ability to do network-level RDMA on Ethernet. Number two: we have congestion control. The switch does telemetry at all times, incredibly fast, and whenever the GPUs or the NICs are sending too much information, it can tell them to back off so that it doesn't create hot spots. Number three: adaptive routing. Ethernet needs to transmit and receive in order, but when we see congestion, or we see ports that are not currently being used, we send packets to the available ports irrespective of the ordering, and BlueField on the other end reorders them so they arrive in order. That adaptive routing is incredibly powerful. And lastly, noise isolation: there's more than one model being trained in a data center at a time, and their traffic can get into each other's way and cause jitter. When the noise of one training model causes the last arrival to end up too late, it really slows down the training. Overall, remember: you have built a $5 billion or $3 billion data center, and you're using it for training. If the network utilization was 40% lower, and as a result the training time was 20% longer, the $5 billion data center is effectively like a $6 billion data center. The cost impact is incredible.
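A toy illustration of that "collect partial products, reduce, redistribute" pattern (an all-reduce), simulated here with plain Python lists rather than a real network:

```python
# Toy all-reduce: each "GPU" holds a partial gradient; every one needs the sum.
partials = [
    [1.0, 2.0, 3.0],   # GPU 0's partial product
    [4.0, 5.0, 6.0],   # GPU 1's
    [7.0, 8.0, 9.0],   # GPU 2's
]

# Reduce: sum the partials elementwise. In a real cluster, this is the bursty
# many-to-many exchange, and the slowest arrival gates the whole step.
reduced = [sum(vals) for vals in zip(*partials)]

# Redistribute: every GPU receives the reduced result.
gpus = [list(reduced) for _ in partials]
print(gpus[0])   # [12.0, 15.0, 18.0] on every GPU
```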
On Ethernet, Spectrum-X basically allows us to improve the performance so much that the network is effectively free. This is really quite an achievement. And we have a whole pipeline of products ahead of us. There's Spectrum-X800: it's 51.2 terabits per second and 256 radix. The next one coming is the X800 Ultra, and after that is the X1600. The important idea is this: the X800 is designed for tens of thousands of GPUs, the X800 Ultra is designed for hundreds of thousands of GPUs, and the X1600 is designed for millions of GPUs. The days of data centers with millions of GPUs are coming, and the reason is very simple. Of course, we want to train much larger models. But, very importantly, in the future, almost every interaction you have with the internet or with a computer will likely have a generative AI running in the cloud somewhere.

And that generative AI is working with you, interacting with you, generating videos or images or text, maybe a digital human. So you're interacting with your computer almost all the time, and there's always a generative AI connected to it. Some of it is on-prem, some of it is on your device.

These generative AIs will also do a lot of reasoning. Instead of just one-shot answers, they might iterate on answers to improve their quality before they give them to you. And so the amount of generation we're going to do in the future is going to be extraordinary. Let's take a look at all of this put together. Now, tonight, this is our first nighttime keynote. I want to thank all of you for coming out tonight at seven o'clock. And so what I'm about to show you has a new vibe. Okay? There's a new vibe. This is kind of the nighttime-keynote vibe, so enjoy this.

Okay. I think that style of keynote has never been done at Computex, ever, and it might be the last; only NVIDIA could pull that off. Blackwell, of course, is the first generation of NVIDIA platforms launched right as the world realized the generative AI era is here, just as the world realized the importance of AI factories, just at the beginning of this new industrial revolution. We have so much support: nearly every OEM, every computer maker, every CSP, every GPU cloud, sovereign clouds, even telecommunication companies, enterprises all over the world. The amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts, and I want to thank everybody for that. We're not stopping there. During this time of incredible growth, we want to make sure that we continue to enhance performance, continue to drive down cost, the cost of training and of inference, and continue to scale AI capabilities for every company to embrace. The further we drive performance up, the greater the cost decline. The Hopper platform was, of course, probably the most successful data center processor in history, and this is just an incredible success story. However, Blackwell is here, and every single platform, as you'll notice, is several things: you have the CPU, the GPU, NVLink, the NIC, and the switch. The NVLink switch connects all of the GPUs together into as large a domain as we can, and we connect whatever we can with large, very high-speed switches. Every single generation, it is not just a GPU; it is an entire platform. We build the entire platform, we integrate the entire platform into an AI-factory supercomputer, and then we disaggregate it and offer it to the world. The reason for that is so all of you can create interesting and innovative configurations, all kinds of different styles, fitting different data centers and different customers in different places: some for the edge, some for telco. All of these different innovations are possible if we make the systems open and make it possible for you to innovate. So we design it integrated, but we offer it to you disaggregated, so that you can create modular systems. The Blackwell platform is here. Our company is on a one-year rhythm, and our basic philosophy is very simple: one, build the entire data center scale, disaggregate it, and sell it to you in parts on a one-year rhythm; and two, push everything to its technology limits. Whatever TSMC process technology, we'll push it to the absolute limits; whatever packaging technology, push it to the absolute limits; whatever memory technology, push it to the absolute limits; SerDes technology, optics technology, everything is pushed to the limit.

And then, after that, do everything in such a way that all of our software runs on the entire installed base. Software inertia is the single most important thing in computing: when a computer is backwards-compatible, architecturally compatible with all the software that has already been created, your ability to go to market is so much faster. So the velocity is incredible when we can take advantage of the entire installed base of software that has already been created. Well, Blackwell is here. Next year is Blackwell Ultra. Just as we had H100 and H200, you'll probably see some pretty exciting new generations from us with Blackwell Ultra, again pushing the limits, along with the next-generation Spectrum switches I mentioned. Well, this is the very first time that this next click has been revealed, and I'm not sure yet whether I'm going to regret it. We have code names in our company, and we try to keep them very secret; often, mostly, we succeed. But our next-generation platform is called Rubin, the Rubin platform, and I'm not going to spend much time on it. I know what's going to happen: you're going to take pictures of it and look at the fine print. Feel free to do that. So we have the Rubin platform, and one year later, the Rubin Ultra platform. All of the chips that I'm showing you here are in full development, the rhythm is one year, everything is pushed to the limits of technology, and all of it is 100% architecturally compatible, all riding on top of our software. So in a lot of ways, over the last 12 years, from that moment of ImageNet, when we realized that the future of computing was going to radically change, to today, the company has really transformed tremendously, exactly as I was describing earlier, holding up the GeForce of pre-2012 and the NVIDIA of today. I want to thank all of our partners here for supporting us every step along the way. Let me talk about what's next.

The next wave of AI is physical AI: AI that understands the laws of physics, AI that can work among us. They have to understand the world model, so that they can understand how to interpret the world, how to perceive it. They have to, of course, have excellent cognitive capabilities, so they can understand us, understand what we ask of them, and perform the tasks. In the future, robotics will be a much more pervasive idea. Of course, when I say robotics, there's humanoid robotics; that's usually the representation of it. But that's not at all the whole of it. Everything is going to be robotic. All of the factories will be robotic; the factories will orchestrate robots, and those robots will be building products that are themselves robotic: robots interacting with robots, building products that are robotic. Well, in order for us to do that, we need to make breakthroughs in the area of robotics. One day, everything that moves will be autonomous.

Researchers and companies around the world are developing robots powered by physical AI. Physical AIs are models that can understand instructions and autonomously perform complex tasks in the real world. Multimodal LLMs are breakthroughs that enable robots to learn, perceive, and understand the world around them and plan how they will act. From human demonstrations, robots can now learn the skills required to interact with the world using gross and fine motor skills. One of the integral technologies for advancing robotics is reinforcement learning: just as LLMs need RLHF, reinforcement learning from human feedback, to learn particular skills, generative physical AI can learn skills using reinforcement learning from physics feedback in a simulated world. These simulation environments are where robots learn to make decisions, by performing actions in a virtual world that obeys the laws of physics.

And in these robot gyms, a robot can learn to perform complex and dynamic tasks safely and quickly, refining its skills through millions of acts of trial and error. We built Omniverse as the operating system where physical AIs can be created. Omniverse is a development platform for virtual-world simulation, combining real-time physically based rendering, physics simulation, and generative AI technologies. In Omniverse, robots learn how to be robots. They learn how to autonomously manipulate objects, such as grasping and handling them, or navigate environments autonomously, finding optimal paths while avoiding obstacles and hazards.
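As one concrete instance of that navigation skill, here is a hedged sketch of grid-based A* path planning around obstacles. Real Omniverse and Isaac navigation stacks are far richer; the grid world, the Manhattan heuristic, and the `astar` helper below are invented for illustration.

```python
# Sketch: finding an optimal path while avoiding obstacles, via A*.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle. Returns a list of cells."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    frontier = [(h(start), 0, start, None)]   # (f, g, node, parent)
    came_from, cost = {}, {start: 0}
    while frontier:
        _, g, cur, parent = heapq.heappop(frontier)
        if cur in came_from:                  # already expanded
            continue
        came_from[cur] = parent
        if cur == goal:                       # reconstruct the path
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                if g + 1 < cost.get(nxt, float("inf")):
                    cost[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, cur))
    return None                               # no path exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(astar(grid, (0, 0), (3, 3)))
```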


Learning in Omniverse minimizes the sim-to-real gap and maximizes the transfer of learned behavior. Building robots with generative physical AI requires three computers: NVIDIA AI supercomputers to train the models; NVIDIA Jetson robotics supercomputers, today's boards and the next generation, to run the models; and NVIDIA Omniverse, where robots learn and refine their skills in simulated worlds. We build the platforms, acceleration libraries, and AI models needed by developers and companies, and we let them use whichever parts of the stack suit them best. The next wave of AI, robotics powered by physical AI, will revolutionize industries. This isn't the future; this is happening now.

There are several ways that we're going to serve the market. First, we're going to create platforms for each type of robotic system: one for robotic factories and warehouses, one for robots that manipulate things, one for robots that move, and one for robots that are humanoid. Each of these robotics platforms is like almost everything else we do: a computer, acceleration libraries, and pretrained models. We test everything, we train everything, and we integrate everything inside Omniverse, where, as the video was saying, robots learn how to be robots.

Of course, the ecosystem of robotic warehouses is really, really complex. It takes a lot of companies, a lot of tools, and a lot of technology to build a modern warehouse, and warehouses are increasingly robotic; one of these days they will be fully robotic. In each of these ecosystems we have SDKs and APIs connected into the software industry, SDKs and APIs connected into the edge AI industry and its companies, and systems designed for PLCs and robotic systems, which are then put together by integrators who ultimately build warehouses for customers. Here we have an example of KENMEC building a robotic warehouse for Giant Group.

And now let's talk about factories. Factories have a completely different ecosystem, and Foxconn is building some of the world's most advanced factories. Their ecosystem again spans edge computers and robotics, software for designing the factories and the workflows, programming the robots, and of course PLC computers that orchestrate the digital factories. We have SDKs connected into each of these ecosystems as well. This is happening all over Taiwan: Foxconn is building digital twins of their factories, Delta is building digital twins of their factories, Pegatron is building digital twins of their robotic factories, and Wistron is building digital twins of their robotic factories. And this is really cool: this is a video of Foxconn's new factory.
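On the sim-to-real point above: one widely used technique for narrowing that gap is domain randomization, sketched below. The parameter names, ranges, and the `make_env`/`update_policy` hooks are assumptions made for illustration; none of them are Omniverse or Isaac APIs.

```python
# Sketch: domain randomization to reduce the sim-to-real gap.
# Physical parameters are resampled every episode so the learned
# policy cannot overfit to any single simulator configuration.
import random

def sample_sim_params():
    return {
        "friction":    random.uniform(0.4, 1.2),    # surface friction coeff.
        "mass_scale":  random.uniform(0.8, 1.2),    # +/-20% payload mass
        "motor_delay": random.uniform(0.00, 0.03),  # actuation latency (s)
        "light_level": random.uniform(0.3, 1.0),    # rendering brightness
    }

def train(num_episodes, make_env, update_policy):
    """make_env builds a simulator with the given physics parameters;
    update_policy runs one episode of any RL/IL algorithm against it."""
    for _ in range(num_episodes):
        env = make_env(**sample_sim_params())  # fresh physics each episode
        update_policy(env)
```

A policy that succeeds across the whole randomized family of simulators is far more likely to treat the real factory floor as just one more sample from that family.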

Demand for NVIDIA accelerated computing is skyrocketing as the world modernizes traditional data centers into generative AI factories. Foxconn, the world's largest electronics manufacturer, is gearing up to meet this demand by building robotic factories with NVIDIA Omniverse and AI. Factory planners use Omniverse to integrate facility and equipment data from leading industry applications, like Siemens Teamcenter X and Autodesk Revit, into a digital twin. They optimize floor layout and line configurations, and locate optimal camera placements to monitor future operations with NVIDIA Metropolis-powered vision AI. Virtual integration saves planners the enormous cost of physical change orders. During construction, the Foxconn teams use the digital twin as the source of truth to communicate and validate accurate equipment layout.
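The camera-placement step above can be read as a maximum-coverage problem; here is a hedged sketch solving it greedily. The candidate cameras, zone sets, and the `place_cameras` helper are invented for illustration and are not an Omniverse or Metropolis API.

```python
# Sketch: choose camera positions to cover the most factory zones,
# posed as maximum coverage and solved with the standard greedy rule.
def place_cameras(candidates, zones, budget):
    """candidates: {camera_id: set of zone ids it can see}.
    Greedily pick up to `budget` cameras covering the most zones."""
    uncovered, chosen = set(zones), []
    while uncovered and len(chosen) < budget:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            break                       # no remaining camera adds coverage
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen, uncovered

cams = {"c1": {1, 2, 3}, "c2": {3, 4}, "c3": {4, 5, 6}, "c4": {1, 6}}
chosen, missed = place_cameras(cams, zones={1, 2, 3, 4, 5, 6}, budget=2)
print(chosen, missed)   # ['c1', 'c3'] and an empty set
```

In a digital twin, the visibility sets would come from simulated camera frustums rather than hand-written dictionaries, but the selection logic is the same.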

The Omniverse digital twin is also the robot gym where Foxconn developers train and test NVIDIA Isaac AI applications for robotic perception and manipulation, and Metropolis AI applications for sensor fusion. In Omniverse, Foxconn simulates the robot AIs before deploying the runtimes to Jetson computers on the assembly line. They simulate Isaac Manipulator libraries and AI models for automated optical inspection: object identification, defect detection, and trajectory planning to transfer HGX systems to the test pods. And they simulate Isaac Perceptor-powered AMRs as they perceive and move about their environment with 3D mapping and reconstruction.
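To make the automated-optical-inspection step concrete, here is a minimal, hedged sketch: a template-difference check against a "golden" reference image. Production AOI relies on learned detectors, as the video describes; the `find_defects` helper, tile size, and threshold below are illustrative assumptions only.

```python
# Sketch: flag board regions that deviate from a golden reference,
# a toy stand-in for learned AOI defect detection.
import numpy as np

def find_defects(image, reference, tile=32, threshold=12.0):
    """Both inputs: grayscale uint8 arrays of equal shape. Returns
    (row, col) of tiles whose mean absolute difference from the
    reference exceeds the threshold."""
    diff = np.abs(image.astype(np.int16) - reference.astype(np.int16))
    defects = []
    for r in range(0, diff.shape[0] - tile + 1, tile):
        for c in range(0, diff.shape[1] - tile + 1, tile):
            if diff[r:r + tile, c:c + tile].mean() > threshold:
                defects.append((r, c))
    return defects

rng = np.random.default_rng(0)
ref = rng.integers(0, 255, (128, 128), dtype=np.uint8)
img = ref.copy()
img[40:60, 40:60] = 0                 # simulate a missing component
print(find_defects(img, ref))         # tiles overlapping the defect
```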

With Omniverse, Foxconn builds the robotic factories that orchestrate robots running on NVIDIA Isaac to build NVIDIA AI supercomputers, which in turn train Foxconn's robots. So a robotic factory is designed with three computers: you train the AI on NVIDIA AI supercomputers; the robot runs on PLC systems that orchestrate the factory; and then, of course, you simulate everything inside Omniverse. The robotic arm and the robotic AMRs work the same way, with three computer systems each. The difference is that the Omniverses will come together, so they'll share one virtual space. When they share one virtual space, the robotic arm becomes part of the robotic factory, and again it's three computers, and we provide the computer, the acceleration layers, and the pretrained AI models. We've connected NVIDIA Isaac Manipulator and NVIDIA Omniverse with Siemens, the world's leading industrial automation software and systems company. This is really a fantastic partnership, and they're working on factories all over the world. Siemens' Simatic Robot Pick AI now integrates Isaac Manipulator, and it runs and operates robots from vendors such as KUKA, Yaskawa, and Universal Robots. So Siemens is a fantastic integration, and we have all kinds of other integrations. ArcBest is integrating Isaac Perceptor into Vaux smart autonomy robots for enhanced object recognition and human motion tracking in material handling. BYD Electronics is integrating Isaac Manipulator and Perceptor into their AI robots to enhance manufacturing efficiencies for their global customers.

Ideal Works is building Isaac Perceptor into their iw.os software for AI robots in factory logistics. Intrinsic, an Alphabet company, is adopting Isaac Manipulator into their Flowstate platform to advance robot grasping. Gideon is integrating Isaac Perceptor to train AI-powered forklifts and advance AI-enabled logistics. Argo Robotics is adopting Isaac Perceptor into its perception engine for advanced vision-based AMRs. Solomon is using Isaac Manipulator AI models in their AccuPick 3D software for industrial manipulation. Techman Robot is adopting Isaac Sim and Isaac Manipulator into TM Flow, accelerating automated optical inspection. Teradyne Robotics is integrating Isaac Manipulator into PolyScope X for cobots, and Isaac Perceptor into MiR AMRs. Vention is integrating Isaac Manipulator into MachineLogic for AI manipulation robots.

Robotics is here. Physical AI is here. This is not science fiction, and it's being used all over Taiwan. It's really, really exciting. And that's the factory, the robots inside it, and of course all the products coming out of it will be robotic as well. There are two very high-volume robotics products. One, of course, is the self-driving car, or cars that have a great deal of autonomous capability, and NVIDIA again builds the entire stack. Next year we're going to go to production with the Mercedes-Benz fleet, and after that, in 2026, the JLR fleet. We offer the full stack; however, you're welcome to take whichever parts, whichever layers, of our stack.

The next high-volume robotics product, one that will be manufactured by robotic factories with robots inside, is humanoid robots. These have made great progress in recent years, in both cognitive capability, thanks to foundation models, and the world-understanding capability we're developing. I'm really excited about this area because, obviously, the easiest robots to adapt to the world are humanoid robots: we built the world for ourselves. We also have the most data to train them, compared with other types of robots, because we have the same physique, so the amount of training data we can provide through demonstration and video is enormous. We're going to see a lot of progress in this area. Well, I think we have some robots we'd like to welcome; we've got some friends to join us. So the future of robotics is here, the next wave of AI. And of course, you know, Taiwan builds computers with keyboards, you build computers for your pocket, you build computers for data centers in the cloud. In the future, you're going to build computers that walk and computers that roll around. And these are all just computers. As it turns out, the technology is very similar to the technology of building all of the computers you already build today. So this is going to be a really extraordinary journey for us. I want to thank you, and I have one last video, if you don't mind, something that we really enjoyed making.

Thank you. I love you guys. Thank you all for coming. Have a great COMPUTEX!
