The Technical Risks to NVIDIA's Market Cap Are Fundamental
Jen Zhu Scott
Multifamily | Founding Partner | TED Speaker | Board Chair | Perpetual Student of Neuroscience, AI, Math, and History
NVIDIA's astonishing ascent to a recent $2 trillion market cap makes it one of the most valuable companies in history. However, certain emerging technical risks to the company may not be obvious to everyone, yet they are fundamental.
The rise in NVIDIA's market cap (and revenue) is mainly attributed to its success in the AI industry, which now touches every other industry. The company's parallel processing capabilities, built on thousands of computing cores, have driven its dominance of the GPU market. NVIDIA's focus on high-performance computing, gaming, and virtual reality platforms has also helped drive its market value.
If the current trajectory of AI development (driven by Generative AI based on large models) continues, NVIDIA will remain one of the most valuable companies in the world, if not the most valuable. In other words, NVIDIA wins as long as absolute computing capacity is what runs the world. That assumption ignores the other side of the equation: efficiency. I share my arguments below. This article aims to spark healthy debate and inspire better questions; it is not investment advice.
I. LARGE MODELS ARE NOT NECESSARILY THE FUTURE OF AI
GenAI based on LLMs is an incredible breakthrough in AI. The 'largeness' that took off after the monumental 2017 paper "Attention Is All You Need" introduced the Transformer architecture behind Generative AI is the product of enormous amounts of data, computing power, algorithmic work, capital, and talent. It is a marvel of human engineering that reduced human-machine interaction to natural language for the first time, hence the historic pace at which OpenAI's ChatGPT gained users. But I don't believe the 'largeness' is the future. In fact, I hope it is not.
It is simple: we don't get to Mars by building taller and taller buildings on Earth - Mars being general-purpose AI, or AGI. Today, training an LLM with several hundred billion parameters puts you in a tiny minority club globally in terms of absolute capacity. SambaNova Systems, an enterprise GenAI company based in Palo Alto, just announced its 1-trillion-parameter LLM. Impressive - yet another taller building reaching for Mars. Meanwhile, this planet's most complex and efficient general intelligence still operates inside your skull. A human brain, on average, processes the equivalent of 100 trillion parameters on about 30 watts, only enough to power an average incandescent light bulb. Training a large language model like GPT-3, by contrast, is estimated to use just under 1,300 megawatt-hours (MWh) of electricity, about as much energy as 130 US homes consume in a year. For context, streaming an hour of Netflix requires around 0.8 kWh (0.0008 MWh) of electricity. That means you would have to watch 1,625,000 hours of Netflix to consume the energy it took to train GPT-3.
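For the curious, here is a quick back-of-the-envelope check of those figures. The inputs are rounded, and the ~10 MWh per year for an average US home is my own approximation for the comparison:

```python
# Back-of-the-envelope check of the energy figures above (rounded inputs).
gpt3_training_mwh = 1300        # estimated ~1,300 MWh to train GPT-3
netflix_hour_mwh = 0.0008       # ~0.8 kWh per streamed hour of Netflix
us_home_annual_mwh = 10         # assumed ~10 MWh/year for an average US home

print(gpt3_training_mwh / netflix_hour_mwh)    # 1,625,000 hours of streaming
print(gpt3_training_mwh / us_home_annual_mwh)  # ~130 home-years of electricity
```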
The inefficiency is not measured only by absolute consumption. LLMs are still black boxes to us because we cannot tell which specific data actually enabled the algorithm to 'understand', reason, and generate. So the current approach is to feed in more and more data and train more and more parameters, which means more and more AI chips from NVIDIA.
II. THE NEXT AI FRONTIER IS ALL ABOUT EFFICIENCY
Pedro Domingos, a Professor Emeritus of Computer Science & Engineering at the University of Washington and the author of The Master Algorithm, recently published an important paper, "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine", which everyone who is even half fluent in the language of mathematics should read carefully.
In a nutshell, solving the efficiency problem isn't as far-fetched as it seems. Despite its many successes, deep learning remains poorly understood. In contrast, kernel machines are based on a well-developed mathematical theory, but their empirical performance generally lags behind that of deep networks. Gradient descent is the standard algorithm for learning deep networks and many other models. The paper shows that every model learned by this method, regardless of architecture, is approximately equivalent to a kernel machine. This kernel measures the similarity of the model at two data points in the neighborhood of the path taken by the model parameters during learning. Kernel machines store a subset of the training data points and match them to the query using the kernel. Deep network weights can thus be seen as a superposition of the training data points in the kernel's feature space, enabling their efficient storage and matching. The result has significant implications for boosting algorithms, probabilistic graphical models, and convex optimization.
If the math in the paper is too dense to work through, you only need to understand this paragraph:
"Most significantly, however, learning path kernel machines via gradient descent largely overcomes the scalability bottlenecks that have long limited the applicability of kernel methods to large data sets. Computing and storing the Gram matrix at learning time, with its quadratic cost in the number of examples, is no longer required. (The Gram matrix is the matrix of applications of the kernel to all pairs of training examples.) Separately storing and matching (a subset of) the training examples at query time is also no longer necessary, since they are effectively all stored and matched simultaneously via their superposition in the model parameters. The storage space and matching time are independent of the number of examples. (Interestingly, superposition has been hypothesized to play a key role in combatting the combinatorial explosion in visual cognition (Arathorn, 2002), and is also 8 Deep Networks Are Kernel Machines essential to the efficiency of quantum computing (Nielsen and Chuang, 2000) and radio communication (Carlson and Grilly, 2009).) Further, the same specialized hardware that has given deep learning a decisive edge in scaling up to large data (Raina et al., 2009) can now be used for kernel machines as well."
Pedro is not the only one marching ahead to solve the efficiency problem. Jeff Hawkins, the inventor of the PalmPilot and author of On Intelligence and A Thousand Brains (one of my favorite books of all time, and one I had the honor to help spread the word about), has studied the human brain his entire career. Jeff's AI research company, Numenta, has been publishing its research on sparsity since 2021.
Less data, less computing power, better results.
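To make 'sparsity' concrete, here is a minimal illustration, my own sketch rather than Numenta's code, of a k-winners-take-all activation: when only about 2% of a layer's units are active, the next layer only needs the matching 2% of its weight columns; the rest of the multiply-adds can simply be skipped.

```python
import numpy as np

# Illustrative sketch only: a k-winners-take-all layer in the spirit of
# sparse representations (not Numenta's actual implementation). With only
# k of n units active, the next layer's matrix multiply only needs the
# k corresponding columns; most multiply-adds can be skipped.

def k_winners(x, k):
    """Keep the k largest activations, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(x, -k)[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
n, k = 2048, 40                      # ~2% of units active
x = rng.normal(size=n)
W = rng.normal(size=(512, n))

sparse_x = k_winners(x, k)
active = np.flatnonzero(sparse_x)

dense_out = W @ sparse_x                      # full matmul: 512 * 2048 MACs
sparse_out = W[:, active] @ sparse_x[active]  # only 512 * 40 MACs

print(np.allclose(dense_out, sparse_out))     # True: same result, ~2% of the work
```

Dedicated sparse kernels and hardware exploit exactly this kind of structure; the point here is only that the arithmetic savings scale with how few units fire.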
There are external factors, in addition to the obvious financial incentive, that will drive and accelerate the pursuit of efficiency.
Higher efficiency, fewer chips.
III. DECENTRALIZED GPUS
Billions of GPUs are sitting idle around the world. The GPUs in your smartphones and laptops are not fully utilized. OTOY is the special effects technology company behind movies such as The Curious Case of Benjamin Button, Blade Runner 2049, and Star Wars. Humans perceive high-quality special effects and VR/AR as realistic because every shadow, reflection, and movement is individually calculated. The team at OTOY constantly ran into capacity constraints. In 2017, the brains behind OTOY started the Render Network, which uses blockchain to run a marketplace for idle GPUs and put that untapped computing power to work. I was on the initial Advisory Board with JJ Abrams, Ari Emanuel, and Beeple. The Render project has come a long way, though it is still not mainstream. Those idle GPUs are a giant goldmine, offering enough incentive to attract mainstream solutions sooner or later.
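To give a flavour of what such a marketplace does, here is a purely hypothetical sketch, not the Render Network's actual protocol or token mechanics: GPU owners list idle capacity and an asking price, and render jobs are matched to the cheapest idle node that can handle them.

```python
from dataclasses import dataclass

# Purely hypothetical sketch of an idle-GPU marketplace (my own toy
# illustration, not the Render Network's code): match rendering jobs
# to otherwise-idle GPU nodes by capacity and price.

@dataclass
class GpuNode:
    owner: str
    tflops: float
    price_per_hour: float   # what the owner asks to rent out idle time
    busy: bool = False

@dataclass
class RenderJob:
    name: str
    tflops_needed: float
    max_price_per_hour: float

def match(jobs, nodes):
    """Greedy matching: each job takes the cheapest idle node that can run it."""
    assignments = {}
    for job in sorted(jobs, key=lambda j: -j.tflops_needed):
        candidates = [n for n in nodes
                      if not n.busy
                      and n.tflops >= job.tflops_needed
                      and n.price_per_hour <= job.max_price_per_hour]
        if candidates:
            node = min(candidates, key=lambda n: n.price_per_hour)
            node.busy = True
            assignments[job.name] = node.owner
    return assignments

nodes = [GpuNode("alice_laptop", 10, 0.20), GpuNode("bob_rig", 80, 1.50)]
jobs = [RenderJob("scene_042", 60, 2.00), RenderJob("thumbnail", 5, 0.50)]
print(match(jobs, nodes))   # {'scene_042': 'bob_rig', 'thumbnail': 'alice_laptop'}
```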
More decentralized GPUs, less reliance on centralized GPU providers.
IV. CONCLUSION
I don't, for a second, pretend that NVIDIA's momentum will slow down soon. But if you are not paying attention to the facts and trends above, you shouldn't allocate any capital in this space. I admit that some of these trends are still early, and NVIDIA has the talent and war chest to make itself more future-proof. But if it stays on the current absolute-capacity path, the risks are indeed fundamental, and the incentives and constraints are so real that one morning we just might wake up and realize that one of these trends has caught fire.