Standing at the Gate of the AGI Era: Opportunities for Entrepreneurs
Speaker
Dr. Tian Yuandong
Researcher and Senior Manager at Meta AI
Project leader for Long-Form Story Generation
Dr. Wu Yi
Principal Investigator at the Shanghai Qi Zhi Institute
Assistant Professor and Special Researcher at the Institute for Interdisciplinary Information Sciences at Tsinghua University
Zhu Xiaoyong
Principal Data Scientist at Microsoft Azure
Founder of Azure Feature Store (Feathr)
Dr. Shen Libin
Founder and CEO of Leyan Technologies
Moderator
Ella Hu
Executive Director of Sky9 Capital
Ella Hu:
From a technical perspective, where do we think the boundaries of AI capabilities will be in the next 3 to 5 years? What kind of problems can it solve, and what kind of problems can it not solve?
Tian Yuandong:
Even if the technology doesn't advance further, there are still many things that can be done, such as putting it into practice and making models smaller, even customizable on mobile devices. ChatGPT is already capable of handling many tasks, but the entire pipeline is not yet fully connected. I estimate that it will take 2 to 3 years to connect it fully, and then many things can be done with ChatGPT or similar models.
This is very different from AlphaGo. Because many people are not familiar with Go, AlphaGo was really a project to demonstrate the power of AI, with people as spectators. I have also worked on OpenGo (note: OpenGo refers to ELF OpenGo, the Go-playing AI open-sourced in 2018 by Meta AI (FAIR), with Dr. Tian Yuandong as project lead and first author), which people think is great, but it is largely a show-of-strength demonstration project. ChatGPT, on the other hand, is something that everyone can use, so it's very different, and practical issues keep arising. But the technology is still advancing, and countless people are thinking about how to make it better, so I think there will be significant progress.
I think ChatGPT may eventually help people do a large amount of their work, but 10-20% of highly specialized work will still need skilled humans, partly because there is not enough data to train the model well for it. And new types of jobs will emerge. For example, in the past people traveled by horse-drawn carriage; when cars were invented, the need for carriage drivers shrank, but demand around cars grew. With new tools available, the bar for a job may be lower, and the job itself may be more interesting.
That 10-20% of work requires humans who are skilled at leveraging new tools to produce new data, finding novel "patterns" on the fly and analyzing them accordingly. Such data may be something the original model cannot process. So this is an iterative process that depends entirely on the speed of interaction between humans and machines. As the degree of automation increases, the data available for training becomes scarcer. Just as with autonomous driving, you eventually find many corner cases that cannot be solved. Without enough data, you cannot train the model well enough to achieve human-level performance.
ChatGPT also has this problem. The more data you have, the better the performance, but there will always be some “isolated islands”, i.e., some relatively specialized or niche areas where it is difficult to progress further. These islands may be the last "fortresses" of humanity.
There is also the issue of interpretability. Interpretability is a fundamental problem for understanding why a model works and why it has emergent capabilities. I think we are still lost in the mist on this part right now, treating such large language models as black boxes. These models may give natural language explanations, but we are not sure whether they can be trusted.
Perhaps people who have experience with large models know how to tune them, but this process itself does not solve the problem of interpretability. So I think interpretability may be the next pressing question to answer. But maybe I am wrong; perhaps we can cross that finish line simply by stacking up a large amount of data.
Ella Hu:
ChatGPT still has some shortcomings. Just now Dr. Tian mentioned some future optimization directions. Do you have anything to add?
Shen Libin:
I think it can be divided into two aspects. One is the low-hanging fruit, such as how to reduce the cost, increase the speed, and improve the efficiency of the service from an engineering perspective, and how to do industry-specific customization. These are more practical and engineering-oriented aspects that are closer to applications and industries.
Secondly, it is about understanding the core issues. As with deep learning, studying LLMs means facing an even larger black box, and we need to better understand why the model has so many capabilities and what the essence behind them is. In this regard, I think it requires not only industry but also more participation from academia.
Ella Hu:
Regarding the issue of interpretability, how difficult is it? How much time and cost do we need to solve it?
Tian Yuandong:
It is actually a quite difficult problem. I once asked a question on Zhihu.com: what will the theory of deep learning look like in 100 years? I emphasized "in 100 years" because I know it's hard to say now and difficult to do. Currently, many papers go like this: take a model, look at each node of the neural network and the output of each layer, and tell a story based on their empirical relationships (e.g., correlations). But this kind of research is relatively superficial. Deeper research should study and prove properties of the training algorithm. Why can the training algorithm capture the features in the data? Which features in the data can be captured, and which cannot? What is the training algorithm's understanding of the data structure? Where are its boundaries and limitations? What kind of structure can it not learn? This is the most important thing. If we can really explain this, it will be very helpful for understanding the boundaries and learning mechanisms of large models.
This is very difficult, and we have not yet seen very good research results. Currently, most research is of the first type: decomposing the large model and explaining what each neuron is doing, or might be doing. Applying previous theories to deep learning can indeed produce papers, but these papers often explain local effects and are difficult to assemble into a coherent framework. In my view, a coherent theory may need to be developed from scratch, starting from first principles, summarizing the rules step by step, and finally arriving at a highly abstract mathematical framework. This is very difficult and requires a substantial amount of work.
Of course, there is also a possibility that after the model iterates to a certain extent, it will explain itself, or figure out a way to prove its own properties. (laughs).
Ella Hu:
What is the trend for models in the future? Will they become bigger and deeper or smaller?
Zhu Xiaoyong:
From a first-principles perspective, the most fundamental element is computing power, and computing power will always become cheaper. The cost and time to train a model with the same number of parameters will keep decreasing. Therefore, everyone will tend to use larger models; larger models are not necessarily better, but they represent a higher upper limit.
Although these models are very large, there is a lot of redundancy in them. On platforms with limited computing power, such as edge devices, smartphones, and smartwatches, it is still necessary to distill and prune large models into smaller ones to achieve the intended effect.
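For readers who want the distillation idea made concrete, here is a minimal PyTorch sketch of the classic knowledge-distillation loss, where a small "student" model is trained against a large "teacher" model's softened outputs. The temperature T and mixing weight alpha are illustrative defaults, not values from the discussion.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples over 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```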
For more vertical fields, or fields with limited data such as medicine, the data itself is relatively scarce. Although the model determines the upper limit, the data determines the lower limit; if the data is insufficient, there is actually no need for such a large model. Therefore, in these fields, fine-tuning and even pruning will be done on top of large models.
Tian Yuandong:
Regarding the number of parameters, we can look back at the evolution of vision models around 2013. At first, AlexNet (note: AlexNet is a convolutional neural network designed by Hinton and his student Alex Krizhevsky, which won the ImageNet competition in 2012) had only 6 to 7 layers. Later, everyone started adding parameters, including VGG (note: VGGNet was developed on the basis of AlexNet), which has a very deep architecture and is difficult to train. However, after ResNet (note: ResNet is a residual network that addresses the performance degradation caused by vanishing gradients as network depth increases) and Batch Normalization came out, training became much easier. Later, people gradually found ways to achieve better results with fewer parameters. So evolution does not move in only one direction.
Ella Hu:
It seems that the issue is closely related to different application scenarios. Given that reasoning ability has emerged in large models, what can we do to enhance it?
Tian Yuandong:
One simple shortcut is to use tools, like Meta's Toolformer (note: Toolformer is a language model released by Meta that learns to call external tools). When the large model finds it difficult to predict the next token, it can call a tool to solve an arithmetic or otherwise hard reasoning problem. Next, we can imagine integrating various tools into the large model, such as SQL queries against any database. This is an effective short-term solution.
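To make the pattern concrete, here is a minimal sketch of the tool-calling idea: the model emits a marker such as [Calculator(12*7)] or [SQL(...)] in its output, and a wrapper intercepts the marker, runs the tool, and splices the result back into the text. The marker format, tool registry, and in-memory database below are illustrative assumptions, not Toolformer's actual interface.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database for the sketch
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 19.99)")

# Hypothetical tool registry; names and call format are made up here.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "SQL": lambda query: str(conn.execute(query).fetchall()),
}

def fill_tool_calls(text):
    """Replace [Tool(args)] markers in model output with tool results."""
    def run(match):
        name, args = match.group(1), match.group(2)
        return TOOLS[name](args) if name in TOOLS else match.group(0)
    return re.sub(r"\[(\w+)\((.*?)\)\]", run, text)

print(fill_tool_calls(
    "The total is [SQL(SELECT total FROM orders)] and 12*7 = [Calculator(12*7)]."
))
```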
The long-term solution is difficult, mainly because we still do not fully understand the mechanism of large models. We may need to design better models, require better architectures, and perhaps go beyond the Transformer architecture to give the large model better reasoning abilities. However, the short-term solution is enough for everyone to work on for at least one or two years, and I predict that there will be a lot of work in this area in the future.
As for the emergence of abilities in LLMs, it is a mysterious phenomenon with many interpretations. One interpretation is that apparent emergence may be an artifact of the evaluation metric. Some metrics jump very quickly once a certain point is reached. So does the so-called emergence of abilities really exist? That is an open question. If you adjust the metric, for example setting the standard for success as "answering ten questions in a row correctly", you will find that the measured success rate naturally shows a leap. However, this is not because abilities emerged, but because the metric itself is biased. This is one possibility.
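A quick back-of-the-envelope check of this point: if per-question accuracy p improves smoothly, the stricter "ten in a row" metric, which succeeds with probability p^10, still looks like a sudden jump.

```python
# Smooth growth in per-question accuracy p appears as an abrupt leap
# under the stricter metric p**10 ("answer ten in a row correctly").
for p in [0.5, 0.7, 0.9, 0.95, 0.99]:
    print(f"p = {p:.2f}   ten in a row = {p**10:.4f}")
# p goes 0.50 -> 0.90 -> 0.99 gradually, but p**10 goes
# 0.0010 -> 0.3487 -> 0.9044: an apparent "emergence".
```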
In fact, ChatGPT's results are indeed much better than many current language models. I think one important point is that it has learned some high-level logical reasoning abilities. Conventional language models are designed to predict the next word; if they are not fine-tuned or reinforced with human feedback, they will keep continuing the sentence without understanding its content, let alone following instructions based on it. Originally, the model could only be used to write articles or fill in blanks, but after fine-tuning, it suddenly learned some high-level understanding, which was a big leap. Is that because the pre-trained model already had this capability, or because of the fine-tuning step? Existing neural network theories cannot explain this phenomenon. Perhaps in the future we will discover its mechanism or how to control it better, but currently we are still in a state of confusion and will need to explore in the dark for a while.
Ella Hu:
We see the role of reinforcement learning in ChatGPT. Can reinforcement learning be used to strengthen the reasoning ability of models?
Wu Yi:
I have a positive answer to this question. Reinforcement learning requires a reward and needs to be combined with some scoring mechanism. And what is reasoning? I think there are two types. One is abstract reasoning, where the model is asked to explain something abstractly. For example, if I ask the model to explain why the stock market fell today, it is difficult for me to determine whether the model's explanation is correct, so it is harder to apply reinforcement learning there. The other type is reasoning to perform a specific task, such as booking airline tickets or calling a tool the way Toolformer does. If this can be combined with a problem that is relatively easy to judge, reinforcement learning can play a very effective role and can improve the model's long-horizon reasoning ability.
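One way to read "easy to judge" is that the reward can be a simple programmatic verifier; a hedged sketch under that assumption (the verifier and task format are illustrative, not from any real system):

```python
def reward(task_expr, model_answer):
    # For a verifiable task such as arithmetic, the reward is just an
    # automatic check of the model's answer against the ground truth.
    try:
        return 1.0 if float(model_answer) == eval(task_expr, {"__builtins__": {}}) else 0.0
    except (ValueError, SyntaxError):
        return 0.0

print(reward("12*7", "84"))  # 1.0: correct, trivially easy to judge
print(reward("12*7", "85"))  # 0.0
```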
Zhu Xiaoyong:
Is the application of reinforcement learning in natural language processing limited to situations with a complete reward mechanism? Is there a possibility of making humans less involved?
Wu Yi:
First of all, by definition reinforcement learning requires a reward function. Currently, many people are researching a direction called learning from human preferences: how to make it easier for people to score outputs while also giving reinforcement learning more informative feedback.
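As a rough illustration of learning from human preferences, a reward model is commonly trained on pairs of responses where a human marked one as better, using a Bradley-Terry style loss. This sketch assumes that common setup rather than OpenAI's exact recipe.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Maximize the log-probability that the human-preferred response
    # scores higher than the rejected one (Bradley-Terry model).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar reward-model scores for a batch of response pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, -1.0])
print(preference_loss(chosen, rejected))  # smaller when chosen > rejected
```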
I also want to add that the relationship is not one-way: it is not just that reinforcement learning helps large models; once large models become stronger, reinforcement learning itself will also change in many ways. Previously, reinforcement learning was like looking for a needle in a haystack; with a large model that can predict an approximate path, it becomes more like following a map.
Ella Hu:
When the large models become stronger, will they erode the space of some small models? What is the competitive relationship between the two?
Zhu Xiaoyong:
First of all, we need to define what we mean by large and small models. Today we call models like ChatGPT large, but models like BERT and many vision models used to be considered very large as well. It's just that we are now more focused on improving model efficiency, so the size of the model itself is not particularly important.
Regarding the question itself, I think the combination of the two has been happening for four or five years already. A typical example is the recommendation system. In the past, the input of a recommendation system was based on various kinds of structured data and required a lot of manual work. But after deep learning models like BERT appeared, more data could be embedded, such as user data and the text, audio, or video content users publish, all of which the model can understand. This has had many positive effects on downstream pipelines: with better inputs, downstream models produce better outputs. The current industry practice is to chain various models together so that each plays its role. Therefore, large models actually have a very positive driving effect on downstream models, whether those are large or small. I think this is a very good phenomenon, and it also confirms the machine learning adage, "garbage in, garbage out."
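As a concrete sketch of this chaining pattern: a large pretrained encoder supplies embeddings, and a small downstream model consumes them as features. The model name, pooling choice, and toy task below are illustrative; the sketch assumes the Hugging Face transformers and scikit-learn libraries.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the encoder's last hidden states into one vector per text.
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    return out.last_hidden_state.mean(dim=1).numpy()

# The small downstream model never sees raw text, only embeddings.
X = embed(["great product, fast shipping", "arrived broken, very upset"])
clf = LogisticRegression().fit(X, [1, 0])
print(clf.predict(embed(["works perfectly, would buy again"])))
```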
Tian Yuandong:
In addition, I think small models can be placed in mobile devices as private or even personalized models, which may be the application direction of small models. It is possible that in the future, the apps installed in our phones will not be written by programmers but trained using data. Each app corresponds to a small model plugin that generates different functions after training. And this model is private and requires authorization to be used by others.
Shen Libin:
The large and small models mentioned here are also closely related to the business we are doing (note: Dr. Shen Libin founded Leyan Technologies, which provides AI solutions for the e-commerce industry). After ChatGPT became popular, many people asked us whether large models would replace our entire technology stack. I think that takes time. Large models will definitely take over some of the capabilities of small models, but our small models encode a massive amount of industry know-how. It is therefore not realistic for a large model to immediately replace an industry's small models without continuous training on industry data.
But we have also seen that large models can do many things that small models cannot, such as the strong generalization capability of large models, which small models cannot achieve. I think in the next two years or so, some things will be handed over to large models to do, but we certainly don't want to maintain two sets of models at the same time. So, is it possible to integrate industry know-how of small models based on the foundation of large models? I think this is the competitive way for the two in the future.
Ella Hu:
How do you view the barriers and replication difficulties of ChatGPT?
Wu Yi:
Firstly, I want to say that ChatGPT was actually a short-term project at OpenAI. It became popular so quickly because it let everyone use it directly. If you are interested in OpenAI's development, you can check out their series of blog posts. You will find that since the API was launched in 2020, most of what ChatGPT does today, perhaps 80-90%, could already be achieved with that API. What ChatGPT adds beyond that is better interaction design and fine-tuning. If you look back at OpenAI's development, from RLHF (Reinforcement Learning from Human Feedback) to GPT-2, GPT-3, and code generation through the API, all of this was achievable even before Microsoft invested. After Microsoft's investment, OpenAI began large-scale training of models, and later, with InstructGPT, WebGPT, and ChatGPT, went through a long period of iteration and encountered a lot of difficulties.
Therefore, the barrier to reproducing ChatGPT today actually lies in the difficulties that OpenAI has worked through since 2017 and 2018. These difficulties cannot be skipped over. In addition, years of training have let OpenAI develop its own training infrastructure, which raises the entry barrier even higher.
So, in the end, it is an engineering problem. In other words, anyone can build their own ChatGPT if they follow the path OpenAI has taken.
Tian Yuandong:
I agree; it's just a matter of time. But I find it strange that everyone is rushing to reproduce ChatGPT. When I was working on OpenGo back in 2018, which produced very good results, beating four professional Go players 20-0 on a single GPU, someone in the US asked me why we were doing this, why we needed to reproduce it. Thinking about his criticism, I concluded it made sense. At that time, perhaps I was in the mindset of following others; now I should aim to do better.
So personally, while it is possible to redo ChatGPT and spend the time and effort to make it happen, you will always be following someone else's footsteps without a direction of your own. And it is a difficult task; one of the challenges is the infrastructure issue I mentioned earlier. For example, OpenAI has a team responsible for data labeling, and these labelers need to regularly align with researchers and discuss how to label data more efficiently. It is a difficult exercise in team collaboration. Not to mention that after ChatGPT became popular, many users have contributed free data every day. Once these things form a closed loop, it is almost impossible for newcomers to catch up.
I think a better innovation direction may be to have more efficient technology than ChatGPT or a better training plan. But this is indeed very difficult.
Wu Yi:
In fact, OpenAI had many directions back then, and the API was just one of them. It succeeded and then scaled up. But before 2017, no one knew for sure that it would work out. So we can't just look at ChatGPT in hindsight. Today's success might have been just a random decision or a local optimal solution, not a global one.
Tian Yuandong:
Yes, I've also discussed with others before not to focus solely on OpenAI or any single company. You have to look at how innovation is done in the United States. There are many different things that people are doing, and various small companies try different and seemingly crazy ideas. One or two of them may hit the mark and become wildly popular. If you go all in on ChatGPT, it's highly likely that in a few years, something completely different will emerge and take over.
Ella Hu:
From the current cost and performance of the foundation model, how far is it from industrial applications?
Shen Libin:
I believe that in many application scenarios, the current price is not a big problem. For example, Google Translate charges 2 cents per thousand tokens, while ChatGPT's price is only one tenth of that. However, going from small-scale to large-scale applications, ChatGPT, or Bing combined with OpenAI's technology, still has a long way to go to reach Google's coverage, and closing that gap requires a lot of engineering work.
In addition, a large amount of computing power has already been consumed, and whether there will be enough computing power in the future when more people use it is also a major problem. I have also seen work being done to make the model smaller in size.
I also believe that the Chinese companies dedicated to redoing ChatGPT may be doing it partly as a way to build up their engineering capabilities. ChatGPT can at least serve as a benchmark. However, making ChatGPT is definitely not just for the sake of making it, but to bring the team's engineering skills up to that level. Benchmarking includes performance, cost, and pricing. Even if Chinese companies currently have a big gap with OpenAI in performance and cost, the final pricing cannot differ too much, so the challenge is still great.
Secondly, we see that the foundation model is still mostly applied to general-purpose uses such as writing assistants and efficiency tools; its combination with vertical fields and small models is not yet very mature. This also tests each company's engineering capabilities: how to integrate more ISVs (Independent Software Vendors), and how to let many people synchronize and iterate on different versions of a model, are engineering challenges. The United States is still leading in this regard, but I believe China will also make great progress.
Tian Yuandong:
I agree. I think OpenAI's pricing is already very affordable. If OpenAI wanted to, it could replace Jasper. If startups want to benefit from downstream applications, they can only pray that OpenAI does not target their industry. But I believe that OpenAI is focused on a larger global picture and is unwilling to focus on anything too specific, which gives downstream enterprises an opportunity to develop vertical applications and form an ecosystem. What to expect next is a better-integrated software ecosystem, which could drive industrial upgrading around the world.
Ella Hu:
We see that the entire entrepreneurial community is excited, thinking that both traditional internet to-C products and to-B SaaS products can be re-imagined with the advent of large models. But at the same time, we need to recognize that the boundaries and forms of large models themselves have not yet settled, and they may move into vertical fields at any time. So, which areas of entrepreneurship do you see as promising? And how can startups establish a moat?
Zhu Xiaoyong:
The foundation model already exists, but there is still a need for many service providers who understand industry know-how. This represents a huge opportunity for small startups. Another wave of opportunity is the ecosystem around large models, including inference, training, and model miniaturization.
Shen Libin:
I think we can look at it from another perspective: the opportunity for entrepreneurship depends on where the boundary of the foundation model lies. If a startup cannot clearly define its division of labor with the foundation model, it is easy to be eroded.
I believe that the foundation model will gradually extend from the current general field to vertical fields. What will be left in the end? I think it will be the interface between the foundation model and humans. These are tasks that require time to accumulate, and even involve some dirty work, but there are also certain opportunities.
For example, developing ERP systems for e-commerce requires someone to go to the warehouse for deployment, which is a connection point between the real and virtual worlds. I think large models are similar; the connection points between large models and humans are left to startups.
Wu Yi:
I will express a different point of view. It seems that everyone now thinks that large models kill everything, but is it possible that large models are infrastructure? For example, in the US cloud services market, AWS started off well, followed by Azure, Google Cloud, and Snowflake. The cost of switching for users is not that high. OpenAI originally used Google Cloud and then switched to Azure, so is it possible that the foundation model will eventually become infrastructure? For users, they can switch between these providers because it's just an API product. So I don't think everyone has to be so pessimistic.
Shen Libin:
If we're talking about the US market, I agree with this view, but the competitive ecology in China is different. For example, in the advertising ecosystem, the US has a clear division of labor: demand side, supply side, trading exchanges, and data providers, and the ecosystem is prosperous. In China, however, such an ecological division of labor is lacking, and almost every company has to work end-to-end. So I think China and overseas markets need to be discussed separately.
Ella Hu:
We have seen some startup teams, including those who previously worked on product, UI, and sales, start to create applications at the application layer. What kind of AI capabilities do you think application layer entrepreneurs need?
Tian Yuandong:
They need to be smart and have strong communication skills. They need to iterate quickly, be able to quickly identify the critical components of a successful business model, and build a prototype fast. Once someone is willing to use it, a snowball effect can follow. Having an AI background may not even help, because such people may keep asking why, get trapped in fixed patterns of thinking, and waste time (laughs). I think strong execution matters more.
Ella Hu:
Lastly, let's imagine what human society would be like when the era of AGI arrives? What kind of production relationship can match this production capacity? What will be the relationship between humans and AI? How do we view possible risks, such as moral and safety issues?
Tian Yuandong:
I actually discussed this situation in my sci-fi novel. If one day anyone can immediately produce anything they want, what kind of values will humans cherish? One consequence is that the price of common goods will drop to zero due to infinite supply... So in the end, people who are more creative and can bring new things never seen before (by leveraging existing tools) may become the valuable ones.
When all material needs can be satisfied, humans may turn to seek their own meanings or new forms of organization and communication. Each person can truly define where their value lies.
Zhu Xiaoyong:
I think more ethical and philosophical questions may be raised. When AlphaGo emerged and caused a sensation in the Go world, everyone said Go was finished. But later a new balance gradually formed, and players would use AlphaGo to study the game and its strategies. AI went from being humans' competitor to humans' teacher. Maybe in the future, AGI will likewise become our partner and mentor. Ethical issues will arise then.
Shen Libin:
I think more about how to transition from our current society to a future AGI society. During this process, many people will lose their value in the old coordinate system, such as grassroots clerks and simple laborers. As Dr. Tian mentioned, new professions will arise after machines replace humans, but honestly speaking, the number of jobs created will surely be smaller than the number of jobs lost. The lives of many basic laborers will be affected. How can these people be repositioned?
Tian Yuandong:
Yes, this is a big problem. That's why I think open source and democratizing AI are very important, in order to prevent this decisive capability from being controlled by a few people. Mistakes there could cause huge problems and be very dangerous for society.
Wu Yi:
I'm actually quite optimistic because I think AI is ultimately just a tool. Although there are safety and human compatibility issues, everyone is actively researching solutions. It's very normal for humans to feel fear as productivity develops because people are always afraid of the future. But looking back on human history, from the agricultural society to now, we've made it through. So I think human society is more robust than we imagine.
Postscript
ChatGPT has opened the door to AGI, and productivity has made a qualitative leap. The huge entrepreneurial opportunities of the next generation of the internet are before us. In these exciting days, every day brings dazzling news, and we must be part of it.
Startups with existing products should not be constrained by their original product thinking, but should actively embrace AI and strengthen the intelligence and automation in their products. For the many companies that want to start a business, the questions are how to create an AI-native product and how to establish differentiation that can resist the boundary erosion of large models, the product upgrades of giants, and homogeneous competition within the industry. Deeper industry know-how, richer industry data, deeper workflows, and the resulting data flywheel will help them gradually build a protective barrier. Or are there even more innovative paths, attacking from a lower dimension or crawling close to the ground?
This state of mixed clarity and chaos demands seemingly contradictory qualities from entrepreneurs: passion and calm, deep thinking and fast execution. The changes brought about by AI require us to abstract our thinking down to the fundamentals in order to seize the opportunities that belong to us.
Sky9 Capital focuses on the technological changes, AI and globalization entrepreneurship opportunities. If you are interested in exploring with us, please contact us at [email protected].
Ella Hu
Executive Director, Sky9 Capital
So glad to have Yuandong Tian, Xiaoyong Z., Libin Shen and Will Wu on this panel discussion!