Towards General AI
Recent advances in Large Language Models (LLMs) have demonstrated an ability to reason. Is this the basis for General AI? Artificial General Intelligence (AGI) is often described as human-level AI, or general intelligent action: an AI able to learn tasks in one domain and apply that knowledge to another, performing tasks in much the same way a human would.
In 2020 we predicted that AGI would be reached by 2040; that estimate has now moved forward to 2030.
Nvidia CEO Jensen Huang says AI will be ‘fairly competitive’ with humans in 5 years
The road is still foggy (and perhaps we are being optimistic), but some signs are showing us the way.
Human vs AI
Let's retrace the road. A human being is a collection of complex capabilities.
To emulate these functions, several strategies have been developed. In the 1950s we convinced ourselves that we understood the visual mechanisms and created the perceptron (1958), followed later by convolutional filters. In the 1980s we defined decision trees to learn from data. Only thanks to growing computational capacity, however, did we obtain the first real results (LeNet and CNNs) around 2010.
Deep Learning and the Transformer architecture (2017) led us to Large Language Models (LLMs, 2020): models with over 1 billion parameters.
Large Language Model
A feature predominantly seen in models with over 100 billion parameters is the ability to reason (considered an emergent ability). While opinions differ on whether LLMs truly possess reasoning abilities, it is becoming evident that LLMs can be equipped to deal with complex problems in a way that resembles human problem-solving.
Indeed, through sophisticated prompting strategies (e.g. Chain-of-Thought Prompting, Tree of Thoughts, ...) or symbolic modules, it is possible to tackle complex tasks.
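To make the idea concrete, here is a minimal sketch of Chain-of-Thought prompting: a worked example plus a "think step by step" cue is prepended to the question before it is sent to a model. The `build_cot_prompt` helper and the few-shot example are illustrative assumptions, not part of any specific library.

```python
# Minimal sketch of Chain-of-Thought (CoT) prompting.
# The prompt text below is a hypothetical example; plug the resulting
# string into whatever LLM client you actually use.

FEW_SHOT_COT = """Q: A farmer has 15 sheep and buys 8 more. How many sheep now?
A: Let's think step by step. The farmer starts with 15 sheep. Buying 8 more gives 15 + 8 = 23. The answer is 23.
"""

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example and the 'Let's think step by step' cue,
    which has been shown to elicit intermediate reasoning from large models."""
    return f"{FEW_SHOT_COT}\nQ: {question}\nA: Let's think step by step."

prompt = build_cot_prompt("If I have 3 apples and eat 1, how many remain?")
```

The point is that no model change is needed: the reasoning behaviour is elicited purely through the structure of the prompt.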
So, what is the next step towards AGI?
Multimodality
The secret ingredient seems to be (obviously) multimodality: the ability to process text, image, audio and video content in an integrated manner. It is not just about adding input channels; each modality can improve the others.
Gemini (Google), Ferret (Apple) and GPT-4 (OpenAI), along with open-source ecosystems (e.g. Hugging Face, LangChain, ...), are moving in this direction.
Moreover, cross-modal alignment is another interesting tool: Contrastive Language-Image Pretraining (CLIP) uses image-caption pairs to align the image and text modalities in a shared embedding space, and Contrastive Language-Audio Pretraining does the same for audio and text.
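The core of this alignment is a symmetric contrastive (InfoNCE-style) loss: matched image-text pairs should be more similar than mismatched ones. Below is a dependency-free sketch of that loss over pre-computed, L2-normalised embeddings; the temperature value and the tiny 2-dimensional embeddings are illustrative assumptions, not CLIP's actual configuration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.
    Assumes embeddings are already L2-normalised, so the dot product
    is the cosine similarity. Pair i (image_embs[i], text_embs[i])
    is the positive; all other pairings are negatives."""
    n = len(image_embs)
    # Similarity matrix scaled by temperature: logits[i][j] = <img_i, txt_j> / T
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_embs] for img in image_embs]
    # Image -> text direction: each image should pick the text at its own index.
    loss_i2t = -sum(math.log(softmax(row)[i]) for i, row in enumerate(logits)) / n
    # Text -> image direction: same thing on the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(math.log(softmax(col)[j]) for j, col in enumerate(cols)) / n
    return (loss_i2t + loss_t2i) / 2
```

With perfectly aligned pairs (each image embedding identical to its caption embedding) the loss approaches zero; shuffling the captions makes it large, which is exactly the signal that pulls the two modalities into a shared space.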
One More Modality
There is, however, a fourth element to integrate: not only vision, text and audio, but also the ability to act and interact with an environment.
Tools such as Reinforcement Learning, IoT, robotics or even simple APIs can give AI the ability to take this further step.
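In its simplest form, "acting in an environment" means the model chooses a tool (an API, an actuator), we execute it, and we feed the observation back. The sketch below shows that loop with a stubbed weather API; the tool names, the stub, and the pre-scripted action list are all hypothetical stand-ins for output a real model would generate.

```python
# Minimal sketch of a tool-use ("act and observe") loop.
# In a real agent, the (tool_name, argument) actions would be parsed
# from LLM output; here they are passed in directly for clarity.

def get_weather(city: str) -> str:
    # Stub standing in for a real external API call.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_agent(actions):
    """Execute each requested tool and collect the observations that
    would be fed back to the model on the next reasoning step."""
    observations = []
    for tool_name, arg in actions:
        tool = TOOLS.get(tool_name)
        if tool is None:
            observations.append(f"unknown tool: {tool_name}")
        else:
            observations.append(tool(arg))
    return observations
```

Whether the "tools" are REST APIs, robot actuators, or IoT sensors, the pattern is the same: the environment closes the loop that pure text generation leaves open.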
Super AI
Artificial Super Intelligence (ASI) is defined as a form of AI capable of surpassing human intelligence. ASI is expected by 2050, but first we should address issues such as ethics and self-awareness (perhaps leveraging quantum computing).
Conclusion
So, will 2024 be the year we find the road to AGI?