GPT: the next frontier.
Multi-modality: a path to AGI.


An AI-generated podcast version of this article is available on SoundCloud.

Hi, my name is Aamir, and I am an AI researcher based in Melbourne, Australia. In this series of blogs, we shall examine the next frontier for GPT-type models and their future in the quest for AGI (Artificial General Intelligence). We will delve into the technological advances that will be necessary and the challenges that lie ahead on the path towards AGI. We will also explore how GPT-type models can evolve beyond their current capabilities towards a broader grasp of context, reasoning, and adaptability, ultimately aiming to bridge the gap between narrow AI and true general intelligence. Through this exploration, we hope to shed light on the possibilities and considerations surrounding the development of AGI, and what it means for the future of humanity.

The human brain constantly receives streaming information from all its senses, including vision, hearing, smell, touch, and internal feedback such as pain or loss. This continuous influx of sensory data is processed and integrated by complex neural networks, allowing us to perceive the world around us, make decisions, and interact with our environment in real time. Moreover, our brains have the remarkable ability to integrate this information into a single unified view, or model, of the world.

This unified perception enables us to navigate our surroundings, recognize patterns, make predictions, and form coherent interpretations of our experiences. It is this holistic processing that underlies our ability to understand context, anticipate events, and adapt to changing circumstances: characteristics that pose significant challenges for artificial intelligence systems striving to achieve similar levels of comprehension and adaptability. When it comes to AGI, we keep coming back to human intelligence because it serves as the yardstick by which we measure progress. We consider something truly intelligent when it can mimic most, if not all, of our cognitive abilities. Human intelligence represents the pinnacle of evolution's design, exhibiting a remarkable blend of creativity, problem-solving, emotional intelligence, and adaptability. It encompasses not only the ability to process information efficiently but also to understand context, learn from experience, and interact meaningfully with the world. As such, achieving AGI entails replicating these diverse facets of human intelligence within an artificial system. By striving to emulate human cognition, we aim to create machines capable of understanding and navigating the complexities of our reality, ultimately advancing technology to unprecedented heights and reshaping the future of humanity.

Where things stand

As impressive as GPT-type models are, we can think of them as our first large-scale experiment in pursuit of AGI. Let me explain why I believe we still have a long way to go: the worldview of the current generation of GPT-type models is based on a single modality. To simplify the matter, let's do a thought experiment. Imagine that you are stuck in a dark room, and your only view of the world is a series of symbolic tokens that appear out of nowhere, without any prior context. These tokens convey information about the world, but they lack the richness and depth of real sensory experience and background awareness. To build a realistic model of this world, you can rely only on the patterns and correlations within that stream of tokens. The same is true of GPT-type models: while they demonstrate impressive language understanding and generation capabilities, they cannot perceive and interact with the world the way humans do, through multiple sensory modalities. This limitation underscores the need for further research and development to create AI systems that can integrate information from various modalities and achieve a more comprehensive understanding of the world, bringing us closer to the goal of AGI.
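To make the contrast with a single modality concrete, here is a toy sketch of "late fusion", one simple way multimodal systems combine the outputs of separate per-modality encoders into a single unified representation. The feature vectors below are illustrative placeholders, not the output of any real model:

```python
# Toy sketch of "late fusion": per-modality feature vectors are
# normalized and concatenated into one joint representation.
# The vectors here are made-up placeholders for illustration.

def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

def fuse(text_features, image_features, audio_features):
    """Concatenate normalized per-modality features into one vector."""
    fused = []
    for features in (text_features, image_features, audio_features):
        fused.extend(l2_normalize(features))
    return fused

# Pretend embeddings produced by three separate encoders.
text_vec = [0.9, 0.1, 0.4]
image_vec = [0.2, 0.8, 0.5]
audio_vec = [0.3, 0.3, 0.9]

joint = fuse(text_vec, image_vec, audio_vec)
print(len(joint))  # 9: one unified vector spanning all three "senses"
```

In practice, modern multimodal models go well beyond simple concatenation (for example, cross-attention between modalities), but the core idea is the same: multiple sensory streams feeding one shared internal representation.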

Next steps: A picture is worth a thousand words

Let's face it, there are certain situations where words fall short of what we are trying to convey. Whether it's the intricate details of a complex concept, the beauty of a breathtaking landscape, or the subtle nuances of human emotions, sometimes words alone cannot capture the full depth and richness of an experience. In such cases, visual imagery can be incredibly powerful, offering a direct and immediate way to communicate ideas, evoke emotions, and convey information. By incorporating visual elements into our communication strategies, whether through photographs, diagrams, or illustrations, we can enhance understanding, engage our audience on a deeper level, and create more impactful messages. In the journey towards AGI, exploring the integration of visual perception with language understanding will be a crucial next step, allowing AI systems to comprehend and communicate with the richness and complexity of the human experience.

The Big Two

Google and OpenAI, the two heavyweights of the tech industry, understand where these models need to go to fulfill the promise of AGI and create agents that truly perceive this world as we do. With their vast resources, expertise, and commitment to advancing artificial intelligence, Google and OpenAI are at the forefront of research and development in this field. They recognize the importance of not only improving the capabilities of existing AI models but also pushing the boundaries of innovation to achieve a deeper understanding of human cognition and perception. By collaborating with researchers, investing in cutting-edge technologies, and fostering a culture of experimentation and exploration, these organizations are driving forward the quest for AGI and paving the way for a future where intelligent machines can truly comprehend and interact with the world in a manner akin to human beings.

While OpenAI's GPT-4 has somewhat limited multi-modal capabilities, Google's Gemini Ultra is, according to Google, designed from the ground up to be fully multi-modal (MM).

In my humble opinion, while this is a significant step on the path to AGI, it leaves me with the feeling that we are not quite there yet. Maybe GPT-5 will get us closer.

What can we do with what we have?

"We cannot get so hung up on where we are going that we forget to make the most of where we are." From the dawn of civilization, every technology has been created to serve humans, and MMs are no different. Multi-modal models are increasingly recognized for their ability to address complex business cases across industries, integrating multiple types of data, such as text, images, and audio, to enhance decision-making and operational efficiency.

With the hindsight of over twelve years in ML and AI, my two cents would be to invest in MMs designed for a specific sector, with the narrow scope of solving key business problems with great efficiency and accuracy. In human terms, one might be an expert in one domain and ineffective in another. So if we wish to create an MM that is a great coder or developer, we do not care how good it is at accounting.

Here are some generic examples where specialized multi-modal models can make a difference:

  1. Traffic management.
  2. Workplace safety.
  3. Insurance assessments.

Please have a look at the embedded video presentation.

The idea is to find your best people and distill their knowledge into a specialized multimodal AI.
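That "distilling" of expert knowledge has a direct analogue in ML: knowledge distillation, where a smaller student model is trained to match the softened prediction distribution of an expert teacher. Below is a minimal, self-contained sketch of the standard distillation loss; the logits are illustrative numbers, not real model outputs:

```python
# Minimal sketch of knowledge distillation: a "student" model is
# trained to match the softened predictions of an expert "teacher".
# All numbers below are illustrative.
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature
    yields a softer, more informative distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's soft targets and the
    student's predictions; zero means a perfect match."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]   # confident expert predictions
student = [2.5, 1.5, 0.5]   # the smaller model being trained

loss = distillation_loss(teacher, student)
print(loss)  # a small positive number the training loop would minimize
```

A training loop would minimize this loss (usually alongside the ordinary task loss) so the narrow, specialized student inherits the expert's judgment at a fraction of the cost.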

It has been a pleasure writing this blog, and I hope you have found it useful. I welcome your thoughts and feedback on the topic. In the next installment, we shall examine the more technical aspects of how these MMs are put together and how to train custom domain-specific MMs.

I can also be contacted on X, formerly Twitter, at @AM12_IO.
