The journey to AGI
Current AI systems are sometimes referred to as "narrow AI" because they are optimized to solve one specific task. Large language models such as OpenAI's GPT-3, which manipulates text, are an example. To move toward AI with "human-level" capabilities, these systems will have to handle many different tasks - an ability usually referred to as artificial general intelligence, or AGI.
One step on this journey is to introduce "multimodal" AI systems, most recently accomplished by DeepMind with the introduction of their Gato system. MIT Technology Review writes that Gato "learns multiple different tasks at the same time, which means it can switch between them without having to forget one skill before learning another."
This is accomplished by representing data of different kinds in the same system - text, images, sound, even actions. A simple example of this last category is movement in a computer game: move up, move down, go left. So instead of compartmentalizing these different skills into separate models, a single model can handle all of them.
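To make this concrete, here is a toy sketch of the underlying idea: every kind of data is converted into tokens drawn from one shared vocabulary, so a single sequence model can consume all of it in one stream. The tokenizer functions and vocabulary layout below are invented for illustration and are not DeepMind's actual design.

```python
# A minimal sketch of the idea behind a Gato-style model: every modality is
# serialized into tokens from one shared vocabulary, so a single sequence
# model can be trained on all of them. The tokenizers below are hypothetical
# stand-ins, not DeepMind's actual implementation.

from typing import List

VOCAB_TEXT_OFFSET = 0         # assumed vocabulary layout: text tokens first,
VOCAB_IMAGE_OFFSET = 32_000   # then image tokens,
VOCAB_ACTION_OFFSET = 33_000  # then discrete action tokens

def tokenize_text(s: str) -> List[int]:
    # Toy stand-in: one token per word, hashed into the text range.
    return [VOCAB_TEXT_OFFSET + (hash(w) % 32_000) for w in s.split()]

def tokenize_image(pixels: List[List[int]]) -> List[int]:
    # Toy stand-in: quantize each pixel intensity (0-255) into 1,000 bins.
    return [VOCAB_IMAGE_OFFSET + (p * 1_000 // 256)
            for row in pixels for p in row]

def tokenize_action(action: str) -> List[int]:
    # Discrete game actions, e.g. "up", "down", "left", map to single tokens.
    actions = ["up", "down", "left", "right"]
    return [VOCAB_ACTION_OFFSET + actions.index(action)]

# One training example interleaving three modalities into a single stream;
# the same model consumes all of it, with no per-modality sub-model.
sequence = (
    tokenize_text("the player moves")
    + tokenize_image([[0, 128], [255, 64]])
    + tokenize_action("up")
)
print(sequence)
```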
For example, OpenAI currently has four different systems for four different types of content: GPT-3 for text, DALL-E for images, Codex for programming, and Whisper for speech. A multimodal approach would combine all four content types into a single model; a user would then specify which skill should be used for a given prompt. Such a system should then also be able to "learn" new skills.
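As an illustration only, a combined interface might look something like the sketch below. None of these class or method names are real OpenAI APIs; the point is simply that one model serves every skill, with the caller naming the skill for each prompt.

```python
# A hypothetical sketch of a single combined interface, in contrast to
# today's four separate systems. The names here are invented for
# illustration, not real OpenAI APIs.

from dataclasses import dataclass

@dataclass
class MultimodalRequest:
    skill: str    # "text", "image", "code", or "speech"
    prompt: str

class UnifiedModel:
    SKILLS = {"text", "image", "code", "speech"}

    def generate(self, request: MultimodalRequest) -> str:
        if request.skill not in self.SKILLS:
            raise ValueError(f"unknown skill: {request.skill}")
        # A real system would run one shared network here; this stub just
        # echoes the routing decision.
        return f"[{request.skill} output for: {request.prompt!r}]"

model = UnifiedModel()
print(model.generate(MultimodalRequest(skill="code", prompt="sort a list")))
print(model.generate(MultimodalRequest(skill="image", prompt="a red cube")))
```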
Where are we going: On the journey toward artificial general intelligence, the near-term milestone will be multimodal AI systems that can handle and move between different information types. The next step will be to use those multimodal systems to quickly learn new skills; composing music, for example, could be the next step for a multimodal system that has already mastered text, images, and sound. We should see these multimodal systems within the next year - not just in a lab, but available for use by anyone, just as GPT-3, Codex, and DALL-E are all available today. Researchers believe this eventually leads to a system that can learn any skill, and is thus a path to AGI.
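One plausible mechanism for that kind of quick skill acquisition is in-context learning: instead of retraining the model, you show it a few demonstrations directly in the prompt and ask it to continue the pattern. The sketch below uses an invented prompt format purely to illustrate the idea.

```python
# A minimal sketch of few-shot, in-context learning: a new skill is
# "taught" by putting a handful of demonstrations in the prompt itself.
# The prompt format here is invented for illustration.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]],
                          query: str) -> str:
    lines = [f"Task: {task}"]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    task="Continue the melody (notes as letters)",
    examples=[("C D E", "F G A"), ("G A B", "C D E")],
    query="D E F",
)
print(prompt)  # this prompt would then be sent to the (hypothetical) model
```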
What’s next?
Great info on AGI and the multimodal approach to it. In the multimodal model approach to AGI (if I understand it correctly), it seems to me that one of the hardest parts would be dealing with visual inputs. Somehow babies figure out very quickly what's new in their environment and what isn't, and even more important, which new stuff is important to pay attention to and which can be ignored. I could well imagine an AGI being very good at detecting new stuff but struggling to determine what is important and what isn't, leading to 'info overload', or at least a slower learning curve. Then again, an AGI could process millions of times more transactions, far faster than a human ever could, so maybe there's a meta-learning loop an AGI could use to develop that discernment.