Revolutionary AI System Learns Concepts Shared Across Video, Audio, and Text

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed an artificial intelligence (AI) technique that allows machines to learn concepts shared across different modalities, such as videos, audio clips, and images. The system can learn, for example, that a baby crying in a video is related to the spoken word "crying" in an audio clip, and use this knowledge to identify and label actions in a video. The technique outperforms other machine-learning methods at cross-modal retrieval tasks, where data in one format (e.g., video) must be matched with a query in another format (e.g., spoken language). It also allows users to see the reasoning behind the machine's decisions. In the future, this technique could help robots learn about the world through perception in a way similar to humans.
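The core idea behind cross-modal retrieval is to map data from every modality into a single shared vector space, so that a text query and a matching video land close together. The sketch below illustrates this with hand-made toy vectors and cosine similarity; the file names, vocabulary, and embedding values are all hypothetical, and a real system would learn the encoders from data rather than hard-code vectors.

```python
import numpy as np

# Toy shared embedding space. In a real system, learned encoders map each
# modality (video, audio, text) into one common vector space; here the
# vectors are hand-made purely for illustration (all hypothetical).
video_embeddings = {
    "baby_crying.mp4":  np.array([0.9, 0.1, 0.0]),
    "dog_barking.mp4":  np.array([0.1, 0.9, 0.0]),
    "door_closing.mp4": np.array([0.0, 0.1, 0.9]),
}
text_embeddings = {
    "crying":  np.array([0.85, 0.15, 0.05]),
    "barking": np.array([0.10, 0.95, 0.00]),
    "closing": np.array([0.05, 0.10, 0.90]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_word, videos):
    """Rank videos by similarity to a text query in the shared space."""
    q = text_embeddings[query_word]
    ranked = sorted(videos.items(), key=lambda kv: cosine(q, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked]

print(retrieve("crying", video_embeddings)[0])  # best match is the crying clip
```

Because both modalities live in the same space, the same similarity function also supports explanation: inspecting which dimensions contribute most to a match gives a rough view of the "concepts" driving the retrieval.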

A machine-learning model can identify the action in a video clip and label it, without the help of humans.

Humans observe the world through a combination of modalities, such as vision, hearing, and language understanding. Machines, on the other hand, interpret the world through data that algorithms can process.
