Meta Releases New AI Research Models to Accelerate Innovation at Scale

For over a decade, Meta’s Fundamental AI Research (FAIR) team has been dedicated to advancing the state of AI through open research. In a rapidly evolving field, collaboration with the global AI community has never been more crucial.

Today, we are excited to share some of the latest FAIR research models with the global community. We are publicly releasing five models, including image-to-text and text-to-music generation models, a multi-token prediction model, and a technique for detecting AI-generated speech. By sharing this research openly, we aim to inspire further innovation and help advance AI responsibly.

Meta Chameleon: Processing and Generating Text and Images

We are publicly releasing key components of our Chameleon models under a research-only license. Chameleon is a family of mixed-modal models capable of understanding and generating both images and text. Unlike most large language models, which produce output in a single modality (for example, text-to-image models that take text in but emit only images), Chameleon can accept any combination of text and images as input and produce any combination as output. This capability enables applications such as generating creative captions for images or using mixed text-and-image prompts to create entirely new scenes.
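One way to picture a mixed-modal model is as a single transformer operating over one interleaved token stream, with images quantized into discrete tokens alongside text. The sketch below is purely illustrative (the tokenizer names, sentinel tokens, and fixed image-token count are assumptions, not Chameleon's actual API); it only shows how interleaved text/image segments can be flattened into one sequence.

```python
# Illustrative sketch of early-fusion mixed-modal sequences: images become
# discrete tokens interleaved with text tokens in one stream. All names and
# formats below are hypothetical stand-ins, not Chameleon's real components.

TEXT, IMAGE = "text", "image"
BOI, EOI = "<boi>", "<eoi>"  # begin/end-of-image sentinel tokens (assumed)

def tokenize_text(s):
    # Stand-in for a real subword tokenizer.
    return s.split()

def tokenize_image(img_id):
    # Stand-in for a vector-quantized image tokenizer: an image becomes a
    # fixed-length block of discrete codes wrapped in sentinel tokens.
    return [BOI] + [f"img{img_id}:{i}" for i in range(4)] + [EOI]

def build_sequence(segments):
    """Flatten interleaved (modality, payload) segments into one token stream."""
    tokens = []
    for modality, payload in segments:
        if modality == TEXT:
            tokens += tokenize_text(payload)
        elif modality == IMAGE:
            tokens += tokenize_image(payload)
    return tokens

# A prompt mixing text, an image, and more text, as one sequence.
seq = build_sequence([(TEXT, "caption this :"), (IMAGE, 0), (TEXT, "a sunset")])
```

Because input and output share this one token space, the same model can, in principle, emit text tokens, image tokens, or any interleaving of the two.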

Multi-Token Prediction for Faster AI Model Training

Large language models (LLMs) are already enhancing creative text generation, idea brainstorming, and question answering by predicting the next word in a sequence. However, this approach is inefficient, requiring significantly more text than humans need to achieve language fluency. In April, we proposed a new approach to build better and faster LLMs using multi-token prediction. This method trains language models to predict multiple future words simultaneously, improving efficiency. We are releasing the pretrained models for code completion under a non-commercial, research-only license.
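The core idea can be made concrete with target construction: instead of one target (the next token) per position, each position gets k targets, the next k tokens, so k output heads can be trained in parallel on a single pass. This is a minimal sketch of that target layout under stated assumptions, not the released implementation.

```python
# Minimal sketch (an assumption, not Meta's released code): in multi-token
# prediction, position t is trained to predict tokens t+1 .. t+k, one per
# prediction head, rather than only token t+1.

def multi_token_targets(tokens, k):
    """For each position, list the next k tokens (None past the sequence end)."""
    targets = []
    for t in range(len(tokens) - 1):
        row = [tokens[t + 1 + i] if t + 1 + i < len(tokens) else None
               for i in range(k)]
        targets.append(row)
    return targets

toks = ["def", "add", "(", "a", ",", "b", ")"]
tgt = multi_token_targets(toks, 2)
# Position 0 ("def") is trained to predict "add" (head 1) and "(" (head 2).
```

In training, each head contributes a cross-entropy loss against its own target column, which is why the extra supervision comes essentially for free on the same forward pass.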

JASCO: Enhanced Control Over AI Music Generation

Generative AI is revolutionizing creativity, allowing users to turn text prompts into music clips. Our new model, JASCO, goes beyond existing text-to-music models like MusicGen by accepting additional inputs, such as chords or beats, to provide greater control over the generated music. Conditioning on both symbolic inputs (like chords) and audio inputs (like beats) within the same text-to-music model yields better and more versatile control over music outputs. JASCO is comparable to existing models in generation quality while offering significantly enhanced control.

AudioSeal: Detecting AI-Generated Speech

We are also introducing AudioSeal, the first audio watermarking technique specifically designed for the localized detection of AI-generated speech. AudioSeal can pinpoint AI-generated segments within longer audio snippets, significantly enhancing detection speed—up to 485 times faster than previous methods. This makes it suitable for large-scale and real-time applications. AudioSeal is being released under a commercial license as part of our commitment to preventing the misuse of generative AI tools.

Increasing Diversity in Text-To-Image Generation Systems

To ensure text-to-image models reflect the world's geographical and cultural diversity, we developed automatic indicators to evaluate potential geographical disparities. We conducted a large-scale annotation study, collecting over 65,000 annotations and numerous survey responses to understand regional perceptions of geographic representation. This data helps improve diversity and representation in AI-generated images. Today, we are releasing the geographic disparities evaluation code and our annotations to help the community enhance diversity across generative models.

By sharing these advancements, Meta aims to foster collaboration and drive innovation in AI, ensuring it evolves in a responsible and inclusive manner.