Meta Releases New AI Research Models to Accelerate Innovation at Scale
For over a decade, Meta’s Fundamental AI Research (FAIR) team has been dedicated to advancing the state of AI through open research. In a rapidly evolving field, collaboration with the global AI community has never been more crucial.
Today, we are excited to share some of the latest FAIR research models with the global community. We are publicly releasing five models, including image-to-text and text-to-music generation models, a multi-token prediction model, and a technique for detecting AI-generated speech. By sharing this research openly, we aim to inspire further innovation and help advance AI responsibly.
Meta Chameleon: Processing and Generating Text and Images
We are publicly releasing key components of our Chameleon models under a research-only license. Chameleon is a family of mixed-modal models that can understand and generate both images and text. Unlike most large language models, which typically produce output in a single modality (for example, turning a text prompt into an image), Chameleon can take any combination of text and images as input and produce any combination as output. This opens up a wide range of possibilities, such as generating creative captions for images or using mixed text-and-image prompts to compose entirely new scenes.
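Conceptually, Chameleon is an early-fusion, token-based model: images are quantized into discrete tokens that share one sequence with text tokens, so a single transformer attends across both modalities. The sketch below illustrates that idea only; the tokenizer and model objects are hypothetical placeholders, not the released interface, and the fixed image block length follows the 1,024-tokens-per-image setup described in the Chameleon paper.

```python
import torch

# Illustrative only: text_tokenizer, image_tokenizer, and model are
# hypothetical stand-ins, not Chameleon's released API.
IMAGE_TOKENS_PER_IMAGE = 1024  # each image is quantized into a fixed-length token block

def build_mixed_modal_sequence(segments, text_tokenizer, image_tokenizer):
    """Interleave ordered text/image segments into one token sequence.

    segments: list of ("text", str) or ("image", image_tensor) pairs.
    """
    tokens = []
    for kind, content in segments:
        if kind == "text":
            tokens.extend(text_tokenizer.encode(content))
        else:
            # Images become discrete codes in the same vocabulary as text,
            # so the transformer sees a single unified token stream.
            tokens.extend(image_tokenizer.encode(content)[:IMAGE_TOKENS_PER_IMAGE])
    return torch.tensor(tokens).unsqueeze(0)  # shape: (1, sequence_length)

# One autoregressive model can then continue the sequence with text tokens,
# image tokens, or an interleaved mix:
#   output = model.generate(build_mixed_modal_sequence(segments, txt_tok, img_tok))
```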
Multi-Token Prediction for Faster AI Model Training
Large language models (LLMs) are already enhancing creative text generation, idea brainstorming, and question answering by predicting the next word in a sequence. However, this approach is inefficient, requiring significantly more text than humans need to achieve language fluency. In April, we proposed a new approach to build better and faster LLMs using multi-token prediction. This method trains language models to predict multiple future words simultaneously, improving efficiency. We are releasing the pretrained models for code completion under a non-commercial, research-only license.
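A minimal sketch of the idea, assuming the architecture described in the multi-token prediction paper (a shared transformer trunk feeding several independent output heads, where head i predicts the token i + 1 steps ahead); the model sizes and hyperparameters below are illustrative, not the released configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Shared trunk + n_future output heads; head i predicts token t + i + 1."""

    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        h = self.trunk(self.embed(tokens), mask=mask)  # (batch, seq, d_model)
        return [head(h) for head in self.heads]        # one logit tensor per future offset

def multi_token_loss(logits_per_head, tokens):
    """Sum of cross-entropies: head i is supervised with tokens shifted by i + 1."""
    total = 0.0
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        total = total + F.cross_entropy(
            logits[:, :-shift].reshape(-1, logits.size(-1)),
            tokens[:, shift:].reshape(-1),
        )
    return total
```

Beyond training efficiency, the paper notes that the extra heads can drive self-speculative decoding at inference time, which is one source of the reported speedups.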
JASCO: Enhanced Control Over AI Music Generation
Generative AI is transforming creativity, letting users turn text prompts into music clips. Our new model, JASCO, goes beyond existing text-to-music models such as MusicGen by also accepting conditioning inputs like chords or beats, giving users greater control over the generated music. Combining symbolic and audio conditioning within the same text-to-music model yields finer and more versatile control over the output. JASCO is comparable to existing models in generation quality while offering significantly better control.
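To make the conditioning idea concrete, here is a hypothetical sketch; the class, field, and method names are placeholders invented for illustration (the released JASCO code defines its own interface), but they show how a text prompt and symbolic signals such as chords and tempo might be passed together:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class MusicRequest:
    """Placeholder container: a text prompt plus optional symbolic conditions."""
    text: str
    chords: List[Tuple[str, float]] = field(default_factory=list)  # (chord, start_time_sec)
    bpm: Optional[float] = None  # tempo / beat conditioning

request = MusicRequest(
    text="mellow late-night jazz trio",
    chords=[("Dm7", 0.0), ("G7", 2.0), ("Cmaj7", 4.0)],
    bpm=92,
)

# Hypothetical call: the model conditions on all provided signals at once.
# audio = jasco_model.generate(request.text, chords=request.chords, bpm=request.bpm)
```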
AudioSeal: Detecting AI-Generated Speech
We are also introducing AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal can pinpoint AI-generated segments within longer audio clips, and its detection runs up to 485 times faster than previous methods, making it suitable for large-scale and real-time applications. AudioSeal is being released under a commercial license as part of our commitment to preventing the misuse of generative AI tools.
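A minimal usage sketch, based on the API published in the facebookresearch/audioseal repository at the time of writing (model card names, return types, and call signatures may differ across releases):

```python
import torch
from audioseal import AudioSeal  # pip install audioseal

sample_rate = 16000
wav = torch.randn(1, 1, sample_rate * 5)  # 5 s of mono audio: (batch, channels, samples)

# Embed an imperceptible watermark into the waveform.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
watermarked = wav + generator.get_watermark(wav, sample_rate)

# Detect the watermark; localized detection means the detector can also
# score individual frames to flag which segments are AI-generated.
detector = AudioSeal.load_detector("audioseal_detector_16bits")
score, message = detector.detect_watermark(watermarked, sample_rate)
print(f"watermark probability: {score:.3f}")  # near 1.0 for watermarked audio
```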
Increasing Diversity in Text-to-Image Generation Systems
To ensure text-to-image models reflect the world's geographical and cultural diversity, we developed automatic indicators to evaluate potential geographical disparities. We conducted a large-scale annotation study, collecting over 65,000 annotations and numerous survey responses to understand regional perceptions of geographic representation. This data helps improve diversity and representation in AI-generated images. Today, we are releasing the geographic disparities evaluation code and our annotations to help the community enhance diversity across generative models.
By sharing these advancements, Meta aims to foster collaboration and drive innovation in AI, ensuring it evolves in a responsible and inclusive manner.