ML RUNDOWN: xAI's Colossus, New Humanoids, and More AI Breakthroughs
Welcome to this edition of our AI newsletter!
This week, xAI has launched the Colossus supercomputer, equipped with 100K Nvidia H100 GPUs, a significant development in AI training. Elon Musk plans to expand this power to 200K GPUs shortly. Additionally, 1X Robotics and OpenAI have unveiled Neo Beta, a new humanoid robot designed for advanced human-like interactions.
We also dive into the latest research papers on model compression and extended video comprehension, along with exciting updates from ModelsLab and notable advancements in AI tools.
Read on to catch up on all the latest news and developments!
| Latest AI News
xAI Colossus supercomputer with 100K H100 GPUs comes online — Musk lays out plans to double GPU count to 200K with 50K H100 and 50K H200
Elon Musk's AI company, xAI, has activated a supercomputer cluster called "Colossus," which is currently the most powerful AI training system in the world. It utilizes 100,000 Nvidia H100 GPUs and was brought online in just 122 days.
This system, located in Memphis, Tennessee, will support the training of advanced AI models, including the company's large language model, Grok. Musk stated that the cluster would soon expand with another 50,000 Nvidia H100s and 50,000 H200s, reaching a total of 200,000 GPUs.
This supercomputer is a significant milestone for AI advancements, especially in the area of large-scale AI training. It's designed to push the boundaries of AI model performance, with Grok 3 expected to utilize the full potential of the 100,000 GPUs for its training.
New OpenAI-backed humanoid robot unveiled: the 1X NEO Beta
1X Robotics, in partnership with OpenAI, has unveiled the Neo Beta, a humanoid robot featuring advanced movement, agility, and human-like capabilities. This robot is equipped with bioinspired actuators that enable fluid motion, making it adept at performing delicate tasks, such as assisting with personal care and handling fragile objects.
Neo Beta is designed with advanced vision systems, allowing real-time environmental interaction, and safety features, including a soft exterior for gentle interactions—especially important in assisting elderly individuals.
Key features of Neo Beta include adaptive learning through machine learning, allowing it to improve its locomotion over time, and significant strength, enabling the robot to lift heavy objects, which can be useful in healthcare or home environments. Neo Beta's durable construction ensures its long-term performance and reliability, positioning it as a key advancement in human-robot interaction.
This milestone highlights the growing role of AI-powered robots in everyday life, offering potential applications in homes and healthcare, and enhancing quality of life through human-robot collaboration.
Salesforce to acquire AI voice agent firm Tenyx, joining AI talent race
Salesforce has announced its plan to acquire Tenyx, a California-based AI startup that specializes in developing AI-powered voice agents. This acquisition is aimed at enhancing Salesforce's AI-driven solutions, particularly in industries like e-commerce, healthcare, and travel.
Tenyx, founded in 2022, has gained attention for its innovative voice technologies. The acquisition is expected to close by the third quarter of 2024, and Tenyx's co-founders, CEO Itamar Arel and CTO Adam Earle, along with their team, will join Salesforce.
This move comes as Salesforce, facing pressure from activist investors, seeks to re-accelerate its revenue growth after a period of limiting acquisitions. It also mirrors similar efforts by tech giants like Microsoft and Amazon, which have recently made strategic AI acquisitions to strengthen their positions in the race for AI talent and tools.
These AI agents are building ‘civilizations’ on Minecraft
A California-based startup, Altera, has been running simulations in Minecraft where over 1,000 autonomous AI agents work together to create virtual civilizations. These AI agents collaborate to build societies that include government systems, economies, cultures, and even religions. In one instance, AI agents established a market with gems as currency.
In some simulations, the agents have been capable of forming democratic governments and responding to challenges like missing villagers. For example, a Trump-led civilization focused on increasing policing, while a Harris-led one pursued criminal justice reforms. This experiment in autonomous collaboration showcases how AI agents can organize and evolve in complex social environments, exploring real-world dynamics like cooperation and collective progress.
Altera aims to scale AI agents to operate autonomously, requiring minimal human oversight, while ensuring their behavior aligns with human values. These advances could eventually lead to AI agents interacting more deeply with human society in various applications beyond gaming.
This project illustrates the growing potential of autonomous AI systems to work together, adapt to new situations, and develop complex societal structures.
OpenAI Japan CEO says the new AI model GPT-Next is coming soon and it will be 100 times better than GPT-4
OpenAI Japan's CEO, Tadao Nagasaki, revealed upcoming advancements in artificial intelligence, specifically a new model called GPT-Next. Nagasaki claims that GPT-Next will be 100 times more powerful than GPT-4 and promises significant improvements in multimodal capabilities, meaning better handling of text, images, and audio. The model will also leverage "Project Strawberry," OpenAI's effort toward creating Artificial General Intelligence (AGI).
Nagasaki highlighted that GPT-Next's advancements come from architectural improvements rather than simply increasing computing resources. OpenAI Japan aims to further integrate AI into various industries, with GPT-Next offering unprecedented performance levels. The news also mentions how GPT-Next could surpass human intelligence in specific tasks, marking a major leap forward in AI capabilities.
This development follows earlier advancements, like GPT-4o, which was faster and more cost-effective than GPT-4. The future of AI as envisioned by OpenAI could lead to more autonomous systems that perform complex, human-like tasks.
| Latest Research Papers
Paper 1: MINT-1T: Scaling Open-Source Multimodal Data by 10x
MINT-1T, the largest open-source multimodal interleaved dataset, is designed to advance the training of large multimodal models (LMMs). The dataset contains one trillion text tokens and 3.4 billion images, scaling up existing datasets by 10x.
The authors address a key challenge in the field: the scarcity of large-scale, open-source datasets capable of training frontier LMMs. The dataset is sourced from diverse mediums, including HTML, PDFs, and ArXiv papers, distinguishing it from prior datasets like OBELICS, which primarily relied on HTML.
Key Points:
Methodology:
The dataset was curated by extracting multimodal documents (text interleaved with images) from three main sources—HTML, PDFs, and ArXiv papers. Extensive filtering and deduplication methods were employed to ensure the dataset's quality.
The team applied NSFW detection, image filtering, and deduplication to maintain a clean and high-quality dataset. Large-scale infrastructure was utilized to process the vast amount of data, involving 2,350 CPU cores and 4.2 million CPU hours.
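As an illustrative sketch only (not the authors' actual pipeline), the deduplication step can be pictured as hashing normalized text and dropping repeats; the real MINT-1T pipeline combines this kind of filtering with NSFW and image checks at a much larger scale:

```python
import hashlib

def dedup_paragraphs(documents):
    """Drop exact-duplicate paragraphs across a corpus.

    Toy hash-based deduplication: paragraphs are normalized
    (lowercased, whitespace collapsed) so trivial variants collide,
    then only the first occurrence of each hash is kept.
    """
    seen = set()
    deduped = []
    for doc in documents:
        kept = []
        for para in doc:
            key = hashlib.sha256(" ".join(para.lower().split()).encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append(para)
        deduped.append(kept)
    return deduped

docs = [["Hello world.", "Unique text."], ["hello   world.", "Another one."]]
cleaned = dedup_paragraphs(docs)
# The second document keeps only "Another one."; its first paragraph
# normalizes to the same text as one already seen and is dropped.
```

At trillion-token scale the same idea is typically implemented with approximate methods such as MinHash rather than exact hashing.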
Results:
The experiments demonstrate that LMMs trained on MINT-1T perform on par or better than those trained on OBELICS across several benchmarks, including question-answering and captioning tasks. MINT-1T’s inclusion of PDFs and ArXiv documents contributed significantly to improvements in science and technology-related tasks.
Findings:
MINT-1T enhances multimodal learning by providing a more diverse and expansive dataset. Its open-source nature allows the research community to train models transparently and collaboratively, potentially bridging the gap between open-source and proprietary models.
Paper 2: SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models
SynthVLM is a novel data synthesis pipeline designed to generate high-quality synthetic data for Vision Large Language Models (VLLMs). SynthVLM addresses the limitations of existing large-scale image datasets, such as low quality, inefficiency, and privacy concerns.
Instead of using web-sourced images with manual captions, SynthVLM employs advanced diffusion models to create synthetic image-text pairs from high-quality captions, achieving superior image-text alignment.
The pipeline ensures data quality by curating the best caption-image pairs based on a metric called CLIPScore, which assesses the similarity between captions and images. This method not only improves the quality and alignment of training data but also offers privacy advantages by generating images instead of using real-world data.
Key Points:
Methodology:
SynthVLM constructs a synthetic dataset by selecting high-quality captions and generating corresponding images using a diffusion model. These pairs are evaluated using CLIPScore, with the top 10% being selected for training. The pipeline is efficient, requiring only 100,000 curated data points—18% of the size of comparable datasets—to achieve SoTA performance on various benchmarks.
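The selection step can be sketched as follows. Note that `clip_score` here is a stand-in cosine similarity over placeholder embedding vectors; the actual pipeline would obtain embeddings from a CLIP model:

```python
import math

def clip_score(img_emb, txt_emb):
    """Cosine similarity between an image and a caption embedding.

    Stand-in for CLIPScore: in practice both vectors would come from
    a CLIP image/text encoder rather than the toy lists used here.
    """
    dot = sum(a * b for a, b in zip(img_emb, txt_emb))
    norm = (math.sqrt(sum(a * a for a in img_emb))
            * math.sqrt(sum(b * b for b in txt_emb)))
    return dot / norm

def select_top_pairs(pairs, keep_fraction=0.10):
    """Keep the best-aligned fraction of (image_emb, text_emb) pairs."""
    scored = sorted(pairs, key=lambda p: clip_score(p[0], p[1]), reverse=True)
    k = max(1, int(len(scored) * keep_fraction))
    return scored[:k]

# Toy data: a well-aligned pair scores near 1, an orthogonal pair near 0.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),   # aligned caption-image pair
    ([1.0, 0.0], [0.0, 1.0]),   # mismatched pair
]
best = select_top_pairs(pairs, keep_fraction=0.5)
```

With `keep_fraction=0.10` this mirrors the paper's "top 10%" curation; the quality of the final dataset then rests entirely on how well the score reflects true caption-image alignment.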
Results:
SynthVLM outperforms previous VLLMs across multiple vision tasks, including image understanding, question answering, and text comprehension. It achieves better results than models trained on larger datasets, demonstrating superior efficiency and alignment with minimal data.
Findings:
The paper proves that high-quality, well-aligned synthetic data can match or surpass the performance of models trained on real-world data while offering substantial efficiency gains and protecting privacy.
| Updates From ModelsLab
Flux Model is on ModelsLab
We’re excited to announce that Flux is now live on ModelsLab, with over 100,000 images already generated using this powerful tool! Flux offers three models to choose from: Dev, Schnell, and Merge (a fusion of Pro and Schnell), providing diverse options for all your creative needs.
To access and use Flux, a paid subscription is required, starting at $49/month.
We are also working on Flux LoRA training, so stay tuned for updates! In the meantime, check out our blog to see how Flux compares with other models and discover its unique capabilities.
Stay creative with ModelsLab!
Affiliate Program
Join our affiliate program and start earning commissions for your referrals.
Help your network learn more, build more on AI, and get paid for it. Learn more by signing up and checking out your dashboard - https://modelslab.com/
Join Our Community
Join our community on LinkedIn, Instagram, and X and connect with like-minded people who share similar interests and keep tabs on our communications. Share your stories, showcase what you have been working on, and learn from others through our Discord.
| Keep Eyes On This
Juggernaut XI: Enhanced SDXL Model Now Available
RunDiffusion has just rolled out Juggernaut XI, an upgraded version of their highly acclaimed SDXL fine-tuned model, with several improvements aimed at refining your AI experience.
Juggernaut XI, once available only via API, is now open to the public. Meanwhile, the development team is already working on Juggernaut XII and exploring integrations with Flux models.
API access is available through Octo.ai.
New Open-Weights Text-to-Video Model: CogVideoX-5B
THUDM has introduced CogVideoX-5B, a cutting-edge open-weights model for text-to-video generation.
Explore the model on Hugging Face, or visit the GitHub page for in-depth instructions and fine-tuning tips. The ComfyUI wrapper is also available for a more user-friendly interface.
SDXL LoRA Model: Melyn's 3D Render
Creator u/PixarCEO has released their first LoRA model for Stable Diffusion XL, trained exclusively on personal 3D renders developed over the past decade. The model is perfect for creating stunning, detailed 3D artwork.
Learn more and download the model here: Melyn's 3D Render.
FluxForge v0.1 Update: Advanced LoRA Search Tool
FluxForge has updated its LoRA search tool to version 0.1.
Note: Initial user-reported errors are being actively addressed. Check it out at FluxForge.
Photoshop Regional Prompt Support for ComfyUI
A new Photoshop extension, sd-ppp, introduces regional prompt support for ComfyUI. This offers precision in AI image generation directly within Photoshop, enhancing the creative process for artists.
Get started with this tool by visiting the Github page.
GenWarp: Generate Novel Views From a Single Image
Sony AI's GenWarp model creates entirely new perspectives of a scene using just one input image. Whether it's real-world photos or illustrations, GenWarp opens up a new world of possibilities.
Try the demo on Hugging Face Spaces, and access the code and pre-trained weights on GitHub.
Flux Latent Detailer Workflow
u/renderartist has shared an innovative ComfyUI workflow called Flux Latent Detailer, which uses latent interpolation to enhance image details without causing over-processing.
Download the workflow here: FluxLatentDetailer.
FLUX LoRA Showcase
Explore a variety of FLUX LoRA models designed for different artistic styles.
Stay ahead of the latest AI tools and updates by checking these out!
That’s a wrap for this edition of our AI newsletter! We hope you found the updates useful and engaging. To keep up with the latest AI news and insights, subscribe to our newsletter.
Subscribe Now to get the newest AI developments and exclusive content delivered directly to your inbox. Join our community to stay informed about the future of technology!