AI News #69: Week Ending 01/24/2025 with Executive Summary, Top 63 Links, and Helpful Visuals
About This Week’s Covers
This week's cover represents a dramatic shift in power. For years, closed frontier models like GPT and Claude have comfortably basked in the spotlight. This week the world noticed that open-source models like DeepSeek are sneaking up from behind. New versions of Grok and Llama are coming. There are no moats. Open-source is relentless. Image created with Ideogram, upscaled with Magnific, and edited in Photoshop.
The rest of the covers were created using Claude 3.5 and the Ideogram API, with the theme 'Grim Reaper + category name.' Six of the better ones are below:
This Week’s Executive Summaries
There are three giant stories in this week’s edition. The first is the announcement of Stargate, a $500 billion project between OpenAI, Japanese investment firm SoftBank, Oracle, and the US government. The second is a pretty cool evolution of the GPT tool that allows ChatGPT to use computers. The third is an inexpensive yet devastatingly powerful model from China called DeepSeek.
Each one of these merits a completely separate newsletter, so I hope you will take your time and read these carefully!
First, I went back and gathered all the headlines I could remember from last year that connect with Stargate and DeepSeek. It’s pretty neat to look at these headlines with 20/20 hindsight. SoftBank announced they want to go into the chip business back in February, and DeepSeek made its first big splash in September.
A timeline of headlines leading up to DeepSeek and Stargate: https://ethanbholland.com/2025/01/22/48-ai-headlines-leading-up-to-stargate/
1: Stargate
OpenAI Launches $500 Billion 'Stargate Project' to Build Massive AI Infrastructure
OpenAI and tech giants are joining forces in a historic $500 billion initiative to build new AI computing infrastructure across the United States. The Stargate Project, led by OpenAI and SoftBank, will immediately deploy $100 billion to construct AI computing campuses, starting in Texas. The project brings together tech companies including Microsoft, NVIDIA, Oracle, and Arm (a British semiconductor and software design firm), with SoftBank's Masayoshi Son as chairman. The project plans to create hundreds of thousands of American jobs, and aims to strengthen U.S. leadership in artificial intelligence while supporting national security interests. The initiative expands on OpenAI's existing partnerships, particularly its long-standing collaboration with NVIDIA dating back to 2016 and its ongoing work with Microsoft's Azure platform.
Stargate Project will invest $500B over the next 4 years - that's ~0.4% of US GDP over that period.
For comparison, the inflation-adjusted dollars spent on other large undertakings:
• Interstate Highway System: ~$650B
• Apollo Program: ~$280B
• Manhattan Project: ~$35B
Masayoshi Son: "Mr. President, last month I came to celebrate your winning and promised $100B. And you told me go for $200B. Now I came back with $500B. This is because as you say, this is the beginning of the Golden Age. We wouldn't have decided this unless you won."
2: OpenAI Operator
Before we dive into OpenAI’s Operator announcement, I want to note that Anthropic introduced computer-use capabilities in November. They also released a standard, the Model Context Protocol, which defines an API structure for AI agents to communicate with web data. Right now, we're seeing a low-key demo of multimodality (computer vision and mimicry) that demonstrates AI's ability to see and act. Long term, this will completely change interface design.
Here’s Anthropic’s Model Context Protocol: https://modelcontextprotocol.io/introduction
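To give a concrete feel for what "an API structure for AI agents" means, here is a toy Python sketch of the message shape involved. MCP exchanges JSON-RPC 2.0 messages, and the spec defines methods such as `tools/list` and `tools/call`; everything else below (the weather tool, its schema, the dispatch logic) is a made-up illustration, not the official SDK.

```python
import json

# Hypothetical example tool; a real MCP server would register real tools.
TOOLS = {
    "get_weather": {
        "description": "Fetch current weather for a city (illustrative only).",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def handle(message):
    """Dispatch a JSON-RPC 2.0 request the way an MCP-style server might."""
    req = json.loads(message)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, **spec} for n, spec in TOOLS.items()]}
    elif req["method"] == "tools/call":
        # A real server would execute the tool; here we just echo the args.
        args = req["params"]["arguments"]
        result = {"content": [{"type": "text", "text": f"called with {args}"}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# An agent asking the server what tools exist:
resp = json.loads(handle(json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
print(resp["result"]["tools"][0]["name"])
```

The point of the standard is exactly this shape: the agent never scrapes a web page; it asks a server what capabilities exist and calls them directly.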
For a refresher, here’s a collection of headlines and demos from Anthropic’s November announcements: https://ethanbholland.com/2024/11/29/anthropic-ai-news-week-ending-11-29-2024/
As luck would have it, a new course on using Anthropic’s system came out this week:
OpenAI Unveils Operator: AI Assistant That Browses the Web Like a Human
OpenAI has launched Operator, an AI agent that is essentially a quick multimodality demonstration: it "sees" browser elements, clicks buttons, and mimics the way people browse the web. Operator can navigate web browsers to complete tasks (tentatively, and under supervision) like ordering groceries, booking travel, and filling out forms. Powered by their new Computer-Using Agent (CUA) model and GPT-4o's vision capabilities, Operator can see and interact with websites like a person - clicking, typing, and scrolling through web pages. Initially only available to Pro users in the USA for $200/month, OpenAI is partnering with major companies like DoorDash, Instacart, and Uber. While the system can handle many tasks independently, it's designed to hand control back to users when encountering sensitive actions like payments or login credentials.
OpenAI | SullyOmarr: My take on this is that Operator and other tools are softening up the beaches, helping the public start to grasp that AI can see what it’s doing. It’s no longer a chatbot. It’s a “see and hear and do” bot. Long term, there won’t be a need to use the web like people do (see Anthropic’s Model Context Protocol).
It’s worth noting that NVIDIA is making humanoid robots their priority because the world is designed for human use cases (driving, sitting, standing, using hands). OpenAI’s computer-use launch takes advantage of the web’s design for people to see and click on. I find that fascinating.
3: DeepSeek
The biggest story of the week was DeepSeek R1. There was a clear trajectory for DeepSeek over the past few months, if you look back at the timeline of headlines leading up to both DeepSeek and Stargate: https://ethanbholland.com/2025/01/22/48-ai-headlines-leading-up-to-stargate/
Chinese AI Startup DeepSeek Challenges OpenAI with Powerful Open-Source Model
DeepSeek released an open-source AI model that matches the performance of OpenAI's latest systems at a fraction of the cost, marking a significant shift in AI accessibility (to put it mildly). The model, DeepSeek-R1, was developed using innovative training methods that overcame China's chip restrictions, costing just $5.6 million to train compared to competitors' hundreds of millions.
What makes this significant:
More than anything, DeepSeek demonstrates that open-source AI is consistently only about six months behind proprietary models, raising questions about the sustainability of high-priced, closed AI systems. "Companies charging premium prices for closed models may need to rethink their strategy" is one understated way to put it.
Top DeepSeek Reactions Worth Reading
DeepSeek's open frontier significance: "The release of DeepSeek-R1 demonstrates that, for better or worse, any attempt to restrict access to AI by governments is unlikely to work."
Cipher text challenge: "Deepseek R1 thinks for around 75 seconds and successfully solves this cipher text problem from openai's o1 blog post."
"I asked #R1 to visually explain to me the Pythagorean theorem. This was done in one shot with no errors in less than 30 seconds. Wrap it up, it's over. #DeepSeek #R1"
Raw chain of thought: "The raw chain of thought from DeepSeek is fascinating, really reads like a human thinking out loud."
"No matter how much you fight it, I find that the visible chain-of-thought from DeepSeek makes it nearly impossible to avoid anthropomorphizing the thing. The visible first-person "thinking" makes you feel like you are reading a diary of a somewhat tortured soul who wants to help…"
Math task nailed: "DeepSeek R1 Distill Qwen 7B (in 4-bit) nailed the first hard math question I asked it. Thought for ~3200 tokens in about 35 seconds on M4 Max."
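The last reaction above is about running a distilled R1 locally. For readers who want to try something similar, here is a hedged sketch of talking to a locally hosted model through an OpenAI-compatible chat endpoint, which tools like LM Studio and Ollama expose. The port, path, and model id below are assumptions that depend entirely on your local setup; the code only builds the request, since sending it requires a running local server.

```python
import json
import urllib.request

# Assumptions: a local server exposing an OpenAI-compatible API (LM Studio
# defaults to http://localhost:1234/v1); the model id is hypothetical and
# must match whatever model you actually loaded.
ENDPOINT = "http://localhost:1234/v1/chat/completions"
MODEL = "deepseek-r1-distill-qwen-7b"  # hypothetical local model id

def build_request(prompt, model=MODEL):
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Build (but don't send) a request. To actually send it, with the local
# server running:  with urllib.request.urlopen(req) as r: print(json.load(r))
req = build_request("Explain the Pythagorean theorem in one paragraph.")
print(req.full_url)
```

Because the endpoint speaks the same chat-completions dialect as OpenAI's API, existing client code usually works against a local model by changing only the base URL and model name.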
The Rest of the Summaries
Goldman Sachs' New AI Helper Could Replace Human Tasks for 10,000 Bankers
Goldman Sachs is rolling out a new AI assistant to 10,000 employees, with plans to expand it company-wide in 2025. The tool, called GS AI assistant, helps with basic tasks like email writing and code translation, but aims to eventually function like a seasoned Goldman employee. Built using technology from OpenAI, Google, and Meta, the assistant represents a broader trend of major banks embracing AI - JPMorgan and Morgan Stanley have launched similar tools, reaching over 240,000 employees combined. While some experts predict AI could eliminate up to 200,000 banking jobs in the next few years, Goldman's CIO Marco Argenti emphasizes that human workers will remain crucial in training and directing these AI systems.
CNBC
AI Leaders Warn of Rapid Intelligence Advances, Point to Surprising Breakthroughs
Top AI executives and researchers are signaling that artificial intelligence may be advancing faster than expected. Anthropic's CEO suggests AI systems could match or exceed human capabilities as soon as 2027, while leaked test results show OpenAI solving complex problems years ahead of previous estimates. Industry experts emphasize that while these predictions aren't certain, the public and policymakers should take the possibility of rapid AI advancement seriously.
Anthropic CEO Says AI Could Surpass Human Intelligence by 2027
"This prediction (AGI within next couple years) is a common timeline for insiders. There are reasons to not believe them, but I think people are not taking the possibility seriously enough that they may be directionally correct."
"leaked benchmark: o3 pro solved problems we thought were 5 years away. sam's team is trying to figure out how it did it. something unprecedented is happening."
AI Visuals and Charts: Week Ending 01/24/2025
These Robot Demonstrations Will Freak You Out
MUST SEE VIDEO: "Hottest on the Ice | #DEEPRobotics #Lynx Snow Parkour, Stream Crossing #robotdog #robotics #robots #ai #tech…"
"Let's reverse engineer this demo. You need 3 things: (1) robust hardware and motor designs that treat simulation as first-class citizen; (2) a human motion capture ("mocap") dataset, such as those for film and gaming characters; (3) massively parallel RL training in…"
Google VEO Text-To-Video Examples
"How does veo 2 pull off three nervous women holding knives on the back of a giant caterpillar running through a deserted city looking determined as a tank rolls towards them? If I imagine a thing, I can generate something close. It isn't a replacement for film, it is a new thing…"
"The new ability of AI video creators to add real people and products to scenes with just an image is likely to increase the utility (& certainly misuse) of AI video. Here I made Shakespeare at a cafe and the Girl with the Pearl Earring piloting a mech (just as Vermeer intended)…"
Other Visuals Worth Seeing This Week - Don’t quit now!
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation https://humanaigc.github.io/emote-portrait-alive-2/
"This is creepy. It’s an AI tool called GeoSpy that can geolocate photos based on features in the image…"
"I am training a Diffusion Feature Extractor with lpips outputs as targets now. It is still cooking, but I pulled an early version out and finetuned Flex.1-alpha for ~2k steps with it. It is really cleaning up the features, especially text. I am super excited about this."
"The most hilarious application of perception AI has to be creating a Nintendo Wii Tennis rendition of a tennis game to bypass needing streaming rights lol…"
Top 63 Links of The Week - Organized by Category
Agents and Copilots
Agents Webinar Jan 30 Kore.ai - AI for Process
"Most advanced Agentic Researcher by Google. It can draft a plan, search the web, analyze results, and create a well-researched report in under 2 minutes. It's a team of AI Agents that works like a human researcher."
"Introducing Perplexity Assistant. Assistant uses reasoning, search, and apps to help with daily tasks ranging from simple questions to multi-app actions. You can book dinner, find a forgotten song, call a ride, draft emails, set reminders, and more. Available on Play Store."
"What does Caterpillar construction equipment have to do with AI agents? More than you'd think! @mmitchell_ai explains how the team defined agent capabilities."
Ph.D.-level AI super-agent breakthrough expected very soon
Anthropic
Introducing Citations on the Anthropic API | Anthropic
"We've rolled out Citations in the Anthropic API. Citations allows Claude to ground its answers in user-provided information and provide precise references to the sentences and passages used in its responses. Here's how it works…"
"Our first short course with @AnthropicAI! Building Towards Computer Use with Anthropic. This teaches you to build an LLM-based agent that uses a computer interface by generating mouse clicks and keystrokes. Computer Use is an important, emerging capability for LLMs that will let…"
Augmented and Virtual Reality (AR/VR)
"Physical AI's progress depends on the development of World Foundation Models (WFMs) – AI systems that simulate real-world environments from text, image, or video inputs. Just two weeks ago, @NVIDIA launched and open-sourced Cosmos WFMs platform. Here's how it works…"
"Introducing #NVIDIACosmos, the world foundation model platform built to advance physical #AI. Learn how, through integrations with @NVIDIAOmniverse, developers can create physics-based, geospatially accurate scenarios. Watch the #CES2025 demo…"
"NVIDIA’s Jensen Huang has declared “Physical AI” the next big revolution. What is Physical AI? Think robotics, AR glasses, planetary-scale 3D simulations, and beyond — an entirely new wave of tech that fuses digital intelligence with the real world. Let's break down NVIDIA’s…"
Business and Enterprise
Mira Murati’s AI Startup Makes First Hires, Including Former OpenAI Executive | WIRED
Chips, Hardware, and Infrastructure
"NVIDIA’s Jensen Huang has declared “Physical AI” the next big revolution. What is Physical AI? Think robotics, AR glasses, planetary-scale 3D simulations, and beyond — an entirely new wave of tech that fuses digital intelligence with the real world. Let's break down NVIDIA’s…"
"NEW VIDEO: Unpacking NVIDIA’s vision for Physical AI — where robotics, AR/VR, and real-world data converge into a $100T opportunity. From preventing disasters to reimagining cities, here’s why it matters (link in comment below)."
Ethics/Legal/Security
"The thing about open models is that they can, as far as we know, always be jailbroken in a way that gets around their guardrails, as once they are in the wild there are lots of techniques. This applies equally to political censorship as it does to preventing harmful use of AI."
Sam Altman on X: "thank you to the external safety researchers who tested o3-mini. we have now finalized a version and are beginning the release process; planning to ship in ~a couple of weeks. also, we heard the feedback: will launch api and chatgpt at the same time! (it's very good.)" - https://x.com/sama/status/1880356297985638649
"Geoffrey Hinton warns about the dangers of releasing AI model weights: releasing the weights of large AI models is dangerous, similar to making fissile material available for making bombs. Once these models are released, bad actors can use them for harmful purposes, but it's too…"
"Most advanced Agentic Researcher by Google. It can draft a plan, search the web, analyze results, and create a well-researched report in under 2 minutes. It's a team of AI Agents that works like a human researcher."
"We are rolling out a new Gemini 2.0 Flash Thinking update: - Exp-01-21 variant in AI Studio and API for free - 1 million token context window - Native code execution support - Longer output token generation - Less frequent model contradictions Try it…"
"The Graduate-Level Google-Proof Q&A test (GPQA) is a series of multiple-choice problems that internet access doesn't help with. PhDs with access to the internet get 34% right on this test outside their specialty, and 81% inside their specialty. I matched model release dates to scores 1/…"
"Breaking news from Text-to-Image Arena! @GoogleDeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis…"
Imagery
"Breaking news from Text-to-Image Arena! @GoogleDeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis…"
Locally Run Models
Why o3-mini had to be free: the coming DeepSeek R1, 2.0 Flash, and Sky-T1 Price War
Multimodality
"Alibaba presents VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding. Open-sources VideoLLaMA 3, the SotA open-source model on both image and video understanding benchmarks."
[2501.10098v1] landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images
OpenAI
Sam Altman on X: "thank you to the external safety researchers who tested o3-mini. we have now finalized a version and are beginning the release process; planning to ship in ~a couple of weeks. also, we heard the feedback: will launch api and chatgpt at the same time! (it's very good.)" - https://x.com/sama/status/1880356297985638649
Open Source/DeepSeek
DeepSeek on X: "DeepSeek-R1 is here! Performance on par with OpenAI-o1. Fully open-source model & technical report. MIT licensed: Distill & commercialize freely! Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today! 1/n https://t.co/7BlpWAPu6y" - https://x.com/deepseek_ai/status/1881318130334814301
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
DeepSeek combines RL with multi-stage training: "Reinforcement Learning is all you need! @deepseek_ai R1, an open model that rivals @OpenAI o1 and other models on complex reasoning tasks, just got released."
PSA from Mark Lord: "It takes <2 minutes to set up R1 as a free+offline coding assistant. Big shoutout to @lmstudio and @continuedev!"
License update: "License Update! DeepSeek-R1 is now MIT licensed for clear open access. Open for the community to leverage model weights & outputs."
Training pipeline visualization: "Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models."
DeepSeek on HuggingChat: "DeepSeek R1 has landed on HuggingChat!"
Most researchers are shocked: "Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance."
Epoch AI Article: How has DeepSeek improved the Transformer architecture?
"No matter how much you fight it, I find that the visible chain-of-thought from DeepSeek makes it nearly impossible to avoid anthropomorphizing the thing. The visible first-person "thinking" makes you feel like you are reading a diary of a somewhat tortured soul who wants to help…"
"The release of DeepSeek-R1 demonstrates that, for better or worse, any attempt to restrict access to AI by governments is unlikely to work. You can get an open frontier model on a USB stick, and the methods outlined by DeepSeek suggest pathways forward for other open models, too."
"I asked #R1 to visually explain to me the Pythagorean theorem. This was done in one shot with no errors in less than 30 seconds. Wrap it up, it's over. #DeepSeek #R1"
"DeepSeek is a side project"
"DeepSeek's first-generation reasoning models are achieving performance comparable to OpenAI's o1 across math, code, and reasoning tasks! Give it a try! 7B distilled: ollama run deepseek-r1:7b More distilled sizes are available."
"That a second paper dropped with tons of RL flywheel secrets and multimodal o1-style reasoning is not on my bingo card today. Kimi's (another startup) and DeepSeek's papers remarkably converged on similar findings: > No need for complex tree search like MCTS. Just linearize…"
"DeepSeek-V3, the company's latest open LLM, surpasses Llama 3.1 405B and GPT-4o on key benchmarks, especially in coding and math tasks. Using a mixture-of-experts architecture with 671 billion parameters, of which only 37 billion are active at once, DeepSeek-V3 was trained at a low cost"
"Announcing @MistralAI new model: Codestral 25.01 - new SOTA coding model, #1 on LMSYS! - Lightweight, fast, and proficient in over 80 programming languages - Optimized for low-latency, high-frequency use cases - 2x faster than the previous version - Supports tasks such as…"
"Introducing Kimi k1.5, an o1-level multi-modal model - SotA short-CoT performance, outperforming GPT-4o and Claude Sonnet 3.5 on AIME, MATH-500, LiveCodeBench by a large margin (up to +550%) - Long-CoT performance matches o1 across multiple modalities (MathVista, …"
Buzzy French AI startup Mistral isn't for sale and plans to IPO, its CEO says
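A recurring theme in the open-source links above is DeepSeek-V3's mixture-of-experts design: "671 billion parameters, only 37 billion active." The mechanism behind that claim is a small router that picks a few experts per token, so most weights sit idle on any given forward pass. The toy sketch below shows plain top-k routing over softmaxed router logits; it is illustrative only, and omits the fine-grained and shared experts and load-balancing techniques DeepSeek actually uses.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Return the top-k (expert_index, weight) pairs for one token."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize over chosen experts
    return [(i, probs[i] / norm) for i in chosen]

# Toy setup: 8 experts, one token, route to the top 2.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]
selected = route_token(logits, k=2)
print(selected)  # only 2 of the 8 experts would run for this token
```

Scaled up, this is why total parameter count and per-token compute decouple: capacity grows with the number of experts, while inference cost grows only with the few experts each token actually visits.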
Perplexity
andy chung on X: "Today I’m excited to share that @read_cv is joining the team at @perplexity_ai in their mission to make the world's knowledge more accessible to everyone. This is incredibly bittersweet for us, as the start of this new chapter will mark the end of our time with @read_cv. It has… https://t.co/6CUinOEGsi" - https://x.com/_andychung/status/1880332676013650006
Publishing
"This essay from John Micklethwait is one of the most thoughtful texts I've read recently about the future of journalism. Nuanced, grounded in real newsroom experience."
Robotics and Embodiment
Nvidia’s Huang Sees AI Robots Boosting Manufacturing - Bloomberg
"Physical AI's progress depends on the development of World Foundation Models (WFMs) – AI systems that simulate real-world environments from text, image, or video inputs. Just two weeks ago, @NVIDIA launched and open-sourced Cosmos WFMs platform. Here's how it works…"
"Introducing #NVIDIACosmos, the world foundation model platform built to advance physical #AI. Learn how, through integrations with @NVIDIAOmniverse, developers can create physics-based, geospatially accurate scenarios. Watch the #CES2025 demo…"
"NVIDIA’s Jensen Huang has declared “Physical AI” the next big revolution. What is Physical AI? Think robotics, AR glasses, planetary-scale 3D simulations, and beyond — an entirely new wave of tech that fuses digital intelligence with the real world. Let's break down NVIDIA’s…"
"Let's reverse engineer this demo. You need 3 things: (1) robust hardware and motor designs that treat simulation as first-class citizen; (2) a human motion capture ("mocap") dataset, such as those for film and gaming characters; (3) massively parallel RL training in…"
Google is building a ‘world modeling’ AI team for games and robots - The Verge
"Projects like OpenAI’s Operator are to the digital world as Humanoid robots are to the physical world. One general setting (monitor keyboard and mouse, or human body) that can in principle gradually perform arbitrarily general tasks, via an I/O interface originally designed for…"
Unitree on X: "Unitree G1 Bionic: Agile Upgrade. Unitree rolls out frequent updates nearly every month. This time, we present to you the smoothest walking and humanoid running in the world. We hope you like it. #Unitree #AGI #EmbodiedAI #AI #Humanoid #Bipedal #WorldModel https://t.co/uM0DWJG5Ii" - https://x.com/UnitreeRobotics/status/1879864345615814923
Science and Medicine
"Your brain's next 5 seconds, predicted by AI: a Transformer predicts brain activity patterns 5 seconds into the future using just 21 seconds of fMRI data, achieving 0.997 correlation with a modified time-series Transformer architecture. Original problem: predicting future…"
Video News
"Video Depth Anything is out! Real-time inference for arbitrarily long videos with temporal and spatial consistency. Built on the excellent Depth Anything v2 (for images), by "simply" replacing the head and adjusting the loss for temporal consistency. Videos from the project…"
Luma Ray2
"Luma Labs released Ray2, its next-gen AI for generating 10-second videos with advanced motion quality and physics realism. Ray2 understands complex object interactions, including water physics. Now, the question is which lab will crack longer-length outputs?"