AI/ML news summary: week 30
Marco van Hurne
Partnering with the most innovative AI and RPA platforms to optimize back office processes, automate manual tasks, improve customer service, save money, and grow profits.
It can be a nightmare staying on top of all the things that are happening in the world of AI. So, here I present you with this week's articles, guides, and news about AI. It's curated so you won't have to scour the internet yourself.
Let's jump right in!
This week's news in brief.
We just had a crazy week in the world of foundation language models.
First off, GPT-4o mini dropped, and it's a game-changer. It's way faster and cheaper than GPT-4o. Can you believe it's almost 3 times cheaper than GPT-3.5 Turbo? That's nuts!
But that's not all. Apparently, Llama 3.1's 405B model is giving the top closed API models a run for their money. Some leaks showed it beating them on certain metrics. Who saw that coming?
Even Elon Musk is getting in on the action.
He hinted that xAI is stepping up its game. They've got Grok 2 coming in August and a massive GPU data center training Grok 3 as we speak. Things are heating up!
OpenAI's GPT-4o mini is a total game-changer for many applications. It's super fast, cheap, and can handle up to 16k output tokens. That's a big deal for stuff like translation and code conversion.
Yet open-source models like LLaMA still have some advantages. They're better for data security, privacy, and flexibility. Plus, there are ways to make inference cheaper, like quantization. Together.ai and Groq are already on it.
Meta isn't backing down, though.
They just released Llama 3.1, and the leaks say it's giving GPT-4o and Claude 3.5 Sonnet a run for their money on MMLU-Pro. The 8B and 70B models got some serious upgrades too, especially for coding.
It's getting easier to train LLMs with all the big H100 clusters, open datasets, and new techniques out there. But it's not all smooth sailing. A new study found that up to 25% of high-quality AI training data has been restricted in the past year. Publishers are catching on and trying to protect their content.
It's a bit of a double-edged sword. Companies are closing off access to their websites, but they're also making bank by selling data to big AI labs. It's a crazy incentive structure that's changing the game.
Why is this interesting?
2024 has seen a trend of rapid LLM cost reduction. OpenAI's GPT-4o mini averages roughly 140x cheaper than GPT-4 was at its release just 16 months ago, while also performing better on most benchmarks. It is also about 230x cheaper than, and vastly better than, GPT-3 Da Vinci 002, which was released in August 2022 and was the best model at the time. Matt Holden noted on X/Twitter that in cloud storage's first decade (2006-2016), Amazon S3's cost per GB dropped 86% (or ~97% including Glacier). The pace of AI cost reduction is dramatically faster, potentially enabling much more rapid adoption than we saw with cloud computing.
With GPT-4o mini, essentially anybody now has access to unlimited interns who can each read ~70,000 words and write up to 16,000 tokens for about $0.02 in under a minute.
This is an incredibly valuable tool to help people perform work tasks. Many tasks still require a lot of work on data preparation, prompting, fine-tuning, RAG, tool use, and the surrounding software and UI/UX to get LLMs to a sufficient level of reliability.
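As a quick back-of-the-envelope check on that figure, here is a rough sketch, assuming GPT-4o mini's list prices of about $0.15 per million input tokens and $0.60 per million output tokens, and roughly 1.3 tokens per English word:

```python
# Rough cost check for one maxed-out GPT-4o mini request.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (assumed list price)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (assumed list price)
TOKENS_PER_WORD = 1.3      # rough average for English text

input_tokens = 70_000 * TOKENS_PER_WORD  # reading ~70,000 words
output_tokens = 16_000                   # the model's maximum output per request

cost = (input_tokens / 1e6) * INPUT_PRICE_PER_M + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M
print(f"~${cost:.3f} per request")
```

That lands at roughly $0.02-0.03 per request, consistent with the figure above.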
LLM adoption is only getting started, and the increased level of competition will only accelerate it further!
Hot news
Just look at what happened in under a week!
1. OpenAI Unveils GPT-4o Mini, a Smaller and Cheaper AI Model. OpenAI just dropped GPT-4o mini, and it's a total game-changer. This little model is cheaper and faster than their top-of-the-line models, but it's still powerful. It's scoring 82% on MMLU and beating GPT-4 in chat preferences on the LMSYS leaderboard. You can try it out right now in the ChatGPT web and mobile apps.
2. Progress at xAI on Grok-2 and a 100k Training Cluster for Grok-3. Meanwhile, over at xAI, they're making some serious moves. Elon Musk spilled the beans on Grok-2, which finished training in June using a whopping 15,000 H100s. They're aiming to match GPT-4's capabilities when they release it in August. But that's not all! xAI's new Memphis data center is training Grok-3 on 100k liquid-cooled H100s on a single RDMA fabric. Musk claims it's the most powerful AI training cluster in the world, and they're racing to get Grok-3 out by December.
3. Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model. Mistral AI and NVIDIA teamed up to bring us Mistral NeMo 12B, a 12B model with a crazy 128k context length. They're using a new tokenizer called Tekken that's based on Tiktoken and trained on over 100 languages. It's way more efficient than the old SentencePiece tokenizer. You can find the weights for the base and instruct models on Hugging Face (see the loading sketch after this list).
4. Groq Introduces Two New Open-Source Models Specifically Designed for Tool Use. Groq is getting in on the action with two new open-source models built for tool use: Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use. They used full fine-tuning and Direct Preference Optimization (DPO) to make these babies sing, and they did it all without using any user data (a tool-use call sketch follows after this list).
5. Andrej Karpathy Introduced Eureka Labs. Remember Andrej Karpathy, the AI whiz who used to run the show at Tesla and OpenAI? Well, he's back with Eureka Labs, an "AI native" education platform. They're kicking things off with a traditional AI course called LLM101n.
6. Microsoft Unveiled SpreadsheetLLM. Microsoft is getting into the spreadsheet game with SpreadsheetLLM. This new AI model is all about understanding and working with spreadsheets, tackling the challenges of applying AI to this complex and super popular format.
7. A Survey on the Adoption of ChatGPT. A massive survey in Denmark looked at how 100,000 workers from 11 different jobs are using ChatGPT. Get this: it could cut working times in half for 37% of the tasks the average worker does. ChatGPT is everywhere in jobs that are exposed to it, with 79% of software developers and 34% of financial advisors using it. But women are 20% less likely to use ChatGPT than men, and the people who were already making more money are the ones using it the most.
8. Apple Released DCLM-7B, a 7B Open-Source LLM Trained on the DCLM-Baseline Dataset. Apple just dropped DCLM-7B, a 7B open-source LLM that's totally transparent: fully open weights, training code, and dataset. They trained this model on 2.5T tokens from open datasets. While it might not be beating the other open-source models out there just yet, this is a big deal. It's the first time we've seen a cleaned, high-quality dataset of this size go fully open. This could make it way easier to experiment with new model architectures, but you'll still need some serious cash to train a model on a dataset this huge.
9. Together AI Announced Inference Engine 2.0, a New Inference Stack. Together AI is shaking things up with their new inference stack, Inference Engine 2.0. This thing is fast! It delivers 4x faster decoding throughput than open-source vLLM. On Meta Llama 3 8B, it's generating over 400 tokens per second. They've also got Together Turbo and Together Lite endpoints now, so you can pick your performance, quality, and price sweet spot. You can try out these endpoints with Llama 3 models right now (a quick endpoint sketch follows right below).
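As mentioned in item 3, here is a minimal sketch of loading the Mistral NeMo instruct weights with the Transformers library. The repo ID below is my assumption; check the Hugging Face model card for the exact name, license terms, and hardware requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo ID -- verify on the model card before running.
model_id = "mistralai/Mistral-Nemo-Instruct-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Summarize this week's AI news in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```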
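For item 4: Groq's API follows the OpenAI chat-completions shape, so a tool-use request might look roughly like the sketch below. The model ID and the get_weather tool are illustrative assumptions, not something confirmed in Groq's announcement; check their docs for the real identifiers.

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# A hypothetical tool definition, just to show the shape of a tool-use request.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3-groq-70b-8192-tool-use-preview",  # assumed model ID
    messages=[{"role": "user", "content": "What's the weather in Amsterdam right now?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```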
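And for item 9, a rough sketch of calling one of the new endpoints through Together's Python SDK. The Turbo model name below is an assumption, so check Together's model list for the exact ID.

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct-Turbo",  # assumed Turbo endpoint name
    messages=[{"role": "user", "content": "In one sentence, why does decoding throughput matter?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```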
10 minutes learning
To achieve optimal model performance in machine learning projects, it is important to define the problem, understand the context, and analyze the dataset in detail. This article outlines five actionable tips essential for training machine learning models.
This quick and easy-to-follow tutorial explains the process of training a language model using the Llama model architecture and the Transformers library.
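To give a flavor of what such a tutorial covers, here is a minimal sketch (not the tutorial's actual code) that wires a tiny Llama-style config into the Transformers Trainer. The tokenizer repo and dataset below are stand-ins chosen for illustration; a real run would use a much larger config and corpus.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling, LlamaConfig,
                          LlamaForCausalLM, Trainer, TrainingArguments)

# A deliberately tiny Llama-style model so the sketch runs on modest hardware.
config = LlamaConfig(vocab_size=32000, hidden_size=256, num_hidden_layers=4,
                     num_attention_heads=4, intermediate_size=1024)
model = LlamaForCausalLM(config)

# Any Llama-compatible tokenizer works; this repo ID is an assumption.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
tokenizer.pad_token = tokenizer.eos_token

# A small public corpus as a stand-in for real training data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
tokenized = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=256),
                        batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-from-scratch", per_device_train_batch_size=4,
                           num_train_epochs=1, logging_steps=50),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```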
This is a step-by-step video tutorial for creating a coding agent with GPT-4o mini. It shows how you can use Claude Engineer to create your own custom version with GPT-4o mini as the model.
LLMs always “make stuff up”: we call it a hallucination when the output is noticeably incorrect or inappropriate. This article identifies two main categories of hallucinations, factuality and faithfulness hallucinations, and highlights their impact, mitigation strategies, best practices, and more.
This article series will teach you how to build a multi-agent AI app. In the first part, the author starts with a single agent as a proof of concept (PoC). They use a function-calling agent, with each function responsible for a specific data retrieval algorithm, and leverage existing tools like AWS Bedrock and Slack to streamline knowledge sharing within their organization.
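To make the single-agent PoC idea concrete, here is a minimal sketch using OpenAI-style function calling. It is a generic illustration rather than the article's implementation, and search_docs is a hypothetical stand-in for a real data-retrieval function.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def search_docs(query: str) -> str:
    """Hypothetical retrieval function standing in for a real data-retrieval algorithm."""
    return f"Top result for '{query}': rotate keys via the internal secrets portal."

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "How do we rotate API keys?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages,
                                           tools=tools, tool_choice="auto")
msg = response.choices[0].message

# If the model requested a tool call, run it and feed the result back for the final answer.
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = search_docs(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(final.choices[0].message.content)
```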
Long Context
This guide explores the basics of the context window, how developers should think about long context, various real-world use cases for long context, and ways to optimize the usage of long context.
Scientific papers
This paper presents DiT-MoE, a sparse version of the diffusion Transformer. It scales comparably with dense networks while exhibiting highly optimized inference. It includes two simple designs: shared expert routing and expert-level balance loss, thereby capturing common knowledge and reducing redundancy among the routed experts.
This paper introduces the Graph Foundation Model, trained on 152 datasets with over 7.4 million nodes and 189 million edges spanning diverse domains. This work shows that multi-graph pretraining can significantly reduce the burden imposed by the current graph training paradigm by creating a single generalist model that performs competitively across a wide range of datasets and tasks.
This paper presents Reliable and Efficient Concept Erasure (RECE), a novel method that erases inappropriate content from diffusion models in just 3 seconds without needing extra fine-tuning. RECE leverages a closed-form solution to derive new target embeddings, which can regenerate erased concepts within the unlearned model.
This paper presents a new way of implementing a neural network with an optical system, which could make machine learning more sustainable. It relies on linear wave scattering and yet achieves non-linear processing with a high expressivity. The key idea is to inject the input via physical parameters that affect the scattering processes.
In this paper, researchers trained strong language models to produce text that is easy for weak language models to verify and found that this training also made the text easier for humans to evaluate. When the problem-solving process of strong models is optimized for getting the correct answer, the results can become harder to understand. This finding highlights the importance of correctness, clarity, and ease of verification in AI-generated text.
This paper introduces Embodied Chain-of-Thought Reasoning (ECoT) for vision-language-action models. ECoT enhances the decision-making capabilities of robot control systems by enabling them to reason about tasks, sub-tasks, and their environment before taking action.
Short links
1. Towards AI has partnered with O’Reilly to make our latest resources more accessible on their learning platform. Through this partnership, our latest book, “Building LLMs for Production,” and two exclusive ‘shortcut’ video series on LLMs and Generative AI research are now available on the O’Reilly learning platform.
2. Google, OpenAI, Microsoft, Amazon, and others are joining the Coalition for Secure AI (CoSAI). The initiative addresses a “fragmented landscape of AI security” by providing access to open-source methodologies, frameworks, and tools. Other companies joining CoSAI include IBM, PayPal, Cisco, and Anthropic.
3. Fei-Fei Li, the renowned computer scientist known as the “godmother of AI,” has created a startup dubbed World Labs. It’s valued at more than $1 billion in just four months. World Labs hopes to use human-like visual data processing to make AI capable of advanced reasoning.
4. In a new funding round, Cohere was valued at $5.5 billion, making it one of the world’s most valuable artificial intelligence companies and one of the largest startups in Canada. The company has also raised $500 million in Series D funding.
5. Nvidia is preparing a version of its new flagship AI chip for the Chinese market. Nvidia will work with Inspur, one of its major distributor partners in China, on the launch and distribution of the chip, tentatively named the “B20.”
6. In the latest study, OpenAI researchers found that prover-verifier games improve the legibility of language model outputs. They explored training strong language models to produce text that is easy for weak language models to verify and found that this training also made the text easier for humans to evaluate.
Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn appreciates your likes by making my articles available to more readers.
Signing off - Marco