AI/ML news summary: week 34

Here are the articles, guides, and news about AI for week 34. I read tons of RSS feeds and blogs, so you won't have to scour the internet yourself for this week's AI news.


Before we start!

If you like this topic and you want to support me:

  1. Comment on the article; LinkedIn appreciates that, and it will really help spread the word
  2. Connect with me on LinkedIn
  3. Subscribe to TechTonic Shifts to get your daily dose of tech


The AI landscape moved fast this week. GPT-4-class models are becoming more prevalent now that xAI has joined the ranks of OpenAI, Anthropic, DeepMind, Meta, Mistral, and DeepSeek.

Yet only the first four offer multimodal capabilities. Anthropic's new prompt caching feature is particularly noteworthy because it can significantly reduce costs for reused input tokens. This innovation opens up new possibilities for complex LLM agent pipelines.
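
To make the cost mechanics concrete, here is a minimal sketch of how prompt caching looked in its public beta, assuming the anthropic Python SDK; the beta header, model name, and the LONG_REFERENCE_DOCUMENT placeholder reflect launch-time documentation plus my own assumptions, so check Anthropic's current docs before relying on it.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for a large, stable context you reuse across many calls,
# e.g. a codebase, a long document, or detailed agent instructions.
LONG_REFERENCE_DOCUMENT = "..."

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Beta header required at launch to enable prompt caching.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOCUMENT,
            # Mark this block as cacheable: later calls that repeat the same
            # prefix are billed at a reduced rate for those input tokens.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
print(response.content[0].text)
```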

Sakana AI's "The AI Scientist" is another intriguing development this week.

This LLM agent is designed to assist in machine learning research. It brainstorms ideas, conducts literature searches, executes experiments, and writes research papers. I have tried it, and while the quality of the output is not yet groundbreaking, the cost-effectiveness is impressive.
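
Conceptually, the pipeline is a loop over ideation, novelty screening, experimentation, and write-up. The sketch below is my own runnable toy illustration of that loop; every helper is a hypothetical stand-in for an LLM call, and none of it is Sakana's actual code.

```python
# Toy illustration of an AI-Scientist-style pipeline; all helpers are
# hypothetical stand-ins for LLM calls, not Sakana's implementation.

def brainstorm_ideas(codebase: str, n: int) -> list[str]:
    return [f"idea-{i} for {codebase}" for i in range(n)]  # LLM would propose ideas

def is_novel(idea: str) -> bool:
    return True  # LLM + literature search would screen against prior work

def run_experiment(codebase: str, idea: str) -> dict:
    return {"idea": idea, "metric": 0.0}  # LLM would edit and execute code

def write_paper(results: dict) -> str:
    return f"Draft paper on {results['idea']}"  # LLM would draft and self-review

def ai_scientist(codebase: str, n_ideas: int = 3) -> list[str]:
    papers = []
    for idea in brainstorm_ideas(codebase, n_ideas):
        if not is_novel(idea):
            continue
        results = run_experiment(codebase, idea)
        papers.append(write_paper(results))
    return papers

print(ai_scientist("nanoGPT-template"))
```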

But to be honest, I've seen a tsunami of low-quality AI-generated content flooding research journals these past few months. This poses a threat to research integrity.

Sakana's implementation also feeds into the broader discussion on "inference-time scaling laws." Some experts argue that scaling alone will not lead to AGI, and many different approaches are being explored to improve LLM capabilities without increasing training budgets. Since training makes up some 70% of an LLM's cost, this is a big, big plus!

Agent pipelines or research breakthroughs could yield new capabilities, and even small-scale experiments managed by LLM agents may start producing insights that can be scaled up and integrated into state-of-the-art models.

I think that the use of AI in scientific research is a sensitive topic though.

Many scientists are hesitant (rightly so) to delegate human work to AI at this time. But Sakana's agent functions more as a tool to improve or even amplify human researchers! It works best when guided by an experienced AI scientist with promising ideas and codebases, instead of doing the grunt work for you. I think that responsible use of such agents will accelerate research because it allows human researchers to focus on distilling the most promising experimental results.


Rest of the news

  1. xAI’s Grok-2 Beta Release: xAI launched Grok-2 Beta, featuring Grok-2 and Grok-2 mini models on X. Grok-2 shows significant improvements, scoring 75.5% on MMLU-Pro, surpassing Grok-1.5 and even GPT-4o. An enterprise API will soon be available, offering enhanced security and low-latency access.
  2. Anthropic Introduced Prompt Caching: Anthropic’s API now offers prompt caching, reducing costs by up to 90% and latency by up to 85% for long prompts. Available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku.
  3. Perplexity Answers 250 Million Questions a Month: Perplexity, an AI search engine, handled 250 million queries last month, signaling growing interest in AI-driven search despite Google's dominance.
  4. Runway ML Releases Gen-3 Alpha Turbo: Runway ML’s Gen-3 Alpha Turbo, now officially released, is seven times faster and half the cost of Gen-3 Alpha. Available to all users, with more improvements expected.
  5. OpenAI Introduced SWE-Bench Verified: OpenAI released SWE-Bench Verified, a subset of the SWE-Bench benchmark with human verification, to more reliably evaluate AI models’ ability to solve real-world software issues.
  6. xAI’s Controversial Image Generator: xAI’s Grok 2 chatbot on X integrates Black Forest Labs’ Flux model for image generation with minimal restrictions, sparking debate over digital safety and AI regulation.
  7. MultiOn Introduces Agent Q: MultiOn launched Agent Q, an autonomous AI agent with self-healing and planning capabilities. It leverages MCTS, AI self-critique, and RLHF for complex reasoning and decision-making.
  8. Google’s Upgraded AI Image Generator: Google released Imagen 3, its latest AI text-to-image generator, in the US. Available on Google’s AI Test Kitchen, it promises better detail, richer lighting, and fewer artifacts.


Short readings

1. How To Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model

This is a guide on refining the Llama-3.1 8B language model into a compact 4B version using NVIDIA’s structured compression techniques, including weight pruning and knowledge distillation. This approach yields a resource-efficient Llama-3.1-Minitron 4B that delivers high performance on benchmarks while cutting down on computational expenses.
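
For a flavor of the distillation half of that recipe, here is a generic knowledge-distillation loss in PyTorch; this is a minimal sketch of the standard technique, not NVIDIA's Minitron code, and the temperature and mixing weight are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL (teacher -> student) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage: logits for a batch of 4 examples over a 10-class vocabulary.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```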

2. Why I Bet on DSPy

DSPy is an open-source framework for composing multiple LLM calls into programs that tackle complex problems, optimizing them against verifiable feedback rather than hand-tuned prompts. The project is currently focused on improving reliability and accessibility to strengthen its utility within the AI community. This article provides insight into how DSPy forces you to think about the problems you solve with LLMs.
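
For context, here is a minimal sketch of what a DSPy program looks like, assuming a recent DSPy release (older versions configured models via dspy.settings); the model name is a placeholder.

```python
import dspy

# Configure the LM backend (model name is a placeholder).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare the input/output behavior; DSPy manages the actual prompting
# and can later optimize it against a metric you define.
qa = dspy.ChainOfThought("question -> answer")

result = qa(question="Why do multi-step LLM pipelines need verifiable feedback?")
print(result.answer)
```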

3. Review: ChatGPT’s New Advanced Voice Mode

ChatGPT’s new Advanced Voice Mode enhances speech understanding and production, outperforming predecessors and competitors like Siri and Alexa. In this article, the author reviews the basics of Advanced Voice Mode and explores a few use cases that underscore the leap-forward nature of this technology.

4. The Workflow of PEFT

PEFT is a method designed to fine-tune large models more efficiently by focusing on a subset of parameters. This blog looks under the hood of the PEFT library to better understand how things work and explores how to create a base model and use it to build a LoRA model.
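
To make that concrete, here is a minimal LoRA setup with Hugging Face's peft library; the base model and target modules are illustrative choices for GPT-2.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small base model (illustrative choice).
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap it with LoRA adapters: only the small low-rank matrices are trainable,
# while the original weights stay frozen.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```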

5. Free Tools Every ML Beginner Should Use

This article highlights some of the essential tools that every ML beginner, or anyone willing to get started, should use. It introduces tools such as Jupyter Notebook, Hugging Face and Transformers, Kaggle, and more.

6. A Crash Course of Model Calibration — Part 1

Many experiments have revealed that modern neural networks are often not well-calibrated. A model is perfectly calibrated if the predicted probabilities of outcomes match the frequencies with which those outcomes actually occur. This article explores how to make ML models reflect true probabilities in their predictions.
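
As a quick taste of the diagnostics such a course covers, here is a reliability check with scikit-learn's calibration_curve; the classifier and synthetic data are placeholders for your own model.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data and model; substitute your own classifier.
X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Bin predictions and compare mean predicted probability to the observed
# frequency of positives in each bin; a calibrated model sits on the diagonal.
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)
print("mean |observed - predicted| per bin:", np.abs(frac_pos - mean_pred).mean())
```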

7. Synthetic Data Solves AI’s Biggest Problem

This article discusses how synthetic data is a useful application of AI technology that is already delivering real, tangible value to customers. Unlike fake data, synthetic data supports data-driven business systems throughout their lifecycle, especially where ongoing access to production data is impractical or ill-advised.


Tools

  1. Qwen2-Audio is the official repository of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.
  2. Deep Live Cam allows real-time face swap and one-click video deepfake with only a single image.
  3. The LongWriter dataset contains 6,000 SFT examples with ultra-long outputs ranging from 2k to 32k words.
  4. SWE Agent takes a GitHub issue and tries to automatically fix it using GPT-4 or your LM of choice.
  5. Fabric is an open-source framework for augmenting humans using AI.
  6. MiniCPM-V is a GPT-4V-level MLLM for single-image, multi-image, and video understanding on your phone.
  7. Tinygrad is a deep learning framework that is like a blend of PyTorch and micrograd.


Research papers

1. Imagen 3

This is the official paper for Google’s Imagen 3, a latent diffusion model that generates high-quality images from text prompts. The paper discusses their quality and responsibility evaluations, issues around safety and representation, and methods used to minimize the potential harm of the models.

2. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Researchers from Sakana AI, Oxford, University of British Columbia, and several other institutions published a paper unveiling the AI Scientist, a pipeline for open-ended scientific research using LLMs.

3. Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Microsoft Research published a paper introducing rStar, a self-play mutual reasoning approach that improves the reasoning capabilities of small language models. rStar uses a generation-discrimination process to decouple the different steps of the reasoning process.

4. Causal Agent based on Large Language Model

This paper explores the difficulty of large language models in mastering causal reasoning and addresses the issue by introducing a Causal Agent. This agent, enhanced with causal reasoning techniques and memory components, shows proficiency in tackling various causal problems.

5. Tree Attention: Topology-Aware Decoding for Long-Context Attention on GPU Clusters

The paper presents a topology-aware decoding approach that improves long-context attention in transformer models on GPU clusters. It connects self-attention to energy-based models, leading to parallel GPU computation, significantly faster processing, reduced inter-GPU communication, and lower memory consumption.

6. Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

The paper reviews model merging strategies in machine learning, underscoring their cost-effectiveness and minimal resource usage. It introduces a new classification system for these techniques, detailing their use in language models, continual learning, and multi-task learning. It points out existing literature deficits, current obstacles, and potential areas for future study.

7. Med42-v2: A Suite of Clinical LLMs

This paper introduces Med42-v2, an advanced clinical large language model based on the Llama3 architecture. It is tailored for healthcare with specialized data and preference alignment and surpasses its predecessor and GPT-4 in medical query performance.


Links

1. Nvidia will train 100,000 California residents on AI in a first-of-its-kind partnership. The program focuses on training students, educators, and workers, supporting job creation, promoting innovation, and using AI to solve challenges that can improve the lives of Californians.

2. Midjourney releases a new unified AI image editor on the web. It combines inpainting, outpainting/canvas extension, and more into a single view. The new web editor is now live and available to all users who have created at least ten images on the platform. Users can access this tool by visiting midjourney.com/imagine.

3. Lambda has partnered with Nous Research to launch Hermes 3, a new fine-tuned version of Meta’s open-source Llama 3.1 405B-parameter large language model (LLM). Hermes 3 is an unlocked, uncensored, open-weights model designed to be highly steerable, enabling users to tailor the model’s responses to their individual needs.

Signing off - Marco


Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.

Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn appreciates your likes by making my articles available to more readers.

