AI/ML news summary: week 34
Marco van Hurne
Partnering with the most innovative AI and RPA platforms to optimize back office processes, automate manual tasks, improve customer service, save money, and grow profits.
Here are the articles, guides, and news about AI for week 34. I read tons of RSS feeds and blogs, so you won't have to scour the internet yourself for the latest AI news of the week.
The AI landscape moved fast this week. GPT-4 class models are becoming more common now that xAI has joined the ranks of OpenAI, Anthropic, DeepMind, Meta, Mistral, and DeepSeek.
Yet only the first four offer multimodal capabilities. Anthropic's new prompt caching feature is particularly noteworthy because it can significantly reduce costs for reused input tokens. This opens up new possibilities for complex LLM agent pipelines.
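To make the idea concrete, here is a rough sketch of how prompt caching can be used from the Anthropic Python SDK: you flag the large, reused part of the prompt as cacheable, so on subsequent calls only the smaller, changing part is billed at the full input rate. The model name, beta header, and exact field names below are assumptions based on the launch documentation and may differ across SDK versions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_REFERENCE_DOCUMENT = "..."  # placeholder: the large context reused across calls

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOCUMENT,
            # Marks this block as cacheable so repeated calls reuse it cheaply.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key findings."}],
    # Beta header required at launch; newer SDK versions may not need it.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```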
Sakana AI's "The AI Scientist" is another intriguing development this week.
This LLM agent is designed to assist in machine learning research: it brainstorms ideas, conducts literature searches, executes experiments, and writes research papers. I have tried it, and while the quality of the output is not yet groundbreaking, the cost-effectiveness is impressive.
But to be honest, I've seen a tsunami of research journals filling up with low-quality AI-generated content these past few months, and this poses a threat to research integrity.
Sakana's implementation feeds into the broader discussion on "inference-time scaling laws." Some experts argue that scaling alone will not lead to AGI, so many different approaches are being explored to improve LLM capabilities without increasing training budgets. And training budgets make up roughly 70% of the cost of an LLM, so this is a big, big plus!
Agent pipelines or research breakthroughs could yield new capabilities, and even small-scale experiments managed by LLM agents will start producing insights that can be scaled up and integrated into state-of-the-art models.
I think that the use of AI in scientific research is a sensitive topic though.
Many scientists are hesitant (rightly so) to delegate human work to AI at this time. But Sakana's agent functions more as a tool to improve, or even amplify, human researchers! It works best when guided by an experienced AI scientist with promising ideas and codebases, rather than doing the grunt work for you. I think that responsible use of such agents will accelerate research because it allows human researchers to focus on distilling the most promising experimental results.
Rest of the news
Short readings
This is a guide on refining the Llama-3.1 8B language model into a compact 4B version using NVIDIA’s structured compression techniques, including weight pruning and knowledge distillation. This approach yields a resource-efficient Llama-3.1-Minitron 4B that delivers high performance on benchmarks while cutting down on computational expenses.
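For a sense of the distillation half of that recipe, here is a minimal, generic sketch of a knowledge-distillation loss in PyTorch. This is not NVIDIA's actual training pipeline, just the standard soft-label objective a pruned student model is typically trained against; the temperature value is an arbitrary example.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    the core objective when distilling a smaller (e.g. pruned) student model."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t ** 2)
```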
DSPy is an open-source framework that facilitates the coordination of multiple LLM calls to tackle complex issues. It offers verifiable feedback to enhance practical solution deployment. The framework is currently improving reliability and user accessibility to strengthen its utility and continued development within the AI community. This article provides insight into how DSPy forces you to think about the problems with LLMs.
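A minimal sketch of what DSPy's "declare the task, let the framework handle the prompting" style looks like is below. It assumes a recent DSPy release with an OpenAI-backed LM; the configuration calls (`dspy.LM`, `dspy.configure`) have changed between versions, and the model name is illustrative.

```python
import dspy

# Configure the underlying LM (exact constructor varies by DSPy version).
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# A signature declares inputs and outputs; the module handles prompting and parsing.
class ShortAnswer(dspy.Signature):
    """Answer the question with a short, factual response."""
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.ChainOfThought(ShortAnswer)
print(qa(question="Why do LLM pipelines benefit from caching reused context?").answer)
```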
ChatGPT’s new Advanced Voice Mode enhances speech understanding and production, outperforming predecessors and competitors like Siri and Alexa. In this article, the author reviewed the basics of Advanced Voice Mode and explored a few use cases that underscore the leap-forward nature of this technology.
PEFT is a method designed to fine-tune large models more efficiently by focusing on a subset of parameters. This blog looks under the hood of the PEFT library to better understand how things work and explores how to create a base model and use it to build a LoRA model.
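As a concrete illustration of the base-model-plus-LoRA pattern the blog walks through, here is a short sketch using Hugging Face's transformers and peft libraries. The model name and LoRA hyperparameters are placeholders for illustration, not recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model (any causal LM from the Hub works; this name is illustrative).
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# LoRA wraps selected linear layers with low-rank adapters; only the adapter
# weights are trained, which is what makes the fine-tuning cheap.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically a fraction of a percent of all weights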
This article highlights some of the essential tools that every beginner — or person willing to get started — with ML should use. It introduces tools such as Jupyter Notebook, Hugging Face and Transformers, Kaggle, and more.
Many experiments have revealed that modern neural networks are often not well-calibrated. A model is perfectly calibrated if the predicted probabilities of outcomes align closely with the actual outcomes. This article explores how to make ML models reflect true probabilities in their predictions.
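A quick way to see what "well-calibrated" means in practice is scikit-learn's calibration_curve, which bins predictions and compares the average predicted probability in each bin with the actual fraction of positives. The data below is made up purely for illustration.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# y_true: actual binary outcomes; y_prob: the model's predicted probabilities.
y_true = np.array([0, 1, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.9, 0.8, 0.3, 0.75, 0.6, 0.2, 0.95])

# For a perfectly calibrated model these two arrays lie on the diagonal:
# among samples predicted at ~70%, about 70% should actually be positive.
frac_positives, mean_predicted = calibration_curve(y_true, y_prob, n_bins=4)
print(mean_predicted, frac_positives)
```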
This article discusses how synthetic data is a useful application of AI technology already delivering real, tangible value to customers. Unlike fake data, synthetic data supports data-driven business systems throughout their lifecycle, mainly where ongoing access to production data is impractical or ill-advised.
Tools
Research papers
This is the official paper for Google’s Imagen 3, a latent diffusion model that generates high-quality images from text prompts. The paper discusses their quality and responsibility evaluations, issues around safety and representation, and methods used to minimize the potential harm of the models.
Researchers from Sakana AI, Oxford, University of British Columbia, and several other institutions published a paper unveiling the AI Scientist, a pipeline for open-ended scientific research using LLMs.
Microsoft Research published a paper introducing rStar, a self-play mutual reasoning approach that improves reasoning capabilities in small language models. rStar uses a generation-discrimination process to decouple the different steps of the reasoning process.
This paper explores the difficulty of large language models in mastering causal reasoning and addresses the issue by introducing a Causal Agent. This agent, enhanced with causal reasoning techniques and memory components, shows proficiency in tackling various causal problems.
The paper presents a topology-aware decoding approach that improves long-context attention in transformer models on GPU clusters. It connects self-attention to energy-based models, leading to parallel GPU computation, significantly faster processing, reduced inter-GPU communication, and lower memory consumption.
The paper reviews model merging strategies in machine learning, underscoring their cost-effectiveness and minimal resource usage. It introduces a new classification system for these techniques, detailing their use in language models, continual learning, and multi-task learning. It points out existing literature deficits, current obstacles, and potential areas for future study.
This paper introduces Med42-v2, an advanced clinical large language model based on the Llama3 architecture. It is tailored for healthcare with specialized data and preference alignment and surpasses its predecessor and GPT-4 in medical query performance.
Links
1. Nvidia will train 100,000 California residents on AI in a first-of-its-kind partnership. The program focuses on training students, educators, and workers, supporting job creation, promoting innovation, and using AI to solve challenges that can improve the lives of Californians.
2. Midjourney releases a new unified AI image editor on the web. It combines inpainting, outpainting/canvas extension, and more into a single view. The new web editor is now live and available to all users who have created at least ten images on the platform. Users can access this tool by visiting midjourney.com/imagine.
3. Lambda has partnered with Nous Research to launch Hermes 3, a new fine-tuned version of Meta's open-source Llama 3.1 405-billion-parameter large language model (LLM). Hermes 3 is an unlocked, uncensored, open-weights model designed to be highly steerable, enabling users to tailor the model's responses to their individual needs.
Signing off - Marco
Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn rewards your likes by showing my articles to more readers.