AI/ML news summary: week 35
Marco van Hurne
Architect of AI solutions that improve business efficiency and client engagement.
Here are the articles, guides, and news about AI for Week 35. I read tons of RSS feeds and blogs, so you won't have to scour the internet yourself for the latest AI news of this week.
For an explanation in plain English, or my comments, read the quotes.
TL;DR
OpenAI now offers fine-tuning for GPT-4o, letting developers customize the model for better performance, with free training tokens available until September 23. Microsoft launched three open-source models in its Phi 3.5 series to improve multilingual and scientific tasks. OpenAI partnered with Condé Nast to integrate SearchGPT; with this move, they want to improve search and content reliability.
AI21 Labs released the Jamba 1.5 models, which combine Transformer and State Space Model architectures. Jamba 1.5 Mini, crazily enough, outperforms larger models on benchmarks. Nvidia introduced StormCast, an AI model that improves weather forecasts by 10%, aiding disaster planning. Anthropic's Claude reached $1 million in app sales in 16 weeks but faces a lot of competition as Apple integrates ChatGPT. And Nvidia's Llama-3.1-Minitron 4B, created through pruning and distillation, matches the performance of larger models with greater efficiency.
So in short, this is a crazy week for Small Language Models!
Nous Research published a report on DisTrO, a set of new distributed optimizers that reduce inter-GPU communication significantly, which boosts multi-location training (I'll explain that below). Amazon's AI tool, Amazon Q, added a code transformation feature that is saving 4,500 developer years and $260 million in system upgrades (I wrote about that here: Anticipating AI's next move, article ③). Google DeepMind and Imperial College London developed FermiNet, a neural network that accurately models molecular energies, advancing my favorite topic: quantum chemistry. Jina AI introduced "Late Chunking", a new data retrieval technique that uses contextual embeddings for better search performance.
An explanation in plain English of all this mumbo jumbo is down below.
Latest AI and ML Developments
1. Custom Fine-Tuning for GPT-4o Now Offered by OpenAI
OpenAI has introduced fine-tuning options for GPT-4o. This is mostly interesting for developers, because it lets them customize models for specific needs, which means better performance at lower cost. The feature is available to users on a paid plan, with free daily training tokens offered until September 23 (pffff). (Source: Original article on OpenAI's fine-tuning announcement)
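For the curious, this is roughly what the flow looks like with OpenAI's Python SDK. A minimal sketch, not the full recipe: the training file name is a placeholder, and the exact fine-tunable GPT-4o snapshot name may differ for your account.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples (placeholder path).
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job; the snapshot name is my assumption.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```

Once the job finishes, you call the resulting model by the fine-tuned model id the job reports, exactly like any other chat model.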
2. Microsoft Introduces New Phi 3.5 AI Models
Microsoft launched three new open-source AI models under its Phi 3.5 series: mini-instruct, MoE-instruct, and vision-instruct. These models improve reasoning capabilities for multilingual tasks and scientific research, particularly in handling long documents (like this one). They still struggle with factual accuracy and potential bias, so Microsoft suggests pairing them with retrieval-augmented generation systems for the best results, especially in environments with limited resources. (Source: Original article on Microsoft's Phi 3.5 models)
In plain English: Microsoft released three AI models in its Phi 3.5 series. These models are good at handling multilingual tasks and long documents but they can struggle with accuracy and bias.
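If you want to kick the tires yourself, here is a minimal sketch of running the mini model through Hugging Face transformers. The repo id and the trust_remote_code flag are assumptions based on how Microsoft's earlier Phi releases were published.

```python
# pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

chat = [{"role": "user", "content": "Summarize retrieval-augmented generation in two sentences."}]
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=120)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```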
3. OpenAI Partners with Condé Nast for Enhanced Search Features
OpenAI has teamed up with Condé Nast to incorporate SearchGPT into the publisher’s platforms.
This partnership gets OpenAI some much-needed dough, and Condé Nast some cool search capabilities plus added credibility for its content.
The collaboration is a strategic move to help media companies manage the financial impact of rapid technological change (read: repay media companies for the stolen content used to train their LLM).
(Source: Original article on OpenAI and Condé Nast partnership)
4. AI21 Labs Launches Jamba 1.5 Models for Long-Context AI
AI21 Labs has released a new set of models called Jamba 1.5, which blends Transformer and State Space Model architectures. The series includes two MoE versions: Mini (12B active/52B total) and Large (94B active/398B total). Jamba 1.5 Mini is a leader in its class: it achieved a score of 46.1 on the Arena Hard benchmark, moving ahead of larger models like Mixtral 8x22B and Command-R+. The Arena Hard benchmark measures a model's ability to handle challenging language understanding and reasoning tasks; it has no maximum score. (Source: Original article on AI21 Labs' Jamba 1.5 models)
In plain English: AI21 Labs released the Jamba 1.5 models, blending Transformer and State Space architectures. The Mini version (12B active) leads its class with a 46.1 score on the Arena Hard benchmark, beating larger models. This benchmark tests how well models handle complex language tasks.
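For the tinkerers: loading Jamba 1.5 Mini through transformers would look roughly like the sketch below. The repo id is my assumption, and keep in mind that 52B total parameters means multiple GPUs or aggressive quantization in practice.

```python
# pip install transformers torch  (Jamba also likes mamba-ssm + causal-conv1d for speed)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 12B active / 52B total parameters: don't expect this to fit a single hobbyist GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Name one practical upside of hybrid Transformer/State Space models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```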
Readers, be honest in the comments: do you really care about stuff like the above? Just testing the waters with this little litmus test.
5. Nvidia Unveils StormCast AI for Weather Prediction
Nvidia has introduced StormCast, a new AI model on its Earth-2 platform that aims to improve mesoscale weather forecasting by simulating atmospheric dynamics. That is a lot of blabla, but what it means is that the model improves prediction accuracy by 10% over traditional six-hour forecasts, which helps people plan for disasters more effectively.
With this release, Nvidia plays the hypocrite: it wants to build a reputation in AI-powered climate technology (joining the ranks of Google, Microsoft, and IBM) because it feels sorry for ruining the climate in the first place with its humongous energy consumption. (Source: Original article on Nvidia's StormCast)
6. Anthropic’s Claude Reaches $1 Million in Mobile App Sales
Anthropic's AI assistant, Claude, has generated over $1 million in revenue from its mobile app on iOS and Android in just 16 weeks. It has seen rapid growth in the U.S., but Claude faces new challenges as Apple plans to integrate ChatGPT directly into iPhones. September 16, right? (Source: Original article on Anthropic's Claude earnings)
7. Nvidia's Llama-3.1-Minitron 4B Model: Small But Mighty
Nvidia's research team developed Llama-3.1-Minitron 4B by using pruning and distillation to compress the Llama 3 model. The smaller model competes well with larger models and with similar-sized small language models, and it is far more efficient to train and deploy. (Source: Original article on Nvidia's Llama-3.1-Minitron)
This is a good development, because less training means less energy consumption, and training accounts for 70-80% of the total cost of an AI model.
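To make "distillation" concrete: the classic recipe trains the small student model against both the hard labels and the teacher's softened output distribution. Below is a generic knowledge-distillation loss in PyTorch, my sketch of the standard technique, not Nvidia's exact Minitron recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL against the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```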
8. New Report on DisTrO by Nous Research
Nous Research released a report on DisTrO (Distributed Training Over the Internet), a set of distributed optimizers that are both architecture-agnostic and network-agnostic.
These optimizers reduce inter-GPU communication by 1000x to 10,000x without needing amortized analysis. This breakthrough could be useful for multi-location training in both large tech companies and decentralized, open-source projects. (Source: Original article on Nous Research's DisTrO report)
And now in plain English: Nous Research's DisTrO is a set of new tools called distributed optimizers, which cut the amount of data GPUs need to exchange during training by a factor of up to 10,000. That could make AI training across multiple locations practical for both large companies and open-source projects.
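To give you a feel for what "reducing inter-GPU communication" means, here is a toy top-k gradient sparsification sketch: send only the largest gradient entries instead of the whole tensor. To be clear, this is a generic compression trick for illustration only; DisTrO's actual optimizers work differently and are not shown here.

```python
import torch

def topk_sparsify(grad: torch.Tensor, k_frac: float = 0.001):
    """Keep only the largest-magnitude fraction k_frac of gradient entries.

    Sending (indices, values) instead of the dense tensor cuts traffic by
    roughly 1/k_frac -- here 1000x, the same order of savings DisTrO reports.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_frac))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

grad = torch.randn(1024, 1024)  # stand-in for one layer's gradient
idx, vals = topk_sparsify(grad)
print(f"dense elements: {grad.numel():,}, transmitted: {idx.numel():,}")
```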
9. Amazon Q Improves Software Upgrades with New Code Transformation
Amazon's GenAI tool for software development, Amazon Q, now includes a new feature for code transformation aimed at foundational software hygiene tasks. This update has saved Amazon the equivalent of 4,500 developer years in system upgrades, leading to an estimated $260 million in annual efficiency gains. Over 50% of production Java systems were upgraded to newer versions faster and with less effort. (Source: Original article on Amazon Q's new feature)
In short: Amazon's AI tool, Amazon Q, added a new feature to improve code updates, saving 4,500 developer years and $260 million in system upgrades. It helped upgrade over 50% of Java systems to newer versions more quickly and easily.
10. Google DeepMind Solves Complex Quantum Chemistry Problems
Researchers from Imperial College London and Google DeepMind have proposed an AI-based solution to model molecular states. They developed a neural network called FermiNet to compute atomic and molecular energies with high precision. For the complex carbon dimer molecule, they achieved a mean absolute error (MAE) of 4 meV, five times better than the previous best methods with an MAE of 20 meV. (Source: Original article on Google DeepMind's quantum chemistry research)
In plain English: Researchers created FermiNet, an AI that calculates molecular energies with far smaller error than previous methods. This helps quantum chemistry and speeds up drug and material development.
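To show the underlying idea, here is a toy variational Monte Carlo run for the 1D harmonic oscillator: sample positions from |ψ|², average the local energy, and watch the estimate bottom out at the true ground state. FermiNet does this in spirit, with a deep network as ψ and real molecular Hamiltonians; the sketch below is mine, not DeepMind's code.

```python
import numpy as np

# Trial wavefunction psi(x) = exp(-a * x^2) for H = -0.5 d^2/dx^2 + 0.5 x^2.
# a = 0.5 is the exact ground state, with energy 0.5.

def local_energy(x, a):
    # E_L = -0.5 * psi''/psi + 0.5 * x^2, worked out analytically for this psi
    return a + x**2 * (0.5 - 2 * a**2)

def vmc_energy(a, n_steps=50_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, energies = 0.0, []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        # Metropolis acceptance for p(x) ~ |psi(x)|^2 = exp(-2 a x^2)
        if rng.random() < np.exp(-2 * a * (x_new**2 - x**2)):
            x = x_new
        energies.append(local_energy(x, a))
    return np.mean(energies)

for a in (0.3, 0.5, 0.7):
    print(f"a={a}: <E> ~ {vmc_energy(a):.4f}  (exact minimum: 0.5 at a=0.5)")
```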
11. Jina AI Develops "Late Chunking" for Improved Data Retrieval
Jina AI has introduced a new technique called "Late Chunking" for embedding data chunks, which enhances retrieval performance by using the contextual information captured by 8192-token embedding models. This method creates chunk embeddings that are conditioned on previous chunks, leading to better context representation. (Source: Original article on Jina AI's Late Chunking)
In plain English: "Late Chunking" improves data retrieval by using context from previous chunks, leading to more accurate and relevant search results for AI models.
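The gist in code: embed the whole document once, then mean-pool the contextual token embeddings per chunk, instead of embedding each chunk in isolation. A minimal sketch assuming Jina's long-context embedding model on Hugging Face; treat the model details as assumptions and check Jina's own examples for the real thing.

```python
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v2-base-en"  # an 8192-token embedding model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

def late_chunk(text, chunk_char_spans):
    """Embed the full text once, then mean-pool token vectors per chunk span."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True,
                    truncation=True, max_length=8192)
    offsets = enc.pop("offset_mapping")[0]  # per-token (start, end) character spans
    with torch.no_grad():
        token_embs = model(**enc).last_hidden_state[0]  # contextual token embeddings
    chunk_vecs = []
    for start, end in chunk_char_spans:
        mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
        chunk_vecs.append(token_embs[mask].mean(dim=0))
    return torch.stack(chunk_vecs)

doc = "Berlin is the capital of Germany. The city has 3.8 million inhabitants."
vectors = late_chunk(doc, [(0, 33), (34, len(doc))])
print(vectors.shape)  # (2, hidden_size): one context-aware vector per chunk
```

The payoff: the second chunk's vector "knows" that "The city" refers to Berlin, because its token embeddings were computed with the full document in context.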
Quick learning bytes
And no, GGML does not stand for GarGaMeL (GenZ: Smurfs), although I would very much like it to be
Tools
Noteworthy Scientific Research
Quick Updates
Signing off - Marco
Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn rewards your likes by showing my articles to more readers.