AI/ML news summary: week 21


Here are the weekly articles, guides, and news about AI, curated so you won't have to.


This was a brilliant week for new model releases, with AlphaFold 3 and GPT-4o both unlocking new capabilities and AI use cases in very different domains.

DeepMind’s AlphaFold 3 is a significant update to the AlphaFold series of models, which have already been used by 1.8 million scientists. The new model builds on its protein capabilities and can now predict folding patterns and chemical structures across proteins, DNA, RNA, ligands (often used as small-molecule drugs), ions, and antibodies. It can also model how these molecules interact with each other, which is key in biomedical research tasks such as predicting whether drug candidates bind to target proteins. The model combines a “Pairformer” (a custom transformer with triangular attention) with a new diffusion model, a further step in the dominance of transformer and diffusion architectures in machine learning.

I am also very impressed with OpenAI’s release of GPT-4o (“omni”). It is not yet the eagerly anticipated GPT-5; OpenAI instead chose to first release a faster, cheaper (and likely much smaller, at least in terms of active parameters), natively multimodal model. GPT-4o can directly take in, plan, and produce speech (previously, it relied on separate models for each of these steps, which added significant latency) and can now interact with speech, images, and video in real time. This unlocks many new use cases, such as real-time translation and much more natural speech, including recognizing and conveying emotions. Unlike speech, vision capabilities were already integrated directly into GPT-4V. However, they were still not “native”, as they were rumored to have been added to the foundation model via fine-tuning. I assume all modalities in GPT-4o are now learned during pre-training.

The model advances the state of the art on many benchmarks, particularly multimodal ones. Despite now being the most performant model, it has been made available for free within ChatGPT (replacing GPT-3.5 Turbo) and via the API at half the cost of GPT-4 Turbo. OpenAI noted double the response speed and 5x the rate limits vs. GPT-4 Turbo; however, many users are seeing response speeds closer to 5x faster, and I think the model likely costs OpenAI significantly less than half the compute. The full set of new multimodal features is not yet available.

Why should you care?

AI use in drug development and drug target identification has shown very positive early results. It can already cut development time to ~30 months from initiation to phase 1 trial (vs. 60 months for conventionally developed drugs), and a recent study measured an 80–90% phase 1 success rate for AI drugs (vs. 40–65% for conventionally developed drugs). Phase 2 data is limited, and the success rate there was flat at about 40%. Despite these positive results, there are still only ~70 AI drugs in clinical trials relative to many thousands overall, and none has yet passed phase 3. AlphaFold 3 alone won’t find a new drug: separate models are needed to identify new drug targets, more data needs to be collected for many areas of biology, and many lab experiments are still needed to verify and iterate on predictions. Still, I think it could be the catalyst for a “ChatGPT moment” for AI’s use in drug design. AI tools are now much more accessible, and I hope many more biology “foundation models” will be developed and made available. A limited version of AlphaFold 3 is accessible for free, and the full model weights are expected to be released in the next six months.


Subscribe to the TechTonic Shifts newsletter

Hottest news

1. OpenAI Introduced GPT-4o

OpenAI announced GPT-4o, its latest AI model with text, vision, and audio capabilities. GPT-4o, with an “o” for “Omni,” will be accessible to all ChatGPT users, including those on the free version. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages. It is also much faster and 50% cheaper in the API.

2. DeepMind Releases AlphaFold 3

AlphaFold 3 is an advanced AI model by Google DeepMind and Isomorphic Labs capable of accurately predicting biomolecular structures and interactions. It is a significant advancement over prior models and enhances scientific research and drug development. It is available globally through the AlphaFold Server.

3. Microsoft Allegedly Developing MAI-1, a Competing Model to OpenAI’s GPT-4

Microsoft is reportedly working on MAI-1, a 500 billion parameter AI model, aiming for a competitive edge in the AI industry and greater independence in AI development.

4. Google I/O 2024 Will Be All About AI Again

Google is preparing to hold its annual Google I/O developer conference today, and naturally, it will be all about AI. A lot of the keynote will probably cover how Google is fusing Search and generative AI. The company has been testing new search features like AI conversation practice and image generation for shopping and virtual try-ons. You can catch it on Google’s site.

5. gpt2-Chatbot Confirmed As OpenAI

The gpt2-chatbot that appeared in the LMSYS arena was confirmed to be an OpenAI test model after a 429 rate limit error revealed its connection to OpenAI’s API. Now renamed to im-also-a-good-gpt2-chatbot, it can only be accessed randomly in “Arena (battle)” mode rather than “Direct Chat”.


Five 5-minute reads/videos to keep you learning

1. The Next Big Programming Language Is English

GitHub Copilot Workspace offers an AI-powered coding platform that enables users to write code using conversational English, streamlining the process, particularly for straightforward tasks. In this article, the author tests the Copilot Workspace and implements the code.

2. Everything About Long Context Fine-tuning

This article examines the difficulties of fine-tuning large language models on contexts longer than 32,000 tokens, such as high memory use and processing inefficiencies. It presents techniques like gradient checkpointing, LoRA, and FlashAttention to mitigate these issues and improve computational efficiency.

3. How LLMs Know When to Stop Generating?

This article explains how LLMs know when to stop generating text. It dives into two concepts that can make the model stop generating: EOS tokens and Maximum Token Lengths.
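To make the two stopping conditions concrete, here is a minimal sketch of a generation loop. It is not taken from the article: the `<eos>` string, the `generate` function, and the `scripted_model` stand-in are all hypothetical names used for illustration, and a real LLM would predict token IDs from a tokenizer vocabulary rather than replay a script.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical end-of-sequence marker; real tokenizers define their own EOS token.
EOS_TOKEN = "<eos>"


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_new_tokens: int = 8,
) -> Tuple[List[str], str]:
    """Append tokens until the model emits EOS or the token budget runs out.

    Returns the token list and the reason generation stopped.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == EOS_TOKEN:
            # Stop condition 1: the model itself signals it is done.
            return tokens, "eos"
        tokens.append(token)
    # Stop condition 2: the caller-imposed maximum length was reached.
    return tokens, "max_tokens"


def scripted_model(script: Iterable[str]) -> Callable[[List[str]], str]:
    """Toy stand-in for an LLM: replays a fixed script, then emits EOS."""
    it = iter(script)

    def next_token(_context: List[str]) -> str:
        return next(it, EOS_TOKEN)

    return next_token


if __name__ == "__main__":
    print(generate(["Hi"], scripted_model(["there", "!"])))
    print(generate(["Hi"], scripted_model(["a", "b", "c", "d"]), max_new_tokens=2))
```

In the first call the toy model runs out of script and emits the EOS marker, so generation stops on its own; in the second, the loop is cut off by the two-token budget even though the model had more to say.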

4. What’s Up with Llama 3? Arena Data Analysis

Meta’s Llama 3 70B is a language model that performs well in the English Chatbot Arena on open-ended and creative tasks, with friendly, high-quality conversational outputs. However, it is less proficient in math and coding-related tasks. This analysis looks at the types of prompts users ask, the challenges of using Llama 3, whether the ranking changes when prompts are easier or harder, and more.

5. Unpacking Kolmogorov-Arnold Networks

Researchers at MIT recently introduced a new neural network architecture called Kolmogorov-Arnold Networks (KANs). This article illustrates the structure of KANs through clear examples and straightforward insights, aiming to make these advanced concepts accessible to a broader audience.


Repositories & Tools

  1. Granite Code Models is a family of open foundation models for code intelligence, trained with code written in 116 programming languages.
  2. Consistency LLM is a new family of models capable of reducing inference latency by efficiently decoding n tokens in parallel.
  3. Gemma-2B-10M implements the Gemma model with recurrent local attention and a context length of up to 10M tokens.
  4. LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch.
  5. AnythingLLM is an all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.


Top papers of the week

1. xLSTM: Extended Long Short-Term Memory

Researchers have advanced LSTM-based language models by applying exponential gating and revamping the memory structures, resulting in two key variants: sLSTM, which uses a scalar memory, and the fully parallelizable mLSTM, which uses a matrix memory. These innovations are incorporated into xLSTM blocks, which, when stacked residually, form xLSTM architectures that perform competitively with leading Transformers and State Space Models in both performance and scalability.

2. Large Language Models can Strategically Deceive their Users when Put Under Pressure

Researchers present what they describe as the first demonstration of a large language model (LLM) trained to be helpful, harmless, and honest, in this case GPT-4, strategically deceiving its users without being instructed to do so. In a simulated stock trading environment, the model engaged in insider trading and subsequently concealed its actions from its management, illustrating misaligned behavior in a realistic scenario.

3. AI Deception: A Survey of Examples, Risks, and Potential Solutions

This paper argues that a range of current AI systems have learned how to deceive humans. The analysis found numerous examples of AI deception, such as Meta’s Cicero Diplomacy bot telling premeditated lies to players. Other cases included negotiation systems misrepresenting preferences, AI bluffing in poker, and ‘playing dead’ to fool safety tests.

4. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

The paper presents DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. Compared with DeepSeek 67B, DeepSeek-V2 saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times that of the earlier model.
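For a feel of what those percentages mean in practice, here is a small back-of-the-envelope sketch. It only rearranges the figures quoted above; the constant and function names are my own and do not come from the paper.

```python
# Figures quoted in the DeepSeek-V2 summary above.
TOTAL_PARAMS_B = 236          # total parameters, in billions
ACTIVE_PARAMS_B = 21          # parameters activated per token, in billions
KV_CACHE_REDUCTION = 0.933    # KV cache is 93.3% smaller than DeepSeek 67B's
TRAINING_COST_SAVING = 0.425  # training cost is 42.5% lower than DeepSeek 67B's


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def shrink_factor(reduction: float) -> float:
    """How many times smaller a quantity is after a fractional reduction."""
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active per token: {share:.1%}")
    print(f"KV cache shrink factor: {shrink_factor(KV_CACHE_REDUCTION):.1f}x")
    print(f"Training cost vs. DeepSeek 67B: {1 - TRAINING_COST_SAVING:.1%}")
```

In other words, only about 9% of the parameters are active for each token, and a 93.3% reduction leaves a KV cache roughly 15 times smaller than the one in DeepSeek 67B.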

5. Generative Multimodal Models are In-Context Learners

Emu2 is a 37 billion parameter generative multimodal model with strong in-context learning capabilities. It sets new performance standards on multimodal tasks, especially in few-shot scenarios, and achieves state-of-the-art results in visual question answering and open-ended generation after instruction tuning.


Quick links

1. Cohere announced Command R Fine-tuning. It offers superior performance on enterprise use cases and costs up to 15x less than the largest models on the market.

2. OpenAI and Stack Overflow have announced a partnership that could potentially improve the performance of AI models and bring more technical information into ChatGPT. The first set of new integrations and capabilities between Stack Overflow and OpenAI will be available in the first half of 2024.

3. Anthropic launched a prompt engineering tool in the Anthropic Console that generates production-ready prompts. This free tool helps you create better prompts for your AI chatbot.

4. Crunchbase data shows that 50% of all global venture funding for AI-related startups went to companies headquartered in the Bay Area, as a cluster of talent congregates in the region.

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.


Well, that's it for now. If you like my article, subscribe to my newsletter or connect with me. LinkedIn appreciates your likes by making my articles available to more readers.

Signing off - Marco

