LAI #68: The Tech Behind Deepseek, an Open AI Agent, and Why LLM Evals Matter

Breaking down a powerful new technique, testing an open-source AI agent, and making sense of LLM evaluations.

Good morning, AI enthusiasts!

This week, we’re diving into some of the most talked-about developments in LLMs and AI agents. First, we’re exploring Deepseek and the technique behind its performance—FlashMLA. Then, we’ll walk through running OpenManus, an open-source alternative to Manus, and what makes it a compelling option.

We also ran a poll on LLM evaluations. Since many of you are unfamiliar with them, we’re highlighting a powerful tool that automates AI response assessments against ideal (golden) answers.

Plus, we’ve got new tutorials and resources on fine-tuning Gemma, working with State-Space Models, and exciting collaboration opportunities. Enjoy the read!

What’s AI Weekly

This week in What’s AI, I am diving into FlashMLA, a new technique from Deepseek that improves LLM speed and performance. We’ll explore what it is, why it matters, and how it builds on key innovations like the KV Cache, CAG, and Infini-Attention. FlashMLA plays a major role in Deepseek’s speed and efficiency, so if you want to understand how it works, check out the full article or watch the video on YouTube.

— Louis-François Bouchard, Towards AI Co-founder & Head of Community
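To make the caching idea behind techniques like FlashMLA concrete before you dive into the article, here is a minimal, illustrative sketch of a KV cache in plain Python. This is a toy single-head attention over small lists, not Deepseek’s actual FlashMLA implementation; all names and dimensions here are assumptions for illustration only.

```python
import math

def attend(q, keys, values):
    """Softmax attention of a single query over all cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

class KVCache:
    """Append-only cache: each new token reuses past K/V instead of recomputing them."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Cache this token's key/value, then attend over the whole prefix.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Each `step` reuses every previously cached key/value pair, so generating token t costs a handful of dot products over the prefix instead of recomputing all keys and values from scratch.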


This week, we’re also super excited to share a first for us: we’ve teamed up with DecodingML (aka Paul Iusztin) and The AI Edge Newsletter (aka Damien Benveniste, PhD) for our very first guest posts (focused on optimization and cost reduction)! And the timing couldn’t be better. With OpenAI launching its most expensive API yet ($150 per million input tokens and $600 per million output tokens), LLM costs are a bigger concern than ever.

In these posts, we dive into one of the biggest challenges in AI today: keeping costs down without compromising performance. We break down the strategies top teams are using, uncover hidden operational expenses, and share a blueprint for optimizing LLM efficiency at scale.

If you’re building with LLMs, these insights are a must-read:

Our blueprint to cut LLM costs by 88%

Reduce AI Model Operational Costs With Quantization Techniques

We had a blast putting these together, and we’re already thinking about what to tackle next. Did you find them useful? Let us know in the comments so we can bring you more insights on the topics that matter most!


Learn AI Together Community Section!

AI poll of the week!

Most of you are unfamiliar with evals or haven’t implemented them. Is it because there isn’t a practical need for you to do it, or are there not enough resources to teach you how? Tell me in the thread so we can figure it out!

Collaboration Opportunities

The Learn AI Together Discord community is full of collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too—we share cool opportunities every week!

1. Cyberx0244 is working on an AI-driven trend analysis platform that tracks fast-growing content across platforms like YouTube and TikTok and is looking for ML engineers & data scientists who want to collaborate. If this sounds interesting, reach out in the thread!

2. Charlidamelio is learning how to apply AI in startups and is implementing AI features in their own startup. If you want to learn, help, or implement, connect in the thread!

Meme of the week!

Meme shared by hadi_rizwan47


TAI Curated Section

Article of the week

Reinforcement Learning for Business Optimization: A Genetic Algorithm-Powered Pricing Strategy By Shenggang Li

This article presents a hybrid approach to dynamic pricing by combining Proximal Policy Optimization (PPO) with Genetic Algorithms (GA). Traditional pricing models often fail to adapt to rapidly changing market conditions, but this framework addresses these limitations by leveraging PPO for stable, incremental learning and GA for broader exploration. The model dynamically adjusts pricing based on competitor actions, advertising spending, and seasonality, ensuring profitability and adaptability. Results from simulated experiments demonstrate significant improvements in revenue optimization, with smoother pricing adjustments and better long-term strategies.
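As a rough illustration of the genetic-algorithm half of that hybrid (the PPO half is omitted), here is a hedged toy sketch: a GA searching for a profit-maximizing price under a made-up linear demand curve. The demand model, parameter values, and function names are all assumptions for illustration, not the article’s actual setup.

```python
import random

def profit(price, base_demand=100.0, elasticity=2.0, unit_cost=5.0):
    """Toy profit under an assumed linear demand curve (illustrative only)."""
    demand = max(base_demand - elasticity * price, 0.0)
    return (price - unit_cost) * demand

def evolve_prices(generations=60, pop_size=30, seed=0):
    """Tiny GA: selection of the fitter half, averaging crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [rng.uniform(1.0, 50.0) for _ in range(pop_size)]  # random initial prices
    for _ in range(generations):
        pop.sort(key=profit, reverse=True)
        parents = pop[: pop_size // 2]          # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                  # crossover: average two parent prices
            child += rng.gauss(0.0, 1.0)         # mutation: small perturbation
            children.append(min(max(child, 1.0), 50.0))
        pop = parents + children                 # elitism: parents survive unmutated
    return max(pop, key=profit)
```

For this assumed demand curve the analytic optimum is a price of 27.5, and the GA converges close to it; in the article’s framework, PPO would refine such candidates incrementally while the GA keeps exploring.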

Our must-read articles

1. Google’s Gemma-3 Fine-Tuning Made Simple: Create Custom AI Models with Python and Unsloth By Krishan Walia

This article provides a comprehensive guide to fine-tuning Google’s Gemma-3, a lightweight, open-source LLM family with sizes up to 27 billion parameters, designed for developers building AI applications across devices. It explains the model’s capabilities, including support for over 35 languages and handling text, images, and short videos. It walks through the fine-tuning process using Python and the Unsloth library, covering prerequisites, data preparation, LoRA adapters, and training with SFTTrainer. It also demonstrates how to save and deploy the fine-tuned model on Hugging Face Hub.
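The LoRA adapters mentioned above can be illustrated with a tiny pure-Python toy: freeze the base weight matrix W and train only a low-rank update scale·(B@A). This is a conceptual sketch under assumed shapes and names, not Unsloth’s or Gemma’s actual code.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """y = (W + scale * B@A) @ x, computed as W@x + scale * B@(A@x).

    W (d_out x d_in) stays frozen; only the small adapters A (r x d_in)
    and B (d_out x r) would be trained.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))   # low-rank path: never materialize B@A
    return [b + scale * d for b, d in zip(base, delta)]

def trainable_params(d_in, d_out, rank):
    """(adapter parameters, full-matrix parameters) for comparison."""
    return rank * (d_in + d_out), d_in * d_out
```

Even in this toy, a rank-2 adapter for an 8x8 matrix trains 32 parameters instead of 64, and the savings grow dramatically at real model sizes, which is why LoRA makes fine-tuning a 27B model tractable.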

2. OpenManus: Fully Free Manus AI Agent — Install & Run Step-by-Step Locally and on Google Colab with Ollama By Md Monsur ali

This tutorial outlines the step-by-step process of installing and running OpenManus, an open-source alternative to the Manus AI agent, on local machines or Google Colab. OpenManus emphasizes privacy by enabling local execution without third-party cloud services. It covers setting up dependencies, configuring environments, and integrating tools like Ollama for LLM support. Key features include modular workflows, support for multiple LLMs, and advanced automation for tasks like coding and document processing. It also explains how to enhance OpenManus with web search and browser tools for extended functionality.

3. Exploring State-Space Models: The Next Evolution Beyond Transformers By Paul Sandhu

This blog examines the evolution of State-Space Models (SSMs) as an alternative to Transformers for sequence modeling. Originating from control theory, SSMs efficiently handle long sequences using fixed-size memory, unlike Transformers' quadratic scaling. The piece highlights key innovations like the HiPPO matrix and the S4 model, which introduced computational efficiency and long-range dependency handling. It also discusses practical applications, limitations in context retention, and emerging architectures like Mamba, which improve SSM performance.
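The fixed-size-memory property the article highlights is easy to see in a scalar toy of the state-space recurrence x_t = a·x_{t-1} + b·u_t, y_t = c·x_t. The parameter values here are arbitrary assumptions, not a real SSM/S4 parameterization:

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Scalar state-space recurrence: O(1) memory regardless of sequence length."""
    x, outputs = 0.0, []
    for u in inputs:
        x = a * x + b * u   # single fixed-size hidden state; no growing KV cache
        outputs.append(c * x)
    return outputs
```

Feeding an impulse shows the state decaying geometrically (here by a = 0.9 per step); contrast this single scalar of memory with a Transformer, whose attention must keep every past token around.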

4. Streamlit-Powered Automated AI Response Evaluation: Leverage Cosine Similarity and GPT to Score & Visualize Model Performance By Suhas Pawar

This blog introduces a Streamlit-powered AI response evaluation tool designed to automate the assessment of AI-generated responses against ideal (golden) responses. It combines Cosine Similarity, Sentence-BERT embeddings, and GPT-based contextual evaluation to calculate a comprehensive confidence score. The tool supports manual input and batch processing via Excel uploads, offering 3D visualizations of response embeddings using PCA. It eliminates the need for manual reviews, providing an efficient and scalable solution for AI developers and data scientists.
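The cosine-similarity component of such a pipeline can be sketched in a few lines. Note that the real tool compares Sentence-BERT embeddings; this hedged toy substitutes bag-of-words counts, so the function names and scores are illustrative assumptions only.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_response(candidate, golden):
    """Score a model answer against a golden answer in [0, 1]."""
    return cosine_similarity(Counter(candidate.lower().split()),
                             Counter(golden.lower().split()))
```

Identical answers score 1.0 and fully disjoint ones 0.0; in the article’s tool this similarity is blended with a GPT-based contextual judgment into an overall confidence score.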

If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.


