AI Agent Surfs the Web like Humans
Today’s top AI Highlights:
& so much more!
Read time: 3 mins
AI Tutorials
Working with multiple LLMs simultaneously can be incredibly useful for comparing their strengths, weaknesses, and response styles. An app that puts top models side by side makes it easy to see how each one behaves and to pick the right model for a specific task.
Does it sound complex to build your own chat playground with multiple LLMs? It’s really not. Just 20 lines of Python code and it’s done!
Let’s build a Multi-LLM Chat Playground that lets you interact with three popular models in a single app: GPT-4o, Claude 3.5 Sonnet, and Cohere Command R+. You can swap these for any other LLMs of your choice. With a few clicks, you can view each model's response in a parallel layout for easy comparison.
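The heart of a playground like this is fanning one prompt out to several models at once and collecting the replies by name. The sketch below is not the tutorial's code. It assumes each model is wrapped in a plain function that takes a prompt and returns text; the real OpenAI, Anthropic, and Cohere client calls are left out, and the model names and fake replies are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A "model" is any function prompt -> reply text. Wire in your own provider
# clients here; nothing below calls a real API.
ModelFn = Callable[[str], str]


def compare_models(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently; return replies keyed by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        results: Dict[str, str] = {}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:  # one failing provider shouldn't hide the others
                results[name] = f"[error: {exc}]"
        return results


if __name__ == "__main__":
    # Hypothetical stand-ins for real model calls.
    fake_models = {
        "gpt-4o": lambda p: f"GPT-4o reply to: {p}",
        "claude-3.5-sonnet": lambda p: f"Claude reply to: {p}",
        "command-r-plus": lambda p: f"Command R+ reply to: {p}",
    }
    for name, reply in compare_models("Explain RAG in one line.", fake_models).items():
        print(f"--- {name} ---\n{reply}\n")
```

Running the calls in threads means the slowest model, not the sum of all three, sets the wait time. In a web UI you could render each entry of the returned dict in its own column.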
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills, subscribe now and be the first to access our latest tutorials.
Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
Latest Developments
Here’s a new Python toolkit, RAGLite, for building RAG systems that uniquely supports both PostgreSQL and SQLite databases, giving you a storage option that fits your project whatever its scale. What sets RAGLite apart is its lightweight architecture: it avoids heavy dependencies like PyTorch and LangChain, which means better performance and less complexity. It also integrates with a wide range of LLM providers through LiteLLM, including local models via llama-cpp-python, giving you the freedom to choose the best tools for your needs.
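To give a feel for the retrieval half of a RAG pipeline backed by SQLite with no heavy dependencies, here is a minimal standard-library sketch. It does not use RAGLite's API. The bag-of-words similarity is a toy stand-in for a real embedding model, and the function names (`build_store`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
import sqlite3
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_store(chunks: List[str]) -> sqlite3.Connection:
    """Store text chunks in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany("INSERT INTO chunks (body) VALUES (?)", [(c,) for c in chunks])
    conn.commit()
    return conn


def retrieve(conn: sqlite3.Connection, query: str, k: int = 2) -> List[str]:
    """Return up to k chunks most similar to the query, skipping zero-overlap chunks."""
    q = embed(query)
    rows = conn.execute("SELECT body FROM chunks").fetchall()
    scored = sorted(((cosine(q, embed(body)), body) for (body,) in rows), reverse=True)
    return [body for score, body in scored[:k] if score > 0]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the augmented prompt that would be sent to an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt would then go to whichever LLM you like; in RAGLite's case that routing is handled through LiteLLM.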
Key Highlights:
Meet a new AI agent that can autonomously surf the web better than Claude Computer Use. Paris-based AI startup H has launched Studio, a platform for easily creating robust, production-ready automations at scale. Alongside it comes Runner H, the company's flagship web automation agent, which can interact with web UIs on its own and complete tasks from simple natural-language commands.
What sets Runner H apart are its specialized in-house models, designed to be both smaller and more cost-effective than generalist models while still delivering superior performance, particularly in UI interaction and localization. The agent outperforms larger models in web automation tasks, scoring 67% on WebVoyager compared to Anthropic Computer Use's 52%.
Key Highlights:
Quick Bites
Chinese researchers have open-sourced LLaVA-o1, a vision language model with OpenAI o1-like reasoning capabilities. Built on Llama 3.2 Vision, LLaVA-o1 tackles complex visual reasoning tasks by breaking each problem into four stages: summarizing, describing the image, reasoning, and concluding. It was trained on a custom 100k dataset and uses a novel stage-level beam search to improve results at inference time. It also surpasses larger and even closed-source models, such as Gemini-1.5-pro and GPT-4o-mini.
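The idea behind stage-level search is to pick a winner at each stage before moving on, rather than sampling whole answers. Here is a simplified sketch, not the paper's implementation: for each of the four stages it samples several candidates, keeps the best one, and appends it to the context for the next stage. `generate` and `score` are hypothetical hooks; `score` stands in for however candidates are actually judged.

```python
from typing import Callable, List

STAGES = ["summary", "caption", "reasoning", "conclusion"]


def stage_level_search(
    question: str,
    generate: Callable[[str, str, int], str],  # hypothetical: (context, stage, sample_idx) -> candidate
    score: Callable[[str, str], float],         # hypothetical: (stage, candidate) -> quality score
    n_candidates: int = 4,
) -> List[str]:
    """Pick the best of n sampled candidates per stage, conditioning each stage on earlier winners."""
    context = question
    chosen: List[str] = []
    for stage in STAGES:
        candidates = [generate(context, stage, i) for i in range(n_candidates)]
        best = max(candidates, key=lambda c: score(stage, c))
        chosen.append(best)
        context = context + "\n" + best  # later stages build on the selected output
    return chosen
```

Selecting per stage lets a weak summary or caption be discarded early, before the model spends effort reasoning on top of it.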
Alibaba Cloud just dropped a preview of QwQ, a 32B open-source model with excellent reasoning capabilities, particularly in math and coding, where it beats even OpenAI o1-mini on some benchmarks. It's still at an early stage and has limitations, such as potential language mixing and recursive reasoning loops, but it's already showing promising results on benchmarks like GPQA and LiveCodeBench. You can try it out on AnyChat, or run it locally with Ollama using the command: ollama run qwq
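The ollama run command opens an interactive chat. To call the same model from Python, Ollama also serves a local REST API (port 11434 by default). A minimal standard-library sketch, assuming the Ollama server is running and the qwq model has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "qwq") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the given model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "qwq", timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    print(ask("How many prime numbers are there below 30?"))
```

The generous timeout is deliberate: reasoning models like QwQ can produce long chains of thought before answering.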
LangChain has introduced Promptim, an experimental library for automated prompt optimization. Give it your initial prompt, a dataset, and some evaluation metrics, and it runs tests to find a better-performing prompt. Think of it as a shortcut to better results from your AI system that saves you time and adds a dose of rigor to your prompt engineering.
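Promptim's own API is not reproduced here. The underlying idea, scoring each prompt on a dataset with a metric and keeping the winner, can be sketched in a few lines. Where Promptim proposes improved prompts automatically, this sketch just takes a hand-written list of candidates; `run` and `metric` are hypothetical hooks for your LLM call and your evaluator.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def evaluate(prompt: str, dataset: Iterable[Example],
             run: Callable[[str, str], str],          # hypothetical: (prompt, input) -> model output
             metric: Callable[[str, str], float]) -> float:  # (output, expected) -> score in [0, 1]
    """Average metric score of a prompt over the dataset."""
    pairs: List[Example] = list(dataset)
    if not pairs:
        raise ValueError("dataset must not be empty")
    return sum(metric(run(prompt, x), y) for x, y in pairs) / len(pairs)


def pick_best_prompt(candidates: Iterable[str], dataset: Iterable[Example],
                     run: Callable[[str, str], str],
                     metric: Callable[[str, str], float]) -> Tuple[str, float]:
    """Evaluate every candidate prompt and return the best one with its score."""
    pairs = list(dataset)
    best_prompt, best_score = "", float("-inf")
    for prompt in candidates:
        score = evaluate(prompt, pairs, run, metric)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

Keeping the dataset and metric fixed is what makes the comparison fair: every candidate is judged on exactly the same examples.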
LMSYS launched RepoChat Arena, a live AI software engineering battleground where AI models tackle real coding tasks from public GitHub links. You can watch AI models fix bugs, add features, or review PRs side by side, then vote for the best solution. Head to lmarena.ai to see the AI coding battles in action and help rank the top AI software engineers!
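How do head-to-head votes become a leaderboard? The classic Elo update is one simple answer, shown here as an illustration only. lmarena.ai's actual ranking method may differ (LMSYS has used Bradley-Terry-style statistical models for its arenas), and the model names below are hypothetical.

```python
from typing import Dict

START = 1000.0  # rating assigned to a model before its first vote


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one vote: the winner gains exactly what the loser gives up."""
    r_w, r_l = ratings.get(winner, START), ratings.get(loser, START)
    gain = k * (1.0 - expected_score(r_w, r_l))  # upsets earn more than expected wins
    ratings[winner] = r_w + gain
    ratings[loser] = r_l - gain


if __name__ == "__main__":
    ratings: Dict[str, float] = {}
    for w, l in [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]:
        update_elo(ratings, w, l)
    for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {r:.1f}")
```

Because each update depends on vote order, Elo is noisy for a fixed batch of votes; fitting a Bradley-Terry model over all votes at once avoids that.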
Tools of the Trade
Hot Takes
That’s all for today! See you tomorrow with more such AI-filled content.
PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends!