Edition 35 - Creating Self-Improving LLM Evals

This month's edition of the Evaluator is packed with cutting-edge insights and practical know-how from our team. This time, we cover self-improving evals, dive into OTel, chat about OpenAI's Swarm, and more. There's also information about our ongoing agent series.

As always, we conclude with some of our favorite news, papers, community threads, and upcoming events.


Techniques for Self-Improving LLM Evals

If you’ve implemented a series of LLM-based evaluations or unit tests but aren’t sure your methods are robust, this guide by Eric Xiao is for you. It covers how to create self-improving LLM evals by following a systematic approach. Dive in.
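
To make the feedback loop concrete, here is a minimal sketch (not the article’s actual code) of the core idea: score an LLM judge against a small hand-labeled golden set so you can measure judge/human agreement and iterate on the judge prompt. The model name, prompt, and dataset here are illustrative assumptions.

```python
# Minimal sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY env var;
# the prompt, model, and golden set are illustrative, not the article's code.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating a question-answer pair.
Question: {question}
Answer: {answer}
Respond with exactly one word: correct or incorrect."""

# Hypothetical hand-labeled golden set used to benchmark the judge itself.
golden_set = [
    {"question": "What is 2 + 2?", "answer": "4", "label": "correct"},
    {"question": "Capital of France?", "answer": "Berlin", "label": "incorrect"},
]

def judge(example: dict) -> str:
    """Run the LLM judge on one example and return its verdict."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**example)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

# Judge/human agreement is the signal for iterating on the judge prompt:
# inspect the disagreements, refine JUDGE_PROMPT, and re-run.
hits = sum(judge(ex) == ex["label"] for ex in golden_set)
print(f"Judge/human agreement: {hits}/{len(golden_set)}")
```

The disagreements are the interesting part: each one is either a labeling error or a judge-prompt weakness, and fixing them is what makes the eval self-improving.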


Tracing and Evaluating LangGraph Agents

LangGraph is a powerful library for building stateful, multi-actor applications with large language models. In this post, Greg Chase covers how LangGraph’s traces can be ingested into Arize, and how to use an LLM as a judge to evaluate LangGraph agent performance. Read it here.
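
As a rough sketch of the setup the post describes, the snippet below wires LangGraph traces into Arize via OpenInference instrumentation. The package names (arize-otel, openinference-instrumentation-langchain) and the register() signature are assumptions based on current docs and may differ from the article or across versions.

```python
# Sketch only: package names and the register() signature are assumptions
# based on current Arize docs and may differ across versions.
from arize.otel import register  # pip install arize-otel
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point an OTel tracer provider at your Arize space (placeholder credentials).
tracer_provider = register(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
    project_name="langgraph-agent",
)

# LangGraph runs on the LangChain runtime, so the LangChain instrumentor
# captures each graph node, LLM call, and tool call as a span.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, invoking any compiled LangGraph graph emits traces automatically.
```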


The Role of OpenTelemetry in LLM Observability

A comprehensive piece on the role of OpenTelemetry in LLM observability, including an overview of OTel itself. Dat Ngo wrote this based on his experience working alongside customers who have productionized consumer-facing LLM applications with real business ROI. Dive into OTel here.
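
For a feel of what OTel instrumentation looks like before any vendor enters the picture, here is a minimal, self-contained sketch using the OpenTelemetry Python SDK: it wraps a stubbed LLM call in a span and prints the span to the console. Swapping in an OTLP exporter ships the same spans to a real backend.

```python
# Vendor-neutral OpenTelemetry sketch: a tracer provider, a console exporter,
# and a span wrapped around a stubbed LLM call.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("llm-app")

def call_llm(prompt: str) -> str:
    # The span attributes below are what an observability backend indexes;
    # the attribute names here are illustrative, not a fixed convention.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.prompt", prompt)
        completion = "stubbed response"  # replace with a real model call
        span.set_attribute("llm.completion", completion)
        return completion

call_llm("Why instrument LLM apps with OTel?")
```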


Arize + Vertex AI API

By pairing an AI observability and evaluation platform like Arize AI with the advanced capabilities of Google’s suite of AI tools, enterprises looking to push the boundaries of what’s possible with their AI applications get a robust, compelling option. By Gabe Barcelos. Read it here.


Swarm: OpenAI's Experimental Approach to Multi-Agent Systems

In this paper read, John Gilhuly and Xander Song discuss Swarm’s design, its practical applications, and how it stacks up against other frameworks. Whether you’re new to multi-agent systems or looking to deepen your understanding, Swarm offers a straightforward, hands-on way to get started. Learn more about Swarm.
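
For context, Swarm’s core abstraction is agents that hand off to one another via plain Python functions. The sketch below mirrors the handoff pattern from Swarm’s README; since the library is experimental, treat the exact API as subject to drift.

```python
# Mirrors the handoff pattern from Swarm's README; Swarm is experimental,
# so treat this as a sketch rather than a stable API.
from swarm import Swarm, Agent  # pip install git+https://github.com/openai/swarm.git

def transfer_to_spanish_agent():
    """Hand the conversation off to the Spanish-speaking agent."""
    return spanish_agent

spanish_agent = Agent(
    name="Spanish Agent",
    instructions="You only speak Spanish.",
)

english_agent = Agent(
    name="English Agent",
    instructions="You only speak English.",
    functions=[transfer_to_spanish_agent],
)

client = Swarm()
response = client.run(
    agent=english_agent,
    messages=[{"role": "user", "content": "Hola. ¿Cómo estás?"}],
)
print(response.messages[-1]["content"])  # answered by the Spanish agent
```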


Tracing LLM Function Calls

A quick demo of how to trace LLM function calls in Arize. Eric Xiao shows you how to trace OpenAI function calls for enhanced debugging and structured outputs, and how function calling enables LLMs to interact with external tools and return structured data for tasks like summarization, classification, and code transformation. Watch the video.
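
As a quick illustration of the mechanism being traced, here is a hedged sketch of an OpenAI function call: the model returns structured arguments for a hypothetical classify_ticket tool (invented for this example, not part of the demo) instead of free text, and that structured name-plus-arguments pair is exactly what a tracing integration records for debugging.

```python
# Hedged sketch of OpenAI function calling; classify_ticket is a hypothetical
# tool invented for illustration, not part of the demo.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

tools = [{
    "type": "function",
    "function": {
        "name": "classify_ticket",
        "description": "Classify a support ticket into a category.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["billing", "bug", "feature_request"],
                },
            },
            "required": ["category"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My invoice charged me twice."}],
    tools=tools,
    # Force the tool call so the example is deterministic.
    tool_choice={"type": "function", "function": {"name": "classify_ticket"}},
)

# The structured name and arguments below are what show up as span
# attributes when the call is traced.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```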


Object Detection Modeling

A quick demo by Duncan McKinnon of object detection modeling and Arize’s capabilities around computer vision. Get a better idea of what's going on in your CV datasets and where models are underperforming. Watch the video.


Register for our Agents Workshop

Join us as we walk through a five-part series on real-life agents deployed in production. We’ll dive deep into the architectures of these agents, the systems used in their development, and lessons learned from running them in production. Each week, we’ll unpack a new example agent or agent component drawn from a real-world deployment. Register here.


Staff Picks

Here's a roundup of our team's favorite news, research, threads, and things to do.

SF MEETUP: The Rise of the Agent, Nov 8th

NYC MEETUP: Feat. Google, LlamaIndex, Weaviate & Priceline, Nov 19th

Discussion About NotebookLM

Phoenix Community Challenge: Win Cash

AI Model Generates Playable Video Game Environments in Real-Time

Claude Can Process PDFs

ChatGPT Search
