Edition 35 - Creating Self-Improving LLM Evals
This month's edition of The Evaluator is packed with cutting-edge insights and practical know-how from our team. This time, we cover self-improving evals, dive into OTel, chat about OpenAI's Swarm, and more. There's also information about our ongoing agents series.
As always, we conclude with some of our favorite news, papers, community threads, and upcoming events.
Techniques for Self-Improving LLM Evals
If you’ve implemented a series of LLM-based evaluations or unit tests but aren’t sure your methods are robust, this guide by Eric Xiao is for you. It walks through a systematic approach to building self-improving LLM evals. Dive in.
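As a taste of the approach, here's a minimal sketch of an LLM-as-a-judge eval with a feedback loop, assuming the OpenAI Python SDK; the judge prompt, labels, and golden examples are illustrative stand-ins, not taken from the article.

```python
# A minimal sketch of an LLM-as-a-judge eval, assuming the OpenAI Python SDK.
# The prompt template, label set, and golden example are illustrative.
from openai import OpenAI

client = OpenAI()

JUDGE_TEMPLATE = """You are evaluating an answer for relevance.
Question: {question}
Answer: {answer}
Respond with exactly one word: "relevant" or "irrelevant"."""

def judge_relevance(question: str, answer: str) -> str:
    """Ask a judge model to label one example; returns the raw label."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": JUDGE_TEMPLATE.format(question=question, answer=answer),
        }],
        temperature=0,  # deterministic judgments keep the eval reproducible
    )
    return response.choices[0].message.content.strip().lower()

# The "self-improving" loop: compare judge labels against a small golden set,
# flag disagreements, revise the judge prompt, and re-run.
golden_set = [
    {"question": "What is OTel?",
     "answer": "OpenTelemetry is an open observability framework.",
     "label": "relevant"},
]
disagreements = [
    ex for ex in golden_set
    if judge_relevance(ex["question"], ex["answer"]) != ex["label"]
]
print(f"{len(disagreements)} disagreement(s) to review before the next prompt iteration")
```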
Tracing and Evaluating LangGraph Agents
LangGraph is a powerful library for building stateful, multi-actor applications with large language models. In this post, Greg Chase covers how LangGraph’s traces can be ingested into Arize, and how to use LLM-as-a-judge to evaluate LangGraph agent performance. Read it here.
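For a sense of the setup, here's a minimal sketch assuming the arize-otel and openinference-instrumentation-langchain packages; the credentials and project name are placeholders, and exact parameter names may differ from the post.

```python
# Minimal sketch of sending LangGraph traces to Arize, assuming the
# arize-otel and openinference-instrumentation-langchain packages;
# credentials and the project name below are placeholders.
from arize.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point an OpenTelemetry tracer provider at Arize.
tracer_provider = register(
    space_id="YOUR_SPACE_ID",  # found in the Arize UI
    api_key="YOUR_API_KEY",
    project_name="langgraph-agent",
)

# LangGraph runs on LangChain's runtime, so the LangChain instrumentor
# captures graph-node and LLM spans automatically.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, invoking a compiled LangGraph graph emits traces to Arize,
# where each node and LLM call appears as a span ready for LLM-as-a-judge evals.
```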
The Role of OpenTelemetry in LLM Observability
A comprehensive piece on the role of OpenTelemetry in LLM observability, including an overview of OTel itself. Dat Ngo wrote this based on his experience working alongside customers who have productionized consumer-facing LLM applications with real business ROI. Dive into OTel here.
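For context, here's a minimal sketch of what OTel instrumentation looks like around an LLM call, using only the core opentelemetry-sdk package; the span and attribute names are illustrative, not a prescribed semantic convention.

```python
# A minimal OpenTelemetry sketch wrapping an LLM call in a span.
# Span and attribute names here are illustrative choices.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Export spans to stdout; production setups swap in an OTLP exporter
# pointed at a collector or observability backend.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("llm-app")

def call_llm(prompt: str) -> str:
    # Each LLM call becomes a span carrying the prompt and completion
    # as attributes, so failures and latency can be inspected per call.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.prompt", prompt)
        completion = "...model output..."  # placeholder for the real client call
        span.set_attribute("llm.completion", completion)
        return completion

call_llm("What is LLM observability?")
```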
Arize + Vertex AI API
By pairing an AI observability and evaluation platform like Arize AI with the advanced capabilities of Google’s suite of AI tools, enterprises looking to push the boundaries of what’s possible with their AI applications gain a robust, compelling option. By Gabe Barcelos. Read it here.
Swarm: OpenAI's Experimental Approach to Multi-Agent Systems
In this paper read, John Gilhuly and Xander Song discuss Swarm’s design, its practical applications, and how it stacks up against other frameworks. Whether you’re new to multi-agent systems or looking to deepen your understanding, Swarm offers a straightforward, hands-on way to get started. Learn more about Swarm.
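For the curious, here's a minimal sketch of Swarm's core handoff pattern based on the public repo: an agent transfers control by returning another Agent from a function. The agent names and instructions below are illustrative.

```python
# Minimal sketch of OpenAI Swarm's handoff pattern, based on the public repo.
# Agent names and instructions are illustrative.
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_support():
    """Handoff: returning another Agent transfers the conversation to it."""
    return support_agent

support_agent = Agent(
    name="Support",
    instructions="Resolve the user's issue directly.",
)

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right agent.",
    functions=[transfer_to_support],
)

# The triage agent decides to call transfer_to_support, and Swarm
# continues the conversation with the support agent.
response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "My export is failing."}],
)
print(response.messages[-1]["content"])
```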
Tracing LLM Function Calls
A quick demo of how to trace LLM function calls in Arize. Eric Xiao shows you how to trace OpenAI function calls for enhanced debugging and structured outputs, and how function calling enables LLMs to interact with external tools and return structured data for tasks like summarization, classification, and code transformation. Watch the video.
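Here's a minimal sketch of the function-calling mechanics, assuming the openai Python SDK; the classify_ticket tool schema is a hypothetical example, not from the demo.

```python
# A minimal sketch of OpenAI function calling, assuming the openai Python SDK.
# The classify_ticket tool schema is a hypothetical example.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "classify_ticket",
        "description": "Classify a support ticket into a category.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string", "enum": ["billing", "bug", "other"]},
            },
            "required": ["category"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My invoice is wrong."}],
    tools=tools,
)

# The model returns a structured tool call instead of free text --
# this is what surfaces as a function-call span when traced.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, json.loads(tool_call.function.arguments))
```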
Object Detection Modeling
A quick demo of object detection modeling and the computer vision capabilities Arize offers, by Duncan McKinnon. Get a better idea of what's going on in your CV datasets and what's underperforming. Watch the video.
Register for our Agents Workshop
Join us as we walk through a five-part series on real-life agents deployed in production. We’ll dive deep into the architectures of these agents, the systems used in their development, and lessons learned from running them in production. Each week, we’ll unpack a new example agent or a component used in a real-world agent. Register here.
Staff Picks
Here's a roundup of our team's favorite news, research, threads, and things to do.