LLMOps: Evaluate LLM apps with LangSmith.
Overview: -
In this article, I will share my recent work on integrating an LLM application with LangSmith. LangSmith is built by LangChain; it is a unified DevOps platform for developing, collaborating on, testing, deploying, and monitoring LLM applications. LangSmith has many features, and covering them all in one article would be too much to digest, so I am breaking the material into multiple chapters. This is an important chapter, as it covers how to create a dataset, run an experiment, and evaluate LLM applications.
Takeaways from this article: -
In the first chapter, I documented how to integrate LangSmith with an LLM application and trace the LLM calls.
After going through this chapter, readers will understand how to create a dataset using Python code. The most important part of LangSmith is evaluating LLM applications against the created dataset. Because LLM applications are unpredictable in nature, monitoring and evaluation are key to success.
All chapters are hands-on. Video guides are available, and no paid API subscription is required to implement the steps in Python code.
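Before running the snippets below, you need the langsmith Python package (pip install langsmith) and an API key from your free LangSmith account, as set up in chapter one. Here is a minimal setup sketch, assuming the standard LANGCHAIN_API_KEY environment variable is what the client reads:

import os

# Minimal setup sketch (my assumption of the usual environment variables;
# replace the placeholder with the key from your free LangSmith account).
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enables tracing, covered in chapter one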
Let's start and create a dataset:-
The power of an LLM application lies in the data it can handle and how accurately it can predict responses. So we need a dataset, and we evaluate the LLM application's responses with a predefined experiment. We can upload a dataset directly in .csv format in the LangSmith web portal; the following steps use Python code instead.
from langsmith import Client

# Connect to LangSmith (reads the API key from the environment)
client = Client()

# Create an empty dataset, then add input/output examples to it
dataset_name = "Demo dataset"
dataset = client.create_dataset(dataset_name=dataset_name)
client.create_examples(
    inputs=[
        {"postfix": "to LangSmith"},
        {"postfix": "to Evaluations in LangSmith"},
    ],
    outputs=[
        {"output": "Welcome to LangSmith"},
        {"output": "Welcome to Evaluations in LangSmith"},
    ],
    dataset_id=dataset.id,
)
Open LangSmith and navigate to Datasets; the newly created dataset will appear there.
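If you prefer to verify from code rather than the web portal, a small sketch using the client's list_examples call (part of the langsmith Client API, as I understand it) prints what was stored:

# Sketch: list the examples stored in the newly created dataset.
for example in client.list_examples(dataset_id=dataset.id):
    print(example.inputs, example.outputs)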
Let's evaluate the LLM application:-
After the dataset is added, a few steps are required to check the LLM application's responses against the same questions and answers. Will our LLM application be able to generate similar output for the same set of inputs? That's called evaluation. First, define an evaluator that scores each prediction.
# Define evaluators
from langsmith.schemas import Example, Run

def must_mention(run: Run, example: Example) -> dict:
    # Score 1 if the prediction contains every required phrase, else 0
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return {"key": "must_mention", "score": score}
Next, evaluate the predictions and responses of the LLM application.
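The evaluate call below passes a predict callable as the system under test, but its definition is not shown here. As a minimal, hypothetical stand-in (my placeholder, not part of the original code), something like this keeps the snippet runnable; evaluate() calls it with each example's inputs dict and expects a dict of outputs in return:

# Hypothetical stand-in for the LLM application under test.
def predict(inputs: dict) -> dict:
    return {"output": "Welcome " + inputs.get("postfix", "")}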
from langsmith.evaluation import evaluate

experiment_results = evaluate(
    predict,  # Your AI system
    data=dataset_name,  # The data to predict and grade over
    evaluators=[must_mention],  # The evaluators to score the results
    experiment_prefix="rap-generator",  # A prefix for your experiment names to easily identify them
    metadata={
        "version": "1.0.0",
    },
)
Run the code, then open LangSmith and navigate to Datasets to check the evaluation results.
Useful videos covering the dataset creation and evaluation steps are linked below; the detailed Python code and steps are shown in the video guides.
Useful video to create a dataset:-
Useful video to evaluate:-
Conclusion:
Congratulations! You've now created a dataset and used it to evaluate your agent or LLM.