Code Snippet: Parallel LLM Calls
Han Xiang Choong
Senior Customer Architect - APJ @ Elastic | Applied AI/ML | Search Experiences | Delivering Real-World Impact
Problem
Want to use an LLM to process very large datasets (>10^7 documents), and want results as soon as possible. Minimize wall-clock time per document.
Setting
SRC: Client in Singapore (SG)
DST: GPT-4o-Mini on Azure OpenAI, US East deployment
Scenario
List of 20 prompts generated by an LLM for benchmarking purposes.
prompts = [
    "Explain the concept of quantum entanglement to a high school student.",
    "Write a short story about a time traveler who accidentally changes history.",
    "Describe the process of photosynthesis in plants.",
    "Compare and contrast the economic systems of capitalism and socialism.",
    "Provide a step-by-step guide on how to change a car tire.",
    "Analyze the themes in George Orwell's novel '1984'.",
    "Explain the basics of machine learning to a non-technical person.",
    "Describe the impact of social media on modern interpersonal relationships.",
    "Write a persuasive essay on the importance of renewable energy sources.",
    "Summarize the key events of World War II in chronological order.",
    "Explain the concept of blockchain technology and its potential applications.",
    "Describe the process of natural selection in evolution.",
    "Write a dialogue between two characters discussing the ethics of artificial intelligence.",
    "Explain the greenhouse effect and its role in climate change.",
    "Analyze the pros and cons of remote work in the modern economy.",
    "Describe the basic principles of cognitive behavioral therapy.",
    "Explain the concept of supply and demand in economics.",
    "Write a critical review of a famous painting (e.g., Van Gogh's 'Starry Night').",
    "Describe the process of how a bill becomes a law in the United States government.",
    "Explain the basics of computer programming to someone with no prior experience."
]
Solution
The parallel_execute function takes a function, an iterable of inputs, and optional keyword arguments. It submits each input to a thread pool and aggregates the outputs in a list called 'results'. Note that as_completed yields futures in completion order, so the results will not necessarily match the order of the inputs.
import os
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_execute(func, iterable, max_workers=10, **kwargs):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per input item, passing kwargs through to func.
        future_to_item = {executor.submit(func, item, **kwargs): item for item in iterable}
        # Collect results as the futures finish (completion order, not input order).
        for future in as_completed(future_to_item):
            try:
                results.append(future.result())
            except Exception:
                # Log the failure and keep going; failed items are simply skipped.
                traceback.print_exc()
    return results
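As noted above, as_completed returns outputs in completion order rather than input order. If you need the outputs aligned with the prompts, a minimal order-preserving sketch using executor.map could look like this (parallel_execute_ordered is a hypothetical helper, not part of the original snippet):

def parallel_execute_ordered(func, iterable, max_workers=10, **kwargs):
    # executor.map preserves input order. Note that if any item raises,
    # the exception surfaces when its result is consumed by list().
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda item: func(item, **kwargs), iterable))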
LLM Class:
from openai import AzureOpenAI

class AzureOpenAIClient:
    def __init__(self):
        # Credentials and endpoint are read from environment variables.
        self.client = AzureOpenAI(
            api_key=os.environ.get("AZURE_OPENAI_KEY_1"),
            api_version="2024-06-01",
            azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT")
        )

    def generate(self, prompt, model="gpt-4o-mini", system_prompt=""):
        response = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            max_tokens=4096
        )
        return response.choices[0].message.content
Run:
LLM = AzureOpenAIClient()
results = parallel_execute(
    LLM.generate,
    prompts,
    max_workers=10,
    model="gpt-4o-mini",
    system_prompt=""
)
Results (20 prompts, no system prompt):
Naive Sequential: 112.3 seconds
Parallel, 5 workers: 22.9 seconds
Parallel, 10 workers: 14.8 seconds
Parallel, 20 workers: 8.5 seconds
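For completeness, the wall-clock comparison can be reproduced with a simple timing harness. Below is a minimal sketch (not the original benchmarking code), assuming the LLM client and parallel_execute defined above:

import time

def time_it(label, fn):
    # Measure and report wall-clock time for a single run.
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.1f} seconds")

time_it("Naive Sequential", lambda: [LLM.generate(p) for p in prompts])
time_it("Parallel, 10 workers", lambda: parallel_execute(LLM.generate, prompts, max_workers=10))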