Code Snippet: Parallel LLM Calls

Problem

We want to use an LLM to process a very large dataset (>10^7 documents) and need results as soon as possible. The goal is to minimize wall-clock time per document.

Setting

SRC: SG

DST: GPT-4o-Mini on Azure OpenAI, US-East Deployment

Scenario

List of 20 prompts generated by an LLM for benchmarking purposes.

prompts = [
    "Explain the concept of quantum entanglement to a high school student.",
    "Write a short story about a time traveler who accidentally changes history.",
    "Describe the process of photosynthesis in plants.",
    "Compare and contrast the economic systems of capitalism and socialism.",
    "Provide a step-by-step guide on how to change a car tire.",
    "Analyze the themes in George Orwell's novel '1984'.",
    "Explain the basics of machine learning to a non-technical person.",
    "Describe the impact of social media on modern interpersonal relationships.",
    "Write a persuasive essay on the importance of renewable energy sources.",
    "Summarize the key events of World War II in chronological order.",
    "Explain the concept of blockchain technology and its potential applications.",
    "Describe the process of natural selection in evolution.",
    "Write a dialogue between two characters discussing the ethics of artificial intelligence.",
    "Explain the greenhouse effect and its role in climate change.",
    "Analyze the pros and cons of remote work in the modern economy.",
    "Describe the basic principles of cognitive behavioral therapy.",
    "Explain the concept of supply and demand in economics.",
    "Write a critical review of a famous painting (e.g., Van Gogh's 'Starry Night').",
    "Describe the process of how a bill becomes a law in the United States government.",
    "Explain the basics of computer programming to someone with no prior experience."
]        

Solution

The parallel_execute function takes a function, an iterable of inputs, and keyword arguments that are forwarded to every call. It submits one task per input to a thread pool and aggregates the outputs into a list called 'results'. Note that results are collected in completion order rather than input order, and any call that raises an exception is logged and skipped.

import os
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_execute(func, iterable, max_workers=10, **kwargs):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per input item; kwargs are forwarded to every call.
        future_to_item = {executor.submit(func, item, **kwargs): item for item in iterable}
        # Collect results as they finish (completion order, not input order).
        for future in as_completed(future_to_item):
            try:
                results.append(future.result())
            except Exception:
                traceback.print_exc()
    return results
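
One caveat: with as_completed, results come back in completion order, not input order, and any prompt whose call raises is simply dropped. If you need outputs aligned with the input list, an order-preserving variant along these lines works (a sketch, not part of the original snippet; failed calls return None so each input keeps its slot):

from concurrent.futures import ThreadPoolExecutor
import traceback

def parallel_execute_ordered(func, iterable, max_workers=10, **kwargs):
    # Wrap each call so one failure doesn't lose its position in the output list.
    def safe_call(item):
        try:
            return func(item, **kwargs)
        except Exception:
            traceback.print_exc()
            return None

    # executor.map yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(safe_call, iterable))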

LLM Class:

from openai import AzureOpenAI

class AzureOpenAIClient:
    def __init__(self):
        self.client = AzureOpenAI(
            api_key=os.environ.get("AZURE_OPENAI_KEY_1"),
            api_version="2024-06-01",
            azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT")
        )

    def generate(self, prompt, model="gpt-4o-mini", system_prompt=""):
        response = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            max_tokens=4096
        )
        return response.choices[0].message.content        
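
Under parallel load, the deployment's tokens-per-minute quota will eventually push back with HTTP 429s. A minimal backoff wrapper, assuming the openai v1 SDK's RateLimitError (generate_with_retry is a hypothetical helper, not part of the original post), could look like this:

import time
import openai

def generate_with_retry(llm, prompt, max_retries=5, **kwargs):
    # Retry on 429s with exponential backoff: 1s, 2s, 4s, 8s, 16s.
    for attempt in range(max_retries):
        try:
            return llm.generate(prompt, **kwargs)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Rate limited after {max_retries} retries")

Passing functools.partial(generate_with_retry, LLM) to parallel_execute in place of LLM.generate slots this in without changing the run code.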

Run:

LLM = AzureOpenAIClient()

results = parallel_execute(
    LLM.generate, 
    prompts, 
    max_workers=10,  
    model="gpt-4o-mini",
    system_prompt=""
)        
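
The timings below can be reproduced with a simple time.perf_counter harness (assumed here; the original does not show the measurement code):

import time

start = time.perf_counter()
sequential_results = [LLM.generate(p, model="gpt-4o-mini", system_prompt="") for p in prompts]
print(f"Sequential: {time.perf_counter() - start:.1f} s")

start = time.perf_counter()
parallel_results = parallel_execute(LLM.generate, prompts, max_workers=10,
                                    model="gpt-4o-mini", system_prompt="")
print(f"Parallel, 10 workers: {time.perf_counter() - start:.1f} s")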

Results (20 prompts, no system prompt)

Naive Sequential: 112.3 seconds

Parallel, 5 workers: 22.9 seconds

Parallel, 10 workers: 14.8 seconds

Parallel, 20 workers: 8.5 seconds
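
Extrapolating these measurements to the 10^7-document scale from the problem statement (a back-of-envelope sketch; it assumes the per-prompt latency measured on 20 prompts holds at scale and that quota and rate limits don't become the bottleneck):

N_DOCS = 10_000_000

seq_per_doc = 112.3 / 20   # ~5.6 s per document, sequential
par_per_doc = 8.5 / 20     # ~0.43 s per document, 20 workers

print(f"Sequential:  {seq_per_doc * N_DOCS / 86400:.0f} days")   # ~650 days
print(f"20 workers:  {par_per_doc * N_DOCS / 86400:.0f} days")   # ~49 days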

