Prompt Caching

In Gen AI applications built on LLMs, API calls are made to LLM providers to generate text. Providers charge for the number of tokens sent and received in each request and response, and from a latency perspective, every additional round trip to the provider slows the application down. Prompt caching therefore plays an important role in improving efficiency, lowering costs, and enhancing the responsiveness of language model applications.

Prompt caching is a strategy that involves storing responses to prompts that have been previously queried. When a prompt is repeated, instead of sending a new API request and incurring extra computational cost and time, the cached response is retrieved and used. For applications where repeated queries occur frequently, prompt caching can provide substantial benefits, such as decreasing response latency, saving computational resources, and reducing API costs.
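In its simplest form, the idea can be sketched with an in-memory dictionary. This is only an illustration of the lookup-then-store pattern (call_llm below is a hypothetical stand-in for a real provider call); the rest of the article replaces the dictionary with Redis so the cache can be shared across processes and persisted.

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real provider API call.
    return "response to: " + prompt

cache = {}

def get_response(prompt: str) -> str:
    if prompt in cache:           # cache hit: reuse the stored response, no API cost
        return cache[prompt]
    response = call_llm(prompt)   # cache miss: pay for one API call
    cache[prompt] = response      # remember it for the next identical prompt
    return response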

Prompt caching can be implemented in a variety of ways using a variety of caching solutions. In this case we will use a Redis server as the caching layer.

Redis can be installed on macOS with Homebrew using the following commands:

brew install redis

brew services start redis
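Once the service is running, a quick ping from Python (a small check I am assuming here, against the default localhost:6379 instance) confirms the cache layer is reachable:

import redis

# Assumes the default local Redis instance started by brew services.
client = redis.Redis(host="localhost", port=6379, db=0)
print(client.ping())  # prints True if the server is responding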

import redis
import hashlib
import time
import os
from langchain_openai import ChatOpenAI
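Continuing from these imports, a minimal sketch of the Redis-backed cache could look like the following. This is an illustrative completion rather than the original article's code: the function name cached_llm_call, the gpt-4o-mini model choice, and the one-hour TTL are assumptions, and it expects OPENAI_API_KEY to be set in the environment.

# Connect to the local Redis server started earlier; decode_responses=True
# returns cached values as strings instead of bytes.
cache = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

# Assumed model choice; the API key is read from the environment.
llm = ChatOpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])

def cached_llm_call(prompt: str, ttl_seconds: int = 3600) -> str:
    # Hash the prompt so arbitrarily long prompts map to a fixed-size Redis key.
    key = "prompt_cache:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no API call, no token cost

    # Cache miss: call the model, then store the answer with an expiry.
    response = llm.invoke(prompt).content
    cache.set(key, response, ex=ttl_seconds)
    return response

# Timing the same prompt twice shows the latency benefit of a cache hit.
start = time.time()
cached_llm_call("Explain prompt caching in one sentence.")
print(f"first call (API): {time.time() - start:.2f}s")

start = time.time()
cached_llm_call("Explain prompt caching in one sentence.")
print(f"second call (cache): {time.time() - start:.2f}s")

Hashing the prompt keeps the Redis keys short and uniform regardless of prompt length, and the TTL prevents stale responses from living in the cache forever.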

