GenAI - The Cost of Context.
As I write this article, OpenAI has just released its latest model, GPT-4 Turbo, its most powerful AI yet. Numerous other organizations such as Google, Hugging Face, AWS, Meta, and Anthropic have their own progressive views and differentiation on LLMs, catering to various needs of today and the future. However, there is one thing common to all leading LLMs: the context. OpenAI's GPT-4 Turbo boasts an impressive context window of 128 thousand tokens, which is an excellent feature and a bait at the same time. For the uninitiated, let's understand what context means.
The term "context" in relation to LLMs such as OpenAI's GPT-4 Turbo refers to the relevant data provided to the AI to facilitate specific responses. Since LLMs lack persistent memory, they require context to be reintroduced with each interaction. GPT-4 Turbo's impressive context window can process up to 128 thousand tokens, enhancing its ability to understand and generate detailed and relevant responses.
From the OpenAI website:
A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).
Let's take an example.
If one prompts ChatGPT to summarize a book, the model analyzes the text (say, 10,000 characters) and produces a summary. Should there be subsequent questions, the text needs to be presented to the model again for it to continue generating pertinent answers. This necessity is not readily apparent when using the ChatGPT web interface, where context management is seamless, but it becomes very evident when interacting directly with OpenAI's APIs, where it directly impacts usage costs.
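To make that concrete, here is a minimal sketch of what "resending the context" looks like against the API. It assumes the official openai Python client (v1 style); the model name and file name are illustrative, not prescriptive.

```python
# Minimal sketch: the prior messages, including the book text, must travel
# with every follow-up call, because the model keeps no memory between calls.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

book_text = open("book.txt").read()  # the ~10,000-character source text

# First request: the full text travels with the prompt.
messages = [
    {"role": "user", "content": f"Summarize the following text:\n\n{book_text}"},
]
first = client.chat.completions.create(model="gpt-4-turbo-preview", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up question: the whole history, book text included, is sent again.
messages.append({"role": "user", "content": "What is the main argument of chapter 2?"})
second = client.chat.completions.create(model="gpt-4-turbo-preview", messages=messages)
print(second.choices[0].message.content)
```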
So what's the big deal?
The cost implication is significant: every piece of data sent to or received from an LLM is counted in tokens, which are priced accordingly. Using OpenAI's tokenizer, one can estimate the token count for any block of text; the book-summary example above works out to approximately 3 cents per API request under the current GPT-4 Turbo pricing. This expense escalates with each additional prompt that has to include the requisite context.
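A rough back-of-the-envelope version of that estimate, using OpenAI's tiktoken tokenizer, might look like the sketch below. The per-token price is an assumption for illustration only; check the current pricing page before relying on it.

```python
# Estimate prompt tokens and input cost with tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-class models

book_text = open("book.txt").read()         # ~10,000 characters
prompt = f"Summarize the following text:\n\n{book_text}"

num_tokens = len(enc.encode(prompt))        # ~4 chars per token => ~2,500 tokens
price_per_1k_input_tokens = 0.01            # assumed USD rate for input tokens

print(f"Prompt tokens: {num_tokens}")
print(f"Estimated input cost: ${num_tokens / 1000 * price_per_1k_input_tokens:.4f}")
# ~2,500 tokens at the assumed rate comes to roughly $0.025, i.e. about 3 cents.
```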
Moreover, certain applications might demand more context than the maximum these models currently provide. Below are the sample context lengths of popular LLMs.
Enter Retrieval Augmented Generation (RAG)
Simply put, techniques like RAG optimize the contextual information supplied to LLMs, making responses more accurate while reducing token costs and working around context limits. There are a few approaches to implementing RAG, and this is an evolving space where new tools and providers keep emerging to offer the most comprehensive solution. One RAG approach I personally like is combining GenAI with vector databases. It involves taking large context data and feeding it into a vector database as embeddings. Embeddings are a topic of discussion in their own right, but for this article, think of them as converting each word into a number and clustering similar or related numbers together to create a relationship between them. Purpose-built vector databases like Pinecone, Weaviate, and Redis are some of the emerging platforms. This can be optimized further by using approaches like map reduce to make the embeddings more granular.
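Here is a minimal sketch of the indexing step under these assumptions: OpenAI's embeddings endpoint supplies the vectors, and a plain numpy array stands in for a real vector database such as Pinecone, Weaviate, or Redis. The chunk size and model name are illustrative.

```python
# Indexing step: chunk the context, embed each chunk, and keep the vectors.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Return one embedding vector per input string."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

book_text = open("book.txt").read()
chunk_size = 1000                                   # characters per chunk (tunable)
chunks = [book_text[i:i + chunk_size] for i in range(0, len(book_text), chunk_size)]

index = embed(chunks)                               # one vector per chunk
```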
With this approach, a user first loads contextual data into a vector database, then creates a question (query) that is itself converted to embeddings and run against the vector database; the result is a relevant chunk of data containing the textual context, which can then be fed to the LLM for refined results. This may sound complex, but it's surprisingly easy to try out.
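Continuing the sketch above (reusing the assumed embed helper, chunks, and index), the query step could look like this: embed the question, retrieve the closest chunks by cosine similarity, and send only those chunks to the LLM as context.

```python
# Query step: retrieve the most relevant chunks and pass only them to the LLM.
question = "What is the main argument of chapter 2?"
q_vec = embed([question])[0]

# Cosine similarity of the question against every stored chunk vector.
scores = index @ q_vec / (np.linalg.norm(index, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[-3:][::-1]]   # best 3 chunks

answer = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{
        "role": "user",
        "content": "Answer using only this context:\n\n"
                   + "\n---\n".join(top_chunks)
                   + f"\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```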
In the book example above, with the RAG approach the follow-up prompt need not carry the full 10K of context; it carries only the much smaller subset the vector database returns, yielding more focused answers and lower token costs.
It remains to be seen whether these innovations will stand the test of time as LLMs continue to evolve and potentially integrate these capabilities natively, but for now a lot of advancement is happening in this space. Take MemGPT as an example, which takes the approach of treating LLMs as operating systems to give them long-lasting memory. For now, it's clear that the AI playground is getting some serious upgrades; our silicon-based friends might just end up with better memories than us. Interesting times, indeed!