GenAI Optimization Techniques - Part 1

Techniques used in the Store and Structure Phase

There is a view of LLMs as some sort of database containing tons of data: you can ask it anything and it will respond accurately, and if the response is not accurate, then we have to employ "optimization" to make it accurate.

That view operates on the assumption that LLMs are fact-based responders. In the field of Generative AI, that is largely a misguided view, especially considering that most LLMs are closed (we do not know what data they were trained on).


A better assumption is to imagine an LLM primarily as a vocabulary provider that offers relevance, not accuracy. In other words, you can ask an LLM to evaluate how relevant a response is.


To illustrate this, imagine going to a bookstore that stocks English books. Though all the books are written in the same language, the books in the Religion section will use a vocabulary quite different from the books in the Travel section.

Let's say the section tags were missing and all the books had blank covers. You, as the reader, would still be able to figure out whether you are in the Travel section or the Religion section. To determine this, you would employ your background knowledge (akin to an LLM) to evaluate what category the book you are reading belongs to. This is what I mean by relevance and context awareness.

If we view LLMs this way, it makes more sense to apply optimization at different stages within the GenAI process.

To briefly define optimization: it is a set of techniques applied at different stages within the GPT pipeline so that the response to a user's question will be accurate, relevant and safe.

  • Accurate - Fact-based, mathematical precision when precision is needed.
  • Relevant - Responds within context and style, and uses domain-specific language.
  • Safe - Ignores users' malicious questions, such as how to make an atom bomb, and does not reveal private or sensitive data.


I like to segment the GPT pipeline into two phases, though they may not execute sequentially in time.

Phase One Optimization

In this blog we will cover optimization techniques that are primarily used in Phase 1. In my next blog we will cover optimization techniques used in Phase 2, so stay tuned by clicking the subscribe button.

Let's quickly understand the Store and Structure phase. There are two steps that happen in this phase (refer to the diagram below).

  1. Ingestion
  2. Model embedding

Ingestion is the process of creating the necessary data pipelines to define and store the data for the model. The key components that create these data pipelines are:

  • Loaders - Components that import data from various sources.
  • Transformers - Components that chunk data into the relevant storage spaces and add contextual data (see the sketch after this list).
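
As a rough illustration, here is a minimal ingestion-pipeline sketch in Python. The Document class, load_text_files loader and transform function are hypothetical stand-ins to show the loader/transformer split, not any particular framework's API.

```python
# A minimal ingestion-pipeline sketch: a loader that imports data and a
# transformer that chunks it and attaches contextual metadata.
from dataclasses import dataclass, field
from typing import Iterable

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

def load_text_files(paths: Iterable[str]) -> list[Document]:
    """Loader: import raw data from various sources (here, local text files)."""
    docs = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            docs.append(Document(text=f.read(), metadata={"source": path}))
    return docs

def transform(docs: list[Document], chunk_size: int = 500) -> list[Document]:
    """Transformer: chunk each document and carry its context along as metadata."""
    chunks = []
    for doc in docs:
        for i in range(0, len(doc.text), chunk_size):
            chunks.append(Document(
                text=doc.text[i:i + chunk_size],
                metadata={**doc.metadata, "chunk_start": i},
            ))
    return chunks

# Usage: documents = transform(load_text_files(["guide.txt"]))
```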

The diagram below sheds some light on the steps that happen during this phase.

Chunking algorithms are a way to apply optimization at the ingestion step; choosing the right chunking algorithm is an essential optimization technique. Here are a few chunking techniques.

  1. Chunking Techniques

  • Fixed Size Chunking - Split data into fixed-size chunks. The chunk size can be tuned to optimize for retrieval speed or for the size of the data being loaded.
  • Context Aware Chunking - Split data into chunks based on some criterion. For example, you can split text into chunks at sentence boundaries marked by periods (".") or at new lines. The sketch below shows both approaches.
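
Here is a small Python sketch of both techniques; the chunk size and sample text are illustrative only.

```python
import re

def fixed_size_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Fixed Size Chunking: split text into chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def context_aware_chunks(text: str) -> list[str]:
    """Context Aware Chunking: split at sentence boundaries (periods) or new lines."""
    parts = re.split(r"(?<=\.)\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]

sample = "LLMs provide relevance. Chunking prepares data for embedding.\nEach chunk is indexed."
print(fixed_size_chunks(sample, chunk_size=40))
print(context_aware_chunks(sample))
```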

Indexing, or model embedding, utilizes LLMs to add meaning to the chunks of data. This meaning is essentially a numerical representation of the chunked data's context and relative relevance. Refer to the earlier blog linked here to learn more about how this numerical representation is done. During this step, the numerical representations are stored as indices. These indices act more like the table of contents for a book. The sketch below shows what producing these numerical representations can look like.
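
A minimal sketch of turning chunks into numerical representations, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model; any embedding model could be substituted here.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Pilgrimage routes and sacred texts of the region.",
    "Budget hotels and train timetables for the region.",
]
embeddings = model.encode(chunks)   # one numerical vector per chunk
print(embeddings.shape)             # e.g. (2, 384) for this particular model
```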

  2. Embedding Model Choice

In this step, one may want to determine which embedding model is best suited to optimize for performance.

Here is a link to the Leaderboard to find embedding models that may be best suited for embedding/encoding. How you determine which model to choose is beyond the scope of this blog; the important point is that model choice is another optimization technique that can be applied in this phase. A rough comparison sketch follows.
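
One simple way to sanity-check candidates on your own data is to compare how strongly each model relates a typical question to a known relevant chunk. The model names, query and chunk below are illustrative assumptions, not recommendations.

```python
# Compare candidate embedding models on a small, domain-specific pair.
from sentence_transformers import SentenceTransformer, util

candidates = ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]
query = "visa requirements for travel to Japan"
relevant_chunk = "Travelers need a valid passport and may require a visa for Japan."

for name in candidates:
    model = SentenceTransformer(name)
    q_vec, c_vec = model.encode([query, relevant_chunk])
    score = util.cos_sim(q_vec, c_vec).item()  # higher = more relevant
    print(f"{name}: cosine similarity = {score:.3f}")
```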

Based on that choice, the embedding data can be added to the vector database as indices. Again, the table of contents analogy helps here; below is a rough sketch of this step.
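
A minimal sketch of storing embeddings as an index, using FAISS as one example of a vector store; Chroma, pgvector and similar databases would work just as well.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Sacred texts of the region.", "Train timetables for the region."]

embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])   # the "table of contents"
index.add(embeddings)

# Retrieval later: embed the question and look up the closest chunk.
query_vec = model.encode(["Where do I catch the train?"]).astype("float32")
distances, ids = index.search(query_vec, k=1)
print(chunks[ids[0][0]])
```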

In my next blog, we will cover optimization techniques used in Phase 2, so stay tuned by clicking the subscribe button.
