Get past the AI analysis phase.

The first phase of adopting any new technology is daunting, and it's easy to get stuck in analysis paralysis. AI adoption is no exception. It is easy to get lost sifting through use-cases and to be paralyzed by cost, security, and performance considerations, with an added fear of the unknown in a rapidly evolving space. The desire to perform an analysis of alternatives (AoA) to address these concerns is understandable, but in this space an AoA may not be a fruitful endeavor: the best tools for one use-case might not be the best for another, and chances are that newer and better offerings will arrive before you can finish assessing the current ones.

You are not alone, so get started. If you have been following sound SDLC practices, you are already positioned to incorporate AI.

When starting AI adoption, it's easy to focus on LLM selection. Though that's an important component, the other layers can have an equal (if not greater) impact on the final product. Understanding common AI patterns will help you see the layers of the stack. Let's take one of the most common AI frameworks, Retrieval-Augmented Generation (RAG), as an example.

What is the RAG model?

RAG (Retrieval-Augmented Generation) is an AI framework that combines the strengths of traditional information retrieval systems (such as databases) with the capabilities of generative large language models (LLMs). By combining this extra knowledge with its own language skills, the AI can write text that is more accurate, up-to-date, and relevant to your specific needs. (GCP definition)

The layers: simplified

  1. User Interface: mechanism to deliver results from LLM generation to the user. Often a chatbot, but it could be APIs or an existing platform.
  2. Embedding Model: mechanism to break large, potentially unstructured data, such as articles and publications, into chunks and convert them into numerical vectors so that similar text can be retrieved by context.
  3. Large Language Model: AI model pre-trained on large datasets with algorithms that can recognize, summarize, translate, predict, and generate content.
  4. Vector Database: a place to store data as mathematical representations, enabling fast similarity retrieval and helping LLM applications recall previous inputs to build upon past responses.
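
To make the flow concrete, here is a minimal sketch of how these four layers cooperate to answer a single question. Everything in it is hypothetical: embedText, queryVectorDb, and callLlm are placeholder stubs for whatever embedding API, vector database client, and LLM API you end up choosing.

```typescript
// Hypothetical stand-ins for real services; replace each with your
// chosen embedding API, vector database client, and LLM API.
async function embedText(text: string): Promise<number[]> {
  throw new Error("call your embedding model API here");
}
async function queryVectorDb(
  vector: number[],
  opts: { topK: number },
): Promise<string[]> {
  throw new Error("query your vector database here");
}
async function callLlm(prompt: string): Promise<string> {
  throw new Error("call your LLM API here");
}

// One RAG request, end to end.
async function answerQuestion(question: string): Promise<string> {
  // Embedding model: turn the question into a vector.
  const queryVector = await embedText(question);

  // Vector database: fetch the stored chunks nearest to that vector.
  const contextChunks = await queryVectorDb(queryVector, { topK: 5 });

  // LLM: generate an answer grounded in the retrieved context.
  const prompt =
    "Answer using only the context below.\n\nContext:\n" +
    contextChunks.join("\n---\n") +
    "\n\nQuestion: " + question;
  const answer = await callLlm(prompt);

  // User interface: the caller (chatbot, API, platform) renders this.
  return answer;
}
```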

Contextualize AI layers.

Whenever new technology, concepts, or patterns emerge in tech, I like to "link what I don't know with what I do know." Though the inner workings of GenAI may be complicated, you are implementing and consuming an established framework and design pattern.

User Interface

User interfaces aren't new, and I won't spend much time here. The only aspect that may be new to some frontend developers is how data is often delivered as a stream of partial chunks rather than one response containing all the expected data. If you have used any chatbot recently, you might have seen responses appear one word at a time. Reactive UI frameworks such as ReactJS handle this incremental-update pattern well, as the sketch below shows.
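
Here is a minimal sketch of consuming a streamed response in the browser using the standard fetch and ReadableStream APIs. The /api/chat endpoint is a hypothetical placeholder; the streaming pattern itself is standard.

```typescript
// Read a streamed LLM response chunk by chunk and hand each chunk to
// the UI as it arrives, producing the word-by-word typing effect.
async function streamCompletion(
  prompt: string,
  onToken: (token: string) => void,
): Promise<void> {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!response.body) throw new Error("no response body to stream");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // In a React app, onToken would typically append to state,
    // triggering an incremental re-render.
    onToken(decoder.decode(value, { stream: true }));
  }
}
```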

Embedding Model

The way I like to link the concept of an embedding model with what I do know is by thinking of it as an ORM or ODM, such as Hibernate or Mongoose, on steroids. Though this is a simplified conceptualization and embedding models do far more than an ODM does, it helps put them in perspective within the overall architecture. Your embedding layer is responsible for taking input data, breaking it into chunks, and converting those chunks into vectors to store in your vector database. One key architectural difference is that embedding models are typically accessed via an API rather than as a library inside your application. This means there are additional design considerations around performance, network, security, and cost. For example, if you are creating an internal AI chatbot and you don't want the retrieval data to leave your network boundary, you might consider self-hosting the model. Self-hosting vs. consuming external models is a large topic of its own that deserves a separate discussion.
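
As a rough illustration of that pattern, the sketch below chunks a document and posts the chunks to an embedding endpoint over HTTP. The URL and response shape are hypothetical placeholders; real embedding APIs (hosted or self-hosted) differ in their details.

```typescript
// Split a document into fixed-size chunks. Real pipelines often chunk
// by sentence or paragraph boundaries with overlap; this is simplified.
function chunkText(text: string, chunkSize = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Send the chunks to an embedding service over the network.
// The endpoint and response shape here are assumptions.
async function embedDocument(doc: string): Promise<number[][]> {
  const res = await fetch("https://embeddings.internal.example/v1/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ inputs: chunkText(doc) }),
  });
  if (!res.ok) throw new Error(`embedding call failed: ${res.status}`);
  // Assumed response shape: { embeddings: number[][] }
  const { embeddings } = await res.json();
  return embeddings;
}
```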

Large Language Model (LLM)

The LLM is at the core of your AI application. Like embedding models, LLMs are consumed as APIs rather than built into your deployable; think of the LLM as a synchronous service in a microservice architecture. You send it REST API requests and receive REST API responses, with similar considerations around performance, security, hosting, and cost. There are other key decision points here as well: the size of the model and of its training data can affect the responses you receive, and choosing a model depends on the context and objectives of the AI application. In a RAG application, it's often both performant and cost-effective to use smaller models, because you are bounding the model's responses within the confines of your own dataset.
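
In code, the "synchronous service" framing looks like an ordinary REST call. The endpoint, payload, and response shape below are hypothetical placeholders for whichever LLM provider or self-hosted gateway you use.

```typescript
// One request in, one response out: the LLM consumed as a
// synchronous service, just like any other microservice dependency.
interface LlmRequest {
  prompt: string;
  maxTokens: number;
}

async function generate(req: LlmRequest): Promise<string> {
  const res = await fetch(
    "https://llm-gateway.internal.example/v1/generate",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    },
  );
  if (!res.ok) throw new Error(`LLM call failed: ${res.status}`);
  // Assumed response shape: { text: string }
  const { text } = await res.json();
  return text;
}
```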

Vector Database

Everyone understands the concept of a database, and naturally it's the easiest layer to link with what you already know. We have seen databases evolve, and that evolution helps contextualize how vector databases differ from relational databases, key-value stores, object stores, and document databases. At its core, a database stores data, so why do we have so many flavors? It often comes down to performance, cost, data types, access patterns, flexibility, modeling, and retrieval speed. So what makes a vector DB different? Traditional databases store and retrieve records by column values or key-value combinations; the objective of a vector database is different. While a traditional database returns exact matches for a query, a vector database uses algorithms that return data based on similarity metrics, that is, mathematical proximity. The familiar considerations of latency, security controls, index creation, query speed, and cost still apply.
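
To make "mathematical proximity" concrete, here is a brute-force nearest-neighbor search using cosine similarity. A production vector database replaces this linear scan with approximate indexes (such as HNSW) so it scales, but the underlying idea is the same.

```typescript
// Cosine similarity: 1.0 means the vectors point the same way,
// 0.0 means they are unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredChunk {
  text: string;
  vector: number[];
}

// Return the k stored chunks most similar to the query vector.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return [...store]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector),
    )
    .slice(0, k);
}
```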

Choosing a Stack

This is where I wish I could be prescriptive, but each layer is use-case-driven. As you guide your organization, aim to enable teams and stay flexible, while knowing your organizational constraints. Make each layer an offering. It's tempting to define a template for the final product; instead, I recommend creating templates for the individual layers. Just as we harden and standardize container or server images, take a similar approach to LLMs and embedding models. Leverage infrastructure as code to define and provision the components of each layer. For self-hosting, create scripts that data scientists and development teams can use to spin up model instances for their use-cases. For public LLMs, create gateway or proxy templates through which teams reach those services, and keep API keys secured behind that gateway, as the sketch below shows.
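
As a rough sketch of that gateway pattern, the example below shows a minimal Express proxy: clients inside your boundary call the proxy, and only the proxy holds the provider's API key. The upstream URL and payload shape are hypothetical placeholders.

```typescript
// Minimal LLM gateway: a single controlled path to a public provider.
import express from "express";

const app = express();
app.use(express.json());

const PROVIDER_URL = "https://api.llm-provider.example/v1/generate";
const API_KEY = process.env.LLM_API_KEY ?? ""; // never shipped to clients

app.post("/v1/generate", async (req, res) => {
  // Central place to add auth checks, rate limiting, logging, and
  // cost attribution before the request leaves your boundary.
  const upstream = await fetch(PROVIDER_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify(req.body),
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(8080);
```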

Vector databases may be the exception. Organizationally, it's difficult to manage multiple database products, so it's more maintainable to pick one and make it an enterprise offering. Choose a purpose-built, production-ready vector database rather than bolting vector search onto an incumbent DB. Purpose-built vector databases are optimized for ML use-cases and come with fewer limitations. Though there are exceptions to this generalization, benchmark testing has shown it to be consistent.


Elias Seyoum - Senior Solutions Architect - [email protected]


