Comprehensive guide to building a custom Generative AI enterprise app
Picture generated by Adobe Firefly (not for commercial use)


I have fond memories of my college days.

A land before mobile phones, India of the 90s was a place where loose change could buy a landline phone call from roadside shops that also doled out an assortment of cigarettes, hot cups of chai, and samosas around the clock.

A single Rupee, equivalent to a hopeful wish in a fountain, was all it took to shout expletives across telephone lines - asking your compadres when they were coming to college or demanding their presence at the coffee house, sometimes guitar in tow.

My friends and I were regulars at a local haunt near our college, where we spent a lot of time just sitting around.

Ah, sweet memories.

Last year, when I went back to India, I met up with one of my old friends and decided to pay a visit to our college and the chai shop and re-live some of those memories.

The college was still there. So was the shop.

And they looked nothing like the picture I had in my memory from the 90s.

I felt a sudden pang of loss and a realization that memory, fickle as it is, is the marrow of our identity.

Our memories drive our actions.

Our memories make us who we are.

Our memories are what makes us - us.

This has been on my mind quite a lot lately as my interactions with GPT-4 and other LLMs have surged over the last few days.

AI without memory is just a smart database. But AI with custom memory? Now that is a tool that can magnify your capabilities by an order of magnitude.

Let me explain.

Picture generated by Midjourney - imagine the child as your LLM and the books as all your enterprise data stored as vectors and in other data formats


Let's say you wake up one morning to realize you have a prodigy of a kid who is a genius when it comes to reading books, comprehending them, and responding intelligently based on that knowledge (let's not worry about how the kid got there for now). But the kid has no memory or knowledge of who you or other family members are. As smart as the kid is, there is no chance the kid can answer any question about you or the family.

But what if you compiled all your family information into very well-curated books, asked the kid to read and internalize the information, and then asked the kid to answer questions about you, your family, or even your ancestors?

This is effectively how ChatGPT and other LLMs can be used to build enterprise generative AI apps on custom data.

And believe it or not, there is a way to do this without putting the privacy and security of your custom data at risk.

Let's see how.

Effectively, there are three steps to building a custom AI app, or in other words, making your own AI child. You first ask the mom out for dinner, then you….. sorry, got carried away.

  1. Get an LLM model and run it within your network.
  2. Curate your data and store them effectively in a database.
  3. Search and grab the right data/book and give it to the LLM as context and get back a response.

If you want to act on the information you get back from the LLM, you can do that too (making your apps Agentic), and I will cover some of that below.

Let's get started.

Step 1 - Running your LLMs within your network

As of writing this article, there are broadly two options. The diagram below represents how this can sit entirely within your network so that at no time is your data going out to the Internet.

High-level architecture of LLM models running in your network

Option 1 - Get OpenAI models through Microsoft Azure and run them within your network. Since this is not generally available to everyone at the time of writing, you have to sign up by filling out a form and wait to get access. Once you have access, you can set up your own resource and then create an OpenAI service.

In this model, you can either re-train the models against your data or get access to fine-tuned models. IMO, both of these are overkill. Keep in mind that the pricing is still by the number of tokens you use, so it is very similar to how users currently pay for the public cloud version of the OpenAI models.
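To make this concrete, here is a minimal sketch of calling an Azure-hosted OpenAI deployment with the openai Python package (the pre-1.0 API). The resource name, deployment name, and key are placeholders you would swap for your own values.

import openai

# Point the client at your Azure OpenAI resource instead of the public cloud
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-AZURE-OPENAI-KEY"

response = openai.ChatCompletion.create(
    engine="your-gpt-deployment",  # the deployment name you created, not the model name
    messages=[{"role": "user", "content": "Summarize our Q2 sales notes."}],
)
print(response["choices"][0]["message"]["content"])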

Option 2 - Get open-source LLMs from Hugging Face and run them on one of your servers inside your network. Note that AWS now also offers running Hugging Face models on its cloud through SageMaker.

Hugging Face is an open-source community where different people, teams, and organizations regularly post their own LLM models. You can download these directly through code and then run them on your virtual machines. Here are some high-level steps.

Install the Transformers Library: Hugging Face's transformers library provides thousands of pre-trained models to perform tasks on texts. You can install it using pip:

pip3 install transformers


Next, download the models:


from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the GPT-2 tokenizer and model weights from the Hugging Face hub
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")


Once the models have been downloaded, you can use them for various tasks, including generating text.


# Encode a prompt, sample up to 500 tokens, and decode the result
input_text = "Once upon a time"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=500, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0]))


Note that you can use the wildly popular and useful LangChain libraries to chain multiple LLM models so that the output from one becomes the input for the next model in the chain. This is also a good way to make your apps Agentic, aka make your AI apps perform actions based on responses from the Gen AI models.
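As an illustration, here is a minimal sketch of a two-step chain using LangChain's (0.0.x-era) LLMChain and SimpleSequentialChain, where the first model's output becomes the second model's input; it assumes the langchain and openai packages are installed and an OpenAI API key (or your private endpoint) is configured.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)

# First link: turn a product description into a company name
name_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["product"],
        template="Suggest one name for a company that makes {product}.",
    ),
)

# Second link: its input is the first link's output
tagline_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["company"],
        template="Write a one-line tagline for the company {company}.",
    ),
)

chain = SimpleSequentialChain(chains=[name_chain, tagline_chain])
print(chain.run("enterprise vector databases"))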

Here is a list of some companies that are offering their own commercial Gen AI models that you can run within your network:

  1. Cohere
  2. MosaicML
  3. Anthropic - not generally available yet, but a good one to keep an eye on
  4. AWS Bedrock and Titan
  5. Google PaLM


Step 2 - Organizing your data with Vectors

Once you have your models running within your network, you need to think about how to add memory to them. As we saw, this is key to making any Gen AI app truly custom to your requirements.

In general, there are two ways to make your LLM custom to you.

  1. Re-train your LLM - This article will not get into the details of how to do this, but I should add that it is a very expensive and time-consuming process as of the time of writing. You not only need access to your dataset, you also need expensive GPUs to use libraries like PyTorch. I have no doubt that in the near future we may be able to incrementally re-train LLMs, but until that happens, I am personally going to spare my time and money and do what the rest of the world is doing, which is the second option.
  2. Search and Prompt - Industry analysts have already started giving this fancy names like Retrieval Augmented Generation (RAG), but put simply, this is the strategy of being the librarian between you and your prodigy child. In slightly more technical terms, this strategy can also be referred to as Search and Prompt (SAP, anyone?). The application takes the user prompt, does a quick search within the company database (typically semantic only, but ideally a hybrid search), then sends the results along with the original prompt to the prodigy child (your LLM) to answer. A minimal sketch of this flow follows the diagram below.

Search and Prompt methodology to provide context to each query
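Here is a minimal sketch of that flow in Python, assuming the openai package (pre-1.0 API) and a hypothetical search_vectors() helper that queries your own database for the closest stored chunks:

import openai

def answer_with_context(question):
    # 1. Embed the user's question
    emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=question
    )["data"][0]["embedding"]

    # 2. Retrieve the most relevant chunks from your database
    #    (search_vectors is a hypothetical helper over your own store)
    context_chunks = search_vectors(emb, top_k=3)

    # 3. Send the retrieved context along with the original prompt to the LLM
    prompt = (
        "Answer using only the context below.\n\nContext:\n"
        + "\n".join(context_chunks)
        + "\n\nQuestion: " + question
    )
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion["choices"][0]["message"]["content"]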

I should add here that I have skipped the muddiness of adding a Vector-only database to the stack because, personally, my opinion is that you need hybrid search - exact keyword plus semantic search - to generate highly curated, well-matched data for your LLM. If you choose to use only semantic search, say for a prototype or your own personal projects, there are two broad categories of tools for vector embeddings and searches.

Vector-only databases - These are purpose-built only for vectors, like a chef who only knows how to make one dish. There are some downsides to this approach: no support for SQL, limited support for metadata, and the biggest one - they cannot join with other kinds of data within your organization. Some examples of vector-only databases:

  1. Milvus
  2. Weaviate
  3. Chroma
  4. Qdrant
  5. Pinecone


Vector libraries - These are Python libraries that help you create and store vector embeddings (typically in memory) and run semantic searches. They are mostly open source and have characteristics similar to vector-only databases. If you are interested in using libraries, I would suggest one of the options in the list below (a short FAISS sketch follows the list).

  1. FAISS
  2. Annoy
  3. NMSLIB
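For a taste of what these libraries look like, here is a short FAISS sketch that builds an in-memory index and runs a nearest-neighbor search; it assumes the faiss-cpu and numpy packages, and the random vectors stand in for your real embeddings.

import faiss
import numpy as np

dim = 384                                   # embedding dimensionality
vectors = np.random.rand(1000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)              # exact L2 (brute-force) search
index.add(vectors)                          # store the embeddings in memory

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, k=5)   # the 5 closest vectors
print(ids[0])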


Enterprise-grade databases - This would be my recommendation for anyone starting new, because if we are talking about enterprise-grade, it is important to take the following into account, the most important being the first one:

  1. Is the database able to handle Vectors, SQL, and NoSQL data and do hybrid searches?
  2. Can you run the DB in the cloud, on-prem, or in a hybrid fashion?
  3. Does it have Disaster Recovery (DR), and can it scale horizontally?
  4. Does the DB have connectors/pipelines to bring data into it from diverse sources?
  5. Does the DB have millisecond response times?

Based on all of these requirements, I personally would recommend SingleStore, but I am biased because I currently work at SingleStore. There are numerous options out there, and I suggest doing your own due diligence to pick the one that works for you. A short sketch of what a hybrid query can look like follows below.
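Here is a hedged sketch of a hybrid (keyword plus vector) query against SingleStore from Python. It assumes the singlestoredb package, a docs table with a full-text index on content and embeddings stored via JSON_ARRAY_PACK, and a hypothetical embed() helper; the connection string is a placeholder.

import json
import singlestoredb as s2

conn = s2.connect("user:password@host:3306/demo_db")

# embed() is a hypothetical helper returning the query embedding as a list
query_vec = json.dumps(embed("quarterly revenue"))

with conn.cursor() as cur:
    # Combine an exact keyword filter with a semantic (vector) ranking
    cur.execute(
        """
        SELECT content,
               DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score
        FROM docs
        WHERE MATCH(content) AGAINST ('quarterly revenue')
        ORDER BY score DESC
        LIMIT 5
        """,
        (query_vec,),
    )
    for content, score in cur.fetchall():
        print(round(score, 3), content[:80])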


Step 3 - Putting it all together with Search and Prompt

After you have successfully installed your own LLM and curated your relevant data into vectors and/or other data formats, you are now ready to start building the app. The diagram below describes how to do this using different libraries. I have personally become a fan of OpenAI as a product and use it to generate embeddings for all my projects. You can choose to use Transformers from Hugging Face as well.

Let's look at a simple example of taking a PDF doc, converting the text into embeddings, and storing them in a database.

The complete process of creating and searching embeddings
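Here is a minimal sketch of that pipeline, assuming the pypdf and openai packages (pre-1.0 API); the save_chunk() helper is hypothetical and stands in for the insert into your own database.

import openai
from pypdf import PdfReader

def embed(text):
    # Turn a chunk of text into a vector with OpenAI's embedding model
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

reader = PdfReader("book.pdf")
for page in reader.pages:
    text = page.extract_text()
    if text:
        # Store the raw text and its embedding side by side so the app
        # can run semantic searches over the book later
        save_chunk(text, embed(text))  # hypothetical DB helper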


I created two apps using this methodology: one that reads from Wikipedia and then passes this information to OpenAI to get back a response, and a second that ingests a PDF book and uses OpenAI to answer questions about the book's content.

Note that since this is publicly available, I am using the cloud version of OpenAI in these examples, but you can take the code and change the endpoints to talk to the LLMs in your private network.
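For example, with the pre-1.0 openai package you can repoint the client at any server in your network that exposes an OpenAI-compatible API; the host and model name below are placeholders.

import openai

# Talk to a private, OpenAI-compatible endpoint instead of the public cloud
openai.api_base = "http://llm.internal.example.com:8000/v1"
openai.api_key = "placeholder"  # many private servers ignore the key

response = openai.ChatCompletion.create(
    model="your-private-model",  # whatever model your server exposes
    messages=[{"role": "user", "content": "Hello from inside the network."}],
)
print(response["choices"][0]["message"]["content"])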

You can check these out on GitHub here:

  1. Wikipedia as a memory generator → https://github.com/singlestore-labs/singlestoredb-samples/blob/main/Tutorials/OpenAI_wikipedia_semantic_search/OpenAI_wikipedia_semantic_search.ipynb
  2. PDF/Book reader →?https://github.com/madhukarkumar/pdf-reader-with-openai/blob/main/pdf-reader-notebook-example.ipynb

If you have stuck with me this far, I am going to assume this is relevant to you and that you are interested in learning more. My goal in writing these long articles is to help others cut short the time it took me to learn and run these experiments. So, if this article is helpful, drop me a direct message or follow me on LinkedIn and Twitter, and feel free to engage, suggest, or ask.
