Comprehensive guide to building a custom Generative AI enterprise app
I have fond memories of my college days.
A land before mobile phones, India of the 90s was a place where loose change could buy a landline phone call from roadside shops that also doled out an assortment of cigarettes, hot cups of chai, and samosas around the clock.
A single Rupee, equivalent to a hopeful wish in a fountain, was all it took to shout expletives across telephone lines - asking your compadres when they were coming to college or demanding their presence at the coffee house, sometimes guitar in tow.
My friends and I were regulars at a local haunt near our college, where we spent hours just sitting around.
Ah, sweet memories.
Last year, when I went back to India, I met up with one of my old friends and decided to pay a visit to our college and the chai shop and re-live some of those memories.
The college was still there. So was the shop.
And they looked nothing like the picture I had in my memory from the 90s.
I felt a sudden pang of loss and a realization that memory, fickle as it is, is the marrow of our identity.
Our memories drive our actions.
Our memories make us who we are.
Our memories are what makes us - us.
This has been on my mind quite a lot lately as my interactions with GPT-4 and other LLMs have surged in the last few days.
AI without memory is just a smart database. But AI with custom memory - now that is a tool that can magnify your capabilities by an order of magnitude.
Let me explain.
Let's say you wake up one morning to realize you have a prodigy of a kid who is a genius when it comes to reading books, comprehending them, and responding intelligently based on that knowledge (let's not worry about how the kid got there for now). But the kid has no memory or knowledge of who you or the other family members are. As smart as the kid is, there is no chance the kid can answer any question about you or your family.
But what if you compiled all your family information into very well-curated books, asked the kid to read and internalize the information, and now asked it to answer questions related to you, your family, or even your ancestors?
This is effectively how ChatGPT and other LLMs can be used to build enterprise generative AI apps on custom data.
And believe it or not, there is a way to do this without exposing your custom data to privacy and security risks.
Let's see how.
Effectively, there are three steps to building a custom AI app or, in other words, making your own AI child. You first ask the mom out for dinner, then you... sorry, got carried away.
If you want to act on the information you get back from the LLM, you can now do that as well (making your apps Agentic), and I will cover some of that below.
Let's get started.
Step 1 - Running your LLMs within your network
As of writing this article, there are broadly two options. The diagram below shows how this can sit entirely within your network so that at no point does your data go out to the Internet.
Option 1 - Get OpenAI models through Microsoft Azure and run them within your network. Since this is not generally available to everyone at the time of writing, you have to sign up by filling out a form and wait to get access. Once you get access, you can set up your own Resource and then create an OpenAI service.
In this model, you can either re-train the models on your data or get access to fine-tuned models. IMO, both of these are overkill. Keep in mind that pricing is still by the number of tokens you use, so it is very similar to how users currently pay for the public cloud version of the OpenAI models.
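If you just want to call your Azure-hosted deployment from code, the shape of it looks roughly like this - a minimal sketch using the 2023-era openai Python SDK, where the resource URL, deployment name, and environment variable are placeholders for your own values:

# pip3 install openai
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder
openai.api_version = "2023-05-15"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]  # placeholder variable name

response = openai.ChatCompletion.create(
    engine="my-gpt4-deployment",  # the deployment name you created in Azure
    messages=[{"role": "user", "content": "Say hello from inside my network."}],
)
print(response["choices"][0]["message"]["content"])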
Option 2 - Download open-source LLMs from Hugging Face and run them on one of the servers inside your network. Note that AWS now also offers running Hugging Face models on their cloud through SageMaker.
Hugging Face is an open-source community where different people, teams, and organizations regularly post their own LLM models. You can download these directly through code and then run them on your virtual machines. Here are some high-level steps.
Install the Transformers Library: Hugging Face's transformers library provides thousands of pre-trained models to perform tasks on texts. You can install it using pip:
pip3 install transformers
Next, download the models:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the GPT-2 weights and tokenizer from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
Once the models have been downloaded, you can use them for various tasks, including generating text.
input_text = "Once upon a time"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=500, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0]))
Note that you can use the wildly popular and useful LangChain libraries to chain multiple LLM calls so that the output from one becomes the input for the next model in the chain. This is also a good way to make your apps Agentic, aka have your AI apps perform actions based on responses from the Gen AI models.
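Here is a minimal sketch of that chaining idea using the classic (pre-1.0) LangChain API, assuming an OPENAI_API_KEY in the environment; the prompts and chain names are illustrative, not from any particular project:

# pip3 install langchain openai
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)  # reads OPENAI_API_KEY from the environment

# Chain 1: summarize the input text
summarize = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["text"],
    template="Summarize the following text in three sentences:\n{text}",
))

# Chain 2: the summary from chain 1 automatically becomes the input here
propose_actions = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["summary"],
    template="Given this summary, list three follow-up actions:\n{summary}",
))

pipeline = SimpleSequentialChain(chains=[summarize, propose_actions])
print(pipeline.run("...paste your document text here..."))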
Here is a list of some companies that are offering their own commercial Gen AI models that you can run within your network:
1. MosaicML
2. Anthropic - Not generally available yet, but a good one to keep an eye on.
3. AWS Bedrock and Titan
4. Google PaLM
Step 2 - Organizing your data with Vectors
Once you have your models running within your network, you now need to think about how to add memory to them. As we saw, this is key to making any Gen AI app truly custom to your requirements.
In general, there are two ways to make your LLM custom to you: re-train or fine-tune the model on your data, or retrieve your curated data at query time and pass it in as part of the prompt (search and prompt). As mentioned above, re-training is usually overkill, so the rest of this article focuses on the latter.
I should add here that I have skipped the muddiness of adding a Vector-only database to the stack because, personally, my opinion is that you need hybrid search - exact keyword plus semantic search - to really generate highly curated, well-matched data for your LLM. If you choose to use only semantic search, say for a prototype or a personal project, there are two broad categories of tools for vector embeddings and searches.
Vector-only databases - These are purpose-built only for vectors, like a chef who only knows how to make one dish. There are some downsides to this approach: no support for SQL, limited support for metadata, and the biggest one - they cannot join against other kinds of data within your organization. Some examples of vector-only databases (see the sketch after this list):
1. Weaviate
2. Chroma
3. Qdrant
4. Pinecone
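As a taste of what working with one of these looks like, here is a minimal sketch using Chroma's in-memory client; the collection name and documents are made up, and Chroma applies its default embedding model unless you supply your own embedding function:

# pip3 install chromadb
import chromadb

client = chromadb.Client()  # in-memory instance, good for prototyping
collection = client.create_collection(name="family_books")

# Chroma embeds these documents with its default embedding model
collection.add(
    documents=["Our family moved to Delhi in 1992.",
               "Grandfather ran a chai shop near the college."],
    ids=["doc1", "doc2"],
)

# Semantic search: the query is embedded and matched against stored vectors
results = collection.query(query_texts=["Who ran the tea stall?"], n_results=1)
print(results["documents"])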
Vector libraries - These are Python libraries that help you create and store embeddings (typically in memory) and run semantic searches. They are mostly open source and have characteristics similar to vector-only databases. If you are interested in using libraries, I would suggest one of the following (a short example follows the list).
1. ANNOY
2. NMSLIB
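For the library route, here is a minimal sketch with ANNOY; the embed() function below is a random-vector stand-in that keeps the example self-contained - in a real app you would call your embedding model instead:

# pip3 install annoy
import random
from annoy import AnnoyIndex

DIM = 384  # dimensionality of your embedding model's vectors

def embed(text):
    # Stand-in for a real embedding call (OpenAI, sentence-transformers, etc.)
    rng = random.Random(text)
    return [rng.uniform(-1, 1) for _ in range(DIM)]

documents = ["chai shop menu", "college admission policy", "campus map"]

index = AnnoyIndex(DIM, "angular")  # angular distance approximates cosine
for i, doc in enumerate(documents):
    index.add_item(i, embed(doc))
index.build(10)  # 10 trees; more trees = better recall, slower build

# Return the two nearest documents to the query vector
for i in index.get_nns_by_vector(embed("where can I buy tea?"), 2):
    print(documents[i])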
Enterprise-grade databases - This would be my recommendation for anyone starting new, because if we are talking about enterprise-grade, it is important to take into account things like hybrid search (exact keyword plus semantic - the most important of the lot), SQL and metadata support, and the ability to join your vectors with the rest of your organization's data.
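To make the hybrid-search point concrete, here is a toy illustration of blending an exact-keyword score with a semantic (vector) score; an enterprise-grade database does this with inverted indexes and ANN indexes rather than brute force, but the scoring idea is the same:

import math

def keyword_score(query, doc):
    # Fraction of query words that appear exactly in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # alpha balances exact keyword matching against semantic similarity
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)

# Tiny demo with hand-made 2-d vectors standing in for real embeddings
q_vec, d_vec = [1.0, 0.0], [0.8, 0.6]
print(hybrid_score("masala chai", "masala chai recipe", q_vec, d_vec))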
Step 3 - Putting it all together with Search and Prompt
After you have successfully installed your own LLM and curated your relevant data into vectors and/or other data formats, you are now ready to start building the app. The diagram below describes how to do this using different libraries. I have personally become a fan of OpenAI as a product and use it to generate embeddings for all my projects. You can choose to use Transformers from Hugging Face as well.
Let's look at a simple example of taking a PDF doc, converting the text into embeddings, and storing them in a database.
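Here is a minimal sketch of that pipeline using pypdf and the 2023-era OpenAI Embedding API; the file name is a placeholder, and the in-memory list stands in for whatever database you choose:

# pip3 install pypdf openai  (assumes OPENAI_API_KEY in the environment)
import openai
from pypdf import PdfReader

EMBED_MODEL = "text-embedding-ada-002"

def embed(text):
    response = openai.Embedding.create(input=text, model=EMBED_MODEL)
    return response["data"][0]["embedding"]

reader = PdfReader("book.pdf")  # placeholder file name
store = []  # stand-in for your database of choice

for page in reader.pages:
    chunk = page.extract_text()
    if chunk and chunk.strip():
        # One page = one chunk here; real apps usually split into smaller chunks
        store.append({"text": chunk, "embedding": embed(chunk)})

print(f"Stored {len(store)} page-level chunks with embeddings")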
I created two apps using this methodology: one to read from Wikipedia and pass that information to OpenAI to get back a response, and another to ingest a PDF book and use OpenAI to answer questions about the book's content.
Note that since this is publicly available, I am using the cloud version of OpenAI in these examples, but you can use the code and change the endpoints to talk to your LLMs in your private network.
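And here is a minimal sketch of the "search and prompt" step over the store built in the previous sketch: embed the question, find the closest chunk by cosine similarity, and pass it to the model as context. It again uses the 2023-era openai SDK; point openai.api_base at your in-network LLM to keep data private.

import math
import openai

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# embed() and store come from the PDF sketch above
question = "Who is the main character of the book?"
q_vec = embed(question)
best = max(store, key=lambda row: cosine(row["embedding"], q_vec))

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{best['text']}\n\nQuestion: {question}"},
    ],
)
print(answer["choices"][0]["message"]["content"])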
You can check these out on GitHub here:
If you have stuck with me this far, I am going to assume this is relevant to you and that you are interested in learning more. My goal in writing these long articles is to help others cut down the time it took me to learn and run these experiments. So, if this article was helpful, drop me a direct message or follow me on LinkedIn and Twitter, and feel free to engage, suggest, or ask.