Using a Large Language Model (LLM) with a RAG to advise on setting up AWS to deploy LLMs with RAGs
We have been developing some amazing AI-powered avatars at Scenegraph Studios for interview skills training, museum tour guides, university guides, and other cool things.
Powering our avatars is a Large Language Model (Llama 2) with RAG (Retrieval-Augmented Generation). This allows us to have the avatar talk about specific information without the conversation going off in crazy directions.
AWS is SCARY
AWS is a scary place to start looking around... click a button and it might turn on a server, and you're hit with a massive bill. This is me being dramatic, but the anxiety of this happening is real.
Our company, Scenegraph Studios, is on the NVIDIA Inception Program and has credits with AWS (thanks, NVIDIA) to support our deployment to the cloud.
We have been researching the best ways to set up on AWS, but with over 200 services, it is difficult to know which one to choose.
We asked the AWS help box the same question we asked our AI, and this is what it told us:
Question: "I am trying to deploy a Docker container that runs an API for a Llama 2 chatbot. What is the best service to accomplish this, where multiple users will be connecting and I need enough VRAM to let the model run fast?"
Response:
"Sorry, I can't answer that question. Can I help you understand more about AWS services? Below are a few examples"
We asked it a few other questions and it answered them fine, but we thought our question was a simple one.
RAG Setup with our LLM
We had the realisation that our own LLM setup could answer the question. We just needed data. After about 3 minutes of downloading PDFs, we popped them into the RAG and asked the question again - and job done.
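For anyone curious what "popping PDFs into the RAG" looks like in practice, here is a minimal sketch: extract text from the PDFs, embed the chunks, and pull the most relevant ones into the prompt at question time. The library choices (pypdf, sentence-transformers) and the file names are illustrative assumptions, not our exact production pipeline.

```python
# Minimal RAG ingestion sketch: PDFs -> text chunks -> embeddings -> retrieval.
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import numpy as np

def load_chunks(pdf_paths, chunk_size=800):
    """Read each PDF and split its text into fixed-size character chunks."""
    chunks = []
    for path in pdf_paths:
        text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
        chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")
# Hypothetical file names standing in for the PDFs we downloaded.
chunks = load_chunks(["aws_ecs_guide.pdf", "aws_containers_overview.pdf"])
embeddings = model.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=4):
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = model.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(embeddings @ q)[::-1][:k]
    return [chunks[i] for i in top]

context = "\n\n".join(retrieve(
    "Which AWS service should I use to deploy a Docker container "
    "running a Llama 2 chatbot API?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
# `prompt` is then sent to the Llama 2 model as usual.
```

The retrieval step is what keeps the avatar on topic: the model only sees the handful of chunks most relevant to the question, rather than being free to improvise.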
Results of our LLM setup
Asking the same question, we got this response:

Great! As an AWS chatbot, I'd be happy to help you find the best service to deploy your Docker container and run your Llama bot.

For running a containerized API that supports multiple users and requires enough VRAM for your machine learning model to run fast, I recommend using Amazon Elastic Container Service (ECS) combined with Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Instances (ECI).

Here's why:
- ECS: ECS is a highly scalable, high-performance container orchestration service that allows you to deploy, manage, and run containerized applications. It supports a wide range of container runtimes, including Docker. With ECS, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and monitor its performance.
- ECR: ECR is a fully managed container registry that allows you to store, manage, and deploy container images. You can use it to store your Llama bot container image and easily deploy it across multiple environments. ECR integrates seamlessly with ECS, making it easy to manage your container images and deploy them to your desired environment.
- ECI: ECI is a fully managed container instance type that provides high-performance computing capabilities. It's optimized for containerized applications and supports popular container runtimes like Docker. With ECI, you can run your Llama bot container with enough VRAM to ensure smooth model performance. You can also take advantage of ECI's high-performance computing capabilities to handle a large number of concurrent connections from multiple users.

By combining these services, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and ensure it runs fast and efficiently.

Let me know if you have any questions or if there's anything else I can help you with!
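For reference, the ECS and ECR parts of that recommendation map onto a fairly short deployment script. Below is a hedged sketch using boto3, assuming the container image is built and pushed separately (via `docker push`) and a GPU instance (e.g. a g5.xlarge) is already registered to the ECS cluster; the repository, cluster, region, port, and memory values are placeholder assumptions, not our production setup.

```python
# Sketch of the ECS/ECR deployment path the chatbot describes.
import boto3

REGION = "eu-west-2"  # placeholder region
ecr = boto3.client("ecr", region_name=REGION)
ecs = boto3.client("ecs", region_name=REGION)

# 1. A registry for the image (push it with `docker login` + `docker push`).
repo = ecr.create_repository(repositoryName="llama-bot")["repository"]
image_uri = repo["repositoryUri"] + ":latest"

# 2. A task definition that reserves one GPU for the container.
ecs.register_task_definition(
    family="llama-bot",
    requiresCompatibilities=["EC2"],  # GPU tasks need EC2-backed ECS
    containerDefinitions=[{
        "name": "llama-bot",
        "image": image_uri,
        "memory": 8192,  # MiB reserved for the container
        "portMappings": [{"containerPort": 8000}],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    }],
)

# 3. A long-running service on a cluster that already has a GPU instance.
ecs.create_service(
    cluster="avatars",
    serviceName="llama-bot",
    taskDefinition="llama-bot",
    desiredCount=1,
    launchType="EC2",
)
```

The GPU entry under `resourceRequirements` is what actually places the container on an instance with VRAM; Fargate does not support GPUs, which is why the task targets EC2-backed ECS.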
Next steps for Scenegraph
We have been a bit quiet about the work we have been developing, but it is time to get our application out into the wild.
If you are interested in having your own AI-powered avatar with a database of information that you can query by talking to it, or by typing, reach out to us.