Using a Large Language Model (LLM) with RAG to advise on setting up AWS to deploy LLMs with RAGs

We have been developing some amazing AI-powered avatars at Scenegraph Studios for interview skills training, museum tour guides, university guides, and other cool things.

Powering our avatars is a Large Language Model (Llama 2) with RAG (Retrieval-Augmented Generation). This allows the avatar to talk about specific information without the conversation veering off in crazy directions.
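Conceptually, the retrieval step works like this: embed the user's question, rank the knowledge-base chunks by similarity, and prepend the top matches to the prompt. Here is a minimal toy sketch, with bag-of-words similarity standing in for a real embedding model and vector store; the function names and sample documents are illustrative, not our production code:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    """Return the k document chunks most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, documents):
    """Ground the LLM's answer in the retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "ECS is a container orchestration service.",
    "The museum opens at 9am on weekdays.",
    "Llama 2 is an open-weight large language model.",
]
print(build_prompt("Which AWS service runs containers?", docs))
```

A real setup would swap `embed` for a sentence-embedding model and keep the vectors in a vector database, but the retrieve-then-prompt shape stays the same.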

AWS is SCARY

AWS is a scary place to start looking around... click a button and it might turn on a server, and you're hit with a massive bill. This is me being dramatic, but the anxiety of this happening is real.

Our company Scenegraph Studios is on the NVIDIA Inception Program and has credits with AWS (thanks, NVIDIA) to support our deployment to the cloud.

We have been researching the best ways to set up on AWS, but with over 200 services, it is difficult to know which one to choose.

We asked the AWS help box the same question we asked our AI, and this is what it told us:

Question: "I am trying to deploy a Docker container that runs an API for a Llama 2 chatbot. What is the best service to accomplish this, where multiple users will be connecting and I need enough VRAM to let the model run fast?"

Response:

"Sorry, I can't answer that question. Can I help you understand more about AWS services? Below are a few examples"

We asked it a few other questions and it answered them fine, but we had thought ours was a simple one.

RAG Setup with our LLM

We had the realisation that our own LLM setup could answer the question. We just needed data. After about 3 minutes of downloading PDFs, we popped them into the RAG and asked the question again. Job done.
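For context, the ingestion side of a RAG is mostly splitting the extracted document text into overlapping chunks before embedding and indexing them; the overlap keeps sentences that straddle a boundary retrievable. A toy sketch, assuming the PDF text has already been extracted (e.g. with a library such as pypdf, not shown); the sizes are illustrative defaults, not our production settings:

```python
def chunk(text, size=400, overlap=80):
    """Split extracted PDF text into overlapping fixed-size chunks.

    Each chunk is at most `size` characters, and consecutive chunks
    share `overlap` characters so boundary sentences are not lost.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pages = "Amazon ECS is a container orchestration service. " * 40
for piece in chunk(pages):
    print(len(piece))  # each chunk is then embedded and added to the index
```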

Results of our LLM setup

Asking the same question we got this response:

"Great! As an AWS chatbot, I'd be happy to help you find the best service to deploy your Docker container and run your Llama bot.

For running a containerized API that supports multiple users and requires enough VRAM for your machine learning model to run fast, I recommend using Amazon Elastic Container Service (ECS) combined with Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Instances (ECI).

Here's why:

1. ECS: ECS is a highly scalable, high-performance container orchestration service that allows you to deploy, manage, and run containerized applications. It supports a wide range of container runtimes, including Docker. With ECS, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and monitor its performance.

2. ECR: ECR is a fully managed container registry that allows you to store, manage, and deploy container images. You can use it to store your Llama bot container image and easily deploy it across multiple environments. ECR integrates seamlessly with ECS, making it easy to manage your container images and deploy them to your desired environment.

3. ECI: ECI is a fully managed container instance type that provides high-performance computing capabilities. It's optimized for containerized applications and supports popular container runtimes like Docker. With ECI, you can run your Llama bot container with enough VRAM to ensure smooth model performance. You can also take advantage of ECI's high-performance computing capabilities to handle a large number of concurrent connections from multiple users.

By combining these services, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and ensure it runs fast and efficiently.

Let me know if you have any questions or if there's anything else I can help you with!"
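To make the ECS side of that suggestion concrete, here is a hedged sketch of what a GPU-requesting ECS task definition could look like once the API image is pushed to ECR. The family name, container port, memory figure, and registry URI are all placeholders, and note that GPU tasks need EC2-backed ECS capacity (Fargate does not expose GPUs):

```python
import json

def llama_task_definition(image_uri, gpus=1):
    """Build an ECS task definition dict that reserves GPU capacity
    for a containerized Llama 2 API. All names are placeholders."""
    return {
        "family": "llama2-api",
        "requiresCompatibilities": ["EC2"],  # GPU tasks run on EC2-backed ECS
        "containerDefinitions": [
            {
                "name": "llama2-api",
                "image": image_uri,
                "memory": 16384,  # MiB reserved for the container (placeholder)
                "portMappings": [{"containerPort": 8000, "protocol": "tcp"}],
                "resourceRequirements": [
                    {"type": "GPU", "value": str(gpus)},  # ECS GPU reservation
                ],
            }
        ],
    }

# Placeholder account ID, region, and tag:
task_def = llama_task_definition(
    "123456789012.dkr.ecr.eu-west-2.amazonaws.com/llama2-api:latest"
)
print(json.dumps(task_def, indent=2))
```

In practice you would register this with `boto3.client("ecs").register_task_definition(**task_def)` and run it as an ECS service behind a load balancer to fan out the multiple user connections.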

Next steps for Scenegraph

We have been a bit quiet about the work we have been developing, but it is time to get our application out into the wild.

If you are interested in having your own AI-powered avatar with a database of information that you can query by talking to it, or by typing, reach out to us.



Sudarashan Sahu {Python Developer ~ AI Expert}

Sr. Python Developer, AI Expert (custom coded). Specialised in multi-AI agent systems, Vertex AI Agent integrations, Agentforce, LLM, NLP, GenAI, RAG, Chatbot, Data scraping, ETL, Chatbot Integration, etc.

10 months

still open?

I really want to come down and check this out. Was chatting with someone this weekend about application of this kind of tech in Holocaust Museums. While I can’t say I was impressed with how it was proposed to be implemented, the whole idea was fascinating.

Godwin Josh

Co-Founder of Altrosyn and DIrector at CDTECH | Inventor | Manufacturer

1 year

It's intriguing how you're exploring the integration of local LLMs with RAGs to enhance your AI avatars, especially considering the complexity of AWS options. This approach aligns with recent trends where companies are leveraging on-premises resources alongside cloud services for tailored solutions. In your comparison between the AWS Chatbot and your local LLM, how do you ensure the scalability and reliability of your solution, particularly when handling varying workloads and user interactions?

More articles by Dr. David Tully
