Using a Large Language Model (LLM) with a RAG to advise on setting up AWS to deploy LLMs with RAGs
We have been developing some amazing AI-powered avatars at Scenegraph Studios for interview skills training, museum tour guides, university guides, and other cool things.
Powering our avatars is a Large Language Model (Llama 2) with RAG (Retrieval-Augmented Generation). This allows us to have the avatar talk about specific information without the conversation going off in crazy directions.
AWS is SCARY
AWS is a scary place to start looking around... click a button and it might turn on a server, and you're hit with a massive bill. This is me being dramatic, but the anxiety of this happening is real.
Our company, Scenegraph Studios, is on the NVIDIA Inception Program and has credits with AWS (thanks, NVIDIA) to support our deployment to the cloud.
We have been researching the best ways to set up on AWS, but with over 200 services, it is difficult to know which one to choose.
We asked the AWS help box the same question we asked our AI, and this is what it told us:
Question: "I am trying to deploy a Docker container that runs an API for a Llama 2 chatbot. What is the best service to accomplish this, where multiple users will be connecting and I need enough VRAM to let the model run fast?"
Response:
"Sorry, I can't answer that question. Can I help you understand more about AWS services? Below are a few examples"
We asked it a few other questions and it answered them fine, but we thought our question was a simple one.
RAG Setup with our LLM
We had the realisation that our own LLM setup could answer the question. We just needed data. After about 3 minutes of downloading PDFs, we popped them into the RAG and asked the question again - and job done.
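For anyone curious what "popping PDFs into the RAG" looks like in practice, here is a minimal sketch: extract text from the PDFs, embed the chunks, and pull the most relevant ones into the prompt at question time. The library choices (pypdf, sentence-transformers) and the file names are illustrative assumptions, not our exact production pipeline.

```python
# Minimal RAG ingestion sketch: PDFs -> text chunks -> embeddings -> retrieval.
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import numpy as np

def load_chunks(pdf_paths, chunk_size=800):
    """Read each PDF and split its text into fixed-size character chunks."""
    chunks = []
    for path in pdf_paths:
        text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
        chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")
# Hypothetical file names standing in for the PDFs we downloaded.
chunks = load_chunks(["aws_ecs_guide.pdf", "aws_containers_overview.pdf"])
embeddings = model.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=4):
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = model.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(embeddings @ q)[::-1][:k]
    return [chunks[i] for i in top]

context = "\n\n".join(retrieve(
    "Which AWS service should I use to deploy a Docker container "
    "running a Llama 2 chatbot API?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
# `prompt` is then sent to the Llama 2 model as usual.
```

The retrieval step is what keeps the avatar on topic: the model only sees the handful of chunks most relevant to the question, rather than being free to improvise.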
Results of our LLM setup
Asking the same question, we got this response:

Great! As an AWS chatbot, I'd be happy to help you find the best service to deploy your Docker container and run your Llama bot.

For running a containerized API that supports multiple users and requires enough VRAM for your machine learning model to run fast, I recommend using Amazon Elastic Container Service (ECS) combined with Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Instances (ECI).

Here's why:
- ECS: ECS is a highly scalable, high-performance container orchestration service that allows you to deploy, manage, and run containerized applications. It supports a wide range of container runtimes, including Docker. With ECS, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and monitor its performance.
- ECR: ECR is a fully managed container registry that allows you to store, manage, and deploy container images. You can use it to store your Llama bot container image and easily deploy it across multiple environments. ECR integrates seamlessly with ECS, making it easy to manage your container images and deploy them to your desired environment.
- ECI: ECI is a fully managed container instance type that provides high-performance computing capabilities. It's optimized for containerized applications and supports popular container runtimes like Docker. With ECI, you can run your Llama bot container with enough VRAM to ensure smooth model performance. You can also take advantage of ECI's high-performance computing capabilities to handle a large number of concurrent connections from multiple users.

By combining these services, you can easily deploy and manage your Llama bot container, scale it up or down as needed, and ensure it runs fast and efficiently.

Let me know if you have any questions or if there's anything else I can help you with!
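For reference, the ECS and ECR parts of that recommendation map onto a fairly short deployment script. Below is a hedged sketch using boto3, assuming the container image is built and pushed separately (via `docker push`) and a GPU instance (e.g. a g5.xlarge) is already registered to the ECS cluster; the repository, cluster, region, port, and memory values are placeholder assumptions, not our production setup.

```python
# Sketch of the ECS/ECR deployment path the chatbot describes.
import boto3

REGION = "eu-west-2"  # placeholder region
ecr = boto3.client("ecr", region_name=REGION)
ecs = boto3.client("ecs", region_name=REGION)

# 1. A registry for the image (push it with `docker login` + `docker push`).
repo = ecr.create_repository(repositoryName="llama-bot")["repository"]
image_uri = repo["repositoryUri"] + ":latest"

# 2. A task definition that reserves one GPU for the container.
ecs.register_task_definition(
    family="llama-bot",
    requiresCompatibilities=["EC2"],  # GPU tasks need EC2-backed ECS
    containerDefinitions=[{
        "name": "llama-bot",
        "image": image_uri,
        "memory": 8192,  # MiB reserved for the container
        "portMappings": [{"containerPort": 8000}],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    }],
)

# 3. A long-running service on a cluster that already has a GPU instance.
ecs.create_service(
    cluster="avatars",
    serviceName="llama-bot",
    taskDefinition="llama-bot",
    desiredCount=1,
    launchType="EC2",
)
```

The GPU entry under `resourceRequirements` is what actually places the container on an instance with VRAM; Fargate does not support GPUs, which is why the task targets EC2-backed ECS.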
Next steps for Scenegraph
We have been a bit quiet about the work we have been developing, but it is time to get our application out into the wild.
If you are interested in having your own AI-powered avatar with a database of information that you can query by talking to it, or by typing, reach out to us.