Deep Dive into AWS Generative AI Services: A Layered Approach
AWS offers a comprehensive suite of generative AI services to empower you at every stage of your generative AI journey. Let's delve deeper into each layer:
Layer 1: Infrastructure for Foundation Model Training and Inference
The bottom layer of the stack is the infrastructure—compute, networking, frameworks, services—required to train and run LLMs and other FMs. AWS innovates to offer the most advanced infrastructure for ML.
GPUs: AWS was the first Cloud Service Provider (CSP) to offer NVIDIA GPUs in the public cloud. AWS P5 instances provide up to 8 NVIDIA H100 GPUs with a total of up to 640 GB HBM3 GPU memory per instance.
The AWS Nitro System combines dedicated hardware with a lightweight hypervisor to deliver improved compute and networking performance for EC2 instances. Accelerated computing instances combined with differentiated AWS technologies such as the AWS Nitro System, up to 3,200 Gbps of Elastic Fabric Adapter (EFA) networking, and exascale computing with Amazon EC2 UltraClusters deliver the most performant infrastructure for generative AI workloads.
AWS Trainium2: Trainium2 is designed to deliver better price performance for training models with hundreds of billions to trillions of parameters. It should deliver up to four times the training performance of first-generation Trainium and, when used in EC2 UltraClusters, up to 65 exaflops of aggregate compute. This means customers will be able to train a 300-billion-parameter LLM in weeks instead of months. Trainium2's performance, scale, and energy efficiency are among the reasons Anthropic has chosen to train its models on AWS and will use Trainium2 for its future models.
AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost in Amazon EC2 for your deep learning (DL) and generative AI inference applications.
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and confidently build, train, and deploy ML models into a production-ready hosted environment. It provides a UI experience for running ML workflows that makes SageMaker ML tools available across multiple integrated development environments (IDEs).
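To make that concrete, here is a minimal sketch of the train-and-deploy workflow using the SageMaker Python SDK. The entry-point script, S3 bucket, role ARN, and instance types are illustrative placeholders, not prescriptions:

```python
# Minimal SageMaker train-and-deploy sketch (SageMaker Python SDK).
# The script name, S3 URI, and role ARN are hypothetical placeholders.
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # assumed IAM role

estimator = PyTorch(
    entry_point="train.py",           # your training script
    role=role,
    instance_type="ml.p4d.24xlarge",  # GPU instance for model training
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
)

# Train on data staged in S3, then host the model on a managed endpoint.
estimator.fit({"train": "s3://my-bucket/training-data/"})
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.g5.2xlarge")
```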
AWS Neuron is an SDK with a compiler, runtime, and profiling tools that unlocks high-performance and cost-effective deep learning (DL) acceleration. It supports high-performance training on AWS Trainium-based Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances and high-performance inference on AWS Inferentia-based instances.
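As a small illustration, Neuron's PyTorch integration can compile a model ahead of time for NeuronCores. A sketch, assuming the torch-neuronx package is installed on a Trn1 or Inf2 instance:

```python
# Sketch: compiling a PyTorch model with AWS Neuron (torch-neuronx).
# Assumes this runs on a Trn1/Inf2 instance with the Neuron SDK installed.
import torch
import torch_neuronx

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# trace() compiles the model graph for NeuronCores ahead of time.
neuron_model = torch_neuronx.trace(model, example_input)

# The compiled artifact can be saved and reloaded like a TorchScript module.
torch.jit.save(neuron_model, "model_neuron.pt")
```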
Elastic Fabric Adapter (EFA) is a network interface for EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communications, which is critical to scaling these applications.
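If you want to confirm which instance types support EFA before provisioning, one approach (a sketch using the standard EC2 API via boto3) is to filter on the network-info.efa-supported attribute:

```python
# Sketch: listing EFA-capable EC2 instance types with boto3.
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instance_types")

efa_types = []
for page in paginator.paginate(
    Filters=[{"Name": "network-info.efa-supported", "Values": ["true"]}]
):
    efa_types.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(sorted(efa_types))
```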
EC2 UltraClusters: UltraClusters can help you scale to thousands of GPUs or purpose-built ML accelerators, such as AWS Trainium, to get on-demand access to a supercomputer. They democratize access to supercomputing-class performance for machine learning (ML), generative AI, and high performance computing (HPC) developers through a simple pay-as-you-go usage model without any setup or maintenance costs. Amazon EC2 P5 instances, Amazon EC2 P4d instances, and Amazon EC2 Trn1 instances are all deployed in Amazon EC2 UltraClusters.
EC2 Capacity Blocks for ML: Capacity Blocks for ML allow you to reserve highly sought-after GPU instances starting on a future date to support your short-duration machine learning (ML) workloads. Instances that run inside a Capacity Block are automatically placed close together inside Amazon EC2 UltraClusters for low-latency, petabit-scale, nonblocking networking.
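Capacity Blocks are searched and purchased through the EC2 API. A hedged sketch with boto3; the instance type, count, dates, and chosen offering are illustrative placeholders:

```python
# Sketch: finding and purchasing an EC2 Capacity Block for ML with boto3.
# Dates, counts, and the selected offering are illustrative placeholders.
from datetime import datetime, timezone
import boto3

ec2 = boto3.client("ec2")

# Search for 64 p5.48xlarge instances for a one-week (168-hour) block.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=64,
    StartDateRange=datetime(2025, 7, 1, tzinfo=timezone.utc),
    EndDateRange=datetime(2025, 7, 31, tzinfo=timezone.utc),
    CapacityDurationHours=168,
)

# Reserve the first matching offering.
offering_id = offerings["CapacityBlockOfferings"][0]["CapacityBlockOfferingId"]
ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering_id,
    InstancePlatform="Linux/UNIX",
)
```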
Layer 2: Tools to Build with LLMs and Other Foundation Models
Amazon Bedrock – a fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your use case.
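Because every model sits behind the same InvokeModel API, switching providers is largely a matter of changing the modelId and request body. A minimal sketch with boto3, using an Anthropic Claude model as an example (model availability depends on your account and Region):

```python
# Sketch: calling a foundation model through Amazon Bedrock's single API.
# The modelId shown assumes Claude 3 Sonnet access is enabled in us-east-1.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize the benefits of EC2 UltraClusters."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```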
Agents for Amazon Bedrock offers you the ability to build and configure autonomous agents in your application. An agent helps your end-users complete actions based on organization data and user input. Agents orchestrate interactions between foundation models (FMs), data sources, software applications, and user conversations.
With agents, you can automate tasks for your customers and answer questions for them. For example, you can create an agent that helps customers process insurance claims or an agent that helps customers make travel reservations. You don't have to provision capacity, manage infrastructure, or write custom code. Amazon Bedrock manages prompt engineering, memory, monitoring, encryption, user permissions, and API invocation.
Agents perform the following tasks: they extend foundation models to understand user requests and break tasks down into smaller steps, collect additional information from users through natural conversation, fulfill requests by making API calls to company systems, and augment their responses by querying knowledge bases.
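Once configured, an agent is reached through the Bedrock agent runtime, which streams its answer back in chunks. A sketch with boto3; the agent and alias IDs below are placeholders:

```python
# Sketch: invoking an Agent for Amazon Bedrock from application code.
# The agent ID and alias ID are placeholders for a configured agent.
import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="AGENT1234",          # placeholder agent ID
    agentAliasId="ALIAS5678",     # placeholder alias ID
    sessionId=str(uuid.uuid4()),  # ties multi-turn context together
    inputText="File a claim for water damage reported on June 3rd.",
)

# The answer arrives as a stream of completion chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```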
You can apply guardrails to all large language models (LLMs) in Amazon Bedrock, including fine-tuned models and Agents for Amazon Bedrock. This drives consistency in how you deploy your preferences across applications, so you can innovate safely while closely managing the user experience according to your requirements. By standardizing safety and privacy controls, Guardrails for Amazon Bedrock helps you build generative AI applications that align with your responsible AI goals.
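Applying a guardrail at inference time is a matter of referencing its identifier and version on the model call. A sketch extending the earlier InvokeModel example; the guardrail ID and version are placeholders for a guardrail you have already created in Bedrock:

```python
# Sketch: attaching a guardrail to a Bedrock InvokeModel call.
# The guardrail ID/version are placeholders created beforehand in Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailIdentifier="gr-abc123",  # placeholder guardrail ID
    guardrailVersion="1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Tell me about my account."}],
    }),
)
print(json.loads(response["body"].read()))
```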
Layer 3: Applications that Leverage LLMs and Other Foundation Models
Amazon Q is a secure AI chatbot for business use that can be tailored to your company by plugging into all your popular data sources, AWS or third-party. It assists with answering questions, problem-solving, content generation, and taking actions based on company data, code, and systems. It can help with everyday tasks like summarizing documents, drafting emails, conducting research, and performing comparative analyses.
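Amazon Q Business can also be called programmatically. A hedged sketch using boto3's qbusiness client; the application ID is a placeholder, and the identity parameters your call needs depend on how the application is configured:

```python
# Sketch: asking Amazon Q Business a question via its synchronous chat API.
# The application ID is a placeholder; identity configuration varies.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="app-1234567890",  # placeholder Q Business application ID
    userMessage="Summarize last quarter's incident reports.",
)

# systemMessage holds the generated answer grounded in your data sources.
print(response["systemMessage"])
```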
Amazon Q in Amazon QuickSight: Amazon Q integrates with Amazon QuickSight to bring Generative BI capabilities, enabling you to quickly create compelling visuals and data stories, answer data-related queries, and summarize insights using natural language. This integration streamlines the process of interacting with data and extracting meaningful insights.
Amazon Q in Amazon Connect: Amazon Q in Connect automatically detects customer intent during calls and chats using conversational analytics and natural language understanding (NLU). It then provides agents with immediate, real-time generative responses and suggested actions, along with links to relevant documents and articles.
Amazon CodeWhisperer is a tool designed to help software developers write better-quality code. It provides recommendations on how to optimize code for performance, security, and maintainability. CodeWhisperer uses AI and ML algorithms to analyze code and suggest improvements, and it integrates with other AWS services, such as AWS CodeCommit, AWS CodeBuild, and AWS CodePipeline, to provide a seamless software development experience.