Deep Dive into AWS Generative AI Services: A Layered Approach
AWS offers a comprehensive suite of generative AI services to empower you at every stage of your generative AI journey. Let's delve deeper into each layer:
Layer 1: Infrastructure for Foundation Model Training and Inference
The bottom layer of the stack is the infrastructure—compute, networking, frameworks, services—required to train and run LLMs and other FMs. AWS innovates to offer the most advanced infrastructure for ML.
GPUs: AWS was the first Cloud Service Provider (CSP) to offer NVIDIA GPUs in the public cloud. AWS P5 instances provide up to 8 NVIDIA H100 GPUs with a total of up to 640 GB HBM3 GPU memory per instance.
The AWS Nitro System combines dedicated hardware with a lightweight hypervisor to deliver improved compute and networking performance for EC2 instances. Accelerated computing instances combined with differentiated AWS technologies such as the AWS Nitro System, up to 3,200 Gbps of Elastic Fabric Adapter (EFA) networking, and exascale computing with Amazon EC2 UltraClusters deliver the most performant infrastructure for generative AI workloads.
AWS Trainium2: Trainium2 is designed to deliver better price performance for training models with hundreds of billions to trillions of parameters. It should deliver up to four times the training performance of first-generation Trainium and, when used in EC2 UltraClusters, up to 65 exaflops of aggregate compute. This means customers will be able to train a 300-billion-parameter LLM in weeks instead of months. Trainium2's performance, scale, and energy efficiency are among the reasons Anthropic has chosen to train its models on AWS and will use Trainium2 for its future models.
AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost in Amazon EC2 for your deep learning (DL) and generative AI inference applications.
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and confidently build, train, and deploy ML models into a production-ready hosted environment. It provides a UI experience for running ML workflows that makes SageMaker ML tools available across multiple integrated development environments (IDEs).
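To make that concrete, here is a minimal sketch of the train-and-deploy workflow using the SageMaker Python SDK. The entry-point script, S3 bucket, role ARN, and instance types are illustrative placeholders, not prescriptions:

```python
# Minimal SageMaker train-and-deploy sketch (SageMaker Python SDK).
# The script name, S3 URI, and role ARN are hypothetical placeholders.
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # assumed IAM role

estimator = PyTorch(
    entry_point="train.py",           # your training script
    role=role,
    instance_type="ml.p4d.24xlarge",  # GPU instance for model training
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
)

# Train on data staged in S3, then host the model on a managed endpoint.
estimator.fit({"train": "s3://my-bucket/training-data/"})
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.g5.2xlarge")
```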
AWS Neuron is an SDK with a compiler, runtime, and profiling tools that unlocks high-performance and cost-effective deep learning (DL) acceleration. It supports high-performance training on AWS Trainium-based Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances and high-performance inference on AWS Inferentia-based instances.
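As a small illustration, Neuron's PyTorch integration can compile a model ahead of time for NeuronCores. A sketch, assuming the torch-neuronx package is installed on a Trn1 or Inf2 instance:

```python
# Sketch: compiling a PyTorch model with AWS Neuron (torch-neuronx).
# Assumes this runs on a Trn1/Inf2 instance with the Neuron SDK installed.
import torch
import torch_neuronx

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# trace() compiles the model graph for NeuronCores ahead of time.
neuron_model = torch_neuronx.trace(model, example_input)

# The compiled artifact can be saved and reloaded like a TorchScript module.
torch.jit.save(neuron_model, "model_neuron.pt")
```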
Elastic Fabric Adapter (EFA) is a network interface for EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communications, which is critical to scaling these applications.
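If you want to confirm which instance types support EFA before provisioning, one approach (a sketch using the standard EC2 API via boto3) is to filter on the network-info.efa-supported attribute:

```python
# Sketch: listing EFA-capable EC2 instance types with boto3.
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instance_types")

efa_types = []
for page in paginator.paginate(
    Filters=[{"Name": "network-info.efa-supported", "Values": ["true"]}]
):
    efa_types.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(sorted(efa_types))
```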
EC2 UltraClusters: UltraClusters can help you scale to thousands of GPUs or purpose-built ML accelerators, such as AWS Trainium, to get on-demand access to a supercomputer. They democratize access to supercomputing-class performance for machine learning (ML), generative AI, and high performance computing (HPC) developers through a simple pay-as-you-go usage model without any setup or maintenance costs. Amazon EC2 P5 instances, Amazon EC2 P4d instances, and Amazon EC2 Trn1 instances are all deployed in Amazon EC2 UltraClusters.
EC2 Capacity Blocks for ML: Capacity Blocks for ML allow you to reserve highly sought-after GPU instances starting on a future date to support your short-duration machine learning (ML) workloads. Instances that run inside a Capacity Block are automatically placed close together inside Amazon EC2 UltraClusters for low-latency, petabit-scale, nonblocking networking.
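Capacity Blocks are searched and purchased through the EC2 API. A hedged sketch with boto3; the instance type, count, dates, and chosen offering are illustrative placeholders:

```python
# Sketch: finding and purchasing an EC2 Capacity Block for ML with boto3.
# Dates, counts, and the selected offering are illustrative placeholders.
from datetime import datetime, timezone
import boto3

ec2 = boto3.client("ec2")

# Search for 64 p5.48xlarge instances for a one-week (168-hour) block.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=64,
    StartDateRange=datetime(2025, 7, 1, tzinfo=timezone.utc),
    EndDateRange=datetime(2025, 7, 31, tzinfo=timezone.utc),
    CapacityDurationHours=168,
)

# Reserve the first matching offering.
offering_id = offerings["CapacityBlockOfferings"][0]["CapacityBlockOfferingId"]
ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering_id,
    InstancePlatform="Linux/UNIX",
)
```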
Layer 2: Tools to Build with LLMs and Other Foundation Models
Amazon Bedrock – a fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your use case.
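Because every model sits behind the same InvokeModel API, switching providers is largely a matter of changing the modelId and request body. A minimal sketch with boto3, using an Anthropic Claude model as an example (model availability depends on your account and Region):

```python
# Sketch: calling a foundation model through Amazon Bedrock's single API.
# The modelId shown assumes Claude 3 Sonnet access is enabled in us-east-1.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize the benefits of EC2 UltraClusters."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```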
Agents for Amazon Bedrock offers you the ability to build and configure autonomous agents in your application. An agent helps your end-users complete actions based on organization data and user input. Agents orchestrate interactions between foundation models (FMs), data sources, software applications, and user conversations.
With agents, you can automate tasks for your customers and answer questions for them. For example, you can create an agent that helps customers process insurance claims or an agent that helps customers make travel reservations. You don't have to provision capacity, manage infrastructure, or write custom code. Amazon Bedrock manages prompt engineering, memory, monitoring, encryption, user permissions, and API invocation.
Agents perform the following tasks: they extend foundation models to understand user requests and break tasks down into smaller steps, collect additional information from users through natural conversation, fulfill requests by making API calls to company systems, and augment their responses by querying knowledge bases.
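Once configured, an agent is reached through the Bedrock agent runtime, which streams its answer back in chunks. A sketch with boto3; the agent and alias IDs below are placeholders:

```python
# Sketch: invoking an Agent for Amazon Bedrock from application code.
# The agent ID and alias ID are placeholders for a configured agent.
import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="AGENT1234",          # placeholder agent ID
    agentAliasId="ALIAS5678",     # placeholder alias ID
    sessionId=str(uuid.uuid4()),  # ties multi-turn context together
    inputText="File a claim for water damage reported on June 3rd.",
)

# The answer arrives as a stream of completion chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```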
You can apply guardrails to all large language models (LLMs) in Amazon Bedrock, including fine-tuned models and Agents for Amazon Bedrock. This drives consistency in how you deploy your preferences across applications, so you can innovate safely while closely managing the user experience according to your requirements. By standardizing safety and privacy controls, Guardrails for Amazon Bedrock helps you build generative AI applications that align with your responsible AI goals.
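Applying a guardrail at inference time is a matter of referencing its identifier and version on the model call. A sketch extending the earlier InvokeModel example; the guardrail ID and version are placeholders for a guardrail you have already created in Bedrock:

```python
# Sketch: attaching a guardrail to a Bedrock InvokeModel call.
# The guardrail ID/version are placeholders created beforehand in Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailIdentifier="gr-abc123",  # placeholder guardrail ID
    guardrailVersion="1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Tell me about my account."}],
    }),
)
print(json.loads(response["body"].read()))
```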
Layer 3: Applications that Leverage LLMs and Other Foundation Models
Amazon Q is a secure AI chatbot for business use that can be tailored to your company by plugging into all your popular data sources, AWS or third-party. It assists with answering questions, problem-solving, content generation, and taking actions based on company data, code, and systems. It can help with everyday tasks like summarizing documents, drafting emails, conducting research, and performing comparative analyses.
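Amazon Q Business can also be called programmatically. A hedged sketch using boto3's qbusiness client; the application ID is a placeholder, and the identity parameters your call needs depend on how the application is configured:

```python
# Sketch: asking Amazon Q Business a question via its synchronous chat API.
# The application ID is a placeholder; identity configuration varies.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="app-1234567890",  # placeholder Q Business application ID
    userMessage="Summarize last quarter's incident reports.",
)

# systemMessage holds the generated answer grounded in your data sources.
print(response["systemMessage"])
```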
Amazon Q in Amazon QuickSight: Amazon Q integrates with Amazon QuickSight to bring Generative BI capabilities, enabling you to quickly create compelling visuals and data stories, answer data-related queries, and summarize insights using natural language. This integration streamlines the process of interacting with data and extracting meaningful insights.
Amazon Q in Amazon Connect: Amazon Q in Connect automatically detects customer intent during calls and chats using conversational analytics and natural language understanding (NLU). It then provides agents with immediate, real-time generative responses and suggested actions, along with links to relevant documents and articles.
Amazon CodeWhisperer is a tool designed to help software developers write better-quality code. It provides recommendations on how to optimize code for performance, security, and maintainability. CodeWhisperer uses AI and ML algorithms to analyze code and suggest improvements, and it integrates with other AWS services, such as AWS CodeCommit, AWS CodeBuild, and AWS CodePipeline, to provide a seamless software development experience.