10 Key Products for Building LLM-Based Apps on AWS
Prashant Parihar
LLM & GenAI Pioneer | AI Evangelist | Redhat OpenShift Enterprise Architect | DevSecOps Architect | Azure DevOps | Terraform |BI-Big Data, ETL | Jenkins CI-CD | Digital Transformation & Automation |GitOps/ArgoCD.| TOGAF|
Reinventing with generative AI was the big topic of Adam Selipsky’s keynote at the AWS re:Invent conference at the end of last year. In his opening keynote Selipsky, CEO of Amazon Web Services (AWS), was laser-focused on how generative AI, based on large language models (LLMs), offers a “turbo boost” for developer productivity while at the same time enabling a never before seen degree of customization toward each individual customer.
Swami Sivasubramanian, VP of Data and AI at AWS, stated that “generative AI has the potential to redefine customer experiences across industries.” This transformative character of generative AI is based on the ability to create highly personalized app experiences at the level of the industry, company, and customer. Generative AI enables this “mass customization” through its ability to crunch massive amounts of data based on the perspective of individual users.
What’s the Big Deal
The big deal here is that ultimately generative AI can deliver the exact data, user experience, and application features that an individual end user needs to complete a specific task or solve a particular challenge at a particular point in time. Generative AI-driven applications “know” each user’s priorities and overall situational context and can therefore offer individually customized dashboards, insights, recommendations, and automated solutions. Based on this deep understanding of the end user, generative AI can provide guardrails for optimal productivity.
For example, the LLM can listen to a customer support call and automatically provide the support engineer with recommendations on how to resolve the issue fastest. This could mean that for one and the same issue the LLM could make different recommendations simply based on the tone of voice of the customer on the other end of the line.
These recommendations could also depend on a myriad of other factors, such as the customer’s spending history, previous support calls with the same or similar customers, the company’s level of priority on preventing customer churn, and any number of other factors. This example shows that traditional software development based on hard-coded instructions can never work, as there is simply too much situational complexity around the organization, its customers, and staff.
10 Key AWS Products and Considerations
Now let us take a look at 10 key AWS products that aim to enable developers to add generative AI capabilities to their current and future product portfolio.
1. Foundation Models to Go: Amazon Bedrock
Amazon Bedrock allows developers to consume different foundation models from AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon itself as a managed service. Developers can fine-tune these models with their own data for tasks like text summarization, question answering, or image generation. Bedrock supports customization for specific industries and domains, enhancing the relevance and accuracy of AI responses. Use cases include enriching AI responses with proprietary data using RAG (Retrieval Augmented Generation), executing complex tasks across company systems, and creating agents for customer interaction or order processing. In a nutshell, the significance of Bedrock lies in its ability to offer one API for developers to start using and fine-tuning LLMs without much startup time.
TRENDING STORIES
2. Control the LLM Output: Guardrails for Bedrock
Guardrails for Bedrock provides a developer API for defining a set of boundaries for the answers given by any one of the BedRock foundation models. Developers can, for example, ensure that their chatbot will only answer questions that belong in a specific topic category, or they can prevent their bot from using technical jargon or problematic language. Developers can also use the Guardrails API to control regulatory compliance, limit legal liabilities, ensure brand voice consistency and respond based on local cultural sensitivities. A healthcare-related LLM could be tuned to avoid giving medical advice and always warn users to seek professional help instead of using generative AI as a substitute for their doctor.
3. Accelerated Model Training: SageMaker HyperPod
While SageMaker HyperPod is not a serverless offering, AWS enables easy provisioning, configuration and maintenance through the AWS Management Console and the command line interface. The Hyperpod cluster comes with all of the SageMaker libraries needed for distributed training across nodes and to pause, analyze and optimize the training process without the risk of having to start over. Hyperpod allows instant scaling from one to thousands of GPUs without having to rebuild the cluster. For app developers, these two factors can make the difference between easily meeting a delivery deadline and blowing up their critical path due to slow the need to repeatedly train and retrain an LLM. But keep in mind, as you are paying for the size, number, and duration of compute instances that are part of the HyperPod cluster, there is always a risk of overprovisioning or forgetting to de-provision expensive resources.
4. High-Performance Silicon: AWS Trainium2 and Graviton4 Chips
As a developer, you can harness the power of AWS’s latest silicon innovations through various EC2 instances. The EC2 Trn2 instances, each packing 16 Trainium2 chips, aim to accelerate model training, offering up to a fourfold speed increase. On the other hand, EC2 R8g instances, equipped with Graviton4 processors, are tailored for CPU and memory-heavy tasks like AI inference and real-time analytics. Essentially, while Trainium2 focuses on accelerating the model training process, Graviton4 addresses the inference side, providing the computational muscle for tasks like real-time analytics and efficient data handling in memory caches.
领英推荐
5. High-Performance Storage: Amazon S3 Express One Zone
S3 Express One Zone provides developers with low-latency high-performance storage. This type of storage is important for projects that require the handling of large data sets. LLM training and big data analytics both are prime workloads that could significantly benefit from S3 Express One Zone, as Amazon promises up to 10x faster data access speeds compared to standard S3 storage and consistent request latency in the single-digit milliseconds and 50% reduced cost of API requests. As S3 Express One Zone is a lot more expensive in terms of the actual cost of storing data, an auto-tiering capability that automatically scales up or down based on consumption is critical to keep cost in check.
6. Collaboration in Privacy: AWS Clean Rooms ML
Many AI projects die because of data privacy concerns when the data owner decides that the benefits of the project do not outweigh the compliance risk it might entail. AWS Clean Rooms ML allows model training without disclosing or even moving the training data. The new service provides access for multiple parties to provide data for model training, but without allowing for any access to this data. For example, IT, sales, and marketing could combine their data to create a joint model that helps IT prioritize service tickets based on potential business impact while sales and market teams could predict the impact of their campaigns on corporate IT. Clean Rooms ML is a fully managed “serverless” offering charged by usage metrics, such as the number and complexity of queries or trained AI models.
7. Corporate Chatbot: Q
Amazon Q represents a transformative tool for developers, integrating generative AI into the core of AWS services. With its API-centric design, it empowers developers to create tailored AI assistants, leveraging a broad spectrum of data sources from Amazon S3 to GitHub. This capability is not just an incremental enhancement; it’s a paradigm shift, offering a customizable, private LLM instance that can be intricately aligned with specific applications or user needs. As a fully managed chatbot, Amazon Q simplifies the complexity typically associated with such integrations, providing a cost-effective, usage-based billing model. This positions Q as an essential asset in the modern developer’s toolkit, unlocking new potentials in application intelligence and user interaction.
8. Data-Driven Insights: Amazon Redshift ML with LLM Support
Amazon Redshift ML now supports LLMs, enabling developers to leverage the power of LLMs in their analytics. With this enhancement, developers can make inferences on their product feedback data in Amazon Redshift, perform tasks like summarizing feedback, entity extraction, sentiment analysis, and product feedback classification. This feature requires creating an endpoint for an LLM in Amazon SageMaker JumpStart, which can be a predefined model or a custom model trained with your own data. The integration of LLMs with Redshift ML brings the power of AI to data analytics, allowing developers to extract more value from their data and make more informed decisions.
9. Streamlined Experimentation: Generative AI Application Builder on AWS
The Generative AI Application Builder on AWS accelerates development and streamlines experimentation. It provides pre-built connectors to a variety of LLMs, including those from Amazon Bedrock and select third-party LLMs. This solution offers flexibility in deploying the model of choice and integrating preferred AWS and third-party services. It also provides enterprise-grade security, scalability, high availability, and low latency. Developers can extend this solution’s functionality by integrating existing projects or natively connecting additional AWS services. The solution includes the LangChain orchestration library and Lambda functions for connecting with third-party services, making it a comprehensive tool for building and deploying AI applications.
10. Simplified AI Development: LangChain
AWS offers LangChain as a tool for developers to simplify AI development, integrate with various AWS services, and deploy LLM applications. LangChain simplifies AI development by abstracting the complexity of data source integrations and prompt refining. It ties together various components such as Amazon Bedrock, Amazon Kendra, Amazon SageMaker JumpStart, and your LLMs, enabling the building of highly accurate generative AI applications on enterprise data. LangChain serves as the interface that connects these components, making it easier for developers to set up and access generational models. This simplification of AI development allows developers to focus more on the application logic and less on the intricacies of AI model management.
Conclusion
Given the resource-intensive nature of LLMs, developers must design scalable applications that optimize AWS resources. This includes choosing appropriate instance types, managing computational resources, and employing cost-effective strategies.
Building LLM-based applications on AWS requires a nuanced understanding of both the LLM technology and the AWS ecosystem. Developers must consider aspects like scalability, cost, data privacy, integration with other AWS services, and the user experience. By paying attention to these considerations, developers can harness the full potential of AWS for building powerful, efficient, and effective LLM-driven applications.
As the field of LLMs continues to evolve, staying updated with the latest trends and best practices will be crucial for developers. The spate of announcements in LLM-related developments on AWS marks just the beginning of an exciting journey in the realm of AI and machine learning.
The advancements announced at AWS re:Invent in Chicago late last year represent a significant leap in the capabilities and efficiency of LLM-based applications on AWS. Developers must stay abreast of these developments, integrating these tools and considerations into their applications to harness the full potential of AWS and LLM technology.