AWS re:Invent 2024: Developer-focused event announced new AWS LLMs and latest accelerator chips
Summary
AWS re:Invent 2024, held in Las Vegas from December 2 to 6, was a feast for Amazon Web Services (AWS) cloud developers, with a host of new feature announcements. One of the challenges of working with serverless computing is the lack of transparency when testing and debugging applications, because observability agents must be installed on servers that serverless developers cannot access. Amazon is helping serverless developers by adding observability features. It also announced new features for working with Kubernetes, including Amazon Elastic Kubernetes Service (EKS) Hybrid Nodes for the unified management of clusters across separate environments, and AWS-managed cluster operations with Amazon EKS Auto Mode.
Amazon entered the large language model (LLM) arena with its new Nova series of LLMs, an output of the company’s research group, which focuses on keeping pace with the latest progress in generative artificial intelligence (AI). Support for partner LLMs in Amazon Bedrock has also expanded. On the hardware side, Amazon continues to iterate its in-house Trainium and Inferentia AI accelerator chips. Developers can use Amazon Q Developer as a code assistant, integrated with Amazon Bedrock for building LLM applications, and they can use Amazon SageMaker to build analytics and machine learning applications.
Notably, Amazon conducts significant in-house research (see www.amazon.science). In Omdia’s opinion, analysts and Amazon customers would welcome hearing more about the company’s research efforts.
Analyst view
Amazon Lambda now supports application performance management
Serverless services are so named because they abstract away the chore of infrastructure management, shielding developers from having to administer and maintain servers. The downside is that developers building applications on AWS Lambda have lacked visibility into performance bottlenecks and the information vital to troubleshooting, testing, and debugging. AWS has addressed this with observability features that provide logs, allow custom metrics to be defined, and can examine edge cases in detail. Distributed tracing is possible, with instrumentation placed close to where events are generated. These features also make self-healing possible: metrics are monitored and, when issues arise, remediation and failback are triggered.
Additionally, AWS Lambda now supports Amazon CloudWatch Application Signals, an application performance monitoring (APM) solution. It enables developers and operators to easily monitor the health and performance of their serverless applications built on Lambda. Application Signals provides prebuilt, standardized dashboards for critical application metrics (such as throughput, availability, latency, faults, and errors), correlated traces, and interactions between the Lambda function and its dependencies (such as other AWS services), all without requiring manual instrumentation or code changes from developers. AWS Lambda also supports CloudWatch Logs Insights.
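By way of illustration, the hedged sketch below polls the same standard Lambda health signals (throughput, errors, latency) through the CloudWatch API using boto3; the function name my-function is a hypothetical placeholder. Application Signals surfaces these metrics in prebuilt dashboards without any such hand-rolled code, which is precisely its appeal:

```python
# Minimal sketch: pulling standard Lambda health metrics from CloudWatch.
# "my-function" is a hypothetical function name for illustration only.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.datetime.now(datetime.timezone.utc)

for metric, stat in [("Invocations", "Sum"), ("Errors", "Sum"), ("Duration", "Average")]:
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
        StartTime=now - datetime.timedelta(hours=1),
        EndTime=now,
        Period=300,  # 5-minute buckets
        Statistics=[stat],
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric, point["Timestamp"], point[stat])
```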
In Omdia’s opinion, these AWS tools fill a gap in serverless services that could previously be only partially filled by third-party observability tools. Those third-party tools will also benefit from access to the log and metric data now exposed by the “serverless” servers. The AWS tools are therefore a welcome addition that will give enterprises more confidence in moving to serverless computing.
The future of Kubernetes on AWS
At re:Invent 2024, Nathan Taber, head of product for Kubernetes and registry at AWS, spoke about enhanced image scanning on Amazon Elastic Container Registry (ECR). It is powered by Amazon Inspector and covers over 50 vulnerability databases and more than 12 operating systems. ECR is popular, serving over 2 billion image pulls per day. Amazon is providing extended version support on EKS as new Kubernetes releases appear; customers typically need some 14 months to move to a new version, so extended support is vital for enterprises. The company is also providing upgrade insights, a report on how API calls would be affected by different Kubernetes versions, helping users assess a move to a new version. It has added the Application Recovery Controller (ARC) to EKS so that when failures occur, traffic can be automatically routed to alternative availability zones (AZs). This failover can be triggered automatically by AWS or manually by the user, and ARC also manages the transfer back to the originally troubled AZ once it is operational again.
Amazon EKS is now available in every global region and across cloud-to-edge deployments. Also announced at re:Invent 2024 was the availability of EKS Hybrid Nodes, which lets on-premises and edge infrastructure join Amazon EKS clusters as nodes, offering unified Kubernetes management across all these separate environments. Another new capability is Amazon EKS Auto Mode, which automates Kubernetes cluster operations, letting AWS make the operational decisions; costs are optimized through automatic capacity planning and dynamic scaling.
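As a rough illustration, the hedged sketch below shows what creating an Auto Mode cluster might look like through boto3. The cluster name, role ARNs, and subnet IDs are hypothetical placeholders, and the Auto Mode fields (computeConfig, storageConfig, and the elasticLoadBalancing setting) reflect the API shape documented at launch, which may evolve:

```python
# Hedged sketch: creating an EKS cluster with Auto Mode enabled via boto3.
# All names, ARNs, and subnet IDs below are hypothetical placeholders.
import boto3

eks = boto3.client("eks")

response = eks.create_cluster(
    name="demo-auto-mode",
    version="1.31",
    roleArn="arn:aws:iam::111122223333:role/eks-cluster-role",
    resourcesVpcConfig={"subnetIds": ["subnet-aaa", "subnet-bbb"]},
    accessConfig={"authenticationMode": "API"},  # Auto Mode requires API auth
    # Auto Mode: AWS manages compute, block storage, and load balancing.
    computeConfig={
        "enabled": True,
        "nodePools": ["general-purpose", "system"],
        "nodeRoleArn": "arn:aws:iam::111122223333:role/eks-node-role",
    },
    storageConfig={"blockStorage": {"enabled": True}},
    kubernetesNetworkConfig={"elasticLoadBalancing": {"enabled": True}},
)
print(response["cluster"]["status"])  # typically "CREATING"
```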
In the longer term, Amazon intends to reduce the friction for non-tech companies to benefit from the technology available on AWS without having to become tech experts. This entails more focus on ease of management, meeting the workloads where they exist, and simplifying building solution platforms on AWS.
In Omdia’s opinion, one of the biggest challenges an AWS user faces is the sheer vastness of the services available—navigating these options is a daunting task. It would help users, whether newbies or experienced, to have a role-based navigation path through AWS that exposes only the services relevant to the level of technology skills available in the user organization.
Amazon enters LLM market with Nova series models
While Amazon competes with Microsoft and Google in the cloud service provider (CSP) space, the company’s ambitions regarding AI have to date been about “selling shovels to the miners” (i.e., providing the tools to build AI applications while remaining agnostic about any particular LLM). This approach contrasts with those rivals, which run deep AI research programs: they create their own LLMs and partner with assorted independent LLM providers, but they are also engaged in building more intelligent machines (a.k.a. artificial general intelligence [AGI], or human-level AI). However, this position is changing. At re:Invent 2024, Amazon announced Nova, a series of its own in-house developed LLMs.
This announcement indicates Amazon is building competency in LLM creation. The company says little about its wider research goals but runs both pure and applied research programs (see www.amazon.science). In contrast, Microsoft and Google are more vocal about their AI research aspirations, with stated aims that may provide a stepping stone to AGI. Whether they succeed remains to be seen, and there is considerable skepticism that LLMs, as currently architected, can achieve AGI without additional capabilities (see the article on predicting when AGI will arrive in Further reading).
Amazon Bedrock features expand
Amazon Bedrock is the AWS service for building generative AI applications. Its Marketplace offers access to a host of foundation models, with options that keep expanding; model creators include Anthropic, Cohere, Meta, Mistral AI, and Stability AI. At re:Invent 2024, Amazon announced a host of new Bedrock features and alliances:
- poolside: A partnership with poolside, whose coding assistant models, malibu and point, will be available in Bedrock.
- Amazon Bedrock Model Distillation: Distills the knowledge of a large model into a smaller, more cost-effective model targeted at specific use cases; the distilled model is also faster and only slightly less accurate than the full model.
- Amazon Bedrock Intelligent Prompt Routing: Makes intelligent decisions about which size of model is appropriate for each query, helping reduce costs and latency (see the sketch after this list).
- Amazon Bedrock Knowledge Bases: Designed to make retrieval-augmented generation (RAG) easier to use; natural language queries can also access structured data sources.
- GraphRAG in Amazon Bedrock Knowledge Bases: Brings the power of knowledge graphs, using GraphRAG backed by Amazon Neptune (the serverless graph database).
- Amazon Bedrock Data Automation: Transforms unstructured data into structured data.
- Amazon Bedrock Guardrails: Now includes multimodal toxicity detection.
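As a minimal sketch of how this looks in practice, the snippet below calls Bedrock through the Converse API using boto3. Passing a prompt router ARN as the modelId (the ARN here is a hypothetical placeholder) lets Intelligent Prompt Routing choose an appropriately sized model per request; a plain foundation model ID works in the same slot:

```python
# Minimal sketch: invoking Bedrock via the Converse API with a prompt router.
# The router ARN below is a hypothetical placeholder for illustration only.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    # A prompt router ARN routes each request to an appropriately sized model;
    # a regular foundation model ID can be supplied here instead.
    modelId="arn:aws:bedrock:us-east-1:111122223333:default-prompt-router/anthropic.claude:1",
    messages=[
        {"role": "user", "content": [{"text": "Summarize AWS re:Invent 2024 in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```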
In Omdia’s opinion, Amazon Bedrock is an excellent starting point for exploring application development with LLMs. Used with Bedrock, tools such as Amazon Q Developer will reduce the time to production when working with the latest generative AI advances.
Amazon Q Developer gives developers advanced code assistance
Amazon Q, which became generally available in April 2024, featured prominently at re:Invent 2024. It is available in three versions: Developer for software development, Business for business analytics, and Apps for building AI-powered apps. Amazon Q Developer (which is built on top of, and supersedes, Amazon CodeWhisperer) is a chatbot integrated into the developer’s IDE, providing a host of features across the software development lifecycle. Enhancements announced at re:Invent include agents that automate unit testing, documentation, and code reviews, and that help users address operational issues in a fraction of the time. Amazon Q Developer can automate the upgrading of legacy Java code to newer versions and help transform applications in IBM z/OS mainframe migrations by automating analysis, refactoring code, and generating code documentation.
Amazon Q Developer is also available in Amazon SageMaker Canvas, a no-code tool for building machine learning applications with SageMaker. Amazon Q Business is available in Amazon QuickSight, the business analytics tool, allowing business analysts to use natural language prompts to perform business analytics.
Analysis of Amazon Q Developer will appear in Omdia’s forthcoming Omdia Universe assessment of no-code, low-code, and professional code assistants, to be published in 2Q25.
Amazon chips anticipate trillion parameter LLMs
At the re:Invent 2024 keynote, Peter DeSantis, senior vice president of AWS Utility Computing Products, anticipated the need to support LLMs with trillions of parameters. Alongside offering partner chips, such as Nvidia GPUs, on AWS instances, Amazon is investing in in-house developed accelerators. DeSantis described how the latest Trainium2 chip uses a data flow architecture built on systolic arrays (a custom neural network array design) and delivers 1.3 PFLOPS of performance in dense FP8 numeric format.
The Trainium2 package includes eight third-generation NeuronCore compute chips and four high bandwidth memory (HBM) modules. Trainium2 is available in the Trn2 Amazon EC2 instance, which combines 16 Trainium2 chips interconnected with Amazon’s latest NeuronLink high bandwidth, low latency chip interconnect (2TB/s bandwidth and 1 microsecond latency). The instance offers 20.8 PFLOPS of peak dense FP8 compute and 83.2 PFLOPS of sparse FP8 compute. The new network in the AWS cloud is called 10p10u because it provides 10 petabits of network capacity at under 10 microseconds of latency.
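The quoted instance-level figures follow directly from the per-chip numbers; a quick arithmetic check:

```python
# Sanity check on the published figures: a Trn2 instance aggregates 16
# Trainium2 chips at roughly 1.3 dense-FP8 PFLOPS each.
chips_per_instance = 16
pflops_per_chip_dense_fp8 = 1.3

instance_dense = chips_per_instance * pflops_per_chip_dense_fp8
print(instance_dense)      # 20.8 PFLOPS dense FP8, as quoted
print(instance_dense * 4)  # 83.2 PFLOPS sparse FP8 (a 4x sparsity speedup)
```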
Amazon is running “Build on Trainium,” a research and education program backed by $110m in funding, to support next-generation AI research in academia using Trainium accelerators. Institutions on board include UC Berkeley, the University of Texas, Carnegie Mellon University, and the University of Oxford.
In Omdia’s opinion, Amazon’s strategy of hedging its position to ensure that it meets cloud users’ demands for running AI applications is sound. Amazon builds its own state-of-the-art AI accelerator chips while also offering chips from companies in the open market, such as Nvidia.
Appendix
Further reading
Michael Azoff, “Prediction for when artificial general intelligence will happen,” LinkedIn article (November 28, 2024)