Brief of Serverless Keynote by Peter DeSantis @ AWS re:Invent 2023
Amit Juneja
Business & Strategy Leader | Thinker | Creator | General Manager & Global Client Executive at Wipro
What is the promise of Serverless?
Serverless removes the muck of caring for servers.
What are the key attributes of a cloud computing service?
Elasticity, Security, Performance, Cost, Availability, Sustainability
How does Serverless address this?
Elastic - because it runs on vast infrastructure and shares capacity across a large number of customers
More cost effective - you pay only for what you use
Sustainable - the most efficient power is the power you do not use
Secure & performant - built from the ground up, these services are more secure and available, taking advantage of native capabilities like AWS Nitro and the Availability Zone architecture
Why is everything not Serverless?
Familiarity and legacy code - Change over the short term is hard; over the long term it is inevitable. Mainframes are hard and expensive to move off of, and developers need time to learn new systems and approaches.
Richness of capability - Initially, only targeted, limited capabilities were available on AWS, which started off with just Simple Storage Service and Simple Queue Service in 2006. DynamoDB added transactions in 2018, making it a much better replacement for an RDBMS. More capabilities were added over time, like EFS (Elastic File System), Lambda, and Fargate serverless containers.
Customers love that AWS innovates on the tools and software that they use today. That is why several managed software offerings are available.
AWS's goal is to bring the value of serverless computing to serverful software.
Where else to start than the relational database?
In 2009 AWS launched Amazon RDS (Relational Database Service). Just as EC2 removes the muck of operating a server, RDS removes it for the database. How do we now go from a managed offering to serverless?
Amazon Aurora has a lot of innovation underpinning it. The biggest innovation is its internal optimized database storage system, called Grover.
Grover disaggregates storage from the DBMS.
In a relational database, the log is the record of everything that has happened inside the database. The database engine logs every step it takes using a technique called write-ahead logging. If you have the log, you have the database.
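The "if you have the log, you have the database" idea can be shown with a toy sketch. This is a minimal illustration of write-ahead logging, not Aurora's internals; the `LogEntry` class and `apply` function are invented for the example.

```python
# Toy write-ahead log: append to the log first, then apply the change.
# Replaying the log alone reconstructs the exact same state.
class LogEntry:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def apply(pages, entry):
    """Apply one log entry to an in-memory page store (here, a dict)."""
    pages[entry.key] = entry.value

log = []
pages = {}
for entry in [LogEntry("a", 1), LogEntry("b", 2), LogEntry("a", 3)]:
    log.append(entry)   # durable, cheap sequential write
    apply(pages, entry) # fast in-memory update

# Recovery: replay the log from scratch.
recovered = {}
for entry in log:
    apply(recovered, entry)

assert recovered == pages  # the log *is* the database
```

This is why Aurora can ship only log entries to Grover: any replica that has the log can rebuild the pages.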
Rather than logging locally, Aurora sends the log entries to Grover, which ensures durability and availability by replicating them across multiple Availability Zones.
Grover also processes the log, creating an identical copy of the database's internal memory structures on the remote system. These data structures can be sent back to the database and loaded at any time. So Aurora does not need to write its dirty pages to storage; instead it just logs to Grover.
Writing a log is significantly faster than writing memory pages, since it involves only a small amount of sequential IO. Grover can reduce IO demands by up to 80%, so you get 3-5x the price performance of open source managed databases.
Aurora also gives Multi-AZ without setting up replication: you can relaunch using the logs, and you can scale with read-only replicas.
With this, AWS made Aurora more serverless than serverful.
However, it is still not fully serverless: if you need more write capacity on the primary database, you have to fail over to scale it up.
Aurora Serverless was launched in 2018 which could grow or shrink without failover.
How does Aurora scale up and down seamlessly?
AWS does not consider any mechanism other than a hypervisor adequate for sharing server resources. Processes are not an adequate security boundary, and containers are also just processes under the covers. Enter the AWS Nitro hypervisor: Database -> Guest OS -> Hypervisor -> Physical Host.
But even if the physical host has spare memory, the guest is configured with a fixed amount, so the only option to resize is a reboot. Nitro alone does not help here.
So a new approach was needed, which led to the Caspian hypervisor and cooperative oversubscription.
With Caspian, each guest OS is allocated the full memory of the physical host, but unlike with Nitro, that memory is not physically reserved on the host. Physical memory is allocated per instance on demand, controlled by the Caspian heat management system. So if the host has 256GB, every guest OS believes it has 256GB, but it must ask the hypervisor before using it. You can run multiple databases on one host, and each believes it has the full resources.
What happens when the host runs out of physical memory? Caspian migrates one of the instances to another physical host, using EC2's high-bandwidth, low-jitter networking.
The Caspian heat management system continuously monitors the fleet and predicts which database will require more capacity.
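The cooperative part of the scheme can be sketched in a few lines. This is a toy model, not Caspian's actual API: the `Host`/`Guest` classes, the 256GB figure (from the keynote example), and the grant-or-migrate decision are illustrative.

```python
# Toy model of cooperative oversubscription: every guest *sees* the full
# host memory, but physical pages are granted by the hypervisor on request.
HOST_MEMORY_GB = 256

class Host:
    def __init__(self):
        self.allocated_gb = 0  # physically backed memory across all guests

class Guest:
    def __init__(self, name):
        self.name = name
        self.visible_gb = HOST_MEMORY_GB  # what the guest OS believes it has
        self.physical_gb = 0              # what is actually backed

def request_memory(host, guest, gb):
    """Guest must ask the hypervisor before using more physical memory."""
    if host.allocated_gb + gb > HOST_MEMORY_GB:
        # In the real system the heat manager would live-migrate a guest
        # to another physical host instead of refusing.
        return False
    host.allocated_gb += gb
    guest.physical_gb += gb
    return True

host = Host()
db1, db2 = Guest("db1"), Guest("db2")
assert request_memory(host, db1, 200)      # granted: host has room
assert not request_memory(host, db2, 100)  # host full: triggers migration
```

The design trade is that guests never reboot to resize; the cost is that the heat manager must predict demand well enough to migrate instances before hosts overheat.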
What if we exceed the limits of a physical server?
Announcing the launch of Aurora Limitless Database, which scales beyond the limits of a physical server.
Aurora automatically distributes data across shards, and you can configure it to co-locate rows from different tables in the same shard.
Request routing layer - routers need very little information, so they are lightweight.
Each router is an Aurora database, so it can orchestrate complex queries across shards and combine the outputs.
Managing the shards - fully elastic shards.
Each shard runs on Caspian. Once a shard reaches max capacity, it can be split into two new shards seamlessly with the help of Grover, and the routing layer is updated.
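The routing and split steps above can be sketched with a simple hash-based shard map. This is an illustration of the general technique, not Aurora Limitless's actual shard map; the `Router` class is invented for the example.

```python
# Toy request-routing layer for a sharded database.
import hashlib

class Router:
    def __init__(self, num_shards=2):
        self.num_shards = num_shards

    def shard_for(self, shard_key):
        # Rows from *different tables* that share a shard key hash to the
        # same shard, so joins on that key stay local to one shard.
        digest = hashlib.sha256(shard_key.encode()).hexdigest()
        return int(digest, 16) % self.num_shards

    def split(self):
        # Doubling a power-of-two shard count splits each shard i into
        # shards i and i + old_count; no key moves to an unrelated shard.
        self.num_shards *= 2

router = Router()
orders_shard = router.shard_for("customer-42")    # e.g. orders table row
invoices_shard = router.shard_for("customer-42")  # e.g. invoices table row
assert orders_shard == invoices_shard             # co-located
```

In the real system a split also copies the data (via Grover) before the routing layer is updated; the sketch only shows the mapping side.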
How do we achieve distributed transactions across Shards?
Amazon Time Sync now delivers UTC with microsecond accuracy, so distributed transactions can run efficiently.
Nitro chips have custom hardware for time synchronization, based on a time pulse delivered by a custom time synchronization network.
Time sync racks have a specialized reference clock that receives a precise timing signal from a satellite-based atomic clock, accurate to within a few nanoseconds. Each rack also has its own atomic clock to keep time in case the satellite is unavailable.
Specialized Nitro cards in each EC2 server receive the pulse, and all of this is done in hardware.
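Why does accurate time make distributed transactions cheaper? With a known clock-error bound, each shard can stamp commits with its local clock and wait out the uncertainty, so timestamps order transactions globally without extra coordination. The sketch below shows that commit-wait idea (in the style of Spanner's TrueTime); the error bound and function names are assumptions for illustration, not the keynote's numbers.

```python
# Toy commit-wait: stamp a commit, then wait out the clock uncertainty so
# no other host can later assign a smaller timestamp to a later commit.
import time

CLOCK_ERROR_US = 50  # assumed worst-case clock error across hosts (microseconds)

def commit_timestamp():
    """Local clock reading in microseconds since the epoch."""
    return time.time_ns() // 1_000

def commit_wait(ts_us):
    # Busy-wait until the uncertainty window around ts_us has passed.
    target_ns = (ts_us + CLOCK_ERROR_US) * 1_000
    while time.time_ns() < target_ns:
        pass

t1 = commit_timestamp()
commit_wait(t1)          # after this, every host's clock reads > t1
t2 = commit_timestamp()  # a later transaction on any host
assert t2 > t1           # timestamps alone give a global commit order
```

The tighter the clock bound (microseconds with Amazon Time Sync vs. milliseconds with plain NTP), the shorter the wait, which is what makes this practical.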
ElastiCache
An RDBMS is not the only serverless thing Amazon is investing in. Caching is another technology critical for cost-effectively scaling services. Amazon ElastiCache is a managed caching service, but caches are tied to the servers that host them. Now launching: ElastiCache Serverless.
There is no infrastructure to manage. The key feature is speed: lookup latency is half a millisecond, and it can scale up to 5TB of memory.
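The reason a cache scales services cost-effectively is the cache-aside pattern: the application checks the fast cache first and only falls back to the database on a miss. A minimal sketch, with an in-process dict standing in for ElastiCache and invented data:

```python
# Cache-aside: check the cache first, read through to the database on a miss.
database = {"user:1": {"name": "Ada"}}  # stand-in for the primary store
cache = {}                              # stand-in for the remote cache

def get(key):
    if key in cache:
        return cache[key]          # fast path (~0.5 ms in ElastiCache)
    value = database.get(key)      # slow path: database round trip
    if value is not None:
        cache[key] = value         # populate for the next reader
    return value

assert get("user:1") == {"name": "Ada"}  # first read: miss, fills the cache
assert "user:1" in cache                 # subsequent reads hit the cache
```

With a serverless cache, the pattern stays the same but the cache's capacity grows and shrinks with the hot set instead of being sized to fixed servers.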
Redshift Serverless AI driven scaling and optimization
An ML-powered forecasting model is trained on the data warehouse's workload and adjusts capacity based on the anticipated query load.
However, there are always surprises, so there must be an ability to react in time. AI is used again, now to analyze each query. The query analyzer creates a feature embedding of each query using over 50 features - types of joins, dataset statistics, and so on - to better identify the complexity of the query. About 80% of queries on a data warehouse have been seen before, so these embeddings can be used to look up information about the query. For the remaining 20%, there is a second model trained on the customer's own data warehouse. In terms of flow, the first check is the local model, then the global model.
So it is like same query before, similar query before, finally anything like this before.
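That "seen it, similar, anything like it" cascade can be sketched as a lookup chain. Everything here is illustrative: the toy three-number "embedding", the `local_model`/`global_model` stubs, and the complexity labels are not Redshift's actual features or models.

```python
# Toy cascade: seen-query cache -> local model -> global model.
def embed(query):
    """Toy 'embedding': a few coarse features of the SQL text."""
    q = query.lower()
    return (q.count("join"), q.count("where"), len(q))

seen_queries = {}  # embeddings of queries seen before (~80% of traffic)

def local_model(features):
    """Model trained on this warehouse's workload; may not know the query."""
    return None  # unknown in this toy example

def global_model(features):
    """Fallback model trained across workloads; always answers."""
    return "medium"

def estimate_complexity(query):
    features = embed(query)
    if features in seen_queries:            # same/similar query before?
        return seen_queries[features]
    estimate = local_model(features)        # anything like this *here*?
    if estimate is None:
        estimate = global_model(features)   # anything like this anywhere?
    seen_queries[features] = estimate       # remember for next time
    return estimate

label = estimate_complexity("SELECT * FROM sales JOIN dates WHERE year = 2023")
assert label == "medium"
```

The design point is latency: a dictionary hit answers most queries instantly, and the models only run for the long tail.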
Riot Games customer testimonial - Brent Rich, Head of Global Infrastructure and Operations
In 2009, League of Legends launched in a colocated data center, because Riot did not believe anybody else could meet their bar for making live games great.
In 2017, the decision was made to move to the cloud, as it was taking forever to get things done. AWS was chosen because it had all the services they needed. Also, all new things would be born in the cloud.
2020 saw the global launch of Valorant, a tactical shooter with specific design goals. One of them was mitigating peeker's advantage: the player who peeks first gets an edge because of the lag it takes for the event to reach the server and then the other player's client.
Riot Games determined that this could be mitigated with a server tick rate of 128 ticks per second and network latency of 35 milliseconds; 128 means the server sends updates to all players 128 times per second.
With AWS, Riot was able to launch Valorant without any long-term commitments, using AWS Regions and Outposts.
During the pandemic, esports came to a standstill. How do you produce and encode events without cramming all the staff into trucks? AWS enabled this in the cloud, so the events and the production were separated. All this was done in 11 days.
Outro
Comments, Corrections & Updates are appreciated.