AWS case studies


Amazon Web Services (AWS)

AWS (Amazon Web Services) is a comprehensive, evolving cloud computing platform provided by Amazon that includes a mixture of infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS) offerings. AWS services can offer an organization tools such as compute power, database storage and content delivery services.


AWS launched in 2006 from the internal infrastructure that Amazon.com built to handle its online retail operations. AWS was one of the first companies to introduce a pay-as-you-go cloud computing model that scales to provide users with compute, storage or throughput as needed.

AWS offers many different tools and solutions for enterprises and software developers that can be used in data centers in up to 190 countries. Groups such as government agencies, education institutions, nonprofits and private organizations can use AWS services.

How AWS works

AWS is separated into different services; each can be configured in different ways based on the user's needs. Users should be able to see configuration options and individual server maps for an AWS service.

More than 100 services comprise the Amazon Web Services portfolio, including those for compute, databases, infrastructure management, application development and security.


Case study of Netflix

Netflix on AWS


Netflix is the world’s leading internet television network, with more than 100 million members in more than 190 countries enjoying 125 million hours of TV shows and movies each day. Netflix uses AWS for nearly all its computing and storage needs, including databases, analytics, recommendation engines, video transcoding, and more—hundreds of functions that in total use more than 100,000 server instances on AWS.

Online content provider Netflix can support seamless global service by using Amazon Web Services (AWS). AWS enables Netflix to quickly deploy thousands of servers and terabytes of storage within minutes. Users can stream Netflix shows and movies from anywhere in the world, including on the web, on tablets, or on mobile devices such as iPhones.



How does Netflix operate on AWS?

Netflix started moving to AWS after a major database corruption in 2008 halted their DVD shipments for three days. By early 2016 their cloud migration was complete and, thanks to AWS, the scale they have achieved has been outstanding.

Josh Evans, Director of Operations Engineering at Netflix, described Netflix’s microservices architecture as a living organism, with critical components, internal flows, and failures. The infrastructure is composed of hundreds of fully decoupled, independent microservices, with thousands of production changes every day across many thousands of AWS instances.

Josh identifies two main challenges to achieving operational excellence:

Product innovation

To offer the best user experience, and therefore win their customers’ “moments of truth” (i.e., get them to watch more video content), Netflix has to move and change fast.

Their innovation strategy relies on massive use of A/B tests on every facet of the product. During the last year, they ran more than 1,400 experiments, with at least 25 running in parallel on any given day. The goal is to increase user engagement, which explains why each user’s Netflix experience is effectively unique: it combines customized recommendations with that user’s particular combination of experiments.
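Running dozens of experiments at once requires a consistent way to decide which cell of each test a member lands in. The talk does not describe Netflix’s allocation system; the sketch below shows one common technique, hashing the member ID together with the experiment name, so that a member always gets the same cell for a given test while different tests split members independently. The experiment names and cell counts are hypothetical.

```python
import hashlib
from typing import Dict


def bucket(user_id: str, experiment: str, num_cells: int) -> int:
    """Deterministically map a user to one cell (0..num_cells-1) of an experiment."""
    # Hashing experiment and user together gives every experiment its own,
    # independent split, while a given user always lands in the same cell.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def allocations(user_id: str, experiments: Dict[str, int]) -> Dict[str, int]:
    """Return the cell the user falls into for every active experiment."""
    return {name: bucket(user_id, name, cells) for name, cells in experiments.items()}


# Hypothetical active experiments: name -> number of cells (cell 0 = control).
ACTIVE = {"artwork-test": 4, "row-ordering": 3, "autoplay-preview": 2}

if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        # Each user gets a stable, and usually distinct, combination of cells.
        print(user, allocations(user, ACTIVE))
```

Because the assignment is a pure function of its inputs, no per-user allocation table has to be stored, and adding a new experiment does not reshuffle members in existing ones.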

Scale and complexity

At the time of the talk, Netflix handled hundreds of thousands of requests per second from users in about 60 countries. Their infrastructure runs multi-zone and multi-region, serving users from three different AWS regions. The only component running outside of AWS is Netflix’s own CDN, Open Connect, which carries about 37% of US Internet traffic.

Operations Engineering

Achieving operational excellence also involves a tough tradeoff between availability and rate of change (i.e., quality versus speed). Netflix is willing to trade some availability for faster change, and they approach the problem through continuous improvement in the management, design, and function of their operational environments. This approach leads to greater quality, velocity, and competitive advantage.

The culture behind this choice can be summarized as “You build it, you run it”. It means 100% ownership, from designing, coding, building, testing, and deploying all the way to operating, configuring, monitoring, and responding (while doing it all globally!). They built their own software tools to enable this approach, such as Spinnaker, Eureka, Hystrix, Atlas, and Vector.

These tools are based on software engineering standards and advanced technologies:

  • Anomaly detection: to identify anomalous patterns on short windows of time series events.
  • Outlier detection and remediation: via unsupervised machine learning and clustering techniques.
  • Canary release process: new versions of the software are available to a small percentage of the traffic, with automatic canary analysis.
  • Unsupervised monitoring and decision making: take humans out of the equation and provide automatic alerts.
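The canary bullet above can be illustrated with a deliberately simplified check. Netflix’s automated canary analysis compares many metrics statistically; this sketch compares a single metric, the error rate, between the canary and the baseline fleet. The thresholds (`max_ratio`, `min_requests`, the error-rate floor) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Metrics, canary: Metrics,
                   max_ratio: float = 1.5, min_requests: int = 1000) -> str:
    """Compare a canary's error rate with the baseline fleet's.

    Returns "inconclusive" when the canary has seen too little traffic,
    "fail" when its error rate exceeds max_ratio times the baseline's,
    and "pass" otherwise.
    """
    if canary.requests < min_requests:
        return "inconclusive"
    # Floor the baseline rate so a perfectly clean baseline does not make
    # a single canary error fatal.
    allowed = max(baseline.error_rate, 1e-4) * max_ratio
    return "fail" if canary.error_rate > allowed else "pass"


if __name__ == "__main__":
    baseline = Metrics(requests=100_000, errors=50)                 # 0.05% errors
    print(canary_verdict(baseline, Metrics(5_000, 3)))   # pass: 0.06% <= 0.075%
    print(canary_verdict(baseline, Metrics(5_000, 10)))  # fail: 0.2% > 0.075%
    print(canary_verdict(baseline, Metrics(500, 0)))     # inconclusive: too little traffic
```

A failing verdict would typically stop the rollout automatically, which is what lets the small slice of canary traffic act as an early warning.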

Chaos Engineering

Another important component of Netflix’s approach is chaos engineering. Knowing that components are going to fail, they work hard to build confidence in the system’s ability to withstand turbulent conditions, directly in production. Their Simian Army is available on GitHub. Using FIT (Failure Injection Testing), they can simulate service failures at both the instance and region level.
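The idea behind request-scoped fault injection can be shown with a toy example: a request carries a list of services that should fail for that request only, each service checks the list before doing real work, and the caller must degrade gracefully instead of failing the whole page. This is not Netflix’s FIT API; the decorator, the service names, and the fallback row are all hypothetical.

```python
import functools
from dataclasses import dataclass, field
from typing import Callable, List, Set


class InjectedFailure(Exception):
    """Raised when a request's context asks for a simulated failure."""


@dataclass
class Request:
    user_id: str
    # Services that should fail for this request only (the injection scope).
    inject_failures: Set[str] = field(default_factory=set)


def guarded(service_name: str) -> Callable:
    """Decorator: raise InjectedFailure if the request targets this service."""
    def wrap(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def inner(request: Request, *args, **kwargs):
            if service_name in request.inject_failures:
                raise InjectedFailure(service_name)
            return fn(request, *args, **kwargs)
        return inner
    return wrap


@guarded("recommendations")
def personalized_rows(request: Request) -> List[str]:
    # Stand-in for a call to a real recommendation service.
    return [f"Because you watched X ({request.user_id})"]


def home_page(request: Request) -> List[str]:
    """Caller with a fallback: degrade to a generic row instead of failing."""
    try:
        return personalized_rows(request)
    except InjectedFailure:
        return ["Popular on Netflix"]


if __name__ == "__main__":
    print(home_page(Request("u1")))                       # normal path
    print(home_page(Request("u1", {"recommendations"})))  # injected failure, fallback used
```

Scoping the failure to tagged requests means an experiment can start with a handful of test accounts and widen gradually, rather than breaking a dependency for every member at once.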

Netflix Keystone

Peter Bakas, Director of Engineering at Netflix, after proudly taking a picture of the crowd, explained how Netflix handles data streams of up to 8 million events per second.


Keystone handles about 550 billion events every day (an average of roughly 6.4 million events per second, with peaks above 8 million) and processes more than one petabyte of data per day, spread across hundreds of event types. The data pipeline is built on open source projects such as Apache Kafka, Apache Chukwa, and Apache Samza, as well as Docker and MySQL.
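Converting the daily totals into per-second rates shows how the daily and per-second figures relate: 550 billion events a day averages out to well under 8 million per second, so the 8 million figure describes peak load rather than the average. The snippet below is only this arithmetic; the one-petabyte input is the lower bound stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


avg_events = average_rate(550e9)
print(f"average: {avg_events / 1e6:.2f} million events/s")      # average: 6.37 million events/s
print(f"8M/s peak is {8e6 / avg_events:.2f}x the average")      # 8M/s peak is 1.26x the average
print(f"1 PB/day is about {average_rate(1e15) / 1e9:.1f} GB/s")  # 1 PB/day is about 11.6 GB/s
```

Capacity has to be provisioned for the peak, not the average, which is one reason elastic infrastructure matters for a pipeline like this.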

Netflix Core Team

Dave Hahn talked about what it feels like, and how it is possible, for a few DevOps engineers to run a service that accounts for more than 37% of US Internet traffic. His team, the CORE team (Cloud Operations Reliability Engineering), is responsible for crisis management, availability reporting, reliability best practices, the AWS relationship, and operations education. It is mainly composed of crisis leaders, and its goals are the following:

  • Protect customer experience. This is crucial at Netflix and is the key point of each operation.
  • Make failures unique. This means letting each failure happen only once, by identifying the real root cause of each problem and fixing it.
  • Achieve constant improvement. This takes a lot of individual effort and can be helped along by incident reviews and by encouraging honest and open feedback.

Dave described the DevOps culture they have built on the 100% ownership concept, made easier by the many tools that let software engineers own their services end to end, including service discovery, solid communication, automated recovery, continuous deployment, and data persistence.

Insights are a key factor as well: Netflix records about 2.5 billion metrics every day and needed in-house tools to help them visualize and analyze relevant patterns, via prediction and automation.

Netflix & Amazon Kinesis Data Streams Case Study


Application Monitoring on a Massive Scale

Netflix uses Amazon Web Services (AWS) for nearly all its computing and storage needs, including databases, analytics, recommendation engines, video transcoding, and more—hundreds of functions that in total use more than 100,000 server instances on AWS.

This results in an extremely complex and dynamic networking environment where applications are constantly communicating inside AWS and across the Internet. Monitoring and optimizing its network is critical for Netflix to continue improving customer experience, increasing efficiency, and reducing costs. In particular, Netflix needed a solution for ingesting, augmenting, and analyzing the multiple terabytes of data its network generates daily in the form of virtual private cloud (VPC) flow logs. This would enable Netflix to identify performance-improvement opportunities, such as finding apps that are communicating across regions and colocating them. The company would also be able to increase uptime by quickly detecting and mitigating application downtime.

Each log record carries information about the communications between two IP addresses. However, in a dynamic environment like the one at Netflix, where an IP address can float between applications from day to day or even minute to minute, IP addresses alone don’t have much meaning. “The data sources we had before we took on this initiative were one sided,” says John Bennett, senior software engineer at Netflix. “We’d know an application was connecting to others, but we didn’t know both sides of the conversation and how to optimize those communications or the placement of the applications on the network.”

Netflix set out to establish a new data source that could give it more insight into communication among applications and regions by combining VPC flow logs with application metadata.

Centralizing Flow Logs Using Amazon Kinesis Data Streams

From the outset, AWS enabled Netflix to experiment with different approaches to analyzing its network data. “Early in the design process, the flexibility to try different ways of processing the data was important,” says Bennett. “We experimented with multiple designs and used many AWS products to get here.”

The solution Netflix ultimately deployed, known internally as Dredge, centralizes flow logs using Amazon Kinesis Data Streams. The application reads the data from Amazon Kinesis Data Streams in real time and enriches IP addresses with application metadata to provide a full picture of the networking environment. “Usually, we would put the data into a database, which would build an index to enable faster querying,” says Bennett. “Dredge joins the flow logs with application metadata as it streams and indexes it without using a database, which eliminates a lot of the complexity.”
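The case study does not describe Dredge’s internals beyond joining flow logs with application metadata as they stream. The sketch below shows that join for a single record, assuming the default version 2 VPC flow log format and a hypothetical in-memory table recording which application held each IP address during which time range. Because IPs float between applications, the lookup uses the record’s own timestamp rather than the current owner of the address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Field order of the default (version 2) VPC flow log record format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status")


@dataclass(frozen=True)
class Assignment:
    """Hypothetical metadata: which app held an IP during [start, end), epoch seconds."""
    app: str
    region: str
    start: int
    end: int


def lookup(ip: str, ts: int, table: Dict[str, List[Assignment]]) -> Optional[Assignment]:
    # Time-aware lookup: the same IP may belong to different apps over the day.
    for a in table.get(ip, []):
        if a.start <= ts < a.end:
            return a
    return None


def enrich(line: str, table: Dict[str, List[Assignment]]) -> Optional[dict]:
    """Parse one flow log line and tag both endpoints with app and region."""
    rec = dict(zip(FIELDS, line.split()))
    if rec.get("log_status") != "OK":
        return None  # NODATA / SKIPDATA records carry "-" instead of addresses
    ts = int(rec["start"])
    for side in ("src", "dst"):
        hit = lookup(rec[side + "addr"], ts, table)
        rec[side + "_app"] = hit.app if hit else "unknown"
        rec[side + "_region"] = hit.region if hit else "unknown"
    return rec


if __name__ == "__main__":
    table = {
        "10.0.1.5": [Assignment("api-gateway", "us-east-1", 0, 2000)],
        "10.0.2.9": [Assignment("playback", "us-west-2", 0, 1000),
                     Assignment("search", "us-west-2", 1000, 2000)],
    }
    line = "2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 49152 443 6 10 8400 1500 1560 ACCEPT OK"
    print(enrich(line, table))  # dst resolves to "search", which held 10.0.2.9 at t=1500
```

In a streaming deployment this function would run on each record read from the stream, with the assignment table kept up to date from a separate metadata feed.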

The enriched data lands in Druid, an open-source analytics data store. Netflix uses Druid’s OLAP querying to quickly slice the data by region, availability zone, and time window, then visualizes it to gain insight into how the network is behaving and performing.
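Druid performs this kind of slicing natively, for example with a group-by query at a chosen time granularity. Purely to illustrate what such a roll-up computes, here is the same aggregation in plain Python; it is not Druid’s API, and the record fields (`region`, `az`, `start`, `bytes`) are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rollup(records: Iterable[dict], window_seconds: int = 60) -> Dict[Tuple[str, str, int], int]:
    """Sum bytes per (region, availability zone, time window).

    Each record is a dict with 'region', 'az', 'start' (epoch seconds) and 'bytes'.
    """
    totals: Dict[Tuple[str, str, int], int] = defaultdict(int)
    for r in records:
        window = r["start"] - r["start"] % window_seconds  # floor to the window boundary
        totals[(r["region"], r["az"], window)] += r["bytes"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"region": "us-east-1", "az": "us-east-1a", "start": 65, "bytes": 100},
        {"region": "us-east-1", "az": "us-east-1a", "start": 110, "bytes": 50},
        {"region": "eu-west-1", "az": "eu-west-1b", "start": 130, "bytes": 70},
    ]
    print(rollup(sample))
    # {('us-east-1', 'us-east-1a', 60): 150, ('eu-west-1', 'eu-west-1b', 120): 70}
```

Pre-aggregating along the dimensions people actually query is what keeps interactive queries fast even when the raw input is billions of flows per day.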

AWS was the logical choice for Dredge in part because the data was already resident in the AWS Cloud. “It would have been daunting to publish, stream, and consume that much information from an external system such as Kafka,” says Bennett. “It took just a few API calls to centralize multiple terabytes of flow logs into Amazon Kinesis Data Streams. Now we can focus on getting insights from the data rather than simply getting access to it.”
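The case study does not say which API calls Netflix used. One common path is VPC flow logs delivered to a CloudWatch Logs group, with a subscription filter forwarding every event to a Kinesis data stream. The sketch below takes boto3 clients as parameters and assumes the IAM roles and the stream ARN already exist; all names and ARNs are hypothetical placeholders.

```python
def centralize_flow_logs(ec2, logs, kinesis, *, vpc_id: str, stream_name: str,
                         stream_arn: str, shard_count: int, log_group: str,
                         flow_logs_role_arn: str, cwl_to_kinesis_role_arn: str) -> None:
    """Route one VPC's flow logs into a Kinesis data stream.

    Path: VPC flow logs -> CloudWatch Logs group -> subscription filter -> Kinesis.
    ec2, logs and kinesis are boto3 clients (or test doubles with the same methods).
    """
    kinesis.create_stream(StreamName=stream_name, ShardCount=shard_count)
    # The subscription filter needs an ACTIVE stream, so wait for it first.
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)
    logs.create_log_group(logGroupName=log_group)
    ec2.create_flow_logs(
        ResourceIds=[vpc_id], ResourceType="VPC", TrafficType="ALL",
        LogGroupName=log_group, DeliverLogsPermissionArn=flow_logs_role_arn,
    )
    logs.put_subscription_filter(
        logGroupName=log_group, filterName="all-flow-logs",
        filterPattern="",  # an empty pattern forwards every log event
        destinationArn=stream_arn, roleArn=cwl_to_kinesis_role_arn,
    )


# Example call (requires AWS credentials; every identifier here is a placeholder):
#   import boto3
#   centralize_flow_logs(boto3.client("ec2"), boto3.client("logs"), boto3.client("kinesis"),
#                        vpc_id="vpc-0123", stream_name="flow-logs",
#                        stream_arn="arn:aws:kinesis:...", shard_count=4,
#                        log_group="vpc-flow-logs", flow_logs_role_arn="arn:aws:iam::...",
#                        cwl_to_kinesis_role_arn="arn:aws:iam::...")
```

Passing the clients in, rather than creating them inside the function, keeps the wiring testable without touching a real account.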


The scalability of Amazon Kinesis Data Streams was a good fit for the Dredge application because of the cyclical and elastic nature of network usage at Netflix. “When it comes to our networking data, it’s more cost efficient to be able to scale up and down, which is not as easy to do with alternatives to Amazon Kinesis Data Streams,” says Bennett. 

Improving Customer Experience with Real-Time Network Monitoring

Netflix’s Amazon Kinesis Data Streams-based solution has proven to be highly scalable, processing billions of traffic flows each day. Typically, about 1,000 Amazon Kinesis shards work in parallel to process the data stream. “Amazon Kinesis Data Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds,” says Bennett. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
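How many shards a stream needs follows from per-shard write limits: for a provisioned Kinesis data stream, each shard accepts roughly 1 MB/s or 1,000 records/s of writes, whichever is hit first. The case study gives only “multiple terabytes per day” and “about 1,000 shards”, so the inputs and the headroom factor below are hypothetical; treat this as a back-of-the-envelope sketch and check current AWS quotas before relying on the constants.

```python
import math

# Approximate per-shard write limits for a provisioned Kinesis data stream.
MAX_BYTES_PER_SHARD = 1_000_000  # about 1 MB/s of writes
MAX_RECORDS_PER_SHARD = 1_000    # 1,000 records/s of writes
SECONDS_PER_DAY = 86_400


def shards_needed(bytes_per_day: float, records_per_day: float, headroom: float = 2.0) -> int:
    """Estimate shards for a steady daily volume, scaled by headroom for peaks."""
    by_bytes = bytes_per_day / SECONDS_PER_DAY / MAX_BYTES_PER_SHARD
    by_records = records_per_day / SECONDS_PER_DAY / MAX_RECORDS_PER_SHARD
    # Whichever limit binds first determines the shard count.
    return math.ceil(max(by_bytes, by_records) * headroom)


if __name__ == "__main__":
    # Hypothetical load: 10 TB and 5 billion records per day.
    print(shards_needed(10e12, 5e9))  # 232 (bytes are the binding limit: ~115.7 MB/s x 2)
```

Because network traffic is cyclical, the shard count can also be adjusted over the day with the Kinesis `UpdateShardCount` operation (`update_shard_count` in boto3), which is the kind of scaling up and down Bennett describes.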

Netflix is now able to identify new ways to optimize its applications, whether that means moving an application from one region to another or changing to a more appropriate network protocol for a specific type of traffic. “Our solution built on Amazon Kinesis enables us to identify ways to increase efficiency, reduce costs, and improve resiliency for the best customer experience,” says Bennett.
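One optimization the text mentions, moving an application to another region, starts from finding which application pairs exchange the most traffic across regions. The sketch below ranks such pairs from already-enriched flows; the tuple layout and application names are hypothetical, and direction is ignored so that A-to-B and B-to-A traffic count toward the same pair.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Flow = Tuple[str, str, str, str, int]  # (src_app, src_region, dst_app, dst_region, bytes)


def cross_region_pairs(flows: Iterable[Flow], top: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Rank application pairs by bytes exchanged across regions.

    Pairs at the top of the list are candidates for colocating in one region.
    """
    totals: Counter = Counter()
    for src_app, src_region, dst_app, dst_region, nbytes in flows:
        if src_region != dst_region:
            pair = tuple(sorted((src_app, dst_app)))  # treat A->B and B->A as one pair
            totals[pair] += nbytes
    return totals.most_common(top)


if __name__ == "__main__":
    flows = [
        ("api", "us-east-1", "playback", "us-west-2", 500),
        ("playback", "us-west-2", "api", "us-east-1", 300),
        ("search", "us-east-1", "api", "us-east-1", 900),  # same region, ignored
        ("ratings", "eu-west-1", "api", "us-east-1", 100),
    ]
    print(cross_region_pairs(flows))
    # [(('api', 'playback'), 800), (('api', 'ratings'), 100)]
```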

Although a streaming data solution is not new to the IT industry, it is an innovation in the networking space. “Netflix is heavily invested in AWS in part because it abstracts the underlying network, so we don’t have to deal with switches and routers,” says Bennett. “We’re monitoring, analyzing, and optimizing at a higher level of the stack—in ways we would never even consider if we were running our own data centers.”


Benefits of AWS

  • Processes and enriches multiple terabytes each day, representing billions of events, with sub-second response times for analytics queries
  • Highly cost efficient compared to competing solutions
  • Freedom to experiment with system architecture to arrive at the most effective solution
  • Data ingestion initiated with just a few simple API calls
  • Highly elastic solution with close to 1,000 Amazon Kinesis shards working in parallel

Case study of Verizon

Verizon Teams Up With AWS to Deliver 5G at the Edge

Verizon Communications Inc. is a leading telecommunications company and mobile carrier, with the largest subscriber base in the US. At re:Invent 2019, Amazon Web Services (AWS) and Verizon announced a partnership, built on AWS Wavelength, that will bring the power of the cloud closer to mobile and connected devices at the edge of Verizon’s 5G Ultra Wideband network. In a video about the partnership, Srinivasa Kalapala, Verizon's vice president of technology strategy development, discusses the company's collaboration with AWS. “When you’re trying to change the world, you really want to bring the best of the network and the best of the cloud together,” Kalapala says.

Amazon Web Services (AWS) is powering the future of telecommunications. Leading communications service providers (CSPs) run more workloads on AWS than on any other cloud provider. By partnering with AWS, CSPs not only accelerate their data center consolidation and migration to the cloud, but also monetize their path to 5G by offering customers next-generation capabilities in mobile edge computing and IoT. With a rich catalog of cloud-deployable partner solutions, AWS enhances the customer experience through machine learning and AI and accelerates business process automation to drive operational efficiency. Now, more than ever, CSPs can leverage the most advanced technologies on the market to create new revenue streams and focus on what sets their business apart.

Benefits

Accelerate digital transformation and data center consolidation

Accelerate digital transformation and data center consolidation to optimize performance, lower IT costs, strengthen security posture, and free up investment capital.

Monetize the path to 5G with mobile edge computing and IoT

Leverage 5G and mobile edge computing to bring next-generation capabilities to smart devices and networks, enabling monetization of various IoT applications.

Enhance the customer experience with machine learning and AI

Improve the customer experience by providing exceptional service and personalized customer care enabled by machine learning/AI and predictive analytics.

Automate business processes to drive efficiency

Engage a rich catalog of technology partner solutions to automate critical business processes and drive operational efficiency.

Thank you :)
