Case Study of Kubernetes

Ganesh Chaudhari

Software Engineer at Dassault Systèmes || Frontend Engineer

Published: December 26, 2020

What is Kubernetes?

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem, and Kubernetes services, support, and tools are widely available. The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes combines over 15 years of Google's experience running production workloads at scale with best-of-breed ideas and practices from the community.

Spotify: An Early Adopter of Containers, Spotify Is Migrating from Homegrown Orchestration to Kubernetes

Challenge

Launched in 2008, the audio-streaming platform has grown to over 200 million monthly active users across the world. "Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today, and hopefully the consumers we'll have in the future," says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations. An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs with a homegrown container orchestration system called Helios. By late 2017, it became clear that "having a small team working on the features was just not as efficient as adopting something that was supported by a much bigger community," he says.

Solution

"We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that," says Chakrabarti. Kubernetes was more feature-rich than Helios. Plus, "we wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community. The migration, which would happen in parallel with Helios running, could go smoothly because "Kubernetes fit very nicely as a complement and now as a replacement to Helios," says Chakrabarti.

Impact

The team spent much of 2018 addressing the core technology issues required for a migration, which started late that year and is a big focus for 2019. "A small percentage of our fleet has been migrated to Kubernetes, and some of the things that we've heard from our internal teams are that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes about 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Site Reliability Engineer James Wen. Plus, he adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes's bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.

An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs since 2014. The company used an open-source, homegrown container orchestration system called Helios, and in 2016-17 completed a migration from on-premises data centers to Google Cloud. Underpinning these decisions, "We have a culture around autonomous teams, over 200 autonomous engineering squads who are working on different pieces of the pie, and they need to be able to iterate quickly," Chakrabarti says. "So for us to have developer velocity tools that allow squads to move quickly is really important."

But by late 2017, it became clear that "having a small team working on the Helios features was just not as efficient as adopting something that was supported by a much bigger community," says Chakrabarti. "We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools." At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community.

Another plus: "Kubernetes fit very nicely as a complement and now as a replacement to Helios, so we could have it running alongside Helios to mitigate the risks," says Chakrabarti. "During the migration, the services run on both, so we're not having to put all of our eggs in one basket until we can validate Kubernetes under a variety of load circumstances and stress circumstances."

The team spent much of 2018 addressing the core technology issues required for the migration. "We were able to use a lot of the Kubernetes APIs and extensibility features of Kubernetes to support and interface with our legacy infrastructure, so the integration was straightforward and easy," says Site Reliability Engineer James Wen.

Migration started late that year and has accelerated in 2019. "Our focus is really on stateless services, and once we address our last remaining technology blocker, that's where we hope that the uptick will come from," says Chakrabarti. "For stateful services there's more work that we need to do."

A small percentage of Spotify's fleet, containing over 150 services, has been migrated to Kubernetes so far. "We've heard from our customers that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify," says Chakrabarti. The biggest service currently running on Kubernetes takes over 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Wen. Plus, Wen adds, "Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes." In addition, with Kubernetes's bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.
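To give a feel for what autoscaling does in practice, here is a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation. The case study does not say which autoscaler or metrics Spotify uses, so the function name and all numbers below are hypothetical, and the sketch leaves out the tolerance band and the minimum/maximum replica bounds that a real autoscaler also applies.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Return ceil(current_replicas * current_metric / target_metric).

    Simplified illustration only: no tolerance band, no min/max bounds.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical numbers: 4 pods averaging 90% CPU against a 60% target.
scaled_up = desired_replicas(4, 90.0, 60.0)

# Hypothetical numbers: 10 pods averaging 30% CPU against a 60% target.
scaled_down = desired_replicas(10, 30.0, 60.0)

print(scaled_up, scaled_down)
```

Because the replica count follows the measured load instead of a fixed estimate, teams do not have to provision capacity by hand for peak traffic, which is consistent with the reduced manual capacity provisioning described above.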

Both of those technologies are in early stages of adoption, but already "we have reason to believe that gRPC will have a more drastic impact during early development by helping with a lot of issues like schema management, API design, weird backward compatibility issues, things like that," says Zolotusky. "So we're leaning heavily on gRPC to help us in that space."

As the team continues to fill out Spotify's cloud native stack—tracing is up next—it is using the CNCF landscape as a helpful guide. "We look at things we need to solve, and if there are a bunch of projects, we evaluate them equivalently, but there is definitely value to the project being a CNCF project," says Zolotusky.

Spotify's experiences so far with Kubernetes bear this out. "The community has been extremely helpful in getting us to work through all the technology much faster and much easier," Zolotusky says. "It's been surprisingly easy to get in touch with anybody we wanted to, to get expertise on any of the things we're working with. And it's helped us validate all the things we're doing."
