How OpenAI Leverages Kubernetes...
Ankit Kumar
Platform Engineer @ Brevo | Kubernetes | Python | Linux | Cloud | RHCE | RHCSA
Okay, so we've all heard this term a lot, right? Now it's time to take a closer look at it. So, let's get started.
What is Kubernetes?
In their own words:
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.
-Source: kubernetes.io
Kubernetes, also known as k8s or kube, was initially developed by engineers at Google before being open sourced in 2014. It is now maintained by the Cloud Native Computing Foundation. It is a descendant of Borg, a container orchestration platform used internally at Google.
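A quick sketch helps here. The manifest below is a minimal, hypothetical Kubernetes Deployment (the nginx-demo name and nginx:1.25 image are just placeholders): you declare the desired state, such as three replicas of a container image, and Kubernetes continuously works to make the cluster match it.

```yaml
# A minimal, hypothetical Deployment manifest.
# You declare the desired state; Kubernetes reconciles
# the cluster to match it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo          # placeholder name
spec:
  replicas: 3               # desired state: three identical Pods
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
        - name: nginx
          image: nginx:1.25  # placeholder image
          ports:
            - containerPort: 80
```

You would apply this with kubectl apply -f deployment.yaml; if a Pod crashes or a node dies, Kubernetes replaces the missing Pods to restore the declared replica count.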
Why Kubernetes?
In today's world, users expect their web-based applications to be available around the clock, i.e. 24/7. The problem arises when developers have to deploy newer versions of those applications with bug fixes or newly added features, often quite frequently. Container technology tackles part of this problem, but managing containerized applications is still a tedious task.
This is where Kubernetes comes into the picture. In simple terms, it helps you manage containerized applications by ensuring that the resources they need to run are available.
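To make that concrete, here is a minimal, hypothetical Pod manifest (all names and numbers are illustrative, not from the case study). The resource requests tell the scheduler how much CPU and memory to reserve on a node, and the liveness probe lets Kubernetes restart the container automatically if it stops responding:

```yaml
# Hypothetical Pod manifest: resource requests tell the scheduler
# what the app needs; the liveness probe lets Kubernetes restart
# the container automatically if it becomes unhealthy.
apiVersion: v1
kind: Pod
metadata:
  name: web-demo               # placeholder name
spec:
  containers:
    - name: web
      image: my-app:1.0        # placeholder image
      resources:
        requests:
          cpu: "250m"          # reserve a quarter of a CPU core
          memory: "128Mi"      # reserve 128 MiB of memory
        limits:
          cpu: "500m"          # hard ceiling for the container
          memory: "256Mi"
      livenessProbe:           # restart the container if this check fails
        httpGet:
          path: /healthz       # assumed health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
```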
Let's look at an industry use case of Kubernetes that will give you better insight.
Case Study: OpenAI
OpenAI is an AI research and deployment company, governed by the board of OpenAI Nonprofit. Their stated goal is:
"To ensure that artificial general intelligence benefits all of the humanity."
Problem statement
OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.
Solution
OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity.
"We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration," says Christopher Berner, Head of Infrastructure.
The company has benefited from greater portability:
"Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner.
Being able to use its own data centers when appropriate lowers costs and gives OpenAI access to hardware it wouldn't necessarily have in the cloud. As long as utilization is high, the costs there are much lower. Launching experiments also takes far less time:
"One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."
Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI's experiments take advantage of Kubernetes' benefits, including portability.
"Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner.
That's all...
Thanks for reading...
Comment your thoughts on Kubernetes down below...
Till then... see you soon!