How OpenAI Leverages Kubernetes...
Ankit Kumar
Platform Engineer @ Brevo | Kubernetes | Python | Linux | Cloud | RHCE | RHCSA
Okay, so we've all heard this term a lot, right? Now it's time to take a closer look at it. So, let's get started.
What is Kubernetes?
In their own words:
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.
-Source: kubernetes.io
Kubernetes, also known as k8s or kube, was initially developed by engineers at Google before being open sourced in 2014. It is now maintained by the Cloud Native Computing Foundation. It is a descendant of Borg, a container orchestration platform used internally at Google.
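A quick sketch helps here. The manifest below is a minimal, hypothetical Kubernetes Deployment (the nginx-demo name and nginx:1.25 image are just placeholders): you declare the desired state, such as three replicas of a container image, and Kubernetes continuously works to make the cluster match it.

```yaml
# A minimal, hypothetical Deployment manifest.
# You declare the desired state; Kubernetes reconciles
# the cluster to match it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo          # placeholder name
spec:
  replicas: 3               # desired state: three identical Pods
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
        - name: nginx
          image: nginx:1.25  # placeholder image
          ports:
            - containerPort: 80
```

You would apply this with kubectl apply -f deployment.yaml; if a Pod crashes or a node dies, Kubernetes replaces the missing Pods to restore the declared replica count.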
Why Kubernetes?
In today's world, users expect their web-based applications to be available around the clock, i.e. 24/7. The problem arises when developers have to deploy newer versions of those applications with bug fixes or newly added features, often quite frequently. Container technology tackles part of this problem, but managing containerized applications is still a tedious task.
This is where Kubernetes comes into the picture. In simple terms, it helps you manage containerized applications by ensuring that the resources they need to run are available.
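To make that concrete, here is a minimal, hypothetical Pod manifest (all names and numbers are illustrative, not from the case study). The resource requests tell the scheduler how much CPU and memory to reserve on a node, and the liveness probe lets Kubernetes restart the container automatically if it stops responding:

```yaml
# Hypothetical Pod manifest: resource requests tell the scheduler
# what the app needs; the liveness probe lets Kubernetes restart
# the container automatically if it becomes unhealthy.
apiVersion: v1
kind: Pod
metadata:
  name: web-demo               # placeholder name
spec:
  containers:
    - name: web
      image: my-app:1.0        # placeholder image
      resources:
        requests:
          cpu: "250m"          # reserve a quarter of a CPU core
          memory: "128Mi"      # reserve 128 MiB of memory
        limits:
          cpu: "500m"          # hard ceiling for the container
          memory: "256Mi"
      livenessProbe:           # restart the container if this check fails
        httpGet:
          path: /healthz       # assumed health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
```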
Let's look at an industry use case of Kubernetes that will give you better insight.
Case Study: OpenAI
OpenAI is an AI research and deployment company, governed by the board of OpenAI Nonprofit. Their stated goal is:
"To ensure that artificial general intelligence benefits all of the humanity."
Problem statement
OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.
Solution
OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity.
"We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration," says Christopher Berner, Head of Infrastructure.
The company has benefited from greater portability:
"Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner.
Being able to use its own data centers when appropriate lowers costs and gives OpenAI access to hardware it wouldn't necessarily have in the cloud. As long as utilization is high, the costs there are much lower. Launching experiments also takes far less time:
"One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work."
Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI's experiments take advantage of Kubernetes' benefits, including portability.
"Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters," says Berner.
That's all...
Thanks for reading...
Comment your thoughts on Kubernetes down below...
Till then... see you soon!