EMR on EKS by Example
Photo Credit: AWS News Blog

EMR on EKS provides a deployment option for Amazon EMR that allows you to automate the provisioning and management of open-source big data frameworks on Amazon EKS. While a wide range of open-source big data components are available in EMR on EC2, only Apache Spark is available in EMR on EKS. It is more flexible, however, in that applications of different EMR versions can run across multiple availability zones on either EC2 or Fargate. Other types of containerized applications can also be deployed on the same EKS cluster. Therefore, if you have or plan to have, for example, Apache Airflow, Apache Superset or Kubeflow in your analytics toolkit, it can be an effective way to manage big data (as well as non-big data) workloads. While Glue is geared more toward ETL, EMR on EKS can also be used for other types of tasks such as machine learning. Moreover, it allows you to build a plain Spark application, not a Gluish Spark application. For example, while you have to use custom connectors for Hudi or Iceberg with Glue, you can use their native libraries with EMR on EKS. In this post, we'll discuss EMR on EKS with simple and more elaborate examples.

Set up Amazon EMR on EKS

As described in the Amazon EMR on EKS development guide, Amazon EKS uses Kubernetes namespaces to divide cluster resources between multiple users and applications. A virtual cluster is a Kubernetes namespace that Amazon EMR is registered with. Amazon EMR uses virtual clusters to run jobs and host endpoints. As illustrated further below, the following steps are needed to set up EMR on EKS.

  • Enable cluster access for Amazon EMR on EKS
  • Create an IAM OIDC identity provider for the EKS cluster
  • Create a job execution role
  • Update the trust policy of the job execution role
  • Register Amazon EKS Cluster with Amazon EMR
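
To make the five steps above more concrete, here is a minimal sketch that assembles one CLI invocation per step using eksctl and the AWS CLI. It only builds and prints the commands; it does not run them. The function name and all values (cluster, namespace, role, virtual cluster and trust policy file names) are hypothetical placeholders, and the sketch assumes that the Kubernetes namespace already exists and that the trust policy file initially lets the EMR service assume the role. The post itself may set things up differently.

```python
import json
import shlex


def emr_on_eks_setup_commands(cluster, namespace, role, virtual_cluster, trust_policy_file):
    """Return one CLI invocation (as an argument list) per setup step, in order.

    All arguments are illustrative placeholders, not values from the post.
    """
    # A virtual cluster maps to a namespace on a registered EKS cluster.
    container_provider = {
        "id": cluster,
        "type": "EKS",
        "info": {"eksInfo": {"namespace": namespace}},
    }
    return [
        # 1. Enable cluster access for Amazon EMR on EKS
        ["eksctl", "create", "iamidentitymapping",
         "--cluster", cluster, "--namespace", namespace,
         "--service-name", "emr-containers"],
        # 2. Create an IAM OIDC identity provider for the EKS cluster
        ["eksctl", "utils", "associate-iam-oidc-provider",
         "--cluster", cluster, "--approve"],
        # 3. Create a job execution role (trust policy file is assumed to exist)
        ["aws", "iam", "create-role", "--role-name", role,
         "--assume-role-policy-document", f"file://{trust_policy_file}"],
        # 4. Update the trust policy of the job execution role
        ["aws", "emr-containers", "update-role-trust-policy",
         "--cluster-name", cluster, "--namespace", namespace,
         "--role-name", role],
        # 5. Register the Amazon EKS cluster with Amazon EMR
        ["aws", "emr-containers", "create-virtual-cluster",
         "--name", virtual_cluster,
         "--container-provider", json.dumps(container_provider)],
    ]


if __name__ == "__main__":
    # Dry run: print shell-quoted commands for review instead of executing them.
    for cmd in emr_on_eks_setup_commands(
        "demo-cluster", "spark", "demo-job-role", "demo-vc", "trust-policy.json"
    ):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. In a real setup, the job execution role also needs a permissions policy (for example, access to S3 and CloudWatch Logs), which is not covered here.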



