EMR Serverless
As we all know, EMR is a managed cluster platform that handles big data frameworks like Apache Hadoop and Apache Spark. EMR involves a lot of cluster and configuration management, which is tough to maintain. One of the main reasons is slow cluster creation and termination (taking 12-15 minutes), along with problems while creating the cluster.
As a solution, EMR Serverless came into existence; it reduces a lot of the management work and lets teams concentrate on data processing. Some of the advantages are optimized configuration, improved application security and efficiency, and automatic/manual application start and stop when jobs are triggered. Developers don't face any difficulty in managing the cluster; the only remaining difficulties are development ones, which are faced all the time and are entirely different from EMR launching difficulties.
Basic requirements for EMR Serverless:
1. EMR Studio
2. AWS IAM Execution Role
3. AWS S3 for storing EMR serverless generated logs.
4. Default Spark Properties
   a. --conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python
   b. --conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python
   c. --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python
5. As EMR clusters run within a VPC, security groups and a VPC must also be created/assigned when creating an EMR Serverless application.
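The requirements above can be sketched as the request payload that boto3's `emr-serverless` client expects for `create_application`. The application name, subnet/security-group IDs, and release label below are hypothetical placeholders:

```python
# Sketch: assemble the pieces from the requirements list into the payload
# for boto3.client("emr-serverless").create_application().
# Subnet IDs, security group IDs, and the release label are placeholders.

DEFAULT_SPARK_CONFS = [
    "--conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python",
    "--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python",
    "--conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python",
]

def build_create_application_request(name, subnet_ids, security_group_ids):
    """Build kwargs for the EMR Serverless CreateApplication API call."""
    return {
        "name": name,
        "type": "SPARK",
        "releaseLabel": "emr-6.9.0",  # placeholder release label
        # EMR Serverless applications still need VPC networking (requirement 5)
        "networkConfiguration": {
            "subnetIds": subnet_ids,
            "securityGroupIds": security_group_ids,
        },
    }

request = build_create_application_request(
    "demo-app", ["subnet-0abc"], ["sg-0abc"]
)
# A real call would then be:
# import boto3
# client = boto3.client("emr-serverless")
# response = client.create_application(**request)
```

Building the payload separately keeps the configuration testable without touching AWS; the commented-out call shows where it would be sent.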
Implementation:
1. Creation/deletion of the EMR Serverless application, which is managed by AWS.
2. Submit the jobs.
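The job-submission step can be sketched as the `start_job_run` payload for the same boto3 client. The application ID, role ARN, script path, and log bucket are hypothetical placeholders; the Spark submit parameters reuse the default properties from the requirements list:

```python
# Sketch of submitting a job to an existing EMR Serverless application.
# All IDs, ARNs, and S3 paths below are placeholders.

SPARK_SUBMIT_PARAMS = " ".join([
    "--conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python",
    "--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python",
    "--conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python",
])

def build_start_job_run_request(application_id, execution_role_arn,
                                entry_point, log_uri):
    """Build kwargs for boto3.client('emr-serverless').start_job_run()."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,  # IAM execution role (requirement 2)
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": SPARK_SUBMIT_PARAMS,
            }
        },
        # EMR Serverless writes its logs to S3 (requirement 3)
        "configurationOverrides": {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        },
    }

job_request = build_start_job_run_request(
    "00example123",
    "arn:aws:iam::111122223333:role/emr-serverless-exec",
    "s3://my-bucket/scripts/job.py",
    "s3://my-bucket/emr-serverless-logs/",
)
# Real calls would be client.start_job_run(**job_request) to submit,
# and client.delete_application(applicationId=...) to clean up.
```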
Like an EMR cluster, EMR Serverless can also be orchestrated via AWS MWAA or AWS Step Functions.
Orchestration from AWS MWAA works as mentioned below:
1. AWS provides operators which are used for creation/deletion of the applications and submission of the jobs.
2. MWAA version 2.2.2 is required for EMR Serverless; it doesn't work with version 1.12.0.
3. The below input parameters are to be entered for the MWAA DAG:
   a. Application ID
   b. Execution Role
   c. Entry Point
   d. Spark Submit Parameters
   e. S3 Logs storage Path
4. The below Spark property has to be used to make Glue the metastore: --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
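The DAG input parameters (a-e) above, plus the Glue metastore property, can be sketched as the keyword arguments for the EMR Serverless operator in Airflow's Amazon provider package (`EmrServerlessStartJobOperator`). All IDs, ARNs, and paths below are hypothetical placeholders:

```python
# Sketch: map the MWAA DAG input parameters (a-e) and the Glue metastore
# conf onto EmrServerlessStartJobOperator kwargs. Placeholders throughout.

GLUE_METASTORE_CONF = (
    "--conf spark.hadoop.hive.metastore.client.factory.class="
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)

spark_submit_parameters = " ".join([
    "--conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python",
    "--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python",
    "--conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python",
    GLUE_METASTORE_CONF,  # item 4: use Glue as the Hive metastore
])

operator_kwargs = {
    "application_id": "00example123",                             # a. Application ID
    "execution_role_arn": "arn:aws:iam::111122223333:role/exec",  # b. Execution Role
    "job_driver": {
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/scripts/job.py",        # c. Entry Point
            "sparkSubmitParameters": spark_submit_parameters,     # d. Spark Submit Parameters
        }
    },
    "configuration_overrides": {                                  # e. S3 Logs storage Path
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://my-bucket/logs/"}
        }
    },
}

# Inside a DAG this would become:
# from airflow.providers.amazon.aws.operators.emr import EmrServerlessStartJobOperator
# job = EmrServerlessStartJobOperator(task_id="run_spark_job", **operator_kwargs)
```

Keeping the kwargs in a plain dict makes the mapping from the article's parameter list (a-e) explicit and lets the same values be reused across tasks.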