To process 25 GB of data in Spark:

  1. How many CPU cores are required?
  2. How many executors are required?
  3. How much memory is required per executor?
  4. What is the total memory required?

How many executor CPU cores are required to process 25GB of data?        

Reverse Engineering

25 GB = 25 * 1024 MB = 25600 MB

Number of partitions = 25600 MB / 128 MB = 200

Number of CPU cores = number of partitions = 200

Note: By default, Spark creates one partition for each block of the file (blocks are 128 MB by default in HDFS), but you can also request a higher number of partitions by passing a target value.
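The partition count above can be sketched in a few lines of Python; the 128 MB default partition size is an assumption taken from the HDFS block size mentioned in the note.

```python
import math

def num_partitions(data_size_mb: float, partition_size_mb: int = 128) -> int:
    """Partitions needed when each partition holds one 128 MB block."""
    return math.ceil(data_size_mb / partition_size_mb)

data_mb = 25 * 1024  # 25 GB expressed in MB
print(num_partitions(data_mb))  # 200
```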


How many executors are required to process 25 GB of data?

Note: For good job performance in Spark, a common rule of thumb is to assign between 2 and 5 cores per executor.

Average CPU cores per executor = 4

Total number of executors = 200 / 4 = 50
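The executor count follows directly from the core count; a minimal sketch, assuming 4 cores per executor as chosen above:

```python
import math

def num_executors(total_cores: int, cores_per_executor: int = 4) -> int:
    """Executors needed to host the required cores (rounded up)."""
    return math.ceil(total_cores / cores_per_executor)

print(num_executors(200))  # 50
```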


How much memory is required per executor to process 25 GB of data?

Note: Expected memory per core = minimum 4 * (default partition size) = 4 * 128 MB = 512 MB

Per-core memory must also be at least 1.5 times Spark's reserved memory of 300 MB, i.e. a single-core executor should not have less than 450 MB.

CPU cores per executor = 4

Memory per executor = 4 * 512 MB = 2 GB
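The per-executor memory rule above can be expressed as a small sketch; the 4x-partition-size factor and the 450 MB floor are the heuristics stated in the notes, not hard Spark limits.

```python
def executor_memory_mb(cores_per_executor: int = 4,
                       partition_size_mb: int = 128,
                       safety_factor: int = 4) -> int:
    """Heuristic: give each core ~4x the partition size it processes."""
    per_core_mb = safety_factor * partition_size_mb  # 4 * 128 = 512 MB
    # Guard: per-core memory should be >= 1.5x Spark's 300 MB reserved memory
    assert per_core_mb >= 450, "per-core memory below the 450 MB floor"
    return cores_per_executor * per_core_mb

print(executor_memory_mb())  # 2048 (i.e. 2 GB)
```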


What is the total memory required to process 25 GB of data?

Total number of executors = 50

Memory per executor = 2 GB

Total memory for all executors = 50 * 2 GB = 100 GB
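The four steps above can be combined into one end-to-end sizing sketch, using the same heuristic defaults (128 MB partitions, 4 cores per executor, 512 MB per core):

```python
import math

def spark_sizing(data_gb: float,
                 partition_mb: int = 128,
                 cores_per_executor: int = 4,
                 mem_per_core_mb: int = 512):
    """Heuristic cluster sizing: (cores, executors, GB/executor, total GB)."""
    partitions = math.ceil(data_gb * 1024 / partition_mb)
    cores = partitions                        # one core per partition
    executors = math.ceil(cores / cores_per_executor)
    exec_mem_gb = cores_per_executor * mem_per_core_mb / 1024
    total_mem_gb = executors * exec_mem_gb
    return cores, executors, exec_mem_gb, total_mem_gb

print(spark_sizing(25))  # (200, 50, 2.0, 100.0)
```

These are starting values, not fixed answers: cores per executor and executor count can both be tuned to trade job speed against available cluster resources.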


Chandra Prakash Yadav

Senior Data Engineer @ Rakuten | Data Migration, Lakehouse Building

5 months

Please correct the typo, it should be 128 instead of 126, Venkata Polepalli

Deepak Kumar Nayak

Data Engineer | Spark | SQL | Python | AWS | Databricks

5 months

Here we have taken CPU cores per executor as 4 and the number of executors as 50. Is it correct that both of these values can be tuned by us to make the job faster or slower, if resources are available?
