Spark memory configuration approach

Key Components

  • Spark Master, Worker and Executor JVMs: The Spark Master and Worker JVMs are the resource managers. All Worker JVMs register themselves with the Spark Master. They are very small; for example, the Master uses around 500MB of RAM and a Worker uses around 1GB of RAM.
  • Master: The Master JVM's job is to decide and schedule the launch of Executor JVMs across the nodes. But the Master itself does not launch the executors.
  • Worker: It is the Worker that heartbeats with the Master and launches the executors as per schedule.
  • Executor: Executor JVM has generic slots where tasks run as threads. Also, all the data needed to run a task is cached within Executor memory.
  • Driver: When we start our Spark application with the spark-submit command, a driver starts, and that driver contacts the Spark Master to launch executors and run the tasks. Basically, the Driver is the representative of our application and does all the communication with Spark.
  • Task: A task is the smallest unit of execution and works on a partition of our data. Spark calls these task slots cores: the --executor-cores setting defines the number of tasks that run within the Executor. For example, if we have set --executor-cores to six, then we can run six simultaneous task threads within the executor JVM.
  • Resilience: The Worker JVM's job is only to launch Executor JVMs whenever the Master tells it to. If an Executor crashes, the Worker restarts it. If a Worker JVM crashes, the Master restarts it. The Master takes care of the driver JVM restart as well. But if a driver restarts, all of its executors will have to restart.
  • Flexible Distribution of CPU resources: By CPU resources, we are referring to the tasks/threads running within an executor. Let's assume that the second machine in the cluster has a lot more RAM and CPU resources. Can we run more threads on this second machine? Yes! You can do that by tweaking the spark-env.sh file and setting SPARK_WORKER_CORES to 10 on the second machine. If the same setting is set to 6 on the other machines, the Master will launch 10 threads/tasks on that second machine and 6 on the remaining ones (you could still oversubscribe in general). SPARK_WORKER_CORES tells the Worker JVM how many cores/tasks it can give out to its underlying Executor JVMs; see the sketch below.
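A minimal sketch of that per-machine override, assuming a standalone deployment (the 10-core / 48GB values below are illustrative, not prescribed by this article), would live in conf/spark-env.sh on the bigger worker:

```bash
# conf/spark-env.sh on the bigger machine (illustrative values)
# Let this Worker hand out 10 task slots (cores) to its executors
export SPARK_WORKER_CORES=10
# Optionally cap the total memory this Worker can give out to executors
export SPARK_WORKER_MEMORY=48g
```

On the remaining machines SPARK_WORKER_CORES would stay at 6, so the Master sees a heterogeneous pool of slots.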

The following list captures some recommendations to keep in mind while configuring these params (num-executors, executor-cores and executor-memory):

• Hadoop/Yarn/OS Daemons: When we run a Spark application using a cluster manager like Yarn, several daemons run in the background, such as the NameNode, Secondary NameNode, DataNode, ResourceManager and NodeManager. So, while specifying num-executors, we need to make sure that we leave aside enough cores (~1 core per node) for these daemons to run smoothly.

• Yarn ApplicationMaster (AM): The ApplicationMaster is responsible for negotiating resources from the ResourceManager and working with the NodeManagers to execute and monitor the containers and their resource consumption. If we are running Spark on Yarn, then we need to budget in the resources that the AM would need (~1024MB and 1 executor).

• HDFS Throughput: The HDFS client has trouble with lots of concurrent threads. It has been observed that HDFS achieves full write throughput with ~5 tasks per executor, so it's good to keep the number of cores per executor at or below that number.

• MemoryOverhead: Spark executor memory usage on Yarn breaks down as follows.

Two things to make note of here:

  • Full memory requested from Yarn per executor = `spark.executor.memory` + `spark.yarn.executor.memoryOverhead`
  • `spark.yarn.executor.memoryOverhead` = max(384MB, 7% of `spark.executor.memory`)

So, if we request 20GB per executor, Yarn will actually allocate 20GB + memoryOverhead = 20GB + max(384MB, 7% of 20GB) ≈ 21.4GB of memory for us.
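As a rough command-line sketch of the same numbers (the application file name here is just a placeholder):

```bash
# Request 20GB of heap per executor on Yarn; Yarn also reserves the overhead:
#   overhead = max(384MB, 0.07 * 20GB) ≈ 1.4GB
#   container size per executor ≈ 20GB + 1.4GB ≈ 21.4GB
spark-submit \
  --master yarn \
  --executor-memory 20G \
  my_spark_app.py
```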

Running executors with too much memory often results in excessive garbage collection delays.

Running tiny executors (with a single core and just enough memory needed to run a single task, for example) throws away the benefits that come from running multiple tasks in a single JVM.

Enough theory... Let's go hands-on.

Now, let's consider a 10-node cluster with the following config and analyse different possibilities of executor-core-memory distribution:

**Cluster Config:**

10 Nodes, 16 cores per Node, 64GB RAM per Node

First Approach: Tiny executors [One Executor per core]:

Tiny executors essentially means one executor per core. The following shows the values of our spark-config params with this approach:

- `--num-executors` = `In this approach, we'll assign one executor per core`

                    = `total-cores-in-cluster`

                    = `num-cores-per-node * total-nodes-in-cluster` = 16 x 10 = 160

- `--executor-cores` = 1 (one executor per core)

- `--executor-memory` = `amount of memory per executor`

                    = `mem-per-node/num-executors-per-node`

                    = 64GB/16 = 4GB
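
Put together as a spark-submit invocation, this tiny-executor layout would look roughly like the following sketch (the application file name is a placeholder):

```bash
# Tiny executors: 160 executors, 1 core and 4GB of heap each (illustrative)
spark-submit \
  --master yarn \
  --num-executors 160 \
  --executor-cores 1 \
  --executor-memory 4G \
  my_spark_app.py
```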


Analysis: With only one executor per core, as we discussed above, we'll not be able to take advantage of running multiple tasks in the same JVM. Also, shared/cached variables like broadcast variables and accumulators will be replicated on every executor, i.e., 16 times on each node. Moreover, we are not leaving enough memory overhead for Hadoop/Yarn daemon processes and we are not counting in the ApplicationMaster. NOT GOOD!

Second Approach: Fat executors (One Executor per node):

Fat executors essentially means one executor per node. The following shows the values of our spark-config params with this approach:

- `--num-executors` = `In this approach, we'll assign one executor per node`

                   = `total-nodes-in-cluster` = 10

- `--executor-cores` = `one executor per node means all the cores of the node are assigned to one executor`

                    = `total-cores-in-a-node` = 16

- `--executor-memory` = `amount of memory per executor`

                    = `mem-per-node/num-executors-per-node`

                    = 64GB/1 = 64GB
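
Again as a rough spark-submit sketch (placeholder application name):

```bash
# Fat executors: 10 executors, 16 cores and 64GB of heap each (illustrative)
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 16 \
  --executor-memory 64G \
  my_spark_app.py
```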

Analysis: With all 16 cores per executor, apart from the ApplicationMaster and daemon processes not being accounted for, HDFS throughput will suffer and it'll result in excessive garbage collection. Also, NOT GOOD!

Third Approach: Balance between Fat (vs) Tiny

Based on the recommendations which we discussed above:

• Let's assign 5 cores per executor => --executor-cores = 5 (for good HDFS throughput)

• Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 16 - 1 = 15

• So, total available cores in the cluster = 15 x 10 = 150

• Number of available executors = (total cores / num-cores-per-executor) = 150 / 5 = 30

• Leaving 1 executor for the ApplicationMaster => --num-executors = 29

• Number of executors per node = 30 / 10 = 3

• Memory per executor = 64GB / 3 ≈ 21GB

• Deducting the off-heap memory overhead = max(384MB, 7% of 21GB) ≈ 1.5GB; rounding that up to ~3GB for extra headroom, actual --executor-memory = 21 - 3 = 18GB

So, recommended config is: 29 executors, 18GB memory each and 5 cores each!!
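
Expressed as a spark-submit sketch (the application file name is a placeholder):

```bash
# Balanced layout: 29 executors, 5 cores and 18GB of heap each
spark-submit \
  --master yarn \
  --num-executors 29 \
  --executor-cores 5 \
  --executor-memory 18G \
  my_spark_app.py
```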

Analysis: It is obvious how this third approach finds the right balance between the Fat and Tiny approaches. Needless to say, it achieves the parallelism of a fat executor and the best throughput of a tiny executor!!

Conclusion:                                                                                               

We’ve seen:

• A couple of recommendations to keep in mind while configuring these params for a Spark application, like:

• Budget in the resources that Yarn's ApplicationMaster would need

• How we should spare some cores for Hadoop/Yarn/OS daemon processes

• Learnt about spark-yarn-memory-usage

• Also, checked out and analysed three different approaches to configure these params:

1. Tiny Executors - One Executor per Core

2. Fat Executors - One Executor per Node

3. Recommended approach - Right balance between Tiny (Vs) Fat coupled with the recommendations.

--num-executors, --executor-cores and --executor-memory: these three params play a very important role in Spark performance as they control the amount of CPU and memory your Spark application gets. This makes it very crucial for users to understand the right way to configure them.

