Spark memory configuration approach
Sai Prabhanj Turaga
Data Engineer | Specialized in BigData, GCP, AWS, AI, Data modeling, NoSQL, Snowflake, DBT
Key Components
- Spark Master, Worker and Executor JVM’s: SparkMaster and Worker JVM’s are the resource managers. All worker JVM’s will register themselves with SparkMaster. They are very small. Example: Master use like 500MB of RAM & Worker uses like 1GB RAM.
- Master: Master JVM’s job is to decide and schedule the launch of Executor JVM’s across the nodes. But, Mater will not launch executors.
- Worker: It is the Worker who heartbeats with Master and launches the executors as per schedule.
- Executor: Executor JVM has generic slots where tasks run as threads. Also, all the data needed to run a task is cached within Executor memory.
- Driver: When we start our spark application with spark submit command, a driver will start and that driver will contact spark master to launch executors and run the tasks. Basically, Driver is a representative of our application and does all the communication with Spark.
- Task: Task is the smallest unit of execution which works on a partition of our data. Spark actually calls them cores. —executor-cores setting defines number of tasks that run within the Executor. For example, if we have set —executor-cores to six, then we can run six simultaneous threads within the executor JVM.
- Resilience: Worker JVM’s work is only to launch Executor JVM’s whenever Master tells them to do so. If Executor crashes, Worker will restart it. If Worker JVM crashes, Master will start it. Master will take care of driver JVM restart as well. But then, if a driver restarts, all the Ex’s will have to restart.
- Flexible Distribution of CPU resources: By CPU resources, We are referring to the tasks/threads running within an executor. Let’s assume that the second machine in the cluster has lot more ram and cpu resources. Can we run more threads in this second machine? Yes! You can do that by tweaking spark-env.sh file and set SPARK_WORKER_CORES to 10 in the second machine. The same setting if set to 6 in other machines, then master will launch 10 threads/tasks in that second machine and 6 in the remaining one’s. But, you could still oversubscribe in general. SPARK_WORKER_CORES tells worker JVM as to how many cores/tasks it can give out to its underlying executor JVM’s.
Following list captures some recommendations to keep in mind while configuring them:
· Hadoop/Yarn/OS Deamons: When we run spark application using a cluster manager like Yarn, there’ll be several daemons that’ll run in the background like NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker. So, while specifying num-executors, we need to make sure that we leave aside enough cores (~1 core per node) for these daemons to run smoothly.
Yarn ApplicationMaster (AM): ApplicationMaster is responsible for negotiating resources from the ResourceManager and working with the NodeManagers to execute and monitor the containers and their resource consumption. If we are running spark on yarn, then we need to budget in the resources that AM would need (~1024MB and 1 Executor).
HDFS Throughput: HDFS client has trouble with tons of concurrent threads. It was observed that HDFS achieves full write throughput with ~5 tasks per executor . So it’s good to keep the number of cores per executor below that number.
MemoryOverhead: Following picture depicts spark-yarn-memory-usage.
Two things to make note of from this picture:
- Full memory requested to yarn per executor =spark-executor-memory + spark.yarn.executor.memoryOverhead.
- spark.yarn.executor.memoryOverhead = Max(384MB, 7% of spark.executor-memory)
So, if we request 20GB per executor, AM will actually get 20GB + memoryOverhead = 20 + 7% of 20GB = ~23GB memory for us.
Running executors with too much memory often results in excessive garbage collection delays.
Running tiny executors (with a single core and just enough memory needed to run a single task, for example) throws away the benefits that come from running multiple tasks in a single JVM.
Enough theory.. Let’s go hands-on..
Now, let’s consider a 10 node cluster with following config and analyse different possibilities of executors-core-memory distribution:
**Cluster Config:**
10 Nodes,16 cores per Node, 64GB RAM per Node
First Approach: Tiny executors [One Executor per core]:
Tiny executors essentially means one executor per core. Following table depicts the values of our spar-config params with this approach:
- `--num-executors` = `In this approach, we'll assign one executor per core
= `total-cores-in-cluster`
= `num-cores-per-node * total-nodes-in-cluster` = 16 x 10 = 160
- `--executor-cores` = 1 (one executor per core)
- `--executor-memory` = `amount of memory per executor`
= `mem-per-node/num-executors-per-node`
= 64GB/16 = 4GB
Analysis: With only one executor per core, as we discussed above, we’ll not be able to take advantage of running multiple tasks in the same JVM. Also, shared/cached variables like broadcast variables and accumulators will be replicated in each core of the nodes which is 16 times. Also, we are not leaving enough memory overhead for Hadoop/Yarn daemon processes and we are not counting in ApplicationManager. NOT GOOD!
Second Approach: Fat executors (One Executor per node):
Fat executors essentially means one executor per node. Following table depicts the values of our spark-config params with this approach:
- `--num-executors` = `In this approach, we'll assign one executor per node`
= `total-nodes-in-cluster` = 10
- `--executor-cores` = `one executor per node means all the cores of the node are assigned to one executor`
= `total-cores-in-a-node = 16
- `--executor-memory` = `amount of memory per executor`
= `mem-per-node/num-executors-per-node`
= 64GB/1 = 64GB
Analysis: With all 16 cores per executor, apart from ApplicationManager and daemon processes are not counted for, HDFS throughput will hurt and it’ll result in excessive garbage results. Also,NOT GOOD!
Third Approach: Balance between Fat (vs) Tiny
According to the recommendations which we discussed above:
· Based on the recommendations mentioned above, Let’s assign 5 core per executors => --executor-cores = 5 (for good HDFS throughput)
· Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 16-1 = 15
· So, Total available of cores in cluster = 15 x 10 = 150
· Number of available executors = (total cores/num-cores-per-executor) = 150/5 = 30
· Leaving 1 executor for ApplicationManager => --num-executors = 29
· Number of executors per node = 30/10 = 3
· Memory per executor = 64GB/3 = 21GB
· Counting off heap overhead = 7% of 21GB = 3GB. So, actual --executor-memory = 21 - 3 = 18GB
So, recommended config is: 29 executors, 18GB memory each and 5 cores each!!
Analysis: It is obvious as to how this third approach has found right balance between Fat vs Tiny approaches. Needless to say, it achieved parallelism of a fat executor and best throughputs of a tiny executor!!
Conclusion:
We’ve seen:
· Couple of recommendations to keep in mind which configuring these params for a spark-application like:
· Budget in the resources that Yarn’s Application Manager would need
· How we should spare some cores for Hadoop/Yarn/OS deamon processes
· Learnt about spark-yarn-memory-usage
· Also, checked out and analysed three different approaches to configure these params:
1. Tiny Executors - One Executor per Core
2. Fat Executors - One executor per Node
3. Recommended approach - Right balance between Tiny (Vs) Fat coupled with the recommendations.
--num-executors, --executor-cores and --executor-memory.. these three params play a very important role in spark performance as they control the amount of CPU & memory your spark application gets. This makes it very crucial for users to understand the right way to configure them.