Understanding YARN (Yet Another Resource Negotiator)
To understand YARN, we first need to understand the Hadoop 1 / MR1 architecture:
From the storage perspective – HDFS
- Name Node – Master Node (holds the metadata in the form of tables)
- Data Node – Slave Node (holds the actual data as blocks); block size = 128 MB by default
From the processing perspective –
In Hadoop 1, job execution was controlled by two processes:
- Master – Job Tracker
- Slave – Task Tracker
Ex: If we have a 400-node cluster, then we have 400 Task Trackers running and 1 Job Tracker running.
Role of Job Tracker: It did a lot of work in Hadoop 1 (Scheduling + Monitoring).
- Scheduling – deciding which job to execute first based on the scheduling algorithm and job priority, finding out the available resources, and providing those resources to jobs.
- Monitoring – tracking the progress of the job; if a task fails, rerun the task; if a task is slow, start it on another machine based on speculative execution. This used to be a very hectic responsibility.
With so many scheduling and monitoring activities, a single Job Tracker had a lot of work to do in Hadoop 1. This was the biggest pain point.
Role of Task Tracker: The Task Tracker tracks the tasks on its data node and informs the Job Tracker about each task.
Summary:
From this, we conclude that the cluster has one master node and many data nodes. The master node runs only one Job Tracker, and each data node runs one Task Tracker.
However, the Job Tracker has to do most of the work in terms of scheduling and monitoring. The Task Tracker just watches the mappers and reducers executing locally and reports back to the Job Tracker.
In other words, the Job Tracker does far more work than a Task Tracker.
Limitations of MR1:
- Scalability issues with large clusters. [It was observed that when the cluster size grows beyond ~4,000 data nodes (at Yahoo and Facebook), the Job Tracker becomes a bottleneck.]
- Poor resource utilization – the cluster resources were underutilized. [In MR1, there was a fixed number of map and reduce slots in a cluster, so an idle map slot could not be used for a reduce task, and vice versa.]
- Restricted to MR jobs only. [Only MapReduce jobs were supported; it was not generic.]
To solve the above problems, YARN came into the picture.
YARN:
YARN has three major components:
1. Resource Manager (master)
2. Node Manager (slave)
3. Application Master
In Hadoop V1, a major bottleneck was that the Job Tracker was doing a lot of work (Scheduling + Monitoring).
In Hadoop V2, the monitoring aspect was taken away from the Job Tracker. Now it does nothing other than scheduling, and it was renamed the Resource Manager.
Just like Task Trackers in Hadoop V1, we have Node Managers in Hadoop V2. A Task Tracker used to monitor the local map and reduce tasks; similarly, a Node Manager manages the local map and reduce tasks (now running inside containers).
[So the Resource Manager replaces the Job Tracker, and the Node Manager replaces the Task Tracker. But then who is doing the monitoring?]
YARN Execution Flow:
1. The client submits a request, which goes to the Resource Manager (the master), holding only the scheduling part. (A minimal client-side sketch of this step follows the list.)
2. The Resource Manager creates a container on one of the Node Managers (slaves). Once it creates the container, it launches the Application Master in that container, for this job only. [The Application Master will take care of end-to-end monitoring for this job (application).]
3. This Application Master now takes care of the entire life cycle of the job.
4. The Application Master negotiates with the Resource Manager for the resources required to run the job, in the form of containers. [If the Application Master doesn't mention any requirements, default resources are provided.]
5. Once the Resource Manager allocates the containers, it returns each container's ID and host name (the Node Manager on which the container was granted). The Application Master's role is then to go to those Node Managers and use the containers to run the tasks. It also has to manage failures: if any task fails, the Application Master re-executes it.
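To make step 1 concrete, here is a minimal sketch of a client submitting an application through Hadoop's YarnClient API. The application name, AM launch command, and container size are hypothetical placeholders; a real job would also ship jars, local resources, and environment settings.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration()); // reads yarn-site.xml from the classpath
        yarnClient.start();

        // Ask the Resource Manager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-app"); // placeholder name

        // The command that starts our Application Master inside its container.
        // com.example.MyAppMaster is a hypothetical AM class.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("java com.example.MyAppMaster"),
                null, null, null);
        ctx.setAMContainerSpec(amContainer);

        // Resources for the AM's own container (1 GB, 1 vcore).
        ctx.setResource(Resource.newInstance(1024, 1));

        // Submit; the RM will pick a Node Manager and launch the AM there (step 2).
        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted " + appId);
    }
}
```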
Understanding Each Component of YARN:
Resource Manager:
- As part of scheduling, it has to know which resources are available in the cluster, which Node Managers are alive, and so on. [It keeps track of live Node Managers and available resources.]
- It allocates the available resources to the appropriate applications and tasks.
- It also manages the Application Masters.
Ex: Let's say there are 100 different applications/Application Masters. The Resource Manager just checks whether each Application Master is running. If one stops due to some reason or failure, the Resource Manager makes sure that another Application Master is started for that job.
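The first responsibility is visible directly through the client API: the Resource Manager is the one component that can report on every Node Manager. A small sketch, assuming a reachable cluster configured via yarn-site.xml:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterState {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // The RM tracks every Node Manager's heartbeat and capacity.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                    + " capability=" + node.getCapability()  // total memory/vcores on the node
                    + " used=" + node.getUsed());            // currently allocated resources
        }
        yarnClient.stop();
    }
}
```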
Node Manager:
- A Node Manager runs on every data node.
- It provides computational resources in the form of containers.
- Whatever runs inside a container, the Node Manager has to manage that container.
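For illustration, here is a sketch of the NMClient call an Application Master uses to hand a task to a Node Manager. The `container` is assumed to come from a Resource Manager allocation (see the AMRMClient sketch further below), and the shell command is a placeholder for a real task:

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchTask {
    static void launch(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // Describe what should run inside the granted container.
        ContainerLaunchContext taskCtx = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("echo running-one-task"), // placeholder command
                null, null, null);

        // The Node Manager hosting the container starts and supervises it.
        nmClient.startContainer(container, taskCtx);
    }
}
```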
Application Master:
- Coordinates the execution of all tasks within its application.
Ex: Let's say there is an application that contains 100 tasks. It is the role of the Application Master to handle all those tasks and coordinate among them.
The Application Master also asks the Resource Manager for appropriate resource containers to run the tasks.
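The negotiation from step 4 can be sketched with the AMRMClient API. The container count, sizes, and heartbeat interval below are illustrative assumptions, not defaults:

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSkeleton {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register with the RM; the RM only monitors the AM itself, not its tasks.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for 10 containers of 2 GB / 1 vcore each, anywhere in the cluster.
        Resource capability = Resource.newInstance(2048, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 10; i++) {
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat until the RM has granted everything we asked for.
        int granted = 0;
        while (granted < 10) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container c : response.getAllocatedContainers()) {
                granted++;
                // Hand each container to its Node Manager to launch a task
                // (see the NMClient sketch above).
                System.out.println("Got " + c.getId() + " on " + c.getNodeId());
            }
            Thread.sleep(1000);
        }

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}
```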
Q). How does YARN overcome the limitations of MR1?
- Scalability: YARN removes the scalability problem because part of the work is delegated to the Application Masters, each of which manages the end-to-end life cycle of one application. Scheduling is done by the Resource Manager, and monitoring is done by the Application Master.
- Resource utilization: With the concept of logical containers, resource allocation is much more dynamic, and we can request any amount of CPU and memory. Cluster utilization improves because resources are no longer wasted on idle fixed slots. (A small sketch of differently sized requests follows below.)
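As a sketch of this flexibility: unlike MR1's fixed map/reduce slots, a single Application Master can size each container request differently. This assumes the `rmClient` set up in the AppMasterSkeleton sketch above; distinct priorities are used for the two container shapes, following the convention of keeping one capability per priority (MapReduce itself uses different priorities for map and reduce containers).

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MixedSizes {
    static void request(AMRMClient<ContainerRequest> rmClient) {
        // A light task: 512 MB, 1 vcore.
        rmClient.addContainerRequest(new ContainerRequest(
                Resource.newInstance(512, 1), null, null, Priority.newInstance(0)));
        // A heavy task: 8 GB, 4 vcores; no fixed slot shape constrains it.
        rmClient.addContainerRequest(new ContainerRequest(
                Resource.newInstance(8192, 4), null, null, Priority.newInstance(1)));
    }
}
```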
- Generic: No longer restricted to MapReduce jobs; we can run other kinds of jobs too. Ex: Spark, Tez, Giraph, etc.