Hadoop YARN Fair Scheduler advantages explained: part 1

What is fair scheduling?

Keywords: Hadoop, MapReduce, task scheduling, yet another resource negotiator, YARN, Hadoop distributed file system, HDFS, JobTracker, TaskTracker

Fair scheduling is a method of assigning resources to applications such that all applications get, on average, an equal share of resources over time. Hadoop NextGen (YARN) can schedule multiple resource types. By default, the Fair Scheduler bases its fairness decisions only on memory. It can be configured to schedule on both memory and CPU using Dominant Resource Fairness (DRF), which compares applications by the resource each one uses most heavily relative to cluster capacity.
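To make the idea of a dominant share concrete, here is a minimal Python sketch. It is an illustration only, not the Fair Scheduler's actual code: the names Resources, dominant_share and next_app_to_serve, and the cluster numbers, are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    memory_mb: int
    vcores: int


def dominant_share(usage: Resources, cluster: Resources) -> float:
    """Largest fraction of any single cluster resource that the app holds."""
    return max(usage.memory_mb / cluster.memory_mb,
               usage.vcores / cluster.vcores)


def next_app_to_serve(usages: dict[str, Resources], cluster: Resources) -> str:
    """Under DRF, the next container goes to the app with the smallest dominant share."""
    return min(usages, key=lambda name: dominant_share(usages[name], cluster))


# Hypothetical cluster and applications, chosen only for illustration.
cluster = Resources(memory_mb=100_000, vcores=50)
usages = {
    "app_a": Resources(memory_mb=40_000, vcores=5),   # memory-heavy app
    "app_b": Resources(memory_mb=10_000, vcores=15),  # CPU-heavy app
}

chosen = next_app_to_serve(usages, cluster)
```

Here app_a's dominant resource is memory and app_b's is CPU. Comparing dominant shares rather than memory alone keeps a CPU-heavy job from looking "small" just because it uses little memory.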


Problem statement: We were using the FIFO scheduler, which queues jobs and runs them in submission order. Our QA team kept complaining that jobs either were not executed or ran very slowly, because all the YARN jobs piled up in the queue.

Configuring the Fair Scheduler for YARN jobs

Advantages:

1) It meets the requirements of multi-tenant systems.

2) All jobs get an equal share of resources.

When only one job is present, it occupies the entire cluster. As other jobs arrive, each job is given an equal percentage of the cluster.

Example: each job might be given an equal number of cluster-wide YARN containers.

Each container = 1 task of a job
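The equal split described above can be sketched in a few lines of Python. This is a simplified model, assuming every container is interchangeable; the function name equal_shares and the job names are hypothetical.

```python
def equal_shares(total_containers: int, jobs: list[str]) -> dict[str, int]:
    """Split containers evenly; any leftover containers go to the earliest jobs."""
    if not jobs:
        return {}
    base, extra = divmod(total_containers, len(jobs))
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}


# A single job gets the whole (hypothetical) 100-container cluster.
alone = equal_shares(100, ["job1"])

# When two more jobs arrive, the same cluster is re-divided three ways.
shared = equal_shares(100, ["job1", "job2", "job3"])
```

With one job, job1 holds all 100 containers. Once job2 and job3 arrive, the split becomes 34, 33 and 33.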

The scheduler divides the cluster into pools (called queues in the YARN Fair Scheduler).

- Typically one pool per user or module.

  • Resources are divided equally among pools.
  • This gives each user a fair share of the cluster.
  • Within each pool, scheduling is configurable: fair share or FIFO.
  • Some pools may have minimum shares.

A minimum share is the minimum percentage of the cluster that a pool is guaranteed.
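One way to picture how minimum shares interact with the equal split is a "water level" model: every pool receives either its minimum share or a common level, whichever is larger, and the level is chosen so the shares add up to the cluster. The Python sketch below is a simplified model under that assumption (no weights, no demand caps), not the scheduler's real implementation; fair_shares and the pool names are hypothetical.

```python
def fair_shares(total: float, min_shares: dict[str, float]) -> dict[str, float]:
    """Give each pool max(min_share, level), with level set so shares sum to total."""
    if sum(min_shares.values()) > total:
        raise ValueError("minimum shares exceed cluster capacity")
    lo, hi = 0.0, total
    for _ in range(60):  # bisection on the common level
        level = (lo + hi) / 2
        used = sum(max(m, level) for m in min_shares.values())
        if used > total:
            hi = level
        else:
            lo = level
    return {pool: max(m, lo) for pool, m in min_shares.items()}


# Hypothetical pools, in percent of the cluster: "etl" is guaranteed 50%.
shares = fair_shares(100.0, {"etl": 50.0, "adhoc": 0.0, "reports": 0.0})
```

In this example the guaranteed pool keeps its 50%, and the two pools without a minimum split the remainder at about 25% each, instead of all three getting a third.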

When a pool's minimum share has not been met for a while, the scheduler takes resources away from other pools:

- By preempting jobs in those other pools

- By killing the currently running tasks of those jobs

Killed tasks can be restarted later, since tasks are idempotent.

Note: Preemption was not available in early versions of the Hadoop Capacity Scheduler; newer releases support it as an optional feature.

To choose which tasks to kill, the scheduler picks the most recently started ones, since they have the least completed work to lose.
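The "most recently started first" rule can be sketched as a sort over running tasks. This is only an illustration of the selection rule described above: the Task class, pick_tasks_to_preempt, and the sample data are hypothetical, and the real scheduler tracks far more state.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    pool: str
    start_time: float  # seconds since some reference point


def pick_tasks_to_preempt(running: list[Task],
                          over_share_pools: set[str],
                          containers_needed: int) -> list[Task]:
    """Pick victims from pools above their share, newest tasks first."""
    candidates = [t for t in running if t.pool in over_share_pools]
    candidates.sort(key=lambda t: t.start_time, reverse=True)
    return candidates[:containers_needed]


running = [
    Task("t1", "adhoc", start_time=100.0),
    Task("t2", "adhoc", start_time=400.0),
    Task("t3", "etl", start_time=450.0),    # etl is below its minimum share
    Task("t4", "reports", start_time=300.0),
]

victims = pick_tasks_to_preempt(running, {"adhoc", "reports"}, containers_needed=2)
```

Task t3 is never a candidate because its pool is the one being starved. Among the rest, t2 and t4 started last, so they are chosen and the long-running t1 survives.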

The scheduler can also set limits on:

  • Number of concurrent jobs per user
  • Number of concurrent jobs per pool
  • Number of concurrent tasks per pool

These limits prevent the cluster from being hogged by one user, module, or job.
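As a rough sketch of how such limits work, the admission check below counts running jobs per user and per pool before letting a new job start. The function can_start_job, its parameters, and the sample limits are hypothetical; in the real scheduler these limits come from configuration.

```python
from collections import Counter


def can_start_job(user: str, pool: str,
                  running_jobs: list[tuple[str, str]],
                  max_per_user: int, max_per_pool: int) -> bool:
    """running_jobs holds one (user, pool) pair per job that is already running."""
    by_user = Counter(u for u, _ in running_jobs)
    by_pool = Counter(p for _, p in running_jobs)
    return by_user[user] < max_per_user and by_pool[pool] < max_per_pool


# Hypothetical state: alice already runs two jobs, one in each pool.
running_jobs = [("alice", "etl"), ("alice", "adhoc"), ("bob", "etl")]

alice_ok = can_start_job("alice", "adhoc", running_jobs, max_per_user=2, max_per_pool=3)
bob_ok = can_start_job("bob", "adhoc", running_jobs, max_per_user=2, max_per_pool=3)
```

A job that fails the check would simply wait in its pool until one of the running jobs finishes, which is how a single user is kept from filling the cluster.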

Example of allocations with the Hadoop YARN Fair Scheduler.

With YARN fair scheduling, we were able to address the problem above by creating a dedicated pool for each module.

I will discuss Fair Scheduler configuration in my next post, part 2.


