Big Data Tutorial: Using YARN in the MapReduce Programming Tool

December 11, 2015

– Narayana Murthy Pola, Sr. Project Manager, DST India

Over the last couple of months, we have been discussing Hadoop and its components, including the need for Hadoop and its execution engine (the MapReduce programming paradigm).
In this article, let’s take a quick look at the limitations of MapReduce v1.0, that is, MapReduce as it stood when Hadoop was introduced to the world in 2007, and at the improvements that were made to overcome them.
A job run in a typical MapReduce v1.0 setup involves four components:
• Client, which submits the job
• Job tracker, which coordinates the job run
• Task trackers, which run the tasks that the submitted job has been split into, such as mappers and reducers
• HDFS (the distributed file system), which is used for sharing job files
The job tracker (as discussed last month) is responsible both for managing the cluster’s resources and for driving the execution of the MapReduce job.
With this approach, developers and practitioners started to encounter the following limitations:



Scalability: The job tracker runs on a single machine and performs several tasks, including:
o Resource management
o Job and task scheduling
o Monitoring
With very large clusters, typically of 4,000 nodes or more, developers experienced unpredictable behavior in the MapReduce system. The most common kind of failure observed was the cascading failure, in which a failure causes the overall cluster to deteriorate as it tries to re-replicate data and overloads the remaining nodes, flooding the network.
Availability: The job tracker is a single point of failure. If it fails, all jobs have to restart.
Resource utilization: Each task tracker has a fixed number of map slots and reduce slots. This leads to ineffective resource utilization, as the reduce slots may be left unused while the map slots are full.
Multi-tenancy: Since resource management and the MapReduce execution engine are tightly integrated, it is impossible to run non-MapReduce applications and frameworks on Hadoop clusters.
Attempts to overcome these limitations resulted in the development of YARN and MapReduce 2.0 (Hadoop 2.0).
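The resource utilization limitation can be made concrete with a small Python sketch. This is a toy model, not Hadoop code: the function names and the slot counts are assumptions chosen purely for illustration. It compares a task tracker whose slots are typed (map or reduce) with the same capacity treated as one untyped pool, which is roughly the direction YARN takes.

```python
def typed_slot_utilization(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fraction of slots in use when map and reduce slots are fixed (MapReduce 1.0 style)."""
    busy = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(total_slots, pending_maps, pending_reduces):
    """Fraction of capacity in use when any free unit can run any task."""
    busy = min(total_slots, pending_maps + pending_reduces)
    return busy / total_slots


# Hypothetical node: 8 map slots and 4 reduce slots, during a map-heavy phase
# with 20 map tasks waiting and no reduce tasks ready yet.
fixed = typed_slot_utilization(8, 4, pending_maps=20, pending_reduces=0)
pooled = pooled_utilization(12, pending_maps=20, pending_reduces=0)

print(f"typed slots: {fixed:.2f}")   # prints "typed slots: 0.67"
print(f"single pool: {pooled:.2f}")  # prints "single pool: 1.00"
```

In the typed case, the four reduce slots stay idle even though map tasks are queued, which is the waste described above.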




Yet Another Resource Negotiator (YARN) has taken over the responsibility of cluster management from MapReduce, so that MapReduce now takes care of data processing only. In this new design, MapReduce on YARN has the following components:
• Client, which submits the MapReduce job
• YARN resource manager, which coordinates the allocation of compute resources on the cluster
• YARN node managers, which launch and monitor the compute containers on machines in the cluster
• MapReduce application master, which coordinates the tasks running the MapReduce job
• HDFS (the distributed file system), which is used for sharing the job files between the entities
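In effect, YARN splits the two major functions of the job tracker, i.e. resource management and job scheduling and monitoring, into two separate components:
o Resource manager (cluster-wide), which, together with the per-node node managers, handles resource management
o Application master (one per application), which handles job scheduling and monitoring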
The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers, which ensure that an application does not use more resources than it has been allocated.
In contrast to the job tracker, each instance of an application (here, a MapReduce job) has a dedicated application master, which runs for the duration of the application.
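The division of labor between the resource manager and the node managers can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and method names are hypothetical and are not the YARN API, memory is the only resource tracked, and the "first node that fits" rule stands in for the far more sophisticated schedulers that real YARN uses.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    node: str
    memory_mb: int  # upper bound granted to whatever runs inside


@dataclass
class NodeManager:
    name: str
    capacity_mb: int
    containers: list = field(default_factory=list)

    def free_mb(self):
        return self.capacity_mb - sum(c.memory_mb for c in self.containers)

    def launch(self, memory_mb):
        container = Container(self.name, memory_mb)
        self.containers.append(container)
        return container

    def within_limit(self, container, used_mb):
        """Per-node check that a process stays inside its granted memory."""
        return used_mb <= container.memory_mb


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        """Grant a container on the first node with enough free memory, else None."""
        for node in self.nodes:
            if node.free_mb() >= memory_mb:
                return node.launch(memory_mb)
        return None


# Hypothetical two-node cluster.
rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])

# The application master gets its own container first, then asks for task containers.
app_master = rm.allocate(1024)
tasks = [rm.allocate(2048) for _ in range(3)]

print(app_master.node)                           # prints "node1"
print([t.node if t else None for t in tasks])    # prints "['node1', 'node2', None]"
```

The third task request comes back empty because neither node has 2048 MB free. In this sketch the request is simply refused, whereas a real scheduler would typically keep it pending until capacity frees up.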
YARN overcomes the limitations mentioned above (such as scalability and availability) and makes efficient use of resources. Multiple applications, both MapReduce and non-MapReduce, can co-exist on YARN.
MapReduce 2.0 has overcome many limitations of MapReduce 1.0, paving the way for numerous applications and tools developed on Hadoop that are useful in varying domains and walks of life. Next month, let us look at some of the tools developed on Hadoop/MapReduce that increase its effectiveness and usability.
