Wednesday 20 May 2015

HDFS: JobTracker


  • The JobTracker takes the client's request and processes that data in HDFS.
  • The JobTracker can't talk to the DataNodes directly, but it can talk to the NameNode.
  • What the JobTracker says to the NameNode, in layman's language:
"Hey NameNode, I have a client with a file named File.txt. He wants me to process his file and write its output to the 'testoutput' directory by running a program, let's say of 10 KB (program.java), on File.txt"
AND "I don't know which blocks I should take to process this request, so give me the details, i.e. send me the metadata."
  • Now the NameNode checks whether a file named File.txt exists or not.
  • If the file is there, the NameNode simply sends the metadata of that file's blocks to the JobTracker.
  • The JobTracker then selects the nearest machine among the 3 replicas (the machines holding the 3 copies of the same data) and uploads the processing task (the 10 KB code) to it.
  • An Input Split is the set of blocks whose combination forms a file stored in HDFS. For Eg: if there is a 200 MB file and each block is 64 MB, the file is stored as 3 full blocks (192 MB) plus 1 more block holding the remaining 8 MB (leaving 56 MB of that last block unused) — 4 blocks in total.
  • The file to be stored in HDFS is known as the Input.
  • Uploading the program (code) to a block is known as a Map.
  • Number of Input Splits = Number of Maps
  • Each DataNode has its own TaskTracker.
  • The TaskTracker works for the JobTracker: the JobTracker assigns tasks to the TaskTracker.
  • The TaskTracker's job is to fetch the data from the nearest DataNode and execute the client's request that was assigned to the JobTracker.
  • If the data is not found on one replica, the task is assigned to the TaskTracker of another replica.
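
The JobTracker–NameNode exchange above can be sketched as a toy model. This is not real Hadoop code — all the class, method, and node names here (NameNodeLookup, requestMetadata, dn1, dn2, ...) are hypothetical, made up just to illustrate the idea of "ask the NameNode for a file's metadata, get back block-to-replica locations if the file exists":

```java
import java.util.List;
import java.util.Map;

public class NameNodeLookup {
    // "NameNode" metadata: file name -> for each block, the DataNodes
    // holding one of its 3 replicas. Node names are hypothetical.
    static final Map<String, List<List<String>>> METADATA = Map.of(
        "File.txt", List.of(
            List.of("dn1", "dn2", "dn3"),    // block 0: 3 replicas
            List.of("dn2", "dn3", "dn4")));  // block 1: 3 replicas

    // "JobTracker" request: return the file's block locations,
    // or null if the NameNode has no such file.
    static List<List<String>> requestMetadata(String fileName) {
        return METADATA.get(fileName);
    }

    public static void main(String[] args) {
        List<List<String>> blocks = requestMetadata("File.txt");
        if (blocks == null) {
            System.out.println("File.txt not found");
        } else {
            System.out.println("File.txt has " + blocks.size() + " blocks");
        }
    }
}
```

With the metadata in hand, the JobTracker can pick the nearest of the listed replicas for each block and ship the 10 KB program there, rather than pulling the data to the program.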
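
The Input Split arithmetic (and hence the number of Maps) is just a ceiling division of file size by block size. A minimal sketch, assuming the split size equals the HDFS block size (the common default); the helper name numSplits is made up for illustration:

```java
public class SplitCount {
    // Number of input splits = ceil(fileSize / blockSize),
    // assuming one split per HDFS block.
    static long numSplits(long fileSizeMb, long blockSizeMb) {
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb; // ceiling division
    }

    public static void main(String[] args) {
        // 200 MB file, 64 MB blocks: 3 full blocks (192 MB) + 1 block
        // holding the last 8 MB -> 4 splits, hence 4 map tasks.
        System.out.println(numSplits(200, 64)); // prints 4
    }
}
```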
