Wednesday, 20 May 2015

HDFS: JobTracker


  • The JobTracker takes the client's job request and coordinates the processing of the data stored in HDFS.
  • The JobTracker does not talk to the DataNodes directly; it gets the information it needs from the NameNode.
  • What the JobTracker says to the NameNode, in layman's language:
"Hey NameNode, I have a client with a file named File.txt. He wants me to process his file by running a program (let's say a 10 KB program.java) on File.txt and to write the output to the 'testoutput' directory."
And: "I don't know which blocks I should take to process this request, so give me the details of the file, i.e. send me its metadata."
  • The NameNode checks whether File.txt exists.
  • If it does, the NameNode sends the JobTracker the file's metadata: the list of blocks that make up File.txt and the DataNodes holding each block's replicas.
  • For each block, the JobTracker then picks the nearest machine among those holding its 3 replicas (the machines that store copies of the same data) and sends the processing task (the 10 KB code) there.
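  • An input split is a chunk of the input file that is processed as one unit; by default each split corresponds to one HDFS block, so together the splits cover the whole file. For example, a 200 MB file with a 64 MB block size is stored as 3 full blocks (192 MB) plus 1 block holding the remaining 8 MB. That last block is not padded: it occupies only 8 MB, and the unused 56 MB of block capacity is not consumed on disk.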
  • The file stored in HDFS that the job reads is called the input.
  • Running the program (code) on the node that holds a block, against that block's data, is called a map task.
  • Number of input splits = number of map tasks.
  • Each DataNode machine also runs its own TaskTracker.
  • The JobTracker uses the TaskTrackers to do the actual work: it assigns each task to a TaskTracker.
  • The TaskTracker's job is to read the data from the nearest DataNode (ideally the one on its own machine) and run the client's program on it, i.e. execute the request the client submitted to the JobTracker.
  • If the data cannot be read from one replica, the task is assigned to the TaskTracker on another node that holds a replica.
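The metadata exchange described above (the JobTracker asks, the NameNode checks that File.txt exists and returns its block list) can be sketched as a toy Python model. This is not Hadoop's API: `ToyNameNode`, `BlockInfo`, `get_metadata`, the block IDs, the block sizes and the host names dn1 to dn4 are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    block_id: str
    size_mb: int
    replica_hosts: list[str]  # DataNodes that hold a copy of this block


@dataclass
class ToyNameNode:
    """Hypothetical in-memory stand-in for the NameNode's file -> block table."""
    files: dict[str, list[BlockInfo]] = field(default_factory=dict)

    def get_metadata(self, name: str) -> list[BlockInfo]:
        # First check whether the file exists at all ("Is it there or not?").
        if name not in self.files:
            raise FileNotFoundError(f"{name} not found in HDFS")
        # If it does, hand back its blocks and where each replica lives.
        return self.files[name]


# Hypothetical namespace: File.txt is two blocks, each replicated 3 times.
nn = ToyNameNode()
nn.files["File.txt"] = [
    BlockInfo("blk_1", 64, ["dn1", "dn2", "dn3"]),
    BlockInfo("blk_2", 36, ["dn2", "dn3", "dn4"]),
]

for block in nn.get_metadata("File.txt"):
    print(block.block_id, block.replica_hosts)
```

A request for a file the NameNode does not know about raises `FileNotFoundError`, mirroring the "Is it there or not?" check.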
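The 200 MB example from the input-split bullet works out as follows. This is a minimal sketch assuming one input split per 64 MB block, the simple case described above; Hadoop's real split calculation has extra rules that it ignores, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(file_size_mb: int, block_size_mb: int = 64) -> list[int]:
    """Return the size (MB) of each block, assuming one input split per block."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data; it is not padded.
        sizes.append(remainder)
    return sizes


sizes = split_sizes(200)
print(sizes)       # [64, 64, 64, 8]
print(len(sizes))  # 4 input splits -> 4 map tasks
```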
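Picking the nearest replica and falling back to another replica can be sketched the same way. Assumptions: `assign_map_task` is hypothetical, the `distance` table stands in for whatever "nearest" means to the scheduler (here made-up network hops), and `is_alive` stands in for the TaskTracker heartbeats the real JobTracker uses to know which nodes are healthy.

```python
from typing import Callable, Optional


def assign_map_task(
    replica_hosts: list[str],
    distance: dict[str, int],
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Return the host whose TaskTracker should run the map task for one block.

    Replica holders are tried nearest first; unusable hosts are skipped so the
    task falls back to the TaskTracker of another replica. Returns None when no
    replica holder is usable.
    """
    for host in sorted(replica_hosts, key=lambda h: distance[h]):
        if is_alive(host):
            return host  # the code runs where a copy of the block already is
    return None


# Hypothetical cluster: distances are made-up hops; dn1 is down.
distance = {"dn1": 0, "dn2": 2, "dn3": 4}
alive = {"dn2", "dn3"}

first_choice = assign_map_task(["dn1", "dn2", "dn3"], distance, lambda h: h in alive)
print(first_choice)  # dn2
```

Sorting by distance first and then skipping unusable hosts gives both behaviors from the list: the nearest replica wins when it is healthy, and the task moves to the next replica when it is not.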
