- The JobTracker takes the client's request to process data stored in HDFS.
- The JobTracker does not talk to the DataNodes directly; it talks to the NameNode.
- In layman's terms, the JobTracker says to the NameNode: "I don't know which blocks I need to process this request, so give me the details of the request or send me the metadata."
- The NameNode then checks whether the requested file (File.txt in this example) exists.
- If the file exists, the NameNode sends its metadata (the file's blocks and the DataNodes holding each replica) to the JobTracker.
- The JobTracker then selects the nearest of the 3 machines that hold a replica of the data and sends the processing task (the code, about 10 KB in this example) to it.
- An input split is one of the pieces a file is divided into for processing; in these notes each split corresponds to one HDFS block, and together the splits make up the whole file. For example, a 200 MB file with a 64 MB block size is stored as 4 blocks: 3 full blocks (192 MB) plus 1 partial block holding the remaining 8 MB (56 MB below the block size).
- The file stored in HDFS that the job processes is known as the input.
- Sending the program (code) to the node that holds a block and running it there is known as the map step.
- Number of Input Splits = Number of Maps
- Each DataNode has its own TaskTracker.
- The JobTracker relies on the TaskTrackers to do the work: it assigns each task to a TaskTracker.
- The TaskTracker's job is to fetch the data from the nearest DataNode and carry out the request that the client submitted to the JobTracker.
- If the data cannot be found on one replica, the task is assigned to the TaskTracker on a node holding another replica.
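
The split arithmetic and the replica choice described in these notes can be sketched in a few lines of Python. This is only an illustration, not Hadoop's actual code: the function names `split_sizes` and `pick_node`, the node names `dn1` to `dn3`, and the distance numbers are all made up for the example, and it assumes one input split per block.

```python
MB = 1024 * 1024


def split_sizes(file_size, block_size):
    """Return the size of each block-sized split of a file.

    Assumes one input split per block, as in the notes above.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds what is left of the file.
        sizes.append(remainder)
    return sizes


def pick_node(replica_nodes, distance, available):
    """Pick the nearest available node that holds a replica.

    replica_nodes: nodes holding a copy of the block (hypothetical names)
    distance:      dict mapping node -> made-up "network distance"
    available:     set of nodes that can currently serve the data
    Returns None if no replica is reachable.
    """
    candidates = [node for node in replica_nodes if node in available]
    if not candidates:
        return None
    return min(candidates, key=lambda node: distance[node])


if __name__ == "__main__":
    sizes = split_sizes(200 * MB, 64 * MB)
    print([s // MB for s in sizes])  # [64, 64, 64, 8]
    print(len(sizes))                # 4 splits, so 4 map tasks

    replicas = ["dn1", "dn2", "dn3"]
    distance = {"dn1": 2, "dn2": 0, "dn3": 4}
    print(pick_node(replicas, distance, {"dn1", "dn2", "dn3"}))  # dn2
    # If dn2 cannot serve the data, the task goes to the next nearest replica.
    print(pick_node(replicas, distance, {"dn1", "dn3"}))         # dn1
```

The 200 MB example gives 4 splits and therefore 4 map tasks, and removing the nearest node from the available set shows the fallback to another replica.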