Wednesday 20 May 2015

HDFS: NameNode


  • The NameNode is also known as the single point of failure (SPOF) of the cluster.
  • Let's say three replicas of a file are spread across the DataNodes (DN). Each DataNode sends an acknowledgement back to the previous (linked) DataNode, confirming that it has received the replica and stored it on its local disk.
  • Every DataNode sends block reports and heartbeats to the NameNode.
  • There is data called metadata, which records the names of the files and where their replicas are stored.
  • If this metadata is lost for any reason, Hadoop is of no use, or we can say we won't be able to get the benefits of Hadoop. The entire cluster will be inaccessible and HDFS will not work for that cluster.
  • The metadata is stored on the NameNode's hard disk only.
  • The DataNodes that the NameNode communicates with generally run on cheap hardware, but it is better to run the NameNode itself on highly reliable hardware.
  • To store data in the cluster, we generally create big blocks, let's say 64 MB, instead of the 4 KB blocks that are the default on an ordinary file system. Why? Because every block a file occupies produces an entry in the metadata. With 4 KB blocks, every one of those small blocks adds its own entry to the metadata file. With 64 MB blocks, far fewer entries are produced for the same amount of data.
  • To get data out of the server/Hadoop, we write code in Java, Python, or any other language; the size of this code is only a few KB. We then upload that code and get access to our data.
This brings us to the concept of the JobTracker.
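The block-size point in the list above can be checked with a little arithmetic. Below is a minimal sketch in Python, assuming that each block of a file costs one entry in the metadata. The helper name blocks_needed and the 1 GB file size are only for illustration and are not part of Hadoop.

```python
KB = 1024
MB = 1024 * KB


def blocks_needed(file_size_bytes: int, block_size_bytes: int) -> int:
    """Return how many blocks a file of the given size occupies.

    Assumption for this sketch: one block means one metadata entry.
    Ceiling division is used because a partly filled last block
    still counts as a block.
    """
    return -(-file_size_bytes // block_size_bytes)


# A hypothetical 1 GB file, chosen only for illustration.
file_size = 1024 * MB

entries_4kb = blocks_needed(file_size, 4 * KB)
entries_64mb = blocks_needed(file_size, 64 * MB)

print("4 KB blocks :", entries_4kb, "entries")
print("64 MB blocks:", entries_64mb, "entries")
```

For the same file, the small block size needs many thousands of times more entries than the large one, and all of those entries would have to be tracked by the NameNode.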
