Veracity - The Truthfulness of Big Data
Veracity
Veracity in Big Data refers to the truthfulness, accuracy, and trustworthiness of data. The challenge is finding the truth in data that is available from multiple sources. In other words, it is all about the varying levels of uncertainty and reliability in data.
Reasons for Uncertainty:
1. Prediction: making a prediction about tomorrow based on the data we have today is inherently uncertain.
2. Sampling: when dealing with our own sample drawn from a population, we may not be able to tell how our sample differs from the population as a whole (see the sketch after this list).
3. Missing or unknown values.
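To make the sampling point concrete, here is a minimal sketch in Python using simulated (hypothetical) data: repeated small samples drawn from the same population yield means that differ from the population mean, which is exactly the kind of uncertainty described above.

import random

# Hypothetical population: one million simulated readings (illustration only)
random.seed(42)
population = [random.gauss(50.0, 10.0) for _ in range(1_000_000)]
population_mean = sum(population) / len(population)

# Each small sample gives a slightly different estimate of the mean,
# so any conclusion drawn from a sample carries sampling uncertainty.
for trial in range(3):
    sample = random.sample(population, 100)
    sample_mean = sum(sample) / len(sample)
    print(f"trial {trial}: sample mean = {sample_mean:.2f}, "
          f"population mean = {population_mean:.2f}")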
Reasons for Reliability:
1. If one site in a distributed system fails, the remaining sites can continue to operate and get the job done; the functions of the failed site can be taken over by another site. To provide reliability, the system must ensure the correct transfer of function: the failure of a site must be detected by the system, and its services must no longer be used. Mechanisms must also be available to integrate the recovered site back into the system.
The Hadoop architecture consists of two main components: the Hadoop Distributed File System (HDFS) and MapReduce. A key goal of HDFS is to replicate files so that failures can be handled, and in addition to detect those failures and recover from them.
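The replication factor is controlled by the dfs.replication property in hdfs-site.xml. The snippet below is a minimal sketch of that configuration file, setting the replication factor to 5 to match the example that follows; the surrounding contents of the file in your installation may differ.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>5</value>
  </property>
</configuration>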
If you provide a value of 5 in the above hdfs-site.xml file, then 5 replicas will be created on 5 nodes. For example, consider a file "A1" that is replicated and placed on different nodes (Node 1 to Node 5). If Node 1 fails and the "A1" file cannot be accessed there, the same "A1" will be retrieved from one of the remaining Nodes 2, 3, ... 5. This replication is what makes the system fault tolerant.
At the same time, Node 1 will also be brought back up by the Hadoop system (the cluster of nodes). The diagram below shows the data veracity of Big Data.