Veracity - The Truthfulness of Big Data

Veracity

In Big Data, "Veracity" refers to the truthfulness of data: how accurate and trustworthy it is. The challenge is finding the truth in data that is available from multiple sources. In other words, it is all about the various levels of data uncertainty and reliability.

Reasons for Uncertainty:

1. Prediction: Making a prediction about tomorrow based on the data we have today.

2. Sample: When we work with our own sample drawn from a population, we may not be able to tell how it differs from other samples or from the population as a whole.

3. Missing or unknown values.
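The sample and missing-value points above can be made concrete with a small Python sketch. It is only an illustration: the function name summarize and the readings list are made up for this example, and the standard error is just one simple way to express how uncertain a sample mean is.

```python
import math
import statistics

def summarize(sample):
    """Return (mean, standard_error, missing_count) for a list that may contain None."""
    # Keep only the values that were actually observed.
    observed = [x for x in sample if x is not None]
    missing = len(sample) - len(observed)
    mean = statistics.mean(observed)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    stderr = statistics.stdev(observed) / math.sqrt(len(observed))
    return mean, stderr, missing

# Hypothetical readings; None marks a missing or unknown value.
readings = [10.0, 12.0, None, 11.0, 13.0, None, 14.0]
mean, stderr, missing = summarize(readings)
print(f"mean={mean:.2f} stderr={stderr:.2f} missing={missing}")
```

A larger standard error or a higher missing count both signal that conclusions drawn from the sample deserve less trust.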

Reasons for Reliability:

1. If one site fails in a distributed system, the remaining sites can continue to operate and get the job done. The function of the failed site can be taken over by another site. To provide reliability, the system must ensure the correct transfer of function. The system must be able to detect the failure of a site, and that site's services should no longer be used. A mechanism must also be available to integrate the recovered site back into the system.
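The three requirements described above (detect the failure, stop using the failed site, and reintegrate it after recovery) can be sketched as a toy Python model. The Cluster class, its method names, and the site names are hypothetical and exist only for this illustration; a real system would detect failures through heartbeats or timeouts rather than an explicit call.

```python
class Cluster:
    """Toy model of sites in a distributed system (all names are hypothetical)."""

    def __init__(self, sites):
        self.active = set(sites)  # sites currently allowed to serve requests
        self.failed = set()       # sites detected as down

    def mark_failed(self, site):
        # Failure detected: stop routing work to this site.
        if site in self.active:
            self.active.remove(site)
            self.failed.add(site)

    def recover(self, site):
        # Integrate a recovered site back into the system.
        if site in self.failed:
            self.failed.remove(site)
            self.active.add(site)

    def route(self, job):
        # Hand the job to a surviving site; fail only if none are left.
        if not self.active:
            raise RuntimeError("no active sites available")
        return min(self.active)  # deterministic choice keeps the sketch simple

cluster = Cluster(["site-1", "site-2", "site-3"])
before = cluster.route("job")       # served by site-1
cluster.mark_failed("site-1")
during = cluster.route("job")       # taken over by site-2
cluster.recover("site-1")
after = cluster.route("job")        # site-1 is back in service
print(before, during, after)
```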

The Hadoop architecture consists of two main components: the Hadoop Distributed File System (HDFS) and MapReduce. The goal of HDFS is to replicate files so that failures can be handled and, in addition, to detect failures and recover from them.

If you set a replication value of 5 in the hdfs-site.xml file (the dfs.replication property), 5 copies are created on 5 nodes. For example, consider a file "A1" that is replicated and placed on different nodes (Node 1 to Node 5). If "A1" cannot be accessed on Node 1 because that node has failed, the same "A1" is retrieved from one of the remaining nodes (Node 2, 3 ... 5). In this way the failure is tolerated and the data stays available.
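The "A1" example above can be simulated in a few lines of Python. This is a simplified sketch, not how HDFS is implemented: real HDFS splits files into blocks and chooses replica locations with its own placement policy, while the helper names place_replicas and read_file are invented here for illustration.

```python
REPLICATION = 5  # replication factor from the text; illustrative constant

def place_replicas(filename, nodes, replication=REPLICATION):
    """Put one copy of `filename` on each of the first `replication` nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    storage = {node: set() for node in nodes}
    for node in nodes[:replication]:
        storage[node].add(filename)
    return storage

def read_file(filename, storage, down_nodes):
    """Return the first reachable node that holds a copy of `filename`."""
    for node in sorted(storage):
        if node not in down_nodes and filename in storage[node]:
            return node
    raise IOError(f"no reachable replica of {filename}")

nodes = [f"Node {i}" for i in range(1, 6)]
storage = place_replicas("A1", nodes)
# Node 1 is down, so the read falls through to the next replica.
print(read_file("A1", storage, down_nodes={"Node 1"}))
```

The read only fails when every node holding a copy is down, which is why a higher replication value tolerates more simultaneous failures.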

At the same time, the Hadoop system (the cluster of nodes) can bring Node 1 back into service. The diagram below shows the data veracity of Big Data.










