V's of BIG DATA and HDFS

Big Data V's

1 Volume: A large volume of data gets generated day by day, to the point where one bigger machine (vertical scaling) is no longer enough and the data is spread across many machines (horizontal scaling).

2 Velocity: The speed at which data arrives. Example: trending tweets.

3 Variety: The different kinds of data coming in. Example: structured data, unstructured data, semi-structured data.

4 Veracity: How accurate the data is.

5 Value: It sits at the top of the big data pyramid. It refers to the ability to transform a tsunami of data into business value.


HDFS (Hadoop Distributed File System): A write-once, read-many distributed storage layer in Hadoop. Hadoop brings the code to the data instead of moving the data to the code. HDFS breaks big data sets into smaller chunks and stores them on the nodes of the cluster.

Name Node: Tracks which data is in which data node.

Data Node: Stores the actual data (chunks) as HDFS blocks.

Blocks: The smallest unit of storage on the nodes. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x.
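
To make the block sizes concrete, here is a minimal Python sketch of how a file's size maps onto blocks. It is plain arithmetic, not Hadoop code: the function name `split_into_blocks` and the 300 MB example file are made up for illustration.

```python
MB = 1024 * 1024

def split_into_blocks(file_size_bytes, block_size_bytes=128 * MB):
    """Return the size in bytes of each block a file would be split into.

    Hypothetical helper for illustration only; the default mirrors the
    128 MB block size of Hadoop 2.x.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes

# A 300 MB file with the 2.x default (128 MB) and the 1.x default (64 MB).
blocks_2x = split_into_blocks(300 * MB)
blocks_1x = split_into_blocks(300 * MB, block_size_bytes=64 * MB)
print(len(blocks_2x), [b // MB for b in blocks_2x])
print(len(blocks_1x), [b // MB for b in blocks_1x])
```

With the 128 MB default, the 300 MB file becomes two full blocks plus one 44 MB block. With the 64 MB default, it becomes four full blocks plus one 44 MB block.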

Replica: A copy of the data. The default replication factor is 3, which makes the framework fault tolerant.
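
The sketch below shows why three replicas give fault tolerance. It is a toy model in Python, not the real HDFS placement policy: the round-robin placement, the node names `dn1` to `dn4`, and both function names are assumptions for illustration. The `block_map` dictionary plays the role of the Name Node's record of which block is on which Data Node.

```python
def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct data nodes, round-robin.

    Toy placement for illustration; real HDFS uses its own placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the replication factor")
    block_map = {}
    for block_id in range(num_blocks):
        block_map[block_id] = [
            data_nodes[(block_id + i) % len(data_nodes)]
            for i in range(replication)
        ]
    return block_map

def surviving_replicas(block_map, failed_nodes):
    """Return, per block, the replicas left after the given nodes fail."""
    failed = set(failed_nodes)
    return {
        block_id: [node for node in nodes if node not in failed]
        for block_id, nodes in block_map.items()
    }

nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical data node names
block_map = place_replicas(3, nodes)   # three blocks, replication factor 3
after_failure = surviving_replicas(block_map, ["dn3"])

for block_id, remaining in after_failure.items():
    print(block_id, remaining)
```

In this model every block still has two live copies after `dn3` fails, so no data is lost. The cost is storage: with a replication factor of 3, a 300 MB file takes up 900 MB of raw disk across the cluster.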

Anurag Samanta

Software Engineer at JPMorgan Chase & Co.

3 years ago
