V's of BIG DATA and HDFS
Archana Sahu
Technology Lead at Infosys Ltd || GCP || Snowflake || Microstrategy || Azure || Hadoop || Hive || Big Data || Databrick
Big Data V's
1 Volume: The sheer amount of data generated day by day, which pushes storage from vertical scaling (a bigger machine) to horizontal scaling (more machines).
2 Velocity: The speed at which data arrives. Example: trending tweets.
3 Variety: The different forms of data arriving. Example: structured, unstructured, and semi-structured data.
4 Veracity: How accurate and trustworthy the data is.
5 Value: Sits at the top of the big data pyramid. Refers to the ability to transform a tsunami of data into business value.
HDFS (Hadoop Distributed File System): A write-once, read-many distributed storage layer in Hadoop, designed so that the code is moved to the data rather than the data to the code. It breaks big data sets into smaller chunks (blocks) and stores them on the nodes of the cluster.
Name Node: Holds the metadata that tracks which blocks are stored on which Data Node.
Data Node: Where the actual data (chunks) is stored, in the form of HDFS blocks.
Blocks: Smallest unit of storage on the nodes. The default size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x.
Replica: A copy of the data. The default replication factor is 3, which makes the framework fault tolerant.
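To make the block and replica numbers concrete, here is a rough Python sketch of the arithmetic. It only does the math and does not talk to a real cluster. The function name hdfs_footprint and the 500 MB example file are made up for illustration; the defaults (128 MB blocks, replication factor 3) are the ones mentioned above.

```python
MB = 1024 * 1024  # bytes in one megabyte


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate how a file would be laid out (hypothetical helper, not a Hadoop API).

    Returns (block_count, last_block_bytes, raw_bytes_stored).
    """
    if file_size_bytes <= 0:
        return 0, 0, 0
    # Ceiling division: a partial final chunk still needs its own block.
    block_count = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only holds the leftover bytes; it is not padded to full size.
    last_block_bytes = file_size_bytes - (block_count - 1) * block_size_bytes
    # Every block is copied `replication` times across the Data Nodes.
    raw_bytes_stored = file_size_bytes * replication
    return block_count, last_block_bytes, raw_bytes_stored


# Example: a 500 MB file with 2.x defaults.
blocks, last, raw = hdfs_footprint(500 * MB)
print(blocks, last // MB, raw // MB)  # 4 116 1500

# Same file with the 1.x default block size of 64 MB.
blocks_v1, last_v1, _ = hdfs_footprint(500 * MB, block_size_bytes=64 * MB)
print(blocks_v1, last_v1 // MB)  # 8 52
```

So a 500 MB file becomes 4 blocks on 2.x (three of 128 MB and one of 116 MB), and with 3 replicas the cluster stores roughly 1500 MB in total. Losing one Data Node still leaves two copies of each block.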