5 V’s - Big Data

What is Big Data?

Big data describes data (and the systems that handle it) that cannot be stored or processed using traditional databases and computational technologies.

  • Big data systems are built on a distributed architecture rather than a monolithic one, because distributed systems can scale out by adding machines (which helps in processing millions of data points per day).
  • Three important factors to consider when designing a good big data system are storage, processing/computation, and scalability.
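To make the distributed idea concrete, here is a minimal single-machine sketch, assuming a simple count-the-events job. Records are hash-partitioned into shards (storage), each shard is processed independently as a separate node would (processing), and the number of shards can grow with the data (scalability). The function names and the thread pool standing in for cluster nodes are illustrative assumptions, not how Hadoop, Spark, or any other framework is actually implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Hash-partition records so each hypothetical 'node' owns one shard."""
    shards = [[] for _ in range(num_nodes)]
    for record in records:
        shards[hash(record) % num_nodes].append(record)
    return shards


def count_shard(shard):
    """Map step: each node counts only the records in its own shard."""
    return Counter(shard)


def distributed_count(records, num_nodes=4):
    shards = partition(records, num_nodes)
    # Threads stand in for separate machines; a real cluster would
    # send each shard to a different node.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_counts = list(pool.map(count_shard, shards))
    # Reduce step: merge the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


print(distributed_count(["click", "view", "click", "buy", "view", "click"], num_nodes=3))
```

Python's built-in `hash()` of strings is randomised per process, so which shard a record lands in can change between runs, but the merged counts do not. Handling more data means raising `num_nodes` instead of buying one bigger machine.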

The popular characteristics of big data are known as the 5 V’s:

5 V's of Big data
Volume: The amount of data generated every second. This data is massive and grows rapidly, so it requires a scalable and robust system to store and manage it effectively.
 - The average internet user creates 1.7 MB of data every second.
 - Approximately 328.77 million terabytes of data are created every day.        

  • Examples: social media, machines (sensors, smart devices)
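A quick unit conversion shows what the daily figure above means as an ingestion rate. The sketch below takes the 328.77 million TB/day number from the list above as its only input; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_tb_to_rates(tb_per_day):
    """Convert a daily data volume in terabytes into per-hour and per-second rates."""
    return {
        "tb_per_hour": tb_per_day / 24,
        "tb_per_second": tb_per_day / SECONDS_PER_DAY,
    }


rates = daily_tb_to_rates(328.77e6)
print(f"{rates['tb_per_hour']:,.0f} TB/hour")    # 13,698,750 TB/hour
print(f"{rates['tb_per_second']:,.0f} TB/s")     # 3,805 TB/s
```

That is roughly 3.8 PB arriving every second, which is why a single machine cannot keep up and storage has to be spread across many nodes.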

Velocity: Speed at which data is created, processed, and analyzed.

  • Examples: applications requiring real-time or near-real-time processing and responsiveness, such as financial trading systems (for example, how many stocks were sold in the last hour) and online transaction systems.
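A question like "how many stocks were sold in the last hour?" is a sliding-window aggregation. The following is a minimal in-memory sketch, assuming trades arrive in timestamp order with timestamps in seconds; the class name is hypothetical. Streaming engines such as Spark Structured Streaming and Apache Flink provide windowed aggregations like this at cluster scale.

```python
from collections import deque


class SlidingWindowCounter:
    """Tracks shares traded within the last `window_seconds` (default: 1 hour)."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.trades = deque()  # (timestamp, shares) in arrival order
        self.total_shares = 0

    def _evict(self, now):
        # Drop trades that are at least `window_seconds` old.
        while self.trades and self.trades[0][0] <= now - self.window_seconds:
            _, shares = self.trades.popleft()
            self.total_shares -= shares

    def record(self, timestamp, shares):
        """Add a trade; assumes timestamps never go backwards."""
        self.trades.append((timestamp, shares))
        self.total_shares += shares
        self._evict(timestamp)

    def shares_in_window(self, now):
        """Shares traded in the half-open interval (now - window, now]."""
        self._evict(now)
        return self.total_shares
```

For example, after recording 100 shares at t=0, 50 at t=1800 and 25 at t=3700, `shares_in_window(3700)` returns 75, because the first trade has aged out. Keeping a running total means each update costs amortised constant time instead of rescanning the whole hour of trades.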

Variety: Various forms of data.

  • Examples: structured (databases), semi-structured (JSON, CSV, XML), and unstructured (log files, multimedia files).
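Each form of data needs a different parsing strategy before it can be combined with the rest. This sketch uses only Python's standard library to turn a CSV string, a JSON string, and a free-text log line into dictionaries. The log format (`<date> <time> <LEVEL> <message>`) and the function names are hypothetical assumptions for illustration.

```python
import csv
import io
import json
import re

# Hypothetical log format: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def from_csv(text):
    """CSV: every row has the same columns, named by the header line."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """JSON: fields are self-describing and may be nested or optional."""
    return json.loads(text)


def from_log(line):
    """Free-text log line: structure has to be extracted with a pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line does not fit the assumed format
    timestamp, level, message = match.groups()
    return {"timestamp": timestamp, "level": level, "message": message}


print(from_csv("user,amount\nalice,10\nbob,5\n"))
print(from_json('{"user": "alice", "tags": ["new"]}'))
print(from_log("2024-01-01 12:00:00 ERROR disk full"))
```

Note that CSV values come back as strings and JSON values keep their types, so a real pipeline would still need a step that converts everything to a common schema.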

Veracity: The reliability and accuracy of data.

Poor data quality can lead to incorrect conclusions and poor decision-making.

Life cycle of Veracity

Ensuring the veracity of data, through methods that clean, verify, and validate it, is crucial for effective decision-making.
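The clean, verify, and validate steps can be sketched on a stream of sensor readings. Everything here is a hypothetical assumption for illustration: the record fields, the `KNOWN_SENSORS` registry used for verification, and the plausible temperature range used for validation.

```python
KNOWN_SENSORS = {"s1", "s2"}  # hypothetical registry of trusted sources


def clean(records):
    """Clean: trim stray whitespace in IDs and drop exact duplicate readings."""
    seen, cleaned = set(), []
    for r in records:
        r = {**r, "sensor_id": str(r.get("sensor_id", "")).strip()}
        key = (r["sensor_id"], r.get("timestamp"), r.get("temperature_c"))
        if key not in seen:
            seen.add(key)
            cleaned.append(r)
    return cleaned


def verify(record):
    """Verify: the reading comes from a source we recognise."""
    return record["sensor_id"] in KNOWN_SENSORS


def validate(record):
    """Validate: the value has the right type and a plausible range (assumed -50..60 C)."""
    t = record.get("temperature_c")
    return isinstance(t, (int, float)) and -50 <= t <= 60


def ensure_veracity(records):
    """Split readings into trusted and rejected lists."""
    trusted, rejected = [], []
    for r in clean(records):
        (trusted if verify(r) and validate(r) else rejected).append(r)
    return trusted, rejected


readings = [
    {"sensor_id": " s1 ", "timestamp": 1, "temperature_c": 21.5},
    {"sensor_id": "s1", "timestamp": 1, "temperature_c": 21.5},  # duplicate after cleaning
    {"sensor_id": "s9", "timestamp": 2, "temperature_c": 20.0},  # unknown sensor
    {"sensor_id": "s2", "timestamp": 3, "temperature_c": 999},   # implausible value
    {"sensor_id": "s2", "timestamp": 4, "temperature_c": None},  # missing value
]
trusted, rejected = ensure_veracity(readings)
print(len(trusted), len(rejected))  # 1 3
```

Keeping the rejected readings, rather than silently discarding them, makes it possible to trace data quality problems back to their source.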

Value: The final V stands for value, the importance of turning big data into business value.

This involves extracting actionable insights from raw data that can lead to improved decision-making, innovative products, and better customer experiences.
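As a small illustration of turning raw records into something actionable, this sketch aggregates hypothetical `(product, quantity, unit_price)` transaction rows into revenue per product, ranked highest first. The data shape and function name are assumptions, not part of any particular system.

```python
from collections import defaultdict


def revenue_by_product(transactions):
    """Aggregate raw (product, quantity, unit_price) rows into revenue per product."""
    totals = defaultdict(float)
    for product, quantity, unit_price in transactions:
        totals[product] += quantity * unit_price
    # Highest-revenue products first: the part a decision-maker actually reads.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(revenue_by_product([("laptop", 2, 900.0), ("mouse", 10, 20.0), ("laptop", 1, 900.0)]))
# [('laptop', 2700.0), ('mouse', 200.0)]
```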

The Big data

About me:

Hello! I’m Shoukath Ali, an aspiring data professional with a Master’s in Data Science - Big Data and a Bachelor’s in Computer Science and Engineering.

If you have any queries or suggestions, please feel free to reach out to me at [email protected]

Connect with me on LinkedIn: https://www.dhirubhai.net/in/shoukath-ali-b6650576/

Disclaimer:

The views and opinions expressed on this blog are purely my own. Any product claim, statistic, quote, or other representation about a product or service should be verified with the manufacturer, provider, or party in question.
