Big Data Explained Simply
Dr Mario Bojilov - MEngSc, CISA, F Fin, PhD
I work with forward-looking, deep-thinking non-executive directors (NEDs) to help them harness Artificial Intelligence (AI) and create profoundly impactful organisations.
The first discussion of Big Data appeared in an article written in 2001 by Mr Doug Laney, at the time an analyst at META Group (later acquired by Gartner). The paper did not mention the term Big Data, but it discussed the three main characteristics of Big Data for the first time: Volume, Velocity and Variety. The term Big Data only started appearing online in 2006-2007 and has taken hold since then.
Today, a Google search for "Big Data definition" will produce 3.08Bn results. Such an abundance of definitions inevitably creates significant confusion around what Big Data is and when organisations need to start looking at specific Big Data applications and solutions. Furthermore, considering only the amount of data is not always sufficient, since some organisations routinely process hundreds of terabytes per month, while others struggle with hundreds of gigabytes.
Instead, one way is to look at Big Data in a business-centric manner and consider its effectiveness within an organisational context. This perspective leads to a definition focused purely on business value rather than technical aspects: we are dealing with Big Data when we cannot obtain the required information within the timeframes necessary for it to add value to organisational activities. Or, to rephrase: organisations need the information to be available before certain events occur; otherwise, it is useless.
Characteristics
The three Big Data characteristics, or 3Vs, identified by Mr Laney in his work form the foundation used to build Big Data business initiatives and technology infrastructure. While new characteristics constantly appear and broaden the original definition, they often seem redundant and pretentious, created with a marketing purpose in mind. The original 3Vs are discussed below.
Volume
As the name implies, this characteristic refers to the size of the datasets that must be processed. When discussing volume, we first need to define how it is measured. As consumers and professionals, we are familiar with kilobytes, megabytes, and gigabytes.
However, Big Data volumes go well beyond any of these quantities, so a few definitions are needed at this stage. The list below explains the terms currently used to measure data quantities, expressed as bytes:

Kilobyte (KB) – 1,000 bytes (10^3)
Megabyte (MB) – 10^6 bytes
Gigabyte (GB) – 10^9 bytes
Terabyte (TB) – 10^12 bytes
Petabyte (PB) – 10^15 bytes
Exabyte (EB) – 10^18 bytes
Zettabyte (ZB) – 10^21 bytes
Yottabyte (YB) – 10^24 bytes

The above are the so-called "decimal definitions", which law courts consider the most appropriate in trade and commerce. The volumes from the terabyte range upward are the ones generally considered Big Data.
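To make the decimal units concrete, here is a minimal Python sketch (the function name and unit list are illustrative, not from any particular library) that expresses a raw byte count in the largest applicable decimal unit:

```python
# Decimal (SI) data units: each step up is a factor of 1,000, not 1,024.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a raw byte count in the largest applicable decimal unit."""
    unit_index = 0
    while num_bytes >= 1000 and unit_index < len(UNITS) - 1:
        num_bytes /= 1000
        unit_index += 1
    return f"{num_bytes:.2f} {UNITS[unit_index]}"

print(human_readable(33e21))   # 33 Zettabytes -> "33.00 ZB"
print(human_readable(2.5e14))  # hundreds of terabytes -> "250.00 TB"
```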
To put volume in context, it is worth noting that, according to IDC, in 2018 the total amount of data on Earth was 33 Zettabytes, and this amount will grow to 175 Zettabytes by 2025. Moreover, the emergence of COVID-19 and the associated rise in digital technology usage will likely increase this figure even further.
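For context, a quick back-of-the-envelope calculation (my own arithmetic, not IDC's) shows the annual growth rate this forecast implies:

```python
# Implied compound annual growth rate (CAGR) of global data volume,
# based on IDC's figures: 33 ZB in 2018 growing to 175 ZB in 2025.
start_zb, end_zb, years = 33, 175, 2025 - 2018
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 27% per year
```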
Velocity
Velocity is the second characteristic of Big Data. It refers both to the speed at which data is created and to the rate at which it is processed and consumed. The emergence of new business models, innovative applications, and the widespread use of portable devices have increased velocity significantly.
The US Federal Reserve estimates that in 2012 a total of 24.4Bn general-purpose credit card transactions were made, while in 2018 that figure grew to 40.9Bn, an increase of 68%. Moreover, the electronic payments trend will accelerate further because of COVID-19: electronic transactions were the only option for most people during the lockdowns, and consumers are now very comfortable with digital technology. This trend, however, was visible even before the pandemic, when banks in some countries, such as Australia, started reducing the number of their Automated Teller Machines (ATMs).
The increased e-payment volumes are just one example of increasing data velocity. Another is social media: Microsoft, LinkedIn's parent company, reported that in Q4 2020 engagement on LinkedIn was up 31%. These engagements include text alongside other data types such as video, audio and graphics. This assortment of data brings us to the last characteristic of Big Data – variety.
Variety
When related to Big Data, Variety refers to the range of data sources that need to be processed. There are three main types of data we need to deal with:
Structured – this data resides within enterprise systems, and its structure is well-defined. Examples include Payroll, Finance, and other ERP systems, where a database stores all the data. An example of such a data record is an HR system's employee record: it will contain, as a minimum, an employee ID, a first name, a last name and other fields as required, as sketched below.
Structured data has been around since the early 80s. It is the easiest to process and the smallest of the three types in quantity.
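To illustrate how rigid this structure is, here is a minimal Python sketch of such an employee record; the fields beyond employee ID, first name and last name are my own assumptions, added for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EmployeeRecord:
    """One row in an HR system's employee table: every field has a
    fixed name and type, which is what makes the data 'structured'."""
    employee_id: int
    first_name: str
    last_name: str
    hire_date: date        # illustrative extra field
    department: str = ""   # illustrative extra field

record = EmployeeRecord(1001, "Jane", "Doe", date(2020, 3, 16), "Finance")
print(record)
```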
Semi-structured – this data type consists of large volumes of individual records, each small in size and with a simple record structure. An example would be the data sent by an intelligent power meter to a central system. Each packet has the same format: timestamp – 10 bytes, location – 10 bytes, consumption – 10 bytes, plus other information – 80 bytes.
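A minimal sketch of how such a fixed-width packet might be unpacked follows; the field offsets match the layout above, while the field encoding and sample values are assumptions for illustration:

```python
# Fixed-width layout from the example above:
# timestamp (10 bytes) | location (10 bytes) | consumption (10 bytes) | other (80 bytes)
RECORD_SIZE = 110

def parse_packet(packet: bytes) -> dict:
    """Split one 110-byte meter packet into its named fields."""
    assert len(packet) == RECORD_SIZE, "each meter record is exactly 110 bytes"
    return {
        "timestamp": packet[0:10].decode("ascii").strip(),
        "location": packet[10:20].decode("ascii").strip(),
        "consumption": packet[20:30].decode("ascii").strip(),
        "other": packet[30:110],  # opaque payload, left as raw bytes
    }

sample = b"1617187200" + b"METER00042" + b"0000004.20" + b" " * 80
print(parse_packet(sample))
```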
Thus, the information about one electricity reading takes 110 bytes. However, the 110 bytes is misleading, since the daily volume in a city of 500,000 households, with readings at 5-second intervals, will be about 950GB (110 x 12 x 60 x 24 x 500,000 bytes). Within a month, this dataset grows to roughly 28.5 Terabytes; after one year, its size reaches about 347 Terabytes.
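The arithmetic is worth making explicit, given how small each record is. A quick sketch, assuming a 30-day month and one reading every 5 seconds, as above:

```python
# Storage generated by smart-meter packets, using decimal units (1 GB = 1e9 bytes).
record_bytes = 110          # one reading: 10 + 10 + 10 + 80 bytes
readings_per_minute = 12    # one reading every 5 seconds
households = 500_000

daily_bytes = record_bytes * readings_per_minute * 60 * 24 * households
print(f"daily:   {daily_bytes / 1e9:,.1f} GB")      # ~950.4 GB
print(f"monthly: {daily_bytes * 30 / 1e12:,.1f} TB")   # ~28.5 TB
print(f"yearly:  {daily_bytes * 365 / 1e12:,.1f} TB")  # ~346.9 TB
```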
Intelligent electricity meters are just one example of semi-structured data. With the proliferation of Internet-of-Things (IoT) devices, semi-structured data will be the fastest-growing of the three types.
Unstructured – strictly speaking, this data is still structured. However, in this case we deal with many different structures and formats. A more accurate term would be multi-structured; however, unstructured is the term currently used, for one reason or another.
Examples of unstructured data include social media content – audio, video, graphics, and text – as well as data from external systems and from enterprise sources such as Word files, emails, and PDFs.
Figure 1 shows the wide variety of data items generated every minute in 2022. Some highlights contributing to Big Data include users sharing 1.7 million pieces of content on Facebook, uploading 500 hours of YouTube videos, and spending 104.6 thousand hours in Zoom meetings.
Summary
Figure 1 highlights the continuous, significant growth in Big Data across all three characteristics – volume, velocity, and variety. However, this infographic presents only part of the picture: the data generated by the activities of individual consumers. Even higher data volumes are coming from organisations in various industries. And this growth in Big Data will accelerate significantly during COVID-19 and afterwards, as organisations adopt new technologies and deploy new infrastructure, while "connected" consumers embrace new ways of connecting, shopping and working with great confidence.
What do you think of the need for Big Data in your organisation? Please feel free to drop me a message or leave a comment below.