Learning Analytics Series: Terms Beginning with "Data _____" (Part III)
Mark DeRosa
2025 FORUM IT100 Award Winner | Data Analytics Evangelist | Innovative Thought Leader | Master Problem Solver | Agile Expert
Introduction
Welcome to the third (Advanced) installment of this series, which covers another 10 data terms and brings the running total to 30. If you missed the first two articles, I recommend reading Part I (Novice) and Part II (Intermediate) first, since some of these terms build upon earlier definitions.
Third 10 Terms Beginning with Data _____ (in alphabetical order)
Term 21: Data Aggregation
Data aggregation presents data in summary form so that it is easier to understand, query, and use. Aggregating data ahead of time also removes the guesswork of joining tables correctly: different users get the same trusted answer from an aggregated view of pre-joined data, rather than each user querying (and joining) multiple tables on their own. Oftentimes, the aggregation is implemented as a database view that retrieves data from multiple objects (tables or views). A minimal sketch of the idea follows.
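As an illustration, here is a small Python (pandas) sketch that rolls pre-joined order detail up into a summary that every user can query the same way. The table and column names (region, product, amount) are made up for this example, not taken from any real system.

```python
import pandas as pd

# Hypothetical pre-joined order detail (one row per order line)
orders = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "amount":  [100.0, 250.0, 75.0, 125.0, 300.0],
})

# Aggregate the detail into a trusted summary "view":
# one row per region/product with a line count and total sales
sales_summary = (
    orders.groupby(["region", "product"], as_index=False)
          .agg(order_lines=("amount", "size"),
               total_amount=("amount", "sum"))
)

print(sales_summary)
```

In a database, the same rollup would typically live in a view or materialized table so that everyone queries one consistent answer.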
Term 22: Data Governance
Data governance provides the policies and procedures for properly handling data, including the strategies and controls used by people throughout the organization. These policies and procedures reference applicable compliance requirements such as federal laws, executive orders, and memoranda. Formal governance strengthens an organization's security posture by reducing the risk of data breaches and unauthorized access to data. Data governance defines the management of data throughout its lifecycle, ensuring the availability of high-quality information. (We'll touch upon data governance a bit more below with Term 28 when data management is defined.)
Term 23: Data Granularity
Data granularity is the level of detail represented in a dataset, object (e.g., table), or star schema in the case of dimensional models for data warehouses. Knowing the granularity is very helpful because it sets the context for the data being processed or analyzed. For example, the granularity of an employee table is a single person uniquely identified by an employee number. Granularity can range from low-level detail to high-level summary data. In well-normalized database designs (e.g., 3NF), granularity generally maps to a single record; in denormalized designs (e.g., aggregated views), a single record may combine more than one level of detail.
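To make the idea of grain concrete, here is a small Python (pandas) sketch that checks the assumed grain of a detail table and then rolls it up to a coarser grain. The employee data and column names are hypothetical.

```python
import pandas as pd

# Detail grain: one row per employee, uniquely identified by employee_id
employees = pd.DataFrame({
    "employee_id": [101, 102, 103, 104],
    "department":  ["Finance", "Finance", "IT", "IT"],
    "salary":      [85000, 92000, 110000, 98000],
})

# Quick grain check: does the assumed key identify exactly one row?
assert employees["employee_id"].is_unique, "grain is not one row per employee"

# Rolling the data up changes the grain from one person to one department
dept_summary = employees.groupby("department", as_index=False)["salary"].mean()
print(dept_summary)
```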
Term 24: Data Journalism
Data journalism is the art of telling stories based on data, also known as data-driven storytelling. The primary goal of data journalism is to report fact-based information in the form of stories for the public interest, with the facts substantiated by data. Generally, the more supporting data behind the story, the stronger the case it makes. The resulting information can be presented in a combination of forms, pairing narrative text with graphics and charts.
One of the oldest and most highly regarded examples of data journalism is Charles Joseph Minard's graphical depiction of Napoleon's losses in the Russian campaign of 1812 (shown below). This single infographic quickly communicates so much valuable information, visually and textually. More information is available on Edward Tufte's website.
Term 25: Data Lake
A data lake is a centralized repository of structured, unstructured, and semi-structured data collected from many different sources. The idea is to provide access to data for quick analysis, help determine where value may exist, and serve as a unified data source for other downstream systems. For example, data scientists may access 'dirty data' to perform some quick statistical analysis and discover potentially valuable information without waiting for the data to be formally modeled and loaded into a system (like an enterprise data warehouse). In some cases, accessing data in its rawest form is preferred to avoid working with data that has already been transformed.
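As a rough sketch of that quick-analysis workflow in Python (pandas), the snippet below reads raw files of mixed formats from a hypothetical landing folder and runs a first-pass summary over them. The path, file layout, and column contents are assumptions for illustration only.

```python
import json
from pathlib import Path
import pandas as pd

# Hypothetical raw landing zone in a data lake (path is made up)
lake_root = Path("/data/lake/raw/sales")

frames = []
for path in lake_root.glob("*"):
    if path.suffix == ".csv":
        frames.append(pd.read_csv(path))                                 # structured
    elif path.suffix == ".json":
        frames.append(pd.json_normalize(json.loads(path.read_text())))   # semi-structured

# Quick, informal look at the 'dirty data' before any formal modeling
raw = pd.concat(frames, ignore_index=True)
print(raw.describe(include="all"))
```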
Term 26: Data Lineage
Data lineage is the documentation trail of a data element's journey, from source to target. As data travels from its source (origin), it may undergo some transformations before landing in its target (destination). Documenting these travels and any changes along the way is known as data lineage. The best form of data lineage is bi-directional, meaning that you can trace a data element from source to target and vice versa.
The image below shows a simplified view of data extracted from a Source Database that may undergo some form of Processing and is placed into a Target Database ready for use.
The idea is to be able to find the source data element(s) from the target, the target data element(s) from the source, and understand any changes that occur between those endpoints. This bi-directional traceability provides transparency and instills confidence in the data because users understand what came from where and how it landed at its destination.
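One lightweight way to picture bi-directional lineage is a set of records that map each source element to its target element along with the transformation applied, so you can trace in either direction. The element names and transformations below are hypothetical, and real lineage tooling usually captures far more metadata.

```python
# Minimal sketch of bi-directional lineage records (names are hypothetical)
lineage = [
    {"source": "crm.customer.birth_date",
     "transformation": "derive age in years as of the load date",
     "target": "dw.dim_customer.age"},
    {"source": "crm.customer.country_code",
     "transformation": "map ISO code to country name",
     "target": "dw.dim_customer.country_name"},
]

def trace_forward(source_element):
    """Source -> target: where does this element end up, and how?"""
    return [r for r in lineage if r["source"] == source_element]

def trace_backward(target_element):
    """Target -> source: where did this element come from, and how?"""
    return [r for r in lineage if r["target"] == target_element]

print(trace_forward("crm.customer.birth_date"))
print(trace_backward("dw.dim_customer.age"))
```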
Term 27: Data Literacy
Data literacy is the ability to read, understand, and communicate data in a consistent manner, which develops a competent workforce and facilitates effective collaboration. Oftentimes, people interpret (or use) the same data in different ways, which leads to confusion and possibly incorrect results. Improving data literacy mitigates that confusion and those errors by educating users about the data that is available, what it means, and how it is intended to be used.
Term 28: Data Management
Data management enacts the policies and procedures from the data governance program to ensure the implementation matches the plan. Aligning the implementation of data management to data governance protects the organization's data with secure and reliable solutions. Data management and data governance are often confused with each other. The easiest way to remember the difference is that data governance is the functional framework whereas data management is the technical implementation supporting that functional framework.
The Venn diagram below shows some of the main components supporting Data Governance vs. Data Management. A Venn diagram is used because these components are so interrelated and occasionally overlap.
Term 29: Data Profiling
Data profiling is the systematic examination of data to gather information about its size, data types, relationships, and summary statistics (e.g., min, max, avg, length, NULLs, unique values). Data profiling is a useful first step upon receiving a new dataset to quickly understand the contents and where potentially valuable information may exist. This step is critically important to understanding the data, especially when no other useful information is available such as data models or data dictionaries.
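Here is a minimal first-pass profiling sketch in Python (pandas). The file name is made up for the example; the calls simply surface size, data types, missing values, unique counts, and summary statistics.

```python
import pandas as pd

# Hypothetical new dataset to profile (file name is made up)
df = pd.read_csv("new_dataset.csv")

print(df.shape)                    # size: rows x columns
print(df.dtypes)                   # data type of each column
print(df.isna().sum())             # NULL/missing values per column
print(df.nunique())                # unique values per column
print(df.describe(include="all"))  # min, max, mean, counts, etc.
```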
PRO TIP: Make sure your project has current data models and data dictionaries that are treated as living and breathing artifacts. No database structure changes should be implemented until they are modeled and defined.
Term 30: Data Sampling
Data sampling is a statistical technique used to select a subset of data from a much larger dataset for analysis. Some datasets are too large for quick experimentation or testing, so a sample of the data is selected instead. The idea is to work with a much smaller, but still accurate, representation of the entire dataset. Multiple methods can be used, such as simple random sampling, stratified sampling, cluster sampling, and systematic sampling.
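As a brief Python (pandas) sketch, the snippet below draws a simple random sample and a stratified sample from a hypothetical dataset. The file name and the 'segment' column used for stratification are assumptions for illustration.

```python
import pandas as pd

# Hypothetical large dataset (file name is made up for this sketch)
population = pd.read_csv("large_dataset.csv")

# Simple random sampling: 10% of rows, reproducible via a fixed seed
simple_sample = population.sample(frac=0.10, random_state=42)

# Stratified sampling: 10% from each value of a hypothetical 'segment' column,
# so every group is represented in proportion to its size
stratified_sample = (
    population.groupby("segment", group_keys=False)
              .sample(frac=0.10, random_state=42)
)

print(len(simple_sample), len(stratified_sample))
```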
Summary
That concludes the third article in this series, with some advanced terms preparing you for the final installment. The last article in this series (Part IV - Expert) will be published in a few weeks and will cover ten (10) more terms, as follows: