Photo by Timur Saglambilek: https://www.pexels.com/photo/analytics-text-185576

Learning Analytics Series: Terms Beginning with "Data _____" (Part II)

Introduction

Welcome to the second (Intermediate) installment of this series, covering another 10 data terms. If you missed the first (Novice) installment, you can read Part I here. I recommend reading these articles in order, since some of the terms build upon previous definitions.

Second 10 Terms Beginning with Data _____ (in alphabetical order)

Term 11: Data Architecture

Data architecture is the blueprint of the data landscape, showing the inputs (sources), processes (integration), outputs (targets), and hardware/software that comprise the solution. The architecture should cover everything from naming the data sources to how the data is collected and consumed, including file formats, data integration methods, processing techniques (e.g., batch vs. real-time, flush & fill vs. change data capture), and security protocols. Similar to data models (Part I), architectures may also come in three (3) forms: Conceptual, Logical, and Physical.
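
To make a couple of the processing techniques concrete, here is a minimal sketch contrasting flush & fill with an incremental load (a simplified stand-in for change data capture). The students_staging and students_target tables and the last_updated column are assumptions for illustration:

-- Flush & fill: empty the target table, then reload everything from the source
TRUNCATE TABLE students_target;
INSERT INTO students_target
SELECT * FROM students_staging;

-- Incremental load (a simplified stand-in for change data capture):
-- apply only the rows that changed since the last run
INSERT INTO students_target
SELECT * FROM students_staging
 WHERE last_updated > '2024-01-01';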

Term 12: Data Cleansing

Data cleansing is the process of improving data by fixing errors, removing duplicates (de-duplication), correcting typos/misspellings, normalizing values, standardizing formats, and handling missing values (data imputation, Part IV). These techniques, and more, increase data quality to improve accuracy and confidence in the data when creating reports, dashboards, or conducting data analysis.
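
As a minimal sketch, the statements below illustrate a few of these techniques in SQL against a hypothetical students table (the column names are assumptions for illustration):

-- Standardize formats: trim stray whitespace and normalize casing
UPDATE students
   SET email_address = LOWER(TRIM(email_address));

-- Handle missing values: substitute a placeholder for NULLs
UPDATE students
   SET first_name = 'Unknown'
 WHERE first_name IS NULL;

-- De-duplication: surface duplicate records for review
SELECT last_name, first_name, COUNT(*) AS record_count
  FROM students
 GROUP BY last_name, first_name
HAVING COUNT(*) > 1;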

Term 13: Data Definition Language

Data Definition Language (DDL) represents the commands used to create or modify objects in a database. Examples include CREATE (to construct new objects), ALTER (to modify existing objects), and DROP (to delete existing objects). Each of these commands, when successfully executed, physically creates or modifies objects, thereby defining the structure of the database. The DDL statements below illustrate each example with syntactically correct commands.

-- Create new table
CREATE TABLE students (student_id    INT
                      ,last_name     VARCHAR(50)
                      ,first_name    VARCHAR(50));

-- Alter table by adding new column for email address
ALTER TABLE students ADD email_address VARCHAR(255);

-- Delete table
DROP TABLE students;        

You can play with these DDL statements, and much more, using a free online virtual database instance from W3Schools.

Term 14: Data Engineering

Data engineering is a broad term that generally covers the work required to extract data from sources, prepare data for users, and everything in between. This work includes data integration, data cleansing, writing DDL/DML statements, creating data pipelines (Part IV), and more. Data engineering is responsible for the plumbing of the data architecture to ensure data is flowing smoothly and ready for usage.

Term 15: Data Integration

Data integration is the process of combining data from disparate data sources to create unified datasets (Part I) that support analysis and reporting. The primary objective of data integration is to create a consolidated view of data that's reliable and accurate, which includes processing different data formats (e.g., flat files, databases) and normalizing data across domains (e.g., unifying personnel data from payroll, human resources, and training systems).
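
As a small illustration, the query below sketches one common integration pattern: normalizing and combining personnel records from two hypothetical source tables (payroll_employees and hr_employees, along with their columns, are assumptions):

-- Combine personnel data from two source systems into one unified dataset,
-- normalizing column names and value formats along the way
-- (UNION also removes exact duplicates that appear in both sources)
SELECT employee_no    AS employee_id
      ,UPPER(surname) AS last_name
  FROM payroll_employees
 UNION
SELECT emp_id         AS employee_id
      ,UPPER(last_nm) AS last_name
  FROM hr_employees;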

Term 16: Data Integrity

Data integrity is often conflated with data quality, and although they are closely related, they are quite different. Data integrity helps to enforce data quality by ensuring the data makes sense. Some techniques include the implementation of primary keys (to ensure uniqueness), foreign keys (to establish relationships that avoid creating orphan records), and constraints (to enforce validation rules). These data integrity techniques improve data quality by preventing errors from being introduced at the point of entry.
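
Here is a minimal sketch of these techniques in SQL, reusing the students table from Term 13 (the enrollments table and the CHECK rule are assumptions for illustration):

-- Primary key ensures each student_id is unique
-- Constraint (CHECK) enforces a simple validation rule on email format
CREATE TABLE students (student_id    INT PRIMARY KEY
                      ,last_name     VARCHAR(50)
                      ,first_name    VARCHAR(50)
                      ,email_address VARCHAR(255) CHECK (email_address LIKE '%@%'));

-- Foreign key establishes a relationship and prevents orphan records:
-- an enrollment cannot reference a student that does not exist
CREATE TABLE enrollments (enrollment_id INT PRIMARY KEY
                         ,student_id    INT REFERENCES students (student_id));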

Term 17: Data Leakage

Data leakage refers to the unexpected or unauthorized release of data to an outside entity or location, which can be problematic because sensitive data may become available to users without permission. Data leakage can occur as a result of data breaches, user error/oversight, relaxed data security rules, poor data integration protocols, or negligence. Formal data governance policies and robust data management practices, which will be defined in Part III, mitigate data leakage.

Term 18: Data Manipulation Language

Data Manipulation Language (DML) represents the commands used to control the contents within the objects of a database (e.g., tables). Examples include INSERT (to create records), UPDATE (to modify records), or DELETE (to remove records). Each of these commands, when successfully executed, physically changes the contents of objects, thereby manipulating the contents of the database. The DML statements below illustrate each example with syntactically correct commands.

-- Insert record for new student named John Smith
-- (the example.com addresses below are illustrative placeholders)
INSERT INTO students (student_id, last_name, first_name, email_address)
              VALUES (341, 'Smith', 'John', 'jsmith@example.com');

-- Update John Smith's email address
UPDATE students
   SET email_address = 'john.smith@example.com'
 WHERE student_id = 341;

-- Delete John Smith's record
DELETE FROM students WHERE student_id = 341;

Feel free to play with these DML statements using the W3Schools link shared above (see 'Data Definition Language'). Be sure to create and alter the table again first, since it was dropped at the end of the sample DDL script in Term 13.

Term 19: Data Mining

Data mining is the process of exploring and analyzing datasets to find potentially useful information, identifying relationships and patterns in the data. Data mining is particularly useful for identifying areas of interest for further analysis, separating signal from noise. Exploratory Data Analysis (EDA) is the process typically used to conduct initial investigations of data, especially with new datasets (or sources).
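
As a small illustration, initial EDA often starts with simple profiling queries like the sketch below (the enrollments table and its columns are assumptions):

-- How large is the dataset?
SELECT COUNT(*) AS total_records
  FROM enrollments;

-- What does the distribution look like? Any surprising patterns worth a closer look?
SELECT course_id
      ,COUNT(*) AS enrollment_count
  FROM enrollments
 GROUP BY course_id
 ORDER BY enrollment_count DESC;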

Term 20: Data Warehouse

A data warehouse is a central repository of cleansed data specifically designed and built for business intelligence (BI) reporting and analytics. A data warehouse generally integrates data from multiple data sources, potentially with different formats, using Extraction-Transformation-Load (ETL) routines. The ETL output produces data that is ready for reports, dashboards, and analysis. The most common data modeling technique for a data warehouse is the star schema, containing a fact table surrounded by dimension tables (as shown below).

[Figure: Data Warehouse Star Schema]
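
Here is a minimal sketch of a star schema in a learning context (all table and column names are assumptions for illustration): a central fact table keyed to surrounding dimension tables, followed by a typical BI query against it.

-- Dimension tables describe the who/what of each fact
CREATE TABLE dim_student (student_key INT PRIMARY KEY
                         ,last_name   VARCHAR(50)
                         ,first_name  VARCHAR(50));

CREATE TABLE dim_course (course_key  INT PRIMARY KEY
                        ,course_name VARCHAR(100));

-- Fact table records measurable events, keyed to the dimensions
CREATE TABLE fact_enrollment (enrollment_key INT PRIMARY KEY
                             ,student_key    INT REFERENCES dim_student (student_key)
                             ,course_key     INT REFERENCES dim_course (course_key)
                             ,final_score    DECIMAL(5,2));

-- Typical BI query: join the fact table to a dimension and aggregate
SELECT c.course_name, AVG(f.final_score) AS avg_score
  FROM fact_enrollment f
  JOIN dim_course c ON c.course_key = f.course_key
 GROUP BY c.course_name;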

Summary

That concludes the second article in this series, with intermediate terms preparing you for the remaining two articles. The next article in this series (Part III - Advanced) will be published in a few weeks and will cover ten (10) more terms, as follows:

  • Data Aggregation
  • Data Governance
  • Data Granularity
  • Data Journalism
  • Data Lake
  • Data Lineage
  • Data Literacy
  • Data Management
  • Data Profiling
  • Data Sampling
