Data Profiling
What Is Data Profiling?
Data profiling is the process of reviewing source data, understanding its structure, content, and interrelationships, and identifying its potential for data projects.
Data profiling is a crucial part of:
- Data warehouse and business intelligence (DW/BI) projects—data profiling can uncover data quality issues in data sources and show what needs to be corrected in ETL.
- Data conversion and migration projects—data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.
- Source system data quality projects—data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).
Data profiling involves:
- Collecting descriptive statistics such as min, max, count, and sum (see the sketch after this list).
- Collecting data types, lengths, and recurring patterns.
- Tagging data with keywords, descriptions or categories.
- Performing data quality assessment and assessing the risk of performing joins on the data.
- Discovering metadata and assessing its accuracy.
- Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
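As a minimal sketch of several of these activities in Python with pandas, the snippet below collects descriptive statistics, data types, and value lengths, and flags key and foreign-key candidates. The file names and column names (customers.csv, countries.csv, country_code, code) are hypothetical placeholders, not part of the article.

```python
import pandas as pd

# Hypothetical source and reference tables; substitute your own data.
df = pd.read_csv("customers.csv")
ref = pd.read_csv("countries.csv")

# Descriptive statistics (count, min, max, mean, ...) and sums for numeric columns.
print(df.describe())
print(df.sum(numeric_only=True))

# Data types and min/max string length per column.
print(df.dtypes)
lengths = df.astype(str).apply(lambda s: s.str.len())
print(lengths.agg(["min", "max"]))

# Key candidates: columns whose distinct count equals the row count.
key_candidates = [c for c in df.columns if df[c].nunique() == len(df)]
print("Candidate keys:", key_candidates)

# Foreign-key candidate check: every non-null country_code appears in the
# reference table's code column.
is_fk = df["country_code"].dropna().isin(ref["code"]).all()
print("country_code -> countries.code FK candidate:", is_fk)
```

The same checks can be pushed down into SQL or a profiling tool; the point is that each finding (candidate keys, orphaned codes, unexpected lengths) feeds directly into ETL design decisions.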
Types of data profiling
There are three main types of data profiling:
Structure discovery
- Validating that data is consistent and formatted correctly, and performing mathematical checks on the data (e.g. sum, minimum, or maximum). Structure discovery helps you understand how well data is structured; for example, what percentage of phone numbers do not have the correct number of digits (see the sketch below).
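As an illustration of a structure-discovery check, the sketch below uses pandas with a made-up phone column and an assumed 10-digit format rule to compute the percentage of values that fail the format test.

```python
import pandas as pd

# Toy data standing in for a real source column.
df = pd.DataFrame({"phone": ["5551234567", "555-123-4567", "12345", None]})

# Strip non-digit characters, then flag values that are not exactly 10 digits.
digits = df["phone"].astype("string").str.replace(r"\D", "", regex=True)
bad_length = digits.notna() & (digits.str.len() != 10)

pct_bad = 100 * bad_length.sum() / len(df)
print(f"{pct_bad:.1f}% of phone numbers do not have 10 digits")
```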
Data profiling and data quality analysis best practices
Basic data profiling techniques (a combined sketch follows this list):
- Distinct count and percent—identifies natural keys and the distinct values in each column, which can help process inserts and updates. Handy for tables without headers.
- Percent of zero / blank / null values—identifies missing or unknown data. Helps ETL architects set up appropriate default values.
- Minimum / maximum / average string length—helps select appropriate data types and sizes in target database. Enables setting column widths just wide enough for the data, to improve performance.
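The sketch below combines these three techniques with pandas; the file name source_table.csv is a placeholder, and the output format is arbitrary.

```python
import pandas as pd

df = pd.read_csv("source_table.csv")
n_rows = len(df)

for col in df.columns:
    s = df[col]

    # Distinct count and percent.
    distinct = s.nunique(dropna=True)

    # Percent of zero / blank / null values.
    nulls = s.isna().sum()
    blanks = (s.astype(str).str.strip() == "").sum()
    zeros = (s == 0).sum()

    # Minimum / maximum / average string length (ignoring nulls).
    lengths = s.dropna().astype(str).str.len()

    print(
        f"{col}: distinct={distinct} ({100 * distinct / n_rows:.1f}%), "
        f"null={100 * nulls / n_rows:.1f}%, blank={100 * blanks / n_rows:.1f}%, "
        f"zero={100 * zeros / n_rows:.1f}%, "
        f"len min/max/avg={lengths.min()}/{lengths.max()}/{lengths.mean():.1f}"
    )
```

Profiling output like this is typically reviewed column by column to decide target data types, column widths, default values, and cleansing rules before ETL development begins.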