Episode 13 – Data Cleansing



Previous newsletters have systematically covered the topic of MDM and explained how your company can achieve better data quality. Now, however, it is time to look back. What happens to the existing master data? I assume that the data quality of new master data will be close to 100% from the moment of the switch to MDM. However, master data created before MDM was introduced was not necessarily created according to MDM principles, so its quality differs from that of the "new" master data. In short, the existing master data is of lower quality.

What does Data Cleansing mean?

Data cleansing in the context of Master Data Management (MDM) for standard and purchase parts is the process of identifying and correcting or removing inaccurate, incomplete, redundant or outdated data from master data records. Here are some specific aspects of how data cleansing is performed in this context:

Identification and removal of duplicates:

Standard and purchase parts often appear several times in the database, under slightly different designations or article numbers. Data cleansing helps to detect and consolidate these duplicates so that each piece of information is stored only once, and correctly.
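A minimal sketch of duplicate detection, assuming the records are held in a Python dict that maps article numbers to designations (the sample records are invented for illustration). It normalizes case and whitespace and then uses `difflib.SequenceMatcher` from the standard library to flag highly similar pairs as consolidation candidates:

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lower-case and collapse repeated whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def find_duplicate_candidates(records: dict, threshold: float = 0.9) -> list:
    """Return (id_a, id_b, score) for every pair of designations at or above the threshold."""
    candidates = []
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
        if score >= threshold:
            candidates.append((id_a, id_b, round(score, 2)))
    return candidates


# Hypothetical sample data: A-100 and A-205 differ only in spacing and case.
records = {
    "A-100": "Hexagon screw M8x20 DIN 933",
    "A-205": "Hexagon  screw m8x20 DIN 933",
    "B-310": "Washer 8.4 DIN 125",
}
print(find_duplicate_candidates(records))
```

Comparing every pair grows quadratically with the number of records, so for a large parts catalog you would typically group records first (for example by product class) and compare only within each group. The threshold of 0.9 is an assumption to be tuned against your own data.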

Validation and standardization:

Master data is checked and adjusted according to predefined standards. For example, units of measurement, designations, categories, and classifications are standardized to ensure consistency and comparability.
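As an illustration of standardization, here is a sketch that maps spelling variants of units of measurement to one canonical form. The lookup table `UNIT_STANDARD` stands in for a hypothetical company standard; unknown spellings are rejected rather than guessed:

```python
# Hypothetical company standard: every known spelling variant maps to one canonical unit.
UNIT_STANDARD = {
    "pc": "piece", "pcs": "piece", "piece": "piece", "pieces": "piece",
    "kg": "kg", "kgs": "kg", "kilogram": "kg",
    "mm": "mm", "millimeter": "mm", "millimetre": "mm",
}


def standardize_unit(raw: str) -> str:
    """Return the canonical unit, ignoring case, surrounding spaces and a trailing dot."""
    key = raw.strip().lower().rstrip(".")
    if key not in UNIT_STANDARD:
        # Do not guess: unknown units go back to a person or into the standard.
        raise ValueError(f"Unknown unit {raw!r}: extend the standard or review manually")
    return UNIT_STANDARD[key]


print([standardize_unit(u) for u in ["PCS", "Pc.", "Kilogram", " mm "]])
```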

Correction of errors:

Typos, incorrect article numbers, incorrect descriptions or inaccurate technical specifications are identified and corrected.
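For article numbers, errors can often be identified with a format check. The sketch below assumes a hypothetical format of two letters, a hyphen and six digits (e.g. "SP-004711"); it auto-corrects only trivial formatting slips (case, spaces, underscores) and flags everything else:

```python
import re

# Hypothetical article number format: two letters, a hyphen, six digits.
ARTICLE_NO = re.compile(r"[A-Z]{2}-\d{6}")


def check_article_number(raw: str) -> tuple:
    """Return (corrected_value, is_valid); only formatting slips are corrected automatically."""
    # Upper-case and replace spaces/underscores with the expected hyphen.
    candidate = re.sub(r"[\s_]+", "-", raw.strip().upper())
    return candidate, ARTICLE_NO.fullmatch(candidate) is not None


for raw in ["sp-004711", "SP 004711", "SP-04711"]:
    print(raw, "->", check_article_number(raw))
```

A number that still fails the check after this correction (like the five-digit "SP-04711") cannot be fixed mechanically and needs to be looked up.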

Completion of missing data:

Missing information, such as technical specifications, manufacturer information or classifications, is supplemented to make the data set complete.
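Before missing data can be supplemented, it has to be found. A minimal completeness check, assuming a hypothetical list of mandatory attributes (`REQUIRED_FIELDS`) and records stored as dicts:

```python
# Hypothetical mandatory attributes for every standard/purchase part.
REQUIRED_FIELDS = ("designation", "manufacturer", "material", "weight_kg")


def missing_fields(record: dict) -> list:
    """List mandatory attributes that are absent, None, or only whitespace."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


part = {"designation": "Washer 8.4 DIN 125", "manufacturer": " ", "weight_kg": 0.002}
print(missing_fields(part))
```

Run over the whole database, such a check produces a work list of gaps to be filled from manufacturer catalogs or data sheets.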

Cleaning up stale data:

Outdated or no longer relevant records are removed or archived to keep the database up-to-date and relevant.
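One simple way to find stale records is to flag everything that has not been used for a certain period. This sketch assumes each record carries a hypothetical `last_used` date and uses an assumed five-year limit:

```python
from datetime import date, timedelta


def archive_candidates(records: list, today: date, max_age_days: int = 5 * 365) -> list:
    """Return the IDs of records not used since the cutoff date (flagged, not deleted)."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_used"] < cutoff]


# Hypothetical records with invented article numbers and usage dates.
records = [
    {"id": "SP-000001", "last_used": date(2015, 3, 1)},
    {"id": "SP-000002", "last_used": date(2024, 1, 15)},
]
print(archive_candidates(records, today=date(2024, 6, 30)))
```

Flagging rather than deleting leaves the final decision to a person, which matters if an old part is still referenced in bills of materials or spare-parts lists.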

Harmonization of data sources:

Data from different sources, such as supplier databases, internal systems or external catalogs, is harmonized and merged. This includes the consolidation of different formats and structures.
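A sketch of harmonization, assuming two hypothetical sources (`supplier_db` and `erp_export`) that use different field names for the same attributes. Each record is first mapped to a common schema; the two records are then merged, with non-empty values from the preferred source taking precedence:

```python
# Hypothetical field mappings from each source format to one common schema.
FIELD_MAP = {
    "supplier_db": {"mfr": "manufacturer", "desc": "designation", "mat": "material"},
    "erp_export": {"Manufacturer": "manufacturer", "Description": "designation", "Material": "material"},
}


def to_common_schema(record: dict, source: str) -> dict:
    """Rename source-specific fields to the common names; unmapped fields are dropped."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def merge(primary: dict, secondary: dict) -> dict:
    """Combine two harmonized records; non-empty values from `primary` win."""
    merged = dict(secondary)
    merged.update({k: v for k, v in primary.items() if v not in (None, "")})
    return merged


a = to_common_schema({"mfr": "Company Example", "desc": "Adapter G1/2", "mat": ""}, "supplier_db")
b = to_common_schema({"Manufacturer": "Company Example", "Material": "Brass"}, "erp_export")
print(merge(a, b))
```

Which source counts as "primary" is a business decision; here it is simply an assumption passed in by the caller.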

Two examples of master data of low and varying quality:

Different spellings of manufacturer names:

Suppose "Company Example" is an official manufacturer (a placeholder, so as not to name real companies). Its name can be entered in many different ways, which leads to confusion and data inconsistencies:

• Company Example

• Company_Example

• Company, Example

• Example; Company

• EXAMPLE-COMPANY

• CompanyExample

• Compani Example

• And so on

There are numerous ways in which a manufacturer can be entered in the master data (including typos, varying upper- and lower-case letters, or multiple spaces). Yet all of these entries refer to the same manufacturer.
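A minimal sketch of how such variants can be traced back to one manufacturer, assuming a hypothetical reference list `CANONICAL` of approved names. A matching key (lower-case, split on separators, words sorted and joined) catches case, separator and word-order differences; a fuzzy comparison with `difflib.get_close_matches` then catches small typos such as "Compani":

```python
import re
from difflib import get_close_matches

# Hypothetical reference list of approved manufacturer names.
CANONICAL = ["Company Example"]


def match_key(name: str) -> str:
    """Lower-case, split on any non-alphanumeric run, sort the words, join without separators."""
    words = [w for w in re.split(r"[^0-9a-z]+", name.lower()) if w]
    return "".join(sorted(words))


KEY_TO_CANONICAL = {match_key(n): n for n in CANONICAL}


def resolve_manufacturer(raw: str, cutoff: float = 0.85):
    """Return the approved name, or None if no sufficiently close match exists."""
    key = match_key(raw)
    if key in KEY_TO_CANONICAL:
        return KEY_TO_CANONICAL[key]
    close = get_close_matches(key, list(KEY_TO_CANONICAL), n=1, cutoff=cutoff)
    return KEY_TO_CANONICAL[close[0]] if close else None


variants = ["Company Example", "Company_Example", "Company, Example",
            "Example; Company", "EXAMPLE-COMPANY", "CompanyExample", "Compani Example"]
print({v: resolve_manufacturer(v) for v in variants})
```

The cutoff of 0.85 is an assumption: set it too low and genuinely different manufacturers with similar names get merged, so fuzzy matches are best confirmed by a person before they are written back.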

Different designations for the same article:

For example, suppose an "adapter" is to be created in the master data. It may end up under any of the following names:

1. Adapter

2. Screw joint

3. Adaptor

4. Link

5. Transition piece

6. Clutch

7. Converter

8. Reducer

9. Adapter piece

10. Fitting

11. Connector

12. Conversion Piece

13. Mounting piece

And so on.

On top of these different terms come typos, different separators, inconsistent upper and lower case, extra spaces, and so on.
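For article names, a curated synonym table is one way to map the entered term to an approved designation after cleaning up separators, case and spaces. The table below is hypothetical and deliberately small: terms such as "Reducer" or "Fitting" can also denote distinct parts, so every entry needs a human decision, and unknown terms return `None` for manual review:

```python
import re

# Hypothetical, manually curated synonym table: cleaned term -> approved designation.
SYNONYMS = {
    "adapter": "Adapter",
    "adaptor": "Adapter",
    "adapter piece": "Adapter",
    "transition piece": "Adapter",
    "conversion piece": "Adapter",
}


def clean_term(raw: str) -> str:
    """Lower-case, treat common separators as spaces, and collapse repeated whitespace."""
    return " ".join(re.sub(r"[_\-;,]+", " ", raw.lower()).split())


def approved_designation(raw: str):
    """Return the approved designation, or None if the term is not in the table."""
    return SYNONYMS.get(clean_term(raw))


for term in ["  ADAPTOR ", "Transition_Piece", "Adapter--piece", "Screw joint"]:
    print(repr(term), "->", approved_designation(term))
```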

Result:

Different spellings of manufacturer and article names in MDM lead to data inconsistencies that impair data quality and reduce efficiency. They make information harder to find, so searches become time-consuming and error-prone. They can also create unnecessary duplicates, which consume system resources and increase operating costs.

This common thread runs through all types of master data: weights are given sometimes in grams and sometimes in kilograms, and dimensions are given in millimeters but in different decimal formats, for example with three decimal places and a decimal comma or with two decimal places and a decimal point. These inconsistencies make it much more difficult to use and process the data consistently.
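A sketch of bringing such values to a common basis, assuming grams as the hypothetical target unit. It accepts both a decimal comma and a decimal point and uses Python's `decimal` module so that no rounding errors creep in during conversion:

```python
from decimal import Decimal

# Conversion factors to the assumed target unit, grams.
TO_GRAMS = {"g": Decimal("1"), "kg": Decimal("1000")}


def parse_decimal(text: str) -> Decimal:
    """Accept both '0,250' (decimal comma) and '0.25' (decimal point).

    Assumes the value contains no thousands separators such as '1.250,5'.
    """
    return Decimal(text.strip().replace(",", "."))


def weight_in_grams(value: str, unit: str) -> Decimal:
    """Convert a weight given as text plus unit into grams."""
    return parse_decimal(value) * TO_GRAMS[unit.strip().lower()]


print(weight_in_grams("0,250", "kg"), weight_in_grams("250", "g"))
```

Values that mix thousands separators and decimal separators are ambiguous and would need an explicit rule per data source before this kind of conversion is safe.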


Example processes in data cleansing for standard and purchase parts:

• Automated scripts and tools:

These are used to regularly check for inconsistencies and errors in the master data and to correct them automatically.

People often try to solve this problem with a spreadsheet program, but I would advise against this from the outset. Spreadsheet programs quickly reach their limits with large amounts of data, which can lead to performance problems and a higher risk of errors. For comprehensive data cleansing, specialized data management or database tools are usually the better choice.
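To illustrate the database approach, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The table `parts` and its contents are hypothetical; a Python function is registered with `create_function` so that duplicate manufacturer spellings can be grouped directly in SQL:

```python
import re
import sqlite3


def match_key(name: str) -> str:
    """Lower-case, split on non-alphanumerics, sort the words, join without separators."""
    words = [w for w in re.split(r"[^0-9a-z]+", name.lower()) if w]
    return "".join(sorted(words))


con = sqlite3.connect(":memory:")
con.create_function("match_key", 1, match_key)  # make the Python function callable from SQL
con.execute("CREATE TABLE parts (id TEXT PRIMARY KEY, manufacturer TEXT)")
con.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [("SP-000001", "Company Example"),
     ("SP-000002", "EXAMPLE-COMPANY"),
     ("SP-000003", "Other Maker")],
)

# Groups with more than one record share a key, i.e. are spelling variants of one name.
rows = con.execute(
    """
    SELECT match_key(manufacturer) AS k, COUNT(*), group_concat(id, ', ')
    FROM parts
    GROUP BY k
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)
con.close()
```

In a production system the same idea would typically run in the company's own database or data quality tool; SQLite only keeps the example self-contained.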

• Rule-based systems:

Specific rules and algorithms are used to detect and correct deviations. (Here too, I would advise against relying on a spreadsheet program alone: working out all the rules and maintaining them consistently would take far too much effort.)
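A minimal sketch of a rule-based check, assuming records are dicts. Each hypothetical `Rule` pairs a test for a violation with an optional automatic fix; rules without a fix are only reported:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    violated: Callable[[dict], bool]               # True if the record breaks the rule
    fix: Optional[Callable[[dict], dict]] = None   # None means: report only


# Hypothetical example rules.
RULES = [
    Rule("designation has surplus spaces",
         lambda r: r["designation"] != " ".join(r["designation"].split()),
         lambda r: {**r, "designation": " ".join(r["designation"].split())}),
    Rule("weight missing", lambda r: r.get("weight_g") is None),
]


def apply_rules(record: dict) -> tuple:
    """Apply all rules in order; return the (possibly fixed) record and the findings."""
    findings = []
    for rule in RULES:
        if rule.violated(record):
            findings.append(rule.name)
            if rule.fix is not None:
                record = rule.fix(record)
    return record, findings


cleaned, findings = apply_rules({"designation": "Adapter  G1/2 ", "weight_g": None})
print(cleaned, findings)
```

Keeping the rules as data rather than scattered formulas makes them easier to review, extend and apply consistently to every record.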

• Manual review:

Data analysts manually review suspicious or unclear records to ensure that the information is accurate and complete. In many cases, manual work cannot be avoided: tools and AI can support the process, but even they reach their limits at some point, and manual intervention becomes indispensable.
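One way to combine tools and manual work is to route record pairs by similarity score. The thresholds below are assumptions: clear matches are merged automatically, clear non-matches are left alone, and the uncertain middle band goes to a data analyst. The second example shows why that band is needed: "G1/2" and "G3/4" look almost identical as text but denote different thread sizes:

```python
from difflib import SequenceMatcher


def route(a: str, b: str, auto: float = 0.95, review: float = 0.80) -> str:
    """Decide how to handle a pair of designations based on their similarity."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= auto:
        return "merge automatically"
    if score >= review:
        return "manual review"
    return "keep separate"


print(route("Adapter G1/2", "adapter g1/2"))
print(route("Adapter G1/2", "Adapter G3/4"))
print(route("Adapter G1/2", "Washer 8.4 DIN 125"))
```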

Advantages of data cleansing in the MDM context:

• Improved data quality:

Accurate, complete and up-to-date master data increases the reliability and efficiency of operational processes.

• Cost reduction:

Reduction of costs by avoiding errors in procurement, warehousing and production.

• Better decision-making:

Informed decisions can only be made on the basis of high-quality data.

• Compliance and risk mitigation:

Compliance with standards and regulations is facilitated and the risk of data breaches is reduced.

Rely on the expertise of third-party providers now and have your existing data professionally cleansed. The advantage of data cleansing by professionals is that they have the specialized knowledge and tools to carry out the process efficiently and precisely. This delivers results faster and is more cost-effective than performing the cleanup in-house.


This article is based on my personal experience and my own selection of topics. It makes no claim to completeness. If anything is incorrect, I would appreciate your feedback.

Best wishes

Sascha Hartung


