Data Perturbation

Data Perturbation/ Swapping/ Shuffling/ Data Scrambling/ Data Obfuscation all use the same technique to hide the identity of an entity while keeping the essence of the data.

Let’s decode it...

Data Swapping is about replacing data values with other values at the same level of detail as the original data. In other words, the cardinality (explained in a separate topic) of the replaced data is maintained.

· In Data Anonymization, identifiable data is removed.

· In Data Generalization, identifiable data is summarized.

· In Data Swapping, identifiable data is replaced at the same level of detail.

‘The Data Swapping technique is mainly used by Data Scientists or Data Citizens who require data at the lowest level to create and train their Machine Learning models.’

Let’s refer to the same Age example used in the topic Data Generalization/ Blurring and Specialization. For example, there can be an Age column with the values 26, 28, 31, 33, 37, 42, 42, 46, 48, 49, 54, 57, 57, 58, 59. In Data Swapping, to maintain the same level of detail, ages can be swapped with each other: 26 with 59, 28 with 58, 31 with 57 and so on.

This way, the data is not changed at the modelling/normalization level, it is not removed, and it is not summarized in another column.
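The Age example can be sketched in a few lines of Python. This is only a minimal illustration, assuming the pairing implied by the example (smallest value with largest, second smallest with second largest, and so on); the function name `mirror_swap` is hypothetical, and a random pairing would be another reasonable choice.

```python
def mirror_swap(values):
    """Swap the i-th value with the i-th value from the end.

    Assumption: this mirrors the pairing in the Age example
    (first with last, second with second-to-last, ...).
    """
    swapped = list(values)  # work on a copy, keep the input intact
    n = len(swapped)
    for i in range(n // 2):
        j = n - 1 - i
        swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


ages = [26, 28, 31, 33, 37, 42, 42, 46, 48, 49, 54, 57, 57, 58, 59]
swapped_ages = mirror_swap(ages)

# The same set of ages is still present, only attached to different rows.
print(sorted(swapped_ages) == sorted(ages))
```

Because values are only moved between rows, the column keeps exactly the same values and the same level of detail. With an odd number of rows, the middle value (46 here) has no partner and stays where it is.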

Types of Data Swapping

· Partial Swapping doesn’t replace all the values.

· Full Swapping replaces all the values.
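The difference between the two types can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive implementation: the function name `swap_values` and its `fraction` and `seed` parameters are hypothetical, and the rows to swap are picked at random.

```python
import random


def swap_values(values, fraction=1.0, seed=None):
    """Swap values among a randomly chosen share of positions.

    fraction=1.0 -> full swapping (every position receives a value
                    taken from a different position)
    fraction<1.0 -> partial swapping (the rest stay untouched)
    """
    rng = random.Random(seed)
    original = list(values)
    result = list(original)
    k = int(len(original) * fraction)
    # Pick k distinct positions in random order.
    chosen = rng.sample(range(len(original)), k)
    # Rotate the chosen positions by one so each value moves
    # to a different position (no position keeps its own value).
    targets = chosen[1:] + chosen[:1]
    for src, dst in zip(chosen, targets):
        result[dst] = original[src]
    return result


ages = [26, 28, 31, 33, 37, 42, 42, 46, 48, 49, 54, 57, 57, 58, 59]
full = swap_values(ages, fraction=1.0, seed=7)
partial = swap_values(ages, fraction=0.4, seed=7)
```

Note that when the column contains duplicates (42 and 57 appear twice here), a row can receive a value equal to its original one even under full swapping, because the value came from a different row with the same age.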

‘Data Perturbation/ Swapping/ Shuffling/ Data Scrambling/ Data Obfuscation/ Data Masking are all permanent changes to the data and are very hard to revert; in most cases, they are not reversible.’

Cheers.
