Data Tokenization [Series#2: I am Data!]
Mustafa Qizilbash
Open for New Opportunities (Globally), Author, Data & AI Practitioner & CDMP Certified, Innovator of Four 4s Formula, DAC Architecture, PVP Approach
Data Tokenization is a well-known term that is often confused with Data Masking, but there are major differences in how the two are maintained.
‘Data Masking with Mappability’
Let’s decode it…
We know how data masking works: it replaces data with random values to hide the original, so that if the data falls into the wrong hands, no one can identify the individual to whom a record refers.
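To make the masking idea concrete, here is a minimal Python sketch. The function name `mask_value` and the sample record are hypothetical, and real masking tools offer many more strategies; the point is only that nothing about the original value is kept anywhere.

```python
import secrets
import string


def mask_value(value: str) -> str:
    """Replace every letter and digit with a random one of the same kind.

    Illustrative assumption: separators such as '-' and spaces are kept so
    the masked value still looks like the original format. No mapping back
    to the original is stored, so the change cannot be undone.
    """
    masked_chars = []
    for ch in value:
        if ch.isdigit():
            masked_chars.append(secrets.choice(string.digits))
        elif ch.isalpha():
            masked_chars.append(secrets.choice(string.ascii_letters))
        else:
            masked_chars.append(ch)
    return "".join(masked_chars)


# Hypothetical record holding sensitive fields.
record = {"name": "Jane Doe", "card_number": "4111-1111-1111-1111"}
masked = {field: mask_value(value) for field, value in record.items()}
print(masked)
```

Each run prints different random values, and there is no way to get from `masked` back to `record`.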
Data Tokenization also replaces sensitive data with a unique, system-generated value called a Token. During the process, however, the original data and the token, along with their mapping details, are stored externally in a Token Vault for safekeeping.
Because the Token replaces the original data in the actual data store while its mapping details stay safe in the Token Vault, the original data is always recoverable. This is in contrast with Data Masking, where sensitive data that has been randomly changed can no longer be reversed.
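The reversible behaviour can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `TokenVault` class, the `tok_` prefix and the in-memory dictionaries stand in for what would normally be a separate, access-controlled vault service.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory stand-in for an external token vault."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the token for a value, creating and storing one if needed."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for an unknown token."""
        return self._token_to_value[token]


vault = TokenVault()
card = "4111-1111-1111-1111"

# The actual data store keeps only the token, never the card number.
stored_row = {"customer_id": 42, "card_number": vault.tokenize(card)}

# Anyone with access to the vault can recover the original value.
original = vault.detokenize(stored_row["card_number"])
```

The data store on its own reveals nothing about the card number, while the vault lookup restores it exactly, which is the "mappability" that masking lacks.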
A quick analogy for experienced Data Warehousing folk is the Surrogate Key, where original business keys are replaced with unique auto-generated values but the original data is not touched. In Data Warehousing, the Surrogate Key sits in the actual data, whereas in Data Tokenization the Surrogate Key sits in the Token Vault.
Cheers.
I am an Enterprise Data Management, Data Governance, Data Modeling Experienced Professional | As a Team Leader, I ensure the highest data quality, security, and compliance standards.
2y Well-defined tokenization. Data masking, tokenization, and canned data each have their own application and usage. I did not find any conflict. For example, structured data can be masked or canned if we want to share the data for development purposes. However, tokenization is needed if we share the actual sensitive data for decision-making or real-time analysis. What do you think?
Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao
2y While "Data tokenization" is good for the buzz, "Canned data" is a better option.