Going Incognito: The minute details of data masking
The rising number of data breaches makes protecting personally identifiable information (PII) crucial. The GDPR requires organizations to ensure that personal data is gathered legally and under stringent conditions. Additionally, data processors who collect and manage data must protect it from misuse and exploitation and respect the rights of data owners to avoid penalties. Although many tried and tested measures help ensure data security, the GDPR recommends data masking techniques to reduce the risk of data exposure and meet data compliance requirements.
Data Masking
Data masking techniques create a version of data that closely resembles the original but masks or hides the sensitive information. The prime purpose of data masking is to create a practical substitute that does not reveal the actual data. While most organizations follow stringent security measures to protect data used for business and other purposes, compliance issues and data risks arise when an organization's data is used for general purposes such as training, or is used by third parties.
In situations like these, data masking lets users work with data in the same format as the original, but with the values of sensitive details changed. Some examples of data masking are randomizing characters, hiding data using symbols, nulling, scrambling, and encrypting. While data masking can be used to conceal many types of data, the data that are most commonly masked include:
Data Masking Types
The most used data masking types are as follows:
Static Data Masking – This method creates a replicated version of a dataset that contains fully or partially masked data. Static data masking alters the data so that the copy emulates the original and can be used for demos, testing, or training. The process involves copying a database to another environment, eliminating unnecessary data, masking the data, and finally, saving the masked data. The masked data can be stored and maintained in a separate database for use as and when required.
Dynamic Data Masking – This procedure happens in real time, with data streamed according to user requirements and access rights. Dynamic data masking ensures that the original information is visible only to authorized users, while all other users see masked data. It is used mainly for enforcing role-based security in applications and applies to read-only scenarios. Since dynamic data masking happens at run time, there is no need for the masked data to be saved in a separate database.
On-the-Fly Data Masking – This process modifies sensitive information in transit so that the data is already hidden before it reaches its destination. This technique is ideal for organizations that migrate data from one system to another, deploy software continuously, or rely heavily on integrations.
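As a rough illustration of on-the-fly masking, the sketch below masks records one at a time while they move from a source to a destination. The field names, the `mask_value` rule (keep the last four characters), and the in-memory lists standing in for real systems are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical set of sensitive field names; adjust to the real schema.
SENSITIVE_FIELDS = {"name", "ssn"}


def mask_value(value: str) -> str:
    """Replace every character with '*', keeping the last four only if the value is longer than four."""
    visible = value[-4:] if len(value) > 4 else ""
    return "*" * (len(value) - len(visible)) + visible


def mask_in_transit(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield a masked copy of each record so the destination never receives the raw values."""
    for record in records:
        yield {
            key: mask_value(val) if key in SENSITIVE_FIELDS else val
            for key, val in record.items()
        }


# In-memory stand-ins for a source system and a destination system.
source = [
    {"id": "1", "name": "Alice Smith", "ssn": "123-45-6789"},
    {"id": "2", "name": "Bob", "ssn": "987-65-4321"},
]
destination = list(mask_in_transit(source))
```

Because the generator yields masked copies, the source records stay untouched and no unmasked intermediate copy is written anywhere.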
An example of Data Masking for authorized and unauthorized users:
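A minimal sketch of this idea in Python, assuming hypothetical role names (`admin`, `auditor`, `support`) and field names that do not come from any particular product: the stored record is never changed, and the masking is applied only when an unauthorized role reads it.

```python
from typing import Dict

# Hypothetical roles and fields, chosen only for illustration.
AUTHORIZED_ROLES = {"admin", "auditor"}
MASKED_FIELDS = {"email", "card_number"}


def view_record(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return the record as the given role should see it; the stored record is not modified."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {
        key: "X" * len(val) if key in MASKED_FIELDS else val
        for key, val in record.items()
    }


stored = {
    "customer": "C-1001",
    "email": "ana@example.com",
    "card_number": "4111111111111111",
}
print(view_record(stored, "admin"))    # original values
print(view_record(stored, "support"))  # email and card_number replaced by X characters
```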
Data Masking Techniques
The standard data masking techniques are:
Pseudonymization – As defined in the GDPR [1], pseudonymization is a technique in which a fictitious name or pseudonym replaces the original data. It involves removing any direct identifiers, so that it becomes difficult to identify a person without additional information. That additional information must be kept separately and protected by technical and organizational measures to prevent the person from being identified.
Anonymization – Anonymization involves removing all direct and indirect personal identifiers that may identify a person. For example, someone may be directly identifiable by name, address, phone number, or any other characteristic specific to that person. At the same time, the same person can also be identified indirectly through linked data such as their health condition or job details. Anonymization masks any identifiable data that can be linked to an individual. Once data is truly anonymized and individual identifiers are removed, it no longer falls within the GDPR's definition of personal data [2] and is more readily usable.
Scrambling – In this technique, randomly rearranged characters replace the original content. For example, an ID number such as 76543 can appear as 46753 in a test environment. This is the most straightforward technique: the actual values remain, but in a shuffled form. Hence, when masked, a column listing employees’ salaries will have the same aggregate value, with only the order of the individual salaries scrambled. The column still lists the salaries but does not reveal which salary belongs to which employee.
Substitution – Substitution masks data by replacing it with fake but realistic values. For example, real names can be replaced by a random selection of names from a telephone directory. This technique allows the use of realistic data in a test environment without exposing the original data.
Encryption – This method masks data using an algorithm so that only someone with the decryption key can view an encrypted file. Data masking in this form is the most secure approach, but also the most complex to implement, because ongoing encryption has to be managed together with mechanisms for managing and sharing keys. However, data masking and encryption are two technically different data privacy practices.
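Three of the techniques above (pseudonymization, scrambling, and substitution) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `person-0001` pseudonym format, the tiny `FAKE_NAMES` list, and the sample data are all made up, and a real pseudonymization lookup table would need to be stored separately and protected.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical replacement names; a real substitution list would be much larger.
FAKE_NAMES = ["Jordan Lee", "Sam Patel", "Chris Wong", "Taylor Cruz"]


def pseudonymize(names: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Replace each name with a pseudonym; return the lookup table separately."""
    lookup: Dict[str, str] = {}
    pseudonyms = []
    for name in names:
        if name not in lookup:
            lookup[name] = f"person-{len(lookup) + 1:04d}"
        pseudonyms.append(lookup[name])
    return pseudonyms, lookup


def scramble_column(values: List[int], rng: random.Random) -> List[int]:
    """Shuffle a column: the values and their total are preserved, the row order is randomized."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def substitute(names: List[str], rng: random.Random) -> List[str]:
    """Replace each real name with a randomly chosen fake one."""
    return [rng.choice(FAKE_NAMES) for _ in names]


rng = random.Random(42)  # fixed seed only to make the example repeatable
names = ["Alice Smith", "Bob Jones", "Alice Smith"]
salaries = [52000, 61000, 48000]

pseudonyms, lookup = pseudonymize(names)
masked_salaries = scramble_column(salaries, rng)
fake_names = substitute(names, rng)
```

Note that the same person always receives the same pseudonym, so records can still be linked, whereas substitution as written here picks an independent fake name each time.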
Data Masking and Data Encryption – Fundamental Differences
An encryption function can serve as a data masking function at the structured data field level, and both can help with GDPR and CCPA regulatory compliance and with privacy-related uses such as safeguarding big data analytics. Data encryption is widely used to protect files on a local system, on other computer networks, in a cloud, or when data is transmitted over the internet. However, the fundamental difference is that encryption generally protects data at rest or in motion, that is, during storage or transfer, rather than data that is actively in use. Typically, an algorithm is used to encrypt the data and make it unreadable to everyone except those with the secret key to decrypt it into a readable format.
On the other hand, data masking hides data elements that certain users should not see. It obscures sensitive information and creates a replica of the data in which the structure and complexity remain intact. Masking is not necessarily needed for all the data in a record, because it can be applied at a fine-grained level according to usage and user access. Since the masked data cannot be traced back to the original information, it is useless to those who might expose data intentionally or unintentionally. Masking data is primarily about creating alternate versions of data that are difficult to identify or reverse engineer, thereby protecting sensitive data.
Conclusion
For many organizations, masking sensitive data is an essential step in keeping that data safe. Masked data may still be vulnerable to sophisticated attacks, so the infrastructure and data sources themselves must also be protected against increasingly sophisticated threats. Protecting sensitive data with data masking therefore requires a comprehensive security solution. Such software applies data masking across any data platform according to security policies based on identity, location, and type of data. By implementing a data governance solution, you can ensure compliance and preserve the benefits of cloud investments, agility, and cost. Learn how QueryPie provides a comprehensive data masking experience to protect critical data and user privacy here [3].
References:
[1] Article 4 EU GDPR “Definitions,” https://bit.ly/3MGJ9Oa
[2] Recital 26 EU GDPR, https://bit.ly/3aMYoIb
[3] QueryPie, https://querypie.com/en