Data Synthetic [Series#2: I am Data!]

As a way to anonymize data, synthetic data is a much less familiar term than data masking.

Let’s decode it.

There are many ways to achieve data privacy, such as data generalization, data swapping, data masking and data encryption. All of these techniques alter the original data, which of course most users do not want. They are all applied for the same reason: to make sure data does not fall into the wrong hands.
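To show how each of these techniques alters the original data, here is a minimal sketch of generalization, masking and swapping applied to a few customer records. The records, field names and helper functions (`generalize_age`, `mask_phone`, `swap_column`) are hypothetical and exist only for illustration. Encryption is left out because it needs a third-party cryptography library.

```python
import random

# Hypothetical customer records, for illustration only.
records = [
    {"name": "Alice Smith", "age": 34, "phone": "555-0101", "city": "Leeds"},
    {"name": "Bob Jones", "age": 58, "phone": "555-0102", "city": "York"},
    {"name": "Chen Wei", "age": 41, "phone": "555-0103", "city": "Hull"},
]

def generalize_age(age: int) -> str:
    """Generalization: replace an exact value with a coarser decade band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def mask_phone(phone: str) -> str:
    """Masking: hide every character except the last two."""
    return "*" * (len(phone) - 2) + phone[-2:]

def swap_column(rows, column, rng):
    """Swapping: shuffle one column's values across rows, returning new rows."""
    values = [r[column] for r in rows]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(rows, values)]

rng = random.Random(0)  # fixed seed so the shuffle is reproducible
protected = [
    {**r, "age": generalize_age(r["age"]), "phone": mask_phone(r["phone"])}
    for r in records
]
protected = swap_column(protected, "city", rng)  # names left as-is to keep the sketch short
for row in protected:
    print(row)
```

Every output row is derived from a real row. That is exactly the limitation the article points out: the original data is changed, not replaced.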

Synthetic data can genuinely solve problems such as anonymizing data before it is shared with developers, or with data science teams building data models.

Synthetic data is a new dataset generated through computer simulation. It contains fake data that still has the right properties or attributes. For example, almost every application or system needs customer information, but that information cannot be shared because individual identities must be protected. Synthetic data tools create a dummy customer dataset with names, emails, contact numbers, family details and so on, none of which comes from the original data. This way, developers can build data pipelines and data scientists can build data models without worrying about breaching any regulatory or compliance clauses.
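As a rough illustration of the dummy customer dataset described above, the sketch below generates fake customers with only the Python standard library. The name lists, the `Customer` fields and the `fake_customer`/`fake_customers` helpers are assumptions made for this example. Real synthetic data tools use far richer generators and also try to preserve the statistical properties of the source data.

```python
import random
from dataclasses import dataclass

# Small hypothetical vocabularies; real tools use much larger ones.
FIRST_NAMES = ["Aisha", "Ben", "Carla", "Dev", "Elena", "Farid"]
LAST_NAMES = ["Khan", "Lopez", "Nguyen", "Okafor", "Smith", "Weber"]
MARITAL_STATUSES = ["single", "married", "divorced", "widowed"]

@dataclass
class Customer:
    name: str
    email: str
    contact_number: str
    marital_status: str  # "family details", part 1
    children: int        # "family details", part 2

def fake_customer(rng: random.Random) -> Customer:
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return Customer(
        name=f"{first} {last}",
        # example.com is reserved for documentation, so no real mailbox is hit
        email=f"{first.lower()}.{last.lower()}{rng.randint(1, 999)}@example.com",
        # 555-0100 to 555-0199 is reserved for fictional use in North America
        contact_number=f"555-01{rng.randint(0, 99):02d}",
        marital_status=rng.choice(MARITAL_STATUSES),
        children=rng.randint(0, 4),
    )

def fake_customers(n: int, seed: int = 42) -> list[Customer]:
    """Generate n fake customers; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [fake_customer(rng) for _ in range(n)]

for c in fake_customers(3):
    print(c)
```

Because the generator is seeded, a team can regenerate the same test dataset on demand. Libraries such as Faker offer ready-made providers for this kind of data.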

The following are a few uses of synthetic data (the list is not exhaustive).

· Synthetic data is well suited to Machine Learning (ML).

· It can be used to generate a larger dataset from a small one for analytics, for example to predict the outcome if the number of similar transactions increased by 500%.

· It fulfills compliance requirements to anonymize data.

· Synthetic data that represents the original production data can be shared with the QA (Quality Assurance) team for testing.

· Fake data can be planted as a decoy to confuse attackers.
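For the second use above, growing a small dataset into a larger one, here is a minimal sketch. It assumes the transaction amounts are roughly normally distributed: it fits their mean and standard deviation and draws new values from that distribution. The function name, the sample amounts and the normal-distribution assumption are all illustrative. A 500% increase is read as six times the original count, and correlations between columns are ignored.

```python
import random
import statistics

def synthesize_amounts(amounts, increase_pct, seed=0):
    """Draw a larger synthetic sample that resembles `amounts`.

    Assumption: amounts are roughly normal, so we fit mean/stdev and sample
    from that distribution, clamping at zero since amounts can't be negative.
    """
    mu = statistics.mean(amounts)
    sigma = statistics.stdev(amounts)
    # A 500% increase means the new count is 1 + 500/100 = 6 times the original.
    target = round(len(amounts) * (1 + increase_pct / 100))
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mu, sigma)) for _ in range(target)]

# Hypothetical sample of 10 transaction amounts
original = [120.0, 95.5, 130.2, 110.0, 99.9, 142.3, 105.7, 118.4, 125.0, 101.1]
synthetic = synthesize_amounts(original, increase_pct=500)
print(len(synthetic))  # 60
```

Real synthetic data tools typically fit richer models (for example, joint distributions across columns) rather than one independent distribution per column.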

Types of Synthetic Data

· Full Synthetic Data: The whole of the original dataset is replaced with synthetic data.

· Partial Synthetic Data: Only the confidential data is replaced; the rest of the original data is kept.

· Hybrid Synthetic Data: This approach draws on both the original data and other synthetic data to generate a new synthetic dataset. It provides the strongest privacy capability.
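To make the difference between full and partial synthetic data concrete, here is a per-record sketch. The set of confidential columns, the `synth_value` generator and the sample record are hypothetical. Real tools that produce fully synthetic data usually generate entirely new records from a model of the whole dataset rather than rewriting one record at a time. The hybrid approach is not sketched, because the article does not specify how the original and synthetic sources are combined.

```python
import random

CONFIDENTIAL = {"name", "email"}  # hypothetical choice of sensitive columns

def synth_value(column, rng):
    """Hypothetical generator returning a fake value for a given column."""
    if column == "name":
        return f"Customer-{rng.randint(1000, 9999)}"
    if column == "email":
        return f"user{rng.randint(1000, 9999)}@example.com"
    if column == "age":
        return rng.randint(18, 90)
    if column == "city":
        return rng.choice(["Leeds", "York", "Hull"])
    raise ValueError(f"no generator for column {column!r}")

def full_synthetic(record, rng):
    """Full: every value in the record is replaced with a synthetic one."""
    return {col: synth_value(col, rng) for col in record}

def partial_synthetic(record, rng):
    """Partial: only confidential columns are replaced; the rest are kept."""
    return {col: synth_value(col, rng) if col in CONFIDENTIAL else val
            for col, val in record.items()}

rng = random.Random(7)
original = {"name": "Alice Smith", "email": "alice@corp.test", "age": 34, "city": "Leeds"}
print(full_synthetic(original, rng))
print(partial_synthetic(original, rng))
```

The partial version keeps more of the real data's analytical value, but it also leaves the non-confidential columns exposed.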

Cheers.



More articles by Mustafa Qizilbash
