Legal reasons why Synthetic Data is better than real data


As I wrote before, generating synthetic data helps solve many technical problems: models that need more data than is available, data of poor quality, and events that have yet to occur.

But there are also legal reasons why it is convenient to use synthetic data.

Regulations determine which data may or may not be used in AI solutions. Whenever an AI solution needs sensitive data, a company should budget for compliance costs (documentation, authorizations, data governance) and will likely have to increase its cybersecurity spending.

Needless to say, an AI project will likely progress more slowly in this scenario, because just putting the right data structure in place takes time and effort.

These costs disappear if you use synthetic data. Yes, there is the cost of 'generating' the synthetic data, but this should be a fraction of the cost of legal compliance.

There is more.

If you use sensitive real data, chances are that part of it comes from different sources, including third-party sources. Integrating different sources is always a problem, even more so when the data is sensitive.

Authorising the aggregation of sensitive data from different sources can be cumbersome, and it requires additional legal agreements among the parties involved.

Synthetic data is an obvious solution in this case. And yes, there are technologies that should allow AI models to be built without disclosing sensitive data (federated learning, for example). But the setup cost of these technologies is about as high as that of the legal structure needed to use the real data.

Frankly, an AI project that requires aggregating data from different sensitive sources is likely never to start.

Again, synthetic data unlocks projects that otherwise would not even be considered possible.

And that’s something.

#ai #artificialintelligence #innovation #data #business
