How GDPR can transform the AI industry
From 25 May 2018 the General Data Protection Regulation (GDPR) will be in force across the EU. One of its unintended consequences could be a radical transformation of the AI landscape. The main impact of GDPR will not be a drastic slowdown of development in data-hungry deep-learning AI, as some observers have feared. Instead, GDPR could vastly speed up development.
This won't happen automatically, however. Someone could build a big business making this transformation happen, or open source activists could take up the challenge. In my forthcoming contribution to the FTA 2018 "Future in the Making" conference in Brussels, I suggest the latter possibility.
The idea is simple. One of the biggest problems for AI start-ups and researchers is a lack of data. The currently popular neural AI architectures need inordinate amounts of data for learning, and in most cases the data need to be labeled by humans. When such large datasets are available, it is a relatively trivial task to feed them to a machine learning system and let it learn from the data until it starts to do what it is meant to do.
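To make the "relatively trivial" part concrete, here is a minimal sketch in Python with TensorFlow/Keras (the platform mentioned below). The dataset and model are illustrative placeholders I made up, not anything from a real application:

```python
# Minimal sketch: training a classifier on a labeled dataset.
# "features" and "labels" stand in for a large, human-labeled dataset.
import numpy as np
import tensorflow as tf

features = np.random.rand(10_000, 20).astype("float32")   # placeholder data
labels = np.random.randint(0, 2, size=(10_000,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Let the system learn from the data until it does what it is meant to do.
model.fit(features, labels, epochs=5, batch_size=32)
```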
Today and in the near future, almost all economically important neural AI systems are such datavores. As a result, those who have access to data dominate the business. This is why Google can freely share its TensorFlow machine learning platform without worrying about losing any of its competitive advantage. The more people use TensorFlow, the bigger the pool of competent AI engineers Google has to choose from. And while learning the art of AI, they will realize that ordinary computers are not enough for training their neural networks, and figure out that they need to rent or buy the required compute power from Google or Amazon.
This business model is possible because Google-scale data is a natural monopoly. With enough data, you can create services that people use more. And whenever they use these services, they generate more data.
Now GDPR can radically change this dynamic. It requires that people have free access to the personal data they generate. Moreover, it requires that these data be provided in a portable format.
To transform the AI landscape, a couple of interventions are needed. First, the data have to be stripped of elements that third parties could use to associate them with real persons: they need to be anonymized or pseudonymized. This is already part of the GDPR. Second, the portable representations have to be translated and aggregated into forms usable for machine learning. GDPR portability requires that the data controller provide the data in a structured, commonly used and machine-readable format. What remains to be done is to convert these data structures into data for learning. This is the kind of thing information systems people and web service developers do every day.
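As a sketch of what that everyday data mangling might look like, here is a minimal Python example that pseudonymizes a portable-format record with a keyed hash and flattens it into a numeric row for learning. The field names ("user_id", "timestamp", "lat", "lon") and the keyed-hash scheme are my assumptions for illustration, not anything prescribed by GDPR:

```python
# Minimal sketch: pseudonymize a portable-format record, then convert
# it into a feature row usable for machine learning.
import hashlib
import hmac
import json

SECRET_KEY = b"rotate-and-guard-this-key"  # held only by the pseudonymizing party

def pseudonymize(record: dict) -> dict:
    """Replace the direct identifier with a keyed hash so third parties
    cannot link the record back to a real person."""
    token = hmac.new(SECRET_KEY, record["user_id"].encode(), hashlib.sha256)
    out = dict(record)
    out["user_id"] = token.hexdigest()
    return out

def to_training_row(record: dict) -> list:
    """Flatten one portable-format record into a numeric feature row."""
    return [float(record["timestamp"]), float(record["lat"]), float(record["lon"])]

raw = json.loads(
    '{"user_id": "alice@example.com", "timestamp": 1526256000,'
    ' "lat": 60.17, "lon": 24.94}'
)
row = to_training_row(pseudonymize(raw))
```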
And then the resulting data need to be placed in the public domain.
This would be a data environment in which AI researchers and application developers could thrive. There would be less need to regulate natural data monopolies, because data availability would level the playing field.
In my FTA conference contribution, one policy recommendation is that we build "MyData for AI." The name, of course, comes from the MyData initiative. Instead of relying on "data controllers" such as Google and Facebook, the data can be submitted to an open data infrastructure by the "data subjects" who generate them, based on their own choices. If you want to submit your movement data to AI researchers and start-ups trying to improve public transport systems, for example, you can accept their data profiles, and your browser and device can contribute the required data in pseudonymized form for the betterment of humanity.
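A minimal sketch of what the contributing side could look like, assuming a hypothetical infrastructure endpoint; the URL and payload schema are illustrative inventions, not part of any existing MyData specification:

```python
# Minimal sketch: a device or browser contributing one pseudonymized
# record to the open data infrastructure, after the data subject has
# accepted the recipient's data profile.
import json
import urllib.request

ENDPOINT = "https://example.eu/open-data-infrastructure/contribute"  # hypothetical

def contribute(pseudonymized_record: dict) -> int:
    """POST one pseudonymized record; return the HTTP status code."""
    body = json.dumps(pseudonymized_record).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```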
I called the back end the "EU open data infrastructure for anonymized learning." It would need some basic server-side processing to accept incoming data and load it into a repository within the infrastructure.
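And a matching sketch of that basic server-side processing, here with Flask (my choice of framework, not the article's), accepting an incoming pseudonymized record and appending it to a simple repository file; a real deployment would add validation, authentication, and proper storage:

```python
# Minimal sketch: server-side ingestion for the open data infrastructure.
import json
from flask import Flask, request

app = Flask(__name__)
REPOSITORY = "incoming_records.jsonl"  # hypothetical stand-in for the repository

@app.route("/contribute", methods=["POST"])
def contribute():
    # Accept one incoming pseudonymized record and load it into the repository.
    record = request.get_json(force=True)
    with open(REPOSITORY, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return {"status": "accepted"}, 202

if __name__ == "__main__":
    app.run()
```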
This is all doable. Some standardization, some run-of-the-mill data mangling. And a bit of social innovation. And it would radically transform the landscape of deep learning.