A CEO’s Guide: The Brave New World of Data Privacy and Accountability
Venkata Pingali
Co-Founder & CEO, Scribble Data
The compliance landscape around data is changing significantly in 2020, and it is necessary to understand these changes as soon as possible so you can chart your path, and that of your organization, over the next few years.
Emerging mega-trends in the world of data
Regulations are coming. Personal data protection laws such as GDPR (EU), CCPA (California, US), and PDP (India) have been, or are about to be, passed in almost all major geographies. This is just the beginning, because data is now regarded as a strategic asset worldwide, and countries want to regulate it to achieve their economic, social, and other objectives. GDPR itself is being re-assessed for its effectiveness today. We can expect regulations to become more specific and loopholes to close. This rapid, simultaneous progress in regulatory oversight of data across geographies is happening through the cross-pollination of ideas among regulators.
No data science without accountability. Algorithmic accountability is being discussed across the world today. Transparency and explainability requirements are already being baked into the law. We expect that FDA-like institutions and pharma-like safety laws will be mandated across domains. These mandates will be broad-based given the number and diversity of data applications.
Compliance is not enough. Netflix’s The Social Dilemma, concerns around facial recognition, and the political machinations around TikTok are examples of how data ownership, privacy, and accountability issues are being discussed around the dinner table, resulting in the education of a large number of consumers. The battle will not be about whether consumers should share data with companies because, for now, that answer will be a reluctant and judiciously considered “yes”. The new battle lines will be drawn along WHICH companies can demonstrate responsible handling of data, internally (employees, shareholders) and externally (end-consumers, business partners, regulators).
A shift in the competitive landscape. The last few years have seen the rise of companies that hoard specific classes of data and develop applications using this data. Their valuation has been driven by speculation (possible further applications) and by the ownership of said data. Regulators and lawmakers are now looking at the question of who owns the data. If they determine data to be a sovereign resource or owned by the individual, and not a private resource, then the companies turn into custodians and not owners of the data. If this comes to pass, the competitive advantage will shift from the collection of data to the use of data. Incentives will shift from investing in data moats to investing in data refineries that process, enrich, and apply raw data.
The changing economics of data science. Production use of data science (i.e. where data science is on the frontlines, making or aiding business decisions with impact on the top or bottom line) requires that ML models work reliably in a real-world context, not just on training data. This reliability has to be demonstrated not only to internal stakeholders such as the board and CXOs, but also to regulators and customers. The definition of a model “working” is now being expanded beyond how accurate or quick it is to include, among others, ideas such as fairness, transparency, and explainability. As a result, the emphasis of the last few years on ever more complex models will be reversed. Simpler, maintainable, explainable, and operationally efficient models will become the trend.
What Should You Do?
Be a responsible data company. The concerns around the use of data and regulatory interest in ensuring a safe data ecosystem are real. Data is here to stay, and there is immense value to be created out of it. It makes sense to adopt the mindset, values, processes, and tools designed to keep all stakeholders’ trust and confidence at all times, to ensure that you stay in business, and can credibly signal this to your customers as well. The reputational and legal costs of non-compliance can be high. Consumers too will likely start paying with their dollars for horses that come from high-integrity stables.
Focus on creating value through your data, rather than safeguarding it. In the last few years, companies have built complex data systems with high engineering costs - not just in the building, but also in their operation and maintenance. This has been driven by the desire to scale, especially on the data collection side. “More data” was the axiom. But now, this axiom comes with two new risks - (a) the risk of having to share the fruits of data collection with third parties, and (b) the risk of high privacy management costs if consumers invoke portability, erasure, or other such requests. Companies would be well-advised to take a balanced approach that emphasizes (a) data consumption and applications, as much as data collection, and (b) quality and incremental value of data, rather than quantity.
Build for transparency and trust. GDPR-like laws require declarations of intent, usage of data, and consent. But more than that, one should assume that consumers, regulators, and employees will ask questions about the defensibility of data applications. Therefore, applications have to be built from the ground up on the assumption that someone is always looking over your shoulder. Aim to exceed compliance requirements, and implement periodic audits of your data engineering and data science work to examine compliance, because there will be moving pieces (and people) within the organization that will continue to act as weak links.
Partner with responsible data practitioners. When customers interact with, say, Airbnb and there is a privacy violation, they blame Airbnb. They don't distinguish between employees, partners, and vendors. Given the complexity of modern data platforms, it is inevitable that there will be many individuals involved in the process, and risks propagate through the system. Partner with or hire individuals and companies that share your desire to be a responsible data company and that take your obligations seriously.
What Happens if Companies Don't Adapt
Face continuous regulatory risks. The risk of punitive fines in the face of data privacy violations and non-compliance is an ever-present spectre with GDPR, CCPA, and other laws. But more importantly, a violation brings the wrong kind of attention to the company, the kind that gives your stakeholders pause. Here’s an example: if an analyst, engineer, or even an automated system sends an email to a customer listed in some spreadsheet, and that customer happens to have asked for their information to be deleted (“forget me”), it is likely to be a violation of GDPR. The discovery process will place in public view the systems, processes, and attitude of the company towards customer data. The intent to adhere to the letter and spirit of the law, or lack thereof, will be clear.
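To make the “forget me” example concrete, here is a minimal Python sketch of the kind of guard that would catch it: every outgoing contact list is checked against a record of erasure requests before anything is sent. The names used here (`Customer`, `filter_contactable`, `erasure_requests`) and the sample data are hypothetical and purely illustrative; they do not come from any particular product or library. A real system would also need to delete the stale spreadsheet rows themselves, so a filter like this is a last line of defence and not a substitute for erasure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    email: str
    name: str


def normalize(email: str) -> str:
    """Trim and lowercase so 'Ben@Example.com ' matches 'ben@example.com'."""
    return email.strip().lower()


def filter_contactable(customers, erasure_requests):
    """Split customers into (allowed, skipped) based on erasure requests."""
    blocked = {normalize(e) for e in erasure_requests}
    allowed, skipped = [], []
    for customer in customers:
        if normalize(customer.email) in blocked:
            skipped.append(customer)   # asked to be forgotten: do not contact
        else:
            allowed.append(customer)
    return allowed, skipped


# Hypothetical rows exported from "some spreadsheet".
spreadsheet_rows = [
    Customer("asha@example.com", "Asha"),
    Customer("Ben@Example.com ", "Ben"),
    Customer("chen@example.com", "Chen"),
]
# Hypothetical central record of "forget me" requests.
erasure_requests = ["ben@example.com"]

allowed, skipped = filter_contactable(spreadsheet_rows, erasure_requests)
print([c.name for c in allowed])   # ['Asha', 'Chen']
print([c.name for c in skipped])   # ['Ben']
```

The skipped list is returned and not silently dropped, so that it can be logged and the stale copies traced back to their source, which is the kind of evidence an audit would ask for.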
Repel the best data talent. When the fairness, equity, and transparency of ML models, analyses, or data products are questioned, the employees who build these systems will be asked to defend them because of the technical nature of the work. This potential “exposure” will make employees cautious about which companies they choose to work for. They will look for legal, engineering, and process excellence so they can be comfortable doing their job. The best data talent will also look for companies with the best data processes to strengthen their resumes, rather than risk playing mechanic at an organization where many of the data processes need fixing.
Lose business. If data sharing laws come into effect, the value of a company will be driven by data enrichment/preparation and downstream models, rather than by the raw data it owns. This could significantly alter the economics of the company, and potentially put it out of business, if the data moat was its sole competitive advantage.
Final Thoughts
In summary, the writing on the wall is clear: the question is no longer “whether” data compliance trends will converge on strict regulations that force a complete rethink of data-driven competitive advantages, but “when”. Customers, as well as the best talent, will flow naturally toward companies that can demonstrate that they are responsible custodians of data.
Scribble Data is a data compliance-focused ML Feature Store company, out of Toronto and Bangalore. Our Feature Store, Enrich, helps data science teams accelerate the journeys of their ML models to production, keeping data protection and compliance in mind.
Photo Credits: Bobby Burch, Unsplash