Governance Frameworks - Data Sharing Taskforce Update #10
Ian Oppermann
Commonwealth Data Standards Chair, Co-founder ServiceGen, Industry Professor, UTS
Much of the discussion around data sharing has focussed on trust and context. Any data sharing agreement carries a strong requirement for trust: either the parties sharing the data must trust each other directly, or they must trust another entity whose job it is to protect each of the counterparties.
Technology can help to ensure that minimum thresholds are met or to control access, but ultimately the use of these tools and the appropriate handling of data are managed by a governance framework. Part of the role of a governance framework is to give practitioners guidance on what to do, as well as the tools with which to do it.
With the discussion of risk frameworks such as the “Safe” Data Sharing frameworks in Update #9, it becomes clear that judgement of appropriate use or appropriate outcomes is required at multiple stages in the use of data: as a project is initiated, as data is gathered, as results are generated, and as results are released. In a project which involves an element of discovery, it is rarely possible to conclude at the outset that results will be “safe” to release. Rather, an iterative review of risk is required at different stages.
The Framing Questions for the Taskforce
- What international standards or guidelines exist for data governance?
- Can we develop nationally accepted guidelines for different data types?
- Under what conditions can data with different levels of Personal Information Factor (PIF) be accessed, processed, and the results released?
1. Existing Standards-Driven Frameworks
A standard protocol for defining requests and establishing data governance would improve the confidence and efficiency associated with data sharing projects. However, the fundamental uncertainty as to the presence of personal information in sets of data sets highlights the limitations of most existing governance frameworks. Human judgement cannot reliably determine a “reasonable” likelihood of re-identification when faced with large, complex data sets, which limits the ability to apply the regulatory test appropriately.
1.1 ISO Standard 38505-1
In December 2015, Alison Holt published a framework for data sharing in the form of a Voluntary Code, based on the developing ISO standards for the Governance of Data [1]. The Code takes three areas from the data accountability map in the developing ISO standard 38505-1 (Collect, Store, and Distribute) and applies the aspects of Value, Risk, and Constraint to provide seven maxims for sharing data. To assist with adoption and compliance, the Code provides references to best practice and examples.
With the release in 2017 of the ISO/IEC 38505-1:2017 standard, there are now internationally acknowledged guiding principles for the acceptable use of data within organizations. The standard is meant to apply to the governance of the current and future use of data that is created, collected, stored, or controlled by information technology systems, and it impacts the management processes and decisions relating to data.
The challenge with both the Voluntary Code and the ISO/IEC standard is that both are fundamentally grounded in information technology governance rather than the challenges explored by this Taskforce. Neither the Code nor the Standard explores “value”, the framework for “reasonable”, service types based on data usage, or a risk framework as discussed in earlier Updates. Consequently, more work remains to be done.
The ISO/IEC 38505-1:2017 standard provides guiding principles for members of governing bodies of organizations (which can comprise owners, directors, partners, executive managers, or similar) on the effective, efficient, and acceptable use of data within their organizations by
- applying the governance principles and model of ISO/IEC 38500 to the governance of data,
- assuring stakeholders that, if the principles and practices proposed by this document are followed, they can have confidence in the organization's governance of data,
- informing and guiding governing bodies in the use and protection of data in their organization, and
- establishing a vocabulary for the governance of data.
While the standard focuses on the governance of data and its use within an organization, guidance on the implementation arrangements for the effective governance of IT in general is found in ISO/IEC/TS 38501. The constructs in ISO/IEC/TS 38501 can help to identify internal and external factors relating to the governance of IT, define beneficial outcomes, and identify evidence of success.
1.2 European Union - General Data Protection Regulation
In April 2017, the European Union’s Article 29 Working Party published Guidelines on Data Protection Impact Assessment (DPIA) under the General Data Protection Regulation (GDPR). A DPIA determines whether processing is “likely to result in a high risk”. A single DPIA may address either a single data processing operation or multiple processing operations that are similar in terms of risk, scope, context, and purpose.
The GDPR does not require a DPIA to be carried out for every processing operation which may result in risks to the rights and freedoms of natural persons. Performing a DPIA is only mandatory where processing is “likely to result in a high risk to the rights and freedoms of natural persons”. The guidelines particularly highlight the need for a DPIA when new data processing technology is being introduced.
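To make the screening step concrete: the Working Party guidelines list nine criteria (such as systematic monitoring, sensitive data, and innovative technology) and suggest, as a rule of thumb, that processing meeting two or more of them is likely to require a DPIA. The sketch below encodes that rule of thumb; the criterion identifiers are paraphrased and the threshold is an illustrative default, not legal advice.

```python
# Illustrative DPIA screening check based on the Article 29 Working Party's
# rule of thumb: processing that meets two or more of the nine listed
# criteria is likely to require a DPIA. Criterion names are paraphrased.

DPIA_CRITERIA = {
    "evaluation_or_scoring",
    "automated_decisions_with_legal_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale_processing",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_technology",
    "prevents_exercise_of_rights",
}

def dpia_likely_required(met_criteria: set[str], threshold: int = 2) -> bool:
    """Flag processing operations that meet `threshold` or more criteria."""
    unknown = met_criteria - DPIA_CRITERIA
    if unknown:
        raise ValueError(f"Unrecognised criteria: {unknown}")
    return len(met_criteria) >= threshold

# Example: a new facial-recognition system monitoring a public space.
print(dpia_likely_required({"systematic_monitoring", "innovative_technology"}))  # True
```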
The penalties for non-compliance with DPIA requirements are significant. Violations can result in fines of up to 10 million euros or up to 2% of the organization’s total worldwide annual turnover for the preceding financial year, whichever is higher.
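For a sense of scale, the applicable cap is the higher of the two amounts. A one-line sketch of the calculation:

```python
def dpia_fine_cap(annual_turnover_eur: float) -> float:
    """Cap for this tier of infringement under GDPR Article 83(4):
    the higher of EUR 10 million or 2% of worldwide annual turnover."""
    return max(10_000_000.0, 0.02 * annual_turnover_eur)

# An organisation with EUR 2 billion in turnover faces a cap of EUR 40 million.
print(f"EUR {dpia_fine_cap(2_000_000_000):,.0f}")  # EUR 40,000,000
```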
2. Evolutionary Governance Models
A fundamental principle underpinning data sharing is addressing the interplay of value, risk, and trust. This balance can change as a data analysis project (the simplest case being data sharing) develops through the major phases of the "Safe" model (see Figure 1):
- Project scoping (including identification of people),
- Data collection, organisation and curation,
- Data analysis,
- Results interpretation, and
- Release of results.
Figure 1. "Five Safes" Framework with Quantified Overlay
As each of these phases progresses, the “value” of the outcomes increases, and the potential risk may also increase. The “value” versus risk trajectory a project follows depends on the factors considered throughout this paper and may be mitigated by the approaches discussed in earlier updates.
An important consideration is that projects which involve any element of discovery need periodic review, with the frequency depending on the level of risk assessed at each of the major project phases. The privacy impact or ethical considerations of a project depend on what is identified, and this may not be known at the outset.
A more flexible approach to data analysis projects may allow a light-touch up-front assessment of privacy impact, people, and technology, and increase the frequency or intensity of these assessments as the project continues.
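One way to picture this escalation is a schedule in which the risk tier assessed at each phase sets the depth of review before the next phase proceeds. The sketch below is purely illustrative; the tier names, phases, and review depths are assumptions, not taskforce-agreed values.

```python
# Illustrative sketch of escalating oversight: the assessed risk tier at each
# project phase determines the depth of privacy and ethics review before the
# next phase may begin. Tier names, phases, and review depths are assumed
# for illustration.

PHASES = ["scoping", "collection", "analysis", "interpretation", "release"]

REVIEW_DEPTH = {
    "low": "light-touch checklist",
    "medium": "targeted privacy and ethics assessment",
    "high": "full assessment with committee approval",
}

def review_schedule(risk_by_phase: dict[str, str]) -> list[str]:
    """Return the review required at each phase, given the risk assessed there.
    Risk is reassessed at every phase, so a discovery that raises the risk
    tier mid-project automatically triggers deeper review."""
    return [f"{phase}: {REVIEW_DEPTH[risk_by_phase[phase]]}" for phase in PHASES]

# Example: risk escalates once preliminary results reveal sensitive findings.
for line in review_schedule({"scoping": "low", "collection": "low",
                             "analysis": "medium", "interpretation": "high",
                             "release": "high"}):
    print(line)
```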
A summary of possible guidelines is given in Figure 2. Figure 3 attempts to map the major data analysis project phases to the risk mitigation focus for each dimension of the "Five Safes" model. The benefit of a multistage assessment for privacy and ethics is that it is no longer necessary to preconceive at the outset of the project all of the issues or risks which may arise during analysis.
Figure 2. Ethics, Privacy Impact, Technology, and People Assessments for Different Risk Levels
Figure 3. Mapping to the Five Safes Framework
3. Conclusions
Underpinning the transformation to a smarter, truly digital economy is the ability to share data beyond the boundaries of an organisation, company, or government agency. Future smart services for homes, factories, cities, and governments rely on sharing of data between individuals, organisations, and governments. The ability to create locally optimised, individually personalised services depends on sharing of ever more personal information in the form of preferences, context, and usage patterns.
Beyond the technical challenges, data sharing comes with a range of legal obligations, privacy considerations, data security requirements, and concerns about unintended consequences. These factors are highly dependent on the question of whether personal information is present in sets of data sets.
A fundamental challenge to answering this question is that there is no way to unambiguously determine if personal information is present in linked data. Even if an unambiguous test were possible for a given data set, the practical reality is that data sharing does not occur in a vacuum. In almost any imaginable environment, aggregated data can be linked with data from other sources and so decomposed to a more personal level. The ability to increase the personal information factor is limited only by the determination and ability to link extraneous data to the set which has been shared.
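A minimal sketch of such linkage, using entirely fabricated data, shows how an "anonymised" extract that retains quasi-identifiers (postcode, birth year, sex) can be joined back to a named public register; the column names and records are invented for illustration.

```python
# Minimal illustration of re-identification by linkage. Both datasets are
# fabricated; the point is that shared quasi-identifiers let an "anonymised"
# extract be joined back to named records.
import pandas as pd

deidentified_health = pd.DataFrame({
    "postcode": ["2000", "2000", "2010"],
    "birth_year": [1980, 1975, 1980],
    "sex": ["F", "M", "F"],
    "diagnosis": ["asthma", "diabetes", "hypertension"],
})

public_register = pd.DataFrame({
    "name": ["A. Citizen", "B. Resident"],
    "postcode": ["2000", "2010"],
    "birth_year": [1980, 1980],
    "sex": ["F", "F"],
})

linked = deidentified_health.merge(
    public_register, on=["postcode", "birth_year", "sex"], how="inner"
)
print(linked[["name", "diagnosis"]])  # names are now attached to diagnoses
```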
Aggregation is often used as a means to protect individual identity, and different levels of aggregation are used by different organisations depending on the perceived level of risk associated with the data to be shared. Aggregation has been shown to be a very weak form of protection. Moreover, the implications of this blunt instrument can be profound when considering the use cases which come in and out of scope depending on the level of aggregation used.
The technologies discussed in these updates – determining minimum cohort size, differential privacy, homomorphic encryption, and privacy preserving linkage – all address concerns associated with re-identification of individuals from linked data sets. The space is moving rapidly and has the potential to alleviate privacy and data security concerns in areas as diverse as health care and smart cities without disclosing our personal data.
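As one example of how these techniques work, the Laplace mechanism of differential privacy adds calibrated noise to a released statistic so that any single individual's presence or absence changes the output only slightly. A minimal sketch, with an illustrative (not recommended) choice of epsilon:

```python
# Sketch of the Laplace mechanism from differential privacy: noise of scale
# sensitivity/epsilon is added to a count, so the released value reveals
# little about any one individual. Epsilon here is an illustrative choice.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

print(dp_count(1234, epsilon=0.5))  # noisy count close to the true value
```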
The power of computational data analytics and the ability of new techniques to address expressed concerns about privacy actually surface a newer and bigger ethical concern. Privacy preserving computational techniques enable applications that were not possible when privacy legislation was framed, before the concept of privacy had been considered in a joined-up digital economy. The unease that some privacy advocates feel about new personalised services is not readily addressed by discussions of minimum cohort size or homomorphic encryption. The question that best describes these concerns is: just because we can, should we?
The irresistible digitisation of our lives, coupled with innovative application of analytics, has led to astonishing changes in the way we understand the world, the services we create, and the level of intimacy companies have with customers.
The challenge to address head on is to identify the sources of this unease at their most fundamental level, develop practical frameworks which allow the creation of value while preserving our privacy, and then adapt these frameworks for jurisdictions in Australia.
The higher order challenge is to reframe the national conversation on data sharing to be about the services created from data and the rights and obligations of people creating, delivering, and using these services. The prize is the opportunity to create benefit for Australian industry, increased efficiency of government, and greater decision-making transparency for the citizens of Australia, while still protecting the rights and the sensitive, personal information associated with each of us as individuals.
4. Recommendations
Recommendation 1: Clarification of existing legal frameworks around privacy needs to include quantified descriptions of acceptable levels of risk in ways which are meaningful for modern data analytics.
Recommendation 2: Development of a framework which supports anonymisation of data and in turn facilitates sharing. New technologies – determining minimum cohort size, differential privacy, homomorphic encryption, and privacy preserving linkage – all address concerns associated with re-identification of individuals from linked data sets, yet all are at relatively early stages of development. Maturing these technologies by encouraging pilot projects and safe trials would benefit all jurisdictions.
Recommendation 3: Development of a test for the existence of Personally Identifiable Data.
Recommendation 4: Establish agreed standards for minimum cohort size based on data type. In order to protect individual privacy and to acknowledge concerns about “likely” or “reasonable” re-identification, minimum cohort sizes should be agreed and communicated for different levels of data value. This would help data joining and minimise challenges around the use of widely varying levels of aggregation.
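A minimal sketch of how such a standard might be applied in practice: aggregate cells whose underlying count falls below the agreed threshold k are suppressed before release. The threshold of 10 and the data are assumptions for illustration.

```python
# Sketch of a minimum cohort size check: cells with counts below the agreed
# threshold K are suppressed before release. K = 10 is an assumed value;
# the recommendation is that thresholds be agreed per data type.
import pandas as pd

records = pd.DataFrame({
    "postcode": ["2000"] * 12 + ["2010"] * 3,
    "service_used": ["clinic"] * 15,
})

K = 10  # assumed minimum cohort size for this data type

counts = records.groupby(["postcode", "service_used"]).size().reset_index(name="n")
counts["n"] = counts["n"].where(counts["n"] >= K)  # suppress small cells as NaN
print(counts)  # the 2010 cell (n=3) is withheld
```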
Recommendation 5: Establish agreed standards for obfuscation and perturbation. Complementary to Recommendation 4, standards should be agreed for obfuscation and perturbation. These can not only help provide confidence that data has been robustly de-identified, but also help with the creation of minimum cohort sizes.
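As an illustrative sketch of one common perturbation technique, random rounding releases counts rounded up or down to a fixed base, so exact cell values are never disclosed. The base of 5 is an assumed value for illustration, not an agreed standard.

```python
# Sketch of random rounding: each count is rounded to a multiple of `base`,
# choosing up or down with probability proportional to proximity, so exact
# cell values are obfuscated. The base of 5 is an assumption.
import numpy as np

rng = np.random.default_rng(7)

def random_round(count: int, base: int = 5) -> int:
    """Randomly round a count to a multiple of `base`, preserving its
    expected value."""
    lower = (count // base) * base
    remainder = count - lower
    return lower + (base if rng.random() < remainder / base else 0)

print([random_round(c) for c in [12, 13, 20, 3]])  # e.g. [10, 15, 20, 5]
```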
[1] Available online: https://blogs.oii.ox.ac.uk/policy/new-voluntary-code-guidance-for-sharing-data-between-organisations/