"The Big Data revolution should be about knowledge security"
Emmanuel Letouzé, PhD
Director & co-Founder, Data-Pop Alliance | Adjunct Faculty at Columbia & Sciences Po | MIT & Harvard Fellow | ex-Marie Curie Fellow, U Pompeu Fabra Barcelona | Co-Founder, OPAL | ex-Fulbright | Political Cartoonist
[Note: I recently found this post I wrote and published in April 2014 for the post2015.org initiative of the Overseas Development Institute, and thought it had retained some relevance. I have edited it marginally].
In my view, Big Data can fuel a Data and Development Revolution. Much of its appeal stems from its potential, real or supposed, to find and refine data to yield ‘insights’ about human populations that can power more agile and better targeted policies and programmes. I have at least three issues with this general line of reasoning, and instead propose a “knowledge security” approach to (Big) Data for (Human) Development, inspired by food security.
Let me start with my three issues with the mechanistic chain "better insights ==> better decisions from well-intended politicians ==> better lives!".
First, the ‘insights’ approach says nothing about how data will actually be turned into policy; it assumes, largely erroneously as history shows, that bad policies and outcomes result primarily from a lack of data or information on the part of decision makers, such that better ‘insights’ about poverty will somehow mechanistically lead to less of it.
Second, what qualifies as an ‘insight’, or, more fundamentally, what ‘insights’ are, is unclear; the term is used to avoid having to talk about ‘information’ and, even more so, ‘knowledge’, both of which have well-defined meanings backed by significant theoretical work.
Third, its analogy with fossil fuel (Big Data as the “new oil”) overlooks or downplays the negative impacts that the ‘old oil’ has typically had on human development, which gave rise to the ‘resource curse’ theory, rooted in elite capture, not to mention the environmental impacts.
It also overlooks the historical and historiographical lessons of another Revolution, the Green one, and of much of the literature on food security since then, whose central message is that defeating hunger and famine is as much a political as a techno-scientific endeavour: as much about press freedom as about fertilizer use.
With this in mind, I wonder(ed): how can the ‘Big Data revolution’ serve human development and avoid the advent of a ‘techno-scientifically induced data curse’ caused by the de-humanization and de-democratization of decision-making processes, i.e. a situation where only a handful of actors can access and analyse data and extract and use the resulting ‘insights’? In other words, what is the ‘Big Data revolution’ about, or rather, in a normative sense, what should it be about?
My answer is: knowledge security, inspired by food security.
Knowledge is commonly considered to be the last stage of the data-information-knowledge transformation chain, in much the same way that nutrition is the ultimate goal of the food chain. The parallel between food and data as inputs in processes affecting human populations’ well-being through their bodies and minds is an especially rich one.
And I want to argue that a ‘real’ or desirable Big Data revolution entails and requires putting in place the conditions necessary for societies to enjoy knowledge security, a concept modelled on that of food security.
According to the United Nations, food security “exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life”. Knowledge security is centred on data as inputs and its four pillars or preconditions are the same as those of food security: availability, access, utilization and stability.
What would the pillars of knowledge security look like? In what follows, the only major change to the description of the official FAO food security framework is the substitution of ‘data’ or ‘knowledge’ for ‘food’, as appropriate, plus a few minor edits.
Promoting ‘knowledge security’ in the age of Big Data may entail improving:
1. Data availability — i.e. “the availability of sufficient quantities of data of appropriate quality, supplied through domestic production or imports (including data aid)”.
This principle brings out the importance of producing data that meets societal demands and needs. It also stresses that ‘quality’, including representativeness, as characterized in the 2nd Fundamental Principle of Official Statistics, must remain a central concern for official statistical systems in producing official statistics.
2. Data access — i.e. “the access by individuals to adequate resources (entitlements) for acquiring appropriate data for a nutritious diet to enhance their knowledge. Entitlements are defined as the set of all commodity bundles over which a person can establish command given the legal, political, economic and social arrangements of the community in which they live”.
This highlights the importance of transparency, user-friendliness and visibility in the presentation of data. As Enrico Giovannini has underlined, given that “95% of Google users do not go beyond the first page, it is clear that either institutes of statistics structure their information in such a way as to become easily findable by such algorithms, or their role in the world of information will become marginal.”
3. Data utilization — i.e. “the utilization of data through individual and collective processing to reach a state of knowledge where all information needs are met. This brings out the importance of non-data inputs in knowledge security.”
This critical point underscores the fundamental importance, noted by Enrico Giovannini, of “considering how [information] is brought to the final user by the media, so as to satisfy the greatest possible number of individuals …, the extent to which users trust that information (and therefore the institution that produces it), and their capacity to transform data into knowledge (what is defined as statistical literacy)”; for my part, I prefer the concept of data literacy.
4. Data stability — i.e. “to be knowledge secure, a population, household or individual must have access to adequate data at all times. They should not risk losing access to data as a consequence of sudden shocks (e.g. an economic or climatic crisis) or cyclical events (e.g. seasonal data insecurity). The concept of stability can therefore refer to both the availability and access dimensions of knowledge security.”
This suggests that sustainable legal and policy frameworks are needed to ensure steady and predictable access to some data (aggregated and anonymized) held by corporations, in contrast to the ad hoc way researchers have tended to access call detail records (CDRs) in recent years. [Note of 2020: this is what the Open Algorithms (OPAL) approach aims to address; the OPAL model and others are discussed in this recent piece written with Nuria Oliver, “Sharing is Caring: Four Key Requirements for Sustainable Private Data Sharing and Use for Public Good”.]
This knowledge security framework would complement the Fundamental Principles of Official Statistics by sketching the four preconditions of knowledge security in the (Big) Data age:
- Data availability: there are good data available
- Data access: people can access relevant data safely
- Data utilization: people are willing and able to use data
- Data stability: the supply of data is predictable
As with the food security framework, it does not provide concrete guidance as to how these preconditions can be met. But recognizing knowledge security as a central objective and key determinant of a Big Data revolution that would serve human development broadly may be an important first step.
Feel free to share your comments!