My kingdom for digital data
Source: Dr Stephen Harwood

Dr. Stephen Harwood (TechnoForeSight / University of Edinburgh, Business School) | Twitter: @drsharwood


The new mantra is that organisations need to be ‘data driven’. But what does this mean?

A starting point is to establish what we mean by data. The Oxford English Dictionary defines data as items of information. In other words, data informs. But what is it that informs?

What is this ‘data’ thing?

If we look to the natural world, a landscape can inform a geologist about the prominent geological structures of that landscape. The geologist selects specific data and ignores other data. Likewise, the farmer will select other data to understand issues relating to farmed land. This implies that data exists all around us in the natural world. Moreover, data is fundamental to our everyday existence as humans. We select specific data from all the data present because of its relevance to what we are interested in. Some data has relevance to a specific usage, whilst other data is ignored. This indicates that data only acquires meaning within the context in which it is selected and used. I suggest that this is an important issue regarding the notion of being ‘data driven’.

If we think about data in an organisational context, then this raises another issue. Organisations have, since the earliest of days, used data to keep track of what is going on. For example, Sumerian and Neo-Sumerian tablets from over 4,000 years ago recorded trade and inventory. DuPont developed a set of indices in the 1920s that provided an integrated accounting framework linking sales, expenses, assets and equity, thereby allowing the Return on Investment (ROI) and Return on Equity (ROE) to be calculated. ERP systems have their origin in inventory systems from the 1950s with the emergence of the earliest commercial computers, thereby allowing the data to inform about all aspects of the business. So, it is unclear why there is the current buzz about being ‘data driven’.

Source: The Metropolitan Museum of Art, Purchase, by exchange, 1911, 11.217.20

What is new about data?

The distinction between the data forms previously mentioned and those that are implied by today’s rhetoric of being ‘data driven’ is perhaps related to the rapid development and diffusion of emerging digital technologies and their transformative, if not disruptive, effect. This has led to the exponential growth in digital data collection (e.g. social media posts, IoT devices) over the last decade and coincides with advances in the ability to transmit (e.g. broadband, 5G), store (e.g. data centres) and analyse (e.g. analytics, AI) data.

Moreover, the opportunities offered by the newer forms of technologies that are associated with data are realising the visions of those promoting the Decision Support Systems (DSS) of the 1970s, Executive Information Systems (EIS) of the 1980s and Business Intelligence systems of the 1990s. Today, data is converted into real time information displayed in purpose-serving dashboards on personal mobile devices, with this all taking place in the digital domain. Consequently, it is more correct to be talking about ‘digitally data driven’ organisations.

In terms of the proliferation of data, in 2012, it was reported in IBM Systems Journal that 2.5 quintillion bytes (2.5 exabytes) of data were being produced daily, with 90% of the data having been produced in the previous two years. Statista has reported that there were 6.6 zettabytes (6,600 exabytes) produced in 2012, with this forecast to rise to 180 zettabytes by 2025. To put this into scale, one zettabyte of capacity would require 1,000,000,000 (one billion) one terabyte hard drives, or roughly one drive for every seven or eight people in the world. The 180 zettabytes forecast for 2025 would equate to more than 20 one terabyte hard drives for every person in the world.
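
The scale comparison above is simple unit arithmetic and can be checked with a short sketch. It assumes decimal (SI) prefixes, so that 1 terabyte is 10^12 bytes and 1 zettabyte is 10^21 bytes, and a world population of about 8 billion, which is an illustrative assumption rather than a figure from the sources cited. The function name and constants are likewise illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# Assumption for illustration only: roughly 8 billion people in the world.
WORLD_POPULATION = 8_000_000_000


def drives_needed(total_bytes: int, drive_bytes: int = TERABYTE) -> int:
    """Return how many drives of drive_bytes are needed to hold total_bytes, rounding up."""
    return -(-total_bytes // drive_bytes)


# One zettabyte expressed in exabytes and in one terabyte drives.
exabytes_per_zettabyte = ZETTABYTE // EXABYTE
drives_per_zettabyte = drives_needed(ZETTABYTE)

# How many people would share each drive if one zettabyte were spread worldwide.
people_per_drive = WORLD_POPULATION / drives_per_zettabyte

# The 180 zettabytes forecast for 2025, spread across the world population.
drives_for_2025 = drives_needed(180 * ZETTABYTE)
drives_per_person_2025 = drives_for_2025 / WORLD_POPULATION

print(f"Exabytes per zettabyte: {exabytes_per_zettabyte:,}")
print(f"One terabyte drives per zettabyte: {drives_per_zettabyte:,}")
print(f"People per drive for 1 ZB: {people_per_drive:.1f}")
print(f"One terabyte drives for 180 ZB: {drives_for_2025:,}")
print(f"Drives per person for 180 ZB: {drives_per_person_2025:.1f}")
```

Changing WORLD_POPULATION or the drive size shifts the per-person figures but not the order of magnitude: a zettabyte is a billion terabytes.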

Associated with the newer forms of technologies, which amplify the capability for continuous data collection, near instantaneous transmission, unlimited storage capacity and analysis, are new forms of data sets. The natural world landscape, with all its latent data possibilities embedded in the objects of nature and awaiting recognition, has been superseded by a digital landscape, in which a continuous stream of digital items about everything and anything is collected to populate the ‘data universe’, comprising countless ‘data lakes’ of raw data. This metamorphosis of the traditional structured database, with its predefined fields, into an unstructured heterogeneous data universe has created its own challenges. Poorly designed data lakes with inadequate contextual metadata render the data inaccessible and hence unusable, which creates a ‘data swamp’. In contrast, data that is collected but never used creates a ‘data graveyard’. Moreover, a Forrester survey in 2021 suggests that data is being collected faster than it can be analysed, which is overwhelming data teams. Irrespective of this, when data is used, it is claimed that this is predominantly within a 90 day window. Further, Statista reveals that only a few percent of data is kept, drawing upon the example of data created and used in 2020 that was transferred into 2021.

This suggests that whilst the majority of data is deleted, there is still much stored data that is either unusable or not reused, which, given the growth in data being generated, raises the question of what to do with it. The notion of open data provides one opportunity for reuse. This is an initiative whereby data held by the public sector is made available for anyone to use, and it invites the question of whether this can be extended to other sectors.

One specific form of data set invites attention due to its relevance to every person in the world. This is the notion of a ‘data body’. So long as we each have a unique identifier then all the data that exists out there about us can, in principle, be collated and assigned to this identifier, creating a detailed digital profile of each of us – our ‘data body’. This data can be collected, irrespective of our knowledge or consent, through the growing presence of all-pervading sensing, monitoring and surveillance technologies. For example, how ubiquitous are outward facing security cameras on domestic properties? More importantly, who owns the video and audio data generated and can this be used and shared without the consent of those caught on these cameras? Alternatively, wearables are an example of how sensors offer the potential for personalised preventative health care, though this raises similar issues.

This contrasts with the ‘digital twin’, the digital equivalent of a material something, such as a product, process, building, or settlement, which may be used in a design mode and involve some form of simulation. Indeed, this may well be the building block of the metaverse, whereby a countless number of digital twins are compiled to create a digital space that comprises a complex virtual domain that permits human immersion.

The significance of digital data is perhaps placed into perspective by considering the volume (quantity) and variety (multiple sources, which increasingly include unstructured data) of data generated, and the velocity (speed) at which it is generated and handled. However, there are issues with the veracity or truthfulness of data, with the phrase ‘alternative facts’ entering our language and misinformation proliferating. This is aside from data quality issues arising from errors in records. Moreover, the affordances of this data and its mobility have established value (worth) in data sets, which has led to data being tradable. However, this raises concerns about the legitimacy of use by those acquiring these data sets, the privacy of those with a presence in these data sets and whether governance mechanisms adequately protect stakeholders. Perhaps overlooked is the invisible nature of data in our everyday as a fundamental resource, without which digital technology cannot function. However, this invisibility is made visible when there is a problem, bringing to the fore the issues of resilience and (cyber)security.

The notion of being ‘data driven’ thus invites a number of considerations or concerns relating to the ‘data’.

Concerns

Today, digital data, especially that relating to individuals, can be collected, by default, on an automated continuous basis, then accessed, tampered with, faked, copied, transmitted, mistranslated, stolen and traded, all without a person’s knowledge or consent, with potentially undesirable consequences for that person through its use. Further, there are also the issues of who owns the data, who is doing what with the data and what rights a person has over data relating to self.

This raises concerns about stakeholder commitment to, firstly, protecting people’s welfare from harm due to data collection and misuse and, secondly, respecting personal privacy. There might be a need for data to enhance strategic, operational and technological performance and deal with dysfunction. Indeed, the many forms of data collection, such as smartphone monitoring, workplace biometrics and public space surveillance, can be justified for reasons of performance and security management, with involuntary consent being necessary against the alternative of denial of employment, service or access. Moreover, there tends to be a lack of transparency about the nature of the embodiment of digital data in hosting systems. Trust becomes redundant, with a blind acceptance of the status quo and the hope that nothing adverse will happen. This has more serious implications for the potential erosion of a person’s freedom and other human rights. Ultimately, the question is how the quest to satisfy the need for more and more data about everything impacts, positively or negatively, the wellbeing of each and every person and whether this enhances inclusivity and diversity, or privileges some and disadvantages others.

However, in addition to the above social concerns, there are also environmental considerations. This focuses attention upon the physical devices used to handle all this (unused?) data and their end-of-life contribution to eWaste. One significance of eWaste is the toxicity of the materials used in electronic devices, which, when dumped in landfills or on waste-ground, have serious impacts upon both people’s health and the environment, especially for those communities whose livelihood is dependent upon scavenging through this eWaste.

There are also the energy demands. Globally, it is estimated that datacentres consume around 1% of electricity, with technological efficiencies offsetting data growth, thus perhaps not resulting in much change. At a national level, however, datacentre consumption can rise to 30% of a nation’s electricity usage. Further, attention is also drawn to how the heat generated by datacentres can be more sustainably utilised or whether it is wastefully transferred into the atmosphere.

The data, its use and the associated physical infrastructure are each problematic.

Whither Governance?

This leads to the question of how data is managed and the norms and values that guide this. It raises the distinction between legislation and self-governance.

Should data and its handling be guided solely by legislation, with associated sanctions? It may provide a level playing field, but it is unlikely to have the requisite variety to catch all possibilities of abuse. The promise of ‘it ensures’ is undermined by the question of how compliance is ensured and whether this relies upon enforcement. Alternatively, there is the reliance upon a commitment to self-governance, underpinned by ethical values, though this assumes that everyone upholds these values, which is unrealistic. Legislation implies a culture of compliance, whilst self-governance extends beyond compliance to a higher level of standards and a culture of values, but how many hold these values? Perhaps the baseline is legislation.

However, it is unclear what is implied by ‘self-governance’. Governance can be viewed as the ability to regulate whatever it is that is being governed. Governance embraces such notions as vision, values and policy (direction), environmental scanning and future thinking (intelligence), monitoring (performance metrics, auditing and tracking), synchronisation (co-ordination, especially with regard to distributed collaboration and sharing of data) and control, though not necessarily control by edict and expected compliance (legislation, policy), but control by agreement (self-governance and shared values). Implicit is the need for security and accountability. It requires clear definition of roles and responsibilities. All this translates into practices, which, at the pragmatic level, include such issues as data source integration, data cataloguing, data quality, data relationships, data aggregation, data transformation and analytics, as well as self-service access for use.

Governance raises the question about whether there is a culture conducive to good data related practices. Culture is defined as the way we do things around here as opposed to how you do things over there. In other words, it is about how we do all the above. This has implications for how culture is developed by those leading the organisational context of the ‘data system’, whether this be a clearly bounded organisational entity or the organisation of different entities through a shared cause. This highlights the importance of leadership in creating not only operational conditions but also cultural conditions that together are conducive to everyone succeeding.

Moreover, governance is not about being centralised but it is also not about being decentralised. Instead, governance is distributed throughout the different levels of whatever is defined as the system being governed (e.g. servers, data centres, enterprise applications). Data governance is multi-level, with what is appropriate at more local levels accommodating local conditions. It implies that (cyber)security is distributed throughout the system to protect both the integrity of the data and anyone existing in the data.

Opportunity

Data has a long history of serving the interests of those seeking economic gain. However, the digital nature of data requires a new paradigm which focuses attention upon an additional set of what are the ‘big’ challenges. The first is to prevent harm to the personal wellbeing of anyone, such as will arise due to identity theft, financial loss, intimidation, lack of privacy or any other inappropriate data use, intended or unintended. The second is to enhance society in terms of the fundamental needs of food and fresh water security, healthcare, education and energy provision. The third is to protect and regenerate the environment, especially with regard to the challenges of climate change, biodiversity loss and deforestation.

This redirects attention to sustainability. It raises the question of how data can be collected and used to support not only business needs and resilience, but also sustainability. Here, sustainability is viewed in terms of the synergies arising at the intersection of its three pillars: economic, social and environmental. Its goals are manifest in the Sustainable Development Goals (SDGs).

With the exponential expansion in the volume of data stored, amplification of smart analytical capability and growing knowledge and expertise that relate to data assets, can better use be made of these for pursuing sustainable impacts and addressing both global and local challenges? Indeed, given the shaping significance of the digital giants upon the everyday, with their data assets and capabilities, it can be asked how they translate their guiding values into measurable action with impact. This invites partnerships between digital companies and those seeking solutions (e.g. NGOs and social enterprises) but who lack technical knowledge, resources and funds. It is already happening, but can this be scaled up to a more prevalent level? It is not about data for the sake of data, but about what data is required, exists and is useful, and can, with appropriate attention, enlighten and fuel social and environmental impact.

Further, measurements allow the effects of change and impact to be assessed. The notion of the ‘triple bottom line’, whilst challenging in practice, offers an opportunity for the reporting of the three pillars of sustainability in, for example, annual reports and affords transparency about contribution. Fundamentally, it can be expected that any investment and innovation that is impactful in sustainability has value in the return it provides to these businesses. Moreover, as younger generations take greater interest in social and environmental matters and align themselves to businesses whose values align with theirs, those businesses that are listening are more likely to benefit. Thus, it is not just the ‘triple bottom line’ that provides a measure of impact, but also the top line in a form such as Return on Sustainable Impact (ROSI).

Where does this leave us?

This notion of being data driven appears to be about being able to effectively harness digital data. Data is a fuel that can energise action. As a fuel, it comes in many forms and is not exclusive to the digital domain. However, its presence in the digital domain is overwhelming, with significant impact at the level of nation state upon energy consumption. Moreover, digital data has raised to prominence such concerns as cybersecurity, privacy, human rights and ultimately human wellbeing.

The digital nature of data has disrupted our notion of data. No longer can a letter be written, posted, read then burnt. Today, a letter is more likely to be an email or some other form of digital item. Once it has been created, it has its own existence in an invisible and poorly understood complex domain, with little possibility of it being burnt, but with every possibility of it being distributed in an endless chain of digital manifestations. Digital data is a form of technology and consequently it invokes all the complexity inherent in how technology in general is handled.

Digital data, as a technology being introduced into a complex domain, raises such questions as: What is the strategy and approach to how data application and management is to be improved? What are the skills, knowledge, expertise, management and organisational implications for effective collection, analysis etc. of data? What are the anticipated benefits in terms of improvement in productivity, innovation and entrepreneurial activity? What are the broader security, environmental and societal wellbeing implications and how are the negatives to be mitigated, but more importantly the positive impacts to be enhanced?

To end on a positive note, digital data is a fuel; in other words, like all data, it is a fundamental resource. It invisibly supports the everyday. It visibly supports purposeful action without concern about broader impacts. However, there is a third aspect where the emphasis is upon impact for good. At this level, its potential is to energise action, not only for economic benefit, but also for social and environmental good. Should this be a vision for all?
