Edge Data Management
By Dr. Vivien Dollinger
On-device Database & Data Sync with Vector Search for Mobile, IoT & embedded devices | Decentralized Data | Edge AI | TinyML | Local AI with efficiency, privacy & sustainability at its core
Why is Data Architecture central to successful digitization?
Data follows business needs
The goal of data architecture is to match business requirements with data and system requirements, as well as to manage data flow through the enterprise in accordance with organisational strategy. Good data architectures make data available when and where needed; data follows the needs of the business.
Data modeling is a closely related term. Both data architecture and data modeling seek to bridge business goals with technological capabilities. Data architecture, on the one hand, spans the organization and takes a high-level, holistic approach, whereas data modeling is focused on specific systems or business cases.
In any case, the architecture or modeling should be based on business requirements. Often, however, the data architecture is already a given at the highest level: it is centralised / cloud-based. As McKinsey puts it: “Using data effectively requires the right data architecture, built on a foundation of business requirements. However, most companies take a technology-first approach, building major platforms while focusing too little on killer use cases.“
With the dominance of centralized cloud computing comes "inaccessibility of edge data," which means lost business opportunities as well as a barrier to innovation and value creation. Or in other words: we need an edge over the cloud.
Compute follows data
There are already more connected devices on the planet than people. Their number grows by the day and they all generate and use data. As a result, meaningful data is increasingly being generated and used outside of traditional data centers and cloud environments. This data is sometimes called “edge data”.
Many businesses struggle to access and use this wealth of decentralized edge data, which can come from anywhere, anytime, e.g. machines on factory floors, cars on the street, batteries in airplanes, or just the smartphones of field workers. The possibilities for generating value from this data, if it were accessible, are countless.
Transferring all of this data to the cloud to make it accessible is currently not feasible for bandwidth reasons alone. There are, however, several more good reasons why the cloud is not a viable option for making this data usable:
Edge Computing is a distributed computing paradigm that places computation and data storage near the data sources. Edges are often distributed systems themselves (an edge of edges). Yet Edge Computing still lacks essential infrastructure software: unlike the centralised paradigm with its long-established cloud infrastructure, the decentralised / distributed paradigm has little infrastructure software to help establish new projects quickly and easily.
Legacy systems, combined with a cloud-centric infrastructure, make it difficult to access and capture the value of edge data – and there is no easy option available. Because “distributed systems are hard”, this is not something that is easily or quickly solved in-house.
And while there are always ways to obtain (at least some of) that data, DIY solutions for these types of challenges typically take a long time to implement, do not scale well, are slow, and are difficult to maintain in the long run. In short, such costs kill edge projects.
There are many use cases that only work with the decentralised Edge Computing topology, and many more that only make sense with it. Some examples:
In all of these examples, the typical use case involves a number of different devices, from controlling units to machines to phones, tablets, and PCs. The edge consists of many edges that need to exchange data. The use cases need at least guaranteed response times, and often raw speed is also of the essence. If you look closer at the cases, they all share the same base requirements:
And all those requirements aim at one thing: Having the right data available when needed where needed.
Anything that can go wrong, will go wrong
What sounds trivial conceals a number of more complicated challenges arising from data access in distributed systems (plus industry-specific challenges). The complexity becomes clear in the context of the system environment:
Fun fact: these challenges trace back to the “8 fallacies of distributed computing” that were defined back in the 1990s (e.g. the false assumption that the network is reliable). Bottom line: anyone working on a distributed system needs to assume “anything that can go wrong, will go wrong” – and it can be hard to take care of every individual part failing at any given moment, let alone all possible combinations thereof.
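To make the first fallacy concrete: code on the edge cannot assume a delivery succeeds on the first try. A minimal Python sketch, assuming a hypothetical `send` callable that raises `ConnectionError` on failure (a stand-in, not a real API), shows the retry-with-backoff logic that every distributed component ends up needing:

```python
import random
import time

def send_with_retry(send, payload, max_attempts=5, base_delay=0.01):
    """Try to deliver payload via `send`, retrying with exponential backoff.

    `send` is any callable that raises ConnectionError on failure --
    a stand-in for a real network call, not a real API.
    """
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # exponential backoff with jitter, to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) * random.random())

# A flaky "network" that fails twice before succeeding.
calls = {"n": 0}
def flaky_send(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network unreachable")
    return "ack:" + payload

print(send_with_retry(flaky_send, "sensor-reading"))  # ack:sensor-reading
```

And this is only one fallacy of eight; the others (latency, bandwidth, topology changes, and so on) each demand their own handling.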
The complexity therefore begins before the solution is even implemented, as it necessitates a thoughtful approach to data architecture and the governance of a distributed system. What makes it even more difficult is that distributed systems have multiple components spread across multiple locations, domains, and technologies. Without core infrastructure software for the edge, every project keeps re-inventing the wheel. It takes a lot of time, expert knowledge, money, and nerves to implement just the basic mechanisms of such a complex system.
Last but not least, such projects face another challenge: it is far easier to get budgets for visible, shiny features than for hidden features "no one ever sees". A “quick PoC” might be set up using the cloud to demonstrate the value of the application and its data. However, it cannot be used in production later on, for all the reasons given above. So digitization is stalling, and the reality today is that, across verticals, a ton of data goes unused while companies struggle to access edge data.
Why should I care?
The power of the edge
Pushing data management capabilities toward edge environments adds value in a variety of ways. Once you can manage edge data on the edge, you can:
Just imagine what else you can do… all of this while using fewer resources and making digitization as environmentally friendly as possible.
There also is a silver lining in terms of development: new solutions in the Edge Computing space aim to deliver core infrastructure software to developers, allowing them to easily manage edge data on the edge. Edge Database Management Systems (Edge Databases) make decentralized edge data usable outside of data centers and public clouds, collecting, retrieving, storing, distributing, and governing data across the decentralized Edge Computing topology.
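The core idea behind such edge data management can be sketched in a few lines. The following Python sketch (illustrative only; the class and table names are made up, and real edge databases add conflict resolution, deltas, and more) shows the local-first pattern: writes land in a local on-device store immediately, so the app works offline by default, and pending changes wait in an outbox for later sync:

```python
import json
import sqlite3

class LocalFirstStore:
    """Minimal local-first edge store sketch: local reads/writes plus a
    sync outbox. Hypothetical design, not a real edge database API."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, key TEXT, value TEXT)")

    def put(self, key, value):
        # Write locally first; queue the change for later synchronization.
        doc = json.dumps(value)
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, doc))
        self.db.execute("INSERT INTO outbox (key, value) VALUES (?, ?)", (key, doc))

    def get(self, key):
        # Reads never touch the network: guaranteed response times offline.
        row = self.db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def sync(self, upload):
        # Drain the outbox through `upload(key, value)` (a stand-in for a
        # network call); if it raises, pending changes simply stay queued.
        rows = self.db.execute("SELECT id, key, value FROM outbox").fetchall()
        for row_id, key, doc in rows:
            upload(key, json.loads(doc))
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

store = LocalFirstStore()
store.put("machine-42", {"temp": 71.5})
print(store.get("machine-42"))  # served locally, even with no connectivity

sent = []
store.sync(lambda k, v: sent.append((k, v)))  # ships queued changes when online
```

The design choice to treat the network as optional rather than required is what distinguishes this topology from a cloud-first one.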
Data architectures impact sustainability - a lot
We are already feeling the drastic effects of our actions on the planet, so we cannot wait any longer to embed a sustainable mindset into everything we do. Investing in carefully chosen data architectures, including the Edge Computing topology, delivers far more than lower carbon emissions: a fitting and efficient data architecture also pays off economically and has a positive social impact.
Economic impact
Cloud waste is generally defined as unused or underutilised cloud services (e.g. idle VMs, overprovisioning). While estimates vary, between 30 and 50% of cloud spending is wasteful in that sense – wasting not only CO2 but also money. For example, Andreessen Horowitz recently estimated that across the top 50 public companies, 100B USD in market value is lost due to cloud waste. However, that is only one side of the coin: the cloud also encourages wasteful development behaviour and data architecture. A lot of data is needlessly transferred to the cloud and back, while it is primarily used – and useful – on the edge. From an unnecessary cloud setup for decentralised cases, to ignoring the CPU and memory consumption of the code, to inefficient data management (e.g. unnecessary data streaming, repeat transfers, full dumps)… it all increases ongoing costs and CO2 emissions.
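One of those inefficiencies, full dumps versus deltas, is easy to quantify. A small illustrative Python sketch (the device state and field names are made up) compares shipping a device's full state on every sync with shipping only the fields that changed:

```python
import json

def delta(old, new):
    """Return only the fields of `new` (a flat dict) that differ from `old`.
    A sketch of delta sync: ship what changed instead of a full dump."""
    return {k: v for k, v in new.items() if old.get(k) != v}

# Two consecutive snapshots of a device's state: only the temperature moved.
old = {"temp": 71.5, "rpm": 1200, "status": "ok", "firmware": "1.4.2"}
new = {"temp": 72.1, "rpm": 1200, "status": "ok", "firmware": "1.4.2"}

full_dump = json.dumps(new).encode()
delta_msg = json.dumps(delta(old, new)).encode()
print(len(full_dump), len(delta_msg))  # the delta is a fraction of the full payload
```

Multiplied across millions of devices syncing many times a day, the difference between these two payload sizes is exactly where the networking costs and emissions accumulate.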
Environmental impact
Did you know that cloud data centres already use an estimated 2-3% of the world’s electricity? Accordingly, they contribute a comparable share of the world's CO2 emissions. Additionally, with ongoing digitization, a rapidly soaring number of devices, and exponentially growing data volumes, the CO2 impact of digitalisation keeps growing. The potential global impact of more sustainable digitization projects is therefore enormous – and the greatest leverage lies in choosing efficient data architectures that avoid unnecessary, wasteful networking and cloud use.
Social impact
From a societal standpoint, Edge Data Management solutions make it easy for developers to keep data at or close to the source, e.g. on the user’s device. Keeping data on the device increases data ownership, data privacy, and data security. Individual devices can be hacked, but the potential loss is much smaller than when a massive central cloud server is corrupted, where the data of millions of users can easily be compromised. A wider spectrum of data architectures and solution providers will also strengthen the ecosystem, empowering more innovations and greater independence from hyperscalers.
All in all: Sustainable digitization needs an edge