Edge Data Management - digitising the world sustainably with new data architectures

Why is Data Architecture central to successful digitization?

Data follows business needs

The goal of data architecture is to match business requirements with data and system requirements, as well as to manage data flow through the enterprise in accordance with organisational strategy. Good data architectures make data available when and where needed; data follows the needs of the business.

Data modeling is a closely related term. Both data architecture and data modeling seek to bridge business goals with technological capabilities. Data architecture spans the organization and takes a high-level, holistic approach, whereas data modeling focuses on specific systems or business cases.

In any case, the architecture or modeling should be based on business requirements. Often, however, the data architecture is already predetermined at the highest level: it is centralised / cloud-based. As McKinsey puts it: “Using data effectively requires the right data architecture, built on a foundation of business requirements. However, most companies take a technology-first approach, building major platforms while focusing too little on killer use cases.”

With the dominance of centralized cloud computing comes the inaccessibility of edge data, which means lost business opportunities as well as a barrier to innovation and value creation. Or, in other words: we need an edge over the cloud.

The edge lacks core infrastructure software

Compute follows data

There are already more connected devices on the planet than people. Their number grows by the day and they all generate and use data. As a result, meaningful data is increasingly being generated and used outside of traditional data centers and cloud environments. This data is sometimes called “edge data”.

Many businesses struggle to access and use this wealth of decentralized edge data, which can come from anywhere, anytime, e.g. machines on factory floors, cars on the street, batteries in airplanes, or just the smartphones of field workers. The possibilities for generating value from this data, if it were accessible, are countless.

Transferring all of this data to the cloud to make it accessible is not feasible for bandwidth reasons alone. There are, however, several more good reasons why the cloud is not a viable option for making this data usable:

  • Networking and cloud costs
  • Data security concerns
  • Data privacy concerns
  • Latency
  • No or intermittent connectivity

Edge Computing is a distributed computing paradigm that places computation and data storage near the data sources. Edge setups are themselves often distributed systems (an edge of edges). What Edge Computing still lacks is essential infrastructure software: unlike the centralised paradigm with its long-established cloud infrastructure, the decentralised / distributed paradigm offers little ready-made infrastructure software to help establish new projects quickly and easily.

Legacy systems, combined with a cloud-centric infrastructure, make it difficult to access and capture the value of edge data – and there is no easy option available. Because “distributed systems are hard”, this is not something that can be solved in-house easily or quickly.

And while there are always ways to obtain (at least some of) that data, DIY solutions for these types of challenges typically take a long time to implement, do not scale well, are slow, and are difficult to maintain in the long run. In short, such costs can kill edge projects.

There are many use cases that only work based on the decentralised Edge Computing topology, and many that only make sense based on that topology. Some examples:

  • “Smart vehicles”, anything from connected cars to autonomous driving, need to satisfy high availability and reliability requirements and depend on lightning-fast onboard data processing and transmission. A car is, of course, a distributed system in itself and cannot afford the latency and uncertainty a cloud-based approach brings. Intermittent connections can still be used to transport parts of the data to the cloud.
  • Anything in a remote location without (reasonable) Internet access, e.g. oil & gas fields, tunnels, or entertainment parks, really needs applications that work without an Internet connection, on the edge. In the oil and gas industry, failures can be disastrous. The many assets on-site therefore need to be carefully monitored, yet they are typically in remote locations with little to no Internet connectivity. Once a connection is available, some data can be transferred to the cloud (a minimal sketch of this store-and-forward pattern follows this list). However, high mobile network operator (MNO) costs and limited bandwidth, as well as flaky networks, often require frugal choices with regard to data transfer.
  • Smart Manufacturing applications, at least in Europe, typically need to satisfy high data protection and security requirements, and for this reason there is often no connection, or no direct connection, to the Internet - and many manufacturers are reluctant to have their data in the cloud. On a factory floor, you can find everything from low-frequency brownfield devices to high-frequency greenfield devices. As a rule, the machine controllers in use are not designed to store or transmit data; they usually lack not only the functionality but also the resources to support this. A clear separation between machine control and the edge data processing unit ensures that there is no risk of unintentional interference with the machine controller.
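
To make the recurring pattern in these examples concrete, here is a minimal, hedged sketch of the store-and-forward approach mentioned above: readings are persisted locally on the edge device, and only a filtered subset is uploaded when a connection happens to be available. The table layout, the threshold, and the connection_available() / upload_to_cloud() stubs are illustrative assumptions, not part of any specific product.

```python
import json
import sqlite3
import time

# Local buffer on the edge device: the application keeps working offline.
DB = sqlite3.connect("edge_buffer.db")
DB.execute("""CREATE TABLE IF NOT EXISTS readings (
                ts REAL, sensor TEXT, value REAL, uploaded INTEGER DEFAULT 0)""")

def record(sensor: str, value: float) -> None:
    """Store a reading locally first; no network is needed to capture data."""
    DB.execute("INSERT INTO readings (ts, sensor, value) VALUES (?, ?, ?)",
               (time.time(), sensor, value))
    DB.commit()

def connection_available() -> bool:
    """Placeholder for a real connectivity check (e.g. pinging a known host)."""
    return False

def upload_to_cloud(rows) -> None:
    """Placeholder for the actual transport (MQTT, HTTPS, ...)."""
    print("uploading", json.dumps(rows))

def sync(threshold: float = 100.0) -> None:
    """Forward only the readings that matter, and only when we are online."""
    if not connection_available():
        return
    rows = DB.execute("SELECT rowid, ts, sensor, value FROM readings "
                      "WHERE uploaded = 0 AND value > ?", (threshold,)).fetchall()
    if rows:
        upload_to_cloud([{"ts": r[1], "sensor": r[2], "value": r[3]} for r in rows])
        DB.executemany("UPDATE readings SET uploaded = 1 WHERE rowid = ?",
                       [(r[0],) for r in rows])
        DB.commit()

record("pressure_pump_3", 142.7)  # always works, connected or not
sync()                            # opportunistic, frugal upload
```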

In all of these examples, the typical use case entails a number of different devices, from control units to machines to phones, tablets, and PCs. The edge consists of many edges that need to exchange data; the use cases require at least guaranteed response times, and often raw speed is of the essence as well. Looking closer at these cases, they all share the same base requirements:

  • Decentralized devices
  • Quality of Service (QoS) control
  • Fast response times
  • High availability
  • Secure data handling
  • Scalability
  • Fault tolerance and resilience

And all of those requirements aim at one thing: having the right data available when and where it is needed.

What it's really all about: Access to the right data at the right time

Anything that can go wrong, will go wrong

What sounds trivial conceals a number of more complicated challenges arising from data access in distributed systems (plus industry-specific challenges). The complexity becomes clear in the context of the system environment:

  • The network is unreliable
  • Latency is not zero
  • Bandwidth is finite
  • The network is not secure
  • Topology can change
  • It’s unknown if there is an administrator at all, or how many
  • Transport is costly
  • The network is heterogeneous

Fun fact: these are derived from the “eight fallacies of distributed computing” that were defined back in the 1990s (e.g. the false assumption that the network is reliable). Bottom line: anyone working on a distributed system needs to assume that “anything that can go wrong, will go wrong” - and it is hard to take care of all the individual parts failing at any given moment, let alone every possible combination thereof.
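
As a small illustration of what the first two fallacies mean for edge code in practice, here is a hedged sketch of a network send with a timeout, bounded retries, and a local fallback. The send_to_peer() function is a stand-in for whatever transport a real system would use; the names and parameters are illustrative assumptions.

```python
import random
import time

class TransportError(Exception):
    pass

def send_to_peer(payload: bytes, timeout_s: float) -> None:
    """Placeholder for a real network call (HTTP, MQTT, gRPC, ...)."""
    raise TransportError("peer unreachable")  # simulate a flaky network

def send_with_retries(payload: bytes, attempts: int = 4) -> bool:
    """Bounded retries with exponential backoff and jitter; never block forever."""
    for attempt in range(attempts):
        try:
            send_to_peer(payload, timeout_s=2.0)
            return True
        except TransportError:
            # back off 0.2 s, 0.4 s, 0.8 s, ... plus jitter so peers do not retry in lockstep
            time.sleep(0.2 * (2 ** attempt) + random.uniform(0, 0.1))
    return False  # the caller keeps the data locally and tries again later

if not send_with_retries(b'{"sensor": "pump_3", "value": 142.7}'):
    print("peer unreachable, keeping the record on the edge for a later sync attempt")
```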

The complexity therefore already begins before the solution is implemented, as it necessitates a thoughtful approach to data architecture and the governance of a distributed system. What makes it even more difficult is the fact that distributed systems have multiple components spread across multiple locations, domains, and technologies. Without core infrastructure software for the edge, every project keeps re-inventing the wheel. It takes a lot of time, expert knowledge, money, and nerves to implement just the basic mechanisms of such a complex system.

Hidden requirements / concealed challenges of Data Access in distributed systems

Last but not least, such projects face another challenge: it is far easier to get budget for visible, shiny features than for hidden features that “no one ever sees”. A “quick PoC” might be set up using the cloud to demonstrate the value of the application and of using that data. However, it cannot be used in production later on, for all the reasons given above. So digitization is stalling, and the reality today is that, across verticals, a ton of data sits unused and companies struggle to access edge data.

Why should I care?

The power of the edge

Pushing data management capabilities toward edge environments adds value in a variety of ways. Once you can manage edge data on the edge, you can:

  • act with ludicrous speed and empower real-time use cases. By uniting data and computation on the edge, you get to a completely new level of efficiency and speed.
  • enable smarter devices and make more use of the already available and distributed hardware, including remote management or autonomous behaviour via onboard (edge) data.
  • overcome data inconsistencies, protection, privacy and other data governance issues that arise from siloed or unused edge data, or cloud-centric data architectures.
  • reap the benefits of digitization independent from an Internet connection. Bandwidth costs and scenarios with limited or intermittent connectivity are no longer showstoppers.
  • provide greater fault tolerance and resiliency.

Just imagine what else you can do… all of this while using fewer resources and making digitization as environmentally friendly as possible.

There is also a silver lining in terms of development: new solutions in the Edge Computing space aim to deliver core infrastructure software to developers, allowing them to easily manage edge data on the edge. Edge Database Management Systems (Edge Databases) make decentralized edge data usable outside of data centers and public clouds: they collect, retrieve, store, distribute, and govern data within the decentralized Edge Computing topology.
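
As a rough, hedged illustration of what “distributing and governing data on the edge” involves, here is a toy in-memory “edge node” with timestamp-based, last-write-wins merging. Real edge databases additionally handle persistence, conflict resolution, and transport; none of the names below refer to an actual product API.

```python
import time
from typing import Dict, Tuple

class EdgeNode:
    """Toy in-memory node; real edge databases add persistence, transactions, transport."""

    def __init__(self, name: str):
        self.name = name
        self.store: Dict[str, Tuple[float, object]] = {}  # key -> (timestamp, value)

    def put(self, key: str, value: object) -> None:
        """Write locally first; the device stays fully usable offline."""
        self.store[key] = (time.time(), value)

    def changes_since(self, ts: float) -> Dict[str, Tuple[float, object]]:
        """Only data that changed after ts leaves the device -- no full dumps."""
        return {k: v for k, v in self.store.items() if v[0] > ts}

    def merge(self, changes: Dict[str, Tuple[float, object]]) -> None:
        """Last-write-wins merge of changes received from a peer edge."""
        for key, (ts, value) in changes.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, value)

# Two edges exchanging data directly, without a detour through the cloud.
gateway, tablet = EdgeNode("factory-gateway"), EdgeNode("operator-tablet")
gateway.put("machine_7/state", "running")
tablet.merge(gateway.changes_since(0.0))   # the tablet pulls only what is new to it
print(tablet.store["machine_7/state"][1])  # -> "running"
```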

Edge Data Management Systems make edge data available when and where needed

Data architectures impact sustainability - a lot

We are already feeling the drastic effects of our actions on the planet, and we cannot wait any longer to embed a sustainable mindset into everything we do. Investing in carefully chosen data architectures, including the Edge Computing topology, delivers a lot more than just lower carbon emissions. Choosing a fitting and efficient data architecture will also pay off economically and have a positive social impact.


Economic impact

Cloud waste is generally defined as unused or underutilised cloud services (e.g. idle VMs, overprovisioning). And while estimates vary, between 30 and 50% of cloud spending is wasteful in that sense - wasting money and causing unnecessary CO2 emissions. For example, Andreessen Horowitz recently estimated that across the top 50 public companies, 100B USD in market value is lost due to cloud waste. However, that is only one side of the coin: the cloud also encourages wasteful development behaviour and data architecture. A lot of data is needlessly transferred to the cloud and back, while it is primarily used, and useful, on the edge. From an unnecessary cloud setup for decentralised cases, to ignoring the CPU and memory consumption of the code, to inefficient data management (e.g. unnecessary data streaming, repeated transfers, full dumps)... it all increases ongoing costs and CO2 emissions.
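
As a hedged back-of-the-envelope sketch of that last point, compare streaming every raw sample to the cloud with sending a per-minute aggregate computed on the edge. The sensor name, sampling rate, and JSON payloads are made-up assumptions; the point is the order-of-magnitude difference in what goes over the network.

```python
import json
import random

# 600 raw samples, e.g. one minute of data at 10 Hz from a vibration sensor.
samples = [{"ts": i, "sensor": "vibration_1", "value": random.random()}
           for i in range(600)]

full_dump = json.dumps(samples).encode()  # what "stream everything" would send

aggregate = json.dumps({                  # computed on the edge device instead
    "sensor": "vibration_1",
    "window_s": 60,
    "min": min(s["value"] for s in samples),
    "max": max(s["value"] for s in samples),
    "mean": sum(s["value"] for s in samples) / len(samples),
}).encode()

print(f"raw stream: {len(full_dump)} bytes, edge aggregate: {len(aggregate)} bytes")
# Orders of magnitude less traffic -- and correspondingly lower cost and CO2 --
# whenever the cloud only needs the summary, not every single sample.
```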


Environmental impact

Did you know that cloud data centres already use an estimated 2-3% of the world’s electricity? Their share of global CO2 emissions is estimated to be in a similar range. Additionally, with ongoing digitization, a rapidly soaring number of devices, and exponentially growing data volumes, the CO2 impact of digitalisation keeps growing. The potential global impact of more sustainable digitization projects is therefore enormous. And the greatest leverage lies in choosing efficient data architectures that avoid unnecessary and wasteful networking and cloud use.


Social impact

From a societal standpoint, Edge Data Management solutions make it easy for developers to keep data at or close to the source, e.g. on the user’s device. Keeping data on the device increases data ownership, data privacy, and data security. Individual devices can be hacked, but the potential loss is much smaller than when a massive central cloud server is breached and the data of millions of users is compromised at once. A wider spectrum of data architectures and solution providers will also strengthen the ecosystem, empowering more innovation and greater independence from hyperscalers.

All in all: Sustainable digitization needs an edge

