A Simple Guide to Considering Data Gravity in a Hybrid & Edge Computing World
I recently asked a few questions of my incredibly smart LinkedIn community. It started with a discussion about on-prem vs. public cloud, and then I decided to get more specific and ask about Data Gravity. My concern is that some of us (OK, maybe just me) are oversimplifying, in the abstract, the discussion of where data needs to live, especially when you consider the hybridized nature of modern IT. Well, I asked for it and I got it: dozens of great comments and answers, and the following is just a sampling. If you’d like the whole story, which includes contributions I didn't include here, or would like to participate, feel free to visit the discussion on LinkedIn.
The Questions
- What is the best method to review your #Datagravity requirements?
- What process would you use to continue to validate decisions on location or application design?
- Would you consider AI or fixed policies on data replication, latency or location?
- Would you consider having a review process or tool for managing your application ecosystem?
When you think about your data, the following are likely to be your key considerations in deciding placement for best value.
Val Bercovici CEO/Founder (@Valb00)
Data Domesticity Regulations: The Law trumps size, elasticity and performance influences on Data Gravity
Ralph Loura CTO/CIO @RalphLoura
Traffic patterns to/from the data along with latency requirements
Would prefer having a tool to monitor/enforce and potentially automate data management, with humans setting the policy. Not opposed to the idea of AI, but not necessarily in favor yet either
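Ralph’s “humans set the policy, a tool enforces it” could be as simple as a declarative rule set that monitoring evaluates. The sketch below is a hypothetical illustration only; the dataset names, regions, and thresholds are my assumptions, not a reference to any particular product.

```python
# Hypothetical, human-authored placement policy that an enforcement tool could evaluate.
POLICY = {
    "customer_orders": {"region": "eu-west", "max_latency_ms": 50, "replicas": 3},
    "order_archive":   {"region": "any",     "max_latency_ms": 500, "replicas": 2},
}

def check(dataset, observed_region, observed_latency_ms, observed_replicas):
    """Return policy violations for one dataset; a real tool would alert or remediate."""
    rule = POLICY[dataset]
    violations = []
    if rule["region"] != "any" and observed_region != rule["region"]:
        violations.append(f"{dataset}: stored in {observed_region}, policy requires {rule['region']}")
    if observed_latency_ms > rule["max_latency_ms"]:
        violations.append(f"{dataset}: {observed_latency_ms} ms observed, policy allows {rule['max_latency_ms']} ms")
    if observed_replicas < rule["replicas"]:
        violations.append(f"{dataset}: {observed_replicas} replicas, policy requires {rule['replicas']}")
    return violations

print(check("customer_orders", "us-east", 72.0, 3))
```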
Paul Clark Enterprise Data Architect
Regarding review and understanding of data gravity, from a data architecture perspective, we should first understand data as an organism within the enterprise. For this discussion, there are three essential natures of data: pulsatile, persisted, and residual. Pulsatile data, much like blood being pumped to and from the heart, lives close to the action; as Dave mentioned, think of an e-commerce system where customers are placing orders. Persisted data, such as customer and product data, doesn't necessarily have a pulse but is essential for supporting business function. Residual data is the footprint of activity left behind by the running of the business, such as completed orders and stale customer data. If we can accurately classify our enterprise data, we can then know where data needs to live and what performance and accessibility demands are expected. For instance, pulsatile data requires proximity to the action, whereas residual data does not.
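To make a classification like Paul’s actionable, you could tag data sets by how “alive” they are. The Python sketch below is my own illustration of his three natures; the thresholds and field names are assumptions, not his methodology.

```python
from dataclasses import dataclass
from enum import Enum

class DataNature(Enum):
    PULSATILE = "pulsatile"   # in-flight, transaction-driven; keep close to the action
    PERSISTED = "persisted"   # reference data (customers, products) supporting business function
    RESIDUAL = "residual"     # footprint of past activity; can live in cheaper, remote storage

@dataclass
class DataSet:
    name: str
    writes_per_minute: float   # how "alive" the data is
    days_since_last_read: int  # how stale it has become

def classify(ds: DataSet) -> DataNature:
    """Hypothetical thresholds -- tune to your own workload profile."""
    if ds.writes_per_minute > 10:
        return DataNature.PULSATILE
    if ds.days_since_last_read <= 90:
        return DataNature.PERSISTED
    return DataNature.RESIDUAL

print(classify(DataSet("open_orders", writes_per_minute=500, days_since_last_read=0)))      # PULSATILE
print(classify(DataSet("product_catalog", writes_per_minute=0.2, days_since_last_read=3)))  # PERSISTED
print(classify(DataSet("orders_2015", writes_per_minute=0.0, days_since_last_read=400)))    # RESIDUAL
```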
Lori MacVittie Principal Technical Evangelist @lmacvittie
Gravity is affected by the weight (mass) of data sources. The bigger the data store, the less likely it - and highly dependent apps - are going to move off prem. Data on new app dev shows that only 15% or so are cloud native; the rest are traditional architectures. That makes the other 85% harder (but not impossible) to migrate.
Rick Parker Senior Systems Engineer @parkercloud
Data gravity: locate data proportionally closest to its biggest users, and validate the design against cost comparisons with public clouds. Use fluid policies on replication, latency, and location; AI or ML preferred. I use monitoring as the review process and tool.
Dave McCrory VP Software Eng for Machine Learning / IIoT & father of “Data Gravity” @mccrory
First, data in flight, if frequently sampled, has both gravity and inertia, regardless of persistence. The best method of identifying #Datagravity requirements is to measure end-to-end (and point-to-point) request/response latency and bandwidth versus the required/desired latency and bandwidth. This could be API calls, DB requests, app responsiveness, etc. The process answer is somewhat baked in, but additional tools or monitoring could be used. There are also elements of data governance/provenance, costs, and outside factors that need to be accounted for. AI wouldn't be considered, IMO, but yes to easily understood policies regarding replication, latency, bandwidth, and location. A review process is absolutely needed; a tool managing an entire ecosystem? I haven't seen such a tool that doesn't have a crazy amount of overhead and maintenance involved.
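Dave’s “measure end-to-end latency versus what you require” can start with a very small probe. The sketch below is a minimal illustration; the endpoint and the 150 ms target are made-up assumptions, and a real assessment would also capture bandwidth, failures, and point-to-point hops.

```python
import statistics
import time
import urllib.request

# Hypothetical endpoint and target -- substitute your own API calls or DB requests.
ENDPOINT = "https://api.example.com/orders"
TARGET_P95_MS = 150.0   # required/desired response latency

def sample_latency_ms(url, samples=20):
    """Measure end-to-end request/response latency for a single endpoint."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=5).read()
        except OSError:
            continue  # a real tool would count and report failures separately
        timings.append((time.perf_counter() - start) * 1000)
    return timings

timings = sample_latency_ms(ENDPOINT)
if len(timings) >= 2:
    p95 = statistics.quantiles(timings, n=20)[-1]  # rough 95th percentile
    print(f"p95={p95:.1f} ms vs target {TARGET_P95_MS} ms")
    if p95 > TARGET_P95_MS:
        print("Latency gap suggests the data (or the app) is in the wrong place.")
```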
Jason Collier Founder @bocanuts
Here is an example from a customer of ours in Europe, a large global retailer with over 6,000 stores worldwide that deploys both private cloud and edge. Each store requires an inventory tracking system, point-of-sale systems, a security camera archive, and other misc. resources. The data that is relevant to corporate goals is processed at the edge and then relayed up to HQ; in their case, the relevancy of the data determines its locality. Also, some of these stores are smaller and in remote regions with very poor connectivity options, making it a requirement that they be able to operate independently of a connection to HQ for extended periods.
Rick Drescher Data Center, Interconnection & Cloud Strategy Consultant @Rick_Drescher
Speaking from the perspective of a recent project, this issue came into play with a massive data warehousing/analytics platform that had hundreds of terabytes of historical data, with the growth of that data dramatically slowing over the past few years. The initial thought was that the behemoth data set, living on a pricey SSD SAN, would be treated like a boat anchor in the enterprise data center, closely coupled with the compute required to run analytics on the data as required by clients for the foreseeable future (this was a SaaS data analytics platform). With the help of data analytics tools and some really smart developers, it was determined that a consolidated data set, less than 10% of the size of the full data set, produced analytics within 0.004% of the accuracy of the full data set, at an almost unbelievable performance improvement of more than 700%. This made it instantly plausible to migrate the platform to the public cloud, avoiding the sizable capital investment the client had been repeating every 5-7 years simply to keep purchasing maintenance on the hardware and to keep the operating system and underlying software current for security.
Ryan Fay CIO @RyancFay
Data gravity is relative to the use case and is often decided based on numerous factors such as compliance and regulations. The same goes for location: my preference is to keep data as close to the edge/fog as physically possible, thus reducing latency and cost. We currently validate our designs against private, public, and hybrid multi-cloud value/cost metrics. We concentrate on automating fluid policies based on replication latency, use case, and location. GCP offers fantastic AI and ML tools that have saved my teams an immense amount of time and energy. As I stated in another comment, the more your applications use native platform or cloud features, the less likely it is that your apps will be easily portable. The reason is that many desirable capabilities (that I truly value) are tied to a specific PaaS, IaaS, SaaS, or TaaS, and those just can't be migrated as-is, or in some use cases should not be relocated, for many reasons.
Yuval Dimnik Co-Founder @yuvaldim
Data gravity can be roughly factored by the following, to compare complexities across organizations and projects:
- Number of data classes you have (this mostly factors into complexity, which affects TCO).
- Net size of each data class: the more data you have, the more gravity you have.
- Required performance, both in data creation and in consumption: higher requirements mean more gravity.
- Generator/consumer locations, to be factored against the next two items.
- Provided performance at each client location: the better the performance you can actually get, the lower the gravity.
- Cost of consumption (egress cost, etc.): if it is expensive to move data to that client, gravity is higher.
- Cost of management to achieve the above, dependent on the number of data classes, clients, locations, vendors, external and internal regulations, and more.
If an org spends 30% of an FTE to optimize data that has $5K a year of egress, it doesn't make sense. If they don't optimize and have $200K of egress... well, same problem.
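Yuval’s factors lend themselves to a back-of-the-envelope score. The sketch below is purely illustrative; the weighting and every number in it are my assumptions, not a formula he provided.

```python
def gravity_score(size_tb, required_iops, provided_iops,
                  egress_cost_per_tb, consumed_tb_per_year):
    """Made-up weighting of Yuval's factors: more data, a bigger gap between required
    and delivered performance, and pricier egress all push the score (gravity) up."""
    performance_gap = max(required_iops / max(provided_iops, 1.0), 1.0)
    yearly_egress = egress_cost_per_tb * consumed_tb_per_year
    return size_tb * performance_gap + yearly_egress / 1000.0

print(gravity_score(size_tb=300, required_iops=50_000, provided_iops=20_000,
                    egress_cost_per_tb=90, consumed_tb_per_year=40))

# His closing point is about management cost: weigh egress spend against the people
# cost of optimizing it (all numbers below are illustrative).
fte_fraction, fte_cost = 0.30, 150_000          # 30% of an FTE at an assumed salary
egress_small, egress_large = 5_000, 200_000     # $5K vs. $200K of egress per year
print(f"Spending ${fte_fraction * fte_cost:,.0f} of labor to trim ${egress_small:,} of egress makes no sense;")
print(f"ignoring ${egress_large:,} of egress is the same problem in reverse.")
```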
Whether you’ve already put a data management plan in place for modern deployment of applications or are just considering one, I would highly recommend taking some of the advice from this incredible group of technologists and strategists.
Why
I asked the questions because I like to challenge myself and my assumptions. There is a ton of FUD out there, and if you haven't looked at how to solve a data gravity issue in the last six months, you've probably missed out on some important strategies and technologies that could help create opportunities for you in new and unique ways.
Shout out to Dave McCrory for the use of the term #DataGravity
Great summary, Mark. BTW, I'm also the founder of NooBaa ;)
IMHO, one large driver of data gravity, or rather data inertia, is that the applications that create and access the data are rarely architected from the onset with planning for how to deal with the ever-increasing amount of data, thus resulting in the proverbial "data lake" that is challenging to migrate. This is somewhat analogous to how Las Vegas was built. Instead of building the roads first, the city (data in this case) was built first, and then the ways to access it were constructed.
Building AI Factories, Open Source & Cloud Native
What a great summary post, Mark! Thanks for level-setting where Cloud is roughly a decade after it entered our consciousness :)
IT Strategy, Architecture & IT-enabled Process Improvement, ITIL & LEAN Six σ accredited
What I truly loved, Mark, is the chorus of "it depends" and "your mileage may vary." I was never fond of the extremist views of 'all cloud all the time' - workloads will vary, and I love the term data gravity because it accurately sums up the issue. With the average large organization running at least hundreds if not thousands of legacy apps, the concept of data gravity will give architecture teams a great pointer on what to look at first.