Much Like Society, Data is better with Democracy.

Becoming a data-driven organisation remains one of the top strategic goals of many companies I work with.

Most are well aware of the benefits of becoming intelligently empowered: providing the best customer experience based on data and hyper-personalisation; reducing operational costs and time through data-driven optimisations; and giving employees superpowers with trend analysis and business intelligence.

They have been investing heavily in building enablers such as data and intelligence platforms. Despite increasing effort and investment in these enabling platforms, organisations find the results middling. Why is that so?

In this article I'll touch on why I think this is the case and how business leaders can solve it.

Monkey see and Monkey do

Here in Australia, organisations have been working diligently to stamp out lines of business having the control and freedom to make decisions and spend accordingly, mostly because at one point it was like the Wild West: hundreds of investments with poor integration, shadow IT popping up, siloed data, and security concerns were the norm.

This led domain architects to build or buy solutions that eventually became centralised, monolithic and domain-agnostic data platforms.

Essentially, we have moved away from data ownership that is specific to certain domains towards centralised data ownership that is domain-agnostic, and we have been very proud of creating the biggest monolith of them all: the big data platform.

This worked in the past, prior to the explosion of data and cloud adoption, but in today's world it has led to significant problems.

Centralised and monolithic "Big government" style ownership.

Unfortunately, while this centralised model can work for organisations with a small number of customer and consumer types, it fails for companies with many different types of customers and many sources of data. The more data becomes available everywhere, the harder it becomes to control all of it in one place. This is especially true for customer data: there are ever more sources of customer information, both inside and outside the organisation, and trying to store it all in one place limits our ability to use those diverse sources.


The Titanic Effect - Inability to move quickly

Organisations also need to experiment quickly, fail fast and learn from previous mistakes, which means there are ever more ways the platform's data can be used. This in turn means more transformations of the data (aggregates, projections and slices) are needed to satisfy organisations' needs. However, the long response time to satisfy data consumers has been a point of friction in the past and remains so in well-established data platform architectures such as data lakes and data warehouses.

Ironically, Siloed Ownership and Frustrated Users

Siloing data engineers away from the operational units is not sustainable. The platform's hyper-specialised teams have little understanding of their source domains, yet must serve a diverse set of needs, whether analytical or business-intelligence related. Without clear guidance on where to find domain experts, or on who grants access to consuming applications built on big data tooling like Spark, this separation only leads to suboptimal outcomes due to a lack of alignment across functions, both internally and externally.

Centralising data engineering creates disconnected source teams, consumers frustrated by fighting for a spot at the top of the data platform team's backlog, and an overstretched data platform team.


How Do We Evolve Past This?

As I have explained above, centralised, monolithic and domain-agnostic data platforms have generated a lot of learnings over the last decade or so, and from those learnings businesses are now realising how important it is to decentralise and democratise data, making it available everywhere and interconnected.

This is called Data Mesh.

Data Mesh emphasises data governance and data sharing across organisational silos. The data mesh approach encourages organisations to build data products that are relevant, meaningful, shareable, and governed by data policy.

A data mesh architecture includes a data hub, data proxies, data services, and a discovery layer. The data hub is the central catalogue for all data products. Data proxies provide access to data from disparate sources. Data services provide APIs for data access and management. The discovery layer aids in the discovery of data products and their underlying data sets. A data mesh provides a flexible, scalable way to manage data across an organisation, enabling it to better utilise its data assets and build better data products.
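To make those four components concrete, here is a minimal, purely illustrative Python sketch. The class names (`DataHub`, `DataProxy`), the addressing scheme and the methods are all hypothetical assumptions, not any real product's API; the point is only to show how a hub, proxies, service APIs and a discovery layer relate.

```python
from dataclasses import dataclass, field

@dataclass
class DataProxy:
    """Wraps access to one disparate source behind a uniform fetch() API."""
    source_name: str

    def fetch(self, query: str) -> list:
        # A real proxy would translate `query` for the underlying store;
        # here we just return a stub result.
        return [f"{self.source_name}: rows matching '{query}'"]

@dataclass
class DataHub:
    """Central catalogue of data products, searched by the discovery layer."""
    products: dict = field(default_factory=dict)  # address -> DataProxy

    def register(self, address: str, proxy: DataProxy) -> None:
        self.products[address] = proxy

    def discover(self, keyword: str) -> list:
        """Discovery layer: find product addresses matching a keyword."""
        return [addr for addr in self.products if keyword in addr]

    def query(self, address: str, q: str) -> list:
        """Data service API: route a query to the owning proxy."""
        return self.products[address].fetch(q)

hub = DataHub()
hub.register("sales/orders/v1", DataProxy("orders-db"))
hub.register("marketing/leads/v1", DataProxy("crm"))
print(hub.discover("sales"))   # ['sales/orders/v1']
print(hub.query("sales/orders/v1", "region = 'AU'"))
```

Note that even in this toy version, consumers never talk to the source systems directly; they discover an address and query through the hub's service API, which is the property that keeps the sources decoupled.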

Wait! Silos ... Isn't this full circle?

Much like some first-love couples who break up and end up together later in life, grown up and matured (hopefully happily ever after), building a data mesh takes the learnings of time and applies these principles:

  • Secure - In this world of decentralised, domain-oriented data products, access control is applied at a finer granularity: for each individual item within the product. It also means some form of global security control must be applied and standardised upon, to ensure threats can't spread across the network.
  • Addressable - Data products should have a unique address that follows a global convention, so users can find and use them easily. Different teams might use different conventions depending on how their data is stored and formatted, but to keep data easy to use, a standard for addressability should be developed. This makes it easier to find and access information in a polyglot environment.
  • Discoverable - A data product must be easily discoverable; fairly a no-brainer here. The main difference from traditional platforms is that previously a single platform collected data and used it for its own purposes; now, each domain provides its data in a way that is easily discoverable.
  • Interoperable - One of the most important things in a distributed data architecture is being able to correlate data between different domains. This lets you see how everything fits together and find insights. To do this, you need to follow certain standards so the data can be harmonised. This includes making sure fields are formatted the same way, identifying words that carry multiple meanings across domains, using the same address conventions, and having common metadata fields.
  • Self Describing - Good products don't need people to help them work: users can find them, understand them, and use them on their own. To make it easy for data engineers and data scientists to use your data, you need well-described semantics and syntax, along with sample datasets. Data schemas are a good way to provide that.
  • Trustworthy - People will not use a product they can't trust. In traditional data platforms it was acceptable to extract and onboard data containing errors; this is where the majority of centralised data pipeline effort is concentrated: cleansing data after ingestion. Instead, domain owners will need to set service level objectives (SLOs) and create some form of data quality indicators to ensure the trustworthiness and truthfulness of the data product.
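The principles above can be sketched as a simple data product descriptor that a domain fills in before publishing. This is a hypothetical illustration, one field per principle, not a real data mesh specification; field names and the validation rule are my own assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    address: str            # Addressable: unique name under a global convention
    owner_domain: str       # the domain accountable for the product
    schema: dict            # Self-describing: field name -> declared type
    tags: list              # Discoverable: keywords for the discovery layer
    access_policy: str      # Secure: who may read, at what granularity
    slo_freshness_hours: int  # Trustworthy: an SLO the domain commits to
    metadata: dict = field(default_factory=dict)  # Interoperable: shared fields

    def validate(self) -> bool:
        """Publishable only if every principle has been filled in."""
        return all([self.address, self.owner_domain, self.schema,
                    self.tags, self.access_policy,
                    self.slo_freshness_hours > 0])

orders = DataProduct(
    address="sales/orders/v1",
    owner_domain="sales",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    tags=["sales", "orders"],
    access_policy="read: analytics-group",
    slo_freshness_hours=24,
    metadata={"currency": "AUD"},
)
print(orders.validate())  # True
```

The design choice worth noting is that trustworthiness and security are declared by the owning domain up front, rather than patched on by a central pipeline after ingestion.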

I don't know if I'm coining the acronym "SADIST" here; feel free to use it if you have a sense of humour.

What's important is that businesses invest in cross-functional skills and teams, implement policies and governance, and adhere to the guiding principles, to avoid going backwards.

Wrapping up, I'll show you what this paradigm shift looks like in the real world in the diagram below. What's key to note is that each domain has its own preferences and toolboxes, but all are interoperable, sharing data in one big cohesive web.

[Diagram: domain-owned data products, each with its own toolbox, interoperating and sharing data across the mesh]

Cloudera, where I work, specialises in bringing this together and enabling our customers to implement modern data architecture principles like Data Mesh.

I suggest you check out our site, or send me a message if you're interested in finding out more.

Thanks for reading. See you in the next one.
