Data requirements and evolving implementations

I have been thinking about this for some time now, wondering what has changed over time to produce so many solutions in the “data” space.

We have had business transaction (operational) data from the very beginning. Then came analytical processing, to derive insights from the collected data, and with it the need to de-normalize data to make it friendly for analytical processing. However, “Normalization” and “De-normalization” were technical measures to facilitate recording and maintaining data. They had nothing to do with business / end-user needs.

If I look at the business / end-user needs, they can be summarized as follows:

a) The transactional / operational data is managed and maintained,

b) Insights can be derived from the collected data to improve business value and / or customer experience, i.e., to have a positive impact on some part(s) of the value chain, and

c) Insights / data management can be self-served as far as possible.

With operations spreading across geographies and going online came the need for:

d) Distributed data access, and

e) Scalability.

As far as ‘Data Integrity’, ‘Security’, and ‘Governance’ are concerned, these attributes have been present since the day data collection started; however, they became more pronounced once online transactions began. These attributes or requirements might not have been stated explicitly, and perhaps still are not; nevertheless, they are an integral part of any data.

Therefore, any data solution today must fulfill points #a-#e, along with the attributes mentioned in the paragraph above. Segregating transactional / operational data from analytical data, however, is not a business requirement. Rather, it is a technical or solution-implementation requirement, as envisaged by solution architects, both then and now.

I understand that #c was not in the original list and was added later (I don’t know when); however, it is an obvious extension of the basic requirements.

Over decades, businesses and people engaged in data collection and management have seen Data Warehouses, Big Data, Data Lakes, Data Lakehouses, Multi-Modal Data Lakes (thanks to cloud), Data Fabric, Data Mesh, Data Virtualization… the list goes on, and may grow even longer in future.

The point is, whichever technology / solution you take, the fundamental requirements / needs remain the same (points #a-#e, plus the three attributes mentioned in the paragraph following those points). What has changed is the data itself: with each iteration in its size / volume, types, accessibility, and representation, the solutions have evolved too. It is worth mentioning, though, that no single data solution qualifies as a panacea.

Over time, data creation and collection have increased exponentially, thanks to devices emitting data, web / app analytics needs, GPS, etc. The fact of the matter is that data is collected to generate insights, apart from serving as a system of record or for compliance. If that is true, then how we process data to serve the needs stated above (#a-#e) is simply a matter of solution requirements. Those requirements have evolved with the volume and variety of data, further compounded by geographical spread and by statutory requirements enforced by various governments and related agencies.

For example, devices ingesting data directly into the cloud created the need for powerful compute resources and storage. Over time, edge computing was added to ease that load on the cloud and make it more affordable for its users. Heterogeneous data, in turn, led to the need for a ‘store’ that could capture data quickly and manage it efficiently. And so on.

Over time, we found that giving end users, or those involved in generating insights, easy and secure direct access to raw or processed data is becoming the norm, or at least makes more sense, since any intermediary solution results in a loss of agility. It also meant that data should not be replicated, to avoid silos and data integrity issues. Again, though, the latter is a technical requirement, not a business one.

Evidently, these complications arise from a solutioning perspective, while the fundamental requirements (points #a-#e) have not changed. Therefore, whether we go for Virtualization, Data Fabric, Data Mesh, Data Lake… is something an end user need not bother about. Whether one technique or technology, or some combination of the available ones, makes sense from a solutioning perspective, why should it matter to me as an end user? As an end user, I am hardly bothered about how my requirements are fulfilled (as long as they are fulfilled ethically).

The problem arises when solution experts hide behind buzzwords / jargon to fulfill their own objectives. I believe that is where the ‘Ethical’ part comes into play. Now, without getting drawn into ethics and its implications, let’s briefly see what each of these technology trends means to us.

Broadly, there are three parts to any data journey:

1) Creation / Ingestion or Accessibility of Data,

2) Data Management, or Governance, and

3) Data Consumption / Exposure.

A technology / tool / framework may therefore address one part or all parts of this journey. Either way, it revolves around these three aspects of the data journey.

A Data Fabric based solution would need to define how multiple (possibly disparate) data stores are integrated, so that together they fulfill the data requirements transparently. Behind the scenes it may run certain ETL processes, enforce certain policies, or create certain batch jobs to provide the required data view / access. Whichever it does, all of that is transparent to the end user.
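To make the idea of a transparent, integrated view concrete, here is a minimal Python sketch. It assumes just two hypothetical sources: an in-memory SQLite database standing in for an operational store, and a CSV string standing in for a file in a data lake. The `DataFabric` class and its `customer_spend` view are illustrative names only, not any vendor’s API. The point is that the caller asks for a logical view and never sees the join or where each piece lives.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: an operational store (in-memory SQLite stands in for an RDBMS).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 75.5), (1, 30.0)])

# Hypothetical source 2: a file-based store (CSV text stands in for a data lake object).
CUSTOMERS_CSV = "customer_id,name,region\n1,Asha,APAC\n2,Ben,EMEA\n"


class DataFabric:
    """Hypothetical facade: callers ask for a logical view, not a physical store."""

    def __init__(self, db: sqlite3.Connection, customers_csv: str):
        self._db = db
        self._customers_csv = customers_csv

    def _customers(self) -> dict[int, dict]:
        # Read customer master data from the file-based source.
        reader = csv.DictReader(io.StringIO(self._customers_csv))
        return {int(row["customer_id"]): row for row in reader}

    def _order_totals(self) -> dict[int, float]:
        # Aggregate in the operational source, close to where the data lives.
        rows = self._db.execute(
            "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
        )
        return {cid: total for cid, total in rows}

    def customer_spend(self) -> list[dict]:
        """Logical view joining both sources; the caller never sees the join."""
        totals = self._order_totals()
        return [
            {"name": c["name"], "region": c["region"], "total_spend": totals.get(cid, 0.0)}
            for cid, c in sorted(self._customers().items())
        ]


fabric = DataFabric(orders_db, CUSTOMERS_CSV)
for row in fabric.customer_spend():
    print(row)
```

In practice such a facade would more likely sit on a virtualization or federation engine. A hand-written join is used here only to keep the sketch self-contained.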

Similarly, it may enforce policies for data access, rendering, and compliance with statutory requirements such as GDPR, HIPAA, FCRA, etc. The fact of the matter is that the data is secured at rest and in motion, and is accessible only to authorized users as per their assigned privileges. Of course, it would involve policies around retention, archival, deletion, etc.; once again, however, those are transparent to the end user.
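An access policy of this kind can be sketched as a mapping from roles to privileges that masks anything a role is not entitled to see. The roles, field names, and the `apply_access_policy` function below are hypothetical. Masking alone is of course not a GDPR, HIPAA, or FCRA compliance implementation; the sketch only illustrates that the rule is enforced by the platform rather than by the end user.

```python
from dataclasses import dataclass

# Hypothetical privilege model: each role lists the fields it may see unmasked.
ROLE_PRIVILEGES: dict[str, set[str]] = {
    "business_analyst": {"region", "total_spend"},
    "data_steward": {"name", "email", "region", "total_spend"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def apply_access_policy(user: User, record: dict) -> dict:
    """Return a copy of `record` with fields outside the user's privileges masked."""
    allowed = ROLE_PRIVILEGES.get(user.role)
    if allowed is None:
        # Unknown roles get no access at all rather than a fully masked record.
        raise PermissionError(f"role {user.role!r} has no data access")
    return {k: (v if k in allowed else "***") for k, v in record.items()}


record = {"name": "Asha", "email": "asha@example.com", "region": "APAC", "total_spend": 150.0}
print(apply_access_policy(User("Ravi", "business_analyst"), record))
```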

Lastly, the data needs to be consumed to generate insights. Perhaps consumption is provisioned based on roles and granularity, and also takes into account support for the underlying technology and the extensibility of that interface. For example, it may provide canned views, insights, and catalogs per role, such as Business Analyst, Developer, Scientist, etc., and support Low / No Code technologies.

Evidently, it addresses all three parts of the ‘Data Journey’; however, it is not very prescriptive about how each of them is managed. ISVs can operate in one or all three parts and may provide either a partial or a complete solution through the suggested framework.

The problem with earlier solution approaches, and even with this one, is that we depend on technologists to fulfill business requirements. Invariably, those technologists are not domain experts and lack the functional knowledge to understand the workings, challenges, dependencies, risks, and problems that a business is trying to address.

Now, the obvious alternative would be to attach accountability for data to its origin / function / domain. The domain would then serve its customers by anticipating and gathering their needs, and by providing ready-to-consume data (‘Data as a Product’ in Data Mesh parlance) through consolidation, summarization, augmentation, etc. In a nutshell, domains ‘process’ the data for easy consumption by various users. Hence, there seems to be a shift in data ownership and processing from ‘Technologists’ to its ‘Owners’.
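As a rough illustration of ‘Data as a Product’, the sketch below has a hypothetical sales domain consolidate its raw orders into a per-region summary. It then publishes the summary together with an owner, a description, and a schema. The `DataProduct` descriptor and its fields are assumptions made for illustration, not a structure prescribed by Data Mesh.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned, ready-to-consume dataset."""

    name: str
    owner_domain: str  # accountability sits with the domain, not a central team
    description: str
    schema: dict[str, str]
    rows: list[dict] = field(default_factory=list)


def build_sales_summary(raw_orders: list[dict]) -> DataProduct:
    """The (hypothetical) sales domain consolidates and summarizes its raw orders per region."""
    by_region: dict[str, list[float]] = {}
    for order in raw_orders:
        by_region.setdefault(order["region"], []).append(order["amount"])
    rows = [
        {"region": region, "order_count": len(amounts), "avg_amount": round(mean(amounts), 2)}
        for region, amounts in sorted(by_region.items())
    ]
    return DataProduct(
        name="sales_by_region",
        owner_domain="sales",
        description="Order count and average order value per region.",
        schema={"region": "str", "order_count": "int", "avg_amount": "float"},
        rows=rows,
    )


raw = [
    {"region": "APAC", "amount": 120.0},
    {"region": "APAC", "amount": 30.0},
    {"region": "EMEA", "amount": 75.5},
]
product = build_sales_summary(raw)
print(product.owner_domain, product.rows)
```

Consumers read `product.rows` along with its published schema and never touch the raw orders. That separation is the packaging the paragraph above describes.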

Interestingly, this is the approach that ‘Data Mesh’ suggests. I believe it is a good approach. Even if technologists still do the actual processing (for the time being, until some low / no code solution enables functional owners to do it), that processing is now done in consultation / collaboration with the functional experts. And, over time, those technologists could gain that functional expertise / knowledge too.

Segregating data by domain does have its own challenges, notably the risk of creating silos; however, it is assumed that the domains would collaborate and work seamlessly through internal integrations and/or automations. That is a big assumption, given human psychology, but a necessary one for the approach to work efficiently. Then again, even ‘Data Fabric’ works on the principle of distributed databases with seamless integration, which can easily be adopted for Data Mesh.

The ‘Self-Serve Infrastructure as a Platform’ and ‘Federated Governance’ principles of ‘Data Mesh’ cater to point #3 and, in part, point #2 of the ‘Data Journey’ detailed above. Therefore, the real change that ‘Data Mesh’ proposes lies in ‘Data Ownership’ and ‘Data Formulation and Packaging’. But then, ‘Data Fabric’ did not prescribe a centralized data store either. Both approaches require transparent and seamless integration of distributed, potentially disparate, databases to serve the needs of their customers.

The notion of a ‘Single Source of Truth’ has also been challenged, thanks to the affordable, high-performance compute of the modern cloud. Now the focus is to replicate / localize data and refresh it as system requirements dictate, breaking away from the shackles of a centralized data store and its associated constraints.
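The ‘replicate / localize and refresh’ pattern can be sketched as a local copy that is re-fetched from its source only once it is older than an agreed age. `LocalReplica`, its `max_age_s` parameter, and the injectable clock are hypothetical names chosen for this sketch. A real system would more likely rely on change data capture or a managed replication service.

```python
import time
from typing import Callable, Optional


class LocalReplica:
    """Hypothetical local copy of a remote dataset, re-fetched once older than max_age_s."""

    def __init__(
        self,
        fetch: Callable[[], list],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch  # pulls a fresh copy from the source of record
        self._max_age_s = max_age_s  # staleness the consuming system can tolerate
        self._clock = clock  # injectable so the refresh logic can be tested
        self._data: Optional[list] = None
        self._loaded_at = 0.0
        self.refresh_count = 0

    def read(self) -> list:
        """Serve the local copy, refreshing it first if it is missing or stale."""
        now = self._clock()
        if self._data is None or now - self._loaded_at > self._max_age_s:
            self._data = self._fetch()
            self._loaded_at = now
            self.refresh_count += 1
        return self._data


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
replica = LocalReplica(fetch=lambda: [{"region": "APAC"}], max_age_s=60, clock=lambda: fake_now[0])
replica.read()
replica.read()  # still fresh: served locally, no fetch
fake_now[0] = 61.0
replica.read()  # older than 60 s: re-fetched from the source
print(replica.refresh_count)  # 2
```

A time-based threshold is the simplest refresh policy. Event-driven refresh gives fresher data at the cost of more moving parts.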

Personally, I’m not interested in the debate over Data Fabric vs. Data Mesh, or any other technical implementation, as long as my fundamental requirements (#a-#e) are served effectively and efficiently.


By Shyam Singhal (Ph.D.)