Data requirements and evolving implementations
Shyam Singhal (Ph.D.)
Digital & Agile Transformation | Product Innovation and Development | Practice Head | Delivery Head | Excellence Head | Ex-Microsoft, Hewlett-Packard, Accenture
I have been thinking about this for some time now, wondering what has changed over time to produce so many solutions in the ‘data’ space.
We have had business transaction (operational) data for a long time. To derive insights from that collected data came analytical processing, and to make data friendly to analytical processing came the need to de-normalize it. However, ‘normalization’ and ‘de-normalization’ were technical devices to facilitate data recording and maintenance; they had nothing to do with business or end-user needs.
If I look at the business / end-user needs, they can be summarized as:
a) The transactional / operational data is managed and maintained
b) Insights can be derived from the collected data to improve business value and / or customer experience, i.e., some positive impact on some part(s) of the value chain
c) As far as possible, insights / data management can be self-served
With operations spreading across geographies and moving online came the need for:
d) Distributed data access, and
e) Scalability
As far as ‘Data Integrity’, ‘Security’, and ‘Governance’ are concerned, these attributes have been present from the day data collection started; however, they became more pronounced once online transactions began. These attributes or requirements might not have been stated explicitly, and perhaps still are not; nevertheless, they are an integral part of any data.
Therefore, any data solution today must fulfill points #a-#e, along with the attributes mentioned in the paragraph above. However, segregating transactional / operational data from analytical data is not a business requirement; rather, it is a technical or solution-implementation choice made by the solution architects of the time (or of today).
I understand that #c was not in the original list and was added later (I don’t know exactly when); however, it is an obvious extension of the basic requirements.
Over decades of this journey, businesses and people engaged in data collection and management have seen Data Warehouses, Big Data, Data Lakes, Data Lakehouses, Multi-Modal Data Lakes (thanks to the cloud), Data Fabric, Data Mesh, Data Virtualization…the list goes on, and in future might grow even longer.
The point is, whichever technology / solution you take, the fundamental requirements / needs remain the same (points #a-#e, plus the three attributes mentioned in the paragraph following those points). However, with each subsequent iteration of data’s evolution, in terms of size / volume, types of data, accessibility of data, and representation of data, the solutions have also evolved. It is worth mentioning, though, that no single data solution qualifies as a panacea.
Over time, data creation and collection have increased exponentially, thanks to devices emitting data, web / app analytics needs, GPS, and so on. The fact of the matter is that data is collected to generate insights from it, apart from serving as a system of record or for compliance. If that is true, then how we process data to serve the needs stated above (#a-#e) is simply a set of solution requirements, which have evolved because of the volume and variety of data, further compounded by geographical spread and statutory requirements enforced by various governments and related agencies.
For example, we have had data ingested by devices directly into the cloud, leading to the need for powerful compute resources and storage. Over time this gained an element of edge computing, to ease the load on the cloud and make it pocket-friendly for its users. Heterogeneous data led to the need for a ‘store’ that could capture the data quickly and manage it efficiently. And so on.
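To make the edge-computing point concrete, here is a minimal, purely illustrative Python sketch of a hypothetical edge node that aggregates raw device readings locally and forwards only a compact summary to the cloud; the EdgeAggregator class, batch size, and summary fields are my assumptions, not any particular product’s design.

```python
from statistics import mean

class EdgeAggregator:
    """Hypothetical edge node: buffer raw readings, emit only a summary."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []

    def ingest(self, reading):
        """Buffer a raw reading; return a compact summary once the batch is full."""
        self.buffer.append(reading)
        if len(self.buffer) < self.batch_size:
            return None
        summary = {
            "count": len(self.buffer),
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": round(mean(self.buffer), 2),
        }
        self.buffer.clear()
        return summary  # only this small summary travels to the cloud


edge = EdgeAggregator(batch_size=5)
result = None
for value in [21.0, 21.4, 22.1, 20.9, 21.7]:
    result = edge.ingest(value)
print(result)  # {'count': 5, 'min': 20.9, 'max': 22.1, 'mean': 21.42}
```

The point of the sketch is only that the heavy lifting happens before the data ever leaves the device, which is what eases the load on central compute and storage.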
Over time, we have found that easy and secure access to raw or processed data, directly for end users or for those generating insights, is becoming the norm, or at least makes more sense, as any intermediary results in a loss of agility. It has also meant that data should not be replicated, to avoid silos and data-integrity issues; though, again, the latter is a technical requirement, not a business requirement.
As is evident, these complications arise from a solutioning perspective, while the fundamental requirements have not changed (points #a-#e). Therefore, discussions about whether we go for Virtualization, Data Fabric, Data Mesh, Data Lake…are something an end user should not have to bother about. Whether it is one technique or some combination of available techniques and technologies that makes sense from a solutioning perspective, why should it matter to me as an end user? As an end user I am least bothered about how my requirements are fulfilled (provided they are fulfilled ethically).
The problem arises when solution experts try to hide behind buzzwords / jargon to fulfill their own objectives. I believe that is where the ‘Ethical’ part comes into play. Now, without getting sucked into ethics and its implications, let’s briefly see what each of these technology trends means to us.
Broadly, there are three parts to any data journey:
1) Creation / Ingestion or Accessibility of Data,
2) Data Management, or Governance, and
3) Data Consumption / Exposure
Therefore, a technology / tool / framework could address one part or all parts of this journey; nonetheless, all of them revolve around these three aspects of the data journey.
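As a purely illustrative sketch (assuming nothing about any specific framework), the three parts of the journey could be pictured as independent interfaces that a given tool may implement in part or in full; the interface and method names below are hypothetical.

```python
from abc import ABC, abstractmethod

# 1) Creation / Ingestion or Accessibility of Data
class Ingestion(ABC):
    @abstractmethod
    def ingest(self, records):
        """Accept new records from devices, apps, files, etc."""

# 2) Data Management, or Governance
class Governance(ABC):
    @abstractmethod
    def is_allowed(self, user, dataset, action):
        """Decide whether a user may perform an action on a dataset."""

# 3) Data Consumption / Exposure
class Consumption(ABC):
    @abstractmethod
    def query(self, dataset, filters):
        """Expose data (raw or processed) to end users and tools."""
```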
A Data Fabric based solution would need to define how multiple (possibly disparate) data stores are integrated, such that they can fulfill the data requirements in an integrated manner, transparently. Whether, behind the scenes, it runs ETL processes, enforces policies, or creates batch jobs to provide the required data view / access, all of that remains transparent to the end user.
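A minimal sketch of that transparency, using a toy ‘fabric’ facade of my own naming (DataFabric, SqlStore, ObjectStore) rather than any vendor’s actual API: the consumer asks for a dataset by name and never learns which underlying store, ETL process, or batch job serves it.

```python
class SqlStore:
    def query(self, dataset, filters):
        return [{"source": "warehouse", "dataset": dataset, **filters}]

class ObjectStore:
    def query(self, dataset, filters):
        return [{"source": "data_lake", "dataset": dataset, **filters}]

class DataFabric:
    """Routes a request to whichever registered store owns the dataset."""

    def __init__(self):
        self._routes = {}  # dataset name -> backing store

    def register(self, dataset, store):
        self._routes[dataset] = store

    def query(self, dataset, **filters):
        # The end user never sees which store (or ETL / batch job) answers.
        return self._routes[dataset].query(dataset, filters)


fabric = DataFabric()
fabric.register("orders", SqlStore())
fabric.register("clickstream", ObjectStore())
print(fabric.query("orders", region="EU"))             # served by the warehouse
print(fabric.query("clickstream", day="2023-01-01"))   # served by the lake
```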
Similarly, it may enforce policies for data access, rendering, and compliance with statutory requirements such as GDPR, HIPAA, and FCRA; the point is that the data is secured at rest and in motion, and is accessible only to authorized users as per their assigned privileges. Of course, it would also involve policies around retention, archival, deletion, and so on; once again, those are transparent to the end user.
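For illustration only, here is a simplistic policy check of the kind such a layer might enforce behind the scenes; the role names, dataset names, and compliance tags are assumptions made for the sketch.

```python
# Hypothetical per-dataset policies: allowed roles plus compliance tags.
POLICIES = {
    "customer_pii":  {"allowed_roles": {"data_steward"}, "tags": {"GDPR"}},
    "sales_summary": {"allowed_roles": {"analyst", "data_steward"}, "tags": set()},
}

def can_access(role, dataset):
    """Return True only if the role carries the privilege for this dataset."""
    policy = POLICIES.get(dataset)
    return policy is not None and role in policy["allowed_roles"]

assert can_access("analyst", "sales_summary") is True
assert can_access("analyst", "customer_pii") is False   # PII stays restricted
```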
Lastly, the data needs to be consumed to generate insights. Perhaps these are provisioned based on roles and granularity, taking into account support for the underlying technology and the extensibility of the interface. For example, it may offer canned views, insights, and catalogs per role, such as Business Analyst, Developer, or Data Scientist, and support Low / No Code technologies.
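A hedged sketch of that role-based provisioning: each role sees only its canned views or catalog entries. The roles and view names below are invented for illustration.

```python
# Hypothetical role -> catalog mapping; real catalogs would be far richer.
CATALOG = {
    "business_analyst": ["monthly_revenue_view", "churn_dashboard"],
    "developer":        ["raw_events", "api_usage_view"],
    "data_scientist":   ["feature_store", "raw_events"],
}

def views_for(role):
    """Return the canned views a given role is entitled to see."""
    return CATALOG.get(role, [])

print(views_for("business_analyst"))  # ['monthly_revenue_view', 'churn_dashboard']
```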
As is evident, it addresses all three parts of the ‘Data Journey’; however, it is not very prescriptive about how each of those is managed. ISVs can operate in one or all three parts and may provide either a partial or a complete solution through the suggested framework.
The problem with earlier solution approaches, and even with this approach, is that we depend on technologists to fulfill business requirements. Invariably those technologists are not domain experts and lack the functional knowledge to understand the workings, challenges, dependencies, risks, and problems that the business is trying to address.
Now, the obvious alternative to this approach would be to attach accountability for data to its origin / function / domain, so that each domain serves its customers by anticipating and gathering their needs and providing ready-to-consume data (‘Data as a Product’ in Data Mesh parlance) through consolidation, summarization, augmentation, and so on. In a nutshell, they ‘process’ the data for easy consumption by various users. Therefore, there seems to be a shift in data ownership, and its processing, from ‘Technologists’ to its ‘Owners’.
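As a rough illustration of ‘Data as a Product’ (the field names are my assumptions, not a standard Data Mesh schema), a domain team might publish something like the following descriptor alongside the data itself, making ownership, schema, and refresh expectations explicit to its consumers.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes with its data product."""
    name: str            # e.g. "orders.daily_summary"
    owner_domain: str    # the accountable business domain
    schema: dict         # column name -> type, agreed with consumers
    refresh_sla: str     # e.g. "daily by 06:00 UTC"
    endpoints: list = field(default_factory=list)  # how consumers pull it


orders_summary = DataProduct(
    name="orders.daily_summary",
    owner_domain="Sales",
    schema={"order_date": "date", "region": "string", "revenue": "decimal"},
    refresh_sla="daily by 06:00 UTC",
    endpoints=["s3://sales-products/daily_summary/", "sales_api/v1/summary"],
)
print(orders_summary.owner_domain)  # accountability sits with the domain, not IT
```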
Interestingly, this is the approach that ‘Data Mesh’ suggests. I believe it is a good one: even if the actual processing is still done by technologists (for the time being, until some low / no code solution enables functional owners to do it themselves), it is done in consultation and collaboration with the functional experts. And, over time, even those technologists could gain that functional expertise / knowledge.
Segregation of data by domain has its own challenges, chiefly the risk of creating silos; however, it is assumed that domains will collaborate and work seamlessly through internal integrations and/or automations. That is a big assumption to fulfill, given human psychology, but it is required for the approach to work efficiently. Then again, even ‘Data Fabric’ works on the principle of distributed databases with seamless integration, which can easily be adapted for Data Mesh.
The ‘Self-Serve Infrastructure as a Platform’ and ‘Federated Governance’ pillars of ‘Data Mesh’ cater to point #3 and (in part) point #2 of the ‘Data Journey’ detailed above. Therefore, the real change ‘Data Mesh’ proposes lies in ‘Data Ownership’ and ‘Data Formulation and Packaging’. But then, ‘Data Fabric’ did not prescribe a centralized data store either. Both approaches require transparent and seamless integration of distributed and potentially disparate databases to serve the needs of their customers.
The notion of a ‘Single Source of Truth’ has also been challenged, thanks to the affordable, high-performance compute power of the modern cloud. Now the focus is to replicate / localize and refresh data as per system requirements, breaking away from the shackles of a centralized data store and its associated constraints.
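A small, assumption-laden sketch of that replicate / localize-and-refresh idea: each regional replica refreshes on its own schedule rather than every consumer hitting one central store. The region names and intervals below are invented for illustration.

```python
# Hypothetical per-region replica configuration for a single source dataset.
REPLICA_CONFIG = {
    "eu-west":  {"source": "orders", "refresh_interval_min": 15},
    "us-east":  {"source": "orders", "refresh_interval_min": 5},
    "ap-south": {"source": "orders", "refresh_interval_min": 60},
}

def due_for_refresh(region, minutes_since_last):
    """Each replica refreshes on its own schedule, per local system needs."""
    return minutes_since_last >= REPLICA_CONFIG[region]["refresh_interval_min"]

print(due_for_refresh("us-east", 7))    # True  -> refresh this replica now
print(due_for_refresh("ap-south", 7))   # False -> local copy is fresh enough
```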
Shyam, you took me through the evolution of data technologies from the Data Warehouse to Data Mesh and the many other terms introduced by various firms. But the moot question remains: are they addressing the user requirements effectively?