How Enterprise Data Observability Improves Data Asset Reliability
Property Diagram - Copyright Willem Koenders, ZS Associates. Additional details tying in EDO added by Ramon Chen CPO, Acceldata


A few months ago, Willem Koenders of ZS Associates published a post on Medium titled "The best way to explain data governance to beginners". It offered an intriguing analogy relating various elements of data management and data governance to a physical building, its assets, and its operations. The post included a beautifully drawn and described visual (shown below) that maps each data element to its property and building counterpart.

Beautifully drawn diagram of the Property analogy from Willem Koenders Medium article here

As the Chief Product Officer of Acceldata, I'm on a mission to call attention to a groundbreaking new category of technology called Enterprise Data Observability (EDO), of which Acceldata is a pioneer.

In the context of this analogy, EDO serves as an overseeing layer that enhances each aspect of data management. I would describe it as:

  • Real-time Property Reliability
  • Wiring & Plumbing Monitoring
  • Anomaly Detection, including changes in structure and typical activities
  • Operating Cost Optimization
  • Chargeback to individual tenants
  • Real-time Rules-based Alerting
  • Proactive Response

As such, I added some scaffolding to Willem's original diagram to reflect the capabilities of EDO in the context of the analogy.

Property Diagram - Copyright Willem Koenders, ZS Associates. Additional details tying in EDO added by Ramon Chen CPO, Acceldata

Enterprise Data Observability is akin to a state-of-the-art building management system for your data assets. In the sections below, I renovate (pun intended) and sketch out how EDO supports and improves each area of data management and governance so eloquently described by Willem. Each section below is borrowed from Willem's article unchanged. I've added an EDO: section to each to highlight the added capabilities that make that element better.

Data Asset: At the heart of the analogy lies the data asset, which corresponds with the building or property in real estate management. The data asset can also be perceived as a data product or a dataset. Both data and real estate management revolve around managing assets that generate value when governed and nurtured adequately, but that lead to risks and losses when mismanaged.

  • EDO: Provides Real-time Property Reliability, ensuring that the data asset remains a valuable and reliable resource that operates as it should and serves the needs of its tenants, like a proactively maintained building.

Data (Product) Ownership: A critical concept in data management is ownership — responsibilities may be delegated to others, but at the end of the day, one person or team should be the owner of the data. The same is true for a building, where this would be the property owner or landlord.

  • EDO: Through Alerting and Proactive Response, ensures that data owners are immediately notified of any issues, similar to a property owner being alerted about maintenance needs. The key difference is that, unlike traditional data quality alerts and fixes, EDO operates in real time, at the source, before a minor issue can escalate into far more expensive and potentially serious damage to the building.

Data Steward: Data stewardship involves assigning responsibility for the management of data assets to specific individuals or teams, for example, to ensure that data is of sufficient quality. In real estate management, data stewardship can be compared to the role of property managers who are responsible for the upkeep and maintenance of a property.

  • EDO: Aids in Operating Cost Optimization by providing data stewards with insights into data usage and quality, similar to how property managers would benefit from real-time monitoring of building systems. After all, the cost of operating the building impacts profitability. A tap left running, lights not turned off, and heating and cooling that are not optimized all lead to significant waste. The same is true in data and analytics, where resources and spend are wasted on poorly written, long-running, and even duplicative SQL queries, and on Spark instances that are never shut down.

Data Consumers / Users: Various individuals and business processes may consume the data, internal and external to the organization. This can be compared to the tenants that use the building for their respective purposes.

  • EDO: Data Reliability and Anomaly Detection ensure that data consumers are not affected by irregularities in the data, much like tenants would want to be assured of their safety in a building. They rely on the data to execute their business goals. The chargeback capabilities within Operating Cost Optimization mean that any utilities consumed by users are reported back to them and associated with their budget, which encourages good practices and sustainable use.
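To make the chargeback idea concrete, here is a minimal sketch of how utilities consumed by each tenant could be rolled up into a cost per team. The function name `chargeback`, the `(team, compute_hours)` record shape, and the flat hourly rate are all assumptions for illustration; they do not reflect how Acceldata or any specific EDO product meters usage.

```python
from collections import defaultdict


def chargeback(usage_records, rate_per_compute_hour):
    """Sum compute hours per team and convert the total to a cost.

    usage_records is a hypothetical list of (team, compute_hours) tuples.
    """
    hours_by_team = defaultdict(float)
    for team, compute_hours in usage_records:
        hours_by_team[team] += compute_hours
    # Round to cents so the report is readable.
    return {
        team: round(hours * rate_per_compute_hour, 2)
        for team, hours in hours_by_team.items()
    }


usage = [("marketing", 12.5), ("finance", 3.0), ("marketing", 7.5)]
print(chargeback(usage, rate_per_compute_hour=2.0))
# {'marketing': 40.0, 'finance': 6.0}
```

In the building analogy, this is the utility bill each tenant receives at the end of the month.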

Data Monetization: Data monetization involves leveraging data assets to generate revenue, for example, by selling data to other organizations. In real estate management, this would be equivalent to finding ways to generate income from a property, such as renting space out to tenants or for an event, selling advertising space, or selling it altogether.

  • EDO: Supports Data Monetization by ensuring the reliability and quality of the data being sold, similar to how a well-maintained and monitored building is more attractive to potential tenants or buyers. With many enterprises looking to monetize their data and to create "Data Products", EDO becomes even more critical. Some of the largest commercial Data Providers in the world rely on Acceldata EDO for the Data Assets they sell and supply to their customers.

Data Contract: A data contract is a formal agreement between a data producer and data consumer, confirming what data is to be exchanged and the corresponding formatting and quality requirements. This can be compared to a lease agreement, in which it is described what is expected of the landlord and in what state the property will be made available. It also outlines what the property can be used for (and specifically, what cannot be done to or with it). The data contract can be used for similar purposes.

  • EDO: Through Real-time Property Reliability and Alerting, ensures that both parties in a data contract can trust the data being exchanged, much like a secure and well-maintained property fosters trust between landlord and tenant. EDO can also be set up to ensure compliance and adherence by alerting the parties immediately if there is any deviation from the rules set forth in the contract.
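As a rough illustration of alerting on deviations from a data contract, the sketch below checks a batch of records against a few rules and returns the violations. The contract format (an expected Python type and a maximum null fraction per column) and the function name `check_contract` are hypothetical, chosen only for this example; real data contracts and EDO rule engines are considerably richer.

```python
def check_contract(rows, contract):
    """Return a list of human-readable contract violations for a batch of rows."""
    violations = []
    for column, rules in contract.items():
        values = [row.get(column) for row in rows]

        # Rule 1: the share of missing values must stay within the agreed limit.
        nulls = sum(v is None for v in values)
        null_fraction = nulls / len(values) if values else 0.0
        if null_fraction > rules["max_null_fraction"]:
            violations.append(
                f"{column}: null fraction {null_fraction:.2f} "
                f"exceeds {rules['max_null_fraction']:.2f}"
            )

        # Rule 2: every present value must have the agreed type.
        wrong_type = [
            v for v in values
            if v is not None and not isinstance(v, rules["type"])
        ]
        if wrong_type:
            violations.append(
                f"{column}: {len(wrong_type)} value(s) are not {rules['type'].__name__}"
            )
    return violations


contract = {
    "id": {"type": int, "max_null_fraction": 0.0},
    "email": {"type": str, "max_null_fraction": 0.5},
}
rows = [{"id": 1, "email": "a@example.com"}, {"id": "2", "email": None}]
for violation in check_contract(rows, contract):
    print("ALERT:", violation)
# ALERT: id: 1 value(s) are not int
```

An empty result means the batch honors the lease; anything else is a reason to notify both landlord and tenant.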

Value Quantification: In both cases, it is a worthwhile exercise to estimate the value associated with the asset. Just as the value of a property depends on its location, size and condition, the value of data depends on its relevance, accuracy and accessibility.

  • EDO: Provides metrics that assist in the accurate Value Quantification of data assets, similar to how property valuation would include assessments of building condition and location. The operating cost optimization capabilities also ensure that effort and investment stay within budget and in proportion to the outcome. This is particularly critical in the age of #genai, where the processing and effort required to run Large Language Models (#LLMs) aren't yet fully known.

Data Security and Access Controls: Data security refers to the protection of data assets from unauthorized access, use or disclosure. In real estate management, data security can be compared to the use of locks, alarms and security systems to protect a property from theft or vandalism.

  • EDO: With its capabilities for Alerting and Proactive Response, an EDO can be configured to detect unusual changes in behavior, or "drifts", in the data or schema. While it's not directly tied into cyber security or hacking at this time, it can immediately notify of anomalies that could signal wider problems, including a potential insider threat in which data is manipulated to falsify critical reports for malicious purposes.

Data Architecture: This can be compared to the blueprint of a property, which defines the layout, design and construction of the building. Similarly, data architecture involves the design and structure of data storage and retrieval systems. Architecture standards can provide guidelines and best practices for how buildings are constructed, and data architecture standards do the same for data assets.

  • EDO: Wiring & Plumbing Monitoring, also known as data pipeline observability, ensures that the data architecture and data flow are robust and functioning as designed. Schema drift is akin to an unplanned change to a building's blueprint, and any interruption in data flow through pipelines is akin to a blockage in a building's plumbing or an interruption in electrical service to critical appliances.
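Schema drift is one of the easier ideas here to sketch in code. The example below compares an expected schema against an observed one and reports columns that were added, removed, or changed type. Representing a schema as a plain dictionary of column name to type name is an assumption made for brevity, and `detect_schema_drift` is a hypothetical helper rather than any product's API.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type_name} mappings and report the differences."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


blueprint = {"order_id": "bigint", "amount": "decimal", "placed_at": "timestamp"}
as_built = {"order_id": "string", "amount": "decimal", "channel": "string"}
print(detect_schema_drift(blueprint, as_built))
# {'added': ['channel'], 'removed': ['placed_at'], 'retyped': ['order_id']}
```

Any non-empty list in the result is the data equivalent of finding a wall that is not on the blueprint.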

Data Domains: Just as a city is divided into neighborhoods, data can be divided into domains based on its subject matter. Any property belongs to a single neighborhood, and together, all neighborhoods include all properties — the same holds for data assets and domains. Each neighborhood has its own characteristics, such as demographics and property values, and similarly, each data domain has its own attributes and requirements. An organization like a Homeowners Association (equivalent to data domain owners or stewards) can be chartered to oversee that these requirements are implemented.

  • EDO: Through Real-time Property Reliability and Anomaly Detection, ensures that each data domain maintains its integrity and quality, similar to how a Homeowners Association would oversee the well-being of a neighborhood.

Data Policies & Standards and Regulatory Compliance: This can be compared to the different regulations that govern the use and development of properties, such as zoning laws, environmental regulations, and building and fire codes. Similarly, data policies and standards define the rules for managing data in an organization, which are derived from applicable regulations such as those related to data privacy and data protection.

  • EDO: Helps ensure Regulatory Compliance by continuously monitoring data against established policies and standards. Regular inspections ensure a building is up to code at a point in time; EDO does this continuously and in real time, so compliance violations can be dealt with as soon as they occur.

Metadata Management: Metadata is data about the data — it can describe the data asset in terms of the data attributes it contains, who owns it, who has access, who did access it and when, its location, how many records there are, and the size of the total asset. It can be compared to detailed information about a property and its features, for example, the total square and cubic footage, the owner, the number of rooms, its location and who has keys to the building.

  • EDO: Capabilities in Wiring & Plumbing Monitoring can be extended to Metadata Management, ensuring that metadata is accurate and up to date, similar to how a property's records would be meticulously maintained. Identifying and alerting on potential schema drift is one such example.

Data Quality: Data quality refers to the fitness-for-purpose of data as measured along dimensions like accuracy, completeness and consistency. In real estate management, data quality can be compared to the condition and upkeep of a property, such as whether it has any defects or safety hazards.

  • EDO: Through Anomaly Detection and Proactive Response, EDO can elevate Data Quality by addressing potential issues much earlier in the flow of data. Unlike periodic inspections, which check that a property is in good condition and fix flaws when they are found, EDO operates continuously, detecting not just malformed data but also predicting possible issues before they occur and cause significant downstream impact. For example, EDO could observe a crack in a pipe and alert to a potential burst, while traditional Data Quality might only fix the pipe once the leak has started and been detected.
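One simple way to picture this kind of anomaly detection is a statistical check on a pipeline metric such as daily row count. The sketch below flags a new value that sits more than a chosen number of standard deviations away from recent history (a z-score test). This is a deliberately basic technique picked for illustration; the threshold of 3.0 and the function name `is_anomalous` are assumptions, and commercial EDO tools typically use more sophisticated models.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # Not enough history to establish a baseline.
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


daily_row_counts = [1000, 1010, 990, 1005, 995]
print(is_anomalous(daily_row_counts, 1003))  # False: within normal variation
print(is_anomalous(daily_row_counts, 400))   # True: a sudden drop worth an alert
```

The sudden drop to 400 rows is the crack in the pipe: nothing downstream has failed yet, but it is worth an alert before the reports built on that data go wrong.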

Data Remediation: Data remediation refers to the process of identifying and correcting data quality issues. In real estate management, data remediation can be compared to the process of identifying and correcting property defects, such as a leaky roof or a faulty foundation, to maintain the value and safety of a property.

  • EDO: Supports Data Remediation by quickly identifying issues through Anomaly Detection. In practice, EDO becomes the "canary in the coal mine", predicting and alerting to issues that may cause significant Data Quality problems. Many Data Remediation efforts could be reduced if an EDO provides early detection and prevents bad data from entering and polluting the pipelines and downstream systems that rely on the information.

Data Usage: This can be compared to the measurement of the usage of properties, which helps in determining their potential value. This includes occupancy rates but perhaps even more detailed logs of who entered the building, when, and for how long. Similarly, data usage measurement involves tracking and measuring how and by whom data is used in an organization, and to what extent data assets are adopted.

  • EDO: Provides detailed metrics not just on Data Usage, which aids value assessment, but also on the flow of data and its operating cost, so that the most efficient queries and processing can be applied to achieve the insights the business requires. Without EDO, data usage costs could become untenable, leading to poor margins for the business. In some cases you might even say that the "carbon footprint" of data processing can be significantly reduced through the use of EDO.

Interoperability: This can be compared to the compatibility of a property with other properties and (upstream or downstream) systems, and its ability to share common infrastructure or resources. For example, a building is connected to the electrical grid, water network and sewage system, where each of these connections comes with precisely defined standards in terms of voltage, water pressure and pipeline sizes, and sewage standards. In a similar sense, data interoperability refers to the ability of the asset to exchange data and work together seamlessly with various other systems and applications, subject to common standards.

  • EDO: Ensures Interoperability by monitoring how well data assets interact with other systems, covering data pipeline flow, schema mapping and drift, and the freshness and throughput of data arriving in different formats and from different sources.

Data Storage: Data storage can be compared to the physical size and foundational structure of a property. A property might have to be of a certain minimum size, for example to accommodate industrial machines or to house families of a certain size. Similarly, data storage refers to the physical or virtual storage capacity in databases, data warehouses or data lakes.

  • EDO: Through Operating Cost Optimization, Enterprise Data Observability can help in efficiently managing Data Storage by identifying data that has not been used at all over a period of time and could therefore be archived or removed, similar to how a property manager would optimize and free up building space.
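A minimal sketch of that storage clean-up idea: given the last time each dataset was accessed, list the ones that have sat idle longer than some cut-off. The dataset names, the 180-day default, and the helper `find_archive_candidates` are made up for this example; in practice the access timestamps would come from query logs or catalog metadata.

```python
from datetime import datetime, timedelta, timezone


def find_archive_candidates(last_access_by_dataset, now, max_idle=timedelta(days=180)):
    """Return names of datasets not accessed within `max_idle`, sorted alphabetically."""
    return sorted(
        name
        for name, last_access in last_access_by_dataset.items()
        if now - last_access > max_idle
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
last_access = {
    "sales_2019_raw": datetime(2022, 3, 1, tzinfo=timezone.utc),
    "orders_daily": datetime(2023, 12, 31, tzinfo=timezone.utc),
}
print(find_archive_candidates(last_access, now))
# ['sales_2019_raw']
```

Passing `now` explicitly keeps the check reproducible, which matters if the result feeds an archival decision that someone may later need to audit.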

Data Lifecycle: This can be compared to the life cycle of a property, which involves various stages such as construction, maintenance, renovation, and demolition. Similarly, data lifecycle management involves managing data through various stages such as creation, storage, usage, archiving and disposal.

  • EDO: Provides proactive alerts and insights throughout the Data Lifecycle, ensuring optimal performance and value retention. It also provides lineage that can be used for troubleshooting and that forms a historical baseline for compliance and anomaly detection.

Data Integration: Different properties and neighborhoods are connected by roads and transportation systems. A particular building may provide easy access to public transport and a nearby highway. Data integration involves connecting data from different domains and sources, which can involve tasks such as data cleansing, data mapping and data transformation to ensure that data from different systems can be used together. Without integration, you can’t access or use the data, the same way you would not be able to enter or make use of a building.

  • EDO: Ensures continuous observability of existing and newly introduced Data Integrations by monitoring new data sources, data flows, and transformations. Just as new roads create new traffic flows, EDO can be set up to automatically apply appropriate rules to new integrations and new sources in a matter of days, compared to the weeks or months required by legacy and manual efforts.

Before I conclude my article, please follow Willem Koenders of ZS Associates and like (clap) his Medium article "The best way to explain data governance to beginners". Without his brilliant analogy, I would not have been able to layer in Enterprise Data Observability concepts in such a simple manner.

The Future of EDO

EDO is already gaining wide acceptance. Gartner continues to advance EDO in its Hype Cycle for data management, and Acceldata was featured in no fewer than 12 Hype Cycles due to the prominence of EDO and its applicability across a wide range of disciplines and industries, including #FinOps.

For those who might still be confused about the difference between EDO and Data Quality, Gartner has also published a report describing how EDO makes DQ better and the best practices that enterprises should adopt today. You can get a free copy of that report here.

Thank you for reading this lengthy article, and thank you to Willem Koenders for his inspiration and genius that made this adaptation possible.

