Let's collaborate to add "Data Product" to Wikipedia

During the Enterprise Architecture Innovation Summit in Amsterdam on 14 and 15 December 2024, I found that the term "Data Product", though widely used, still does not have a clear definition, and this leads everybody and their uncles and aunties to assume something or the other.

I asked the 50 or so senior, international architects to define 'Data Product'. It was interesting to note that people were able to talk about its properties, but a definition of what it actually is was missing in action.

I was not very surprised. In fact, some learned architects pointed out that there was no clarity on this in the literature either.

So when I got back home, I decided to see what Wikipedia says about the subject. I found 'DATA PRODUCTS', but it was about an enterprise that manufactured all kinds of infrastructure products, starting with punched cards.

I think it is a good idea, and indeed the right time, to start work on a Wikipedia entry for the term. Reading the Wikipedia guidelines, I realise that to be valuable the definition should rest on common ground and not push any individual or commercial agenda.

I invite you to work with me to ensure that we set up a good and reliable page. I think it is a good idea to crowdsource this definition.

So here is my proposed draft. Before putting it on Wikipedia, I would love to have your views and contributions.

Please note that I also see the need to create a new governance role called the Data Product Owner, distinct from a Data Owner, who owns System of Record (SoR) data.


A Data Product is a curated dataset or a combination of datasets, often derived from various systems of record and enriched with reference data, designed to meet specific business needs or analytical objectives.

This concept emphasizes treating data as a product, ensuring it is managed, maintained, and delivered with a focus on timeliness, quality, usability, and value creation.

Key Characteristics of Data Products:

  • Purpose-Driven: Developed to address particular business questions or analytical requirements.
  • Curated and Enriched: Involves the integration of data from multiple sources, including systems of record and reference data, to provide comprehensive insights.
  • Managed Lifecycle: Subject to continuous management, including updates, quality assurance, and user feedback incorporation.
  • User-Centric Design: Structured to be easily accessible and interpretable by end-users, facilitating informed decision-making.
  • Ownership for Governance: A Data Product has an identified "Data Product Owner" who is responsible for specifying the schema and quality parameters that any source must meet to add its data to the product.
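
To make the ownership characteristic more concrete for the discussion, here is a minimal sketch in Python of how a product descriptor might look. It is only an illustration under my own assumptions: the class name DataProduct, the admit method, the max_null_ratio quality parameter, and the sample product and source names are all hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor; every field name here is illustrative."""
    name: str
    purpose: str                    # the business need the product serves
    owner: str                      # the Data Product Owner
    schema: dict[str, type]         # column name -> expected Python type
    max_null_ratio: float = 0.0     # one example of a quality parameter
    records: list[dict] = field(default_factory=list)

    def admit(self, source: str, rows: list[dict]) -> bool:
        """Accept rows from a source only if they meet the owner's rules."""
        if not rows:
            return False
        nulls = 0
        for row in rows:
            if set(row) != set(self.schema):
                return False        # missing or unexpected columns
            for column, expected in self.schema.items():
                value = row[column]
                if value is None:
                    nulls += 1
                elif not isinstance(value, expected):
                    return False    # wrong type for this column
        null_ratio = nulls / (len(rows) * len(self.schema))
        if null_ratio > self.max_null_ratio:
            return False
        # Tag each row with its source so consumers can see where it came from.
        self.records.extend({**row, "_source": source} for row in rows)
        return True


product = DataProduct(
    name="customer_360",
    purpose="Single view of active customers for churn analysis",
    owner="jane.doe",
    schema={"customer_id": int, "country": str},
)
ok = product.admit("crm", [{"customer_id": 1, "country": "NL"}])
bad = product.admit("billing", [{"customer_id": "2", "country": "NL"}])
```

In this sketch the source does not decide what enters the product; the rules set by the owner do. The CRM rows are admitted, while the billing rows are rejected because customer_id arrives as text rather than as a number.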

Examples of Data Products:

  • Trustworthy Data Streams: Used for analytics and/or for integration between systems.
  • Dashboards and Reports: Visual representations of key performance indicators (KPIs) derived from various data sources.
  • Recommendation Engines: Systems that analyze user behavior and preferences to suggest products or services.
  • Predictive Models: Analytical tools that forecast future trends based on historical data.

Academic Perspectives:

The concept of data products has been explored in academic literature. For instance, Hasan and Legner (2023) define a data product as “a managed artifact that satisfies recurring information needs and creates value through transforming and packaging relevant data elements into consumable form.”

Additionally, Microsoft’s Cloud Adoption Framework discusses the importance of treating data as a product to enhance data quality and usability.

Significance in Data Management:

Adopting a data product approach aligns with modern data management strategies, such as data mesh, which advocates for decentralized data ownership and treating data as a product to improve scalability and agility.

  • Focus on Interoperability: A data product is designed not just for individual use but to be interoperable across systems and usable by multiple teams or departments within an organization.
  • Embedded Quality and Trust: Quality is an inherent feature of a data product. It comes with metadata, documentation, and measures that help users trust the data, including lineage information that shows where the data originates and how it was transformed.
  • Consumer-Centricity: Like any product, a data product prioritizes the end-user experience, ensuring ease of access, clarity, and relevance.
  • Automated Governance: Modern data products leverage automated governance tools to enforce policies, ensure compliance, and maintain data integrity without manual intervention.
  • Lifecycle Management: Data products are not static; they evolve through a managed lifecycle that includes versioning to track changes, retirement of outdated or irrelevant products, and feedback loops to incorporate user input and adapt to new requirements.
  • Business Value Alignment: A data product is purpose-driven, directly linked to business objectives or problems, making it easy to measure its impact.
  • Separation from Systems of Record: Data products are independent entities, derived from, but decoupled from, the underlying systems of record. This enables their reuse without dependency on the original applications.
  • Built for Analytics and Action: Data products are crafted to be directly usable both for analytical purposes (e.g., dashboards, machine learning models) and for operational actions (e.g., triggering workflows or notifications).
  • Role in Modern Architectures: Data products are a foundational element in a Data Mesh (enabling decentralized data ownership), in Event-Driven Architectures (providing timely, event-based updates to consumers), and in a Data Fabric (serving as building blocks in an interconnected data ecosystem).
  • Collaborative Autonomy: Data products enable teams to work autonomously, creating their own products while adhering to common organizational standards.
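
The lifecycle and lineage points above can be sketched in a few lines of Python. Again, this is a rough illustration under my own assumptions, not a reference implementation: the names ProductLifecycle, ProductVersion, publish, current and retire, as well as the sample product and source names, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    RETIRED = "retired"


@dataclass
class ProductVersion:
    version: str
    lineage: list[str]              # upstream sources and transformation steps
    status: Status = Status.ACTIVE


@dataclass
class ProductLifecycle:
    """Hypothetical lifecycle record for one data product."""
    name: str
    versions: list[ProductVersion] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)

    def publish(self, version: str, lineage: list[str]) -> ProductVersion:
        """Add a new version and retire the versions it supersedes."""
        for old in self.versions:
            old.status = Status.RETIRED
        new = ProductVersion(version, list(lineage))
        self.versions.append(new)
        return new

    def current(self) -> Optional[ProductVersion]:
        """Return the active version, or None if the product is retired."""
        for candidate in reversed(self.versions):
            if candidate.status is Status.ACTIVE:
                return candidate
        return None

    def retire(self) -> None:
        """Retire the whole product by retiring every version."""
        for existing in self.versions:
            existing.status = Status.RETIRED


sales = ProductLifecycle("monthly_sales")
sales.publish("1.0.0", ["erp.orders", "aggregate by month"])
sales.feedback.append("Please add a currency column")
sales.publish("1.1.0", ["erp.orders", "ref.currencies", "aggregate by month"])
```

Here the second version is published in response to user feedback, its lineage records the extra reference data that was added, and the first version is retired rather than deleted, so consumers can still see how the product changed.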

References:

  • Hasan, S., & Legner, C. (2023). Data products, data mesh, and data fabric. Business & Information Systems Engineering.
  • Microsoft. (2024). What is a data product? - Cloud Adoption Framework.




Matjaž Markelj

Enterprise IT architect

3 months ago

Hi Sukumar, Thanks for sharing this great description. As you noticed, the definitions are still not totally clear. The one you proposed is very good and clear. To make it even clearer, I would recommend removing the part "often derived from various systems of record" because this is just a possibility and doesn't help with clarifying the definition. BR Matjaž
