
Data Product Thinking and Data Product Managers

Jeffrey T. Pollock

Vice President Product Development

Published: February 28, 2021

This is part of a Data Mesh blog series here on the LinkedIn articles platform. I am basing this series of posts on content I developed for the YouTube playlist and the Oracle technical paper about Dynamic Data Fabric and Trusted Data Mesh.

Data as Capital

For long-time data management pros like me, nothing gets us going quite like a good discussion about data monetization, data capital and, you know – data is the new oil!

As ‘data people’ we’ve been preaching the primacy of data for decades. Gradually, the rest of the world is catching on. In the 1990s the term “information economy” came into wide use, and by 2006 Clive Humby had first uttered the words, “data is the new oil!” The now-famous 2017 Economist article declared that “the world’s most valuable resource is no longer oil, but data.” Others have modified this data-as-oil metaphor to make their own points: that data is like nuclear power, or that data is the new solar.

Doug Laney tells a particularly powerful origin story. Following 9/11, he was receiving client calls about insurance companies that reimbursed hardware claims (from losses due to the tragedy in NYC) but refused claims on data because they could not place a value on it. Laney started to enquire why insurers didn’t consider information an asset, and he has since gone on to become the most visible proponent of Infonomics. As a next step, the emerging accounting practices for Data Valuation may someday institutionalize how we ascribe cash value to data capital. Studies like the 2020 Value of Data from Cambridge are going a long way towards making data value accounting mainstream.

Data Product Thinking

So, for old-timers like me, why does Data Product Thinking seem fresh and cool? We’ve spent the past 20+ years promoting Data Governance, Infonomics and Value of Data… we get it, right?

Well, yes and no. I for one really like this new turn of phrase – Data Product Thinking – precisely because it changes the point of emphasis around data value. Here’s what I mean:

An “asset” is something to be governed and managed
A “capital asset” is something to be counted and accounted for

These attributes are certainly true of data, but they lead us down a path with a bias towards governing and managing (e.g., Data Stewardship) or counting and valuation (e.g., Data Monetization). These are fine and great and necessary. But the concept of a Data Product places the emphasis – a bias in thinking – on something altogether different:

A “product” is something that fulfills a need, it does a job

By changing the words and the way we talk about the ideas, we can shift our mindset ever so slightly towards an action-oriented posture. The ‘product bias’ is towards action – the act of creating something that does a job. For practitioners and data professionals, data isn’t just about being an asset (it is!), or whether it has value or not (it does!); the real trick is figuring out how to repeatedly and reliably make data provide something useful. The act of ‘data productization’ is to data what the refinery is to oil or the factory is to raw materials.

Data Product Thinking is the union of Design Thinking and Data Product concepts

Data Product Thinking as I see it is the union of two powerful concepts: (1) Design Thinking, and (2) Data Products. The concept of data products really took off in the data science community as a way to bring focus back to the business goals of AI/ML projects. Initially, the phrase ‘data product’ mostly meant something specific to data science, but over time the term has taken on a broader meaning in the context of Data Management practices overall. Even Harvard Business Review has published advice on How to Build Great Data Products.

Design Thinking is the internationally famous methodology to help designers create outstanding products (digital or physical). Back in the 1990s I was an avid fan of IDEO, who took human-centered design and design thinking methodology mainstream, and into the tech domain in particular. And, as the product head of the Information Integration & Governance area at IBM, I was personally trained in IBM’s Design Thinking methodology and can attest to the immense value it brings to product owners focused on outcomes rather than features.

I could go on and on about Design Thinking, but if you choose to pull only one adjacent thread from this rich area of content, the one you will want to investigate further is JTBD Theory…

Know Your Customers’ “Jobs to be Done” (JTBD) Theory

Clayton Christensen of The Innovator’s Dilemma and disruptive-innovation fame (I mean, come on… how many business books still feel this fresh after 25 years?) is also the guy who took JTBD Theory mainstream. Christensen asks whether innovative, great products are just luck or whether there is a repeatable ‘formula’ that may be applied. His answer is that if you truly understand your customers’ “Jobs to be Done” then you are far more likely to discover and build great products.

To build great products of any kind we must examine the needs of our customers. In some cases, that can counterintuitively mean looking past what your customers ask for. As Henry Ford reputedly said about designing the Model T, “if I asked customers what they wanted, they would have said a faster horse.” Likewise, when asked about creating the iPhone, Steve Jobs said, “Some people say give the customers what they want, but that's not my approach. Our job is to figure out what they're going to want before they do. […] People don't know what they want until you show it to them.” In JTBD Theory, the word ‘Job’ is shorthand for what an individual seeks to accomplish, what their core needs are – and oftentimes that means looking deeper than ‘enhancement requests’ and ‘product focus groups.’

Using JTBD helps to drive customer-centric thinking as we design and develop Data Products. This focus on needs (‘Jobs’) enables a systematic approach for working towards a concrete solution. In Clayton Christensen’s book, Competing Against Luck, he says, “Innovation becomes much more predictable — and far more profitable — when it begins with a deep understanding of the job the customer is trying to get done.” This fantastic Product Thinking 101 post on UX Planet is a great primer on how to apply JTBD Theory to your digital projects.

Data Products and Data Product Managers

The mission at Udacity is to “train the world’s workforce in the careers of the future…” and with their focus on AI and Data Science, they’ve been an early proponent of the Data Product Manager role. Others have simply declared that we are already in the new “Age of Data Product Managers!” I like Udacity’s simple definition of the role:

A Data Product Manager oversees the full lifecycle of how data is used within an organization. The role is similar to that of a traditional PM, but instead of treating data like raw material, it’s treated as a product.

But what exactly is a data product?

Organizations of even moderate size are swimming (drowning?) in data. Obviously, not all data is a data product. Building on the language of JTBD Theory, I’m going to say that:

Data Products are data-centric, digital artifacts that are produced and consumed in a multi-party exchange for the purposes of meeting the needs of a job that the consumer is trying to get done. Further, the lifecycle controls of data products closely mirror those of physical products in the commercial realm.

Not all data is exchanged between different entities, and not all data that is exchanged is for the purposes of a specific job to be done (e.g., a business outcome). Examples of data products may include:

Analytics – historic/real-time reports, dashboards, charts and data visualizations
Models – domain objects, schema and data models (ER, Ontology, Taxonomy, XSD, etc.), object models (UML), Machine Learning features and attributes
Algorithms – Machine Learning models (production models), scoring, business rules (Rete, Drools, etc.)
Data Services & APIs – Payloads (JSON, Avro, Protobuf), topics/queues (JMS, Kafka, Pulsar, etc.), REST APIs (gateways and contracts), DDD Aggregates (e.g., serialized payloads)
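The definition above boils down to two tests: is the artifact exchanged between more than one party, and does it serve a job the consumer is trying to get done? Here is a minimal sketch of that check in Python. The class names, field names and the two sample artifacts are all hypothetical, invented for illustration; your organization may well add further criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ProductKind(Enum):
    # The four example categories from the list above.
    ANALYTICS = "analytics"
    MODEL = "model"
    ALGORITHM = "algorithm"
    DATA_SERVICE = "data service or API"


@dataclass(frozen=True)
class DataArtifact:
    # All fields are illustrative assumptions, not a standard schema.
    name: str
    kind: ProductKind
    producer: str                # team or system that produces the artifact
    consumers: Tuple[str, ...]   # parties that consume it
    job_to_be_done: str          # the consumer's job; empty if none is known


def is_data_product(artifact: DataArtifact) -> bool:
    """Apply the two criteria from the definition: a multi-party
    exchange, and a consumer job that the artifact serves."""
    external = [c for c in artifact.consumers if c != artifact.producer]
    return bool(external) and bool(artifact.job_to_be_done.strip())


churn_scores = DataArtifact(
    name="customer_churn_scores",
    kind=ProductKind.ALGORITHM,
    producer="data-science",
    consumers=("marketing", "support"),
    job_to_be_done="decide which customers get a retention offer",
)
scratch_table = DataArtifact(
    name="tmp_join_2021_02",
    kind=ProductKind.MODEL,
    producer="data-science",
    consumers=("data-science",),
    job_to_be_done="",
)

print(is_data_product(churn_scores))   # True
print(is_data_product(scratch_table))  # False
```

The scratch table fails both tests: nobody outside the producing team consumes it, and no job is attached to it.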

Importantly, a data product doesn't imply a heavy-handed amount of processing. Just as raw materials are input products to a supply chain, some kinds of raw data can also be a data product as input to a data supply chain. Other, perhaps more conventional, data products may be highly refined data objects that are 'mastered' and aggregated for visibility in C-level reporting tools.

Ultimately, the exact definition of whether a particular data (datum?) should rise to the level of being a Data Product is up to your organization and whether the added rigor around data product management will help with the innovation, quality and governance of the multi-party exchange. There will inevitably be a lot of data flowing which is not managed ‘as a product.’

So what exactly do we do differently with data products?

My go-to metaphor for this is to ask you to imagine walking down the aisle at your local grocery store. When you grab a product off the shelf, think about all the ‘productization’ work that has gone into that product: the supply chain of the raw ingredients, sustainability practices, packaging, documentation, regulatory standards, quality controls, marketing and competitive positioning, lifecycle controls, detailed tracking of provenance, and so on. Even the simplest products in your local stores have thousands of person-hours of work behind them.


Data products are like that. In order to rise to the standard of being a ‘product’, the data must not only have a ‘job to be done’ but it must also exist within a lifecycle that has the familiar attributes of physical products in the commercial realm.

Example attributes of a data product:

Packaging, Documentation – how is it consumed? is the use self-evident, or are instructions required? is the packaging self-describing?
Purpose & Value – implicit/explicit value? does it have a depreciation schedule?
Quality, Consistency – KPIs and SLAs of usage? how repeatable is the data product?
Safety & Security – should the packaging be tamper-proof? do users need to be validated/trusted?
Provenance, Lifecycle & Governance – trust & explainability? can you accurately explain where the data, and data parts came from? how the inferred / derived data was calculated?
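One way to make these attributes concrete is to write them down as a descriptor that travels with the data product, and then ask which attribute groups are still blank. The sketch below is only an illustration under my own assumptions: the field names, the choice of one or two fields per attribute group, and the sample product are hypothetical, and a real descriptor would carry far more detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProductDescriptor:
    # Hypothetical fields, one or two per attribute group from the list above.
    name: str
    # Packaging, Documentation
    documentation_url: Optional[str] = None
    schema_format: Optional[str] = None       # e.g. "JSON" or "Avro"
    # Purpose & Value
    purpose: Optional[str] = None
    # Quality, Consistency
    freshness_sla_minutes: Optional[int] = None
    # Safety & Security
    requires_authenticated_consumers: Optional[bool] = None
    # Provenance, Lifecycle & Governance
    upstream_sources: List[str] = field(default_factory=list)
    owner: Optional[str] = None


def missing_attributes(d: DataProductDescriptor) -> List[str]:
    """Return the attribute groups that nobody has filled in yet."""
    checks = {
        "packaging/documentation": d.documentation_url and d.schema_format,
        "purpose/value": d.purpose,
        "quality/consistency": d.freshness_sla_minutes is not None,
        "safety/security": d.requires_authenticated_consumers is not None,
        "provenance/governance": d.upstream_sources and d.owner,
    }
    return [group for group, ok in checks.items() if not ok]


orders = DataProductDescriptor(
    name="orders_change_stream",
    schema_format="Avro",
    purpose="feed near-real-time order analytics",
    freshness_sla_minutes=5,
    upstream_sources=["orders_db"],
    owner="order-management",
)
print(missing_attributes(orders))
# ['packaging/documentation', 'safety/security']
```

A check like this does not prove the product is any good; it only shows which of the questions above have not been answered yet.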

In the classical data management realm of Data Governance we’ve often dealt with questions like these. But Data Governance concepts have typically been addressed in a fairly passive, librarian-esque way. Lots of attention goes to ‘data catalogs’ and ‘curating data’, as if data just sort of arrives in a data museum of sorts, and then we need curators and stewards to neatly place the data on digital shelves to make it easier to find. Even the more consumer-friendly concepts of “shopping for data”, which we explored back in my IBM days circa 2011/12, still focus on the shopper and the act of shopping, rather than the act of producing the data products.

Some may say, “bah, pedantic semantics!” but I think words matter.

Data Product Thinking brings the focus to the act of creating and the production of data products, rather than the more passive roles of governing, stewarding, cataloging, managing, curating, or shopping for data.

In the Context of a Data Mesh

As you can tell already, one of the reasons I’ve embraced the new lingo around “Data Mesh” is that it gives us the opportunity to reset the discussion towards a more action-biased way of speaking about data products. We can put the emphasis on the lifecycle of creating and producing great data products, without having to throw away any of the tried-and-true elements of good data governance principles.

A couple of weeks ago, the data team at Intuit published a nice blog about their journey with the Data Mesh. Intuit are heavy users of Oracle GoldenGate for some of their data mesh layers, but I particularly enjoyed their recent description of how they achieved their business domain modeling alignment with data products.

Intuit’s diagram shows how the general pattern of a business domain is applied to specific data realms such as Identity, Product Entitlements, Customer Analytics and Product Analytics. Each business domain comprises key stakeholders (which could include IT and/or the Business function), as well as data processes, pipelines, data stores, and the data consumption APIs.

I think this is an excellent example of how to apply Data Product Thinking into a repeatable and modular lifecycle that supports a tangible ‘Job to be Done’.
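To show what ‘repeatable and modular’ might look like in practice, here is a rough sketch of the general domain pattern as a small Python structure. This is my own illustration and not Intuit’s implementation: the class, its fields and every value other than the ‘Identity’ domain name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessDomain:
    # One list per part of the general pattern described above.
    name: str
    stakeholders: List[str] = field(default_factory=list)
    data_processes: List[str] = field(default_factory=list)
    pipelines: List[str] = field(default_factory=list)
    data_stores: List[str] = field(default_factory=list)
    consumption_apis: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """Name the parts of the pattern this domain has not filled in."""
        parts = {
            "stakeholders": self.stakeholders,
            "data_processes": self.data_processes,
            "pipelines": self.pipelines,
            "data_stores": self.data_stores,
            "consumption_apis": self.consumption_apis,
        }
        return [part for part, items in parts.items() if not items]


# Hypothetical values; only the domain name comes from the text above.
identity = BusinessDomain(
    name="Identity",
    stakeholders=["identity-platform-team"],
    data_processes=["account-merge"],
    pipelines=["login-events-stream"],
    data_stores=["identity-store"],
)
print(identity.gaps())  # ['consumption_apis']
```

Because every domain fills in the same template, a gap such as a missing consumption API is easy to spot before the domain claims to offer a data product.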

Here at Oracle, we are working on the GoldenGate microservices platform for streaming data as a means to cover operational data products (e.g., highly available data stores, data migrations, transaction outbox, etc.) as well as analytic data products (e.g., curated change data, data lake ingestion, time series analytics, etc.). Some examples of data products that GoldenGate provides are:

Within the GoldenGate platform our users can create and govern the full lifecycle of data products, or simply replicate real-time data events into other 3rd party tools used for data products.

In Conclusion

In the Oracle technical paper about Dynamic Data Fabric and Trusted Data Mesh that this whole blog series is about, we fully embrace the notion of Data Product Thinking and I hope that this post has explained why. Like Design Thinking and JTBD Theory, the action-bias is towards producing data products that are focused on high-value outcomes. The words we use drive active focus on the creation and production of valuable data products, and not just on the governance, stewardship, curation and accounting elements of data value.

So, for those of us who are already living in the post-‘data is the new oil!’ world, the concepts of Data Product Thinking help to place focus on the data engineering tasks necessary to build this brave new world, where data product innovations are rigorously conceived, created and produced by participants (i.e., Data Product Managers and Data Engineers) who have an active hand in making data value-creation a repeatable, productized process.

Ron Urwongse

3 years ago

Great piece!

Max B?reb?ck

Enterprise architecture supports and enable business to be successful

3 years ago

Thanks for a good article. The linked Intuit blog post was really good; I like the way they visualize the different kinds of capabilities and how the domains use them. In our own work in this area we call this domain service encapsulation. Intuit also have a very good framework defining the criteria or principles for what needs to be in place to offer a data product.

Tim Law

Research Director at IDC, AI & Automation

3 years ago

This is a compelling blog. Clearly, the way we think about Data Products is integral to enabling the reusability and extensibility of AI services. It also makes us think differently about the automation of data catalogs (perhaps more correctly "Data Product Catalogs") and the discovery and recommendation process with ML-enabled catalogs.

Shaun Thompson

Dad | Dog Wrangler | CEO, Architect, Developer

4 years ago

This entire series is a gold mine of information, touching on product thinking and the cultural aspects of bringing a Data Mesh into a company, and diving deep into what a Data Mesh is on the technical front and what it’s not. Thank you for putting together this information, Jeff! Also, a big thanks to the team that put together the graphics to go along with the presentation - well done.
