Data Product Thinking and Data Product Managers
This is part of a Data Mesh blog series here on the LinkedIn articles platform. I am basing this series of posts on content I developed for the YouTube playlist and the Oracle technical paper about Dynamic Data Fabric and Trusted Data Mesh.
Data as Capital
For long-time data management pros like myself, nothing gets us going quite like a good discussion about data monetization, data capital and, you know – data is the new oil!
As ‘data people’ we’ve been preaching the primacy of data for decades. Gradually, the rest of the world is catching on. In the 1990s the term “information economy” entered the mainstream, and in 2006 Clive Humby first uttered the words, “data is the new oil!” The now famous 2017 Economist article declared that “the world’s most valuable resource is no longer oil, but data”. Others have modified this data-as-oil metaphor to make their own points: that data is like nuclear power, or that data is the new solar.
Doug Laney tells a particularly powerful origin story about how, following 9/11, he was receiving client calls about insurance companies reimbursing hardware claims (from losses due to the tragedy in NYC) but refusing claims on data because they could not place a value on it. Laney started to enquire why insurers didn’t consider information an asset, and he has since gone on to become the most visible proponent of Infonomics. As a next step, the emerging accounting practices for Data Valuation may someday institutionalize how we ascribe cash value to data capital. Studies like the 2020 Value of Data from Cambridge are going a long way towards making data value accounting mainstream.
Data Product Thinking
So, for old-timers like me, why does Data Product Thinking seem fresh and cool? We’ve spent the past 20+ years promoting Data Governance, Infonomics and Value of Data… we get it, right?
Well, yes and no. I for one really like this new turn of phrase – Data Product Thinking – precisely because it changes the point of emphasis around data value. Here’s what I mean:
- An “asset” is something to be governed and managed
- A “capital asset” is something to be counted and accounted for
These attributes are certainly true of data, but they bias us towards governing and managing (e.g., Data Stewardship) or counting and valuation (e.g., Data Monetization). These are fine and great and necessary. But the concept of a Data Product places the emphasis, a bias in thinking, on something altogether different:
- A “product” is something that fulfills a need, it does a job
By changing the words and the way we talk about the ideas, we can shift our mindset ever so slightly towards an action-oriented posture. The ‘product bias’ is towards action – the act of creating something that does a job. For practitioners and data professionals, data isn’t just about being an asset (it is!), or whether it has value or not (it does!). The real trick is figuring out how to repeatedly and reliably make data provide something useful. The act of ‘data productization’ is to data what the refinery is to oil or the factory is to raw materials.
Data Product Thinking as I see it is the union of two powerful concepts: (1) Design Thinking, and (2) Data Products. The concept of data products really took off in the data science community as a way to bring focus back to the business goals of AI/ML projects. Initially, the phrase ‘data product’ mostly meant something specific to data science, but over time the term has taken on a broader meaning in the context of Data Management practices overall. Even Harvard Business Review has published advice on How to Build Great Data Products.
Design Thinking is the internationally famous methodology to help designers create outstanding products (digital or physical). Back in the 1990s I was an avid fan of IDEO, who took human-centered design and design thinking methodology mainstream and into the tech domain in particular. And, as the product head of the Information Integration & Governance area at IBM, I was personally trained in IBM’s Design Thinking methodology and can attest to the immense value it brings to product owners focused on outcomes rather than features.
I could go on and on about Design Thinking, but if you choose to pull only one adjacent thread from this rich area of content, the one you will want to investigate further is JTBD Theory…
Know Your Customers’ “Jobs to be Done” (JTBD) Theory
Clayton Christensen of The Innovator’s Dilemma and disruptive-innovation fame (I mean, come on… how many business books still feel this fresh after 25 years?) is also the guy who took JTBD Theory mainstream. Christensen asks whether innovative, great products are just luck or whether there is a repeatable ‘formula’ that may be applied. His answer is that if you truly understand your customers’ “Jobs to be Done” then you are far more likely to discover and build great products.
To build great products of any kind we must examine the needs of our customers. In some cases, that can counterintuitively mean looking past what your customers ask for. As Henry Ford reputedly said about designing the Model T, “if I asked customers what they wanted, they would have said a faster horse.” Likewise, when asked about creating the iPhone, Steve Jobs said, “Some people say give the customers what they want, but that's not my approach. Our job is to figure out what they're going to want before they do. […] People don't know what they want until you show it to them.” In JTBD Theory, the word ‘Job’ is shorthand for what an individual seeks to accomplish, what their core needs are – and oftentimes that means looking deeper than ‘enhancement requests’ and ‘product focus groups.’
Using JTBD helps to drive customer-centric thinking as we design and develop Data Products. This focus on needs (‘Jobs’) enables a systematic approach for working towards a concrete solution. In Clayton Christensen’s book, Competing Against Luck, he says, “Innovation becomes much more predictable — and far more profitable — when it begins with a deep understanding of the job the customer is trying to get done.” This fantastic Product Thinking 101 post on UX Planet is a great primer on how to apply JTBD Theory to your digital projects.
Data Products and Data Product Managers
The mission at Udacity is to “train the world’s workforce in the careers of the future…” and with their focus on AI and Data Science, they’ve been an early proponent of the Data Product Manager role. Others have simply declared that we are already in the new “Age of Data Product Managers!” I like Udacity’s simple definition of the role:
A Data Product Manager oversees the full lifecycle of how data is used within an organization. The role is similar to that of a traditional PM, but instead of treating data like raw material, it’s treated as a product.
But what exactly is a data product?
Organizations of even moderate size are swimming (drowning?) in data. Obviously, not all data is a data product. Building on the language of JTBD Theory, I’m going to say that:
Data Products are data-centric, digital artifacts that are produced and consumed in a multi-party exchange for the purposes of meeting the needs of a job that the consumer is trying to get done. Further, the lifecycle controls of data products closely mirror those of physical products in the commercial realm.
Not all data is exchanged between different entities, and not all data that is exchanged is for the purposes of a specific job to be done (e.g., a business outcome). Examples of data products may include:
- Analytics – historic/real-time reports, dashboards, charts and data visualizations
- Models – domain objects, schema and data models (ER, Ontology, Taxonomy, XSD, etc.), object models (UML), Machine Learning features and attributes
- Algorithms – Machine Learning models (production models), scoring, business rules (Rete, Drools, etc.)
- Data Services & APIs – Payloads (JSON, Avro, Protobuf), topics/queues (JMS, Kafka, Pulsar, etc.), REST APIs (gateways and contracts), DDD Aggregates (e.g., serialized payloads)
Importantly, a data product doesn't imply a heavy-handed amount of processing. Just as raw materials are input products to a supply chain, some kinds of raw data can also be a data product as input to a data supply chain. Other, perhaps more conventional, data products may be highly refined data objects that are 'mastered' and aggregated for visibility in C-level reporting tools.
Ultimately, the exact definition of whether a particular data (datum?) should rise to the level of being a Data Product is up to your organization and whether the added rigor around data product management will help with the innovation, quality and governance of the multi-party exchange. There will inevitably be a lot of data flowing which is not managed ‘as a product.’
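To make that screening decision a little more concrete, here is a minimal Python sketch of the working definition above: a flow of data is a candidate data product only if it is exchanged between different parties and serves a named job to be done. The `DataFlow` record, its field names, the `is_data_product_candidate` function and the sample flows are all hypothetical illustrations, not part of any product or standard, and a real organization would apply its own criteria.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlow:
    """Hypothetical record describing one flow of data in an organization."""
    name: str
    producer: str                                        # team or system that creates the data
    consumers: list[str] = field(default_factory=list)   # parties that use it (Python 3.9+ syntax)
    job_to_be_done: str = ""                             # the consumer need this data serves


def is_data_product_candidate(flow: DataFlow) -> bool:
    """Screen a flow against the two tests in the working definition:
    a multi-party exchange, and a named job to be done."""
    external_consumers = [c for c in flow.consumers if c != flow.producer]
    has_job = bool(flow.job_to_be_done.strip())
    return bool(external_consumers) and has_job


# Illustrative (made-up) flows
flows = [
    DataFlow("orders_feed", "order-service", ["finance-analytics"], "reconcile daily revenue"),
    DataFlow("debug_logs", "order-service", ["order-service"]),   # never leaves its producer
    DataFlow("clickstream_dump", "web-team", ["marketing"]),      # exchanged, but no stated job
]

candidates = [f.name for f in flows if is_data_product_candidate(f)]
print(candidates)  # ['orders_feed']
```

Passing this check only makes a flow a candidate. Whether the added rigor of managing it as a product is worthwhile remains a judgment call for the organization.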
So what exactly do we do differently with data products?
My go-to metaphor for this is to ask you to imagine walking down the aisle at your local grocery store. When you grab a product off the shelf at the store, think about all the ‘productization’ work that has gone into that product: everything from the supply chain of the raw ingredients, sustainability practices, packaging, documentation, regulatory standards, quality controls, marketing and competitive positioning, to lifecycle controls and detailed tracking of provenance. Even the simplest products in your local stores have thousands of person-hours of work behind them.
Data products are like that. In order to rise to the standard of being a ‘product’, the data must not only have a ‘job to be done’ but it must also exist within a lifecycle that has the familiar attributes of physical products in the commercial realm.
Example attributes of a data product:
- Packaging, Documentation – how is it consumed? is the use self-evident, or are instructions required? is the packaging self-describing?
- Purpose & Value – implicit/explicit value? does it have a depreciation schedule?
- Quality, Consistency – KPIs and SLAs of usage? how repeatable is the data product?
- Safety & Security – should the packaging be tamper-proof? do users need to be validated/trusted?
- Provenance, Lifecycle & Governance – trust & explainability? can you accurately explain where the data, and data parts came from? how the inferred / derived data was calculated?
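One way to picture these attributes is as a descriptor that travels with the data product, plus a check that reports which areas are still unaddressed. The Python sketch below is only an illustration under my own assumptions: the `DataProductDescriptor` class, its field names, the `productization_gaps` function and the example URL are hypothetical, and a single field per attribute is a deliberate simplification of what each area really involves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataProductDescriptor:
    """Hypothetical descriptor; each group of fields maps to one attribute area."""
    name: str
    job_to_be_done: str
    documentation_url: Optional[str] = None                      # Packaging, Documentation
    schema_format: Optional[str] = None                          # Packaging, e.g. "Avro" or "JSON"
    value_statement: Optional[str] = None                        # Purpose & Value
    freshness_sla_minutes: Optional[int] = None                  # Quality, Consistency
    access_roles: list[str] = field(default_factory=list)        # Safety & Security
    upstream_sources: list[str] = field(default_factory=list)    # Provenance, Lifecycle & Governance


def productization_gaps(product: DataProductDescriptor) -> list[str]:
    """Return the attribute areas that have not been filled in yet."""
    checks = {
        "Packaging, Documentation": bool(product.documentation_url and product.schema_format),
        "Purpose & Value": bool(product.value_statement),
        "Quality, Consistency": product.freshness_sla_minutes is not None,
        "Safety & Security": bool(product.access_roles),
        "Provenance, Lifecycle & Governance": bool(product.upstream_sources),
    }
    return [area for area, satisfied in checks.items() if not satisfied]


# Illustrative (made-up) data product with two areas left open
orders_feed = DataProductDescriptor(
    name="orders_feed",
    job_to_be_done="reconcile daily revenue",
    documentation_url="https://example.com/docs/orders-feed",  # placeholder URL
    schema_format="Avro",
    value_statement="Removes manual revenue reconciliation",
    access_roles=["finance-analyst"],
)

print(productization_gaps(orders_feed))
# ['Quality, Consistency', 'Provenance, Lifecycle & Governance']
```

The point of the sketch is the posture rather than the fields: the producer declares these attributes up front as part of creating the product, instead of a steward discovering them after the fact.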
In the classical data management realm of Data Governance we’ve often dealt with questions like these. But Data Governance concepts have typically been addressed in a fairly passive, librarian-esque way. Lots of attention goes to ‘data catalogs’ and ‘curating data’, as if data just sort of arrives in a data museum of sorts, and then we need curators and stewards to neatly place the data on digital shelves to make it easier to find. Even the more consumer-friendly concepts of “shopping for data”, which we explored back in my IBM days circa 2011/12, still focus on the shopper and the act of shopping, rather than the act of producing the data products.
Some may say, “bah, pedantic semantics!” but I think words matter.
Data Product Thinking brings the focus to the act of creating and the production of data products, rather than the more passive roles of governing, stewarding, cataloging, managing, curating, or shopping for data.
In the Context of a Data Mesh
As you can tell already, one of the reasons I’ve embraced the new lingo around “Data Mesh” is that it gives us the opportunity to reset the discussion towards a more action-biased way of speaking about data products. We can put the emphasis on the lifecycle of creating and producing great data products, without having to throw away any of the tried-and-true elements of good data governance principles.
A couple of weeks ago, the data team at Intuit published a nice blog about their journey with the Data Mesh. Intuit are heavy users of Oracle GoldenGate for some of their data mesh layers, but I particularly enjoyed their recent description of how they achieved their business domain modeling alignment with data products.
You can see in the picture above how the general pattern of a business domain is then applied to specific data realms such as Identity, Product Entitlements, Customer Analytics and Product Analytics. Each business domain is comprised of key stakeholders (which could include IT and/or the Business function), as well as data processes, pipelines, data stores, and the data consumption APIs.
I think this is an excellent example of how to apply Data Product Thinking into a repeatable and modular lifecycle that supports a tangible ‘Job to be Done’.
Here at Oracle, we are working on the GoldenGate microservices platform for streaming data as a means to cover operational data products (e.g., highly available data stores, data migrations, transaction outbox, etc.) as well as analytic data products (e.g., curated change data, data lake ingestion, time series analytics, etc.). Some examples of data products that GoldenGate provides are:
Within the GoldenGate platform our users can create and govern the full lifecycle of data products, or simply replicate real-time data events into other 3rd party tools used for data products.
In Conclusion
In the Oracle technical paper about Dynamic Data Fabric and Trusted Data Mesh that this whole blog series is about, we fully embrace the notion of Data Product Thinking and I hope that this post has explained why. Like Design Thinking and JTBD Theory, the action-bias is towards producing data products that are focused on high-value outcomes. The words we use drive active focus on the creation and production of valuable data products, and not just on the governance, stewardship, curation and accounting elements of data value.
So, for those of us who are already living in the post-‘data is the new oil!’ world, the concepts of Data Product Thinking help to place focus on the data engineering tasks necessary to build this brave new world where data product innovations are rigorously conceived, created and produced by participants (i.e., Data Product Managers and Data Engineers) who have an active hand in making data value-creation a repeatable, productized process.