The Chief Data Officer's Playbook (review)
Nick Radcliffe
CEO, Stochastic Solutions · Behaviour modelling · Data Science · Data Quality · Sustainability · Organizer, PyData Edinburgh.
The Chief Data Officer's Playbook, by Caroline Carruthers & Peter Jackson, is an essential guide to a new role by two of the most experienced hands in the business. It's pretty wide-ranging, and while its primary audience is current and aspiring CDOs, it also has a lot of guidance for CEOs and business owners thinking of hiring a CDO. Some of the information should be useful to anyone involved in data, especially those with an interest in data governance, data lineage and (master) data management.
What is a CDO?
The book starts by explaining what a CDO is and why organisations might need one. The authors answer a common question—What's the difference between a CDO and a CIO?—using an analogy:
“The difference between a CIO and a CDO…is best described using the bucket and water analogy. The CIO is responsible for the bucket (technology), ensuring that it is complete without any holes in it, the bucket is the right size with just a little bit of spare room but not too much and it’s all in a safe place. The CDO is responsible for the liquid (the data) you put in the bucket, ensuring that it is the right liquid, the right amount and that it is not contaminated. The CDO is also responsible for what happens to the liquid; making it available when it’s needed.”
The authors emphasise that the CDO is a business role, rather than a technology role, and express a clear preference for the CDO to report to the CEO or another senior business function, rather than to the CIO. The CDO is a “cheerleader for data”, always smiling, always happy to explain master data management, and ever ready with coffee and cake, asking what keeps other business leaders awake at night.
Generations of CDOs
They make much of the difference between first-generation and second-generation (and later) CDOs. In their view, a first-generation CDO should be much less disruptive than later generations, focusing on setting up basic data governance, establishing data ownership, master data management and data strategy, and finding some tactical quick wins. They quote Christopher Blood on a common problem:
“I suspect that one of the pitfalls to avoid is the scenario where the organisation feels they've 'outsourced' their data challenges by assigning ownership to someone with a fancy title — they can't fire and forget everything to do with data!” — Christopher Blood
Second-generation and later CDOs can take on longer-term projects that transform the organisation's use of, and attitude to, data, bring in more advanced analytical approaches, and so forth.
How to be (and hire) a CDO
Jackson and Carruthers devote a lot of space to the nuts and bolts of relationship building, creating trust, finding the fires to put out (“burning platforms”), building a team and so forth. It's a very practical book, with a mixture of common sense and lessons obviously born of painful experience. They have a lot to say on the mix of skills a CDO needs and the different kinds of CDOs that might be appropriate in different situations. There's a lot of good information on roles in data governance in Chapter 6 (Relating to the Rest of the Business), though unfortunately some of it is in diagrams that are almost completely unreadable. But you can get the gist despite this.
Data Hoarding
The longest chapter in the book, Chapter 15, should be of interest well beyond the C-suite. It's also one of the deepest and most challenging. I don't agree with everything in it, but it is thought-provoking and I think almost everyone would benefit from reading it.
In discussing data hoarding, Carruthers and Jackson are talking about two very different things—data duplication and the issue of what data should be kept, and for how long (data expiration).
They really don't like data duplication:
“And this is before we get started on the Big Data agenda. Bigger does not equal better, and as with so many things in life, it’s what you do with it that counts. Yes, we can do some absolutely brilliant things with data; our unicorns* can be unleashed and come up with those flashes of brilliance. But they probably aren’t coming up with those flashes of brilliance based on 47 copies of the same photo you e-mailed to your friends … They are also not achieving them based on the fact that because you didn’t trust your IT you have copied that precious document onto your C drive and onto a shared drive, sent it to yourself so you have e-mail as a back-up — and created a copy on a flash drive stored in the back of your cupboard, all before we even talk about the weaknesses and lack of logic behind many enterprise storage solutions.” (*unicorn is their term for “that key element of your data science team that equally understands the technology, data science and business”)
All of this, of course, is fair and true. But we've all seen organisations where almost nothing can be achieved with data because it is over-centralised, access is difficult, the transformations you need can't be done centrally, and so on. Carruthers & Jackson are aware of this, of course, and do say “Governance doesn't only have to be about control, it can be about enabling the organisation to move forward.” But as with other things, duplication is sometimes absolutely the answer, albeit preferably in the context of good master data management. A good example is modern version control systems (git and friends), which are completely distributed, in contrast with the older systems (sccs, rcs, cvs etc.) where everyone used a central repository. With git, every developer has a local copy of the entire database (history as well as the current state), and git has absolutely destroyed centralised version control systems because it is demonstrably more effective (despite having a famously “challenging” user interface). But of course, git has powerful mechanisms to ensure that different copies can stay in sync, as necessary.
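Where duplication isn't the answer, it should at least be visible. The book sensibly stays away from code, but as a minimal sketch of my own (the function name and the choice of SHA-256 are mine, not the authors'), here is how you might surface the “47 copies of the same photo” problem on a file share, by grouping files on a hash of their contents:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under root by the SHA-256 hash of their contents."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # read_bytes() is fine for a sketch; stream large files in practice.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only hashes seen more than once represent duplicated content.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}

for digest, paths in find_duplicates(".").items():
    print(f"{len(paths)} identical copies: {[str(p) for p in paths]}")
```

Even something this crude makes the point: duplication you can see and reconcile (as git does) is very different from duplication no one knows about.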
The other part of data hoarding is the question of what data should the organisation actually gather and keep, and for how long. They have a nice phrase:
“A small governed data puddle will be more use to you than a large contaminated data swamp.”
Based on our work at Stochastic Solutions, I'd certainly endorse that: more data is usually better, but only if you know what it means, where it comes from and how it was collected.
In Chapter 15, the authors make a powerful case for not mindlessly keeping all data forever, but carefully considering the value of each kind of data and how long it needs to be kept. They refer to data as an asset that should be valued, and (though they don't use this language), they also discuss the flip-side, which is that data can turn into a liability, particularly if it is personal data that should not be held after a certain time, or cannot be justified, or is unreliable or erroneous.
Their suggested tests for assessing the value of data to an organisation (see the sketch after this list) are:
- Are you the only company that has it?
- How accurate is it?
- How complete is it?
- Is it up to date?
- How relevant is it?
- What are you using it for?
- Can you use it for more than you currently use it for?
- What would happen if you lost it?
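The book presents these purely as questions, but they translate naturally into a checklist. Purely as an illustration, with field names and a crude equal-weight score that are mine rather than the authors' (several of their questions are really matters of degree, not yes/no), here's how that might look in Python:

```python
from dataclasses import dataclass, fields

@dataclass
class DataAssetAssessment:
    """Carruthers & Jackson's value tests, reduced to one flag per question."""
    unique_to_us: bool       # Are you the only company that has it?
    accurate: bool           # How accurate is it?
    complete: bool           # How complete is it?
    up_to_date: bool         # Is it up to date?
    relevant: bool           # How relevant is it?
    in_active_use: bool      # What are you using it for?
    has_untapped_uses: bool  # Can you use it for more than you currently do?
    costly_to_lose: bool     # What would happen if you lost it?

    def score(self) -> float:
        """Naive equal-weight score in [0, 1]; real weightings would differ."""
        answers = [getattr(self, f.name) for f in fields(self)]
        return sum(answers) / len(answers)

customer_master = DataAssetAssessment(
    unique_to_us=True, accurate=True, complete=False, up_to_date=True,
    relevant=True, in_active_use=True, has_untapped_uses=True,
    costly_to_lose=True,
)
print(f"Value score: {customer_master.score():.2f}")  # Value score: 0.88
```

Anything scoring low on most of these tests is a natural candidate for the data-expiration discussion that follows.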
Carruthers & Jackson also make the important point that if you want to delete data you need an identified data owner, or it will be really hard to get agreement:
“It's even harder to get rid of something if no one will take ownership of it! ... No one wants to take ownership of the information/data because they sit in silos and while it enters their silo it also leaves. Why would they take responsibility for something they have no control over? What if they make a decision about the management of the data, such as deletion, and it was the wrong decision. It's much safer to keep it 'just in case' than to take the risk of deleting it and then finding it was critical to someone else in the business.”
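The ownership point also lends itself to a sketch. Again, this is mine rather than the book's (the rule set, the names and the retention periods are all invented): a retention process might refuse even to propose deletion of expired data until someone owns it, which at least makes the cost of the “just in case” silo problem visible rather than silent:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Dataset:
    name: str
    owner: str | None     # None models the "no one will take ownership" problem
    last_used: date
    retention: timedelta  # how long the organisation has agreed to keep it

def deletion_candidates(datasets: list[Dataset], today: date) -> list[str]:
    """Flag expired datasets, but only where an owner exists to decide."""
    candidates = []
    for ds in datasets:
        expired = today - ds.last_used > ds.retention
        if expired and ds.owner is None:
            print(f"{ds.name}: expired, but no owner, so kept 'just in case'")
        elif expired:
            candidates.append(ds.name)
    return candidates

datasets = [
    Dataset("email_attachments_backup", None,
            date(2015, 1, 31), timedelta(days=730)),
    Dataset("campaign_responses_2016", "Head of Marketing",
            date(2017, 3, 1), timedelta(days=365)),
]
print(deletion_candidates(datasets, today=date(2019, 9, 1)))
# email_attachments_backup: expired, but no owner, so kept 'just in case'
# ['campaign_responses_2016']
```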
Decisions to delete data (when it's not required to be deleted) are always going to be tough, but Chapter 15 is an invaluable aid to thinking through the data-hoarding issue, wherever you stand on it.
Tinker, Tailor, Soldier, Data ...
There's lots more in the book, including an interesting chapter with a few stories from Tim Carmichael. These are quite amusing but, more importantly, each illustrates a data transformation a CDO might want to foster by pairing it with an old-world situation in which a similar change took place. It's quite effective. I won't spoil it, but the end of the “Soldier” story is particularly good.
Who Should Read It?
If you're an aspiring CDO, or a new CDO, or have a senior data governance role, you should definitely read this book. CEOs/COOs/CIOs who are thinking of hiring a CDO would also do well to read it. (Come to that, CIOs should probably read it anyway, whatever their view on the desirability of a CDO.)
But there are parts of the book that I think are very relevant to a wider audience, especially Chapter 15 (data hoarding), but also much of the material explaining data governance. It's not a long book, is quite easy to read, and is also easy to navigate, so I think it would be easy to dip in and out of. Many data scientists—even those not as focused on data quality and testing of data & data processes as we are at Stochastic Solutions—would get a lot out of parts of the book.
#CDO #datagovernance #masterdata #book #review
Nick Radcliffe is CEO of Stochastic Solutions, an Edinburgh-based data science company that helps clients with predictive modelling and produces tools for helping assess and improve data quality and test the correctness and accuracy of data transformation pipelines.