What Does the Data Say?
Michelle Sandford
Developer Engagement Lead @ Microsoft - Azure Data Science & AI Certified GAICD
The answer, of course, is nothing. On its own, Data has no voice. It is an unrelated mass of information which might form patterns. But Data makes no statements, forms no conclusions, and gives no recommendations. Contrary to what is often said, the data does not speak for itself: it requires humans to construct a narrative about it.
One of the questions I often get is:
"What is a Data Scientist?"
and I've been thinking very hard about this question since the first time it was asked. Or more precisely, what I was originally asked was:
"How do I convince Industry that I am a Data Scientist?"
I had been talking about the role that Industry Certifications, like the Microsoft AI Fundamentals or the Data Scientist Associate, play in augmenting degrees and domain knowledge. I posited that these demonstrate a continuous learning mindset and show a willingness to jumpstart the onboarding process by confirming an understanding of industry tools. The person I was speaking to said:
"But I have a PhD."
There was a full stop and an underline implied in that reply. I was going to make it Bold Text as well, but that would make it look like they shouted, and they didn't. They had been studying and researching for most of their life, and I was suggesting they add some micro-credentials to their toolkit, as if what they had done so far was not enough. It should be enough: all that work, all that expertise, all that time and dedication to the field. The exasperation and frustration in their voice communicated all of that. And they were right. It should be enough. Hard work should be rewarded...
The problem is, the work doesn't stop. You never reach a point where you can say you have done enough. You have to keep striving to do the next thing, to demonstrate through your words and actions that you are making a difference in the world.
You can't just walk into Industry and say:
"I have a PhD, therefore I am a Data Scientist"
Because you are not. You are a Researcher.
A Data Scientist is someone who makes sense of the data and constructs a narrative around it. Someone who puts the findings in a meaningful business context for the decision maker and provides them with clear recommendations on what to do.
If you cannot convince a Hiring Manager that you are a Data Scientist, then you are not one. Being able to convince a stakeholder of the correct decision to make, based on data you have analysed, is the key to the role.
I didn't say this at the time, for a couple of reasons. It seemed mean. I also didn't think it would be very smart to say what seemed like a mean thing to someone obviously much smarter and more qualified than I am.
To be clear, not everyone who can make data tell a story and convince a stakeholder to make a certain decision is a Data Scientist. They could be a Storyteller, a Sales Person, a Con Artist or any number of other roles. A Data Scientist usually applies a human lens to computing, mathematical and statistical data. They know how to program algorithms and use Machine Learning, and they understand cloud computing. There is a whole depth of mathematical and computing skills they need to combine with domain knowledge to get close to what a Data Scientist does, and many researchers have the aptitude and the expertise to become a great Data Scientist. Even so, it is not your credentials or your expertise that make you one.
Everything starts with the Business Problem. There is no point in investigating the data if there is no problem to solve. Finding this out is not as easy as it seems because often the business puts its own interpretation on what is going wrong, or what needs to be done to make it right. This introduces bias from the start. A good Data Scientist has to make sure they have really understood the starting point.
They would start with Data exploration: a description of how the data was sourced and processed before the exploratory data analysis (EDA), including setup, cleaning, examination and benchmarking, among others, as needed. This is followed by the EDA itself, which is a summary of the main characteristics of the data.
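As a rough illustration of the cleaning and EDA step, here is a minimal sketch using only the Python standard library. It assumes a single numeric column that arrives as raw text with blanks and typos. The function name `summarise` and the sample values are hypothetical, and in practice a library such as pandas would usually do this work.

```python
import math
import statistics


def summarise(raw_values):
    """Clean a column of raw strings and return basic EDA statistics.

    Hypothetical helper: blank, unparseable or non-finite entries count as missing.
    """
    cleaned = []
    missing = 0
    for value in raw_values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Blank or unparseable entries are treated as missing data
            missing += 1
            continue
        if not math.isfinite(number):
            # "nan" or "inf" parse as floats but are not usable measurements
            missing += 1
            continue
        cleaned.append(number)

    if not cleaned:
        return {"count": 0, "missing": missing}

    return {
        "count": len(cleaned),
        "missing": missing,
        "min": min(cleaned),
        "max": max(cleaned),
        "mean": statistics.mean(cleaned),
        "median": statistics.median(cleaned),
        # The sample standard deviation needs at least two values
        "stdev": statistics.stdev(cleaned) if len(cleaned) > 1 else 0.0,
    }


# Hypothetical sales figures exported as text, with a gap and a typo
raw = ["120", "135", "", "98", "n/a", "150", "142"]
print(summarise(raw))
```

Reporting the count of missing values alongside the summary statistics keeps the cleaning step visible. The next reviewer can see how much data was dropped before the analysis began.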
Then they would review the Analysis methods employed and their applicability to the problem at hand (the "How"), with a summary of any alternatives considered but not taken, as needed.
The Findings come from the quantitative results of the analysis (the "What").
These lead to the Conclusions, which are qualitative opinions of what the findings indicate or demonstrate; in other words, what is the meaning of the findings? (the "So What?")
Then come the Recommendations: actions the decision-makers or stakeholders should consider taking for business impact, or decisions they should make, as a result of the conclusions (the "Now What?").
All the data, the charts, the related or ancillary information that is outside the main flow goes into the appendixes... or is cut out altogether. The bulk of the work done to get to the recommendations is not even shown to the Stakeholders.
They would then be able to craft a coherent Introduction, made up of the actual Business Problem (what the current state is and what needs to change), the true Stakeholders, customers, partners and influencers involved, and a quick description of the structure above (which would actually follow the Introduction in a Data Science deliverable).
All of the above helps the Data Scientist construct the Executive Summary, which would appear at the very front of the report. It contains a brief summary, in 2-3 sentences, of the business problem and the stakeholders, customers, partners and influencers involved. This is followed by a Findings summary (the quantitative results of the analysis, in 2-3 sentences), a Conclusions summary (a qualitative opinion of what the findings indicate or demonstrate, in 2-3 sentences) and a Recommendations summary (what the decision-makers or stakeholders should consider doing for business impact, in 2-3 sentences).
So at most 12 sentences should carry the key information the Data Scientist gives to the Decision Maker. If you cannot communicate what needs to be done, and why, in those 12 sentences, then you are not a Data Scientist.
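The budget is four sections of 2-3 sentences each, which gives between 8 and 12 sentences. It can be checked mechanically. The sketch below assumes a naive rule that a sentence ends with `.`, `!` or `?` followed by whitespace or the end of the text. The class name `ExecutiveSummary`, the function names and the example text are hypothetical, not a prescribed template.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class ExecutiveSummary:
    """Hypothetical container for the four parts of an Executive Summary."""
    business_problem: str
    findings: str
    conclusions: str
    recommendations: str


def count_sentences(text):
    # Naive rule: split on ., ! or ? when followed by whitespace or end of text,
    # so decimals such as "3.5" are not treated as sentence breaks
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def check_budget(summary, low=2, high=3):
    """Check each section has low..high sentences; return (problems, total)."""
    problems = []
    total = 0
    for name, text in asdict(summary).items():
        n = count_sentences(text)
        total += n
        if not low <= n <= high:
            problems.append(f"{name}: {n} sentences, expected {low}-{high}")
    # Four sections of at most three sentences each cap the total at 12
    return problems, total


# Illustrative placeholder text, not real findings
summary = ExecutiveSummary(
    business_problem="Customer churn has risen this year. The retention team wants to know why.",
    findings="Churn is highest among customers on monthly plans. Support wait times grew over the same period.",
    conclusions="Slow support appears to be driving monthly customers away. Annual customers seem less sensitive to it.",
    recommendations="Prioritise support capacity for monthly plans. Review the results after one quarter. Consider incentives to move customers to annual plans.",
)
problems, total = check_budget(summary)
print(total, problems)
```

A sentence counter this simple will miscount abbreviations such as "e.g." that are followed by a space. It is only meant to make the arithmetic of the 12-sentence limit concrete.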
So, in answer to the question at the top of this article, "What does the Data say?", I'd reply: "The Data says nothing. It is the Data Scientist who explains what the Data says, and if they are able to put it in a meaningful context for the business, then the Stakeholders will invest in the change that is instigated as a result."
Data Scientists are world changers: people who have the power in their words to convince the governments and organisations that run the world to do something differently.
If you'd like to do one of the Learning Paths I have referenced, the links are below. If you can complete them before the end of June, I might be able to find you a free exam voucher; just message me on LinkedIn :-)
Michelle Sandford is a Developer Engagement Lead at Microsoft in Australia, where she works with emerging communities like Students, Data Scientists and AI/ML engineers.