Putting “The Science” back in Data Science

Novelties just a few years ago, the concepts of “big data”, “data science” and “analytics” are now embraced by business leaders. The increasing degree of digitization and the growing availability of data add to the importance of data science and analytics. People almost unanimously acknowledge that harvesting data through analytics methods will improve business performance in one way or another. Many understand well that analytics and data science will help them understand the behavior of critical business drivers, so that they can find ways to influence those drivers and create top- or bottom-line impact.

Despite this acknowledgement, the overall understanding of analytics solutions is mostly fixated on the two ends of a continuum. At both ends, people consider data science and analytics a set of heavily software- and algorithm-dominated solutions, rather than a methodology for discovering cause-effect relationships that improve decision-making processes and, hence, business performance.

On one end of the continuum, there is data visualization and the use of visualization tools. Data visualization is useful for creating dashboards and reporting tools. What is more, it is extremely helpful for discovering patterns in data. It allows us to see how a couple of key variables move together, such as advertising spending and sales over time. However, when the number of variables increases, it does not tell us much about how multiple drivers collectively affect a particular outcome. Hence, its benefit is limited.

The other end of the continuum is machine learning, with algorithms resulting from the study of huge data sets containing a myriad of tangled variables. Many credit scoring algorithms, online recommendation engines and the like work well thanks to developments in machine learning. A common problem with machine learning and similar methods, however, is that they create a black box: data comes in, and we get out a working algorithm based on established relationships among many variables. Yet it is rather difficult to understand what really drives the predicted outcome scores. There is little traceability and transparency to share with the rest of the organization. Another typical problem is that spurious correlations can easily distort the results of machine-learning algorithms. If some variables are accidentally correlated with each other without having any causal relationship, models end up with high predictive power without an easily understandable reason.

To sum up, machine-learning models may predict well, but when it comes to explaining why things work the way they do, they fall short. (There is also an ethical side to this, discussed in this piece.)
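
The spurious-correlation problem described above can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, the number of candidate variables and the function names are assumptions chosen for the demonstration, not figures from any real project. It generates one random "outcome" and many random "drivers" that are independent of it by construction, then reports the driver that happens to correlate most strongly with the outcome.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_obs=20, n_candidates=1000, seed=0):
    """Return (index, r) of the pure-noise variable most correlated with a pure-noise outcome.

    All values are independent random draws, so any correlation found
    here is accidental (hypothetical setup for illustration).
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_index, best_r = -1, 0.0
    for i in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(candidate, outcome)
        if abs(r) > abs(best_r):
            best_index, best_r = i, r
    return best_index, best_r


best_index, best_r = strongest_chance_correlation()
print(f"Noise variable #{best_index} correlates with the outcome at r = {best_r:.2f}")
```

With few observations and many candidate variables, the strongest accidental correlation is typically substantial even though no variable has any real relationship with the outcome. A model that selects variables purely on fit would pick up such a variable without any understandable reason behind it.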

What many data scientists and others who deal with data analytics are missing between the two ends of this continuum is good old scientific thinking and a hypothesis-based approach. In order to create real business benefits at the macro and mezzo levels, analytics professionals should incorporate scientific methodology into their way of thinking. For years, researchers in the social sciences have used quantitative research methods to uncover cause-effect relationships and to build a body of knowledge about the way the world works, from the perspective of their disciplines. The scientific, hypothesis-based approach not only helps explain how a set of drivers affects an outcome, but also helps uncover the mechanism that explains why this happens. Once these are clearly understood, we have a theory on which we can base predictions of the outcomes we are interested in.

Here is a hypothetical and somewhat mundane example of how science would work in business. Imagine a steak house chain with several thousand restaurants. The chain wants to increase the size of the orders its customers place per visit. It also knows that at some of its locations customers consistently place larger orders. The analysts develop different hypotheses about the reasons for the variation in order size. One hypothesis may concern the restaurant environment, which differs from city to city. The color of the environment may be one of the variables to put in a model. Once we have collected data on order sizes and restaurant colors, along with other variables, we run a statistical test. We find that environments decorated predominantly in red and yellow make people order more and eat more. Now we know that a causal relationship exists between the color of the restaurant environment and the amount of steak ordered. Still, we do not know the mechanism that explains why this relationship holds. The mechanism connecting the color of the environment to the feeling of hunger, and thus to larger food orders, could be certain hormones the human body produces when prompted by environmental color. To test this, we again need to collect data: on the hormones produced by the human body when exposed to different environmental colors, and on the hunger levels people feel under these conditions. Then we analyze this data to see whether the mechanism we describe holds. When all is done, we know something that can affect business performance: if you design your restaurants predominantly in red and yellow, you will get larger orders from your customers. We also know the reasoning (mechanism) behind it. Now we can communicate the decision to the rest of the stakeholders in the restaurant organization and convince them to invest in painting restaurant walls in the suggested colors, because we know what may happen, and why, when we do that.
Not least, with this approach we have permanently expanded the firm’s knowledge base about its business. We can multiply the number of examples of this sort. (For more, please read “Explaining Social Behavior: More Nuts and Bolts for the Social Sciences”.)
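
The "statistical test" step in the steak house example could look something like the sketch below. It is a minimal illustration under stated assumptions: the order sizes are simulated (the group means, spread and sample sizes are invented for the demonstration), and a two-sided permutation test on the difference in means is just one reasonable choice of test, not necessarily the one a real project would use. Note also that on observational data such a test only establishes that the difference is unlikely to be due to chance; ruling out other explanations requires controlling for the other variables or repainting randomly chosen restaurants.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, p-value). The p-value is the share of
    random relabelings whose difference is at least as extreme as the
    observed one, with a +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Simulated average order size per restaurant (hypothetical numbers).
data_rng = random.Random(42)
red_yellow_orders = [data_rng.gauss(36, 4) for _ in range(40)]
other_color_orders = [data_rng.gauss(30, 4) for _ in range(40)]

observed_diff, p_value = permutation_test(red_yellow_orders, other_color_orders)
print(f"Difference in mean order size: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value lets us reject the hypothesis that decor color and order size are unrelated in this sample. The second step of the example, testing the proposed hormonal mechanism, would repeat the same logic on a different data set.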

Applying scientific methodology puts humans and human knowledge at the center. It places the whole analytics process in a real business context. It uses results to facilitate human decision-making. After all, humans still run firms, and they make fundamental decisions in a social setting. Analytics should support key decision-making processes and enhance coordination among decision makers, rather than just producing a score from a black box. Knowing why and how things affect each other and create different outcomes is fundamental to influencing the underlying drivers of business performance. More importantly, communicating this to different stakeholders on the basis of facts helps avoid the conflicts and paralysis that come from repeatedly asking “why” and not getting an answer.

This approach is extremely useful in practice and produces good results. It complements and improves the outcomes of the aforementioned analytics methods. To illustrate, in one of our projects we created a process for a global client to run its marketing campaigns by employing basic scientific research methodology. By confirming or rejecting certain hypotheses with advanced analytics, we built a full story of customers’ motives, needs, and behavior. The story convincingly explained why and when customers do (or do not) buy. Combining the story with its product and market knowledge, the client team designed creative campaigns and decided what to do next for improved returns. In the end, we observed a very large increase in customer spending compared to what a recommendation engine based on association-rule algorithms produced. The whole process was fun, collaborative and instructive. The decision makers felt that they were in full control of their actions and their customer portfolio. It also added to our client’s body of knowledge about its customers, its markets and the marketing process itself.

To conclude, in this short piece I have tried to underscore the importance of incorporating the quantitative research methodology of the social sciences into analytics projects. It is important to consider this approach when solving business problems with analytics, because (1) it creates a deeper understanding of why and how things happen; (2) it puts human knowledge at the center and increases the absorptive capacity of the firm; (3) it helps improve organizational processes and routines; and (4) it may create higher returns. Data scientists working in companies or consulting firms, who are primarily mathematicians, computer scientists and engineers, should consider acquiring a basic understanding of quantitative research methods in the social sciences to make their solutions more human, transparent and easily communicable. Using these methods together with visualization techniques and machine-learning algorithms will definitely help put analytics practice on higher ground for solving business problems in the digital age.

Diego MacKee

Seasoned executive in IT/HiTech/Telco - AI Advisor and Practitioner - Board member - MSc and MBA - Multilanguage communicator

5 years ago

I like this article because it puts it all together. It also shows why, when trying to recruit somebody to help with machine learning, and more specifically with image recognition using supervised learning, one can so often end up hiring the wrong resource. I have found myself many times talking to candidates who are good at analytics and quantitative methods but haven’t yet understood the learning aspect of it, which you can only develop by also learning how to program. Interesting.

Thanks for the opinions. Good insights...
