Most common question from marketing teams: Why Statistics?
Kunal Mehta
Global Data Platform Head | Product Owner | Associate Director | Data Analytics | Google Analytics | Adobe Analytics | Google Cloud Platform | Machine Learning | Data Science | Data Engineering | Speaker
For an awfully long time we have all tried to dodge statistics; at least I have. I still remember having to learn probability, permutations and combinations for the very first time in school, and it still gives me nightmares. But after 13 years, OK, 15, I fully realize the mistake I made. Statistics is the best way to formulate your hypothesis, test your hunch and draw meaningful insights from sampled data, rather than going all out with a 'we'll see how it goes' approach.
Just to give you an example, think about the entire history of the universe condensed into one calendar year. The time of the big bang: Jan 1, 12:00:01 AM. The last second that went by: Dec 31, 11:59:59 PM. A year where every month is about 1.15 billion years long and every day is nearly 40 million years long, where our Sun came into existence around the end of August and Earth formed in September. You probably get the idea; if not, watch the Cosmic Calendar segment presented by Neil deGrasse Tyson in Cosmos. Anyhow, the point is that in this one calendar year of such monumental scale, we started exploring the world of mathematics and science just 4 seconds ago. We, as a human race, started adding 1 and 1 just 4 seconds ago.
And now think about the facts: we have 2 spaceships that have crossed the bounds of our solar system, we have landed a probe on Saturn's moon Titan, we have unlocked DNA, we have mastered the Macarena. Yep, no small feat, in just 4 seconds of maths and science. We did it because we are probably the only species that not only creates patterns, but can also recognize them, draw insights from them and act on them. There was a guy sitting in front of a fireplace, or under an apple tree, whichever story suits you, with a pen and paper in his hand, who worked out the law of gravity. Funnily enough, it was another man sitting in front of another fireplace with the same equipment in his hands, a pen and a notepad, who discovered the flaw in that theory of gravity and created his own general theory of relativity.
All this was done because there was somebody who tried to understand the effect of one variable on another. Someone who tried to see these causal relations, compute the strength of the correlations, plot and project the estimated variations, predict future behavior and validate it with experiments within acceptable margins of error. In short, someone was able to apply statistics.
Now that we have established how important stats are, let's talk about how our marketing teams can cross-pollinate some of the same guiding principles to come up with tangible benefits for us all.
Markov Attribution Analysis: Andrey Markov published the original paper in 1906, and its applications are mind-numbingly important even today. The idea is that, for a process with the Markov property, the probability of the next state depends only on the current state, no matter how the current state was reached. That is extremely handy for attribution analysis, where we are trying to predict the next step a prospect is going to take. We calculate the probabilities of all these next steps until we reach the conversion, and then we can distribute the credit for the conversion on the basis of those probabilities, not on the basis of the last channel/source/dealer/interaction that happened just before the conversion. We all know that a conversion is the sum total of all the experiences and stimuli we provided to our prospects over their entire journey, not just the part of the journey right before the conversion. The most widely used models take only that last point of action into consideration. Not only is that wrong, it's like throwing 90% of your data out of the window and then praying that the 10% you are left with will somehow answer your business questions. It won't.
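To make the idea a little more concrete, here is a minimal sketch in Python. It is one common way to turn Markov transition probabilities into channel credit, the so-called removal effect, and not necessarily the exact method any particular tool uses. The journeys, the channel names and the function names are all invented for illustration.

```python
from collections import defaultdict

def transition_probs(paths):
    """Estimate first-order transition probabilities from observed journeys.

    paths: list of (list_of_channels, converted_flag) tuples.
    Every journey starts in 'start' and ends in 'conv' or 'null'.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = ["start"] + list(channels) + ["conv" if converted else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, sweeps=500):
    """Probability of eventually reaching 'conv' from 'start'.

    If `removed` is given, every step into that channel is treated as lost.
    The fixed number of sweeps is an arbitrary choice for this sketch.
    """
    p = defaultdict(float)   # unknown states (including 'null') count as 0
    p["conv"] = 1.0
    for _ in range(sweeps):
        for state, nxt in probs.items():
            if state != removed:
                p[state] = sum(w * p[t] for t, w in nxt.items() if t != removed)
    return p["start"]

def markov_attribution(paths):
    """Split the observed conversions across channels by removal effect."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {c for chans, _ in paths for c in chans}
    effect = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effect.values())
    conversions = sum(1 for _, converted in paths if converted)
    return {c: conversions * e / total for c, e in effect.items()}

# Four made-up journeys: two converted, two did not.
paths = [
    (["search", "email"], True),
    (["social", "search"], True),
    (["social"], False),
    (["email"], False),
]
credit = markov_attribution(paths)
for channel, value in sorted(credit.items(), key=lambda kv: -kv[1]):
    print(f"{channel}: {value:.2f} conversions")
```

Note that a last-touch model would split these two conversions evenly between email and search and give social nothing, while this sketch gives every channel that appears in a converting journey some share of the credit.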
Survival Analysis: This is where we try to estimate, statistically, how long it will take for a particular event to occur; in clinical studies, for example, the death of an organism, or subject, whatever you feel comfortable reading. In a nutshell, this analysis tries to predict the lifetime of someone or something up to the point where irreparable damage has been done and it's impossible to bring them back to life. Can we use this model to understand the 'lifetime' of our current customers? How long will we have them on our books, what is the expected retention period of an existing customer, and when can I expect my customers to stop engaging with my brand or product? Once I have that time frame in front of me, what actions can I take to change the predicted outcome? What campaigns can I launch to prolong the relationship? Think from a retailer's perspective, especially the big ones like Amazon, Flipkart, Snapdeal (OK sorry, I said big ones, not big failed ones), and you start getting an idea of how utterly useful this can be for brands that have been pumping in a lot of money to acquire customers and currently have little to no idea how long those customers will stay. The reason? Simple: they don't use statistics.
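As a rough illustration, the sketch below uses the Kaplan-Meier estimator, one standard survival-analysis technique (my choice for this example, there are others), to estimate what share of customers is still around after a given number of months. The customer data and the function names are made up. The key point is that customers who have not churned yet still count: they are "censored", not thrown away.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve.

    durations: months each customer has been observed.
    churned:   True if the customer left at that time, False if they were
               still active when we stopped observing (censored).
    Returns a list of (month, share_still_active) at each churn time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        leaving = sum(1 for d in durations if d == t)  # churned or censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def median_lifetime(curve):
    """First month at which half or fewer customers remain, else None."""
    return next((t for t, s in curve if s <= 0.5), None)

# Eight hypothetical customers: months observed and whether they churned.
months = [2, 3, 3, 5, 8, 8, 12, 12]
churned = [True, True, False, True, True, False, False, False]

curve = kaplan_meier(months, churned)
for month, share in curve:
    print(f"month {month}: {share:.1%} still active")
print("median lifetime (months):", median_lifetime(curve))
```

With a curve like this in hand, the retention campaign question becomes one of timing: you act before the months where the curve drops fastest.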
Linear Regression: Old? Yes, but darn solid. Marketers love their funnels. If they can, they make funnel charts about everything under the sun: how much sunlight left the Sun, how much reached the thermosphere, how much reached the stratosphere and then... well, you get the idea. So the logic goes: if you want 10% more sunlight in your street, how much more light do we need the Sun to emit? Because if we increase the visible light from the Sun, we get more sunlight down here on our planet, like on my porch, right? Brilliant, you are right, the logic is ironclad. Only one small glitch: we can't really change the amount of visible light the Sun emits. What's next? Let's try to list all the possible reasons for the low amount of sunlight we are receiving in our streets. Great, now that you have a list, let's run correlations and find the parameters that are most strongly related to the amount of sunlight. Great, now your list is shortened to 10 variables, so let's run a linear regression model on it. At the end of the day, this mathematical equation will tell us which factors are important, which factor has the largest effect, and how much we need to change one parameter, keeping everything else constant, to get more sunlight down here on our planet. Now, if you are still following, replace sunlight with revenue, profits or whatever you want, and you will see how massive this can be for us marketers. If we can't substantially increase the number of prospects walking into our stores or landing on our websites, let's see which variables and conditions affect our conversion rate, number of orders, revenue, average order size and so on. Then build the equation from all these variables to see what is affecting you the most, and how to improve those to have a direct impact on the top and bottom lines.
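Here is a small sketch of what "creating the equation" can look like, using ordinary least squares written in plain Python. The two drivers (email sends and discount percentage), the numbers and the function name are all hypothetical. The data is synthetic and generated from a known rule, orders = 10 + 2 x email sends + 5 x discount, so you can check that the regression recovers it.

```python
def fit_ols(rows, target):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows:   one tuple of driver values per observation.
    target: the outcome for each observation.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + list(r) for r in rows]          # leading 1.0 is the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, target))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic weekly data: (email sends in thousands, discount in percent)
drivers = [(1, 0), (2, 1), (3, 0), (4, 2), (5, 1), (6, 3)]
orders = [12, 19, 16, 28, 25, 37]   # generated as 10 + 2*emails + 5*discount

intercept, per_email, per_discount = fit_ols(drivers, orders)
print(f"orders = {intercept:.1f} + {per_email:.1f} * emails "
      f"+ {per_discount:.1f} * discount")
```

Each coefficient reads as "the change in orders for a one-unit change in that driver, keeping the other constant". Real data is noisy and the drivers usually sit on different scales, so in practice you would standardize them before comparing which one matters most.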
This is where I am going to end this article for now. I will follow up shortly with another one covering more such techniques, from simple to complex, to show how interdisciplinary theories can come to the rescue for all of us. We will also talk about how to implement some of the simpler statistical techniques and draw insights from them. I really hope you will follow along. If not, it's OK; I knew that mentioning probability, permutations and combinations right at the start was a mistake.