IPL Fever ... Some stats gyan
How would you interpret Bradman's batting average of 99.94 and Tendulkar's 53.7? Does it mean that in every innings Bradman played he scored 99.94 runs, or that over every 10 innings Tendulkar scores 537 runs? Does it mean that if Bradman scored a zero, he will score about 200 in the next innings to get back to an average of 99.94? What can you say about the probability of Tendulkar scoring a 50? Is it 1, 0.75 or 0.5? What about Bradman?
The mean is a misleading number in this situation. The individual scores of a batsman have a distribution that is skewed to the right, which in simple words means that most of the scores are bunched at the low end, with a few very large scores pulling the mean up. Tendulkar scored 51 centuries over 329 innings, while Bradman scored 29 centuries over 80 innings. For variables with a skewed distribution, the median, the score that divides all the scores into two equal halves, is a much more representative number. For example, the median score for Bradman is 56.5. This means that out of the 80 innings Bradman played, 40 scores were below 56.5 and 40 above. For Tendulkar the median is 32, meaning that out of 329 innings, 164 were below 32 and the rest at or above it. The median also lets you say that there is a 50% chance that Bradman will score above 56, while for Tendulkar you can similarly say that there is a 50% chance that he will not cross 32.
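To make the mean versus median point concrete, here is a minimal sketch in Python. The scores are made up for illustration (they are not Bradman's or Tendulkar's actual innings), and the variable names are my own. It only shows how a couple of big scores drag the mean well above the median.

```python
from statistics import mean, median

# Hypothetical innings scores (NOT real data): mostly low scores
# plus a couple of big hundreds, i.e. a right-skewed distribution.
scores = [0, 4, 8, 12, 15, 21, 27, 33, 45, 60, 110, 185]

avg = mean(scores)    # pulled upwards by the two big scores
med = median(scores)  # the middle of the sorted scores

# Fraction of innings in which the batsman scored less than his own mean.
share_below_mean = sum(s < avg for s in scores) / len(scores)

print(f"mean = {avg:.1f}, median = {med}")
print(f"share of innings below the mean = {share_below_mean:.0%}")
```

In this toy example two thirds of the innings fall below the mean, which is why the mean says little about what to expect in any single innings.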
I think I can relate to 32 much better than to the average of 53.7, because the memories of each of the 51 centuries are not that strong, while the memories of the 164 scores below 32 are stronger.
Also, leaving the not-out innings out of the count when calculating the mean is not a very good way of adjusting for censoring (a not-out score is only a lower bound on what the batsman would have made). The median is much better here, as individual scores do not affect it much.
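A rough sketch of how not-out innings enter the numbers, again in Python and again with made-up innings rather than real data. The function names are hypothetical helpers. The conventional batting average divides total runs by the number of dismissals, so not-out innings add runs but not innings; the median here simply uses every recorded score as it stands.

```python
from statistics import median

# Hypothetical innings (NOT real data), each as (runs, not_out).
innings = [(5, False), (12, False), (30, True), (41, False),
           (0, False), (87, True), (150, False), (22, False)]

def batting_average(innings):
    """Conventional batting average: total runs / number of dismissals."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, not_out in innings if not not_out)
    return runs / dismissals  # assumes at least one dismissal

def runs_per_innings(innings):
    """Plain mean over all innings, treating not-outs as completed."""
    return sum(r for r, _ in innings) / len(innings)

def median_score(innings):
    """Median of the recorded scores, not-outs included as they stand."""
    return median(r for r, _ in innings)

print(f"batting average  = {batting_average(innings):.2f}")
print(f"runs per innings = {runs_per_innings(innings):.2f}")
print(f"median score     = {median_score(innings)}")
```

The two means differ noticeably depending on how the not-outs are counted, while the median barely cares which convention is used. A proper treatment of censoring would need survival-analysis methods, which this sketch does not attempt.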
We should use medians rather than means to rank batsmen on what they do in an individual innings. Even with this measure, Bradman remains the king!
Raw data downloaded from Cricinfo statsguru
Lead, Rewards People Insights
6y: This calls out for a Mann-Whitney U test to really see who is better ... just a thought :-)
Pharmaceutical Physician/Medical Writing
6y: Really liked the simple analogy using a comparison between the scores of Sachin and Bradman. Even in clinical trial data, when the sample is small or not normally distributed, the median (min-max) sometimes provides a better view of the results than the mean (SD). It is such small tricks that make report writing an art rather than a science.
SAS (Software), Statistical Programming |Open to Contract, Remote and Hybrid opportunities, | Statistical Programmer | Senior Statistical Programmer | San Francisco Bay Area |
6y: This is called batting, bowling and fielding with data ... nice perspective
Associate Director (Data Science), AI solutions at Scale for Pharma through Data and Tech
6y: Makes complete sense.