Out of the Box Insights with Boxplot

Chandralekha Ghosh

General Manager Accountability at Omnicom Media Group | Cross-Media Measurement & Audit | Proficient in Data Analytics & Statistical Modeling | Passionate About Reducing Media Waste and Enhancing Client Satisfaction

Published: February 26, 2023

Lazy Sunday. No wonder my mind started visualizing. Science says mental visualization is a powerful tool for boosting creativity. The same goes for data analysis: visualization is a powerful tool for bringing out facts that the numbers alone don't reveal.

How do we compare a set of variables? To describe and summarize a dataset, which measurement technique do we consider? Do we only look at the mean?

Let's explore a hypothetical scenario.

Suppose we have collected historical spot viewership rating points (TRPs) for various television channels that we want to analyze. Based on these historical TRPs, we'll select the most efficient television channel that meets the stringent CPRP (Cost Per Unit Rating Point) commitment stipulated in the agency-client contract.

Here we're focusing on one particular target audience, genre, and market. For simplicity's sake, assume three channels are shortlisted with identical Prime Time/Non-Prime Time dispersion, and that all three satisfy the desired reach % commitment. Hence, our only deciding factor will be the rating points (TRPs), which we normalized to a 10-second spot length. We also assume the average cost per spot is approximately the same for all three channels.

We collected the spot TRPs for these three channels: Channel X, Channel Y, and Channel Z. Let's look at the average TRP (mean):

Channel      Mean TRP
Channel X    0.50
Channel Y    0.31
Channel Z    0.84

Based on the mean, Channel Z is the clear winner. However, the picture is incomplete if we only look at the mean. Let's look at the standard deviation to measure the spread.

Channel      Std
Channel X    0.31
Channel Y    0.24
Channel Z    0.90

OK, so Channel Z has the highest mean and the highest standard deviation. But there's more to it that the numbers alone don't highlight. Here comes the boxplot! Boxplots visualize the distribution of observations through the five-number summary, showing the center (median) and the spread (IQR and range), and they identify potential outliers. So, here are our boxplots for all three channels:

The first thing to notice: there are a lot of outliers in the data, with many unusually high values, particularly for Channels Y and Z. Also, the distributions are not symmetric; they are heavily skewed to the right (this would be clearer with a histogram).
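The article does not say which rule its plot used to flag outliers. A common convention, and the default in Seaborn's boxplot, is to mark points lying more than 1.5 × IQR beyond the edges of the box. Here is a minimal sketch of that rule using only the standard library; the function names and the sample values are invented for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates between the observed data points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def tukey_outliers(values, whisker=1.5):
    """Return the values lying more than whisker * IQR beyond the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical spot TRPs, invented for illustration only.
sample_trps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 4.7]
summary = five_number_summary(sample_trps)
outliers = tukey_outliers(sample_trps)
print("five-number summary:", summary)
print("flagged as outliers:", outliers)
```

Different quartile conventions give slightly different box edges, so a value near the fence may be flagged by one tool and not by another.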

For instance, for Channel Z, the unusually high data points are pulling the mean up, so our first approach of using the mean (simple average) to summarize this data was not appropriate. Instead, we should look at the median, which is much less sensitive to outliers.
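To see how strongly a single spike moves the mean while barely touching the median, here is a small sketch. The TRP values are made up for illustration and are not the article's data.

```python
import statistics

# Hypothetical spot TRPs, invented for illustration only.
typical_spots = [0.2, 0.4, 0.5, 0.6, 0.8]
with_spike = typical_spots + [6.9]  # add one unusually high spot

mean_before = statistics.mean(typical_spots)
mean_after = statistics.mean(with_spike)
median_before = statistics.median(typical_spots)
median_after = statistics.median(with_spike)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

One extra spot roughly triples the mean here, while the median shifts only slightly.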

Channel      Median TRP
Channel X    0.62
Channel Y    0.24
Channel Z    0.64

When we compared means, the outliers made Channel Z appear far more attractive than Channel X. However, the tables turn when we use the median: Channel X now looks almost equally good. We can dig deeper to find the causes of the outliers. For example, a certain program may be generating very high TRPs because something related to the show has been in the news recently.

Secondly, Channel Z has the highest median TRP. However, its interquartile range (IQR, the length of the box) is also the highest. This indicates a higher spread: values for Channel Z are more dispersed. They lie further from the median, leading to a larger variance and standard deviation. Channel X, by contrast, shows a much smaller spread than Channel Z, indicating that most of its data points lie closer to its center.
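The IQR is simply the third quartile minus the first. As a quick check, the sketch below computes it from the quartiles reported in the article's summary statistics; only the variable names are my own.

```python
# (Q1, Q3) per channel, as reported in the article's summary statistics.
quartiles = {
    "Channel X": (0.21, 0.79),
    "Channel Y": (0.15, 0.39),
    "Channel Z": (0.19, 1.10),
}

# IQR = Q3 - Q1, the length of the box in a boxplot.
iqr = {channel: round(q3 - q1, 2) for channel, (q1, q3) in quartiles.items()}
widest = max(iqr, key=iqr.get)

for channel, value in iqr.items():
    print(f"{channel}: IQR = {value:.2f}")
print("widest box:", widest)
```

Channel Z has the longest box. Channel Y has the shortest, but its median is also the lowest of the three.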

So, in my opinion, Channel X could be the better choice: its median is nearly as high as Channel Z's, but with far less variance, which makes it more dependable. Who doesn't want a little normality in life?

Here are the Summary Statistics:

        Channel X   Channel Y   Channel Z
mean    0.50        0.31        0.84
std     0.31        0.24        0.90
min     0.00        0.00        0.00
25%     0.21        0.15        0.19    (first quartile)
50%     0.62        0.24        0.64    (median)
75%     0.79        0.39        1.10    (third quartile)
max     4.68        2.20        6.92
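The row labels above match what pandas' `describe()` reports, so a table like this was probably produced along the following lines. This is a sketch under that assumption: the column names `channel` and `trp` and all the values are hypothetical, not the article's dataset.

```python
import pandas as pd

# Hypothetical long-format data, one row per spot. Column names and
# values are invented for illustration, not the article's dataset.
spots = pd.DataFrame({
    "channel": ["X"] * 5 + ["Y"] * 5 + ["Z"] * 5,
    "trp": [0.2, 0.5, 0.6, 0.7, 0.9,
            0.1, 0.2, 0.2, 0.3, 1.8,
            0.1, 0.4, 0.6, 1.1, 5.2],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max.
# Transposing puts the statistics in rows and the channels in columns.
summary = spots.groupby("channel")["trp"].describe().T.round(2)
print(summary)
```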

So here we get a pretty comprehensive look at this data, which demonstrates how useful boxplots are for comparing sets of observations, and how the mean can be misleading in certain situations.

I know what you're thinking! Boxplots can hide some aspects of a distribution's shape, and histograms do a better job of displaying shape. Yes, but that's for another day!

PS: I used Python with Seaborn and Pandas to analyze the data and create the graph. Also, I haven't gone into the theoretical details of boxplots; you can visit the Khan Academy tutorials to brush up on the concept.

#statistic #datascience #data #stats #dataanalytics #python
