Mastering Outliers in Marketing Mix Modelling (MMM): What You Need to Know

Hello there, Dr Wei here! I know, I know—I haven’t discussed MMM for a while. Consider this a follow-up to our previous conversation about the impending ‘cookie monster’—the shift to a cookieless world and how MMM is emerging as a resilient alternative. Today, we’re diving into a crucial aspect of MMM that can make or break your analysis: outliers.

What Are Outliers?

Imagine you're at a gathering of average-height people, and suddenly, a professional basketball player walks in. At over seven feet tall, they tower over everyone else in the room. That height would be an outlier—something that stands out because it’s significantly different from the norm.

In most situations, our first instinct when encountering outliers is to either remove them (which is like asking the basketball player to leave the room) or to transform them so they fit in better with the rest of the data (perhaps by asking everyone else to stand on a chair!). While outliers might represent legitimate and important events, they can distort your results if not handled correctly. That’s why in MMM, we need to be a bit more nuanced in our approach to outliers—neither simply ignoring them nor letting them skew our insights.


What’s Different About Outliers in MMM?

In the context of MMM, outliers are often tied to specific marketing activities. These aren’t just occasional anomalies; they can be frequent and are sometimes integral to understanding your marketing efforts. Here are some examples by channel:

  • TV Advertising: Imagine the spikes in spending during Super Bowl ads. These are huge investments that don’t happen every day but are critical to understanding the impact of your TV campaigns.
  • Digital Advertising: Think of those sudden bursts of spending during product launches or flash sales. These spikes can dramatically skew your data if left unchecked.
  • Sponsorship: One-off large sponsorship deals, like sponsoring the Olympics or FIFA World Cup, can create massive outliers in your data.
  • Content Marketing: Viral content can cause unexpected spikes in engagement, leading to outliers that need careful consideration.
  • Online Marketing: Large investments during Black Friday or other major sales events are typical outliers in this channel.
  • Affiliates: High conversion rates from a specific partner can lead to unusually high payments, creating outliers.
  • Search Engine Marketing (SEM): High bids during competitive periods, such as holiday shopping seasons, can result in outliers in SEM spend.

These scenarios show that outliers are not just anomalies—they are often the result of significant marketing efforts that you can’t afford to ignore. But simply removing them isn’t the solution. So, what should you do?


Key Factors to Consider for Transformation

As a new MMM consultant, one of your first tasks is to figure out how to manage these outliers effectively. Before we dive into the specifics of different regression methods and transformations, let’s go over what you need to consider:

1. Distribution of the Data:

  • Skewness: Start by examining the distribution of your data. Is it right-skewed, with a long tail to the right, or left-skewed? In marketing, right-skewed data is quite common: a few periods of very high spending sit far above the typical level, often alongside peaks in sales or conversions. This pattern often reflects the positive impact of marketing investments, especially in channels where adstock effects allow the impact of the spend to carry over for a period of time. In these cases, transformations like log or square root are often effective in managing the skewness. Left-skewed data, where the tail is on the left, is less common in marketing and may require a different approach, which is beyond the scope of this article.


picture source: statology.org

  • Presence of Zero or Negative Values: Be mindful if your data includes zeros or negative numbers. Spend variables such as radio advertising often have periods with no spend (zero values), although they typically won’t have negative values. Log transformations can’t be applied directly to zero or negative values, so you may need to shift your data first, for example by using log(1 + x). Box-Cox has the same restriction, as it requires strictly positive values. If you want to avoid shifting, a Yeo-Johnson transformation, which extends Box-Cox to zero and negative values, might be a better option.
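As a rough illustration of the two checks above, the sketch below measures skewness on a made-up weekly radio spend series that includes zero-spend weeks, then applies log(1 + x). The `skewness` helper and the numbers are assumptions for illustration only, not data from a real campaign.

```python
import math

def skewness(values):
    """Population (Fisher-Pearson) skewness: positive means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical weekly radio spend: mostly modest, some zero weeks, one big spike.
spend = [0, 0, 12, 15, 10, 18, 0, 14, 16, 250]

# log1p(x) = log(1 + x) is defined at zero, so no-spend weeks map to 0.
log_spend = [math.log1p(v) for v in spend]

print(f"skewness before: {skewness(spend):.2f}")
print(f"skewness after:  {skewness(log_spend):.2f}")
```

On this toy series the skewness drops sharply after the transformation, while the zero weeks remain valid inputs.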

2. Magnitude of Outliers:

Consider how extreme your outliers are. Are they simply mild deviations, or do they represent significant spikes? Extreme outliers might benefit from methods like Winsorisation or Huber Regression, which address these anomalies without removing them. For less severe outliers, a milder transformation like square root or log might suffice.
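To make Winsorisation concrete, here is a minimal sketch that caps values at chosen percentiles instead of dropping them. The `winsorise` function, the 90th-percentile cap, and the spend figures are all assumptions for illustration; in practice the cap is a judgment call that depends on your data.

```python
def winsorise(values, lower_pct=0.05, upper_pct=0.95):
    """Clamp values outside the given percentiles to the percentile bounds."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two nearest ranks.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]

# Hypothetical weekly digital spend with one launch-week spike.
spend = [20, 22, 19, 25, 21, 23, 24, 20, 22, 400]

# Cap only the upper tail: nothing below the minimum is changed.
capped = winsorise(spend, lower_pct=0.0, upper_pct=0.9)
print(capped)
```

The spike week stays in the dataset, so the model still sees that something big happened, but its value is pulled down to the cap.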

3. Relationship Between Variables:

Look at the relationships between your variables. Are they linear or non-linear? Transformations such as log or Box-Cox can help linearise relationships, which is beneficial for regression models used in MMM. However, if your variables are already linear, you might opt for milder transformations to avoid distorting them.
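The linearising effect of a log transformation can be sketched with a toy example. The data below is generated from an assumed diminishing-returns curve (sales = 50 × spend^0.4), and the `ols` and `r_squared` helpers are hypothetical names written for this illustration.

```python
import math

def ols(x, y):
    """Slope and intercept of a simple least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Share of variance in y explained by the fitted straight line."""
    slope, intercept = ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data with diminishing returns: sales = 50 * spend^0.4
spend = [10, 20, 50, 100, 200, 500, 1000]
sales = [50 * s ** 0.4 for s in spend]

log_spend = [math.log(s) for s in spend]
log_sales = [math.log(s) for s in sales]

elasticity, _ = ols(log_spend, log_sales)
print(f"R^2 on raw scale:     {r_squared(spend, sales):.3f}")
print(f"R^2 on log-log scale: {r_squared(log_spend, log_sales):.3f}")
print(f"log-log slope:        {elasticity:.2f}")
```

In a log-log model the slope is read as an elasticity: a 1% increase in spend is associated with roughly a slope-% change in sales.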

4. Purpose of the Analysis:

Think about what you’re trying to achieve. Are you focused on making your model as interpretable as possible, or are you aiming for the highest predictive accuracy? For interpretability, you might prefer simpler transformations such as log, whose coefficients have a direct reading (for example, as percentage effects). For accuracy, more flexible options like Box-Cox, which estimates its power parameter from the data, could be necessary.

5. Multicollinearity:

Now, here’s a tricky little beast that can throw a wrench into your MMM analysis—multicollinearity. This is what happens when your variables start getting too cosy with each other, creating a tangled web of relationships that make it tough to figure out who’s actually driving the results.

For example, investments in different media channels—like TV, digital, and radio—often move in sync because they’re usually part of a coordinated campaign. Then you’ve got the economic indicators like GDP, CPI (Consumer Price Index), and interest rates, which tend to dance together since they all reflect the broader economic environment. When these variables are highly correlated, it can be a real headache trying to pinpoint the individual impact of each one.

One of the biggest challenges we face with multicollinearity is figuring out which channel is really pulling the strings. If TV and digital spending are both high, how do we know which one is driving sales? This uncertainty can lead to overestimating the impact of one channel while underestimating another, which can really mess up your budget allocation.

When you’re up against multicollinearity, tread carefully with your transformations. Techniques like log or Box-Cox can sometimes help by normalising the data or compressing the range, making the relationships less tangled. But watch out—sometimes these transformations can make the problem worse by introducing new correlations you didn’t see coming.

So, what’s the workaround? One approach is to aggregate those highly correlated variables before you start transforming them. For example, you could lump all your media spend into a single composite index or roll up the economic indicators into a broader economic index. This can help simplify your model, reduce the multicollinearity, and make it a lot easier to see what’s really going on. But fair warning—even with all this effort, completely untangling multicollinearity is no walk in the park. It’s one of those persistent challenges in MMM that we just have to navigate carefully.
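A quick way to see the problem and the aggregation workaround is sketched below. The two spend series are invented, and the `pearson` helper is a hypothetical name; with exactly two predictors the variance inflation factor (VIF) reduces to 1 / (1 − r²), and values above roughly 5 to 10 are a common rule-of-thumb warning sign.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical weekly spend for two channels run as one coordinated campaign.
tv = [100, 120, 90, 300, 110, 95, 280, 105]
digital = [40, 50, 35, 130, 45, 38, 115, 42]

r = pearson(tv, digital)

# With exactly two predictors, the variance inflation factor is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

# One possible workaround from the text: a single composite media variable.
media_total = [t + d for t, d in zip(tv, digital)]

print(f"correlation: {r:.3f}, VIF: {vif:.1f}")
print(f"composite media spend: {media_total}")
```

The trade-off is that a composite variable gives one combined coefficient, so you lose the ability to attribute the effect to TV or digital separately.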


6. Data Type and Scale:

Consider whether your data is continuous or categorical. Most transformations are designed for continuous data. If your data spans a wide range, transformations like log or Box-Cox can help compress the range and reduce the influence of extreme values.

7. Model Assumptions:

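Does your model assume normally distributed errors or constant variance (homoscedasticity)? If so, transformations like log or square root, which reduce skewness and stabilise variance, can be particularly helpful. Keep in mind that these assumptions concern the model’s residuals, so check the residuals after fitting rather than only the raw variables.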

8. Ease of Implementation:

Finally, consider the complexity and computational resources required. Simpler transformations like log or square root are easier to implement, especially in production environments, whereas methods like Box-Cox might require more preprocessing.
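To show where the extra preprocessing for Box-Cox comes from, the sketch below implements the transform and picks its power parameter (lambda) with a coarse grid search over the standard profile log-likelihood. The function names, the grid, and the spend values are assumptions for illustration; a log or square root needs none of this fitting step.

```python
import math

def boxcox(values, lam):
    """Box-Cox transform; requires strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(values)
    y = boxcox(values, lam)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)

# Hypothetical positive weekly spend with a heavy right tail.
spend = [12, 15, 10, 18, 14, 16, 22, 30, 45, 250]

# Coarse grid search over lambda in [-2, 2] in steps of 0.1.
grid = [i / 10 for i in range(-20, 21)]
best_lam = max(grid, key=lambda lam: boxcox_loglik(spend, lam))
transformed = boxcox(spend, best_lam)

print(f"chosen lambda: {best_lam}")
```

In a production pipeline the chosen lambda has to be stored and reapplied to new data, which is part of the added complexity compared with a fixed log or square root.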


Real-Life Scenarios of Outlier Handling

To bring these concepts to life, let's look at some real-world MMM scenarios:

1. Retail Industry: Optimising Holiday Campaigns

  • Scenario: A retail chain wants to optimise marketing spend during the holiday season, particularly around Black Friday and Christmas.
  • Outlier Handling: The retailer uses a log transformation to handle the right-skewed distribution of spend during these peak periods, ensuring that these spikes don’t distort the model.
  • Action: By reallocating part of the TV ad budget to online promotions, they achieve a 15% increase in holiday sales.


2. Financial Services: Promoting a New Credit Card with a Flexible Payment Scheme

  • Scenario: A payment company is launching a new credit card and wants to optimise customer acquisition across different channels.
  • Outlier Handling: Huber Regression is used to reduce the influence of large peaks in spend during special offers, ensuring the model remains robust.
  • Action: By reallocating budget from direct mail to online ads and social media, the company increases customer acquisition by 20% while reducing the cost per acquisition by 15%.
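For readers who want to see what Huber Regression does mechanically, here is a minimal sketch using iteratively reweighted least squares on a single predictor. It is not the model from the scenario: the data is invented, `weighted_ols` and `huber_fit` are hypothetical helpers, and the tuning constant 1.345 is the conventional default. In practice you would more likely reach for a library implementation such as scikit-learn's `HuberRegressor`.

```python
import statistics

def weighted_ols(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def huber_fit(x, y, delta=1.345, iterations=50):
    """Huber regression via iteratively reweighted least squares (IRLS)."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        slope, intercept = weighted_ols(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = max(statistics.median([abs(r) for r in resid]) / 0.6745, 1e-12)
        cutoff = delta * scale
        # Full weight inside the cutoff, shrinking weight beyond it.
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
    return slope, intercept

# Hypothetical weekly spend vs sign-ups; the last week is a special-offer spike.
spend = [10, 20, 30, 40, 50, 60, 70, 80]
signups = [25, 46, 64, 86, 104, 126, 144, 300]

ols_slope, _ = weighted_ols(spend, signups, [1.0] * len(spend))
huber_slope, _ = huber_fit(spend, signups)

print(f"OLS slope:   {ols_slope:.2f}")
print(f"Huber slope: {huber_slope:.2f}")
```

The first seven weeks follow a slope of about 2, and the Huber fit stays closer to that than ordinary least squares does. Note that the method down-weights weeks with large residuals, so the spike week is kept in the data but has less pull on the coefficients.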


3. Tech Industry: Launching a New Feature on a Dating App

  • Scenario: A popular dating app is rolling out a new feature that helps users find friends or business partners, in addition to romantic matches. To ensure the success of this new feature, the company needs to optimise its marketing mix across various channels.
  • Outlier Handling: Winsorisation is applied to cap extreme values in digital ad spend during major tech conferences and influencer promotions, preventing these spikes from skewing the results.
  • Action: By reallocating budget from traditional ads to influencer marketing and targeted social media campaigns, the app sees a 10% increase in feature adoption and engagement.


Final Thoughts

Handling outliers isn’t just about cleaning up your data—it’s about making sure your MMM analysis is accurate and reliable. By carefully considering the type of outliers, the nature of your data, and your specific goals, you can choose the right transformation method to improve your model’s performance.

#MarketingMixModelling #MMM #Outliers #DataTransformation #MarketingStrategy #DataDrivenInsights #DataScience


Join the Nerdy Marketing Scientists Community

If you’ve enjoyed diving into the world of Marketing Mix Modelling with me, why not stay connected? Follow me on LinkedIn for more insights, strategies, and the latest in marketing analytics. Let’s connect, share ideas, and grow our networks together. And hey, if you’re as passionate about marketing data science as I am, be sure to subscribe to my newsletter, Nerdy Marketing Scientists, where I explore all the nerdy details that help you stay ahead in the ever-evolving marketing landscape.

Stay informed, stay connected, and let’s keep the conversation going!



P.S. If you are unfamiliar with the transformation methods, I have made a table for you which provides a detailed comparison of different approaches, helping you decide which method is best suited for your specific scenario:


Author's summary


Pavan Kalyan Reddy B

Data Scientist | Machine Learning | Marketing Science

6 months ago

Hello, can we apply Adstock and Saturation to transformed variables directly? Also, how do we interpret the coefficients at the end when we apply a log transformation to the data?
