How Confounding Variables Produce Misleading Results: An Example With Super Mario Maker

Zach Nabavian

Data Analyst | Demand Forecasting | System Optimization

Published: March 14, 2023

While working on an exploratory data analysis project on Super Mario Maker, I tried to answer what I thought would be a simple question. In this article, I want to share how a confounding variable complicated that question and caused my initial conclusion to be misleading.

Super Mario Maker is a video game that allows users to create and upload their own Super Mario Bros levels. For this analysis, I wanted to see what variables influence how well received levels are.

One important caveat: the data I use here is from the original game, not the more current sequel Super Mario Maker 2.

To measure a level's reception, I calculated each level's "like rate": the number of people who liked a level as a proportion of the number of people who played it. My initial question was how difficulty relates to a level's like rate. Based on the graph below, the hardest "super-expert" levels look like the most well liked, with an average like rate of 20% versus just 11% for all other difficulties.
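As a minimal sketch of the like-rate calculation described above, here is how it might look in Python. The field names `likes`, `players`, and `difficulty` are hypothetical, not the dataset's actual column names, and averaging per-level rates within each difficulty is an assumption about how the averages were formed.

```python
from collections import defaultdict


def like_rate(likes: int, players: int) -> float:
    """Likes as a proportion of players; 0.0 when nobody has played."""
    if players <= 0:
        return 0.0
    return likes / players


def mean_like_rate_by_difficulty(levels):
    """Average the per-level like rates within each difficulty.

    `levels` is a list of dicts with hypothetical keys
    'difficulty', 'likes', and 'players'.
    """
    rates = defaultdict(list)
    for level in levels:
        rates[level["difficulty"]].append(
            like_rate(level["likes"], level["players"])
        )
    return {difficulty: sum(r) / len(r) for difficulty, r in rates.items()}


# Made-up example: 20 likes from 100 players.
print(f"{like_rate(20, 100):.0%}")  # prints 20%
```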

While it's easy to draw a straightforward conclusion from this graph, something is very off. If there truly were a linear relationship between difficulty and like rates, this graph would look like an upward staircase. Instead, like rates are nearly uniform across the board except for a huge bump for the hardest super-expert levels.

I decided to compare like rates to difficulty another way. A level's clear rate is the number of times someone successfully completes a level as a proportion of all attempts at that level. A lower clear rate means a higher difficulty. Below is a scatter plot comparing thousands of levels' clear rates to their like rates. I was surprised to see no correlation at all.
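One way to put a number on what the scatter plot shows is a Pearson correlation coefficient between clear rates and like rates. The sketch below is illustrative only: the function names are my own, and nothing here reproduces the article's actual data. A coefficient near zero would match a plot with no visible trend.

```python
import math


def clear_rate(clears: int, attempts: int) -> float:
    """Successful completions as a proportion of all attempts."""
    return clears / attempts if attempts > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(clear_rates, like_rates)` on two parallel lists, one entry per level, returns a value between -1 and 1.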

If the harder levels were more well liked, this graph would show a clear negative correlation. Instead, the scatter plot shows no visible relationship between like rates and difficulty. The question now is why super-expert levels seem so much more well liked, even though high difficulty isn't the cause. To find out, I decided to look at another variable.

When a user uploads a level, they can choose up to two unique tags to describe the level. Some tags are significantly more well liked on average than others, as seen below.

Some key takeaways: music levels, levels that play a song using background sound effects as a player progresses, are by far the most well-liked. To my surprise, Yoshi levels aren't as well-liked as others. But for the sake of this discussion, the following two observations are significant.

First, the color gradient on each bar indicates the average difficulty for each tag, with lighter colors indicating easier levels on average. It's clear that players don't have a preference one way or another for tags with a higher average difficulty. For instance, music levels, the easiest on average, and speedrun levels, among the hardest on average, are both some of the most well-liked tags.

Second, null tag levels, levels that weren't assigned any tag, have by far the lowest like rates. These null tag levels are extremely important because they make up over 90% of all levels. Therefore, they have a major impact on each difficulty's average like rate. I believe whether a level has a tag is the confounding variable that causes super-expert levels to have a surprisingly high average like rate.

While over 90% of levels have no tag, that percentage varies by difficulty. Among super-expert levels, just 83% of levels have no tag. In contrast, among the other difficulties, more than 95% of levels have no tag, as the graph below shows.

Since levels without a tag make up such a large proportion of all levels, they pull each difficulty's average like rate down toward their own low like rate. But since super-expert levels include proportionally fewer null tag levels, that downward pull on their average like rate is much weaker.
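The arithmetic behind this can be sketched as a weighted average. The two like rates below (10% for untagged levels, 40% for tagged levels) are hypothetical values chosen only for illustration, and the same rates are deliberately used for every difficulty. Only the untagged shares come from the figures above, with 95% standing in for "more than 95%". The sketch also assumes the overall average weights every level equally.

```python
def blended_like_rate(untagged_share: float,
                      untagged_rate: float,
                      tagged_rate: float) -> float:
    """Average like rate of a group that mixes untagged and tagged levels."""
    return untagged_share * untagged_rate + (1 - untagged_share) * tagged_rate


UNTAGGED_RATE = 0.10  # hypothetical like rate for levels with no tag
TAGGED_RATE = 0.40    # hypothetical like rate for levels with a tag

super_expert = blended_like_rate(0.83, UNTAGGED_RATE, TAGGED_RATE)
other = blended_like_rate(0.95, UNTAGGED_RATE, TAGGED_RATE)

print(f"super-expert: {super_expert:.1%}, other difficulties: {other:.1%}")
```

With these made-up rates, the blend comes out to roughly 15% versus 11.5%, even though difficulty has no effect on like rates anywhere in the calculation. The gap comes entirely from the different mix of tagged and untagged levels.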

My first graph shows a 20% average like rate for the hardest super-expert levels versus 11% for all other difficulties. When I say that the initial graph is misleading, I'm not saying that the numbers themselves are incorrect. I'm saying that these numbers can give a false impression. Without digging deeper, it's easy to conclude from the first graph that people simply prefer the hardest levels. But the huge difference between super-expert levels and other difficulties is really the result of an unequal distribution of null tag levels. Looking at the average like rates for each tag, people seem to prefer levels with a distinct identity, such as music and speedrun levels, regardless of difficulty. Levels without a tag, in contrast, tend to be relatively bland and therefore aren't as likely to be memorable.
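One common way to check for this kind of confounding is to stratify: compare like rates across difficulties separately for tagged and untagged levels. The sketch below is a hedged illustration, not the analysis behind the graphs in this article. The record layout (a single `tag` field that is `None` for untagged levels) and all the numbers are made up.

```python
from collections import defaultdict


def like_rate_by_stratum(levels):
    """Average per-level like rate for each (difficulty, has_tag) group."""
    groups = defaultdict(list)
    for level in levels:
        if level["players"] <= 0:
            continue  # skip levels nobody has played
        key = (level["difficulty"], level["tag"] is not None)
        groups[key].append(level["likes"] / level["players"])
    return {key: sum(rates) / len(rates) for key, rates in groups.items()}


# Made-up records for illustration only.
levels = [
    {"difficulty": "expert", "tag": None, "likes": 10, "players": 100},
    {"difficulty": "expert", "tag": "music", "likes": 40, "players": 100},
    {"difficulty": "super-expert", "tag": None, "likes": 10, "players": 100},
    {"difficulty": "super-expert", "tag": "speedrun", "likes": 40, "players": 100},
]

for (difficulty, has_tag), rate in sorted(like_rate_by_stratum(levels).items()):
    label = "tagged" if has_tag else "untagged"
    print(f"{difficulty:>12} | {label:<8} | {rate:.0%}")
```

If like rates within each stratum look similar across difficulties, as they do in this toy data, the gap in the unstratified averages is coming from the mix of tagged and untagged levels rather than from difficulty itself.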

Confounding variables can produce misleading results in any kind of analysis. This project was meant to be low-stakes fun, yet a confounding variable became a problem the moment I started analyzing this data in Tableau. This is the kind of problem people need to keep in mind for any kind of data analysis.

One final note: I want to reiterate that for this article I only used data from the original Super Mario Maker, not the more recent sequel. While levels with no tag heavily influenced average like rates in the original game, the sequel requires each level to have exactly two tags. Therefore, the results I presented are only applicable to the first game. Even in the sequel, though, I think it's more important to give a level its own distinct identity than to make it as hard as possible, but that's just my opinion.

Thanks for reading. To wrap up I have some links to my Tableau page, where I created the graphs featured in this article, my GitHub, where I uploaded the SQL scripts I used to clean and prepare the data, and my source for the initial dataset. My Tableau page also has other graphs I did not use for this article.

Tableau: https://public.tableau.com/app/profile/zachary.nabavian/viz/SuperMarioMakerAnalysis/LIkeRatesbyDifficulty

GitHub: https://github.com/Zach-Nabavian/Mario-Maker-SQL-Project

Original Dataset: https://www.kaggle.com/datasets/leomauro/smmnet
