How Confounding Variables Produce Misleading Results: An Example With Super Mario Maker
While working on an exploratory data analysis project on Super Mario Maker, I tried to answer what I thought would be a simple question. In this article, I want to share how a confounding variable complicated that question and caused my initial conclusion to be misleading.
Super Mario Maker is a video game that allows users to create and upload their own Super Mario Bros levels. For this analysis, I wanted to see what variables influence how well received levels are.
One important caveat: the data I use here is from the original game, not the more current sequel Super Mario Maker 2.
To measure a level's reception, I calculated each level's "like rate": the number of people who liked a level as a proportion of the number of people who played it. My initial question was what relationship difficulty has with a level's like rate. Based on the graph below, it looks like the hardest "super expert" levels are the most well liked, with an average like rate of 20% versus just 11% for all other difficulties.
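As a rough illustration of the like rate calculation, here is a minimal Python sketch. The function names and the sample rows are hypothetical and made up for this example; they are not taken from the dataset or from the SQL scripts used for the analysis.

```python
from collections import defaultdict

def like_rate(likes, players):
    """Likes as a proportion of players; 0.0 if nobody has played the level."""
    return likes / players if players else 0.0

def mean_like_rate_by_difficulty(levels):
    """Average the per-level like rates within each difficulty."""
    rates = defaultdict(list)
    for difficulty, likes, players in levels:
        rates[difficulty].append(like_rate(likes, players))
    return {d: sum(r) / len(r) for d, r in rates.items()}

# Made-up rows for illustration only: (difficulty, likes, players)
sample_levels = [
    ("easy", 11, 100),
    ("easy", 5, 50),
    ("super expert", 20, 100),
    ("super expert", 10, 50),
]
averages = mean_like_rate_by_difficulty(sample_levels)
```

Note that this sketch averages the per-level rates. Pooling all likes and all players within a difficulty before dividing is another reasonable choice, and it can give a different number when levels vary a lot in popularity.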
While it's easy to draw a straightforward conclusion from this graph, something is very off. If like rates truly rose steadily with difficulty, this graph would look like an upward staircase. Instead, like rates are nearly uniform across the board, except for a huge bump for the hardest super expert levels.
I decided to compare like rates to difficulty another way. A level's clear rate is the number of times players successfully complete a level as a proportion of all attempts at that level. A lower clear rate means a higher difficulty. Below is a scatter plot comparing thousands of levels' clear rates to their like rates. I was surprised to see no correlation at all.
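To put a number on what a scatter plot like this shows, one option is the Pearson correlation coefficient between clear rate and like rate. The sketch below is only an illustration in plain Python; the helper names are hypothetical, and it is not how the original charts were built.

```python
import math

def clear_rate(clears, attempts):
    """Successful clears as a proportion of all attempts; 0.0 if never attempted."""
    return clears / attempts if attempts else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)
```

A value near -1 would mean that levels with lower clear rates (harder levels) tend to have higher like rates, while a value near 0 matches a scatter plot with no visible trend.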
If harder levels were more well liked, this graph would show a clear negative correlation between clear rate and like rate. But the scatter plot shows no visible relationship between like rates and difficulty. The question now is why super expert levels seem so much more well liked, even though high difficulty isn't the cause. To find out, I looked at another variable.
When a user uploads a level, they can choose up to two unique tags to describe the level. Some tags are significantly more well liked on average than others, as seen below.
Some key takeaways: music levels, levels that play a song using background sound effects as a player progresses, are by far the most well-liked. To my surprise, Yoshi levels aren't as well-liked as others. But for the sake of this discussion, the following two observations are significant.
First, the color gradient on each bar indicates the average difficulty for each tag, with lighter colors indicating easier levels on average. It's clear that players don't have a preference one way or another for tags with a higher average difficulty. For instance, music levels, the easiest on average, and speedrun levels, among the hardest on average, are both some of the most well-liked tags.
Second, null tag levels, levels that weren't assigned any tag, have by far the lowest like rates. These null tag levels are extremely important because they make up over 90% of all levels. Therefore, they have a major impact on each difficulty's average like rate. I believe that whether a level has a tag is the confounding variable that causes super expert levels to have a surprisingly high average like rate.
While over 90% of levels have no tag, that percentage varies across each difficulty. Among super expert levels, just 83% of levels have no tag. In contrast, among the other difficulties, more than 95% of levels have no tag, as the graph below shows.
Since levels without a tag make up such a large proportion of all levels, they pull each difficulty's average like rate down toward their own low like rate. But since super expert levels include relatively fewer null tag levels, their average is pulled down much less.
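The reasoning above is a weighted average: a difficulty's average like rate is a blend of the untagged and tagged like rates, weighted by how common each kind of level is. The sketch below uses the untagged shares quoted in this article (83% for super expert, and 95% as a stand-in for "more than 95%" elsewhere). The two like rates are invented for illustration, not measured values, so the output shows only the direction of the effect, not its real size.

```python
def blended_like_rate(untagged_share, untagged_rate, tagged_rate):
    """Average like rate of a group that mixes untagged and tagged levels."""
    return untagged_share * untagged_rate + (1 - untagged_share) * tagged_rate

# Assumed like rates, identical for every difficulty (not real data).
UNTAGGED_RATE = 0.10
TAGGED_RATE = 0.30

# Untagged shares taken from the article's figures.
super_expert = blended_like_rate(0.83, UNTAGGED_RATE, TAGGED_RATE)
other = blended_like_rate(0.95, UNTAGGED_RATE, TAGGED_RATE)

print(f"super expert: {super_expert:.3f}")  # 0.134
print(f"other:        {other:.3f}")         # 0.110
```

In this toy setup, difficulty has no effect at all on how much players like a level, yet the super expert average still comes out higher simply because fewer of its levels are untagged.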
My first graph shows a 20% average like rate for the hardest super expert levels versus 11% for all other difficulties. When I say that the initial graph is misleading, I'm not saying that the numbers themselves are incorrect. I'm saying that these numbers can give a false impression. Without digging deeper, it's easy to conclude from the first graph that people simply prefer the hardest levels. But the huge difference between super expert levels and other difficulties is really the result of the unequal distribution of null tag levels. Looking at the average like rates for each tag, people seem to prefer levels with a distinct identity, such as music and speedrun levels, regardless of difficulty. Levels without a tag, in contrast, tend to be relatively bland and therefore aren't as likely to be memorable.
Confounding variables can produce misleading results in any kind of analysis. This project was meant to be fun and low-stakes, yet a confounding variable became a problem the moment I started analyzing this data in Tableau. This is the kind of problem people need to keep in mind for any kind of data analysis.
One final note: I want to reiterate that for this article I only used data from the original Super Mario Maker, not the more recent sequel. While levels with no tag heavily influenced average like rates in the original game, the sequel requires each level to have exactly two tags. Therefore, the results I presented are only applicable to the first game. Even in the sequel, though, I think it's more important to give a level its own distinct identity than to make it as hard as possible, but that's just my opinion.
Thanks for reading. To wrap up, here are links to my Tableau page, where I created the graphs featured in this article; my GitHub, where I uploaded the SQL scripts I used to clean and prepare the data; and my source for the initial dataset. My Tableau page also has other graphs I did not use for this article.
Tableau: https://public.tableau.com/app/profile/zachary.nabavian/viz/SuperMarioMakerAnalysis/LIkeRatesbyDifficulty
GitHub: https://github.com/Zach-Nabavian/Mario-Maker-SQL-Project
Original Dataset: https://www.kaggle.com/datasets/leomauro/smmnet