Pokémon - An Analysis of Stats
Lisette M.
Customer Care Supervisor | E-Commerce Specialist | Data Analyst | Data Visualization | SQL | Tableau | Excel
The INTRODUCTION
Throughout my data journey so far, I've analyzed all sorts of data: sports data, manufacturing data, hospital/patient data, financial data and even school data. These datasets have been very helpful in growing my skills while allowing me to explore different industries. With this project, it was my time to choose the data.
For this project, I wanted to look at something involving the entertainment/gaming industry - something a bit more fun! Fortunately for me, I came across a Pokémon dataset that I wanted to work with.
Pokémon holds a very special place in my heart. Along with the Zelda franchise and World of Warcraft, it's one of the few video game series that I have followed for decades.
The DATA
The data consists of 13 columns of information on 1008 Pokémon from eight Generations and encompasses basic data including Base Stats, Types of Pokémon, Generation and whether or not the Pokémon is a Legendary/Mythic Pokémon.
The Data Dictionary can be found here: Pokémon Data Dictionary.
The information was scraped from Pokemon DB and Serebii, and I added the 7th and 8th generations from Bulbapedia. The original dataset is available on Kaggle: Pokémon Dataset.
With this dataset, I aimed to analyze the distribution of Pokémon types across the series, determine the strongest and weakest types, and identify the starting Pokémon with the highest strength by Generation.
The ANALYSIS
Using Python, I set out to explore the dataset. I imported the typical libraries that you'd find in data analysis: pandas, matplotlib, and seaborn as well as numpy.
Upon reading in the CSV file, I called head, inspected the columns, and checked how many rows and columns were in the dataset:
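A minimal sketch of this first look, not the original notebook code. The column names below are an assumption based on the 13-column layout commonly seen for this Kaggle dataset, and the three rows are typed in by hand so the example runs without the real file:

```python
import io

import pandas as pd

# Stand-in for the real CSV: three hand-typed rows in the assumed layout.
# With the actual file, pass its path to pd.read_csv instead.
csv_text = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())     # first rows of the data frame
print(df.columns)    # column labels
print(df.shape)      # (number of rows, number of columns)
```

Note that head is a method, while columns and shape are attributes, so only head takes parentheses.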
After checking what I was working with in terms of columns and rows, I wanted to see if there were any null values.
The results show that 519 Pokémon have a Null in the 'Type 2' column. Knowing the Pokémon series, this is not unexpected: many Pokémon have only a single Type, so it makes sense to see numerous Nulls in this column. There are no other Null values in this dataset, so it appears to be complete.
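The null check described above can be sketched as follows. The tiny data frame is a hand-typed stand-in, so the counts it prints are not the 519 reported for the real dataset:

```python
import pandas as pd

# Hand-typed stand-in rows; single-type Pokémon have no Type 2 value.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Type 2": ["Poison", None, None],
})

# isnull() marks missing cells; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)
```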
Cleaning the Dataset
This dataset is fairly basic and does not require a lot of cleaning. However, when looking at the head of the data frame, you can see that there are rows that contain the names of Mega Pokémon. Because we don't need these for this analysis, we need to remove them from the data frame.
Next, I made a new data frame without the Mega Pokémon evolutions and filled all the Null values in the Type 2 column with 0's:
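One way this cleaning step might look, as a sketch under two assumptions: Mega rows are named in the style "VenusaurMega Venusaur" (so they contain "Mega " with a trailing space), and the fill value is the string "0" rather than the number 0 so the column keeps a single text dtype:

```python
import pandas as pd

# Hand-typed stand-in rows, including one Mega row and one look-alike name.
df = pd.DataFrame({
    "Name": ["Venusaur", "VenusaurMega Venusaur", "Meganium", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", None, None],
})

# Match "Mega " with the trailing space so that Meganium is not dropped.
is_mega = df["Name"].str.contains("Mega ", regex=False)

# Keep everything that is not a Mega row; copy() avoids chained-assignment warnings.
pokemon = df[~is_mega].copy()

# Mark single-type Pokémon with "0" in the secondary type column.
pokemon["Type 2"] = pokemon["Type 2"].fillna("0")
print(pokemon)
```

Matching on the plain substring "Mega" would also remove Meganium, which is why the trailing space matters here.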
Moving onto the analysis, first I needed to find how many Pokémon per Generation. I defined a function to do this and displayed each Generation:
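A sketch of such a counting function. The name count_by_generation and the sample rows are hypothetical, chosen only for illustration:

```python
import pandas as pd


def count_by_generation(frame):
    """Return the number of rows per Generation, ordered by Generation."""
    return frame["Generation"].value_counts().sort_index()


# Hand-typed stand-in rows.
pokemon = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Chikorita", "Treecko", "Torchic", "Mudkip"],
    "Generation": [1, 1, 2, 3, 3, 3],
})

per_generation = count_by_generation(pokemon)
print(per_generation)
```

value_counts orders by frequency by default, so sort_index is what puts the Generations back in numeric order.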
Using matplotlib, I visualized the Generations:
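A bar chart of this kind could be drawn roughly as below. The counts are placeholder values, not the real per-Generation totals, and the Agg backend line is only there so the script also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt

# Placeholder values standing in for the real counts per Generation.
generations = [1, 2, 3]
counts = [2, 1, 3]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(generations, counts)
ax.set_xlabel("Generation")
ax.set_ylabel("Number of Pokémon")
ax.set_title("Pokémon per Generation")
ax.set_xticks(generations)  # one tick per Generation instead of fractional ticks
fig.tight_layout()
# In an interactive session, call plt.show() to display the figure.
```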
Next, we needed to find out how many different types of Pokémon there are. All Pokémon have a primary type, however not all have a secondary type. Here's the breakdown of the two:
From the two charts, you can see that for the primary type, Water is the most common, followed by Normal and Grass with Flying being the least common. However, when it comes to a Pokémon's secondary type, Flying is the most common. An example of this would be Bug-Flying or Dragon-Flying.
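The counts behind charts like these can be sketched with value_counts. The rows are a hand-typed sample, and the "0" placeholder for single-type Pokémon is an assumption carried over from the cleaning step:

```python
import pandas as pd

# Hand-typed stand-in rows; "0" marks Pokémon without a secondary type.
pokemon = pd.DataFrame({
    "Name": ["Squirtle", "Psyduck", "Pidgey", "Butterfree", "Charizard"],
    "Type 1": ["Water", "Water", "Normal", "Bug", "Fire"],
    "Type 2": ["0", "0", "Flying", "Flying", "Flying"],
})

# Every Pokémon has a primary type, so count the column directly.
primary_counts = pokemon["Type 1"].value_counts()

# For the secondary type, leave out the "0" placeholder before counting.
has_secondary = pokemon["Type 2"] != "0"
secondary_counts = pokemon.loc[has_secondary, "Type 2"].value_counts()

print(primary_counts)
print(secondary_counts)
```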
Why is Type important? Knowing a Pokémon's type is an essential aspect of the gameplay and strategy - it determines its strengths and weaknesses in battle.
Knowing the types of both your own Pokémon and your opponent's Pokémon is critical because certain types are more effective against others. For example, a Water type Pokémon will deal more damage to a Fire type Pokémon, while an Electric type Pokémon will be weak against Ground type Pokémon.
By understanding the strengths and weaknesses of different types, players can make more informed decisions when building their teams and choosing which Pokémon to use in battle.
Next, I wanted to visualize using a heatmap to show the distribution of Type 1 vs. Type 2:
You can see that the most common combination is Normal-Flying, followed by Grass-Poison and Bug-Flying. (Note: The first column represents Pokémon with a single type).
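A heatmap like the one described can be built from a cross-tabulation of the two type columns. This is a sketch on hand-typed rows, again assuming single-type Pokémon are marked with "0", which is why that label sorts into the first column:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-typed stand-in rows.
pokemon = pd.DataFrame({
    "Type 1": ["Normal", "Normal", "Grass", "Bug", "Water"],
    "Type 2": ["Flying", "Flying", "Poison", "Flying", "0"],
})

# Rows are primary types, columns are secondary types, cells are counts.
type_matrix = pd.crosstab(pokemon["Type 1"], pokemon["Type 2"])

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(type_matrix, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_title("Type 1 vs. Type 2")
fig.tight_layout()
```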
Stats Analysis
The heart of our analysis lies in the most intriguing part - examining the starter Pokémon followed by looking at the Legendary Pokémon.
What are starter Pokémon? With each release of a Pokémon game comes a new Generation of Pokémon. Each of these Generations has "starter" Pokémon - a set of three Pokémon of different types that you can choose from for your first Pokémon.
Let's take a look at which starter Pokémon, from each of the Generations (1 through 9), will give you the best start. A Google search will tell you which Pokémon are included as starters for each Generation. With this information, I created a new data frame by filtering for only the names of the starter Pokémon for each Generation and displayed them with a countplot using the Seaborn library.
The Total Stats are the sum of the following base stats: HP, Attack, Defense, Sp. Atk, Sp. Def and Speed:
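As a sketch of the filtering and summing just described: the stat column names are assumed to match the dataset, the rows are typed in by hand for the Generation 1 starters plus one non-starter, and starter_names is a hypothetical list that would be extended to cover every Generation:

```python
import pandas as pd

STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Hand-typed stand-in rows.
pokemon = pd.DataFrame(
    [
        ["Bulbasaur", 1, 45, 49, 49, 65, 65, 45],
        ["Charmander", 1, 39, 52, 43, 60, 50, 65],
        ["Squirtle", 1, 44, 48, 65, 50, 64, 43],
        ["Pikachu", 1, 35, 55, 40, 50, 50, 90],
    ],
    columns=["Name", "Generation"] + STAT_COLUMNS,
)

# Keep only the starters by name.
starter_names = ["Bulbasaur", "Charmander", "Squirtle"]
starters = pokemon[pokemon["Name"].isin(starter_names)].copy()

# Total Stats is the row-wise sum of the six base stats.
starters["Total"] = starters[STAT_COLUMNS].sum(axis=1)

# Pick the row with the highest Total within each Generation.
strongest = starters.loc[starters.groupby("Generation")["Total"].idxmax()]
print(strongest[["Generation", "Name", "Total"]])
```

idxmax returns only the first row when several starters tie on Total, so a Generation with tied starters would need a comparison against the group maximum instead.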
Based on Total Stats, the strongest starter Pokémon per Generation are:
What's interesting is that 56% of the Generations have starting Pokémon that share the same Total Base Stats. At first glance, one would assume it does not matter which Pokémon you choose. However, because the Type of Pokémon makes a difference in battle, it would be prudent to find out which kinds of Pokémon you would encounter in the wild and which Pokémon the first few Trainers you battle would have in their Pokédex. This sort of analysis is outside the scope of this project, but I would love to explore it at a later time!
Legendary Pokémon
Legendary Pokémon are a group of incredibly rare and often very powerful Pokémon, generally found in the legends and myths of the Pokémon world.
Let's have a look at these Pokémon and see how many of them there are as well as how their stats compare to non-legendary Pokémon.
First, I created a new data frame with only Legendary Pokémon and created a bar chart to visualize this using matplotlib and seaborn:
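A minimal sketch of the Legendary filter, assuming the Legendary column holds booleans. Counting by Generation is only one guess at what the bar chart showed, and the rows are hand-typed:

```python
import pandas as pd

# Hand-typed stand-in rows.
pokemon = pd.DataFrame({
    "Name": ["Articuno", "Zapdos", "Moltres", "Dragonite", "Cosmog", "Kubfu"],
    "Generation": [1, 1, 1, 1, 7, 8],
    "Legendary": [True, True, True, False, True, True],
})

# A boolean column can be used directly as a row mask.
legendary = pokemon[pokemon["Legendary"]]

# Counts per Generation, ready to hand to a bar chart.
legendary_per_generation = legendary["Generation"].value_counts().sort_index()
print(legendary_per_generation)
```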
Now let's compare the Total Stats of Legendary vs. non-legendary Pokémon. I did this using a scatterplot:
The majority of Legendary Pokémon have higher Total Base Stats, however there are some outliers: two of the Psychic type Pokémon, Cosmog and Cosmoem, as well as the Fighting type Pokémon Kubfu, have lower stats than expected for a Legendary Pokémon.
The distribution is also clear: there are many more blue dots than orange dots, which is to be expected, since Legendary Pokémon are intended to be rare.
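As a numeric complement to the scatterplot, the comparison can be sketched with medians. This is not the plotting code itself; the rows are hand-typed, and using the non-legendary median as the cutoff for a "low" Legendary is an arbitrary choice made for illustration:

```python
import pandas as pd

# Hand-typed stand-in rows.
pokemon = pd.DataFrame({
    "Name": ["Pidgey", "Squirtle", "Dragonite", "Cosmog", "Zapdos", "Mewtwo"],
    "Total": [251, 314, 600, 200, 580, 680],
    "Legendary": [False, False, False, True, True, True],
})

is_legendary = pokemon["Legendary"]

# Typical Total for each group.
regular_median = pokemon.loc[~is_legendary, "Total"].median()
legendary_median = pokemon.loc[is_legendary, "Total"].median()

# Legendary Pokémon that fall below the typical non-legendary Total.
low_legendaries = pokemon[is_legendary & (pokemon["Total"] < regular_median)]

print(regular_median, legendary_median)
print(low_legendaries["Name"].tolist())
```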
Next, let's look at the distribution of primary types for the Legendary Pokémon.
Interestingly, Legendary Pokémon tend to be Psychic type!
Distribution of Stats
Looking at the Legendary Pokémon stats will let us see how each parameter is distributed. We will look at the following stats: Attack, Defense, Sp. Atk, Sp. Def and Speed. Using distplot in Seaborn, we can plot how each stat would be estimated using histograms and a kernel density estimate plot estimating the PDF (probability density function) of the parameters:
The above histograms show the following stats are likely for a Legendary Pokémon:
Non-legendary Pokémon - Types per Generation
Earlier, I had looked at the Pokémon Types overall. I wanted to break it down a bit further to see just how the types varied by Generation.
To do this, I needed to make a non-legendary data frame.
Each Generation presents a predominant primary Type.
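One way to get the predominant primary Type per Generation, sketched on hand-typed rows and assuming a boolean Legendary column:

```python
import pandas as pd

# Hand-typed stand-in rows.
pokemon = pd.DataFrame({
    "Name": ["Squirtle", "Psyduck", "Pidgey", "Mewtwo",
             "Totodile", "Marill", "Sentret", "Lugia"],
    "Type 1": ["Water", "Water", "Normal", "Psychic",
               "Water", "Water", "Normal", "Psychic"],
    "Generation": [1, 1, 1, 1, 2, 2, 2, 2],
    "Legendary": [False, False, False, True, False, False, False, True],
})

# Drop the Legendary rows first.
non_legendary = pokemon[~pokemon["Legendary"]]

# Rows are Generations, columns are primary types, cells are counts.
type_by_generation = pd.crosstab(non_legendary["Generation"], non_legendary["Type 1"])

# For each Generation, the column label with the largest count.
predominant_type = type_by_generation.idxmax(axis=1)
print(predominant_type)
```

The same pattern applies to the secondary type by swapping in the Type 2 column and leaving out the single-type placeholder.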
Type 1
You can see that Water is the most common primary Type in the majority of Generations, with five out of nine Generations listing Water as their main Type.
Type 2
Note: The first column, "0", represents Pokémon that have a single primary Type. This column is not included in the below list.
As secondary Types go, the most likely Type you will find is Flying, with five out of nine Generations listing Flying as the most common secondary Type.
Both of these echo the same overall findings as originally noted at the beginning of this analysis.
Dragon Type Pokémon
Lastly, I wanted to look at a specific type of Pokémon: Dragon type. Dragon type Pokémon are considered special in that they are near legendary and thus have higher than average base stats.
I created a new data frame to identify non-legendary Dragon type Pokémon and visualized them by Generation:
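A sketch of the Dragon filter on hand-typed rows. Counting a Pokémon as Dragon type when either its primary or secondary type is Dragon is an assumption, since the text does not say which column was used:

```python
import pandas as pd

# Hand-typed stand-in rows; "0" marks Pokémon without a secondary type.
pokemon = pd.DataFrame({
    "Name": ["Dratini", "Dragonair", "Dragonite", "Zapdos", "Rayquaza", "Pidgey"],
    "Type 1": ["Dragon", "Dragon", "Dragon", "Electric", "Dragon", "Normal"],
    "Type 2": ["0", "0", "Flying", "Flying", "Flying", "Flying"],
    "Total": [300, 420, 600, 580, 680, 251],
    "Generation": [1, 1, 1, 1, 3, 1],
    "Legendary": [False, False, False, True, True, False],
})

# Dragon in either type slot, and not Legendary.
is_dragon = (pokemon["Type 1"] == "Dragon") | (pokemon["Type 2"] == "Dragon")
dragons = pokemon[is_dragon & ~pokemon["Legendary"]]

# Non-legendary Dragons whose Total beats a chosen Legendary, here Zapdos.
zapdos_total = pokemon.loc[pokemon["Name"] == "Zapdos", "Total"].iloc[0]
stronger_than_zapdos = dragons[dragons["Total"] > zapdos_total]

print(dragons[["Name", "Generation", "Total"]])
print(stronger_than_zapdos["Name"].tolist())
```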
You can see the difference between Legendary (red dots) and non-legendary Dragon types (blue dots). The Pokémon in the top right is Dragonite, which has better base stats than the first three Legendary Pokémon: Zapdos, Articuno and Moltres.
The CONCLUSION
This notebook was created with nostalgia in mind. I grew up with Pokémon, back in the 90's, and have loved them ever since! Working on this dataset was a lot of fun. It tested my abilities to work with Python and create visualizations of the dataset.
This dataset was quite small, so Excel would have been a perfect fit for analyzing it. However, I wanted to focus on showing the skills I've learned over the past 10 weeks in Avery Smith's DAA bootcamp.
Although this dataset was based on video game characters and their stats, the premise can be translated across any industry. The Pokémon can be thought of as products, and the stats can be swapped for features. It can also be translated to marketing campaigns and e-commerce, as well as supply chain: how do the products perform? There are endless possibilities, and it's exciting how much can be discovered simply by looking at the data, asking questions and seeing where it takes you!
Thanks so much for taking the time to read this analysis! If you'd like to see more of my journey, please follow me, Lisette Mohammed, to stay connected!