Pokémon - An Analysis of Stats

The INTRODUCTION

Throughout my data journey so far, I've analyzed all sorts of data: sports data, manufacturing data, hospital/patient data, financial data and even school data. These datasets have been very helpful in growing my skills while allowing me to explore different industries. With this project, it was my turn to choose the data.

For this project, I wanted to look at something involving the entertainment/gaming industry - something a bit more fun! Fortunately for me, I came across a Pokémon dataset that I wanted to work with.

Pokémon holds a very special place in my heart. Along with the Zelda franchise and World of Warcraft, it's one of the few video game series that I have followed for decades.

The DATA

The data consists of 13 columns of information on 1008 Pokémon from nine Generations. It covers basic data including Base Stats, Types of Pokémon, Generation and whether or not the Pokémon is a Legendary/Mythic Pokémon.

The Data Dictionary can be found here: Pokémon Data Dictionary.

The information was scraped from Pokemon DB and Serebii, and I added the 7th and 8th generations from Bulbapedia. The original dataset is available on Kaggle: Pokémon Dataset.

With this dataset, I aimed to analyze the distribution of Pokémon types across the series, determine the strongest and weakest types, and identify the starter Pokémon with the highest strength in each Generation.


The ANALYSIS

Using Python, I set out to explore the dataset. I imported the typical libraries that you'd find in data analysis: pandas, matplotlib, and seaborn as well as numpy.

[Image: Import Libraries]

Upon reading in the CSV file, I called head(), inspected the columns attribute, and checked how many rows and columns were in the dataset:

[Image: head(), columns, and the printed row/column counts]
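
Since the screenshot is not available here, the following is a minimal sketch of this inspection step. It reads a tiny inline stand-in instead of the real CSV file, and the column names (Name, Type 1, Type 2, Total, Generation, Legendary) are assumptions rather than the dataset's confirmed headers.

```python
import io

import pandas as pd

# Tiny inline stand-in for the real CSV; the column names are assumptions.
csv_text = """Name,Type 1,Type 2,Total,Generation,Legendary
Bulbasaur,Grass,Poison,318,1,False
Charmander,Fire,,309,1,False
Squirtle,Water,,314,1,False
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())           # head() is a method, so it takes parentheses
print(list(df.columns))    # columns is an attribute, so no parentheses
n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```

With the real file, the only change would be passing the file path to `pd.read_csv` instead of the `io.StringIO` wrapper.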

After checking what I was working with in terms of columns and rows, I wanted to see if there were any null values.

[Image: Function to find Null values]

The results show that 519 Pokémon have a Null in the 'Type 2' column. With domain knowledge of the Pokémon series, this is not unexpected: many Pokémon have only a single Type, so numerous Nulls in this column make sense. There are no other Null values in this dataset, so it appears to be complete.
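
One common way to get a per-column Null count in pandas is `isnull().sum()`. The sketch below uses a small hand-made frame, so the counts will not match the 519 reported above, and the column names other than 'Type 2' are assumptions.

```python
import numpy as np
import pandas as pd

# Hand-made sample; single-type Pokémon have no secondary type.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Charizard"],
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)
```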


Cleaning the Dataset

This dataset is fairly basic and doesn't require a lot of cleaning. However, when looking at the head of the data frame, you can see that there are rows that contain the names of Mega Pokémon. Because we don't need these for this analysis, we need to remove them from the data frame.

[Image: Defining a function to find Mega Pokémon]
[Image: Calling the custom function to retrieve the list of names of Pokémon containing "Mega"]

Next, I made a new data frame without the Mega Pokémon evolutions and filled all the Null values in the Type 2 column with 0's.
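
A minimal sketch of these two cleaning steps follows. It assumes Mega forms carry the word "Mega" followed by a space in the Name column (for example "Mega Venusaur"), which may not match the real naming. It also fills with the string "0" rather than the number 0 so the column keeps a single type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Venusaur", "Mega Venusaur", "Charmander", "Meganium"],
    "Type 1": ["Grass", "Grass", "Fire", "Grass"],
    "Type 2": ["Poison", "Poison", np.nan, np.nan],
})

# Require whitespace after "Mega" so Meganium, a regular Pokémon, is kept.
is_mega = df["Name"].str.contains(r"Mega\s", regex=True)

# Keep everything that is not a Mega form, then fill the missing Type 2 values.
no_mega = df[~is_mega].copy()
no_mega["Type 2"] = no_mega["Type 2"].fillna("0")
print(no_mega)
```

A plain substring match on "Mega" would also drop Meganium, so it is worth checking the matched names before removing anything.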




Moving on to the analysis, I first needed to find how many Pokémon there are per Generation. I defined a function to do this and displayed each Generation:

[Image: Custom function to count how many Pokémon by Generation]
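
The custom function itself is not shown here. As one possible equivalent, `value_counts()` on an assumed Generation column gives the same kind of per-Generation tally. The rows below are a hand-picked sample, not the real counts.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle",
             "Chikorita", "Cyndaquil", "Treecko"],
    "Generation": [1, 1, 1, 2, 2, 3],
})

# Count rows per Generation, then order the result by Generation number.
per_generation = df["Generation"].value_counts().sort_index()
print(per_generation)
```

The resulting Series can be passed straight to a bar chart, for example with `per_generation.plot(kind="bar")`.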


Using matplotlib, I visualized the Generations:

[Image: Bar chart of Pokémon per Generation]




Next, we needed to find out how many different types of Pokémon there are. All Pokémon have a primary type, however not all have a secondary type. Here's the breakdown of the two:

[Image: Primary Pokémon Types]

[Image: Secondary Pokémon Types]

From the two charts, you can see that for the primary type, Water is the most common, followed by Normal and Grass, with Flying being the least common. However, when it comes to a Pokémon's secondary type, Flying is the most common. Examples of this would be Bug-Flying or Dragon-Flying.
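
The two breakdowns can be sketched with `value_counts()` on each type column. The column names are assumptions and the rows are a small illustrative sample, so the ranking shown is not the dataset's.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Squirtle", "Psyduck", "Poliwag", "Pidgey",
             "Butterfree", "Charizard", "Rattata"],
    "Type 1": ["Water", "Water", "Water", "Normal",
               "Bug", "Fire", "Normal"],
    "Type 2": [None, None, None, "Flying", "Flying", "Flying", None],
})

# value_counts() sorts from most to least common.
primary_counts = df["Type 1"].value_counts()

# Missing values are skipped by default, so single-type Pokémon drop out here.
secondary_counts = df["Type 2"].value_counts()

print(primary_counts)
print(secondary_counts)
```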

Why is Type important? Knowing a Pokémon's type is an essential aspect of the gameplay and strategy - it determines its strengths and weaknesses in battle.

Knowing the types of both your own Pokémon and your opponent's Pokémon is critical because certain types are more effective against others. For example, a Water type Pokémon will deal more damage to a Fire type Pokémon, while an Electric type Pokémon will be weak against Ground type Pokémon.

By understanding the strengths and weaknesses of different types, players can make more informed decisions when building their teams and choosing which Pokémon to use in battle.


Next, I used a heatmap to show the distribution of Type 1 vs. Type 2 combinations:

[Image: Heatmap for distribution]

You can see that the most common combination is Normal-Flying, followed by Grass-Poison and Bug-Flying. (Note: The first column represents Pokémon with a single type.)
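
A heatmap like this needs a table of counts for every Type 1 and Type 2 pair. One way to build it, sketched here with assumed column names and a handful of sample rows, is `pd.crosstab`. Filling the missing secondary types with "0" first reproduces the single-type column mentioned in the note.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pidgey", "Spearow", "Rattata",
             "Butterfree", "Bulbasaur", "Oddish"],
    "Type 1": ["Normal", "Normal", "Normal", "Bug", "Grass", "Grass"],
    "Type 2": ["Flying", "Flying", None, "Flying", "Poison", "Poison"],
})

# Rows are primary types, columns are secondary types, cells are counts.
# "0" stands in for Pokémon without a secondary type.
type_matrix = pd.crosstab(df["Type 1"], df["Type 2"].fillna("0"))
print(type_matrix)
```

The resulting table can then be drawn with `seaborn.heatmap(type_matrix, annot=True)`, which colors each cell by its count.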


Stats Analysis

The heart of our analysis lies in the most intriguing part - examining the starter Pokémon followed by looking at the Legendary Pokémon.

What are starter Pokémon? Each release of a Pokémon game brings a new Generation of Pokémon. Each of these Generations has "starter" Pokémon - a set of three Pokémon of different types that you can choose from for your first Pokémon.

Let's take a look at which starter Pokémon, from each of the Generations (1 through 9), will give you the best start. A Google search will tell you which Pokémon are included as starters for each Generation. With this information, I created a new data frame by filtering for the names of the starter Pokémon for each Generation and displayed them with a countplot using the Seaborn library.

The Total Stats are the sum of the following base stats: HP, Attack, Defense, Sp. Atk, Sp. Def and Speed:

[Images: Total Stats of the starter Pokémon, one chart per Generation]
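
As a worked example of the Total Stats definition above, the sketch below sums the six base stats for the Generation 1 starters. The dataset may already ship a Total column. The frame here is built by hand, and the Name column is an assumed header.

```python
import pandas as pd

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

starters = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "HP": [45, 39, 44],
    "Attack": [49, 52, 48],
    "Defense": [49, 43, 65],
    "Sp. Atk": [65, 60, 50],
    "Sp. Def": [65, 50, 64],
    "Speed": [45, 65, 43],
})

# Sum across the six stat columns for each row (axis=1 means row-wise).
starters["Total"] = starters[stat_cols].sum(axis=1)
print(starters[["Name", "Total"]])
```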

Based on Total Stats, the strongest starter Pokémon per Generation are:

  • Generation 1: Bulbasaur
  • Generation 2: Chikorita
  • Generation 3: All three starters share the same Total Base Stats
  • Generation 4: Turtwig
  • Generation 5: All three starters share the same Total Base Stats
  • Generation 6: Froakie
  • Generation 7: All three starters share the same Total Base Stats
  • Generation 8: All three starters share the same Total Base Stats
  • Generation 9: All three starters share the same Total Base Stats
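
Because several Generations end in a tie, picking a single maximum per Generation would hide the other tied starters. The sketch below, using Generations 1 and 3 with assumed column names, keeps every starter that matches its Generation's best Total and then flags the tied Generations.

```python
import pandas as pd

starters = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle",
             "Treecko", "Torchic", "Mudkip"],
    "Generation": [1, 1, 1, 3, 3, 3],
    "Total": [318, 309, 314, 310, 310, 310],
})

# Broadcast each Generation's best Total back onto its rows.
best_total = starters.groupby("Generation")["Total"].transform("max")

# Keep every starter that reaches that best Total, so ties are preserved.
strongest = (
    starters[starters["Total"] == best_total]
    .groupby("Generation")["Name"]
    .agg(list)
)

# Generations where more than one starter shares the top Total.
tied = strongest[strongest.map(len) > 1]
print(strongest)

# Share of tied Generations, using the counts from the list above (5 of 9).
tie_share = 5 / 9
print(f"{tie_share:.0%}")
```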

What's interesting is that 56% of the Generations (five of nine) have starter Pokémon that share the same Total Base Stats. At first glance, one would assume it does not matter which Pokémon you choose. However, because a Pokémon's Type makes a difference in battle, it would be prudent to find out which kinds of Pokémon you would encounter in the wild, and which Pokémon the first few Trainers you battle would have in their Pokédex. This sort of analysis is outside the scope of this project, but I would love to explore it at a later time!

Legendary Pokémon

Legendary Pokémon are a group of incredibly rare and often very powerful Pokémon, generally found in the legends and myths of the Pokémon world.

Let's have a look at these Pokémon and see how many of them there are as well as how their stats compare to non-legendary Pokémon.

First, I created a new data frame with only Legendary Pokémon and created a bar chart to visualize this using matplotlib and seaborn:

[Image: Legendary Pokémon per Generation]
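
A minimal sketch of the Legendary subset and its per-Generation count, assuming the dataset has a boolean Legendary column and a Generation column. The rows are a hand-picked sample, not the full list of Legendary Pokémon.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Articuno", "Zapdos", "Moltres", "Mewtwo",
             "Dragonite", "Pikachu", "Lugia", "Ho-Oh"],
    "Generation": [1, 1, 1, 1, 1, 1, 2, 2],
    "Legendary": [True, True, True, True, False, False, True, True],
})

# A boolean column can be used directly as a row filter.
legendary = df[df["Legendary"]]

# size() counts the rows in each Generation group.
legendary_per_gen = legendary.groupby("Generation").size()
print(legendary_per_gen)
```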

Now let's compare the Total Stats of Legendary vs Non-legendary Pokemon. I did this using a scatterplot:

[Image: Legendary vs Non-legendary Pokémon - Total Stats]

The majority of Legendary Pokémon have higher Total Base Stats. However, there are some outliers: two of the Psychic type Pokémon, Cosmog and Cosmoem, as well as the Fighting type Pokémon Kubfu, have lower stats than you would expect for a Legendary Pokémon.

You can also see how uneven the distribution is - there are many more blue dots than orange dots, which is to be expected, as Legendary Pokémon are intended to be rare.
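
The same comparison can be made numerically. The sketch below compares the mean Total of the two groups and flags Legendary Pokémon that fall below the Legendary group's own mean. That threshold is only an illustrative choice, and the column names and the small sample are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Mewtwo", "Articuno", "Cosmog", "Cosmoem", "Kubfu",
             "Dragonite", "Pikachu", "Caterpie"],
    "Total": [680, 580, 200, 400, 385, 600, 320, 195],
    "Legendary": [True, True, True, True, True, False, False, False],
})

is_legendary = df["Legendary"]

# Mean Total for each group.
legendary_mean = df.loc[is_legendary, "Total"].mean()
other_mean = df.loc[~is_legendary, "Total"].mean()
print(f"Legendary: {legendary_mean:.1f}, non-legendary: {other_mean:.1f}")

# Illustrative outlier rule: Legendary Pokémon below their own group mean.
low_legendaries = df[is_legendary & (df["Total"] < legendary_mean)]
print(low_legendaries[["Name", "Total"]])
```

In this sample the rule picks out the same three Pokémon named above. On the full dataset a different cutoff might be more appropriate.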


Next, let's look at the distribution of primary types for the Legendary Pokémon.


Interestingly, Psychic is the most common primary type among Legendary Pokémon!


Distribution of Stats

Looking at the Legendary Pokémon stats lets us see how each parameter is distributed. We will look at the following stats: Attack, Defense, Sp. Atk, Sp. Def and Speed. Using distplot in Seaborn, we can plot each stat as a histogram overlaid with a kernel density estimate (KDE), which estimates the PDF (probability density function) of the parameter:

[Image: Custom function to find distribution of parameters]
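
To make the "density" part concrete, the sketch below builds only the histogram half of such a plot with NumPy. With `density=True` the bar heights are scaled so the total area is 1, which is what lets the histogram be read as an estimate of the PDF. The Speed values are illustrative, not taken from the dataset. Note also that seaborn has deprecated `distplot` since version 0.11 in favor of `histplot` and `displot`.

```python
import numpy as np

# Illustrative Speed values for a few Legendary Pokémon (not the real column).
speed = np.array([85, 100, 90, 130, 110, 90, 115, 100, 85])

# density=True rescales the counts so the histogram integrates to 1.
density, edges = np.histogram(speed, bins=5, density=True)

# Area of each bar is height times bin width; the areas sum to 1.
widths = np.diff(edges)
area = float(np.sum(density * widths))

print(density)
print(f"area under the histogram: {area:.3f}")
```

In current seaborn, `seaborn.histplot(values, kde=True)` draws the histogram together with the KDE curve.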


[Images: histograms of the Legendary Pokémon stats]


Based on the histograms above, a typical Legendary Pokémon has roughly the following stats:

  • HP: 93
  • Attack: 110
  • Defense: 98
  • Sp. Atk: 109
  • Sp. Def: 100
  • Speed: 97
  • Total: 607



Non-legendary Pokemon - Types per Generation

Earlier, I had looked at the Pokémon Types overall. I wanted to break it down a bit further to see just how the types varied by Generation.

To do this, I needed to make a non-legendary data frame.


Each Generation presents a predominant primary Type.

Type 1

  • Generation 1: water
  • Generation 2: water
  • Generation 3: water
  • Generation 4: normal
  • Generation 5: water, normal, bug
  • Generation 6: ghost
  • Generation 7: normal
  • Generation 8: water
  • Generation 9: normal

You can see that water is the most common primary Type overall, with five out of nine Generations listing water as their main Type (including Generation 5, where it ties with normal and bug).
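
Ties such as the one in Generation 5 are easy to lose if only a single top value is taken per group. A sketch of one tie-aware approach, using `mode()` inside a groupby on assumed column names and made-up rows:

```python
import pandas as pd

# Made-up rows: Generation 5 is arranged to have a three-way tie.
df = pd.DataFrame({
    "Generation": [1, 1, 1, 5, 5, 5, 5, 5, 5],
    "Type 1": ["Water", "Water", "Fire",
               "Water", "Water", "Normal", "Normal", "Bug", "Bug"],
})

# mode() returns every value that shares the highest count, sorted.
top_types = (
    df.groupby("Generation")["Type 1"]
    .agg(lambda s: s.mode().tolist())
)
print(top_types)
```

The same pattern works for the secondary Type once the single-type placeholder has been filtered out.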

Type 2

Note: The first column, "0", represents Pokémon that have a single primary Type. This column is not included in the below list.

  • Generation 1: poison
  • Generation 2: flying
  • Generation 3: psychic
  • Generation 4: flying
  • Generation 5: flying
  • Generation 6: grass
  • Generation 7: flying
  • Generation 8: ghost, dragon
  • Generation 9: fighting, flying

As secondary Types go, the most likely Type you will find is flying, with five out of nine Generations listing flying as the most common secondary Type.

Both of these echo the same overall findings as originally noted at the beginning of this analysis.



Dragon Type Pokemon

Lastly, I wanted to look at a specific type of Pokémon: Dragon type. Dragon type Pokémon are considered special in that they are near legendary and thus have higher than average base stats.

I created a new data frame to identify non-legendary Dragon type Pokémon and visualized them by Generation:

[Images: non-legendary Dragon type Pokémon by Generation, and a scatterplot of Legendary vs non-legendary Dragon types]

You can see the difference between Legendary (red dots) and non-legendary Dragon types (blue dots). The Pokémon in the top right is Dragonite, which has better base stats than the first three Legendary Pokémon: Zapdos, Articuno and Moltres.
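
A sketch of how the non-legendary Dragon subset and the Dragonite comparison could be expressed. It assumes Dragon may appear in either type column (the original filter may have used only the primary type), and it uses assumed column names and a hand-built sample.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Dragonite", "Dratini", "Kingdra", "Rayquaza",
             "Articuno", "Zapdos", "Moltres"],
    "Type 1": ["Dragon", "Dragon", "Water", "Dragon",
               "Ice", "Electric", "Fire"],
    "Type 2": ["Flying", None, "Dragon", "Flying",
               "Flying", "Flying", "Flying"],
    "Total": [600, 300, 540, 680, 580, 580, 580],
    "Legendary": [False, False, False, True, True, True, True],
})

# Dragon as either the primary or the secondary type.
is_dragon = (df["Type 1"] == "Dragon") | (df["Type 2"] == "Dragon")

# Combine with "not Legendary" to get the non-legendary Dragon types.
dragons = df[is_dragon & ~df["Legendary"]]
print(dragons[["Name", "Total"]])

# Compare Dragonite with the three Legendary birds.
birds = df[df["Name"].isin(["Articuno", "Zapdos", "Moltres"])]
dragonite_total = int(df.loc[df["Name"] == "Dragonite", "Total"].iloc[0])
beats_birds = dragonite_total > int(birds["Total"].max())
print(beats_birds)
```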

The CONCLUSION

This notebook was created with nostalgia in mind. I grew up with Pokémon, back in the 90's, and have loved them ever since! Working on this dataset was a lot of fun. It tested my abilities to work with Python and create visualizations of the dataset.

This dataset was quite small, so it would have been a perfect fit for analysis in Excel. However, I wanted to focus on showing the skills I've learned over the past 10 weeks in Avery Smith's DAA bootcamp.

Although this dataset was based on video game characters and their stats, the premise can be translated across any industry. The Pokémon can be thought of as products, and the stats can be swapped for features. The same approach applies to marketing campaigns, e-commerce, and supply chain: how do the products perform? There are endless possibilities, and it's exciting how much can be discovered simply by looking at the data, asking questions and seeing where they take you!


Thanks so much for taking the time to read this analysis! If you'd like to see more of my journey, please follow me, Lisette Mohammed, to stay connected!
