Soccer Stats Showdown



Description:

Imagine you're the Data Scientist for a professional soccer league. The season is heating up, and the league wants to gain insights into team and player performance. They've collected a vast dataset of match statistics, but it's a bit of a mess. That's where you come in!


Tasks:

  • Goal Scoring Analysis: Calculate the top 5 teams with the highest median goal scoring per match, and display the results in a bar chart.
  • Player Performance: Create a histogram showing the distribution of assists among all players in the season.
  • Matchup Insights: Determine the teams with the highest winning percentage when playing at home, and display the results in a sorted table.

Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
Pandas version 2.2.1
Numpy version 1.26.4        

The Data

This dataset contains 500 matches, with team and player performance statistics. Your task is to wrangle this data using pandas to extract valuable insights!


Columns:

  • match_id: a unique identifier for each match.
  • team_home and team_away: IDs of the teams playing in each match, ranging from 0 to 19. These IDs don't correspond to specific team names, but rather represent different teams in the simulation.
  • goals_home and goals_away: the number of goals scored by each team in each match.
  • player_assists and player_goals: the number of assists and goals scored by a player in each match.
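The code that generated the data is not shown here, so the following is only a minimal sketch of how a comparable dataset could be simulated. The seed, the Poisson rates, and the rule that a team never plays itself are all assumptions made for illustration; only the column names, the 500-row size, and the 0 to 19 team ID range come from the description above.

```python
import numpy as np
import pandas as pd

# Assumption: the distributions below are illustrative, not the author's.
rng = np.random.default_rng(seed=0)
n_matches = 500

team_home = rng.integers(0, 20, size=n_matches)
# Shift the away team by 1..19 (mod 20) so a team never plays itself.
team_away = (team_home + rng.integers(1, 20, size=n_matches)) % 20

df = pd.DataFrame({
    "match_id": np.arange(n_matches),
    "team_home": team_home,
    "team_away": team_away,
    "goals_home": rng.poisson(1.6, size=n_matches),      # hypothetical rate
    "goals_away": rng.poisson(1.2, size=n_matches),      # hypothetical rate
    "player_goals": rng.poisson(0.4, size=n_matches),    # hypothetical rate
    "player_assists": rng.poisson(0.3, size=n_matches),  # hypothetical rate
}).astype("int32")
```

Any frame with these seven columns should work with the steps that follow; the numbers it produces will differ from the ones quoted in this article.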

Good luck, and have fun!

We expect all of the columns to be numeric, so let's take a look and verify this.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   match_id        500 non-null    int32
 1   team_home       500 non-null    int32
 2   team_away       500 non-null    int32
 3   goals_home      500 non-null    int32
 4   goals_away      500 non-null    int32
 5   player_goals    500 non-null    int32
 6   player_assists  500 non-null    int32
dtypes: int32(7)
memory usage: 13.8 KB        
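A summary like the one above comes from `DataFrame.info()`. As a small sketch, the same check can also be done programmatically; the three-row frame below is a hypothetical stand-in for the real 500-match data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical three-row sample using the column names from this article.
df = pd.DataFrame({
    "match_id": [0, 1, 2],
    "team_home": [4, 11, 7],
    "team_away": [9, 3, 18],
    "goals_home": [2, 0, 1],
    "goals_away": [1, 1, 1],
    "player_goals": [0, 1, 0],
    "player_assists": [1, 0, 0],
}).astype("int32")

# Prints the column names, non-null counts, and dtypes.
df.info()

# Collect any column that is not numeric; an empty list means all is well.
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
```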


Goal Scoring Analysis:

Calculate the top 5 teams with the highest goal scoring average per match, and display the results in a bar chart.

500 rows × 2 columns

Note: We use the median rather than the mean because the goal counts are not normally distributed.

In away matches, most teams score a median of 1 goal. In home matches, where teams have the home field advantage, a few teams reach a median of 2 goals per game.
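One way to get per-team medians for home and away games is a pair of `groupby` calls, sketched below on a tiny hypothetical sample (three teams, six matches). The variable names are illustrative and not necessarily the ones used in the original notebook.

```python
import pandas as pd

# Hypothetical sample: each row is one match.
df = pd.DataFrame({
    "team_home":  [0, 0, 1, 1, 2, 2],
    "team_away":  [1, 2, 0, 2, 0, 1],
    "goals_home": [2, 4, 1, 1, 0, 3],
    "goals_away": [1, 0, 2, 1, 1, 1],
})

# Median goals each team scores when playing at home and when playing away.
home_median = df.groupby("team_home")["goals_home"].median()
away_median = df.groupby("team_away")["goals_away"].median()

# Line the two results up side by side, one row per team ID.
medians = pd.DataFrame({"home": home_median, "away": away_median})
medians.index.name = "team"
```

Here `medians` has one row per team, which makes it easy to compare home and away scoring at a glance.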

Let's pull out the top 5 scoring teams based on the median goals scored in home games. There was a tie for 5th place: team 18 and team 15 both scored a median of 1.5 goals per home game.
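A tie like this is easy to lose with a plain `head(5)`. As a sketch, `Series.nlargest` with `keep="all"` keeps every team tied at the cutoff. Only the 1.5 tie between teams 18 and 15 comes from the results above; the other team IDs and values below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical median home goals per team (see the note above).
home_median = pd.Series(
    {3: 2.0, 7: 2.0, 11: 2.0, 4: 2.0, 18: 1.5, 15: 1.5, 9: 1.0},
    name="median_goals_home",
)
home_median.index.name = "team_home"

# keep="all" returns more than 5 rows when teams tie for 5th place.
top = home_median.nlargest(5, keep="all")

# Bar chart of the selected teams.
ax = top.plot.bar(title="Top home scoring teams (median goals per match)")
ax.set_xlabel("Team ID")
ax.set_ylabel("Median goals at home")
plt.tight_layout()
```

With this sample, `top` holds six teams because both tied teams are retained.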


Player Performance:

Create a histogram showing the distribution of assists among all players in the season.

As you can see, most players do not have many assists according to the data.
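A histogram like this can be drawn straight from the `player_assists` column. The sketch below uses a short hypothetical list of assist counts and centers each bin on a whole number, which is a presentation choice and not something taken from the original chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical assist counts, one value per row of the dataset.
assists = pd.Series([0, 0, 0, 0, 1, 0, 1, 2, 0, 0, 1, 3], name="player_assists")

# Bin edges at -0.5, 0.5, 1.5, ... so each bar covers exactly one integer.
bins = np.arange(assists.min(), assists.max() + 2) - 0.5

ax = assists.plot.hist(bins=bins, edgecolor="black")
ax.set_xlabel("Assists")
ax.set_ylabel("Number of rows")
plt.tight_layout()

# The same distribution as a table, handy for checking the chart.
counts = assists.value_counts().sort_index()
```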

If we take a look at some descriptive statistics, we can conclude that there were very few high-scoring or high-assisting players in the season.
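The descriptive statistics can be produced with `DataFrame.describe()`. The sketch below runs it on a small hypothetical sample and also computes the share of rows with 2 or more goals or assists; that threshold is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical sample of the two player columns.
df = pd.DataFrame({
    "player_goals":   [0, 0, 1, 0, 2, 0, 0, 1],
    "player_assists": [0, 1, 0, 0, 0, 3, 0, 0],
})

# count, mean, std, min, quartiles, and max for each column.
summary = df.describe()

# Fraction of rows at or above the (assumed) threshold of 2.
share_high = (df >= 2).mean()
```

A median of 0 next to a much higher maximum is the pattern that points to only a handful of standout players.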


Conclusion

In this tutorial, we analyzed a dataset of soccer match statistics to gain insights into team and player performance. We calculated the top 5 teams with the highest median goal scoring per match and created a histogram showing the distribution of assists among all players.


What You Learned:

  • How to calculate median goals scored per match
  • How to visualize data using bar charts and histograms
  • How to extract insights from data


Can you solve the BONUS question?

Matchup Insights: Determine the teams with the highest winning percentage when playing at home, and display the results in a sorted table.
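If you want to check your work, here is one possible approach, sketched on a small hypothetical sample. It assumes a home win means `goals_home` is strictly greater than `goals_away`, so draws count as non-wins; the output column names are made up for this example.

```python
import pandas as pd

# Hypothetical sample: each row is one match.
df = pd.DataFrame({
    "team_home":  [0, 0, 1, 1, 2, 2, 0],
    "goals_home": [2, 1, 0, 3, 1, 1, 3],
    "goals_away": [1, 1, 2, 0, 1, 2, 0],
})

win_pct = (
    # Flag each match the home team won (assumption: draws are not wins).
    df.assign(home_win=df["goals_home"] > df["goals_away"])
      .groupby("team_home")["home_win"]
      # The mean of a boolean column is the fraction of True values.
      .agg(home_matches="size", home_win_pct="mean")
      .assign(home_win_pct=lambda t: (t["home_win_pct"] * 100).round(1))
      .sort_values("home_win_pct", ascending=False)
)
```

The result is a table with one row per home team, sorted from the highest home winning percentage to the lowest.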

David Rojas, E.I.
