Soccer Stats Showdown
Description:
Imagine you're the Data Scientist for a professional soccer league. The season is heating up, and the league wants to gain insights into team and player performance. They've collected a vast dataset of match statistics, but it's a bit of a mess. That's where you come in!
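Tasks:
Goal Scoring Analysis: Calculate the top 5 teams with the highest goal-scoring average per match, and display the results in a bar chart.
Player Performance: Create a histogram showing the distribution of assists among all players in the season.
BONUS, Matchup Insights: Determine the teams with the highest winning percentage when playing at home, and display the results in a sorted table.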
Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
Pandas version 2.2.1
Numpy version 1.26.4
The Data
This dataset contains 500 matches, with team and player performance statistics. Your task is to wrangle this data using pandas to extract valuable insights!
Columns: match_id, team_home, team_away, goals_home, goals_away, player_goals, player_assists
Good luck, and have fun!
We expect all of the columns to be numeric. Let's take a look and verify this.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   match_id        500 non-null    int32
 1   team_home       500 non-null    int32
 2   team_away       500 non-null    int32
 3   goals_home      500 non-null    int32
 4   goals_away      500 non-null    int32
 5   player_goals    500 non-null    int32
 6   player_assists  500 non-null    int32
dtypes: int32(7)
memory usage: 13.8 KB
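The summary above is the kind of output that df.info() prints. The code that built the original dataset is not shown, so the sketch below creates a stand-in DataFrame with the same seven column names and 500 rows. The value ranges and the random seed are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Assumption: the real dataset is not available here, so we build a stand-in
# with the same column names. All value ranges below are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "match_id": np.arange(1, n + 1),
    "team_home": rng.integers(1, 21, size=n),
    "team_away": rng.integers(1, 21, size=n),
    "goals_home": rng.integers(0, 5, size=n),
    "goals_away": rng.integers(0, 5, size=n),
    "player_goals": rng.integers(0, 3, size=n),
    "player_assists": rng.integers(0, 3, size=n),
}).astype("int32")

# Print the column summary: non-null counts and dtypes.
df.info()

# A programmatic version of the same check: is every column numeric?
all_numeric = all(is_numeric_dtype(dtype) for dtype in df.dtypes)
print("All columns numeric:", all_numeric)
```

Checking the dtypes in code as well as by eye is handy when a dataset has many more columns than can be scanned at a glance.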
Goal Scoring Analysis:
Calculate the top 5 teams with the highest goal scoring average per match, and display the results in a bar chart.
Note: We will use the median rather than the mean, because the goal counts are not normally distributed.
For most away matches, teams score a median of 1 goal. At home, where teams have the home-field advantage, a few teams score a median of 2 goals per game.
Let's pull out the top 5 scoring teams based on the median goals scored in home games. There was a tie for 5th place: team 18 and team 15 both scored a median of 1.5 goals per home game.
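One possible way to compute these medians is sketched below. It is not necessarily the approach used in the original notebook. The data is a synthetic stand-in (hypothetical value ranges and seed), so the team numbers it prints will not match the ones quoted above. The helper plot_top5 is a hypothetical name and needs matplotlib installed.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in for the match data; ranges are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "team_home": rng.integers(1, 21, size=n),
    "team_away": rng.integers(1, 21, size=n),
    "goals_home": rng.integers(0, 5, size=n),
    "goals_away": rng.integers(0, 5, size=n),
})

# Median goals each team scores at home and away, side by side.
home_median = df.groupby("team_home")["goals_home"].median().rename("home")
away_median = df.groupby("team_away")["goals_away"].median().rename("away")
medians = pd.concat([home_median, away_median], axis=1)
medians.index.name = "team"

# Top 5 by home median. keep="all" retains every team tied at the cutoff,
# so the result can contain more than five rows.
top5_home = medians["home"].nlargest(5, keep="all")
print(top5_home)


def plot_top5(series):
    """Draw the bar chart the task asks for (requires matplotlib)."""
    ax = series.plot.bar(title="Top teams by median home goals")
    ax.set_xlabel("team")
    ax.set_ylabel("median goals per home match")
    return ax
```

Using keep="all" is a deliberate choice here: with a plain top-5 cutoff, one of two teams tied for 5th place would be dropped arbitrarily.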
Player Performance:
Create a histogram showing the distribution of assists among all players in the season.
As you can see, most players do not have many assists according to the data.
If we take a look at some descriptive statistics, we can conclude that there were very few high-scoring or high-assisting players in the season.
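A minimal sketch of how the assist distribution and the descriptive statistics could be produced is shown below. The data is again a synthetic stand-in: drawing the counts from a Poisson distribution is an assumption chosen only to give a right-skewed shape, and plot_assists is a hypothetical helper that needs matplotlib installed.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in data. The Poisson rates are hypothetical
# and only serve to produce skewed, non-negative counts.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "player_goals": rng.poisson(0.8, size=500),
    "player_assists": rng.poisson(0.7, size=500),
})

# Frequency of each assist count: the numbers behind the histogram.
assist_counts = df["player_assists"].value_counts().sort_index()
print(assist_counts)

# Descriptive statistics for goals and assists.
summary = df[["player_goals", "player_assists"]].describe()
print(summary)


def plot_assists(frame):
    """Histogram of assists with one bar per integer value (requires matplotlib)."""
    edges = np.arange(frame["player_assists"].max() + 2) - 0.5
    ax = frame["player_assists"].plot.hist(bins=edges, title="Distribution of assists")
    ax.set_xlabel("assists")
    return ax
```

In the describe() output, a 75th percentile that sits well below the maximum is the numeric counterpart of the long right tail in the histogram.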
Conclusion
In this tutorial, we analyzed a dataset of soccer match statistics to gain insights into team and player performance. We identified the top 5 teams by median goals scored per home match and created a histogram showing the distribution of assists among all players.
What You Learned:
Can you solve the BONUS question?
Matchup Insights: Determine the teams with the highest winning percentage when playing at home, and display the results in a sorted table.
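If you want a starting point, here is one hedged sketch of an approach rather than an official solution. It assumes a home win means goals_home is strictly greater than goals_away, so draws count as non-wins. It runs on synthetic stand-in data with hypothetical value ranges.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in for the match data; ranges are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "team_home": rng.integers(1, 21, size=n),
    "team_away": rng.integers(1, 21, size=n),
    "goals_home": rng.integers(0, 5, size=n),
    "goals_away": rng.integers(0, 5, size=n),
})

# Assumption: a home win is strictly more home goals; draws are not wins.
table = (
    df.assign(home_win=df["goals_home"] > df["goals_away"])
      .groupby("team_home")
      .agg(home_matches=("home_win", "size"), home_wins=("home_win", "sum"))
)

# Winning percentage at home, sorted from best to worst.
table["win_pct"] = (table["home_wins"] / table["home_matches"] * 100).round(1)
table = table.sort_values("win_pct", ascending=False)
print(table.head(10))
```

Keeping home_matches in the table makes it easy to spot teams whose percentage rests on only a handful of home games.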