Premier League 2022/23: A Data Analysis - Part I


This series and its analyses are still a work in progress. You can follow the progress of the analysis on my GitHub, and of course you can follow the series here.

I am a huge football (soccer) fan, and I was curious about the story behind the 2022/2023 season. Luckily, I found this dataset on Kaggle and decided to dive in. What do the numbers say? Are they indicative of where a team would finish in the league?

Side note: The dataset did not include the final positions of the teams. I should have added this to the dataset and run some correlation analysis, but it is what it is. This might not be a bad thing; I can lean more on narrative as opposed to statistics.

This is a Narrative!

A little on the methodology: I did not know what I was looking for at the start, so I went through the data files and sorted the data to look for trends. It is kind of like solving a puzzle without ever seeing the picture on the box. You don't have a concrete idea of what the finished item should look like, but you can piece it together.


Shooting

The first dataset I analyzed was the file with the shooting numbers.

The first thing I wanted to find out was whether the shooting stats were somehow predictive of where clubs finish, so I went through the categories Shots, Shots on Target, Expected Goals (xG) and Goals (actual goals) and ranked the top 5 in each. Here is what I saw:
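The ranking step described above can be sketched in a few lines of plain Python. This is only a minimal sketch: the team names, the column names (`shots`, `shots_on_target`, `xg`, `goals`) and every number are placeholders I made up for illustration, not values from the Kaggle dataset, and the real files may be laid out differently.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]

# Placeholder rows: team names and numbers are invented for illustration,
# they are NOT values from the dataset.
teams: List[Row] = [
    {"team": "Team A", "shots": 600, "shots_on_target": 200, "xg": 70.0, "goals": 80},
    {"team": "Team B", "shots": 620, "shots_on_target": 230, "xg": 68.0, "goals": 65},
    {"team": "Team C", "shots": 550, "shots_on_target": 190, "xg": 72.5, "goals": 85},
    {"team": "Team D", "shots": 500, "shots_on_target": 150, "xg": 50.0, "goals": 48},
    {"team": "Team E", "shots": 480, "shots_on_target": 160, "xg": 45.0, "goals": 50},
    {"team": "Team F", "shots": 430, "shots_on_target": 140, "xg": 40.0, "goals": 35},
]

# Assumed column names for the four categories in the article.
CATEGORIES = ("shots", "shots_on_target", "xg", "goals")


def top_n(rows: List[Row], column: str, n: int = 5) -> List[str]:
    """Return the names of the n teams with the highest value in `column`."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [str(row["team"]) for row in ranked[:n]]


# One ranked list per category (top 3 here only to keep the output short).
rankings = {column: top_n(teams, column, n=3) for column in CATEGORIES}

for column, names in rankings.items():
    print(f"{column}: {names}")
```

Comparing the lists across categories is then a matter of reading which names keep showing up, which is how a team like Brighton stands out.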


Shots

Surprisingly, Brighton had the most shots in the league. I have this mental model that more shots = more goals and more goals = more wins. Based on this napkin statistic, the teams with the most shots are more likely to finish highest in the league.

Looking at the top 5 for shots, this logic mostly tracks: 4 of the teams in the top 5 for shots also finished in the top 5 of the league. Brighton was the exception.

Shots On Target

The same teams were in the top 5 for Shots on Target, and Brighton led the pack again!

Expected Goals (xG)

Expected Goals (xG) is a controversial statistic and not very well understood, so I think it is worth defining it and looking at how it is calculated. I would recommend reading the article below.


Expected Goals (xG) is a metric designed to measure the probability of a shot resulting in a goal. An xG model uses historical information from thousands of shots with similar characteristics to estimate the likelihood of a goal on a scale between 0 and 1. For example, a shot with an xG value of 0.2 is one that we would generally expect to be converted twice in every 10 attempts. - StatsBomb
How is xG calculated? Each xG model has its own characteristics, but these are the main factors that have traditionally been fed into the large majority of Expected Goals models: distance to goal, angle to goal, body part with which the shot was taken, and type of assist or previous action (throughball, cross, set-piece, dribble, etc…). Based on historical information of shots with similar characteristics, the xG model then attributes a value between 0 and 1 to each shot that expresses the probability of it producing a goal. - StatsBomb
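To make the definition concrete, here is a minimal sketch of the arithmetic, not an xG model. It assumes the per-shot probabilities are already given (the values below are made up), and it relies on the usual convention that a team's xG is the sum of the xG of its individual shots. The function names are mine.

```python
from typing import List

# Made-up per-shot xG values for one hypothetical match (not from the dataset).
shot_xg: List[float] = [0.2, 0.05, 0.5, 0.1, 0.15]


def expected_conversions(xg_per_shot: float, attempts: int) -> float:
    """Goals we would expect from `attempts` shots that all share the same xG."""
    return xg_per_shot * attempts


def total_xg(shots: List[float]) -> float:
    """Team xG for a match or season: the sum of the xG of each shot."""
    return sum(shots)


# Ten shots worth 0.2 xG each: about two goals expected, as in the quote.
print(expected_conversions(0.2, 10))

# Five shots of mixed quality add up to roughly one expected goal.
print(round(total_xg(shot_xg), 2))
```

The season-long xG figures discussed in this post are the same sum taken over every shot a team took.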

Goals Scored


The main stat. Man City and Arsenal led the pack, which tracks since they finished 1st and 2nd.

Liverpool finished 3rd in goals but 5th in the league. Without digging deep into the data, you can assume they also gave a lot of goals away (we will see when the defensive statistics are analyzed).

Brighton again.

For the first time we see Tottenham Hotspur (Spurs). This caught me by surprise, but they comfortably overperformed their xG. They finished 8th, so they most likely shipped a lot of goals too.


Shooting Overview

Brighton

The most interesting part of the shooting data was Brighton. From the shooting statistics we can infer that Brighton were one of the best teams in the league, if not the best, at going forward and creating shooting chances. They had the most Shots and Shots on Target, and they were second only to champions Man City on Expected Goals (xG). From the myopic view of shooting statistics you would expect this team to be challenging for the title, but they actually finished 6th. One important nuance: they slightly underperformed their Expected Goals (they scored fewer goals than expected based on the shooting chances they had), which may be one reason they did not challenge for the league. I don't know how strong that link is, because only 2 of the teams that finished above Brighton (Manchester City and Arsenal) outperformed their expected goals. More on Arsenal to come.

I guess that can be an indicator of a great side. It was clear that Man City and Arsenal were the premier teams in the league last season, so it makes sense that they not only created a lot of quality shooting chances (top 5 in Shots, Shots on Target, Expected Goals (xG) and Goals) but also outperformed expectations in that regard.

As we look at other categories I will pay attention to Brighton, as I am intrigued to see if we can define their style of play from the data.


Arsenal

From a chance creation and conversion perspective, Arsenal not only had high volume but were also the most efficient. They were 5th in Shots, Shots on Target and Expected Goals (xG) but finished 2nd in Goals scored. They had the highest goals-per-shot ratio and the highest goals-per-shot-on-target ratio. In essence, they were clinical. This is a new Arsenal from the one we have been used to for the last decade plus. I knew they created a lot of chances, but I did not expect them to be the most efficient scorers in the league.
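The two efficiency ratios mentioned here are simple divisions. The sketch below shows one way to compute them, assuming the ratios are plain goals divided by shots and goals divided by shots on target (the dataset may define its own columns slightly differently, for example in how penalties are treated). Team names and numbers are placeholders, not real figures.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

# Placeholder rows with invented numbers, NOT values from the dataset.
shooting: List[Row] = [
    {"team": "Team A", "goals": 80, "shots": 600, "shots_on_target": 200},
    {"team": "Team B", "goals": 65, "shots": 620, "shots_on_target": 230},
    {"team": "Team C", "goals": 85, "shots": 550, "shots_on_target": 190},
]


def ratio(goals: int, attempts: int) -> float:
    """Goals per attempt; returns 0.0 when there were no attempts."""
    return goals / attempts if attempts else 0.0


def goals_per_shot(row: Row) -> float:
    return ratio(int(row["goals"]), int(row["shots"]))


def goals_per_shot_on_target(row: Row) -> float:
    return ratio(int(row["goals"]), int(row["shots_on_target"]))


# The most clinical team is the one with the highest conversion ratio.
most_efficient = max(shooting, key=goals_per_shot)
most_efficient_on_target = max(shooting, key=goals_per_shot_on_target)

for row in shooting:
    print(
        row["team"],
        round(goals_per_shot(row), 3),
        round(goals_per_shot_on_target(row), 3),
    )
```

A team can rank lower on raw volume and still come out on top here, which is the pattern described for Arsenal.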


Others

In efficiency of converting Shots and Shots on Target, Man City and Arsenal were ahead of the pack, and both scored more goals than expected. In the scatter plots below they are the 2 rightmost values, with clear separation from the pack.

I also did an analysis of teams that overperformed their xG and those that underperformed. That can be found in the GitHub repo.
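For readers who want the gist without opening the repo, over- and underperformance comes down to goals minus xG. The sketch below is my own minimal illustration with made-up numbers and hypothetical function names; the analysis in the repo may be structured differently.

```python
from typing import Dict, List, Tuple, Union

Row = Dict[str, Union[str, int, float]]

# Placeholder rows with invented numbers, NOT values from the dataset.
season: List[Row] = [
    {"team": "Team A", "goals": 80, "xg": 70.0},
    {"team": "Team B", "goals": 65, "xg": 68.0},
    {"team": "Team C", "goals": 85, "xg": 72.5},
    {"team": "Team D", "goals": 48, "xg": 50.0},
]


def xg_difference(row: Row) -> float:
    """Positive means the team scored more than its xG, negative means fewer."""
    return float(row["goals"]) - float(row["xg"])


def split_by_xg_performance(rows: List[Row]) -> Tuple[List[str], List[str]]:
    """Return (overperformers, underperformers), biggest gap first in each list."""
    over = sorted((r for r in rows if xg_difference(r) > 0), key=xg_difference, reverse=True)
    under = sorted((r for r in rows if xg_difference(r) < 0), key=xg_difference)
    return [str(r["team"]) for r in over], [str(r["team"]) for r in under]


overperformers, underperformers = split_by_xg_performance(season)
print("Overperformed xG:", overperformers)
print("Underperformed xG:", underperformers)
```

Teams whose goals exactly match their xG fall into neither list in this sketch.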


Thanks for reading. The next analysis will be on possession statistics.



