Premier League 2022/23: A Data Analysis - Part I
This series and its analyses are still a work in progress. You can follow the progress of the analysis on my Github, and the progress of the series here.
I am a huge football (soccer) fan, and I was curious about the story behind the 2022/2023 season. Luckily, I found this dataset on Kaggle and decided to dive in. What do the numbers say? Are they indicative of where a team would finish in the league?
Side note: The dataset did not include the final positions of the teams. I should have added this to the dataset and run some correlation analysis, but it is what it is. This might not be a bad thing; I can lean more on narrative as opposed to statistics.
This is a Narrative!
A little on the methodology: I did not know what I was looking for at the start, so I went through the data files and sorted the data to look for trends. It is kind of like solving a puzzle without ever seeing the picture on the box: you don't have a concrete idea of what the finished item should be, but you can piece it together.
Shooting
The first dataset I analyzed was the file with the shooting numbers.
The first thing I wanted to find out was whether the shooting stats were somehow predictive of where clubs finish, so I went through the categories Shots, Shots on Target, Expected Goals (xG) and Goals (actual goals) and ranked the top 5 in each. Here is what I saw:
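A minimal sketch of that ranking step, assuming the shooting file has been loaded into a list of rows. The column names (`shots`, `shots_on_target`, `xg`, `goals`), the team names and every number here are invented placeholders, not values from the Kaggle dataset.

```python
# Placeholder rows: team names and numbers are made up for illustration only.
shooting = [
    {"team": "Team A", "shots": 610, "shots_on_target": 215, "xg": 70.1, "goals": 68},
    {"team": "Team B", "shots": 590, "shots_on_target": 200, "xg": 75.3, "goals": 90},
    {"team": "Team C", "shots": 560, "shots_on_target": 190, "xg": 68.0, "goals": 84},
    {"team": "Team D", "shots": 555, "shots_on_target": 185, "xg": 66.2, "goals": 71},
    {"team": "Team E", "shots": 540, "shots_on_target": 180, "xg": 65.5, "goals": 64},
    {"team": "Team F", "shots": 500, "shots_on_target": 182, "xg": 55.0, "goals": 66},
    {"team": "Team G", "shots": 430, "shots_on_target": 140, "xg": 45.9, "goals": 40},
]


def top_n(rows, column, n=5):
    """Return the names of the n teams with the largest value in `column`."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row["team"] for row in ranked[:n]]


# Print the top 5 for each shooting category.
for column in ("shots", "shots_on_target", "xg", "goals"):
    print(column, top_n(shooting, column))
```

Sorting each column separately makes it easy to see which teams appear in every top 5 and which only show up once.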
Shots
Surprisingly, Brighton had the most shots in the league. I have this mental model that more shots = more goals and more goals = more wins. Based on this napkin statistic, the teams that take the most shots are more likely to finish highest in the league.
Looking at the top 5 for shots, this logic mostly tracks: 4 of the teams in the top 5 for shots also finished in the top 5 of the league. Brighton was the exception.
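If final positions were added to the dataset, one way to put a number on this napkin logic would be a Spearman rank correlation between a team's rank for shots and its rank in the league. The sketch below is not something I ran on the real data. The six-team league and all ranks are invented placeholders, and the formula assumes both rankings cover the same teams with no ties.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same teams.

    Both arguments map team name -> rank (1 = best), using ranks 1..n.
    """
    teams = list(rank_a)
    n = len(teams)
    d_squared = sum((rank_a[team] - rank_b[team]) ** 2 for team in teams)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Placeholder ranks for a made-up six-team league (not real data).
shots_rank = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4, "Team E": 5, "Team F": 6}
league_rank = {"Team A": 4, "Team B": 1, "Team C": 2, "Team D": 3, "Team E": 5, "Team F": 6}

rho = spearman_rho(shots_rank, league_rank)
print(round(rho, 3))
```

A value near 1 would mean the shot ranking and the league ranking almost agree, a value near 0 would mean shots say little about final position.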
Shots On Target
The same teams were in the top 5 for Shots on Target, and Brighton led the pack again!
Expected Goals (xG)
Expected Goals (xG) is a controversial statistic and not very well understood, so I think it is worth defining it and looking at how it is calculated. I would recommend reading the article below.
Expected Goals (xG) is a metric designed to measure the probability of a shot resulting in a goal. An xG model uses historical information from thousands of shots with similar characteristics to estimate the likelihood of a goal on a scale between 0 and 1. For example, a shot with an xG value of 0.2 is one that we would generally expect to be converted twice in every 10 attempts. - StatsBomb
How is xG calculated? Each xG model has its own characteristics, but these are the main factors that have traditionally been fed into the large majority of Expected Goals models: distance to goal, angle to goal, body part with which the shot was taken, and type of assist or previous action (throughball, cross, set-piece, dribble, etc…). Based on historical information of shots with similar characteristics, the xG model then attributes a value between 0 and 1 to each shot that expresses the probability of it producing a goal. - StatsBomb
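To connect the per-shot definition to the team totals used in this analysis: a team's xG is usually reported as the sum of the xG values of its individual shots. The sketch below assumes the per-shot probabilities are already given. It does not implement StatsBomb's model, and the function name and shot values are placeholders.

```python
def expected_goals(shot_xg_values):
    """Total xG for a set of shots: the sum of per-shot scoring probabilities."""
    for probability in shot_xg_values:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"xG must be between 0 and 1, got {probability}")
    return sum(shot_xg_values)


# Ten shots, each with an xG of 0.2, as in the StatsBomb example.
ten_similar_shots = [0.2] * 10
print(round(expected_goals(ten_similar_shots), 2))  # roughly 2 goals expected

# A made-up match with a mix of poor and good chances.
match_shots = [0.03, 0.08, 0.45, 0.12, 0.76]
print(round(expected_goals(match_shots), 2))
```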
Goals Scored
The main stat. Man City and Arsenal led the pack, which tracks since they finished 1st and 2nd.
Liverpool ranked 3rd in goals but finished 5th in the league. Without digging deep into the data, you can assume they also gave a lot of goals away (we will see when the defensive statistics are analyzed).
Brighton again.
For the first time we see Tottenham Hotspur (Spurs). This caught me by surprise, but they comfortably overperformed their xG. They finished 8th, so they most likely shipped a lot of goals too.
Shooting Overview
Brighton
The most interesting part of the shooting data was Brighton. From the shooting statistics we can infer that Brighton were one of the best teams in the league, if not the best, at going forward and creating shooting chances. They had the most Shots and Shots on Target, and were second only to champions Man City on Expected Goals (xG). From the myopic view of shooting statistics you would expect this team to be challenging for the title, yet they actually finished 6th. One important nuance: they slightly underperformed their Expected Goals (they scored fewer goals than expected based on the shooting chances they had), which may be one reason they did not challenge for the league. I don't know how strong that link is, because only 2 of the teams that finished above Brighton (Manchester City and Arsenal) outperformed their expected goals. More on Arsenal to come.
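Over- and underperformance here just means comparing actual goals with xG. A rough sketch of that comparison follows. The helper names, team names and numbers are placeholders, not figures from the dataset.

```python
def xg_difference(goals, xg):
    """Goals minus xG: positive means more goals scored than the chances implied."""
    return goals - xg


def performance_label(goals, xg):
    """Classify a team as over- or underperforming its xG."""
    diff = xg_difference(goals, xg)
    if diff > 0:
        return "overperformed"
    if diff < 0:
        return "underperformed"
    return "matched"


# Placeholder season totals as (goals, xG); not real data.
season_totals = {"Team A": (68, 70.1), "Team B": (90, 75.3)}

for team, (goals, xg) in season_totals.items():
    print(team, round(xg_difference(goals, xg), 1), performance_label(goals, xg))
```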
I guess that can be an indicator of a great side. It was clear that Man City and Arsenal were the premier teams in the league last season, so it makes sense that they not only created a lot of quality shooting chances (Top 5 in Shots, Shots on Target, Expected Goals (xG) and Goals) but also outperformed expectations in that regard.
As we look at other categories I will pay attention to Brighton, as I am intrigued to see if we can define their style of play from the data.
Arsenal
From a chance creation and conversion perspective, Arsenal not only had high volume but were also the most efficient. They were 5th in Shots, Shots on Target and Expected Goals (xG), yet finished 2nd in Goals scored. They had the highest goals-per-shot ratio and goals-per-shot-on-target ratio. In essence, they were clinical. This is a new Arsenal from the one we have been used to for the last decade plus. I knew they created a lot of chances, but I did not expect them to be the most efficient scorers in the league.
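The two efficiency ratios are simple divisions: goals over shots, and goals over shots on target. A small sketch, with a hypothetical helper name and made-up numbers rather than Arsenal's actual totals:

```python
def conversion_rates(goals, shots, shots_on_target):
    """Return (goals per shot, goals per shot on target)."""
    if shots <= 0 or shots_on_target <= 0:
        raise ValueError("shots and shots_on_target must be positive")
    return goals / shots, goals / shots_on_target


# Made-up totals: a high-volume team and a lower-volume but more clinical team.
high_volume = conversion_rates(goals=68, shots=610, shots_on_target=215)
clinical = conversion_rates(goals=84, shots=560, shots_on_target=190)

print([round(rate, 3) for rate in high_volume])
print([round(rate, 3) for rate in clinical])
```

Comparing the ratios rather than raw goals is what separates a team that shoots a lot from one that converts a lot.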
Others
In efficiency of converting Shots and Shots on Target, Man City and Arsenal were ahead of the pack, and both scored more goals than expected. In the scatter plots below they are the 2 rightmost values, with clear separation from the pack.
I also analyzed which teams overperformed and which underperformed their xG; that can be found in the Github repo.
Thanks for reading. The next analysis will be on possession statistics.
Fascinating project, looking forward to seeing the insights your analysis of the Premier League season data will reveal!
Interesting. I'd be interested to see the correlation between the data and where teams are on the table, and also across different periods of the season. I believe MU's shots on goal and goals have been multiples over the last 5 matches vs the rest of the season.