"But we deserved to win..."
It's the 93rd minute, 0-0 away from home despite enjoying the lion's share of possession, tackles and touches in the final third. Your 76th-minute substitute loses the ball in midfield, the opposing winger breaks away and rattles the ball into the top corner. You lie down in distress at the final whistle, ignoring your half-drunk beverage and WhatsApp messages. Sound familiar?
With my recent downtime between contracts, exploring R in a topical manner has punctuated Thomas Tuchel's Chelsea, 3-4-2-1 and Roy Keane's Sky Sports punditry. I will show you how to build a post-game analysis good enough to silence any pundit, except Roy. It only takes five minutes, two functions, someone French (not Giroud!), the RStudio IDE and base-level knowledge.
"In statistics, a Poisson distribution is a probability distribution that can be used to show how many times an event is likely to occur within a specified period of time. In other words, it is a count distribution. Named after Siméon Denis Poisson."
The key to a "quick win today" is the set of Poisson assumptions: events occur at random, they are independent of one another, and the probability of an event occurring in a given interval does not vary with time (debatable given fatigue, but workable for the purpose of this post). David Sumpter, professor and author of Soccermatics, considers goal scoring to be random, and that's good enough for today.
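To make the assumption concrete, here is a minimal sketch in base R of what "goals are Poisson" means for a single team. The xG value of 1.4 is purely illustrative and not taken from any real match; `dpois()` is the standard Poisson probability function in R.

```r
# Treat a team's goals as Poisson-distributed with mean equal to its xG.
xg <- 1.4                        # illustrative value, not from a real match
goals <- 0:5

# Probability of scoring exactly 0, 1, ..., 5 goals
prob <- dpois(goals, lambda = xg)
names(prob) <- goals
print(round(prob, 3))

# Probability of scoring at least once
p_score <- 1 - dpois(0, lambda = xg)
print(round(p_score, 3))
```

Under these assumptions, a side with a modest xG still fails to score in a fair share of matches, which is the whole point of the "we deserved to win" argument.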
What is xG in football & how is the statistic calculated? It's even used by data and tech guru, Nate Silver! xG for a match is available on various websites, but I find FiveThirtyEight a good source with a couple of options:
- Adjusted goals take into account that not all goals are created equal: A team’s final score is reduced if a goal comes late in a game that it’s leading or when an opponent is a man down. (Meanwhile, goals that are scored in regular situations are adjusted upward to balance out the total number of goals across a league.)
- Shot-based expected goals (xG) is an estimate of how many goals a team could have scored given the location of its shots and the players who took them.
- Non-shot expected goals is an estimate of how many goals a team could have scored given their nonshooting actions in and around their opponent’s penalty area.
I'm happy for you to pick whatever:
1. Load your packages: All fairly standard - load as below.
2. Building the Grid function:
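As a rough illustration of what a grid function could look like, the sketch below builds a matrix of scoreline probabilities from two xG values, assuming the two teams' goals are independent Poisson counts. The name `score_grid`, its arguments and the cap of six goals are my assumptions for this sketch, not necessarily the function used in the post.

```r
# Hypothetical helper: matrix of scoreline probabilities.
# Rows are home goals, columns are away goals, both from 0 to max_goals.
score_grid <- function(home_xg, away_xg, max_goals = 6) {
  goals <- 0:max_goals
  # Independence assumption: P(home = i, away = j) = P(home = i) * P(away = j)
  grid <- outer(dpois(goals, home_xg), dpois(goals, away_xg))
  dimnames(grid) <- list(home = as.character(goals), away = as.character(goals))
  grid
}

# Illustrative xG values, not from a real match
grid <- score_grid(home_xg = 1.2, away_xg = 1.8)
print(round(grid[1:4, 1:4], 3))

# Slightly below 1 because scorelines above max_goals are cut off
print(sum(grid))
```

Capping the grid keeps the output readable; raising `max_goals` pushes the total closer to 1 at the cost of a larger matrix.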
3. Building the Map function:
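One possible shape for a map function is sketched below: it takes the team names, their xG values and the xG source, summarises the scoreline grid into win, draw and loss probabilities, and draws a heat map when run interactively. Everything here (the name `score_map`, its arguments, the placeholder team names and the use of base graphics rather than a plotting package) is an assumption for illustration only.

```r
# Hypothetical helper: outcome probabilities plus a heat map of scorelines.
score_map <- function(home, away, home_xg, away_xg,
                      source = "xG", max_goals = 6) {
  goals <- 0:max_goals
  # Rows are home goals, columns are away goals
  grid <- outer(dpois(goals, home_xg), dpois(goals, away_xg))
  dimnames(grid) <- list(as.character(goals), as.character(goals))

  outcome <- c(
    home_win = sum(grid[lower.tri(grid)]),  # row index > column index
    draw     = sum(diag(grid)),
    away_win = sum(grid[upper.tri(grid)])
  )
  # Renormalise, since scorelines above max_goals were cut off
  outcome <- outcome / sum(outcome)

  # Only draw the heat map in an interactive session such as RStudio
  if (interactive()) {
    image(goals, goals, grid,
          xlab = paste(home, "goals"), ylab = paste(away, "goals"),
          main = paste0(home, " v ", away, " (", source, ")"))
  }

  list(grid = grid, outcome = outcome)
}

# Placeholder teams and illustrative xG values, not from a real match
res <- score_map("Home FC", "Away FC", home_xg = 1.2, away_xg = 1.8,
                 source = "shot-based xG")
print(round(res$outcome, 3))
```

Returning the grid alongside the summary means the same call can feed a different plot or table later without recomputing anything.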
4. Run the script and input your teams, goals and source to the console via the Map function:
Inputting shot-based and non-shot xG from Sunday's Manchester derby, we can see that, despite losing 2-0, Pep Guardiola had some grounds to feel the result flattered United.
When inputting non-shot xG from Sunday's Manchester derby:
Maybe Big Sam was right after all?
Currently available for contract roles using R and Python in the data analytics and science space across the UK. If you're interested in learning what this can bring to your business or client, contact me on [email protected] for more information.