Bayes cannot go alone to Qatar 2022

Only a few days remain before Qatar 2022 kicks off. Soccer arouses passions, that much is clear. And, recalling the traditional saying that soccer "is an invention of the English that the Germans always win", it is worth asking how to approach predicting the results of this championship.

From my perspective, the question is interesting not only because of the passion that Uruguayans have for soccer, but also because it involves the statistical analysis of an event with clear rules, which makes it easier to judge what is good and what is bad.

But modeling the behavior of a soccer championship is very complex. Why? Because, many times, Bayes does not take part in the championship. In other words, what happened beforehand by no means determines what will happen in a match, or even in a final. This independence from the past directly affects statistical modeling. If you don't believe me, ask PSG about the 2022 Champions League against Real Madrid, or Barcelona about the 2020 Champions League against Bayern Munich. There are several other examples that you surely know as well as I do.

In recent weeks, the BCA Research analysis has become "famous" for predicting that Argentina will be the new world champion in Qatar 2022 (btw, Messi deserves it). But let's look at the model itself in a bit more detail. They describe it as follows:

“We stick to the same framework we used in 2018 for this 2nd edition: A unique two-step model combining macro and micro variables as well as our very own soccertise. However, not satisfied with our 2018 forecasting record, we improved the model in three ways:”

  1. We refined our methodology by adding a goal difference estimator for the group stage. We also calculated the precise probability that each team has in finishing first, second, third or last in their group.
  2. We modified some of the key player variables that our model uses in the knockout stages. Football tactics have changed dramatically over the past few years, making historical relationships unreliable. We account for this with our new variable choices.
  3. We added two control variables to our model: The Home Advantage Dummy and the Winner's Curse Dummy.

And they use the following data to build the model:

“We rely on the database of player statistics used in Electronic Arts (EA) Sports FIFA video game simulation. Our sample now includes all matches played during the 2006, 2010, 2014, and 2018 FIFA World Cups, with 192 group stage games and 64 knockout stage”.

You can look at the model in detail in the following link.

As background, we have the analysis presented here, which predicted that Germany would be the champion in Russia 2018 (and which also became very famous in the media). As we all know, the 2018 champion was France, and Germany was eliminated in the group stage.

Other theoretical investigations show us different analysis methodologies.

But the important conclusion is that Bayesian models are very difficult to apply to chaotic behavior. That is why, for this type of analysis, it is recommended to combine the Bayesian model with a surrogate (or complementary) model based on polynomial chaos expansion, which makes the inverse analysis computationally efficient.
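
To make the surrogate idea a little more concrete, here is a minimal sketch of a one-dimensional polynomial chaos expansion in Python. It is only an illustration under stated assumptions, not the method of any analysis mentioned in this article: the uncertain input is a single standard normal variable, `expensive_model` is a hypothetical stand-in for a costly simulation, and the coefficients are obtained by plain numerical projection onto probabilists' Hermite polynomials.

```python
import math

def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    prev, curr = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr

def pce_coefficients(model, degree, half_width=8.0, points=4001):
    """Project model(Z), Z ~ N(0, 1), onto He_0..He_degree (trapezoidal rule)."""
    step = 2.0 * half_width / (points - 1)
    coeffs = []
    for n in range(degree + 1):
        total = 0.0
        for i in range(points):
            z = -half_width + i * step
            weight = 0.5 if i in (0, points - 1) else 1.0
            density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            total += weight * model(z) * hermite_e(n, z) * density
        # E[He_n(Z)^2] = n!, so divide by n! to get the n-th coefficient
        coeffs.append(total * step / math.factorial(n))
    return coeffs

def surrogate(coeffs, z):
    """Cheap polynomial approximation of the model at input z."""
    return sum(c * hermite_e(n, z) for n, c in enumerate(coeffs))

def expensive_model(z):
    """Hypothetical stand-in for a costly simulation (an assumption)."""
    return math.exp(0.3 * z)

coeffs = pce_coefficients(expensive_model, degree=4)
mean = coeffs[0]  # the mean of the output is the zeroth coefficient
variance = sum(c * c * math.factorial(n) for n, c in enumerate(coeffs) if n > 0)
print(f"surrogate mean: {mean:.4f}, variance: {variance:.4f}")
print(f"model(1.0) = {expensive_model(1.0):.4f}, surrogate(1.0) = {surrogate(coeffs, 1.0):.4f}")
```

Once the coefficients are known, the mean and variance of the output follow directly from them, and `surrogate` can stand in for the expensive model inside an inverse analysis, where the model may need to be evaluated many thousands of times.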

Chaos theory is the branch of mathematics, physics, and other sciences (biology, meteorology, and economics among them) that deals with certain types of complex systems and nonlinear dynamical systems that are highly sensitive to variations in initial conditions. Small variations in these initial conditions can imply large differences in future behavior, making long-term prediction impossible. This happens even though these systems are strictly deterministic; that is, their behavior can be completely determined by knowing their initial conditions. I recommend the book “Chaos Theory in Economics: Methods, Models and Evidence”.
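
A classic toy illustration of this sensitivity (not a soccer model) is the logistic map with parameter r = 4, a fully deterministic rule. The sketch below, whose function and variable names are my own, starts two trajectories that differ by one part in ten billion and tracks how far apart they drift.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) and return every state."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states

# Two starting points that differ only in the tenth decimal place
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Distance between the two trajectories at each step
gaps = [abs(u - v) for u, v in zip(a, b)]
print(f"initial gap: {gaps[0]:.2e}")
print(f"largest gap within 60 steps: {max(gaps):.2f}")
```

The same starting point always reproduces the same trajectory, yet an error far too small to measure in practice grows until the two paths have nothing in common. That is the sense in which a deterministic system can still defeat long-term prediction.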

The Bayesian approach has several drawbacks. The uncertainties, probability distribution functions (PDFs), and lower and upper bounds of the input parameters may not be known a priori at the micro level (the psychological situation of the players, the social situation of the team, and a huge number of qualitative variables), so they end up being based on personal opinions, ad hoc expert judgment, or assumptions with no scientific or mathematical basis.

This lack of uncertainty information is a challenge for the forward uncertainty quantification (UQ) process. The challenge can be tackled with inverse uncertainty quantification (IUQ), which seeks to quantify the uncertainties of the input parameters based on available experimental data and code predictions.
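
As a minimal sketch of the inverse idea, suppose the unknown input parameter is a team's average scoring rate and the observed data are goals per match. Everything here is an assumption made for illustration: the Poisson forward model, the flat prior, the grid of candidate rates, and the goal counts, which are made up. The code works backwards from the data to a distribution over the parameter instead of guessing that distribution in advance.

```python
import math

def poisson_log_likelihood(rate, goals):
    """Log-likelihood of the goal counts under a Poisson(rate) forward model."""
    return sum(g * math.log(rate) - rate - math.lgamma(g + 1) for g in goals)

def grid_posterior(goals, rates):
    """Normalized posterior over candidate rates, assuming a flat prior."""
    log_post = [poisson_log_likelihood(r, goals) for r in rates]
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def credible_interval(rates, posterior, mass=0.90):
    """Equal-tailed interval read off the cumulative posterior."""
    lower_q = (1.0 - mass) / 2.0
    upper_q = 1.0 - lower_q
    cumulative, lower, upper = 0.0, None, rates[-1]
    for r, p in zip(rates, posterior):
        cumulative += p
        if lower is None and cumulative >= lower_q:
            lower = r
        if cumulative >= upper_q:
            upper = r
            break
    return lower, upper

observed_goals = [2, 0, 1, 3, 1]           # made-up data for illustration
rates = [0.05 * k for k in range(1, 101)]  # candidate rates 0.05 .. 5.0

posterior = grid_posterior(observed_goals, rates)
posterior_mean = sum(r * p for r, p in zip(rates, posterior))
low, high = credible_interval(rates, posterior)
print(f"posterior mean rate: {posterior_mean:.2f}")
print(f"90% credible interval: [{low:.2f}, {high:.2f}]")
```

With only five matches the interval is wide, which is the point: the data, not an expert's opinion, determine how uncertain the input parameter remains.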

The input parameters may not be directly measurable. However, they can be quantified from measured data that are indirectly connected to them through a computational model. Another subject of interest within the context of IUQ is model calibration. In model calibration, the input parameters of a simulation code are recalculated or updated, in a deterministic or statistical framework, to improve the agreement between calculation results and experimental data. When the code results already agree very well with experiment, model calibration may not be required. However, IUQ is still needed to quantify the uncertainties in the input parameters. The following paper presents an approach for dealing with high uncertainty about the history of the explanatory variables.

Therefore, it is extremely complex to determine ex ante who will be the champion, especially over a relatively long time horizon: in this case, more than 15 days (approximately). That horizon could be extended, but this would depend on the real data extracted from the first minutes of the World Cup onward, through the structured and unstructured data that arise from the competition (images, IoT, results, performance of the players in Qatar, etc.).

This type of behavioral restriction applies not only to soccer and similar sports. It could also apply to economic scenarios such as crises, black swans, or situations that, in the long run, are better explained by models of chaos than by models of rationality.

Although there is prior information that can give us some ideas, soccer will continue to be a difficult sport to predict, which is what makes it so incredibly exciting. I think we can say that. See you on December 18th.

Luis Porto

Principal Advisor, Strategy and Organizational Development (CSO), Organization of American States

2 years ago

The forecasters are outsiders, and outsiders don't count ("los de afuera son de palo")

Rainer Toifl-Dupin

Global & Strategic Partnerships Manager at Coface

2 years ago

"so you're a rocket scientist, how did you end up here?" "it's all just numbers really" https://www.youtube.com/watch?v=CthnrsU53LI 1:30?
