Runners are not “normal”!

Last week I ran the Paris-Versailles race for the first time. It wasn’t planned: I only decided to run the day before, following a chain of events of no interest to this short post. It was a great run with a nice atmosphere (the starting line is almost beneath the Eiffel Tower, a bunch of friends were also running, the weather was perfect...) and a pretty good result as the cherry on the cake. I was running under the name of a neighbor who, in the end, could not take part, so the spectators were cheering on some guy called Alex…

A few days later, I went to the race website to check the results and ended up with a PDF file containing entries for the 22+ kilo-folks who ran that day. Then, you know, I felt this irresistible urge to turn that PDF into a TXT file and write some Python scripts to crunch all that data (no! My life is not boring, I do plenty of other cool things :-). So I computed histograms, cumulative distributions and a whole bunch of statistics, category by category, and so on… The results and insights were reasonably interesting (to me) but not enough to justify writing a LinkedIn post.

Still, a little fact was fun enough: if you just look at the probability density function of the times of these 22+ kilo-runners you get the purple-colored curve below. 

It is fairly asymmetric: the left-hand side of the curve is more tightly packed than the right-hand side, which has a fatter tail. In other words, the faster-than-average runners are bunched closer together than the slower ones. Interestingly, at the Paris-Versailles, runners are released every five minutes in waves of a few hundred, in order of arrival at the start, so each wave mixes faster and slower runners and, hence, the faster folks don’t end up spurring each other on by chasing one another.

This curve is not what would be observed if the times followed a Gaussian distribution with the same empirical mean and variance, as plotted in green above (considering a mixture of Gaussians, on a per-category basis, does not significantly improve the fit). I have not gone as far as backing up this claim with a proper Kolmogorov-Smirnov test, considering that, given the size of the sample, the above graphical evidence was enough to make the following point.
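For readers who want to go one step beyond the graphical evidence, here is a minimal sketch of the kind of check mentioned above, using only the Python standard library. It is not the script I used: the `times` list is a synthetic, right-skewed stand-in (the real results are not reproduced here), and the helper names `skewness` and `ks_statistic_vs_normal` are made up for illustration. It computes the sample skewness and the Kolmogorov-Smirnov distance between the empirical CDF and a Gaussian with the same mean and standard deviation.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def skewness(xs):
    """Third standardized moment; a positive value means a fatter right tail."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


def ks_statistic_vs_normal(xs):
    """Largest gap between the empirical CDF of xs and the CDF of a
    Gaussian with the same mean and standard deviation as xs."""
    ref = NormalDist(fmean(xs), pstdev(xs))
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = ref.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, (i + 1) / n - c, c - i / n)
    return d


# ASSUMPTION: synthetic finish times in minutes, NOT the actual race data.
# A shifted gamma distribution is used only because it is right-skewed.
rng = random.Random(0)
times = [55.0 + rng.gammavariate(6.0, 6.0) for _ in range(22174)]

d = ks_statistic_vs_normal(times)
# Classical large-sample 5% threshold for the KS statistic.
threshold = 1.36 / math.sqrt(len(times))

print(f"skewness    = {skewness(times):.2f}")
print(f"KS distance = {d:.4f}")
print(f"threshold   = {threshold:.4f}")
```

With a sample this large, even a modest departure from normality pushes the KS distance well above the threshold. Note that the classical threshold assumes the reference distribution is fixed in advance; since the mean and variance are estimated from the same sample here, it errs on the conservative side.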

So, in proper statistical terminology, this empirically establishes the fact that runners are not normal, which I guess won’t surprise the non-runners.

In the end, I completed the non-flat 16.2 km in 1 h 10 min and finished 870th out of… 22174 participants (hey! That's in the top 4% :-). Not bad for a veteran who had already run 53 km (plus ~40 km of cycling) since the beginning of that same week. Here’s the relive of the run.

Probably a better way of finding out what distribution it is would be a kernel density estimation, using a normal kernel. I enjoyed it a lot, thanks :)

Congratulations Renaud. See you soon. Dominique

Etienne Hamelin

Project manager at CEA List

7 years ago

It looks like a Gumbel distribution, doesn't it?

Lilia Zaourar Koutchoukali

Co-design expert at CEA

7 years ago

Congratulations
