Are we getting faster?
A few people have challenged me this year, suggesting that the predictions are not as accurate as they might be and I wondered if this had any validity.
I'd been thinking for some time that I might need to make some adjustment for time trialling's general progression - you might expect riders to be faster now than they were 2 or 3 years ago. It's certain they are faster than 10 years ago; just look back at all the records.
The rankings themselves are validated by everyone else so they won't be affected by this - if everyone gets better, your ranking within the group will still be the same. However, predictions are based on past performance on the course for a given ranking, so they should change with the times.
Here's what I found.
I have 3 years in which I've been producing predictions in advance of the event. They are based, however, on the last 5 years' data. So in year 1 (i.e. 2019) I had data from 2017, 2018 and 2019 to work with. Now I have 2017-2021.
To get the predictions I take a graph of rank against time for each course, using all the data I've got, to get a line of best fit from which I can see what the average time would be for any given rank currently.
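As a rough illustration of the idea (not the actual code behind the predictions), here is a minimal sketch of fitting a line through rank and time for one course and reading a predicted time off it. It assumes a straight-line least-squares fit, which may not be the shape of curve really used, and the `rides` data, `fit_line` and `predict_time` are made up for the example.

```python
# Hypothetical (rank, time in seconds) pairs for one course; not real data.
# The direction of the relationship is just how this made-up data goes.
rides = [(10, 1260.0), (25, 1320.0), (40, 1385.0), (60, 1450.0), (80, 1530.0)]


def fit_line(points):
    """Ordinary least-squares straight line: time = slope * rank + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_time(rank, slope, intercept):
    """Read the expected time for a given rank off the fitted line."""
    return slope * rank + intercept


slope, intercept = fit_line(rides)
print(f"Predicted time for rank 50: {predict_time(50, slope, intercept):.0f} s")
```

Because the fit uses every ride in the data set, older rides pull the line just as hard as recent ones, which is why the age of the data matters.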
My data is skewed to the early years, as the number of events from 2017-19 was over 800 a year, 2020 was just under 200 and 2021 is at 550 so far. So if you average this out the mid point is probably about the end of 2018.
This suggests I'm using the performance of the competitors from late 2018 to set the predictions today.
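To show where a mid point like that comes from, here is a small sketch that finds the date by which half of all events had taken place. The yearly counts are rounded from the figures above (so they are approximations), events are assumed to be spread evenly through each year, and `midpoint_of_events` is a name invented for this example.

```python
# Approximate events per year, rounded from the figures in the post
# ("over 800" for 2017-19, "just under 200" for 2020, 550 so far in 2021).
events_per_year = {2017: 800, 2018: 800, 2019: 800, 2020: 200, 2021: 550}


def midpoint_of_events(counts):
    """Return the fractional year by which half of all events had happened,
    assuming events are spread evenly within each year."""
    half = sum(counts.values()) / 2
    seen = 0
    for year in sorted(counts):
        if seen + counts[year] >= half:
            return year + (half - seen) / counts[year]
        seen += counts[year]
    raise ValueError("no events")


midpoint = midpoint_of_events(events_per_year)
print(f"Half of all events fall before {midpoint:.2f}")
```

With these rounded counts the halfway event lands right at the end of 2018. Since the real 2017-19 counts were a little over 800, the true mid point would sit slightly earlier than this sketch gives.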
Will that have an impact on the results?
I've ignored hill climbs and, obviously, events I didn't predict. I've also screened out massive outliers (crazy fast or slow days, or events with changed courses) by excluding those rides with a variance of more than 10 seconds per mile from the prediction.
In 2019 I have 22963 rides. 175 I got spot on. Of the others 49.95% were faster than predicted and 50.05% slower.
In 2020 I have 5370 rides. 41 I got spot on. Of the others 52.9% were faster than predicted and 47.1% slower.
In 2021 I have 13898 rides. 90 I got spot on. Of the others 62.9% were faster than predicted and 37.1% slower.
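For anyone wanting to try the same check on their own results, here is a minimal sketch of the counting. The rides listed are invented, and the names `miss_per_mile` and `split_faster_slower` are just for this example. It assumes a ride is an outlier when it misses by more than 10 seconds per mile, and that spot-on rides are left out of the percentages.

```python
# Each ride: (predicted seconds, actual seconds, course length in miles).
# These rows are made up for illustration.
rides = [
    (1320, 1310, 10),  # 1.0 s/mile faster than predicted
    (1320, 1320, 10),  # spot on
    (3400, 3475, 25),  # 3.0 s/mile slower
    (1400, 1250, 10),  # 15 s/mile faster: screened out as an outlier
    (3300, 3250, 25),  # 2.0 s/mile faster
]

MAX_MISS = 10.0  # seconds per mile; rides beyond this are treated as outliers


def miss_per_mile(predicted, actual, miles):
    """Signed miss in seconds per mile; positive means the rider beat it."""
    return (predicted - actual) / miles


def split_faster_slower(rides, max_miss=MAX_MISS):
    """Count spot-on rides and the faster/slower split among the rest."""
    misses = [miss_per_mile(*ride) for ride in rides]
    kept = [m for m in misses if abs(m) <= max_miss]
    faster = sum(1 for m in kept if m > 0)
    slower = sum(1 for m in kept if m < 0)
    decided = faster + slower
    if decided == 0:
        raise ValueError("no rides that were faster or slower")
    return {
        "rides": len(kept),
        "spot_on": len(kept) - decided,
        "pct_faster": 100 * faster / decided,
        "pct_slower": 100 * slower / decided,
    }


result = split_faster_slower(rides)
print(result)
```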
This looks wild and had me worried so I dug a bit further.
You would expect (hopefully) that most riders will be close to their prediction if the system is any good at all. So how far off are they?
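Well, here are the riders who missed their prediction by LESS than 10 seconds per mile (well over 95% of all riders) and how much, on average, they missed it by. "Beat" is the average margin for those who were faster than predicted and "Lost" is the average for those who were slower, both in seconds per mile (s/m).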
Year    Beat (s/m)    Lost (s/m)
2019    2.99          -3.18
2020    3.15          -3.17
2021    3.34          -3.00
So each column is getting bigger, i.e. times are moving ahead of the predictions, but not by much. This would suggest that over the 3 years riders have got on average somewhere between 0.18 and 0.35 seconds per mile faster, which would translate to up to 3.5 seconds in a 10 and nearly 9 seconds in a 25. Does this seem feasible?
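The arithmetic behind that range can be checked with a few lines. This sketch takes the Beat and Lost figures from the table above and works out how far each column moved between 2019 and 2021, then scales that to a 10 and a 25. The helper name `gain_over_distance` is made up for the example.

```python
# Average miss in seconds per mile, copied from the table above.
beat = {2019: 2.99, 2020: 3.15, 2021: 3.34}
lost = {2019: -3.18, 2020: -3.17, 2021: -3.00}

# How far each column has moved towards "faster" over the three years.
beat_shift = beat[2021] - beat[2019]
lost_shift = lost[2021] - lost[2019]


def gain_over_distance(shift_per_mile, miles):
    """Seconds gained over a whole event for a given per-mile shift."""
    return shift_per_mile * miles


for miles in (10, 25):
    low, high = sorted(
        (gain_over_distance(beat_shift, miles),
         gain_over_distance(lost_shift, miles))
    )
    print(f"{miles} miles: between {low:.2f} and {high:.2f} seconds faster")
```

The two shifts come out at about 0.35 and 0.18 seconds per mile, which is where the range quoted above comes from.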
And what about the huge swing in percentages? Well, if I was right 100% of the time and you all got 1 second faster, I'd be wrong 100% of the time in future. So I guess the swing means the predictions are very close to the tipping point between faster and slower, which is actually a good sign.
However, to stop the predictions falling behind I may need to adjust for progress. From this data it would be fair to say the predictions are getting less accurate (drifting away from the 50:50 of 2019) and the gap between Beat and Lost (variance) is increasing from 6.17 to 6.34 s/m. I'll need to consider how I manage this, but I might just blame you - the mid point I would like to get back to where it was in 2019, but the variance is either a failure in the accuracy of the rankings or a lack of consistency from the riders - ultimately, the way the system works, this may amount to the same thing.
Here are the graphs:
Ideally the point should be as sharp as possible, to show the predictions have low variance, and the legs of equal length, to show the predictions are accurate. The red line shows all the riders within 5 secs per mile of their prediction and the percentage is the number of those riders compared with the total field. So we were getting 75% within 5 spm but it's now down to 72%. If I adjust the predictions to take account of the progression of the sport over time, hopefully we'll get back to the 2019 graph.
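The percentage on each graph is a simple share, sketched below. The sample misses are invented, `share_within` is a name chosen for the example, and it assumes "within 5 secs per mile" includes a miss of exactly 5 and that the total field is every ride passed in.

```python
def share_within(misses, tolerance=5.0):
    """Percentage of rides whose signed miss (seconds per mile) lies within
    the tolerance either side of the prediction."""
    if not misses:
        raise ValueError("no rides")
    close = sum(1 for m in misses if abs(m) <= tolerance)
    return 100 * close / len(misses)


# Made-up misses in seconds per mile; positive means faster than predicted.
sample = [-7.2, -4.1, -1.0, 0.0, 0.8, 2.5, 4.9, 6.3]
print(f"{share_within(sample):.0f}% of rides within 5 s/m of the prediction")
```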