THE BRADLEY-TERRY RANKING SYSTEM
Background
The Bradley-Terry ranking system is an alternative ranking method for cricket teams, based on the Bradley-Terry model. I initially developed the system in 2014 for a university project; my old tutor has since turned it into an academic paper, published in the IMA Journal of Management Mathematics. The rankings used on this site are derived from an evolution of that system.
What is a Bradley-Terry model?
"The Bradley-Terry model (Bradley and Terry 1952) is a popular method for finding the probabilities for each item in a set being preferred over each of the others, based on repeated pairwise comparisons between the items of the set. It is used to assign a strength parameter to each item in a group, on the basis of a number of pairwise comparisons between those items. An overall ranking may then be formed on the basis of these strengths."
It is essentially an adaptation of a logistic regression model, but instead of predicting whether or not some event will happen, you are predicting which out of two given elements will 'win', with the predictions being based on strength parameters derived from a set of previous results.
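To make the link with logistic regression concrete, here is a minimal sketch of the standard Bradley-Terry win probability. The function names are ours, chosen for illustration only. The two functions give the same answer: one works with positive strengths, the other with their logarithms.

```python
import math


def win_probability(strength_a: float, strength_b: float) -> float:
    """Probability that A beats B, given positive Bradley-Terry strengths."""
    return strength_a / (strength_a + strength_b)


def win_probability_logit(log_strength_a: float, log_strength_b: float) -> float:
    """The same probability on the log scale: a logistic function of the
    difference in log-strengths, which is the logistic regression view."""
    return 1.0 / (1.0 + math.exp(-(log_strength_a - log_strength_b)))


# A team with twice the strength of its opponent is expected to win
# two matches in every three.
print(win_probability(2.0, 1.0))
print(win_probability_logit(math.log(2.0), math.log(1.0)))
```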
It is particularly useful for ranking sports, where each match can be considered as a pairwise comparison, and the strength parameters can be seen as a measure of each competitor's ability. Most sporting competitions, especially at domestic level, follow a league system that simply allocates a certain number of points per win. International sport is often far more sporadic, so it needs alternative ways to rank its teams, and it is in these scenarios that Bradley-Terry models are effective.
How does Heavy Bail's ranking system work?
Rankings are computed by applying a Bradley-Terry model to all Test cricket matches played in the previous four years. Matches from between two and four years ago are subject to exponential decay, meaning that once two years have passed since a result, its weight in the model gradually declines for a further two years, after which it is removed from consideration entirely.
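The weighting scheme described above could be sketched as follows. The two-year and four-year boundaries come from the description; the decay rate is not stated here, so `DECAY_RATE` is a purely hypothetical value chosen for illustration, as is the function name.

```python
import math

FULL_WEIGHT_YEARS = 2.0  # results newer than this count in full (from the text)
WINDOW_YEARS = 4.0       # results older than this are dropped (from the text)
DECAY_RATE = 1.5         # hypothetical decay rate per year, NOT the site's value


def match_weight(age_years: float) -> float:
    """Weight given to a result that is `age_years` old."""
    if age_years < 0:
        raise ValueError("a match cannot be in the future")
    if age_years <= FULL_WEIGHT_YEARS:
        return 1.0
    if age_years > WINDOW_YEARS:
        return 0.0
    # Exponential decay, starting from a weight of 1 at the two-year mark.
    return math.exp(-DECAY_RATE * (age_years - FULL_WEIGHT_YEARS))


for age in (1.0, 2.5, 3.0, 3.9, 4.5):
    print(age, round(match_weight(age), 3))
```

With this particular rate, a result has lost most of its influence by the time it reaches four years old, so dropping it at that point changes little.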
Outputs of the model are strength parameters, which estimate the ability of each team given the results from the qualification period. These form the basis for our rankings.
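For readers who want to see how strength parameters can be estimated from weighted results, below is a minimal sketch using the well-known iterative (minorisation-maximisation) update for Bradley-Terry models. It is not necessarily how the site's model is fitted. It assumes every match has a winner and a loser, so draws are simply left out, and it assumes every team has at least one win and one defeat. The function name and data layout are our own.

```python
from collections import defaultdict


def fit_bradley_terry(results, iterations=500):
    """Estimate strengths from (winner, loser, weight) tuples.

    Returns a dict of team -> strength, scaled so the mean strength is 1.
    """
    wins = defaultdict(float)   # weighted wins per team
    games = defaultdict(float)  # weighted games per pair of teams
    teams = set()
    for winner, loser, weight in results:
        wins[winner] += weight
        games[frozenset((winner, loser))] += weight
        teams.update((winner, loser))

    strengths = {team: 1.0 for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            # Sum over every opponent this team has faced.
            denom = sum(
                n / (strengths[team] + strengths[other])
                for pair, n in games.items() if team in pair
                for other in pair - {team}
            )
            updated[team] = wins[team] / denom
        # Rescale so that the strengths average to 1.
        total = sum(updated.values())
        strengths = {t: s * len(teams) / total for t, s in updated.items()}
    return strengths


# Toy example: A beats B three times, B beats A once, all at full weight.
toy = [("A", "B", 1.0)] * 3 + [("B", "A", 1.0)]
print(fit_bradley_terry(toy))
```

In the toy example A wins three matches in four, so the fitted strengths put A at three times the strength of B.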
Once we have the strength parameters, we exclude any team that has played fewer than 10 games in the qualification period. The remaining strength parameters are then standardised into ranking points, whereby an average team will have between 95 and 105 ranking points, and a rating more than 20 points below or above this range indicates a very poor team or an exceptional team respectively. The total number of ranking points available depends on how many teams qualify for the rankings, and how much variance there is between the quality of these teams.
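The qualification filter could be sketched as below. The 10-game threshold comes from the description above; counting each match once per team regardless of its age is our assumption, and the function name and data layout are hypothetical. The conversion to ranking points is left out, since its exact formula is not given here.

```python
from collections import Counter

MIN_GAMES = 10  # threshold stated in the text


def qualifying_teams(matches, strengths, min_games=MIN_GAMES):
    """Keep only the teams that have played at least `min_games` matches.

    `matches` is an iterable of (team_a, team_b) pairs from the
    qualification period; `strengths` maps team -> fitted strength.
    """
    played = Counter()
    for team_a, team_b in matches:
        played[team_a] += 1
        played[team_b] += 1
    return {t: s for t, s in strengths.items() if played[t] >= min_games}


matches = [("A", "B")] * 10 + [("A", "C")] * 3
strengths = {"A": 2.0, "B": 1.0, "C": 0.5}
print(qualifying_teams(matches, strengths))  # C has played only 3 games
```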
How does this differ from the ICC rankings system?
A primary motivation behind the ranking system presented here was to provide a statistically generated alternative that eliminated some of the arbitrary elements used in the ICC system developed by David Kendix. Note this is not necessarily a critique of the ICC system - most of the differences stem from the different goals of the systems. Specifically, the ICC system needs to be able to crown a 'champion' once a year, and needs to be relatively easy to understand. Our system aims to give a 'truer' reflection of the relative abilities of each team at any given time. It should be noted, though, that both ranking systems produce remarkably similar results despite using very different methodologies. We have applied our model to historical data going back to 1950, and the two systems very rarely disagree about which team is the best at a given time. Some of the key differences in methodology are:
Qualification period
In our system, results from the previous four years are always taken into account. This also means that the moment four years have passed since a match, it is removed from consideration. Whilst the removal of older results does impact the present day ratings, the effect is gradual as matches are effectively dropping off one by one.
The ICC rankings, however, take results from a period of between 36 and 48 months, depending on the time of year. Once a year, in May, all matches from between three and four years ago are removed from their calculation. This annual purge can cause a significant shift in the rankings without any matches being played. There is a practical reason for this - the ICC Test Championship mace is given out every year to the team that tops the rankings at the end of April, and so this method simply removes any result that will no longer have any relevance for determining who wins next year's prize.
Unfortunately, Heavy Bail has no such mace to dish out, but it does at least mean we can have a ranking system with a consistent qualification period.
Weighting of older results
There is some similarity in how the two systems deal with older results, in that both give full (100%) weight to all matches in the last two years.
The ICC rankings deal with older results by simply applying a 50% weight to any result in the first two years of their qualification period. This can cause sudden, seemingly unprompted shifts in the present-day rankings when the impact of significant results from previous years is abruptly reduced. Again, this has a practical reason, as it simply adjusts the weighting of matches each May to the weighting that will be used to determine the ICC Test champion the following April.
Heavy Bail's system weights results older than two years based on how long ago they occurred. Once two years have passed since a match, its impact declines exponentially until it reaches zero a further two years later. This ensures a gradual decline in the impact of older results.
Series win bonus
The ICC system offers bonus ranking points for a series win. This is not considered here.
Dealing with relative strengths
Both systems give greater rewards for getting good results against stronger opponents than weaker ones. The ICC uses a points allocation system, where the number of points that can be gained is in relation to the number of ranking points your opponent has. The Bradley-Terry model is simply generating estimated abilities from match results, and so will automatically increase a team's ability rating when they get a result against a good team - recognising that result as a mark of improved quality.
Another key difference is that the Bradley-Terry model looks at the results in the qualification period holistically. This allows the significance of older results to change over time, depending on how later results play out.
For example, consider a situation where Bangladesh become a world force overnight, and then in their next series inflict a whitewash on India. In the short term this would badly affect India's rating, and possibly even cause them to fall a spot or two in the rankings. However, as time went on and Bangladesh won more and more games, the model would recognise that India's defeat was against a stronger Bangladesh side than originally thought, and India's rating would gradually recover.
In the ICC rankings, however, the damage would be done for India, as the result would always be seen as a defeat to a low-ranked side, whereas future victims of this Bangladeshi juggernaut would suffer less as Bangladesh's rating rises.
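The Bangladesh scenario above can be reproduced on a toy data set. The sketch below uses invented results and a simple unweighted Bradley-Terry fit (the standard iterative update, with draws ignored), so it illustrates the effect rather than the site's actual model. India's strength is measured relative to Australia, a third team whose own results against India never change.

```python
from collections import defaultdict


def fit_strengths(results, iterations=2000):
    """Unweighted Bradley-Terry fit from (winner, loser) pairs."""
    wins, games, teams = defaultdict(float), defaultdict(float), set()
    for winner, loser in results:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        teams.update((winner, loser))
    strengths = dict.fromkeys(teams, 1.0)
    for _ in range(iterations):
        # Each pass re-estimates every team from the previous estimates.
        strengths = {
            team: wins[team] / sum(
                n / (strengths[team] + strengths[other])
                for pair, n in games.items() if team in pair
                for other in pair - {team}
            )
            for team in teams
        }
    return strengths


def india_vs_australia(results):
    """India's fitted strength as a multiple of Australia's."""
    s = fit_strengths(results)
    return s["India"] / s["Australia"]


# Invented results: India and Australia are evenly matched, Bangladesh
# have a modest record against Australia, then whitewash India 3-0.
history = (
    [("India", "Australia")] * 5 + [("Australia", "India")] * 5
    + [("Australia", "Bangladesh")] * 2 + [("Bangladesh", "Australia")]
    + [("Bangladesh", "India")] * 3
)
# Later on, Bangladesh beat Australia six more times.
later = history + [("Bangladesh", "Australia")] * 6

before = india_vs_australia(history)
after = india_vs_australia(later)
print(before, after)
```

India play no further matches between the two fits, yet their strength relative to Australia rises in the second one, because the whitewash is now read as a defeat to a much stronger Bangladesh side.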