
How Reliable Are University Rankings?

Posted in Bad Statistics, Education on April 21, 2020 by telescoper

I think most of you probably know the answer to this question already, but now there’s a detailed study on this topic. Here is the abstract of a paper on the subject, arXiv:2004.09006:

University or college rankings have almost become an industry of their own, published by US News & World Report (USNWR) and similar organizations. Most of the rankings use a similar scheme: Rank universities in decreasing score order, where each score is computed using a set of attributes and their weights; the attributes can be objective or subjective while the weights are always subjective. This scheme is general enough to be applied to ranking objects other than universities. As shown in the related work, these rankings have important implications and also many issues. In this paper, we take a fresh look at this ranking scheme using the public College dataset; we both formally and experimentally show in multiple ways that this ranking scheme is not reliable and cannot be trusted as authoritative because it is too sensitive to weight changes and can easily be gamed. For example, we show how to derive reasonable weights programmatically to move multiple universities in our dataset to the top rank; moreover, this task takes a few seconds for over 600 universities on a personal laptop. Our mathematical formulation, methods, and results are applicable to ranking objects other than universities too. We conclude by making the case that all the data and methods used for rankings should be made open for validation and repeatability.

The italics are mine.
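The scheme the abstract describes is simple enough to illustrate. Below is a minimal sketch in Python, using toy numbers rather than the paper’s College dataset or its actual optimisation method: each institution’s score is a weighted sum of its attributes, the table is just the decreasing-score order, and even a crude random search over weights is enough to push a favoured institution up the table.

```python
# Minimal sketch (toy data, not the paper's dataset or method) of a
# weighted-sum ranking: score_i = sum_j w_j * x_ij, table = decreasing scores.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 4))                        # 8 universities x 4 made-up attributes in [0, 1]

def table(X, w):
    """University indices in decreasing order of weighted score."""
    return np.argsort(-(X @ (w / w.sum())))

def game(X, target, trials=5000, seed=1):
    """Random search for a weight vector that puts `target` in first place."""
    search = np.random.default_rng(seed)
    for _ in range(trials):
        w = search.random(X.shape[1])
        if table(X, w)[0] == target:
            return w
    return None                               # some targets may be unreachable for any weights

official = np.array([0.4, 0.3, 0.2, 0.1])     # hypothetical 'published' weights
favourite = int(np.argmax(X[:, 0]))           # a university that leads on one attribute
print(table(X, official))
print(favourite, game(X, favourite))          # weights that put it at the top
```

The paper does this far more systematically, and for over 600 real universities in a few seconds on a laptop, but the point is the same: modest, defensible-looking changes to the weights reorder the table.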

I have written many times about the worthlessness of University league tables (e.g. here).

Among the serious objections I have raised is that the way these tables are presented is fundamentally unscientific, because they do not separate changes in the data (assuming these are measurements of something interesting) from changes in the methodology (e.g. the weightings). There is an obvious and easy way to test the size of the weighting effect: construct a parallel set of league tables each year, using the current year’s input data but the previous year’s methodology. Comparing the two would isolate changes in methodology from changes in the performance indicators. No scientifically literate person would accept the results of this kind of exercise unless the systematic effects could be shown to be under control.
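To make that concrete, here is a rough sketch of the comparison I mean, again in Python with made-up numbers rather than any real league-table data: the same year’s indicators are ranked under both the old and the new weightings, so any movement between the two tables is attributable to methodology alone.

```python
# Rough sketch of the parallel-table check (illustrative numbers only):
# rank the current year's indicators under last year's weights and under
# this year's weights; differences are methodology effects, not performance.
import numpy as np

rng = np.random.default_rng(2)
X_current = rng.random((8, 4))                     # hypothetical current-year indicators

w_previous = np.array([0.35, 0.30, 0.20, 0.15])    # hypothetical previous methodology
w_current  = np.array([0.25, 0.30, 0.25, 0.20])    # hypothetical current methodology

def positions(X, w):
    """Map each university index to its table position under weights w."""
    order = np.argsort(-(X @ (w / w.sum())))
    return {int(uni): pos for pos, uni in enumerate(order)}

old = positions(X_current, w_previous)
new = positions(X_current, w_current)
shift = {uni: old[uni] - new[uni] for uni in old}  # places moved purely by re-weighting
print(shift)
```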

Yet purveyors of league table twaddle all refuse to perform this simple exercise. I myself asked the Times Higher to do this a few years ago and they categorically refused, thus proving that they are not at all interested in the reliability of the product they’re peddling.

Snake oil, anyone?