The Quantified Scholar: How Research Evaluations Transformed the British Social Sciences 
by Juan Pablo Pardo-Guerra.
Columbia, 256 pp., £28, August 2022, 978 0 231 19781 6

When​ Margaret Thatcher died in 2013, Times Higher Education asked the former Cambridge vice-chancellor Peter Swinnerton-Dyer, who ran the government’s University Grants Committee in the 1980s, about her approach. ‘The instinct of a woman is to spring-clean,’ he said, ‘and this country needed spring-cleaning, not least the university sector.’ Swinnerton-Dyer oversaw a rechannelling of one particular stream of research funding: the block grant given to universities. Previously, this money had been allocated on the basis of undergraduate numbers, but with budgets falling, the government wanted to concentrate resources on the top performers in a way the sector as a whole would accept.

This was done by tying funding to a measure of past research performance. The UGC asked universities to submit a selection of their research work to its expert subcommittees, whose judgments were used to allocate the money. The idea was to make research funding more transparent and efficient, and, more important, to introduce an element of competition in order to drive up quality. The exercise also had the effect of increasing government involvement in the working lives of often uncooperative academics.

The first research evaluation exercise was carried out in 1986, and there have been seven more since then. The results of the most recent exercise – it’s now called the Research Excellence Framework – were released last May. The assessments are the Thatcher government’s most significant contribution to research policy, and are obsessed over by university leaders, emulated in some countries and studied in many more. They have both reflected and facilitated the UK’s incorporation into a global university system that has grown steadily larger, more international and more competitive, one in which research tends to be done in teams, and which is governed by an expectation that universities must account for their share of public spending and contribute to society and the economy in direct and measurable ways.

The case for evaluation is that as well as bringing a measure of transparency and accountability to government research funding, it has made the UK’s research better and forced its universities to act more strategically. The case against is that star ratings tell us little about the true quality of academic work, that universities are more interested in primping their ratings than knowing where they actually stand, and that too many institutions use the evaluation to discipline staff in ways that make good scholarship harder. In the universities themselves, the biggest complaint is the amount of work the process creates – I’ll get back to that.

The assessments have been a goldmine for a field sometimes called metascience, a data-based approach to understanding how research and researchers work and helping them do it better. In The Quantified Scholar Juan Pablo Pardo-Guerra looks at the fields of anthropology, economics, politics and sociology, and investigates the way universities’ responses to evaluation have shaped where academics work and what they work on, and the effects of this on the academic landscape. He does this by using publication databases to track where researchers work (through their author affiliations) and identifying the content of their research via the computational analysis of abstracts. Research evaluation is, he says, ‘a sort of natural experiment in which we can study how quantification affects knowledge’.

Pardo-Guerra was working in the sociology department at the London School of Economics and Political Science during the REF of 2014. In the evaluation before that, in 2008, sociology had been the lowest-ranked discipline at the LSE, coming twentieth nationally with a grade below the national average. This was not to be repeated. The LSE convened a strategy committee and hired a management consultant to oversee its submissions. Before that, the sociology department asked Pardo-Guerra and his colleagues to nominate their best work for submission; it was then put through internal and external assessment to see whether it passed muster. (In his case, it did. He now works at the University of California, San Diego.)

The LSE’s approach wasn’t unusual. The zero-sum nature of government research funding and the ease with which evaluation can be used to draw up league tables have inevitably resulted in an arms race to produce the highest-rated submissions possible. As well as running internal mock exercises, universities employ staff to manage and optimise submissions. Faculty members serve on departmental and university-wide committees. This is all before the evaluation itself, which relies on academics and other staff seconded to work on and with review panels. If you work in a UK university, the REF never goes away.

The process of evaluation has become more complicated over the years. In 1986, universities submitted five ‘outputs’, mostly publications, along with a four-page written statement, in each of 37 disciplinary areas. In REF 2021 (like the Olympics, it was delayed a year by the pandemic but kept the original date), universities submitted a census of their research workforce and 2.5 outputs per full-time staff member in 34 areas (or ‘units of assessment’). Each output was rated by an expert panel. A one-star rating was given to research ‘recognised nationally in terms of originality, significance and rigour’; four stars were awarded to ‘world-leading’ research. These scores accounted for 60 per cent of the final result. Another 25 per cent was based on a rating of the influence of the research beyond academia – industrial collaborations, policy advice, medical breakthroughs and so on. (The timing of the deadlines meant that Covid-related research has to wait until next time.) The final 15 per cent was based on the quality of the research environment, assessed by things like support for postgraduates and commitment to diversity and inclusion. Both ‘impact’ and ‘environment’, as they are known, require separate submissions, including narrative case studies. Carrying out REF 2014 is estimated to have cost £246 million.

Roughly speaking, the review panels’ judgments are turned into funding allocations by multiplying a university’s scores – its grade-point average – by the number of staff whose work is submitted. In England, only four-star and three-star (‘internationally excellent’) work gets money, split 80:20. Each devolved nation decides its own funding formula, but the variations are minor. Unlike the grants awarded to specific projects, this money – £2 billion a year – comes without strings. This is seen as a major strength of the UK research system, making it more robust and flexible. Core funding helps top up grants, which by design do not cover a project’s full costs, and contributes to big-ticket items such as buildings and fellowship programmes. It also helps to fund sabbaticals and to tide staff over between fixed-term contracts.
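The allocation logic described above can be illustrated with a toy calculation. This is a minimal sketch of the 80:20 split between four-star and three-star work as the article characterises it, not the official formula; the pot size, department names and score fractions are invented.

```python
def allocate(pot, submissions):
    """Split a funding pot across submissions.

    Each submission gives its staff volume and the fraction of its
    outputs rated four-star and three-star. Only four-star and
    three-star work attracts money, split 80:20, with each share
    divided in proportion to quality-weighted volume (staff numbers
    multiplied by the fraction of work at that rating).
    """
    four_pot, three_pot = 0.8 * pot, 0.2 * pot
    four_vol = {name: s["staff"] * s["four_star"] for name, s in submissions.items()}
    three_vol = {name: s["staff"] * s["three_star"] for name, s in submissions.items()}
    total4, total3 = sum(four_vol.values()), sum(three_vol.values())
    return {
        name: four_pot * four_vol[name] / total4 + three_pot * three_vol[name] / total3
        for name in submissions
    }

# Two hypothetical departments: a larger one with a higher four-star
# share, and a smaller one whose work clusters at three stars.
submissions = {
    "Ashby": {"staff": 40, "four_star": 0.5, "three_star": 0.4},
    "Borchester": {"staff": 25, "four_star": 0.3, "three_star": 0.5},
}
shares = allocate(1_000_000, submissions)
```

The toy version makes the incentive visible: because only the top two ratings count, and four-star work draws on a pot four times the size, a department gains more from raising its four-star fraction than from adding staff whose work scores lower.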

A poll of UK academics in November 2021 found that nearly 70 per cent of them considered the REF a net negative. One of the main complaints was that the amount of work it creates outweighs the benefits. (It could be argued, on the contrary, that the REF is good value: the £246 million spent in 2014 amounts to 2.4 per cent of the total given out, while the equivalent figure for administering grant funding is more than 10 per cent. Even so, a quarter of a billion pounds does seem like a lot of money.) One way of reducing the amount of work involved would be to replace some or all of the peer review element with data already collected for other purposes. REF results correlate with bibliometric measures such as publication citation counts, as well as with the amount of grant income won, and even the number of researchers submitted. Ratings in the three different aspects of assessment also track one another. A more automated approach has been considered from time to time – there’s some interest in using AI or machine-learning methods – but while panels are allowed to look at metrics, for the time being they are still expected to read the submitted papers. Citation counts are the best measure of research strength at a large scale – an entire country’s output, say – but they are at best an indirect indicator of quality in individual papers, and they come with all sorts of biases favouring papers in prestigious journals by white men from prestigious institutions. The chance of creating perverse incentives is also high. Tying core funding to grant income, for example, would prompt universities to put even more pressure on staff to chase grants than they do already.

Researchers themselves think that peer review gives the process more legitimacy than any alternative, and it’s true that the results match most people’s idea of the country’s strongest research universities: in REF 2021, Oxford, UCL and Cambridge got the most money, followed by Edinburgh and Manchester. Taking part isn’t compulsory, and smaller institutions focused on teaching have little to gain financially, but in practice a REF submission is seen as a necessary part of being a public university. The data are granular enough to show that pretty much everywhere is doing good work in some area: results day comes with a wave of press releases from universities, all of them boasting they’ve come top in something. (Leicester was rebuked by the Advertising Standards Authority for tweeting that it was ‘number 1 in the UK for overall research quality’ in arts and humanities.)

Aside from the amount of work involved, the other main complaint about the REF is that universities game their submissions. Universities have some scope to decide which staff to submit. Including more people has the potential to bring in more money, but researchers whose work receives fewer than three stars won’t count towards funding, dragging down a submission’s grade-point average. Many universities decide that selectivity is the best tactic. In 2008, when LSE got its disappointing grade in sociology, its submission included 38 staff. In 2014, fewer than 25 staff were included and the department’s GPA rose from 2.4 to 3.1, enough to promote it from twentieth place in the national table to eighth. A bad score can also persuade a university not to bother the next time round: in physics, the number of universities submitting fell from 64 in 1992 to 42 in 2014; in sociology, ten fewer submitted in 2014 than in 2008. It’s possible that sociologists at these places submitted their work as part of other units of assessment, such as politics or anthropology, but it’s also possible that poor scores resulted in redundancies. At the very least, a researcher whose university decided they had not produced enough REF-worthy outputs would see this as a blow to their internal standing and prospects of advancement. Pardo-Guerra shows that for the researchers in his study, not being submitted raised the probability of their moving jobs before the next round of assessment by 39 per cent. It’s unlikely that many of these moves were promotions.

Assessment was never intended to be used this way, and the rules are continually being tweaked to address such unintended consequences. REF 2021 pushed back against selectivity by introducing a requirement that all staff ‘with significant responsibility for research’ be submitted, while also allowing the number of outputs per researcher to vary. This does seem to have made submissions less selective: the number of staff submitted rose by nearly half, from 52,000 to more than 76,000. LSE’s sociology submission went up again to 39 staff, and the number of institutions submitting in sociology also increased to 37, close to where it was in 2008. The overall outcome was to shift money beyond the Oxford-Cambridge-London triangle.

The requirement to submit all active researchers has, however, accelerated the trend towards teaching-only contracts, particularly in large and research-intensive institutions. A move onto such a contract is not a sideways step – teaching has a lower status than research in universities, partly because it isn’t centrally measured or rewarded in the same way. (In 2017, partly to redress this balance, the government launched a Teaching Excellence Framework, but it carries nothing like the same heft.) Getting into the REF submission is a threshold that anyone wanting to do research in a UK university needs to cross early in their career.

Pardo-Guerra’s results show that academic departments have reshaped themselves since the late 1980s. At that time, he writes, UK social science was ‘a world of many islets’, marked by local specialisms. Birmingham was strong in industrial sociology, and Richard Hoggart and Stuart Hall had pioneered cultural studies there. Thirty-odd miles away at Leicester University, Ilya Neustadt and Norbert Elias gave the sociology department a more theoretical and international focus. Evaluation has had the effect of dispersing such centres. A comparison of career mobility data with the content of articles shows that the more similar scholars’ work is to that of others in their department, the more likely they are to move. At a departmental level, the thinking seems to be that evaluation has made specialisation risky, because it increases vulnerability to shifts in academic fashion or the likes and dislikes of a potential reviewer. In the case of LSE’s sociology department, the assessors viewed gaps in its coverage as a weakness: better to do a bit of everything. Nationwide, the results suggest that the most typical departments, whose output provides a snapshot of the discipline as a whole, score highest. The effect has been to make individual departments broader and institutions more alike.

Similarly, the words that academics use and the combinations in which they use them show that fields of study have themselves become more homogeneous. In economics, as many have observed, this means neoclassical, maths-heavy work. Pardo-Guerra gives the example that when economists used to write about risk they were often referring to general life-hazards such as obesity. Today, ‘risk’ tends to appear in conjunction with words such as ‘portfolio’ and ‘equity’, indicating that they are writing about finance. He finds the same trend in other social sciences, noting that ‘sociology in Britain today has the same degree of structural homogeneity as economics did in the mid-1980s.’

These changes aren’t solely the consequence of national evaluations. The Quantified Scholar covers a period during which research careers and collaborations have become more international and increasingly subject to global currents, the most powerful being ‘publish or perish’. Pardo-Guerra shows that in the 1980s the average social scientist in the UK produced fewer than two peer-reviewed articles per assessment period; by the late 2010s, they were producing nearly five. And the top journals have a worldwide pull: in economics, they are based in the United States and are neoclassical in orientation. Such factors would have shaped the efforts of UK academics even in the absence of government assessment. Germany has nothing like the REF, but after reunification in 1990 researchers in the East – newly thrust into a culture of publish or perish – soon caught up with the output and citation impact of their western counterparts.

Of course, any effort to compare and rank research quality using a star rating will unavoidably favour particular criteria. After individuals and institutions identify these criteria – once they know what excellence looks like – they will focus their efforts on meeting them or, at least, on representing their work as if they are doing so. Trying something different will seem rash. This goes for form as well as content: in the 1996 exercise, journal papers accounted for 62 per cent of submitted outputs; by 2021, this figure had risen to 82 per cent.

In many ways, the UK’s research is in good shape: in REF 2021, more than 80 per cent of the work submitted was rated three- or four-star. Papers with authors at UK universities lead the world in average citation impact, and in brute measures of research quantity and citation counts the UK is holding its own against emerging nations such as China and doing as well as or better than the US and continental Europe. There’s an appeal to the idea of social scientists in different parts of the country turning out work rooted in local contexts and traditions, as if they were cheesemakers, but that may not be the best way to maximise the return on limited public research funds for the nation as a whole. While it’s difficult to know what has been lost in this shift – Pardo-Guerra doesn’t judge – it seems unlikely to be nothing. Is the best system one with a lot of institutions that are good at the same things? Or would it be better to aim for a portfolio of different strengths, knowing that some will pay off more than others? If the latter, how would that affect the evaluation process? Judgments handed down in the form of the average mark given by a review panel do not distinguish between work that gets a mixture of great and lousy scores because it provokes strong opinions – you’d think there might be a place for such work – and stuff that gets a middling response across the board. Adding impact and environment measures to the REF may have broadened the view of excellence and increased the incentives to produce alternative types of research, but the creeping homogeneity that Pardo-Guerra reveals seems like a very difficult thing to engineer out.

Should the issues addressed by metascience seem abstruse, just think of the criticisms levelled at economists in the wake of the financial crisis of 2008. Their failure to foresee the crash was understood partly as a consequence of groupthink (to judge from Pardo-Guerra’s data, UK economics departments have become even more homogeneous since then). Research programmes can pivot quickly when they need to – the UK had been focused on a possible flu pandemic and was able to adapt when the coronavirus came instead – but a discipline with a wide range of approaches and outlooks is more likely to possess the tools it needs when the next crisis comes along.

Send Letters To:

The Editor
London Review of Books,
28 Little Russell Street
London, WC1A 2HN

Please include name, address, and a telephone number.


Vol. 45 No. 3 · 2 February 2023

John Whitfield writes that the Research Excellence Framework has undermined existing research centres based on particular specialisms, resulting in a tendency towards homogeneity within university departments (LRB, 19 January). It should also be noted that the REF undermines ‘team research’, which has always depended on what Basil Bernstein called a ‘culture of impermanence’, that is, the labour of a ‘contract’ workforce whose employment conditions and benefits are inferior to those of ‘academics’ who are required to teach and do research, funded or not. Contract researchers are expected to vary their hours according to the time allocated in the research budget, and are not entitled to funding for such things as study leave. Many make up their salaries by, for example, running seminars and supervising undergraduate projects.

In the past some contract researchers did nevertheless make careers out of funded research, and some were even better published than staff employed as academics. When my own institution was absorbed into University College London, contract researchers were reduced to a lower status than academics. Some had been dependent on research funding for years. Perversely, some research posts were converted into teaching/research positions. Whitfield says that REF funding is used to tide research staff over between contracts, but that hasn’t been my experience.

The consequences of the REF for contract researchers are all the more worrying given the importance of international research collaboration. Multidisciplinary and interdisciplinary teams are needed to tackle major global problems. Such teams typically employ contract staff but offer them no clear career pathways. Given the tendency towards greater homogeneity within university departments and the increasing specialisation of academic journals, the REF may also have the unintended effect of making multi or interdisciplinary work too ‘risky’ to undertake.

Julia Brannen
UCL Institute of Education, London WC1

