How many jellybeans?

David Runciman

Profiles, Probabilities and Stereotypes
by Frederick Schauer.
Harvard, 359 pp., £19.95, February 2004, 0 674 01186 4

The Wisdom of Crowds: Why the Many are Smarter than the Few
by James Surowiecki.
Little, Brown, 295 pp., £16.99, June 2004, 0 316 86173 1

Most of us, most of the time, are deeply prejudiced in favour of individual over collective judgments. This is hardly surprising, since we are all biased. First, we are biased in favour of our own opinions, which we tend to prefer to those of anyone else. Second, we are biased in favour of individuals generally, because we are all individuals ourselves, and so are broadly sympathetic to the individual point of view. We like to think of people exercising their personal judgment, and not just blindly following the rules. For example, who wouldn’t prefer, when appearing before a judge, to learn that the judge was willing to hear each case on its merits, and exercise some discretion if necessary? General rules, we think, are likely to be discriminatory, because they cannot take account of special circumstances. Individuals, by contrast, can use their own judgment, and make exceptions.

However, as Frederick Schauer argues in his excellent book, though we are right to suspect that all general rules are discriminatory, we are wrong to suppose that it is therefore better to trust to individuals. This is because no individual is truly capable of judging each case on its merits; individuals simply bring their own personal generalisations to bear on the case in question. These are likely to be just as crude and inflexible as the mandatory guidelines of some far distant committee. It is easy to forget this. Because general rules are cold and impersonal, the injustices they inevitably create seem crueller and more pernicious. Schauer gives various examples, including the cases of perfectly friendly dogs put down because they fell foul of the terms of dangerous dog regulations, or airline pilots forced by United States law to retire at the age of 60, regardless of their proficiency, expertise and general good health. It is not only airline pilots who think this latter rule absurd. The first commercial flight into space was piloted this June by a 63-year-old who had been forced to quit his day job as a domestic airline pilot. When he appeared on the Jay Leno show to publicise this feat, he pointed out that he was available to fly in space only because the government had deemed him unfit to fly underneath it. How the audience laughed at this spectacle of a bureaucracy gone mad, that a man should have been reduced by ageism to becoming an astronaut.

Because of glaring examples such as this, it is tempting to think that there must be some way of deciding who is and who is not fit to fly that would not be so inflexible and discriminatory. Why not test pilots when they reach 60, for example, to see if they are fit to carry on? But, as Schauer points out, specialised tests rely no less on generalisations than general rules. A blood pressure test designed to ascertain the likelihood of sudden incapacitation will seem discriminatory to anyone who falls below its arbitrary threshold and then goes on to enjoy thirty more years of healthy existence. So why not dispense with formal tests, and simply get individual doctors, perhaps in conjunction with other experienced pilots, to exercise their own judgment in assessing a person’s fitness to carry on after the age of 60? The answer is that these individuals will simply be applying their own generalisations, and these are likely to be as arbitrary as any. Some doctors will be particularly alert to incipient heart trouble; others will be on the lookout for mental instability; some pilots will simply prefer those of their colleagues who remind them of themselves (I like a drink now and then, and I can still fly, so why not this guy?). These are the very reasons we have general rules in the first place, to save us from the arbitrary judgments of individuals.

This does not mean that it is impossible to argue against general rules on the grounds that they could be fairer – a law that forced pilots to retire when they went bald, or converted to Islam, would clearly be grotesque. What it does mean, though, is that it does not make sense to argue against general rules on principle, simply because they are general. What matters is to distinguish between what Schauer calls spurious and non-spurious generalisations. Spurious generalisations include those where there is no statistical correlation (as in the case of baldness and flying ability), as well as those where the correlation is irrelevant (it is true that in recent years a high percentage of aircraft to crash in the United States have been piloted by Muslims, but the non-spurious fact is that they were also piloted by terrorists, who should certainly be prevented by law from flying aeroplanes).

What matters is to decide which non-spurious generalisations are useful in making the general rules. By no means all of them are useful. For example, it is almost certainly true that there are non-spurious generalisations to be made about the likelihood of certain types of passengers posing a threat to airline security, on the basis of age, gender and ethnic background – the likeliest terrorists are young males of Middle Eastern origin. But it does not follow that young males of Middle Eastern origin should be singled out as a general rule for special attention from security officers. This is because individual security officers are likely to have devised such a rule for themselves, which they would apply, and probably over-apply, regardless. More useful would be guidelines that identified other dangerous groups who might be missed (other kinds of terrorist, other kinds of criminal, other kinds of security threat). Once it is clear that all judgments, including all individual judgments, rest on various kinds of generalisation, rules can be devised that take account of the inadequate generalisations of individuals.

Schauer provides some convincing arguments against different forms of racial profiling. What he does not provide, quite explicitly, are any arguments against racial profiling on principle, because there aren’t any. To argue, as the Gore Commission did when it considered the question of racial profiling, that considerations of risk must never be based on ‘stereotypes or generalisations’, is worse than useless (worse because it panders to a sloppy idealisation of individual discretion). How did they think that you could make risk assessments in the absence of stereotypes and generalisations? Much more important is to do what Schauer does, and establish that questions of profiling should not be judged in risk assessment terms alone. One good reason for rejecting even non-spurious racial profiling is that certain forms of discrimination can be stigmatising and counterproductive. Young Arab men pulled out of airline queues are likely to suffer more than airline pilots thrown off the job at 60. But they are not suffering more because they are being stereotyped (both are being stereotyped); nor are they suffering more because it is a spurious stereotype (it is not). They are suffering more because the stereotype is already widespread, and doesn’t need institutional reinforcement. Rather, it needs to be compensated for. Grizzled old airline pilots, by contrast, would probably be encouraged by their grateful passengers to fly on into their eighties if the law didn’t take a firm hand.

Schauer suggests that we should all toughen up about stereotyping, accept it as an inevitable fact of life, and instead of trying to avoid it, concentrate on coming up with the best stereotypes we can. He does not claim, however, that because sweeping generalisations are unavoidable in law and public life, no individuals using their own judgment could outperform the rulebook. For instance, it is probably true that skilled and sensitive judges, well versed in all the vagaries of the human predicament, would do better if they exercised some discretion when sentencing than if they followed a fixed grid that insisted on particular sentences for particular crimes. Mandatory life sentences for murder under English law seem a particularly egregious example of excessive rigidity, since in some cases of murder there are, of course, extenuating circumstances. The problem, though, is knowing who the really skilled judges are. Most judges tend to think that it is them, which is why judges as a whole tend to oppose mandatory sentencing regulations. ‘It is possible,’ Schauer writes,

that the judges are right, and that the judgment of judges as a whole (itself a generalisation) will produce fewer mistakes than would be produced by the systematic application of the guidelines. But in determining whether this is true, it may not be the best strategy to listen only to the judges, for it should come as no surprise that judges, just like carpenters, police officers, customs officials and university professors, are hardly the best judges of the frequency and magnitude of their own errors.

Even if the guidelines make more mistakes than the best judges, the fact that not all judges are the best is a good reason for having sentencing guidelines.

An even stronger case can be made against an automatic presumption in favour of personal judgments, and this is the one put by James Surowiecki in The Wisdom of Crowds. Surowiecki points out that, in a surprisingly wide variety of circumstances, even the best individual can be outperformed by the impersonal group. Take the trivial instance of trying to guess the number of jellybeans in a jar. If you conduct an experiment with a reasonably large group (say thirty or more) and a reasonably large number of jellybeans (say a few hundred), you will find that the average guess is almost certain to be closer to the truth than your own personal guess. Moreover, the average guess is likely to be closer to the truth than the guess of the person in the group who is judged in advance the best jellybean guesser (because this is a trivial case, you might be quite prepared to accept that you are not yourself very good at this kind of thing). In fact, there is a reasonable chance that the average guess of the group will be closer to the truth than the best guess of any single individual. In other words, the judgment of the group may be better than the judgment of all the individuals within it, even though the judgment of the group is solely determined by the judgment of its individual members. This is, to say the least, a striking thought.
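The jellybean experiment is easy to try out in simulation. The sketch below is only an illustration under stated assumptions, none of which come from Surowiecki: each guess is the true count plus independent, unbiased noise drawn from a normal distribution, and the function name, jar size, noise level and number of trials are all invented for the example. It counts how often the crowd's average beats one arbitrary member, and how often it beats every member, including whoever happened to guess best.

```python
import random
import statistics


def crowd_vs_individuals(true_count=500, crowd_size=30, noise_sd=150,
                         trials=2000, seed=0):
    """Return (share of trials where the average beats one member,
    share of trials where the average beats every member)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    beats_one_member = 0
    beats_every_member = 0
    for _ in range(trials):
        # Assumption: guesses are independent and unbiased around the truth.
        guesses = [rng.gauss(true_count, noise_sd) for _ in range(crowd_size)]
        crowd_error = abs(statistics.mean(guesses) - true_count)
        individual_errors = [abs(g - true_count) for g in guesses]
        # Guessers are interchangeable here, so the first one stands in
        # for "your own personal guess".
        if crowd_error < individual_errors[0]:
            beats_one_member += 1
        if crowd_error < min(individual_errors):
            beats_every_member += 1
    return beats_one_member / trials, beats_every_member / trials


one_member_share, every_member_share = crowd_vs_individuals()
print(f"average beats an arbitrary guesser in {one_member_share:.0%} of trials")
print(f"average beats every guesser in {every_member_share:.0%} of trials")
```

Under these assumptions the average should beat an arbitrary guesser in the large majority of trials, while beating every single guesser happens less often but is far from rare. The result depends entirely on the errors being independent and unbiased; if all the guessers share the same mistake, averaging will not remove it.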

Surowiecki shows that this insight can be applied in all sorts of non-trivial ways. Professional gamblers who make money out of horseracing do so by having more detailed knowledge and better individual judgment than the average punter. However, no gambler can outperform over time the final market on a horserace (the ‘starting price’), which is shaped by the collective judgment of everyone who bets on the race, however ignorant, foolish or cavalier. The only way to make money on the horses is to bet early (to ‘take a price’) before the hoi polloi have had their say. This seems paradoxical. After all, when you bet early you are pitting your judgment against that of the bookmaker, who is also likely to be much better informed than the crowd. When you bet late, you are betting against the crowd, to whom the bookmakers have in the end to surrender their judgment. But in the long run, the crowd will win. In ignorance there is indeed a kind of strength.

The first half of this book, in which Surowiecki itemises the dangers of supposing that it is always best to listen to the smartest person in the room, is electrifying. In all sorts of ways, our prejudices in favour of individual genius and dynamic leadership, and against collective mediocrity, turn out to be unfounded. But the second half of the book, in which Surowiecki applies these lessons to a series of case studies, is a bit of a disappointment. Surowiecki is the business columnist for the New Yorker, and these chapters read a bit like warmed-up New Yorker stories, dealing with the trials and tribulations of different businesses and management styles during the booms and bubbles of the last ten years. Only at the very end, in a rather half-hearted chapter, does he turn to consider what seems to me the obvious application of his ideas, which is in the domain of politics. Doesn’t the superiority of crowds to individuals provide a very strong justification for democracy?

It might seem odd to think that democracy stands in need of justification, given its current status as the world’s favourite political idea. But, in fact, political theorists have always found it hard to explain exactly what it is about democracy that is so great. The difficulty can be summarised like this. There are, broadly speaking, two potentially strong defences of democracy, one of which focuses on people’s preferences, the other on their cognitive capacities. The preference-based approach insists that democracy is the best way of finding out what people want. It doesn’t matter whether democratic decisions are right or wrong; what matters is that there is no other plausible way to track the desires of the majority. The problem with this is that it has become increasingly clear over the last fifty years, since the pioneering work of Kenneth Arrow, that there is no simple way to discern the preferences of the majority. All majoritarian voting procedures turn out to be vulnerable to various inconsistencies and contradictions, whenever there are more than two options to choose from.
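The kind of inconsistency at issue can be shown with three voters and three options. The sketch below is the standard textbook illustration usually called Condorcet's paradox, not Arrow's theorem itself, and the ballots and the helper name are invented for the example. Each voter ranks the options consistently, yet the pairwise majority verdicts go round in a circle.

```python
from itertools import permutations

# Hypothetical ballots: each tuple lists one voter's options, best first.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]


def majority_prefers(x, y, ballots):
    """True if a strict majority of ballots rank x above y."""
    votes_for_x = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    return votes_for_x > len(ballots) / 2


pairwise = {
    (x, y): majority_prefers(x, y, ballots)
    for x, y in permutations("ABC", 2)
}

# A beats B, B beats C, and C beats A: the majority verdicts form a cycle,
# so there is no option that "the majority" prefers overall.
for (x, y), wins in pairwise.items():
    if wins:
        print(f"a majority prefers {x} to {y}")
```

With only two options no such cycle can arise, which is why the difficulty appears only when there are more than two options to choose from.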

A cognitive defence of democracy, by contrast, argues that democracies really do provide the likeliest means of making the right political choices, because only democracies allow for the diversity of opinion and freedom of information on which correct decision-making depends. John Stuart Mill’s Considerations on Representative Government is probably the most lucid defence of democracy in these terms. The difficulty here, however, is the widely accepted ignorance and fickleness of the masses. Cognitive defences of democracy tend to put the emphasis on elite forms of representation and a ‘filtering’ of public opinion, in order to protect political decision-making from the unthinking preferences of the general public; as a result, they often sound distinctly undemocratic. Mill, for instance, favoured a ‘plural’ voting system under which individuals got more votes depending on their intelligence, as judged by their occupation – more for barristers than tradesmen, more for tradesmen than foremen, more for foremen than labourers.

What, though, if the ignorance of the masses turns out to be not a weakness but a strength? This sounds odd, but Surowiecki provides ample evidence of the perils of ‘filtering’ opinion in order to prioritise elite judgment. When decision-making devolves onto a small group of self-consciously well-informed individuals, it is all too likely that they will lead each other astray, trusting too much in their own judgment and reinforcing each other’s prejudices. Take one recent example: the Iraq war. (I know, I know, we have to move on sometime, but this is important.) Tony Blair went to war in the face of widespread (though by no means universal) public scepticism. He justified this course of action on two grounds. First, it was his job to take a lead, even if the public did not like it. The implication here was that the public were against the war because, perfectly understandably, most people prefer not to go to war; but democratic politicians cannot always be guided by popular preferences. The second justification was that Blair, and his security services, knew more about the nature of the threat posed by Saddam than the general public, because they had access to much more information. It now turns out that this information was mostly wrong. Nevertheless, Blair can argue, no one knew any better at the time.

But maybe the public did know better at the time. Perhaps the ignorant masses were actually better equipped to assess the nature of the risk than the experts. It is true that very few people could have said with any degree of certainty that Saddam had no WMD. But then very few people could say with any degree of certainty how many jellybeans there are in the jar. It is also true that different people opposed the war for all sorts of different reasons, many of them pretty unconvincing in their own terms (Saddam’s not that bad, we sold him most of his weapons, if Blair’s for it I’m against it, George Bush is an idiot etc). But it is this very diversity of opinion that may explain why the public had a better overall idea of what was going on than those in the closed, secretive, hothouse worlds of Downing Street (and Washington). Although no member of the crowd could claim to be as well informed as the experts, the crowd knew what the specialists did not: Saddam was not an immediate threat.

This does not mean that the crowd is always right. As Surowiecki explains, large groups are only good at making decisions under fairly specific conditions. The members of the group must be willing to think for themselves, they must be more or less independent of each other, and the group itself should be reasonably decentralised. There must also be some means of aggregating different opinions into a collective judgment. When people start second-guessing each other, when they follow each other blindly, when they start looking for central direction, the crowd turns into a herd, and herds are notoriously bad at making decisions. But in the case of the public response to the Iraq war, Surowiecki’s conditions seem pretty much to have held. These conditions are strikingly similar, whether Surowiecki knows it or not, to the ones set out by Rousseau in The Social Contract for determining if a people is capable of governing itself. Here is Surowiecki:

If you ask a large enough group of diverse, independent people to make a prediction or estimate a probability, and then average those estimates, the errors each of them makes in coming up with an answer will cancel themselves out. Each person’s guess, you might say, has two components: information and error. Subtract the error, and you’re left with the information.

Here is Rousseau:

There is often a great difference between the will of all [what all individuals want] and the general will; the general will studies only the common interest while the will of all studies private interest, and is indeed no more than the sum of individual desires. But if we take away from these same wills the plusses and minuses which cancel each other out, the balance which remains is the general will.
From the deliberations of a people properly informed, and provided its members do not have any communication among themselves, the great number of small differences will always produce a general will, and the decision will always be good.

The many political theorists who have given some thought to this passage have usually come to the conclusion that Rousseau was either being horribly naive or wilfully obscure. Certainly that was my view. But now, having read Surowiecki, I am not so sure.

Of course, Surowiecki’s terms are much more modest than Rousseau’s. There is no talk about a general will or a common good; rather, it is simply a matter of probabilities, estimates and errors. Surowiecki accepts that crowds are no better than individuals at making moral judgments, because moral judgments are nothing like guessing the number of jellybeans in a jar. Crowds do not do well when the question is not a straightforwardly cognitive one. Nevertheless, a large number of the most important questions now facing the world are problems of cognition – we need to know what we are up against, in order to know how to allocate our resources. Take the question of whether terrorism is more of a threat than global warming. The best way to answer this question would be to know what is likely to happen in the medium term. If there is a good chance that terrorists will get hold of nuclear weapons and use them, terrorism would seem to constitute the greater threat. But how can we know? One possibility would be to ask the experts and follow their advice. The problem here is that the experts rarely agree, which makes it hard to know which experts to trust. A second alternative would be to see the disagreement of experts as a reason to ask the politicians to exercise their judgment (this is the Blairite option).

A third possibility would be to ask a large group of people that included not just experts and politicians but also members of the public to give it their best guess. This could be done by creating a kind of terrorism futures market, in which a wide range of individuals are invited to estimate the probability of certain eventualities – say, a large-scale nuclear strike over the next decade, or a civil war in Iraq, or even something as specific as the assassination of Iyad Allawi or John Negroponte. In order to guarantee that everyone really gives it their best shot, players could even be encouraged to gamble on the outcome (as Surowiecki points out, betting exchanges have consistently outperformed both academic experts and polling organisations in predicting the outcome of American presidential elections). This method would have a number of advantages over relying on the usual closed networks of spies, politicians and bureaucrats. It would be open to all, it would be decentralised, it would be leaderless, and it wouldn’t suck up to anybody. It would pool all available information and intuition, regardless of the source, and turn it into specific predictions. These predictions might not be right, but they are more likely to be right than the best estimate of any named individual.

There is one obvious disadvantage to this scheme. Most of us would feel pretty appalled at the thought of anyone gambling on real-life death and destruction. When, in 2003, the Defense Advanced Research Projects Agency in the US tried to set up just such a terrorism futures market (called PAM, or the Policy Analysis Market), the predictable reaction was one of outrage, which killed it stone dead. Two senators, Ron Wyden and Byron Dorgan, led the assault, calling PAM ‘harebrained’, ‘offensive’, ‘useless’ and ‘morally wrong’. They were, presumably, seeking to articulate the gut instincts of ordinary Americans, for whom the idea that US foreign policy might be dictated by a bunch of speculators backing someone to take out Yasser Arafat would be hard to stomach. It is hard to stomach. But the fact remains that such a market can provide information that is simply not available anywhere else. If US/British policy in Iraq had been guided by the gambling instincts of the American and British people, who did not think the war was worth the risk, rather than the gambling instincts of Bush and Blair, who did, then it would have been founded on the sounder intelligence. This is why it is Wyden and Dorgan who were ‘morally wrong’. After all, what is the point of being a member of the most elitist democratic institution in the world – the United States Senate – if you are not prepared to stand up to the gut instincts of the masses, and do the right thing?

David Runciman teaches politics at Cambridge. His books include Political Hypocrisy: The Mask of Power, from Hobbes to Orwell and Beyond, How Democracy Ends and Confronting Leviathan: A History of Ideas. He has written more than a hundred pieces for the LRB on subjects including Lance Armstrong, gambling, all three volumes of Charles Moore’s biography of Thatcher, Donald Trump’s election and his defeat. He is the host of the podcast Past Present Future.

Letters

Vol. 26 No. 17 · 2 September 2004

In his review of James Surowiecki's The Wisdom of Crowds (LRB, 5 August), David Runciman misses an important point about the use of experts. The obvious requirements for choice of sample, independence and so on are only necessary conditions, not sufficient ones. A properly random sample of people can be completely wrong about something if they are all working on the same incorrect assumption. Consider the jellybean example: if there is a large glass sphere hidden among the jellybeans, then you will get a normal distribution of guesses centred on the wrong mean. An expert – someone who knows about the glass sphere – will be able to make a much more accurate guess.

If you want to determine the relative usefulness of experts and crowds in a particular situation, you need to consider the relationship between them, and the ability of the crowd to make reasonable guesses. There are situations in which there are no real experts, such as an unfixed jellybean contest; situations in which everyone is an expert; and situations in which there are real experts. A good example of the second type of scenario is betting on horses. Ask a thousand random people to look at the form and then bet on a race and most professional gamblers will be happy to bet according to the crowd's decision.

A simple example of the third type of situation is the well-known doubling problem: ask a thousand randomly chosen people how much money you'll have in a month's time if someone gives you a penny today, two tomorrow and so on. The group mean will almost certainly be an answer that's much too low; but ask an expert (anyone who understands basic exponential functions) and you'll get the right answer. The run-up to the war in Iraq was probably one of these situations. The point as regards Iraq is not that the crowd somehow knew better than the experts, but that the experts lied.

Jeffrey McGowan
Glastonbury, Connecticut
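The doubling problem mentioned in the letter above can be worked through in a few lines. The sketch assumes a 30-day month and reads the puzzle as one penny on the first day, two on the second, four on the third and so on, with everything kept; the letter specifies neither point, and the function name is invented for the example.

```python
def doubling_total_pennies(days):
    """Total pennies received when day 1 brings 1 penny and each day doubles."""
    return sum(2 ** day for day in range(days))


total = doubling_total_pennies(30)
units, pennies = divmod(total, 100)  # 100 pennies to the currency unit

print(total)                          # 1073741823, i.e. 2**30 - 1
print(f"{units:,}.{pennies:02d}")     # 10,737,418.23
```

On these assumptions the month ends with more than ten million in whole currency units, which is the sort of figure an intuitive guess tends to fall well short of.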

It is a fundamental principle of experiment that, if there’s no bias, experimental error will be randomly distributed. That’s why an experimenter will repeat his experiment, and average the results. The average, with experimental error roughly cancelled out, will be more accurate than a single measurement. It should be no surprise that this applies to guessing the number of jellybeans in a jar. If, however, we asked people to guess the number of jellybeans in a jar labelled ‘1000 jellybeans’, the average guess would be in the neighbourhood of 1000. The label introduces a bias into the experiment.

Runciman proposes that matters of public policy be submitted to ‘a large group of people … to give it their best guess’. But while we can easily find unbiased jellybean counters, I would not know where to look for unbiased policy-makers. Had such a process been applied to the decision to invade Iraq last year, the American people would have approved it (Runciman is wrong to suggest the opposite). Public opinion last year was biased (in the experimental sense) by the overwhelmingly one-sided propaganda favouring invasion. We see through this now, but all the averaging in the world could not have saved us then.

Clifford Story
Murfreesboro, Tennessee

‘Taken individually,’ Rachmaninov once said, ‘the people in an audience may be poor critics of music, but as a complete body, the audience never errs.’

Michael Scott
Mauritius
