“The fault… is not in our stars but in ourselves.”
When William Shakespeare wrote those words, he knew nothing of probability, political polling or Big Data. But the quote expresses as well as any how difficult it is for people to understand predictions. Whether the subject is the Cubs’ World Series chances or this year’s presidential election, most people do not appreciate two unavoidable limitations of forecasting the future. First, no matter how precise the method or how expert the pollster, any prediction is based on information that is necessarily incomplete. Second, some underlying assumption used to make the prediction may be flawed.
On the morning of October 30, the Cubs trailed the Indians three games to one in the World Series. Most sources quoted their odds of winning the Series at 10-15%. This created a palpable resignation in the air, reflected in a steep drop-off in ticket prices for Game 5.
As CNN Money reported, “Just hours before the Cleveland Indians’ second shutout at Wrigley Field (Game 4), Game 5 had been crowned the most expensive sporting event in history. Tickets averaged $6,548 — topping even the 2015 Super Bowl, according to ticket search engine TicketIQ. But with Sunday’s Game 5 no longer a series-clinching opportunity for the Cubs, prices have sunk 18 percent to $5,373. It’s now the second-most expensive game behind the Super Bowl. ‘You see emotion really driving prices,’ TicketIQ CEO Jesse Lawrence explained. ‘As people get excited, prices go up. As they get discouraged, prices go down.’”
But estimates of the Cubs’ chances were based on extremely limited information: specifically, the historical fact that of the forty-four teams that had trailed three games to one, only five went on to win the World Series. In baseball, a sample of forty-four is quite small. Experienced observers noted that for Games 5-7 the Cubs had their three best starting pitchers, while the Indians had two lesser pitchers for Games 5 and 6, and in Game 7 their ace would have to pitch on three days’ rest for the second time in the Series. Starting pitchers are, if not the most important, certainly among the most important determinants of who wins any game, so in reality the Cubs’ chances were probably better than 15%. Not even, certainly, but not nearly that bad. The players surely realized instinctively there was reason for hope, if not optimism.
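To see how little a record of five comebacks in forty-four tries pins down, here is a minimal sketch that puts a rough 95% range around that rate. It uses the Wilson score interval, a standard textbook formula, and it assumes (unrealistically) that every past Series was an independent trial with the same underlying odds. The function name `wilson_interval` is just an illustrative label. Under those assumptions the plausible range runs from about 5% to about 24%, far wider than the single figure of 10-15% suggests.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Assumes independent trials with a common success probability,
    which is a simplification for World Series history.
    """
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


# Five comebacks among forty-four teams that trailed three games to one.
low, high = wilson_interval(5, 44)
print(f"observed rate: {5 / 44:.1%}")
print(f"plausible range: {low:.1%} to {high:.1%}")
```

The interval says nothing about pitching matchups; it only shows that the historical record alone cannot distinguish a long shot from a merely unlikely outcome.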
In the 2016 presidential election, virtually every pre-election poll had Hillary Clinton winning. The New York Times quoted her odds of winning at 85%, the Huffington Post at 98%. Even well-respected analyst Nate Silver gave her a roughly 70% chance of winning, based on 20,000 simulations that drew on the results of many polls. It is a rule of statistics that, all else being equal, the larger the sample size, the more accurate the estimate. Flipping a coin 1,000 times is more likely to produce a result close to 50% heads and 50% tails than flipping it ten times.
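The coin-flip claim can be checked exactly rather than taken on faith. The sketch below, which assumes a fair coin, computes the probability that the share of heads lands within five percentage points of 50% for ten flips and for 1,000 flips. The function name `prob_near_half` and the five-point tolerance are illustrative choices, not anything from the polls discussed here.

```python
import math


def prob_near_half(flips, tolerance_points=5):
    """Exact probability that a fair coin's heads share is within
    `tolerance_points` percentage points of 50% after `flips` tosses."""
    favorable = sum(
        math.comb(flips, heads)
        for heads in range(flips + 1)
        # Integer form of |heads/flips - 0.5| <= tolerance_points/100,
        # used to avoid floating-point edge cases at the boundaries.
        if abs(100 * heads - 50 * flips) <= tolerance_points * flips
    )
    return favorable / 2**flips


for flips in (10, 1000):
    print(f"{flips} flips: {prob_near_half(flips):.1%} chance of 45%-55% heads")
```

With ten flips the only qualifying outcome is exactly five heads, which happens about a quarter of the time; with 1,000 flips a result in that band is close to certain.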
However, if there is a faulty assumption about how the sample was selected, a large sample does not correct the error; it only makes the flawed result look more certain. A poorly selected large sample is worse than a well-selected small sample. The classic instance was the 1936 presidential election, when the respected magazine Literary Digest mailed a mock ballot to 10 million people and, based on the results, predicted challenger Alf Landon would easily defeat President Franklin Roosevelt.
The problem was that the magazine drew its sample from names in telephone directories and from magazine and club membership lists. During the Depression, many people did not have phones, and most did not belong to clubs or subscribe to magazines. In addition, roughly 75% of the ballots were not returned, so the actual sample size was about 2.4 million, and it was heavily skewed toward middle- and upper-class voters, Landon’s natural constituency. In reality, Roosevelt won 46 of 48 states, and Literary Digest soon went under.
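A toy simulation makes the Literary Digest lesson concrete. Every number in it is hypothetical and chosen only for illustration: it assumes 30% of voters appear on phone or club lists and back the incumbent at 40%, while the unlisted 70% back him at 70%. It then compares a huge sample drawn only from the lists with a small sample drawn at random from everyone. The function name `simulate_polls` is likewise made up for this sketch.

```python
import random


def simulate_polls(seed=0, big_n=100_000, small_n=1_000):
    """Compare a large list-only sample with a small random sample.

    All shares below are hypothetical, not historical estimates.
    """
    rng = random.Random(seed)
    listed_share = 0.30      # assumed share of voters on phone/club lists
    support_listed = 0.40    # assumed incumbent support among listed voters
    support_unlisted = 0.70  # assumed incumbent support among everyone else
    true_support = (
        listed_share * support_listed + (1 - listed_share) * support_unlisted
    )

    def votes_for_incumbent(listed):
        p = support_listed if listed else support_unlisted
        return rng.random() < p

    # Large sample, but drawn only from listed voters.
    biased = sum(votes_for_incumbent(True) for _ in range(big_n)) / big_n
    # Small sample, but each respondent is drawn from the whole electorate.
    fair = sum(
        votes_for_incumbent(rng.random() < listed_share) for _ in range(small_n)
    ) / small_n
    return true_support, biased, fair


true_support, biased, fair = simulate_polls()
print(f"true support:             {true_support:.1%}")
print(f"100,000 from lists only:  {biased:.1%}")
print(f"1,000 chosen at random:   {fair:.1%}")
```

In this setup the big sample settles very precisely on the wrong answer, while the small random sample lands within a few points of the truth.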
Even with today’s sophisticated polling methods, pollsters can never be sure who will actually turn out to vote; “likely voters” are not certain voters. In addition, although the post-mortem will take months, it is possible many people did not want to say they were voting for Trump because of the perceived stigma of supporting the candidate generally felt to be less socially acceptable.
The psychologist Daniel Kahneman won the Nobel Prize for his work on how people make judgments under conditions of uncertainty. In a 1992 paper he and his partner Amos Tversky (who probably would have shared the Nobel Prize had he not died prematurely) wrote, “Theories of choice are at best approximate and incomplete…choice is a constructive and contingent process. When faced with a complex problem, people…use computational shortcuts and editing operations.”
The fault is indeed not in the stars but in ourselves. Combine “computational shortcuts and editing operations” with information that is never complete and assumptions that are riddled with bias, deliberate or not, and it’s easy to see how reality can trump prediction and surprise just about everyone.