> There are *known knowns*; there are things we *know* that we know. There are *known unknowns*; that is to say, there are things that we now *know* we don't know. But there are also *unknown unknowns* -- there are things we *do not know* we don't know.
>
> -- Donald Rumsfeld, U.S. Secretary of Defense, February 12, 2002

A properly constructed Monte Carlo model helps you deal with the *known unknowns* of your problem, by taking advantage of the fact that you are aware of the uncertainty.

Traditional what-if scenarios, such as a trio of “best case,” “worst case,” and “best guess” estimates, are often unhelpful. You know that your best guess is unlikely to be exactly right. But you also know things are unlikely to turn out as your best or worst case describes, because to reach either of those extremes, the stars would have to align and everything would have to go right, or wrong.

What you usually need is some idea of the range of *probable* outcomes, plus some idea of how screwed you will be if one of the improbable outcomes does occur. Using computer power to leverage what you do know against what you don’t, you can develop estimates that are better than a what-if scenario.

Let’s look at a simple example. You have been offered a job, and want to estimate your annual commuting costs. We’ll start with individual “worst case,” “most likely” and “best case” estimates.

| Variable | Worst | Likely | Best |
| --- | --- | --- | --- |
| Days per year | 220 | 200 | 180 |
| Miles per trip | 22 | 20 | 18 |
| MPG | 20 | 25 | 30 |
| Average Fuel Cost per Gallon | $4.50 | $4.00 | $3.50 |
| Simplistic Estimate | $2178 | $1280 | $756 |
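As a sanity check, the table’s point estimates can be reproduced in a few lines of Python. One assumption is inferred rather than stated: to match the totals, “miles per trip” must be a one-way distance, with two trips per day.

```python
# Reproduce the table's point estimates.
# Assumption (inferred from the totals, not stated in the table):
# "miles per trip" is one-way, and you make two trips per day.
def fuel_cost(days, miles_one_way, mpg, price_per_gallon):
    annual_miles = days * miles_one_way * 2
    return annual_miles / mpg * price_per_gallon

print(fuel_cost(220, 22, 20, 4.50))  # worst case: 2178.0
print(fuel_cost(200, 20, 25, 4.00))  # likely:     1280.0
print(fuel_cost(180, 18, 30, 3.50))  # best case:  756.0
```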

However, this does not give you a sense of the relative probabilities of the different scenarios. A Monte Carlo model of this problem might assume that these variables are normally distributed, with the likely estimate being the mean, and the worst and best cases representing two standard deviations above and below the mean. When the simulation is run, the variables are randomized according to their respective probabilities, and the simulation is repeated thousands of times. Modern desktop computers can accomplish this in seconds.
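A minimal sketch of such a simulation, using only the Python standard library. The standard deviations are derived from the table by the rule just described: each worst/best case sits two standard deviations from the “likely” mean, and the two-trips-per-day interpretation of the mileage figure is an assumption carried over from the point estimates.

```python
import random

def simulate(trials=100_000, seed=0):
    """Monte Carlo estimate of annual fuel cost.

    Each input is drawn from a normal distribution whose mean is the
    "likely" value and whose standard deviation is half the distance to
    the worst/best case (so those cases sit two sigma from the mean).
    """
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        days  = rng.gauss(200, 10)     # worst 220 / best 180
        miles = rng.gauss(20, 1)       # one-way; two trips per day assumed
        mpg   = rng.gauss(25, 2.5)
        price = rng.gauss(4.00, 0.25)
        costs.append(days * miles * 2 / mpg * price)
    mean = sum(costs) / trials
    sd = (sum((c - mean) ** 2 for c in costs) / trials) ** 0.5
    return mean, sd

mean, sd = simulate()
print(f"mean ~ ${mean:,.0f}, standard deviation ~ ${sd:,.0f}")
```

Because it is 1/MPG that enters the cost, the output distribution is slightly skewed; expect a mean near $1,290 and a standard deviation somewhere in the neighborhood of $170, varying a little from run to run.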

After a simulation with a sufficiently large number of trials, you would know that the estimated average fuel cost is about $1293, with a standard deviation of $168, indicating that in roughly 95% of scenarios, fuel costs fall between $957 and $1629. If you run the simulation again, you may get slightly different values, but they will still be more useful than the simplistic estimates above.

Monte Carlo methods allow us to use the “brute force” of modern computers to gain insights that would otherwise be elusive. Moore’s law ensures that the needed computing power is cheaper with every passing day, and we might as well put that power to use.

With that understanding of the Monte Carlo method, let’s take a closer look at Nate Silver’s model. I’m going to grossly oversimplify, but when you strip away the details, he is simply acknowledging two ways in which polls are known to be unreliable, and accounting for that variability in his model.

Political polling firms produce their results by sampling a subset of the population in a given voting jurisdiction and extrapolating those results as if they described the entire population. If you sample, say, 500 people in a state, and out of that sample, 49% say they intend to vote for one candidate, and 46% say they intend to vote for the other one, is that 49% to 46% result exactly what you’re going to get on election day? Probably not, because you don’t really know if the few people that you asked constitute a representative sample of the whole. Actually it’s more accurate to say that you are sure they are *not* a representative sample; you just aren’t sure *to what degree* the sample was wrong.

There are mathematical methods for identifying the size of a sample needed to achieve a given level of reliability. Let’s say that in this case, a sample size of 500 people lets you say that your results are accurate to plus or minus 2.5%. Your model therefore assigns a probability distribution to that 49%, spanning plus or minus 2.5%; let us say it’s a normal distribution. Each time the model runs, it plugs in a random number scaled to fit the pattern of a normal distribution. Values clustered around 49% are the most likely, with values as high as 51.5% or as low as 46.5% becoming increasingly unlikely.
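That single-poll step can be sketched with the hypothetical numbers from this paragraph: 49% support, with the ±2.5% margin treated as a 95% interval, so one standard deviation is 1.25 points.

```python
import random

rng = random.Random(42)

# Hypothetical poll from the text: 49% support, margin of error +/-2.5%.
# Treating +/-2.5% as a 95% interval makes one standard deviation 1.25 points.
poll_mean = 49.0
poll_sd = 2.5 / 2

draws = [rng.gauss(poll_mean, poll_sd) for _ in range(10_000)]

# About 95% of draws should land inside the margin of error...
within_moe = sum(46.5 <= d <= 51.5 for d in draws) / len(draws)
# ...while draws beyond 51.5% or below 46.5% occur, but rarely.
print(f"{within_moe:.1%} of simulated poll outcomes fall within +/-2.5%")
```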

Your computer might run that model ten thousand times, randomizing it each time. By itself that doesn’t tell you much, but now consider that your model has done this for every poll, in every state. It has corrected for known biases in the polling methods used by different vendors, insofar as those biases can be identified by comparing historical polls against actual electoral outcomes, or by measuring a poll’s deviation from the average of similar polls conducted around the same time. You now have an idea of how all these various probabilities interact.
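A toy version of that aggregation step might look like the following. Every state, vote count, and polling figure here is invented for illustration; the point is only the mechanics of drawing each state’s result from its own distribution and tallying simulated elections.

```python
import random

# Toy aggregation sketch: all states, shares, and vote counts are invented.
# Each contested state: (candidate A's polled share, one-sigma error, electoral votes).
contested = [
    (49.0, 1.25, 20),
    (51.5, 1.50, 15),
    (47.0, 2.00, 10),
]
SAFE_A, SAFE_B = 200, 210          # electoral votes assumed not in play
TOTAL = SAFE_A + SAFE_B + sum(ev for _, _, ev in contested)

def win_probability(trials=10_000, seed=1):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ev_a = SAFE_A
        for share, sigma, ev in contested:
            # Draw this state's outcome from its polling distribution.
            if rng.gauss(share, sigma) > 50.0:
                ev_a += ev
        if ev_a > TOTAL / 2:       # candidate A wins this simulated election
            wins += 1
    return wins / trials

print(f"Candidate A wins {win_probability():.1%} of simulated elections")
```

Note that the output is not a winner but a frequency: the fraction of simulated elections each candidate wins, which is exactly the kind of probabilistic answer described below.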

In the end, your model based on probability distributions does not produce a single definitive answer the way the “best guess” estimates above do. Instead, the model’s answer is itself a probability distribution, which tells you the range of probable outcomes and their relative likelihoods.