How has Bayes’ Rule changed the way I think?

People talk about how Bayes’ Rule is so central to rationality, and I agree. But given that I don’t go around plugging numbers into the equation in my daily life, how does Bayes actually affect my thinking?
A short answer, in my new video below:



(This is basically what the title of this blog was meant to convey — quantifying your uncertainty.)

RS episode #53: Parapsychology

In Episode 53 of the Rationally Speaking Podcast, Massimo and I take on parapsychology, the study of phenomena such as extrasensory perception, precognition, and remote viewing. We discuss the type of studies parapsychologists conduct, what evidence they’ve found, and how we should interpret that evidence. The field is mostly not taken seriously by other scientists, which parapsychologists argue is unfair, given that their field shows some consistent and significant results. Do they have a point? Massimo and I discuss the evidence and talk about what the results from parapsychology tell us about the practice of science in general.

Coach Smith’s Gutsy Call

Coach Mike Smith was facing a tough decision. His Falcons were in overtime against the division-rival Saints. His team had been stopped on their own 29-yard line and were facing fourth down and inches. Should he tell his players to punt, or go for it? A punt would be safe. Trying to get the first down would be the high-risk, high-reward play: success would mean a good chance to win, while failure would practically guarantee a loss. What play call would give his team the best chance to win?

He decided to be aggressive. He called for star running back Michael Turner to try pounding up the middle of the field.

It failed. The Saints were given the ball in easy range to score, and quickly did so. The media and fans criticized Smith for his stupid decision.

But is the criticism fair? If the play call had worked, I bet he would have been praised for his guts and brilliance. I think my favorite reaction came from ESPN writer Pat Yasinskas:

When Mike Smith first decided to go for it on fourth-and-inches in overtime, I liked the call. I thought it was gutsy and ambitious. After watching Michael Turner get stuffed, I changed my mind. Smith should have punted and taken his chances with his defense.

What a perfect, unabashed example of Outcome Bias! We have a tendency to judge a past decision solely based on the result, not on the quality of the choice given the information available at the time.

Did Coach Smith know that the play would fail? No, of course not. He took a risk, which could go well or poorly. The quality of his decision lies in the chances of success and the expected values for each call.

Fortunately, some other people at ESPN did the real analysis, using 10 years of historical data of teams’ chances to win based on factors like field position, score, time remaining, and so on:

Choice No. 1: Go for the first down

…Since 2001, the average conversion percentage for NFL teams that go for it on fourth-and-1 is 66 percent. Using this number, we can find the expected win probability for Atlanta if it chooses this option.

* Atlanta win probability if it converts (first-and-10 from own 30-yard line): 67.1 percent
* Atlanta win probability if it does not convert (Saints first-and-10 from Falcons’ 29-yard line): 18 percent.
* Expected win probability of going for the first down: 0.660*(.671) + (1-.660)*(.180) = 50.4%

Choice No. 2: Punt

* For this choice, we will assume the Falcons’ net punt average of 36 yards for this season. This means the expected field position of the Saints after the punt is their own 35-yard line. This situation (Saints with first-and-10 from their 35, in OT, etc.) would give the Falcons a win probability of 41.4%.

So by choosing to go for it on fourth down, the Falcons increased their win probability by 9 percentage points.
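The ESPN calculation is an expected value: weight each outcome’s win probability by how likely that outcome is. Here is a minimal Python sketch that reproduces their arithmetic using only the figures quoted above; the function name `expected_win_prob` is my own label, not something from the ESPN piece.

```python
def expected_win_prob(p_convert, wp_if_convert, wp_if_fail):
    """Probability-weighted average of the two outcomes of going for it."""
    return p_convert * wp_if_convert + (1 - p_convert) * wp_if_fail

# Figures taken from the ESPN analysis quoted above.
go_for_it = expected_win_prob(0.66, 0.671, 0.180)
punt = 0.414  # win probability with the Saints first-and-10 at their own 35

print(f"Go for it: {go_for_it:.1%}")
print(f"Punt:      {punt:.1%}")
print(f"Gain:      {(go_for_it - punt) * 100:.1f} percentage points")
```

Running it gives 50.4% for going for it, 41.4% for punting, and a gain of 9.0 percentage points, matching the quoted analysis.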

That’s a much better way to evaluate a coach’s decision! Based on a simple model and league averages (there are problems with both of those, but they’re better than simply trusting the outcome!), the punt was not the best option. Smith made the right decision.

Well, sort of. There are different ways to go for the fourth-down conversion, and according to Brian Burke at AdvancedNFLStats, Smith chose the wrong one:

Conversion success rates on 1-yard-to-go runs (%)

Position            3rd down   4th down
FB (fullback)          77         70
QB (quarterback)       87         82
RB (running back)      68         66
Total                  72         72

In these situations, quarterback sneaks have proven much more effective than handing the ball to the running back. In a perfect game-theory world, defenses would realize their weakness and focus more effort on stopping the sneak. But for now, it remains something offenses can exploit. According to the numbers, the Falcons probably could have made a better decision.
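The table invites a quick what-if. This is a minimal sketch resting on an assumption of mine, not ESPN’s or Burke’s: that the 67.1% (convert) and 18% (fail) win probabilities from the ESPN analysis hold no matter who carries the ball, so only the conversion rate changes. It plugs each position’s 4th-down rate from the table into the same expected-value formula.

```python
# 4th-down conversion rates on 1-yard-to-go runs, from the table above.
FOURTH_DOWN_RATES = {"FB": 0.70, "QB": 0.82, "RB": 0.66}

WP_IF_CONVERT = 0.671  # ESPN: Falcons first-and-10 at their own 30
WP_IF_FAIL = 0.180     # ESPN: Saints first-and-10 at the Falcons' 29
WP_PUNT = 0.414        # ESPN: win probability after punting

def expected_win_prob(p_convert):
    """Expected win probability of going for it, assuming only p_convert varies by play."""
    return p_convert * WP_IF_CONVERT + (1 - p_convert) * WP_IF_FAIL

for position, rate in FOURTH_DOWN_RATES.items():
    print(f"{position}: {expected_win_prob(rate):.1%} (punt: {WP_PUNT:.1%})")
```

Under that assumption, the quarterback sneak works out to roughly 58.3%, against 50.4% for the running-back carry Smith actually called; both beat the 41.4% for punting.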

And, of course, it was OBVIOUS to me at the time that they should have called a quarterback sneak. </hindsight bias>

A Sleeping Beauty paradox

Imagine that one Sunday afternoon, Sleeping Beauty is taking part in a mysterious science experiment. The experimenter tells her:

“I’m going to put you to sleep tonight, and wake you up on Monday. Then, out of your sight, I’m going to flip a fair coin. If it lands Heads, I will send you home. If it lands Tails, I’ll put you back to sleep and wake you up again on Tuesday, and then send you home. But I will also, if the coin lands Tails, administer a drug to you while you’re sleeping that will erase your memory of waking up on Monday.”

So when she wakes up, she doesn’t know what day it is, but she does know that the possibilities are:

  • It’s Monday, and the coin will land either Heads or Tails.
  • It’s Tuesday, and the coin landed Tails.

We can rewrite the possibilities as:

  • Heads, Monday
  • Tails, Monday
  • Tails, Tuesday

I’d argue that since it’s a fair coin, you should place 1/2 probability on the coin being Heads and 1/2 on the coin being Tails. So the probability on (Heads, Monday) should be 1/2. I’d also argue that since Tails means she wakes up once on Monday and once on Tuesday, and since those two wakings are indistinguishable from each other, you should split the remaining 1/2 probability evenly between (Tails, Monday) and (Tails, Tuesday). So you end up with:

  • Heads, Monday  (P = 1/2)
  • Tails, Monday (P = 1/4)
  • Tails, Tuesday  (P = 1/4)

So, is that the answer? It seems indisputable, right? Not so fast. There’s something troubling about this result. To see what it is, imagine that Beauty is told, upon waking, that it’s Monday. Given that information, what probability should she assign to the coin landing Heads? Well, if you look at the probabilities we’ve assigned to the three scenarios, you’ll see that conditional on it being Monday, Heads is twice as likely as Tails. And why is that so troubling? Because the coin hasn’t been flipped yet. How can Beauty claim that a fair coin is twice as likely to come up Heads as Tails?
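To make the troubling step concrete, here is a minimal Python sketch that conditions the three-way assignment on “it’s Monday” by discarding the Tuesday scenario and renormalizing what is left. It only mechanizes the arithmetic in the post; it takes no position on which assignment is correct.

```python
from fractions import Fraction

# The three-way assignment proposed above.
credences = {
    ("Heads", "Monday"): Fraction(1, 2),
    ("Tails", "Monday"): Fraction(1, 4),
    ("Tails", "Tuesday"): Fraction(1, 4),
}

def conditional(credences, coin, day):
    """P(coin | day): keep only the scenarios on `day`, then renormalize."""
    on_day = {k: v for k, v in credences.items() if k[1] == day}
    return on_day.get((coin, day), Fraction(0)) / sum(on_day.values())

print(conditional(credences, "Heads", "Monday"))  # 2/3
print(conditional(credences, "Tails", "Monday"))  # 1/3
```

Exact fractions avoid any floating-point noise, so the 2:1 ratio the post finds troubling shows up exactly.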

Can you figure out what’s wrong with the reasoning in this post?

RS #47: The Search for Extra-Terrestrial Intelligence

In the latest episode of Rationally Speaking, Massimo and I spar about SETI, the Search for Extra-Terrestrial Intelligence: Is it a “scientific” endeavor? Is it worth maintaining? How would we find intelligent alien life, if it’s out there?

My favorite parts of this episode are the ones in which we’re debating how likely it is that intelligent alien life exists. Massimo’s opinion is essentially that we have no way to answer the question; I’m less pessimistic. There are a number of scientific facts which I think should raise or lower our estimates of the prevalence of intelligent alien life. And what about the fact of our own existence? Does that provide any evidence we can use to reason about the likelihood of our ever encountering other intelligent life? It’s a very tricky question, fraught as it is with unresolved philosophical problems in probability theory, but a fascinating one.


Game theory and basketball

Ben Morris is a friend-of-a-friend of mine who recently competed in a contest sponsored by ESPN called “Stat Geek Smackdown,” in which the goal was to correctly predict as many of the NBA playoff games as possible. For each correct guess, a contestant received 5 points.

Heading into the final game between Miami and Dallas, Ben was in second place, trailing just 4 points behind a veteran stat geek named Ilardi. By most estimates, Miami had about a 63% chance of beating Dallas. But Ben realized that if he and Ilardi both chose Miami, then even if Miami won the game, Ilardi would still win the competition, because he and Ben would each get 5 points and the gap between their scores would remain unchanged. In order for Ben to win the competition, he would have to pick the winning team and Ilardi would have to pick the losing team.

So that created an interesting game theory problem: If Ben predicted that Ilardi would pick Miami, since they were more likely to win, then Ben should pick Dallas. But if Ilardi predicted that Ben would be reasoning that way, then Ilardi might pick Dallas, knowing that all he needs to do to win the competition is to pick the same team as Ben. But of course if Ben predicts that Ilardi will be thinking that way, maybe Ben should pick Miami…

What would you do if you were Ben? You can read about Ben’s reasoning on his excellent blog, Skeptical Sports, but here’s my summary. Ben essentially had two options:

(1) His first option was to play his Nash equilibrium strategy, which is a concept you might recall if you ever took game theory (or if you saw the movie “A Beautiful Mind,” although the movie botched the explanation). That’s the set of strategies (Ben’s and Ilardi’s) which gives each of them no incentive to switch to a new strategy as long as the other guy doesn’t. The Nash equilibrium strategy is especially appealing if you’re risk averse because it’s “unexploitable,” meaning that it gives you predictable, fixed odds of winning the game, no matter what strategy your opponent uses.

In this case — and you can read Ben’s blog for the proof — the Nash equilibrium is for Ben to pick Miami with exactly the same probability as Miami has of losing (0.37) and for Ilardi to pick Miami with exactly the same probability as Miami has of winning (0.63). (You might wonder how you should pick a team “with X probability,” but it’s pretty easy: just roll a 100-sided die, and pick the team if the die comes up X or lower.)
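The die trick is easy to sketch. This minimal Python version assumes a fair 100-sided die simulated by `random.randint(1, 100)`; the function name and the fixed seed are illustrative choices, not anything from Ben’s blog.

```python
import random

def pick_with_probability(team, other, percent, rng=random):
    """Roll a 100-sided die; pick `team` if it comes up `percent` or lower."""
    roll = rng.randint(1, 100)  # uniform over 1..100 inclusive
    return team if roll <= percent else other

rng = random.Random(2011)  # arbitrary fixed seed so the run is reproducible
picks = [pick_with_probability("Miami", "Dallas", 37, rng) for _ in range(100_000)]
print(picks.count("Miami") / len(picks))  # should land close to 0.37
```

In the actual contest Ben would roll once; the loop only checks that the rule picks Miami about 37% of the time.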

If you do the calculation, you’ll find that playing this strategy — i.e., rolling a hundred-sided die and picking Miami only if the die came up 37 or lower — would give Ben a 23.3% chance of beating Ilardi, no matter how Ilardi decided to play. Not terrible odds, especially given that this approach doesn’t require Ben to make any predictions about Ilardi’s strategy. But perhaps Ben could do better if he were able to make a reasonable guess about what Ilardi would do.

(2) That leads us to option two: Ben could abandon his Nash equilibrium strategy, if he felt that he could predict Ilardi’s action with sufficient confidence. To be precise, if Ben thinks that Ilardi is more than 63% likely to pick Miami, then Ben should pick Dallas.

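Here’s a rough proof. Call “p” the probability that Ilardi picks Miami, and “q” the probability that Ben picks Miami. Then we can assign probabilities to each of the outcomes in which Ben wins:

  • Ben picks Miami, Ilardi picks Dallas, and Miami wins: q × (1 – p) × .63
  • Ben picks Dallas, Ilardi picks Miami, and Dallas wins: (1 – q) × p × .37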

Since the two outcomes are mutually exclusive, we can add up their probabilities to get the total probability that Ben wins, as a function of p and q:

Probability Ben wins = .37p + .63q – pq
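One way to check the formula is brute force. This minimal Python sketch enumerates all eight (Ilardi’s pick, Ben’s pick, winner) combinations, treats the picks and the game as independent (as the derivation above does), and adds up the ones in which Ben is right and Ilardi is wrong. The function names are my own.

```python
from itertools import product

P_MIAMI_WINS = 0.63

def ben_win_prob(p, q):
    """Brute force: sum every outcome in which Ben is right and Ilardi is wrong.

    p = probability Ilardi picks Miami; q = probability Ben picks Miami.
    """
    total = 0.0
    for ilardi, ben, winner in product(["Miami", "Dallas"], repeat=3):
        prob = ((p if ilardi == "Miami" else 1 - p)
                * (q if ben == "Miami" else 1 - q)
                * (P_MIAMI_WINS if winner == "Miami" else 1 - P_MIAMI_WINS))
        if ben == winner and ilardi != winner:
            total += prob
    return total

def closed_form(p, q):
    return 0.37 * p + 0.63 * q - p * q

# At Ben's equilibrium mix (q = 0.37), his chance should not depend on p.
for p in (0.0, 0.5, 0.63, 1.0):
    print(p, round(ben_win_prob(p, 0.37), 4), round(closed_form(p, 0.37), 4))
```

Both columns agree, and with q = 0.37 every row shows 0.2331, the 23.3% “unexploitable” figure.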

Just to illustrate how Ben’s chance of winning changes depending on p, I plugged in three different values of p to create three different lines: For the black line, p=0.63. For the red line, p < 0.63 (to be precise, I plugged in p=0.62, but any value of p<0.63 will create an upward sloping line). For the blue line, p > 0.63 (to be precise, I plugged in p=0.64, but any value of p>0.63 will create a downward sloping line).

If p = .63, Ben’s chance of winning is constant (.233) for all values of q. In other words, if Ilardi seems to be about 63% likely to pick Miami, then it doesn’t matter how Ben picks, he’ll have the same chance of winning (23.3%) as he would if he played his Nash equilibrium strategy.

If p > .63, Ben’s chance of winning decreases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a greater than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as low a probability as possible (i.e., he should pick Dallas).

If p < .63, Ben’s chance of winning increases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a lower than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as high a probability as possible (i.e., he should pick Miami).
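The three cases come down to the slope of Ben’s win probability in q, which is .63 – p. This minimal Python sketch turns that into a best-response rule and tabulates the three lines described above (p = 0.62, 0.63, 0.64) at the endpoints q = 0 and q = 1. The function names are my own.

```python
def win_prob(p, q):
    """Ben's chance of winning: .37p + .63q - pq (p: Ilardi picks Miami, q: Ben picks Miami)."""
    return 0.37 * p + 0.63 * q - p * q

def best_response(p):
    """Ben's best pick given his estimate p that Ilardi picks Miami.

    The slope of win_prob in q is 0.63 - p, so its sign says whether
    picking Miami more often (raising q) helps or hurts Ben.
    """
    slope = 0.63 - p
    if slope > 0:
        return "Miami"
    if slope < 0:
        return "Dallas"
    return "indifferent"

for p in (0.62, 0.63, 0.64):
    print(f"p={p}: q=0 -> {win_prob(p, 0):.4f}, q=1 -> {win_prob(p, 1):.4f}, "
          f"best pick: {best_response(p)}")
```

At the extreme where Ben is sure Ilardi picks Miami, `win_prob(1, 0)` is 0.37, the 37% chance Ben was counting on by picking Dallas.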

So what happened? Ben estimated that Ilardi would pick Miami with greater than 63% probability. That’s mainly because most people aren’t comfortable playing probabilistic strategies that require them to roll a die: people will simply “round up” in their minds and pick the team that would give them a win more often than not. And Ben knew that if he was right about Ilardi picking Miami, then Ben would end up with a 37% chance of winning, rather than the 23.3% chance he would have had if he stuck to his equilibrium strategy.

So Ben picked Dallas. As he’d predicted, Ilardi picked Miami, and lucky for Ben, Dallas won. This one case study doesn’t prove that Ilardi reasoned as Ben expected, of course. Ben summed up the takeaway on his blog:

Of course, we shouldn’t read too much into this: it’s only a single result, and doesn’t prove that either one of us had an advantage. On the other hand, I did make that pick in part because I felt that Ilardi was unlikely to “outlevel” me. To be clear, this was not based on any specific assessment about Ilardi personally, but based [on] my general beliefs about people’s tendencies in that kind of situation.

Was I right? The outcome and reasoning given in the final “picking game” has given me no reason to believe otherwise, though I think that the reciprocal lack of information this time around was a major part of that advantage.  If Ilardi and I find ourselves in a similar spot in the future (perhaps in next year’s Smackdown), I’d guess the considerations on both sides would be quite different.

Thinking in greyscale

Have you ever converted an image from greyscale into black and white? Basically, your graphics program pushes all of the lighter shades of grey to “white,” and all of the darker shades of grey to “black.” The result is a visual mess – same rough shape as the original, but unrecognizable.
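The rounding described here is simple thresholding. A minimal sketch, assuming 8-bit grey levels (0 = black, 255 = white) and a fixed midpoint threshold of 128; real graphics programs may pick other thresholds or use dithering instead.

```python
def to_black_and_white(pixels, threshold=128):
    """Map each 0-255 grey level to pure black (0) or pure white (255)."""
    return [255 if value >= threshold else 0 for value in pixels]

greys = [10, 90, 127, 128, 170, 245]   # six distinct shades of grey
bw = to_black_and_white(greys)
print(bw)                               # [0, 0, 0, 255, 255, 255]
print(len(set(greys)), "shades ->", len(set(bw)))  # 6 shades -> 2
```

Note that 127 and 128, nearly identical greys, end up at opposite extremes, while 10 and 127, very different greys, become indistinguishable.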

Something similar happens to our mental picture of the world whenever we talk about how we “believe” or “don’t believe” an idea. Belief isn’t binary. Or at least, it shouldn’t be. In reality, while we can be more confident in the truth of some claims than others, we can’t be absolutely certain of anything. So it’s more accurate to talk about how much we believe a claim, rather than whether or not we believe it. For example, I’m at least 99% sure that the moon landing was real. My confidence that mice have the capacity to suffer is high, but not quite as high. Maybe 85%. Ask me about a less-developed animal, like a shrimp, and my confidence would fall to near-uncertainty, around 60%.

Obviously there’s no rigorous, precise way to assign a number to how confident you are about something. But it’s still valuable to get in the habit, at least, of qualifying your statements of belief with words like “probably,” or “somewhat,” or “very.” It just helps keep you thinking in greyscale, and reminds you that different amounts of evidence should yield different degrees of belief. Why lose all that resolution unnecessarily by switching to black and white?

More importantly, the reason you shouldn’t ever have 0% or 100% confidence in any empirical claim is that either one implies there is no conceivable evidence that could ever make you change your mind. You can prove this formally with Bayes’ theorem, which is a simple rule of probability that also serves as a way of describing how an ideal reasoner would update his belief in some hypothesis “H” after encountering some evidence “E.” Bayes’ theorem can be written like this:

$$P[H \mid E] = \frac{P[E \mid H]}{P[E \mid H] \cdot P[H] + P[E \mid \text{not } H] \cdot P[\text{not } H]} \times P[H]$$

… in other words, it’s a rule for how to take your prior probability of a hypothesis, P[H], and update it based on new evidence E to get the probability of H given that evidence: P[H | E].

So what happens if you think there’s zero chance of some hypothesis H being true? Well, just plug in zero for “P[H],” all the way on the right, and you’ll realize that the entire equation becomes zero (because zero times anything is zero). So you don’t have to know any of the other terms to conclude that P[H | E] = 0. That means that if you start out with zero belief in a hypothesis, you’ll always have zero belief in that hypothesis no matter what evidence comes your way.

And what if you start out convinced, beyond a shadow of a doubt, that some hypothesis is true? That’s akin to saying that P[H] = 1. That also implies you must put zero probability on all the other possible hypotheses. So plug in 1 for P[H] and 0 for P[not H] in the equation above. With just a bit of arithmetic you’ll find that P[H | E] = 1. Which means that no matter what evidence you come across, if your belief in a hypothesis is 100% before seeing some evidence (that is, P[H] = 1) then your belief in that hypothesis will still be 100% after seeing that evidence (that is, P[H | E] = 1).
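Both extreme cases can be checked numerically. This minimal Python sketch implements Bayes’ theorem with the expanded denominator P[E | H]·P[H] + P[E | not H]·P[not H]. The evidence strength (E is 99 times likelier if H is false) is an arbitrary illustrative choice.

```python
def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior P[H | E] via Bayes' theorem.

    Raises ZeroDivisionError if the evidence was judged impossible
    under both H and not-H (the denominator is then zero).
    """
    prior_not_h = 1 - prior_h
    evidence = p_e_given_h * prior_h + p_e_given_not_h * prior_not_h
    return p_e_given_h * prior_h / evidence

# Strong evidence against H: 1% likely if H is true, 99% likely if it is false.
for prior in (0.0, 0.5, 0.99, 1.0):
    print(prior, round(bayes_update(prior, 0.01, 0.99), 4))
```

A 99% prior drops to about 50% after this one piece of evidence, but a prior of exactly 0 or exactly 1 does not move at all.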

As much as I’m in favor of thinking in greyscale, however, I will admit that it can be really difficult to figure out how to feel when you haven’t committed yourself wholeheartedly to one way of viewing the world. For example, if you hear that someone has been accused of rape, your estimation of the likelihood of his guilt should be somewhere between 0 and 100%, depending on the circumstances. But we want, instinctively, to know how we should feel about the suspect. And the two possible states of the world (he’s guilty/he’s innocent) have such radically different emotional attitudes associated with them (“That monster!”/”That poor man!”). So how do you translate your estimated probability of his guilt into an emotional reaction? How should you feel about him if you’re, say, 80% confident he’s guilty and 20% confident he’s innocent? Somehow, finding a weighted average of outrage and empathy doesn’t seem like the right response — and even if it were, I have no idea what that would feel like.

The D.I.Y. way of getting a probability estimate from your doctor

One frustrating thing about dealing with doctors is that they tend to be unwilling or unable to talk about probabilities. I run into this problem in particular when they’ve told me there is “a chance” of something, like a chance of a complication of a procedure, or a chance of transmitting an infection, or a chance of an illness lasting past some time threshold, and so on. Whenever I’ve pressed them to try to tell me approximately how much of a chance there is, they’ve told me something to the effect of, “It varies” or “I can’t say.” I sometimes tell them, look, I know you’re not going to have exact numbers for me, but I just want to know if we’re talking more like 50% or, you know, 1%? Still, they balk.

My interpretation is that this happens due to a combination of (1) people not having a good intuitive sense of how to estimate probabilities and (2) doctors not wanting to be held liable for making me a “promise” – perhaps they’re concerned that if they give me a low estimate and it happens anyway, then I’ll get angry or sue them or something.

So I wanted to share a useful tip from a friend of mine, a mathematician and blogger who was about to have his wisdom teeth removed and was trying unsuccessfully to get his surgeon to tell him the approximate risks of various possible complications from surgery. He discovered that you can actually get a percentage out of your doctor if you’re willing to just construct it yourself:

Friend: “I’ve heard that it’s possible to end up with permanent numbness in your mouth or lip after this surgery… what’s the chance of that happening?”

Surgeon: “It’s pretty low.”

Friend: “About how low? Are we talking, like five percent? Or only a fraction of one percent?”

Surgeon: “I really can’t say.”

Friend: “Okay, well… how many of these surgeries have you done?”

Surgeon: “About four thousand.”

Friend: “How many of your patients have had permanent numbness?”

Surgeon: “Two.”

Friend: “Ah, okay. So, about one twentieth of one percent.”

Surgeon: “I really can’t give you a percentage.”
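The friend’s move is just an observed frequency: events divided by trials. A minimal sketch with the surgeon’s own numbers (two cases of permanent numbness in about four thousand surgeries); with only two events, the result is a rough estimate rather than a precise rate.

```python
def rate_as_percent(events, trials):
    """Observed frequency of an outcome, expressed as a percentage."""
    return 100 * events / trials

pct = rate_as_percent(2, 4000)
print(f"{pct}%")       # 0.05%
print(pct == 1 / 20)   # True: one twentieth of one percent
```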

Visualizing data with lines, blocks, and roller coasters

Randall Munroe's infographic on radiation dose levels

I’m a huge fan of clever ways of visualizing data, especially when there’s something challenging about the data in question: for example, when it contains more than three important dimensions and therefore can’t be easily graphed with the typical representations (e.g., position on x-axis, position on y-axis, color of dot), or when it contains a few huge outliers that distort the scale of the data.

This recent infographic in Scientific American by my friend (and co-blogger at Rationally Speaking) Lena Groeger is a great example of the latter. The challenge in displaying relative levels of radioactivity is that there are a few outliers (e.g., Chernobyl) which are so many times higher than the rest of the data that when you try to graph them on the same scale, you end up with the outlier at one end and then all the rest of the data clumped together in an indeterminate mass at the other end.
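A bit of arithmetic shows how severe the clumping is. This sketch takes the two figures the post gives for the Scientific American graphic (a low-dose region below 100,000 microsieverts, and Chernobyl at 6 million microsieverts) and asks what share of a linear axis the whole low-dose region would get if the axis ran from zero to Chernobyl.

```python
low_range_top = 100_000   # μSv: top of the low-dose region
chernobyl = 6_000_000     # μSv: the Chernobyl figure

share = low_range_top / chernobyl
print(f"Low-dose region: {share:.1%} of a linear axis")  # 1.7%
```

Everything below 100,000 μSv would be squeezed into less than 2% of the axis, which is exactly the indeterminate mass described above.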

Randall Munroe over at the webcomic XKCD came up with a pretty good, inventive solution that relies on our intuitive sense of area, rather than length. Each successive grid represents only one small block of the next grid, which is how he manages to cram the entire skewed scale into one page. It’s cool, but I don’t think it works that intuitively. We have to consciously keep in mind the reminder of how big each grid is relative to the next, and it’s easy to lose your grip on the relative scales involved.

However, one of the benefits of online infographics as opposed to print is that you don’t have to fit the whole image in view at once. Lena and her colleagues created a long, leisurely scale that has the space at one end to show the differences between various low levels of radiation dose, below 100,000 microsieverts… and then it hits you with a sense of relative magnitude as you have to scroll down, down, down, until you get to Chernobyl at 6 million microsieverts.

It reminded me of one of my all-time favorite data visualizations: over one hundred years of housing prices, transformed into a first-person perspective roller coaster ride. There are a number of wonderful things about this design choice. For one thing, it works on a visceral level: reaching unprecedented heights actually makes you feel giddy, and sudden steep declines are a little scary.

I also love the way it captures the most recent housing bubble — as you keep climbing higher, and higher, and higher, and higher, and higher, the repetitive climb starts to feel relaxing, and you even forget that you’re on a roller coaster. You forget, in other words, that you’re not going to keep going up forever. And that moment at the end, when the coaster pauses and you turn around to look down at how far away the ground is (this video stops right before the 2008 crash) — shiver. Just perfect.

Food, Bias, and Justice: a Case for Statistical Prediction Rules

We’re remarkably bad at making good decisions. Even when we know what goal we’re pursuing, we make mistakes predicting which actions will achieve it. Are there strategies we can use to make better policy decisions? Yes – we can gain insight by looking at cognitive science.

On the surface, all we need to do is experience the world and figure out what does and doesn’t work at achieving goals (the focus of instrumental rationality). That’s why we tend to respect expert opinion: experts have a lot more experience with an issue and have considered and evaluated different approaches.

Let’s take the example of deciding whether or not to grant prisoners parole. If the goal is to reduce repeat offenses, we tend to trust a panel of expert judges who evaluate the case and use their subjective opinion. They’ll do a good job, or at least as good a job as anyone else, right? Well… that’s the problem: everyone does a pretty bad job. Quite frankly, even experts’ decision-making is influenced by factors that are unrelated to the matter at hand. Ed Yong calls attention to a fascinating study which finds that a prisoner’s chance of being granted parole is strongly influenced by when their case is heard in relation to the judges’ snack breaks:

The graph is dramatic. It shows that the odds that prisoners will be successfully paroled start off fairly high at around 65% and quickly plummet to nothing over a few hours (although, see footnote). After the judges have returned from their breaks, the odds abruptly climb back up to 65%, before resuming their downward slide. A prisoner’s fate could hinge upon the point in the day when their case is heard.

Curse our fleshy bodies and their need for “Food” and “breaks”! It’s obviously a problem that human judgment is influenced by irrelevant, quasi-random factors. How can we counteract those effects?

Statistical Prediction Rules do better

Fortunately, we have science and statistics to help. We can objectively record evidential cues, look at the resulting target property, and find correlations. Over time, we can build an objective model that keeps meat-brain limitations out of the way.

This was the advice of Bishop and Trout in “Epistemology and the Psychology of Human Judgment”, an excellent book recommended by Luke Muehlhauser of Common Sense Atheism (and a frequent contributor to Less Wrong).

Bishop and Trout argued that we should use such Statistical Prediction Rules (SPRs) far more often than we do. Not only are they faster, it turns out they’re more trustworthy: using the same amount of information (or often less), a simple mathematical model consistently matches or outperforms expert opinion.
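To show what “putting the numbers into an equation” can look like, here is a minimal sketch of a linear SPR: a weighted sum of recorded cue values compared against a cut-off. The cue names, weights, and threshold are all hypothetical placeholders; a real SPR would fit its weights to recorded cases (cues plus the outcome that actually followed), and this is not the specific model from Bishop and Trout.

```python
# Hypothetical cue names and weights; a real SPR would fit these to
# historical records of cues and outcomes.
WEIGHTS = {"cue_a": 0.5, "cue_b": -0.25, "cue_c": 0.75}
THRESHOLD = 0.5  # hypothetical cut-off for predicting a good outcome

def spr_score(cues):
    """Linear SPR: weighted sum of the recorded cue values."""
    return sum(WEIGHTS[name] * value for name, value in cues.items())

def spr_predict(cues):
    """True if the score clears the cut-off."""
    return spr_score(cues) >= THRESHOLD

case = {"cue_a": 1.0, "cue_b": 1.0, "cue_c": 0.5}
print(spr_score(case), spr_predict(case))  # 0.625 True
```

Whatever its weights, a rule like this gives the same answer for the same cues no matter when the case is heard, which is the property the snack-break study shows human judges lack.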

They point out that when Grove and Meehl did a survey of 136 different studies comparing an SPR to the expert opinion, they found that “64 clearly favored the SPR, 64 showed approximately equivalent accuracy, and 8 clearly favored the clinician.” The target properties the studies were predicting varied from medical diagnoses to academic performance to – yup – parole violation and violence.
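Turning the Grove and Meehl tally into proportions makes the comparison easier to read. A minimal sketch using only the three counts quoted above:

```python
tally = {"favored SPR": 64, "roughly equal": 64, "favored clinician": 8}
total = sum(tally.values())  # 136 studies

for outcome, n in tally.items():
    print(f"{outcome:>17}: {n:3d} ({n / total:.1%})")

at_least_as_good = (tally["favored SPR"] + tally["roughly equal"]) / total
print(f"SPR at least as accurate: {at_least_as_good:.1%}")
```

That works out to 47.1% favoring the SPR, 47.1% roughly equal, and 5.9% favoring the clinician, so the SPR did at least as well in 94.1% of the studies.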

So based on some cues, a Statistical Prediction Rule would probably give a better prediction than the judges on whether a prisoner will break parole or commit a crime. And they’d do it very quickly – just by putting the numbers into an equation! So all we need to do is show the judges the SPRs and they’ll save time and do a better job, right? Well, not so much.