Which Cognitive Bias is Making NFL Coaches Predictable?

In football, it pays to be unpredictable (although the “wrong way touchdown” might be taking it a bit far.) If the other team picks up on an unintended pattern in your play calling, they can take advantage of it and adjust their strategy to counter yours. Coaches and their staff of coordinators are paid millions of dollars to call plays that maximize their team’s talent and exploit their opponent’s weaknesses.

That’s why it surprised Brian Burke, formerly of AdvancedNFLAnalytics.com (and now hired by ESPN), to see a peculiar trend: football teams seem to rush at a remarkably high rate on 2nd and 10 compared to 2nd and 9 or 2nd and 11.

What’s causing that?

His insight was that 2nd and 10 disproportionately followed an incomplete pass. This generated two hypotheses:

1. Coaches (like all humans) are bad at generating random sequences, and tend to alternate too much when they’re trying to be genuinely random. Since 2nd and 10 is most likely the result of a 1st down pass, alternating would produce a high percentage of 2nd down rushes.
2. Coaches are suffering from the ‘small sample fallacy’ and ‘recency bias’, overreacting to the result of the previous play. Since 2nd and 10 not only likely follows a pass, but a failed pass, coaches have an impulse to try the alternative without realizing they’re being predictable.

These explanations made sense to me, and I wrote about the phenomenon a few years ago. But now that I’ve been learning data science, I can dive deeper into the analysis and add a hypothesis of my own.

The following work is based on the play-by-play data for every NFL game from 2002 through 2012, which Brian kindly posted. I spent some time processing it to create variables like Previous Season Rushing %, Yards per Pass, Yards Allowed per Pass by Defense, and QB Completion %. The Python notebooks are available on my GitHub, although the data files were too large to host easily.

Irrationality? Or Confounding Variables?

Since this is an observational study rather than a randomized controlled trial, there are bound to be confounding variables. In our case, we’re comparing coaches’ play calling on 2nd down after getting no yards on their team’s 1st down rush or pass. But those scenarios don’t come from the same distribution of game situations.

A number of variables could be in play, some exaggerating the trend and others minimizing it. For example, teams that passed for no gain on 1st down (resulting in 2nd and 10) have a disproportionate number of inaccurate quarterbacks (the left graph). These teams with inaccurate quarterbacks are more likely to call rushing plays on 2nd down (the right graph). Combine those factors, and we don’t know whether any difference in play calling is caused by the 1st down play type or the quality of quarterback.

The classic technique is to train a regression model to predict the next play call, and judge a variable’s impact by the coefficient the model gives that variable. Unfortunately, models that give interpretable coefficients tend to treat each variable as either positively or negatively correlated with the target – so time remaining can’t be positively correlated with a coach calling running plays when the team is losing and negatively correlated when the team is winning. Since the relationships in the data are more complicated than that, we needed a model that could handle them.

I saw my chance to try a technique I learned at the Boston Data Festival last year: Inverse Probability of Treatment Weighting.

In essence, the goal is to create artificial balance between your ‘treatment’ and ‘control’ groups — in our case, 2nd and 10 situations following 1st down passes vs. following 1st down rushes. We want to take plays with under-represented characteristics and ‘inflate’ them by pretending they happened more often, and – ahem – ‘deflate’ the plays with over-represented features.

To get a single metric of how over- or under-represented a play is, we train a model (one that handles non-linear relationships better) to take each 2nd down play’s confounding variables as input – score, field position, QB quality, etc. – and predict whether the 1st down play was a rush or a pass. If, based on the confounding variables, the model predicts the play was 90% likely to come after a 1st down pass – and it did – we decide the play probably has over-represented features, and we give it less weight in our analysis. However, if the play actually followed a 1st down rush, it must have under-represented features for the model to get it so wrong. Accordingly, we give it more weight.
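The weighting rule above is usually written in terms of the propensity score e, the model’s estimated probability that a play followed a 1st down pass: a play that really did follow a pass gets weight 1/e, and a play that followed a rush gets weight 1/(1 − e). Here is a minimal sketch, assuming the propensity scores have already been estimated; the function name `iptw_weights` is hypothetical and not taken from the post’s notebooks.

```python
from typing import List, Sequence


def iptw_weights(propensity: Sequence[float], followed_pass: Sequence[bool]) -> List[float]:
    """Inverse probability of treatment weights (hypothetical helper).

    propensity[i]: estimated probability that play i followed a 1st down pass.
    followed_pass[i]: True if play i actually followed a 1st down pass.
    """
    weights = []
    for e, treated in zip(propensity, followed_pass):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must be strictly between 0 and 1")
        # The less likely the model thought a play's actual history was,
        # the larger its weight: 1/e for pass plays, 1/(1 - e) for rush plays.
        weights.append(1.0 / e if treated else 1.0 / (1.0 - e))
    return weights


# The example from the text: the model says the play was 90% likely to follow a pass.
print(iptw_weights([0.9, 0.9], [True, False]))  # roughly [1.11, 10.0]
```

A play the model found unsurprising (it did follow a pass) counts about 1.1 times, while the surprising one counts about 10 times. The post doesn’t say whether it trimmed extreme propensities or used stabilized weights (multiplying each weight by the overall share of its group), two common refinements that keep a handful of surprising plays from dominating.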

After assigning each play a new weight to compensate for its confounding features (using k-fold cross-validation so the model never scores the very plays it was trained on), the two groups *should* be balanced. It’s as though we were running a scientific study, noticed that our control group had half as many men as the treatment group, and went out to recruit more men. However, since that isn’t an option, we just decided to count the men twice.
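The post doesn’t show its model or fold setup, so the sketch below illustrates only the out-of-fold idea in plain Python: split the plays into k folds, train on k − 1 of them, and score the held-out fold. `fit_base_rate` is a deliberately trivial, hypothetical stand-in for the real non-linear classifier; with scikit-learn, `cross_val_predict(model, X, y, cv=k, method="predict_proba")` does the same job for a real estimator.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
# Hypothetical interface: fit(rows, labels) returns a function mapping a row to P(followed a pass).
Fit = Callable[[List[Row], List[int]], Callable[[Row], float]]


def out_of_fold_propensity(X: List[Row], y: List[int], fit: Fit, k: int = 5, seed: int = 0) -> List[float]:
    """Score each play with a model trained only on the other folds."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    scores = [0.0] * n
    for held_out in folds:
        if not held_out:
            continue
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = predict(X[i])  # this play never appeared in the training set
    return scores


def fit_base_rate(rows: List[Row], labels: List[int]) -> Callable[[Row], float]:
    """Hypothetical stand-in model: ignores features, predicts the training share of passes."""
    rate = sum(labels) / len(labels)
    return lambda row: rate


X = [[float(i)] for i in range(10)]  # toy feature, e.g. score margin
y = [1] * 6 + [0] * 4                # 1 = play followed a 1st down pass
print(out_of_fold_propensity(X, y, fit_base_rate, k=10))
```

With leave-one-out folds (k=10 on this 10-play toy set), each pass play scores 5/9 and each rush play scores 6/9, because the play being scored never contributes to its own prediction. That is exactly the leakage the folds are meant to prevent.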

Testing our Balance

Before processing, teams that rushed on 1st down for no gain were disproportionately likely to be teams with the lead. After the re-weighting process, the distributions are far more similar:

Much better! They’re not all this dramatic, but lead was the strongest confounding factor and the model paid extra attention to adjust for it.

It’s great that the distributions look more similar, but that’s qualitative. To do a quantitative diagnostic, we can take the standardized difference in means, recommended as a best practice in a 2015 paper by Peter C. Austin and Elizabeth A. Stuart titled “Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies”.

For each potential confounding variable, we take the difference in means between plays following 1st down passes and plays following 1st down rushes, and divide it by the square root of the average of the two groups’ variances. A high standardized difference indicates that our two groups are dissimilar and in need of balancing. Before IPT-weighting, the standardized differences had a maximum of around 47% and a median of 7.5%; weighting reduced them to 9% and 3.1%, respectively.
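The diagnostic can be sketched in a few lines. This version accepts optional IPT weights and uses a common weighted-variance estimator that reduces to the ordinary n − 1 sample variance when all weights are equal. The function names are hypothetical, and the post’s exact implementation may differ.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and weighted sample variance (equal weights give the usual n-1 variance)."""
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    var = sw / (sw * sw - sw2) * sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    return mean, var


def standardized_difference(
    x_treat: Sequence[float],
    x_ctrl: Sequence[float],
    w_treat: Optional[Sequence[float]] = None,
    w_ctrl: Optional[Sequence[float]] = None,
) -> float:
    """(mean_treat - mean_ctrl) / sqrt((var_treat + var_ctrl) / 2), optionally IPT-weighted."""
    if w_treat is None:
        w_treat = [1.0] * len(x_treat)
    if w_ctrl is None:
        w_ctrl = [1.0] * len(x_ctrl)
    m1, v1 = weighted_mean_var(x_treat, w_treat)
    m0, v0 = weighted_mean_var(x_ctrl, w_ctrl)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


# Toy "lead" values for plays after passes vs. after rushes (made-up numbers).
print(standardized_difference([1, 2, 3], [2, 3, 4]))  # -1.0
```

The post reports the absolute values as percentages (0.09 is 9%). A commonly cited rule of thumb treats differences below about 0.1 as acceptable balance.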

So, now that we’ve done what we can to balance the groups, do coaches still call rushing plays on 2nd and 10 more often after 1st down passes than after rushes? In a word, yes.

In fact, the pattern is even stronger after controlling for game situation. It turns out that the biggest factor was the score (especially when time was running out.) A losing team needs to be passing the ball more often to try to come back, so their 2nd and 10 situations are more likely to follow passes on 1st down. If those teams are *still* calling rushing plays often, it’s even more evidence that something strange is going on.

Ok, so controlling for game situation doesn’t explain away the spike in rushing percent at 2nd and 10. Is it due to coaches’ impulse to alternate their play calling?

Maybe, but that can’t be the whole story. If it were, I would expect to see the trend consistent across different 2nd down scenarios. But when we look at all 2nd-down distances, not just 2nd and 10, we see something else:

If their teams don’t get very far on 1st down, coaches are inclined to change their play call on 2nd down. But as a team gains more yards on 1st down, coaches are less and less inclined to switch. If the team got six yards, coaches rush about 57% of the time on 2nd down regardless of whether they ran or passed last play. And it actually reverses if you go beyond that – if the team gained more than six yards on 1st down, coaches have a tendency to repeat whatever just succeeded.

It sure looks like coaches are reacting to the previous play in a predictable Win-Stay Lose-Shift pattern.

Following a hunch, I did one more comparison: passes completed for no gain vs. incomplete passes. If incomplete passes feel more like a failure, the recency bias would influence coaches to call more rushing plays after an incompletion than after a pass that was caught for no gain.

Before the re-weighting process, there’s almost no difference in play calling between the two groups – 43.3% vs. 43.6% (p = .88). However, after adjusting for the game situation – especially quarterback accuracy – the trend reemerges: in similar game scenarios, teams rush 44.4% of the time after an incomplete pass and only 41.5% after passes completed for no gain. It might sound small, but with 20,000 data points it’s a highly significant difference (p < 0.00005).
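The post doesn’t say which test produced its p-values. For plain, unweighted counts, a standard way to test a gap between two rushing rates is the two-proportion z-test sketched below. With IPT weights, a weighted or robust variance would be more appropriate, and the counts in the example are made up, so treat this purely as an illustration of the idea.

```python
import math
from typing import Tuple


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two proportions (unweighted counts)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical counts: 600 of 1,000 rushes in one group vs. 500 of 1,000 in the other.
z, p = two_proportion_z_test(600, 1000, 500, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

The p-value depends on the size of each group, not just the total, so the same 2.9-point gap can be more or less significant depending on how the 20,000 plays split between incompletions and passes caught for no gain.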

All signs point to the recency bias being the primary culprit.

Reasons to Doubt:

1) There are a lot of variables I didn’t control for, including fatigue, player substitutions, temperature, and whether the game clock was stopped in between plays. Any or all of these could impact the play calling.

2) Brian Burke’s (and my) initial premise was that if teams are irrationally rushing more often after incomplete passes, defenses should be able to prepare for this and exploit the pattern. Conversely, going against the trend should be more likely to catch the defense off-guard.

I really expected to find plays gaining more yards if they bucked the trends, but it’s not as clear as I would like.  I got excited when I discovered that rushing plays on 2nd and 10 did worse if the previous play was a pass – when defenses should expect it more. However, when I looked at other distances, there just wasn’t a strong connection between predictability and yards gained.

One possibility is that I needed to control for more variables. But another possibility is that while defenses *should* be able to exploit a coach’s predictability, they can’t or don’t. To give Brian the last words:

But regardless of the reasons, coaches are predictable, at least to some degree. Fortunately for offensive coordinators, it seems that most defensive coordinators are not aware of this tendency. If they were, you’d think they would tip off their own offensive counterparts, and we’d see this effect disappear.

An Atheist’s Defense of Rituals: Ceremonies as Traffic Lights

The idea of a coming-of-age ceremony has always been a bit strange to me as an atheist. Sure, I attended more than my fair share of Bat and Bar Mitzvahs in middle school. But it always struck me as odd for us to pretend that someone “became an adult” on a particular day, rather than acknowledging it was a gradual process of maturation over time. Why can’t we just all treat people as their maturity level deserves?

The same goes for weddings – does a couple’s relationship really change in a significant way at the moment of a ceremony? Or do two people gradually fall in love and grow committed to each other over time? Moving in together marks a discrete change, but what does being “married” change about the relationship?

But my thinking has been evolving since reading this fantastic post about rituals by Brett and Kate McKay at The Art of Manliness. Not only do the rituals acknowledge a change, they use psychological and social reinforcement to help the individuals make the transition more fully:

One of the primary functions of ritual is to redefine personal and social identity and move individuals from one status to another: boy to man, single to married, childless to parent, life to death, and so on.

Left to follow their natural course, transitions often become murky, awkward, and protracted. Many life transitions come with certain privileges and responsibilities, but without a ritual that clearly bestows a new status, you feel unsure of when to assume the new role. When you simply slide from one stage of your life into another, you can end up feeling between worlds – not quite one thing but not quite another. This fuzzy state creates a kind of limbo often marked by a lack of motivation and direction; since you don’t know where you are on the map, you don’t know which way to start heading.

Just thinking your way to a new status isn’t very effective: “Okay, now I’m a man.” The thought just pings around inside your head and feels inherently unreal. Rituals provide an outward manifestation of an inner change, and in so doing help make life’s transitions and transformations more tangible and psychologically resonant.

Brett and Kate McKay cover a range of aspects of rituals, but I was particularly struck by the game theory implications of these ceremonies. By coordinating society’s expectations in a very public manner, transition rituals act like traffic lights to make people feel comfortable and confident in their course of action.

The Value of Traffic Lights

Traffic lights are a common example in game theory. Imagine that you’re driving toward an unmarked intersection and see another car approaching from the right. You’re faced with a decision: do you keep going, or brake to a stop?

If you assume they’re going to keep driving, you want to stop and let them pass. If you’re wrong, you both lose time and there’s an awkward pause while you signal to each other to go.

If you assume they’re going to stop, you get to keep going and maintain your speed. Of course, if you’re wrong and they keep barreling forward, you risk a deadly accident.

Things go much more smoothly when there are clear street signs or, better yet, a traffic light coordinating everyone’s expectations.

Ceremonies as Traffic Lights

Now, misjudging a teenager’s maturity is unlikely to result in a deadly accident. But, with reduced stakes, the model still applies.

As a teen gets older, members of society don’t always know how to treat him – as a kid or adult. Each type of misaligned expectations is a different failure mode: If you treat him as a kid when he expected to be treated as an adult, he might feel resentful of the “overbearing adult”. If you treat him as an adult when he was expecting to be treated as a kid, he might not take responsibility for himself.

A coming-of-age ritual acts like the traffic light to minimize those failure modes. At a Bar or Bat Mitzvah, members of society gather with the teenager and essentially publicly signal “Ok everyone, we’re switching our expectations… wait for it… Now!”

It’s important that the information is known by all to be known to all – what Steven Pinker calls common or mutual knowledge:

“In common knowledge, not only does A know x and B know x, but A knows that B knows x, and B knows that A knows x, and A knows that B knows that A knows x, ad infinitum.”

If you weren’t sure that the oncoming car could see their traffic light, it would be almost as bad as if there were no light at all. You couldn’t trust your green light because they might not stop. Not only do you need to know your role, but you need to know that everyone knows their role and trusts that you know yours… etc.

Public ceremonies gather everyone to one place, creating that common knowledge. The teenager knows that everyone expects him to act as an adult, society knows that he expects them to treat him as one, and everyone knows that those expectations are shared. Equipped with this knowledge, the teen can count on consistent social reinforcement to minimize awkwardness and help him adopt his new identity.

Obviously, these rituals are imperfect – along with the socially-defined parts of identity, there are internal factors that make someone more or less ready to be an adult. Quite frankly, setting 13 as the age of adulthood is probably too young.

But that just means we should tweak the rituals to better fit our modern world. After all, we have precise engineering to set traffic light schedules, and it still doesn’t seem perfect (this XKCD comes to mind).

That’s what makes society and civilization powerful. We’re social creatures, and feel better when we feel comfortable in our identity – either as a child or adult, as single or married, as grieving or ready to move on. Transition rituals serve an important and powerful role in coordinating those identities.

We shouldn’t necessarily respect them blindly, but I definitely respect society’s rituals more after thinking this through.

To take an excerpt from a poem by Bruce Hawkins:

Three in the morning, Dad, good citizen
stopped, waited, looked left, right.
He had been driving nine hundred miles,
had nearly a hundred more to go,
but if there was any impatience
it was only the steady growl of the engine
which could just as easily be called a purr.

I chided him for stopping;
he told me our civilization is founded
on people stopping for lights at three in the morning.

Why Blocking Roads Can Speed Up Traffic

It’s so counter-intuitive that it’s called Braess’ Paradox: How can closing a road actually make everyone’s commute shorter? You would think that blocking a route would be an inconvenience, but under some circumstances it’s actually for the best.

Doesn’t sound right, does it?  Here’s the situation: Assume drivers are rational and intelligent.  I know, that’s a stretch – I grew up around DC.  But bear with me.  If there are multiple paths that people can take, they should in theory find an equilibrium between them.  If one path has less traffic and takes less time, more people will switch to it until it loses its advantage.  If one path starts longer than the others, nobody will use it until the other paths get congested enough to make it worth it.

So how can an extra path actually make the average commute time longer?  Shouldn’t an extra path just give people more options to choose from, and ultimately find the best equilibrium?

The Situation:

It turns out that when some roads are more prone to traffic than others, Braess’ Paradox can occur. Imagine that some roads aren’t as affected by traffic – I picture these as the local roads with traffic lights. They add a fixed amount of time to your commute, say 45 minutes. The other roads are heavily dependent on traffic – these highways can either be wonderfully fast or a mess of stop-and-go congestion, depending on how many other people are on them. The time it takes to drive one of them, in minutes, is the number of cars on it divided by 100.

Let’s say there are 4000 cars driving from the start to finish. Without the connector (dotted in the diagram), an equilibrium forms where half the drivers (2000 cars) take the top route through A, and half take the bottom route through B.  The highway takes 2000/100 = 20 minutes, and the local road takes 45 minutes. So half the population spends 45 minutes on a local street, followed by 20 minutes on a highway, and the other half of the drivers spend 20 minutes on a highway, followed by 45 minutes on a local street. Everyone gets to their destination in 65 minutes. Nobody has any incentive to switch.

But what if a new connector is opened between A and B, allowing people to go straight from one highway to the other? Now everyone thinks to themselves, “Hey, why spend 45 minutes on a local street when I could spend 20 minutes on the highway? I’m going to take the route Start –> A –> B –> Finish, and shave 25 minutes off of my commute time!”

Of course, if everyone thinks that way, there are now twice as many cars on each highway as there were before, and each highway is half as fast: it now takes 40 minutes, not 20. That’s still 5 minutes less than the 45 minutes it takes to drive on the local street, though, so everyone still has an incentive to take the highway.

So in the end, how has the connector affected people’s commutes? Everyone’s commute used to be 65 minutes; now, everyone’s commute is 80 minutes. And to make it stranger, there’s no better path to take – anyone considering switching to their original route would be looking at an 85 minute drive.
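The arithmetic behind the paradox is easy to check. This sketch uses the post’s numbers (4000 cars, 45-minute local roads, highways taking cars/100 minutes) and assumes, as the story implies, that the new connector itself takes no time.

```python
CARS = 4000
LOCAL_MINUTES = 45  # local roads take a fixed 45 minutes


def highway_minutes(cars: float) -> float:
    """Congestion-dependent highway: number of cars divided by 100."""
    return cars / 100


def commute_without_connector() -> float:
    # Equilibrium: half the drivers on each route; every route is one highway plus one local road.
    return highway_minutes(CARS / 2) + LOCAL_MINUTES


def commute_with_connector():
    # Everyone takes Start -> A -> B -> Finish, so all 4000 cars use both highways.
    everyone = 2 * highway_minutes(CARS)
    # A lone driver who goes back to highway + local road: one car barely changes congestion.
    deviator = highway_minutes(CARS) + LOCAL_MINUTES
    return everyone, deviator


print(commute_without_connector())  # 65.0
print(commute_with_connector())     # (80.0, 85.0)
```

The connector raises everyone’s commute from 65 to 80 minutes, yet no single driver can do better by switching back: the old route now costs 85 minutes.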

How does this happen?

How can opening a new, super-fast connector make commutes worse? It comes down to the price of anarchy and people’s selfish motivations.  With the connector open, each set of cars has the option to clog up the other half’s highways – saving themselves 5 minutes but adding 20 minutes to the other guys’ commute.

It’s like the prisoner’s dilemma: Each driver has the motivation to take the highways, even though it damages the overall system. Without the connector, nobody is allowed to “defect” for personal gain. In the traditional prisoner’s dilemma, it would be like a mafia boss keeping all his criminals anonymous. Without the option to rat each other out, the criminals can’t give in to the selfish temptation, and the entire system is better off.

Braess’ Paradox isn’t purely hypothetical – it has real-world implications in city planning. According to this New York Times article titled What if They Closed 42d Street and Nobody Noticed?, “When a network is not congested, adding a new street will indeed make things better. But in the case of congested networks, adding a new street probably makes things worse at least half the time, mathematicians say.”  That’s shocking. My intuitions about how traffic works were way off.

Lastly, via Presh Talwalkar’s fantastic game theory blog, Mind Your Decisions (which brought Braess’ paradox to my attention), there’s a great video of the paradox physically in action with springs.

Coach Smith’s Gutsy Call

Coach Mike Smith was facing a tough decision. His Falcons were in overtime against the division-rival Saints. His team had been stopped on their own 29-yard line and were facing fourth down and inches. Should he tell his players to punt, or go for it? A punt would be safe. Trying to get the first down would be the high-risk, high-reward play: success would mean a good chance to win, while failure would practically guarantee a loss. What play call would give his team the best chance to win?

He decided to be aggressive. He called for star running back Michael Turner to try pounding up the middle of the field.

It failed. The Saints were given the ball in easy range to score, and quickly did so. The media and fans criticized Smith for his stupid decision.

But is the criticism fair? If the play call had worked, I bet he would have been praised for his guts and brilliance. I think my favorite reaction came from ESPN writer Pat Yasinskas:

When Mike Smith first decided to go for it on fourth-and-inches in overtime, I liked the call. I thought it was gutsy and ambitious. After watching Michael Turner get stuffed, I changed my mind. Smith should have punted and taken his chances with his defense.

What a perfect, unabashed example of Outcome Bias! We have a tendency to judge a past decision solely based on the result, not on the quality of the choice given the information available at the time.

Did Coach Smith know that the play would fail? No, of course not. He took a risk, which could go well or poorly. The quality of his decision lies in the chances of success and the expected values for each call.

Fortunately, some other people at ESPN did the real analysis, using 10 years of historical data of teams’ chances to win based on factors like field position, score, time remaining, and so on:

Choice No. 1: Go for the first down

…Since 2001, the average conversion percentage for NFL teams that go for it on fourth-and-1 is 66 percent. Using this number, we can find the expected win probability for Atlanta if it chooses this option.

* Atlanta win probability if it converts (first-and-10 from own 30-yard line): 67.1 percent
* Atlanta win probability if it does not convert (Saints first-and-10 from Falcons’ 29-yard line): 18 percent.
* Expected win probability of going for the first down: 0.660*(.671) + (1-.660)*(.180) = 50.4%

Choice No. 2: Punt

* For this choice, we will assume the Falcons’ net punt average of 36 yards for this season. This means the expected field position of the Saints after the punt is their own 35-yard line. This situation (Saints with first-and-10 from their 35, in OT, etc.) would give the Falcons a win probability of 41.4%.

So by choosing to go for it on fourth down, the Falcons increased their win probability by 9 percentage points.
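The expected-value comparison in the ESPN analysis fits in a few lines. This sketch plugs in its numbers and also solves for the break-even conversion rate, the point at which going for it and punting are equally good under the same simple model (that extra step is derived here, not part of the quoted analysis).

```python
def expected_win_probability(p_convert: float, wp_if_convert: float, wp_if_fail: float) -> float:
    """Weight each branch's win probability by its chance of happening."""
    return p_convert * wp_if_convert + (1 - p_convert) * wp_if_fail


def break_even_conversion_rate(wp_if_convert: float, wp_if_fail: float, wp_punt: float) -> float:
    """Conversion rate at which going for it matches the punt's win probability."""
    return (wp_punt - wp_if_fail) / (wp_if_convert - wp_if_fail)


go_for_it = expected_win_probability(0.66, 0.671, 0.180)
punt = 0.414
break_even = break_even_conversion_rate(0.671, 0.180, punt)

print(f"Go for it: {go_for_it:.1%}")                              # Go for it: 50.4%
print(f"Punt: {punt:.1%}")                                        # Punt: 41.4%
print(f"Gain: {(go_for_it - punt) * 100:.1f} percentage points")  # Gain: 9.0 percentage points
print(f"Break-even conversion rate: {break_even:.1%}")            # Break-even conversion rate: 47.7%
```

Under this model, going for it beats punting whenever the chance of converting is above roughly 48%, comfortably below the 66% league average the analysis used.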

That’s a much better way to evaluate a coach’s decision! Based on a simple model and league averages (there are problems with both of those, but they’re better than simply trusting the outcome!) the punt was not the best option. Smith made the right decision.

Well, sort of. There are different ways to go for the fourth-down conversion, and according to Brian Burke at AdvancedNFLStats, Smith chose the wrong one:

Conversion success rates on 1-yd to go runs (%)

Position   3rd Down   4th Down
FB         77         70
QB         87         82
RB         68         66
Total      72         72

In these situations, quarterback sneaks have proven much more effective than having your running back take the ball. In a perfect game-theory world, defenses would realize their weakness and focus more effort on stopping it. But for now, it remains something offenses can exploit. According to the numbers, the Falcons probably could have made a better decision.

And, of course, it was OBVIOUS to me at the time that they should have called a quarterback sneak. </hindsight bias>

Game Theory and Football: How Irrationality Affects Play Calling

Coaches and coordinators in professional football get paid a lot of money to call the right plays – not just the best plays for particular situations, but also unpredictable plays that will catch the other team off guard. It’s a perfect setup for game theory analysis!

As in other game theory situations, the best play depends in part on what your opponent does. Your running play is much more likely to succeed against a pass-prevent defense, but would be in trouble against a run-stuffing formation. If the defense can guess what you’re going to call, they can adjust accordingly and have an advantage. Even on 3rd down and long – a common passing situation – there’s value in calling some percentage of running plays, because the defense is less likely to be geared toward stopping them. But as you do it more, the chance of catching the defense off guard gets smaller. There’s some optimal balance where the expected success of a surprising run is equal to the expected success of a more sensible (but anticipated) pass.

The goal is to stay unpredictable and exploit patterns where your opponent is using a sub-optimal combination. If a team notices that passing plays are working better, they’ll be more likely to call them. As the defense notices, they’ll shift away from their run-defense and focus more on defending passes. In theory, the two teams reach an equilibrium.
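The "optimal balance" can be made concrete with the textbook solution of a 2x2 zero-sum game. This is a minimal sketch with entirely hypothetical success rates: each entry is the offense’s chance of success for one play call against one defensive setup. The defense’s equilibrium mix is the one that makes a run and a pass equally successful, and the offense’s mix is the one that leaves the defense with no better setup.

```python
from typing import Tuple


def mixed_equilibrium(a: float, b: float, c: float, d: float) -> Tuple[float, float, float]:
    """Mixed equilibrium of a 2x2 zero-sum play-calling game.

    Entries are the offense's success probability:
        a: run vs. run defense     b: run vs. pass defense
        c: pass vs. run defense    d: pass vs. pass defense
    Returns (offense run share, defense run-defense share, success rate).
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("no interior mixed equilibrium")
    run_share = (d - c) / denom          # offense mix that leaves the defense indifferent
    run_defense_share = (d - b) / denom  # defense mix that makes run and pass equally good
    # An indifference mix outside [0, 1] means one side has a dominant option.
    if not (0 <= run_share <= 1 and 0 <= run_defense_share <= 1):
        raise ValueError("one side has a dominant option; no interior mix")
    value = run_defense_share * a + (1 - run_defense_share) * b
    return run_share, run_defense_share, value


# Hypothetical success rates, chosen only for illustration.
print(mixed_equilibrium(a=0.3, b=0.7, c=0.6, d=0.4))
```

With these made-up numbers the offense runs a third of the time, the defense keys on the run half the time, and both play types succeed 50% of the time. Calling runs more often than that would let an attentive defense shift toward stopping them.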

In practice, it doesn’t quite work that perfectly – human beings are making the decisions, and humans are both vulnerable to cognitive biases and notoriously bad at mimicking true unpredictability. Brian Burke, a fellow fan of combining sports with statistics, was poring over the play-calling data for second downs and noticed something odd:

There’s a strange spike in percent of running plays called at 2nd and 10! Tactically, 2nd and 10 isn’t all that different from 2nd and 9 or 11, so it’s strange to see such a difference. Why would they call so many more running plays in that particular situation?

The key is to realize that there are two ways a team tends to find itself facing a 2nd and 10 situation – runs that happen to go nowhere or any incomplete pass. Of those, incomplete passes are far more common. So in cases of 2nd and 10, it’s most often because the team just failed a passing play. That suggests two reasons coaches might be irrationally switching to running plays, even at the cost of sacrificing unpredictability:

(1) The hasty generalization bias (also called the small sample bias) and the recency effect are cognitive biases in which people overgeneralize from a small amount of data, especially recent data. Failed passes are very common (about 40% fail), so there’s no good reason for a coach to treat any single failed pass as evidence that they’d be better off switching to a running play. But the urge to overreact to the failed pass that just happened is strong, thanks to these two biases.

(2) People are terrible at generating unpredictability — when asked to make up a “seemingly-random” sequence of coin flips, we tend to use far more alternation between Heads and Tails than would actually occur in a real sequence of coin flips. So even if coaches weren’t overreacting to a failed pass, and they were simply trying to be unpredictable, they would still tend to switch to a running play after a passing play more often than random chance would dictate.
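One way to see what "too much alternation" means is to measure how often a sequence switches between consecutive entries. For fair, independent coin flips the switch rate is about 50%; more generally, if a coach called runs independently with probability p, the expected switch rate would be 2p(1 − p). The helper below is a hypothetical illustration, not something from Brian’s analysis.

```python
import random
from typing import Sequence


def alternation_rate(seq: Sequence[str]) -> float:
    """Fraction of consecutive pairs where the outcome switches."""
    switches = sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)
    return switches / (len(seq) - 1)


rng = random.Random(42)
flips = [rng.choice("HT") for _ in range(100_000)]
print(f"{alternation_rate(flips):.3f}")  # close to 0.5 for fair, independent flips
print(alternation_rate("HTHTHTHT"))      # 1.0: a human-style "random" sequence that always switches
```

A sequence of play calls whose switch rate sits well above 2p(1 − p) is alternating more than chance would, which is the pattern a defense could exploit.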

Indeed, when Brian separated the data by previous play, the alternation trend is clear — passes are more likely after runs, and runs are more likely after passes:

Brian concludes:

Coaches and coordinators are apparently not immune to the small sample fallacy. In addition to the inability to simulate true randomness, I think this helps explain the tendency to alternate. I also think this is why the tendency is so easy to spot on the 2nd and 10 situation. It’s the situation that nearly always follows a failure. The impulse to try the alternative, even knowing that a single recent bad outcome is not necessarily representative of overall performance, is very strong.

So recency bias may be playing a role. More recent outcomes loom disproportionately large in our minds than past outcomes. When coaches are weighing how successful various play types have been, they might be subconsciously over-weighting the most recent information—the last play. But regardless of the reasons, coaches are predictable, at least to some degree.

Coaches are letting irrational biases influence their play calling, pulling them away from the optimal mix. The result, according to Pro Football Reference stats, is less success on those plays. I wonder how well a computer could call plays using a Statistical Prediction Rule.

Ben Morris is a friend-of-a-friend of mine who recently competed in a contest sponsored by ESPN called “Stat Geek Smackdown,” in which the goal was to correctly predict as many of the NBA playoff games as possible. For each correct guess, a contestant received 5 points.

Heading into the final game between Miami and Dallas, Ben was in second place, trailing just 4 points behind a veteran stat geek named Ilardi. By most estimates, Miami had about a 63% chance of beating Dallas. But Ben realized that if he and Ilardi both chose Miami, then even if Miami won the game, Ilardi would still win the competition, because he and Ben would each get 5 points and the gap between their scores would remain unchanged. In order for Ben to win the competition, he would have to pick the winning team and Ilardi would have to pick the losing team.

So that created an interesting game theory problem: If Ben predicted that Ilardi would pick Miami, since they were more likely to win, then Ben should pick Dallas. But if Ilardi predicted that Ben would be reasoning that way, then Ilardi might pick Dallas, knowing that all he needs to do to win the competition is to pick the same team as Ben. But of course if Ben predicts that Ilardi will be thinking that way, maybe Ben should pick Miami…

What would you do if you were Ben? You can read about Ben’s reasoning on his excellent blog, Skeptical Sports, but here’s my summary. Ben essentially had two options:

(1) His first option was to play his Nash equilibrium strategy, which is a concept you might recall if you ever took game theory (or if you saw the movie “A Beautiful Mind,” although the movie botched the explanation). That’s the set of strategies (Ben’s and Ilardi’s) which gives each of them no incentive to switch to a new strategy as long as the other guy doesn’t. The Nash equilibrium strategy is especially appealing if you’re risk averse because it’s “unexploitable,” meaning that it gives you predictable, fixed odds of winning the game, no matter what strategy your opponent uses.

In this case — and you can read Ben’s blog for the proof — the Nash equilibrium is for Ben to pick Miami with exactly the same probability as Miami has of losing (0.37) and for Ilardi to pick Miami with exactly the same probability as Miami has of winning (0.63). (You might wonder how you should pick a team “with X probability,” but it’s pretty easy: just roll a 100-sided die, and pick the team if the die comes up X or lower.)

If you do the calculation, you’ll find that playing this strategy — i.e., rolling a hundred-sided die and picking Miami only if the die came up 37 or lower — would give Ben a 23.3% chance of beating Ilardi, no matter how Ilardi decided to play. Not terrible odds, especially given that this approach doesn’t require Ben to make any predictions about Ilardi’s strategy. But perhaps Ben could do better if he were able to make a reasonable guess about what Ilardi would do.
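The 23.3% figure can be checked directly. This sketch (the function name ben_win_probability is hypothetical) adds up the two outcomes in which Ben overtakes Ilardi, assuming the game result and both picks are independent, and holds Ben at his equilibrium mix of q = 0.37 while Ilardi’s probability p of picking Miami varies:

```python
MIAMI_WIN = 0.63  # estimated probability that Miami beats Dallas


def ben_win_probability(p, q, miami_win=MIAMI_WIN):
    """Chance that Ben overtakes Ilardi.

    p: probability Ilardi picks Miami; q: probability Ben picks Miami.
    Ben wins only if he picks the winner and Ilardi picks the loser.
    Assumes the game result and both picks are independent.
    """
    miami_wins_case = miami_win * q * (1 - p)         # Ben: Miami, Ilardi: Dallas
    dallas_wins_case = (1 - miami_win) * (1 - q) * p  # Ben: Dallas, Ilardi: Miami
    return miami_wins_case + dallas_wins_case


# Ben plays his equilibrium mix (q = 0.37) against several Ilardi strategies.
for p in (0.0, 0.25, 0.63, 0.9, 1.0):
    print(f"p = {p:.2f}: Ben wins with probability {ben_win_probability(p, 0.37):.4f}")
```

Every line shows 0.2331, which is .63 × .37: at q = .37 the terms involving p cancel, so nothing Ilardi does can move Ben’s odds.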

(2) That leads us to option two: Ben could abandon his Nash equilibrium strategy, if he felt that he could predict Ilardi’s action with sufficient confidence. To be precise, if Ben thinks that Ilardi is more than 63% likely to pick Miami, then Ben should pick Dallas.

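Here’s a rough proof. Call “p” the probability that Ilardi picks Miami, and “q” the probability that Ben picks Miami. Treating the game result and the two picks as independent, we can assign probabilities to each of the outcomes in which Ben wins:

Miami wins, Ben picks Miami, and Ilardi picks Dallas: .63 × q × (1 – p)
Dallas wins, Ben picks Dallas, and Ilardi picks Miami: .37 × (1 – q) × p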

Since the two outcomes are mutually exclusive, we can add up their probabilities to get the total probability that Ben wins, as a function of p and q:

Probability Ben wins = .37p + .63q – pq
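Expanding the two outcome probabilities shows where the formula comes from, and taking its slopes in q and p shows where each player becomes indifferent. This is a restatement of the same algebra, assuming independent picks as above:

```latex
\begin{aligned}
P(\text{Ben wins}) &= 0.63\,q\,(1-p) + 0.37\,(1-q)\,p \\
&= 0.63q - 0.63pq + 0.37p - 0.37pq \\
&= 0.37p + 0.63q - pq \\[4pt]
\frac{\partial P}{\partial q} &= 0.63 - p,
\qquad
\frac{\partial P}{\partial p} = 0.37 - q
\end{aligned}
```

The slope in q is zero only when p = .63, Ben’s indifference point; the slope in p is zero only when q = .37, which is why Ben’s equilibrium mix leaves Ilardi unable to change Ben’s 23.3%.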

Just to illustrate how Ben’s chance of winning changes depending on p, I plotted it against q for three different values of p, one line each: For the black line, p = 0.63. For the red line, p < 0.63 (to be precise, I plugged in p = 0.62, but any value of p < 0.63 will create an upward-sloping line). For the blue line, p > 0.63 (to be precise, I plugged in p = 0.64, but any value of p > 0.63 will create a downward-sloping line).

If p = .63, Ben’s chance of winning is constant (.233) for all values of q. In other words, if Ilardi seems to be about 63% likely to pick Miami, then it doesn’t matter how Ben picks; he’ll have the same chance of winning (23.3%) as he would if he played his Nash equilibrium strategy.

If p > .63, Ben’s chance of winning decreases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a greater than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as low a probability as possible (i.e., he should pick Dallas).

If p < .63, Ben’s chance of winning increases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a lower than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as high a probability as possible (i.e., he should pick Miami).
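A short Python sketch can regenerate the endpoints of the three lines from the formula .37p + .63q – pq. The function name ben_win_closed_form is hypothetical; the three p values are the ones named in the text:

```python
def ben_win_closed_form(p, q):
    """Ben's chance of winning, using the formula .37p + .63q - pq."""
    return 0.37 * p + 0.63 * q - p * q


# The three values of p behind the red, black, and blue lines.
lines = {"red": 0.62, "black": 0.63, "blue": 0.64}
slopes = {}
for color, p in lines.items():
    always_dallas = ben_win_closed_form(p, 0.0)  # q = 0
    always_miami = ben_win_closed_form(p, 1.0)   # q = 1
    # The formula is linear in q, so this difference is the line's slope (.63 - p).
    slopes[color] = always_miami - always_dallas
    print(f"{color}: p={p:.2f}, q=0 -> {always_dallas:.4f}, q=1 -> {always_miami:.4f}")
```

The red line rises by .01 from q = 0 to q = 1, the black line is flat, and the blue line falls by .01, matching the three cases above.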

So what happened? Ben estimated that Ilardi would pick Miami with greater than 63% probability. That’s mainly because most people aren’t comfortable playing probabilistic strategies that require them to roll a die; instead, they simply “round up” in their minds and pick the team that would give them a win more often than not. And Ben knew that if he was right about Ilardi picking Miami, he would end up with a 37% chance of winning (the probability that Dallas wins), rather than the 23.3% chance he would have had if he had stuck to his equilibrium strategy.
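Ben’s decision rule can be written as a small best-response function. This is a sketch with a hypothetical name (best_pure_pick); it only considers pure picks, which is enough because Ben’s win chance is linear in q, so a corner (q = 0 or q = 1) is always at least as good as any mix:

```python
NASH_VALUE = 0.63 * 0.37  # Ben's guaranteed chance from the equilibrium mix (about 23.3%)


def best_pure_pick(p):
    """Ben's best pick given his belief p that Ilardi picks Miami.

    Ben's win chance is .37p + .63q - pq, whose slope in q is .63 - p:
    pick Dallas (q = 0) when p > .63, Miami (q = 1) when p < .63.
    At exactly p = .63 every q gives the same value, reported as "either".
    """
    if p > 0.63:
        return "Dallas", 0.37 * p       # q = 0
    if p < 0.63:
        return "Miami", 0.63 * (1 - p)  # q = 1: .37p + .63 - p
    return "either", NASH_VALUE


# Ben's read: Ilardi "rounds up" and picks the favorite for sure (p = 1).
pick, chance = best_pure_pick(1.0)
print(pick, f"{chance:.1%}", "vs. equilibrium", f"{NASH_VALUE:.1%}")
```

Running it prints "Dallas 37.0% vs. equilibrium 23.3%". The 37% depends on Ben’s read being right, though: if Ilardi actually leaned toward Dallas (p < .63), picking Dallas would give Ben only .37p, which is worse than the equilibrium’s 23.3%.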

So Ben picked Dallas. As he’d predicted, Ilardi picked Miami, and lucky for Ben, Dallas won. This one case study doesn’t prove that Ilardi reasoned as Ben expected, of course. Ben summed up the takeaway on his blog:

Of course, we shouldn’t read too much into this: it’s only a single result, and doesn’t prove that either one of us had an advantage. On the other hand, I did make that pick in part because I felt that Ilardi was unlikely to “outlevel” me. To be clear, this was not based on any specific assessment about Ilardi personally, but based [on] my general beliefs about people’s tendencies in that kind of situation.

Was I right? The outcome and reasoning given in the final “picking game” has given me no reason to believe otherwise, though I think that the reciprocal lack of information this time around was a major part of that advantage.  If Ilardi and I find ourselves in a similar spot in the future (perhaps in next year’s Smackdown), I’d guess the considerations on both sides would be quite different.

The Game Theory of Story Endings

Do happy endings really make you as happy if you see them coming a mile away? When we watch a trashy action flick or a fluffy romantic comedy, aren’t the conflicts less interesting because we know it’ll all end happily ever after? Someone has to bite the bullet and write a sad ending to give plausibility to the threat of unhappiness. There’s little incentive to volunteer, because sad endings are harder to pull off and risk upsetting the audience, but someone has to do it.

I am intrigued by the market for movie endings. Movie-goers want two things in an ending: They want it to be happy and they want it to be unpredictable. There is some optimal frequency of sad endings that maintains the right level of suspense. Yet the market might fail to provide enough sad endings.

An individual director who films a sad ending risks short-term losses, as word gets around that the movie is “unsatisfying.” It is true that there are long-term gains, as viewers are kept off their guard for future movies. Unfortunately, most of those gains may be captured by other directors, because movie-goers remember only that the murderer does sometimes catch up with the heroine in the basement, and do not remember that it happens only in movies with particular directors. Under these circumstances, no individual director may be willing to incur costs for his rivals’ benefit.

A solution is for directors to display their names prominently, so that viewers know when a movie was made by someone unpredictable. Viewers, however, may find it in their interests to retaliate by covering their eyes when the director’s name is shown.

The more strongly you’re associated with unpredictability, the more of those benefits you reap. But you’re also more strongly associated with unhappy endings, which might turn audiences away.

One way to ease the blow of an unexpected sad ending is to make deaths triumphant, defiant, or heroic. Think of how Spock died in The Wrath of Khan (no, I’m not going to give a spoiler alert for a 30-year-old movie). Sure, people die in Star Trek all the time – when Kirk, Spock, and fresh-faced, red-shirted Ensign Jimmy beam down to explore a planet for life, we all know one of them isn’t going to make it back. But killing a main character is more significant, and it was done in a touching way. They got the unpredictability without upsetting their audience.

I genuinely respect Joss Whedon for his willingness to throw curveballs like this into his storylines. He’s developed a reputation for having sympathetic characters die, leave, or change sides – often without warning. Rather than watching Buffy, Firefly and Serenity thinking “So, how is it all going to work out this time?” we’re forced to think “Is it going to work out this time?”

TV Tropes has a name for all this – Anyone Can Die:

This is where no one is exempt from being killed, including the main characters (maybe even the hero). The Sacrificial Lamb is often used to establish the writer’s Anyone Can Die cred early on. However, if the Lamb’s death is a one-off with no follow-up, it’s just Killed Off for Real. To really be Anyone Can Die, the work must include multiple deaths, happening at different points in the story. Bonus points if the death is unnecessary and devoid of Heroic Sacrifice.

In game theory situations, reputation plays a large role. TV Tropes mentions building up ‘Anyone Can Die’ cred, which can be achieved through repeated interactions. In a TV series or multiple films by the same director, you get a feel for whether the good guys always prevail. But even within a single story, early and repeated signaling can make the remainder of the plot more intense. When a major character is killed off without it being a Heroic Sacrifice, that’s a powerful signal that anything can happen. The musical Into the Woods will always have a special place in my heart for mastering this dynamic.

But there’s another route. Historical dramas can increase society’s perception of “sadness plausibility” without anyone taking a hit for being a downer. Nobody’s going to feel unsatisfied that Titanic, The Great Escape, or Butch Cassidy and the Sundance Kid have sad endings. (Or if they do, they can take it up with reality for writing a depressing script.) It’s not easy to keep those separate in our brains; we just get the overall sense that sometimes stories have sad endings. And that perception helps us enjoy all the other movies we watch.