## The darker the night, the brighter the stars?

“The darker the night, the brighter the stars” always struck me as a bit of an empty cliché, the sort of thing you say when you want to console someone, or yourself, and you’re not inclined to look too hard at what you really mean. Not that it’s inherently ridiculous that your periods of pleasure might be sweeter if you have previously tasted pain. That’s quite plausible, I think. What made me roll my eyes was the implication that periods of suffering could actually make you better off, overall. That was the part that seemed like an obvious ex post facto rationalization to me. Surely the utility you gain from appreciating the good times more couldn’t possibly outweigh the utility you lose from the suffering itself!

Or could it? I decided to settle the question by modeling the functional relationship between suffering and happiness, making a few basic simplifying assumptions. It should look something roughly like this:

Total Happiness = [(1-S) * f(S)] – S

where*
S = % of life spent in suffering
(1-S) = % of life spent in pleasure
f(S) = some function of S

As you can see, f(S) acts as a multiplier on pleasure, so the amount of time you’ve spent in suffering affects how much happiness you get out of your time spent in pleasure. I didn’t want to assume too much about that function, but I think it’s reasonable to say the following:

• f(S) is increasing in S: more suffering means you get more happiness out of your pleasure
• f(0) = 1, because if you have zero suffering, there’s no multiplier effect (and multiplying your pleasure by 1 leaves it unchanged).

… I also made one more assumption which is probably not as realistic as those two:

•  f(S) is linear.**

Under those assumptions, f(S) can be written as:
f(S) = aS + 1

Now we can ask the question: what percent suffering (S) should we pick to maximize our total happiness? The standard way to answer “optimizing” questions like that is to take the derivative of the quantity we’re trying to maximize (in this case, Total Happiness) with respect to the variable we’re trying to choose the value of (in this case, S), and set that derivative to zero. Here, that works out to:

f'(S) – Sf'(S) – f(S) – 1 = 0

And since we’ve worked out that f(S) = aS + 1, we know that f'(S) = a, and we can plug both of those expressions into the equation above:

a – Sa – aS – 1 – 1 = 0
a – 2aS = 2
-2aS = 2 – a
2aS = a -2
S = (a – 2) / 2a

That means that the ideal value of S (i.e., the ideal % of your life spent suffering, in order to maximize your total happiness) is equal to (a – 2)/2a, where a tells you how strongly suffering magnifies your pleasure.
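As a sanity check on the algebra, here is a minimal Python sketch that compares the closed-form answer against a brute-force search over S. The function names and the sample values of a are illustrative assumptions, not part of the original model. The closed form is clipped to the range [0, 1], since S is a share of a lifetime.

```python
def total_happiness(s: float, a: float) -> float:
    """Total Happiness = (1 - S) * f(S) - S, with the linear multiplier f(S) = a*S + 1."""
    return (1.0 - s) * (a * s + 1.0) - s


def closed_form_optimum(a: float) -> float:
    """S = (a - 2) / (2a) from the derivation, clipped to the feasible range [0, 1]."""
    if a <= 0:
        return 0.0  # outside the model's assumption that suffering boosts pleasure
    return min(1.0, max(0.0, (a - 2.0) / (2.0 * a)))


def grid_optimum(a: float, steps: int = 10_000) -> float:
    """Brute-force check: try evenly spaced S values in [0, 1] and keep the best one."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda s: total_happiness(s, a))


if __name__ == "__main__":
    # Hypothetical multiplier strengths, chosen only for illustration.
    for a in (1.0, 2.0, 4.0, 10.0, 100.0):
        print(f"a={a:>6}: closed form S={closed_form_optimum(a):.4f}, "
              f"grid search S={grid_optimum(a):.4f}")
```

If the derivation is right, the two columns should agree to within the grid spacing for every a.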

It might seem like this conclusion is unhelpful, since we don’t know what a is. But there is something interesting we can deduce from the result of all our hard work! Check out what happens when a gets really small or really large. As a approaches 0, the ideal S approaches negative infinity – obviously, it’s impossible to spend a negative percentage of your life suffering, but that just means you want as little suffering as possible. Not too surprising, so far; the lower a is, the less benefit you get from suffering, so the less suffering you want.

But here’s the cool part — as a approaches infinity, the ideal S approaches 1/2. That means that you never want to suffer more than half of your life, no matter how much of a multiplier effect you get from suffering – even if an hour of suffering would make your next hour of pleasure insanely wonderful, you still wouldn’t ever want to spend more time suffering than reaping the benefits of that suffering. Or, to put it in more familiar terms: Darker nights may make stars seem brighter, but you still always want your sky to be at least half-filled with stars.
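Both limits are easier to see if the fraction is split in two. A short worked step, using only the linear f(S) = aS + 1 and writing H(S) as shorthand for Total Happiness (the symbol H is introduced here purely for convenience):

$$
H(S) = (1-S)(aS+1) - S = 1 + (a-2)S - aS^2, \qquad H''(S) = -2a < 0 \ \text{ for } a > 0
$$

$$
S^* = \frac{a-2}{2a} = \frac{1}{2} - \frac{1}{a}
$$

The negative second derivative confirms that the critical point is a maximum rather than a minimum. Written this way, the ideal S sits below 1/2 by exactly 1/a, so it creeps toward 1/2 as a grows but never reaches it, and it drops to zero or below once a is 2 or less.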

* You’ll also notice I’m making two unrealistic assumptions here:

(1) I’m assuming there are only two possible states, suffering and pleasure, and that you can’t have different degrees of either one – there’s only one level of suffering and one level of pleasure.

(2) I’m ignoring the fact that it matters when the suffering occurs – e.g., if all your suffering occurs at the end of your life, there’s no way it could retroactively make you enjoy your earlier times of pleasure more. It would probably be more realistic to say that whatever the ideal amount of suffering is in your life, you would want to sprinkle it evenly throughout life because your pleasures will be boosted most strongly if you’ve suffered at least a little bit recently.

** Linearity is a decent starting point, and worth investigating, but I suspect it would be more realistic, if much more complicated, to assume that f(S) is concave, i.e., that greater amounts of suffering continue to increase the benefit you get from pleasure, but by smaller and smaller amounts.

## Calibrating our Confidence

It’s one thing to know how confident we are in our beliefs; it’s another to know how confident we should be. Sure, the de Finetti’s Game thought experiment gives us a way to put a number on our confidence – quantifying how likely we feel we are to be right. But we still need to learn to calibrate that sense of confidence with the results. Are we appropriately confident?

Taken at face value, if we express 90% confidence 100 times, we expect to be proven wrong an average of 10 times. But very few people take the time to see whether that’s the case. We can’t trust our memories on this, as we’re probably more likely to remember our accurate predictions and forget all the offhand predictions that fell flat. If we want to get an accurate sense of how well we’ve calibrated our confidence, we need a better way to track it.

Well, here’s a way: PredictionBook.com. While working on my last post, I stumbled on this nifty project. Its homepage features the words “How Sure Are You?” and “Find out just how sure you should be, and get better at being only as sure as the facts justify.” Sounds perfect, right?

It allows you to enter your prediction, how confident you are, and when the answer will be known.  When the time comes, you record whether or not you were right and it tracks your aggregate stats.  Your predictions can be private or public – if they’re public, other people can weigh in with their own confidence levels and see how accurate you’ve been.
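The bookkeeping behind that kind of tracking is simple enough to sketch. The snippet below is a hypothetical stand-in, not PredictionBook’s actual code or API: it assumes each resolved prediction is stored as a (stated confidence, came true) pair and groups them by stated confidence.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (stated_confidence, came_true) pairs by stated confidence.

    Returns {stated_confidence: (hit_rate, sample_size)}, sorted by confidence.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[confidence].append(bool(came_true))
    return {
        confidence: (sum(outcomes) / len(outcomes), len(outcomes))
        for confidence, outcomes in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Made-up log of resolved predictions, for illustration only.
    log = [(0.9, True), (0.9, True), (0.9, False), (0.6, True), (0.6, False)]
    for confidence, (hit_rate, n) in calibration_table(log).items():
        print(f"said {confidence:.0%}, right {hit_rate:.0%} of the time (n={n})")
```

A well-calibrated forecaster would see hit rates close to the stated confidence in every bucket, at least once the sample sizes are large enough to mean something.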

(This site isn’t new to rationalists: Eliezer and the LessWrong community noticed it a couple years ago, and LessWrong’er Gwern has been using it to – among other things – track inTrade predictions.)

Since I don’t know who’s using the site and how, I don’t know how seriously to take the following numbers. So take this chart with a heaping dose of salt. But I’m not surprised that the confidences entered are higher than the likelihood of being right:

| Predicted Certainty | 50% | 60% | 70% | 80% | 90% | 100% | Total |
|---|---|---|---|---|---|---|---|
| Actual Certainty | 37% | 52% | 58% | 70% | 79% | 81% | |
| Sample Size | 350 | 544 | 561 | 558 | 709 | 219 | 2941 |
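To put a rough number on the overconfidence in those figures, the sketch below subtracts the actual hit rate from the predicted one in each bucket, then averages the gaps weighted by sample size. It assumes “Actual Certainty” means the share of predictions in that bucket that came true; the variable and function names are my own.

```python
# Figures copied from the chart above.
predicted = [50, 60, 70, 80, 90, 100]        # stated confidence, in percent
actual = [37, 52, 58, 70, 79, 81]            # share that came true, in percent
sample_sizes = [350, 544, 561, 558, 709, 219]


def overconfidence_gaps(predicted, actual):
    """Percentage-point gap between stated confidence and actual hit rate."""
    return [p - a for p, a in zip(predicted, actual)]


def weighted_mean(values, weights):
    """Average of values, weighting each one by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    gaps = overconfidence_gaps(predicted, actual)
    for p, gap in zip(predicted, gaps):
        print(f"{p}% bucket: overconfident by {gap} points")
    print(f"weighted average gap: {weighted_mean(gaps, sample_sizes):.1f} points")
```

Every bucket comes out overconfident, with the 100% bucket showing the largest gap, though the same heaping dose of salt applies to these numbers as to the chart itself.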

Sometimes the miscalibration matters more than others. In Mistakes Were Made (but not by me), Tavris and Aronson describe the overconfidence police interrogators feel about their ability to discern honest denials from false ones. In one study, researchers selected videos of police officers interviewing suspects who were denying a crime – some innocent and some guilty.

Kassin and Fong asked forty-four professional detectives in Florida and Ontario, Canada, to watch the tapes. These professionals averaged nearly fourteen years of experience each, and two-thirds had had special training, many in the Reid Technique. Like the students [in a similar study], they did no better than chance, yet they were convinced that their accuracy rate was close to 100 percent. Their experience and training did not improve their performance. Their experience and training simply increased their belief that it did.

As a result, more people are falsely imprisoned as prosecutors steadfastly pursue convictions for people they’re sure are guilty. This is a case in which poor calibration does real harm.

Of course, it’s often a more benign issue. Since finding PredictionBook, I see everything as a prediction to be measured. A coworker and I were just discussing plans to have a group dinner, and had the following conversation (almost word for word):

Her: “How do you feel about squash?”
Her: “What about sauteed in butter and garlic?”
Me: “That has potential. My estimation of liking it just went up slightly.”
*Runs off to enter prediction*

I’ve already started making predictions in hopes that tracking my calibration errors will help me correct them. I wish PredictionBook had tags – it would be fascinating (and helpful!) to know that I’m particularly prone to misjudge whether I’ll like foods or that I’m especially well-calibrated at predicting the winner of sports games.

And yes, I will be using PredictionBook on football this season. Every week I’ll try to predict the winners and losers, and see whether my confidence is well-placed. Honestly, I expect to see some homer-bias and have too much confidence in the Ravens.  Isn’t exposing irrationality fun?

## De Finetti’s Game: How to Quantify Belief

What do people really mean when they say they’re “sure” of something? Everyday language is terrible at describing actual levels of confidence – it lumps together different degrees of belief into vague groups which don’t always match from person to person. When one friend tells you she’s “pretty sure” we should turn left and another says he’s “fairly certain” we should turn right, it would be useful to know how confident they each are.

Sometimes it’s enough to hear your landlord say she’s pretty sure you’ll get towed from that parking space – you’d move your car. But when you’re basing an important decision on another person’s advice, it would be better to describe confidence on an objective, numeric scale. It’s not necessarily easy to quantify a feeling, but there’s a method that can help.

Bruno de Finetti, a 20th-century Italian mathematician, came up with a creative idea called de Finetti’s Game to help connect the feeling of confidence to a percent (hat tip Keith Devlin in The Unfinished Game). It works like this:

Suppose you’re half a mile into a road trip when your friend tells you that he’s “pretty sure” he locked the door. Do you go back? When you ask him for a specific number, he replies breezily that he’s 95% sure. Use that number as a starting point and begin the thought experiment.

In the experiment, you show your friend a bag with 95 red and 5 blue marbles. You then offer him a choice: he can either pick a marble at random and, if it’s red, win \$1 million, or he can go back and verify that the door is locked and, if it is, get \$1 million.

If your friend would choose to draw a marble from the bag, he prefers the 95% chance to win. His real confidence of locking the door must be somewhere below that. So you play another round – this time with 80 red and 20 blue marbles. If he would rather check the door this time, his confidence is higher than 80% and perhaps you try an 87/13 split next round.

And so on. You keep offering different deals in order to home in on the level where he feels equally comfortable selecting a random marble and checking the door. That’s his real level of confidence.
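The narrowing-down process is essentially a bisection search. Here is a minimal sketch under that assumption. The `prefers_marbles` callback is a hypothetical stand-in for asking your friend which bet he would take, and the rounds here always split the remaining range in half rather than following the 95, 80, 87 sequence in the example.

```python
def elicit_confidence(prefers_marbles, rounds: int = 10) -> float:
    """Bisect on the number of red marbles (out of 100) until the two bets feel equal.

    prefers_marbles(red) should return True if the person would rather draw from
    a bag holding `red` red marbles out of 100 than bet on their own claim.
    Returns the estimated confidence as a percentage.
    """
    low, high = 0.0, 100.0
    for _ in range(rounds):
        red = (low + high) / 2
        if prefers_marbles(red):
            high = red   # the bag looks like the better bet, so confidence is below red%
        else:
            low = red    # the claim looks like the better bet, so confidence is above red%
    return (low + high) / 2


if __name__ == "__main__":
    # Hypothetical friend whose true confidence is 85%, even though he said 95%.
    friend = lambda red: red > 85.0
    print(f"elicited confidence: {elicit_confidence(friend):.1f}%")
```

Each round halves the remaining range, so a handful of questions is enough to pin the answer down to within a percentage point. A real person’s answers will be noisier than this idealized friend’s.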

The thought experiment should guide people through the tricky process of connecting their feeling of confidence to a corresponding percent. The answer will still be somewhat fuzzy – after all, we’re still relying on a feeling that one option is better than another.

It’s important to remember that the game doesn’t tell us how likely we are to BE right. It only tells us about our confidence – which can be misplaced. From cognitive dissonance to confirmation bias there are countless psychological influences messing up the calibration between our confidence level and our chance of being right. But the more we pay attention to the impact of those biases, the more we can do to compensate. It’s a good practice (though pretty rare) to stop and think, “Have I really been as accurate as I would expect, given how confident I feel?”

I love the idea of measuring people’s confidence (and not just because I can rephrase it as measuring their doubt). I just love being able to quantify things! We can quantify exactly how much a new piece of evidence is likely to affect jurors, how much a person’s suit affects their persuasive impact, or how much confidence affects our openness to new ideas.

We could even use de Finetti’s Game to watch the inner workings of our minds doing Bayesian updating. Maybe I’ll try it out on myself to see how confident I feel that the Ravens will win the Super Bowl this year before and after the Week 1 game against the rival Pittsburgh Steelers. I expect that my feeling of confidence won’t shift quite in accordance with what the Bayesian analysis tells me a fully rational person would believe. It’ll be fun to see just how irrational I am!

## RS#37: The science and philosophy of happiness

On Episode #37 of the Rationally Speaking podcast, Massimo and I talk about the science and philosophy of happiness:

“Debates over what’s important to happiness — Money? Children? Love? Achievement? — are ancient and universal, but attempts to study the subject empirically are much newer. What have psychologists learned about which factors have a strong effect on people’s happiness and which don’t? Are parents really less happy than non-parents, and do people return to their happiness “set point” even after extreme events like winning the lottery or becoming paralyzed? We also tackle some of the philosophical questions regarding happiness, such as whether some kinds of happiness are “better” than others, and whether people can be mistaken about their own happiness. But, perhaps the hardest question is: can happiness really be measured?”

## Bayesian truth serum

Here’s a sneaky trick for extracting the truth from someone even when she’s trying to conceal it from you: Rather than asking her how she thinks or behaves, ask her how she thinks other people think or behave.

MIT professor of psychology and cognitive science Drazen Prelec calls this trick “Bayesian truth serum,” according to Tyler Cowen in Discover Your Inner Economist. The logic behind it is simple: our impressions of “typical” attitudes and behavior are colored by our own attitudes and behavior. And that’s reasonable. You should count yourself as one data point in your sample of “how people think and behave.”

Your own data is likely to influence your sample more strongly than other data points, however, for two reasons. First, because it’s much more salient to you, compared to your data about other people, so you’re more likely to overweight it in your estimation. And second, through a ripple effect — people tend to cluster with other people who think and act similarly to themselves, so however your sample differs from the general population, that’s an indicator of how you yourself differ from the general population.

So, to use Cowen’s example, if you ask a man how many sexual partners he’s had, he might have a strong incentive to lie, either downplaying or exaggerating his history depending on who you are and what he wants you to think of him. But his estimate of a “typical” number will still be influenced by his own, and by that of his friends and acquaintances (who, because of the selection effect, are probably more similar to him than the general population is). “When we talk about other people,” Cowen writes, “we are often talking about ourselves, whether we know it or not.”

## Game Theory and Football: How Irrationality Affects Play Calling

Coaches and coordinators in professional football get paid a lot of money to call the right plays – not just the best plays for particular situations, but also unpredictable plays that will catch the other team off guard. It’s a perfect setup for game theory analysis!

As in other game theory situations, the best play depends in part on what your opponent does. Your running play is much more likely to succeed against a pass-prevent defense, but would be in trouble against a run-stuffing formation. If the defense can guess what you’re going to call, they can adjust accordingly and have an advantage. Even on 3rd down and long – a common passing situation – there’s value in calling running plays some percentage of the time, because the defense is less likely to be geared toward stopping them. But as you do it more, the chance of catching the defense off guard gets smaller. There’s some optimal balance where the expected success of a surprising run is equal to the expected success of a more sensible (but anticipated) pass.

The goal is to stay unpredictable and exploit patterns where your opponent is using a sub-optimal combination. If a team notices that passing plays are working better, they’ll be more likely to call them. As the defense notices, they’ll shift away from their run-defense and focus more on defending passes. In theory, the two teams reach an equilibrium.
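For a toy version of that equilibrium, treat a single down as a two-by-two zero-sum game: the offense picks run or pass, the defense picks run defense or pass defense, and the payoff is the offense’s chance of a successful play. The sketch below solves for the mix that leaves each side indifferent. The success rates are invented purely for illustration, and the formula only applies when neither side has a single choice that is always best.

```python
def mixed_equilibrium(rr: float, rp: float, pr: float, pp: float):
    """Solve a 2x2 zero-sum play-calling game that has an interior mixed equilibrium.

    rr: offense success rate for a run  against run defense
    rp: offense success rate for a run  against pass defense
    pr: offense success rate for a pass against run defense
    pp: offense success rate for a pass against pass defense

    Returns (run_share, run_defense_share, success_rate).
    """
    denom = rr - rp - pr + pp
    if denom == 0:
        raise ValueError("no unique mixed equilibrium for these payoffs")
    run_share = (pp - pr) / denom          # offense mix that makes the defense indifferent
    run_defense_share = (pp - rp) / denom  # defense mix that makes the offense indifferent
    if not (0 <= run_share <= 1 and 0 <= run_defense_share <= 1):
        raise ValueError("no interior mix: some pure strategy is optimal")
    # At equilibrium a run and a pass have the same expected success rate.
    success_rate = run_defense_share * rr + (1 - run_defense_share) * rp
    return run_share, run_defense_share, success_rate


if __name__ == "__main__":
    # Hypothetical success rates, not real NFL data.
    run_share, run_d_share, value = mixed_equilibrium(rr=0.3, rp=0.6, pr=0.7, pp=0.4)
    print(f"offense runs {run_share:.0%} of the time, "
          f"defense plays the run {run_d_share:.0%} of the time, "
          f"expected success {value:.0%}")
```

At the computed mix, the expected success of a run equals the expected success of a pass, which is exactly the balance condition described above. Any predictable drift away from that mix is something the other side can exploit.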

In practice, it doesn’t quite work that perfectly – human beings are making the decisions, and humans are both vulnerable to cognitive biases and notoriously bad at mimicking true unpredictability. Brian Burke, a fellow fan of combining sports with statistics, was poring over the play-calling data for second downs and noticed something odd:

There’s a strange spike in percent of running plays called at 2nd and 10! Tactically, 2nd and 10 isn’t all that different from 2nd and 9 or 11, so it’s strange to see such a difference. Why would they call so many more running plays in that particular situation?

The key is to realize that there are two ways a team tends to find itself facing a 2nd and 10 situation – runs that happen to go nowhere or any incomplete pass. Of those, incomplete passes are far more common. So in cases of 2nd and 10, it’s most often because the team just failed a passing play. That suggests two reasons coaches might be irrationally switching to running plays, even at the cost of sacrificing unpredictability:

(1) The hasty generalization bias (also called the small sample bias) and the recency effect are cognitive biases in which people overgeneralize from a small amount of data, especially recent data. Failed passes are very common (about 40% fail), so there’s no good reason for a coach to treat any single failed pass as evidence that they’d be better off switching to a running play. But the urge to overreact to the failed pass that just happened is strong, thanks to these two biases.

(2) People are terrible at generating unpredictability — when asked to make up a “seemingly-random” sequence of coin flips, we tend to use far more alternation between Heads and Tails than would actually occur in a real sequence of coin flips. So even if coaches weren’t overreacting to a failed pass, and they were simply trying to be unpredictable, they would still tend to switch to a running play after a passing play more often than random chance would dictate.
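The coin-flip claim is easy to check numerically. In a truly random sequence of fair flips, each flip differs from the previous one about half the time. The sketch below measures that alternation rate. The “human-like” sequence is a made-up example of the over-alternating pattern, not data from any study.

```python
import random


def alternation_rate(sequence) -> float:
    """Fraction of adjacent pairs in the sequence that differ from each other."""
    switches = sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    return switches / (len(sequence) - 1)


def simulate_fair_flips(n: int, seed: int = 0):
    """Generate n independent fair coin flips as a list of 'H' and 'T'."""
    rng = random.Random(seed)
    return [rng.choice("HT") for _ in range(n)]


if __name__ == "__main__":
    real = simulate_fair_flips(100_000)
    made_up = "HTHTTHTHHTHTHTTHTHHT"  # hypothetical hand-written "random" sequence
    print(f"simulated fair flips alternate {alternation_rate(real):.1%} of the time")
    print(f"made-up sequence alternates   {alternation_rate(made_up):.1%} of the time")
```

The same measurement could be run on play-calling data by treating run and pass as the two outcomes. An alternation rate well above what the overall run/pass mix would produce by chance points to the bias described in (2).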

Indeed, when Brian separated the data by previous play, the alternation trend is clear — passes are more likely after runs, and runs are more likely after passes:

Brian concludes:

Coaches and coordinators are apparently not immune to the small sample fallacy. In addition to the inability to simulate true randomness, I think this helps explain the tendency to alternate. I also think this is why the tendency is so easy to spot on the 2nd and 10 situation. It’s the situation that nearly always follows a failure. The impulse to try the alternative, even knowing that a single recent bad outcome is not necessarily representative of overall performance, is very strong.

So recency bias may be playing a role. More recent outcomes loom disproportionately large in our minds than past outcomes. When coaches are weighing how successful various play types have been, they might be subconsciously over-weighting the most recent information—the last play. But regardless of the reasons, coaches are predictable, at least to some degree.

Coaches are letting irrational biases influence their play calling, pulling them away from the optimal mix. The result, according to Pro Football Reference stats, is less success on those plays. I wonder how well a computer could call plays using a Statistical Prediction Rule.

## Asking for reassurance: a Bayesian interpretation

Bayesianism gives us a prescription for how we should update our beliefs about the world as we encounter new evidence. Roughly speaking, when you encounter new evidence (E), you should increase your confidence in a hypothesis H only if that evidence would’ve been more likely to occur in a world where H was true than in a world in which H was false — that is, if P(E|H) > P(E|not-H).
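The “only if” condition falls straight out of Bayes’ rule written in odds form, which is a standard restatement rather than anything specific to this post:

$$
\frac{P(H \mid E)}{P(\lnot H \mid E)} = \frac{P(E \mid H)}{P(E \mid \lnot H)} \cdot \frac{P(H)}{P(\lnot H)}
$$

The first factor on the right is the likelihood ratio. Your odds on H go up exactly when that ratio is greater than 1, which is the same as saying P(E|H) > P(E|not-H), and the further the ratio is from 1, the bigger the update.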

I think this is indisputably correct. What I’ve been less sure about is whether Bayesianism tends to lead to conclusions that we wouldn’t have arrived at anyway just through common sense. I mean, isn’t this how we react to evidence intuitively? Does knowing about Bayes’ rule actually improve our reasoning in everyday life?

As of yesterday, I can say: yes, it does.

I was complaining to a friend about people who ask questions like, “Do you think I’m pretty?” or “Do you really like me?” My argument was that I understood the impulse to seek reassurance if you’re feeling insecure, but I didn’t think it was useful to actually ask such a question, since the person’s just going to tell you “yes” no matter what, and you’re not going to get any new information from it. (And you’re going to make yourself look bad by asking.)

My friend made the valid point that even if everyone always responds “Yes,” some people are better at lying than others, so if the person’s reply sounds unconvincing, that’s a telltale sign that they don’t genuinely like you/think you’re pretty. “Okay, that’s true,” I replied. “But if they reply ‘yes’ and it sounds convincing, then you haven’t learned any new information, because you have no way of knowing whether he’s telling the truth or whether he’s just a good liar.”

But then I thought about Bayes’ rule and realized I was wrong — even a convincing-sounding “yes” gives you some new information. In this case, H = “He thinks I’m pretty” and E = “He gave a convincing-sounding ‘yes’ to my question.” And I think it’s safe to assume that it’s easier to sound convincing if you believe what you’re saying than if you don’t, which means that P(E | H) > P(E | not-H). So a proper Bayesian reasoner encountering E should increase her credence in H.
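To see how much a convincing “yes” could move the needle, here is a minimal sketch of the update. All three input numbers are invented for illustration; the only thing taken from the argument above is that P(E | H) is larger than P(E | not-H).

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) from the prior P(H) and the two likelihoods."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: a 50% prior, a convincing "yes" 90% of the time
    # if he means it, and 60% of the time if he's lying.
    updated = posterior(prior=0.5, p_e_given_h=0.9, p_e_given_not_h=0.6)
    print(f"credence after a convincing yes: {updated:.0%}")
```

With these made-up inputs the credence rises from 50% to 60%. If liars were just as convincing as truth-tellers, the two likelihoods would be equal and the posterior would stay at the prior, which matches the original intuition that nothing is learned.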

(Of course, there’s always the risk, as with the observer effect in physics, that the process of measuring something will actually change it. So if you ask “Do you like me?” enough, the true answer might shift from “yes” to “no”…)