I want to disturb you with probability and its (apparent) paradoxes some more.
Let’s warm up with something that looks really straightforward. (Uh-oh!) You are shown a red door and a blue door, and told there is a prize behind one of them. What’s the likelihood that the prize is behind red as opposed to blue?
You probably said 1-in-2 for each door, and you may well be right. Or maybe you were playing Let’s Make a Deal: you originally picked the red door from among three (say red, green, and blue), and the host showed you the prize wasn’t behind the green door, so now you think it’s 1-in-3 for red and 2-in-3 for blue.
But, depending on the back story, the probabilities could be literally anything. I mean it. Pick any fraction a/b, where a and b are positive integers and a < b. For concreteness, say a = 13, b = 63, so your fraction is 13/63. Now let’s play Let’s Make a Deal with 63 doors. It works like this: there’s a car behind one door, and you get to pick 13 doors out of 63. I’ll open 12 of the 13 doors that you picked, and 49 of the 50 doors you didn’t pick. I’ll make sure the car is not behind any of the doors I open.
Of the two remaining doors, say the one you picked is red and the one you didn’t pick is blue. Then the probability that the car is behind the red door is 13/63 (or a/b). We can play the same game for any fraction you give me.
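If you'd rather check the 13/63 claim than take my word for it, here is a minimal simulation sketch. The function names (`red_door_wins`, `estimate_red`) are mine, and I'm assuming the car and your picks are both chosen uniformly at random, and that the host picks at random among the doors he is allowed to leave closed.

```python
import random
from fractions import Fraction

def red_door_wins(a, b, rng):
    """Play one round with b doors and a picks; True if the car is behind red."""
    doors = list(range(b))
    car = rng.choice(doors)
    picked = rng.sample(doors, a)
    picked_set = set(picked)
    unpicked = [d for d in doors if d not in picked_set]
    # The host leaves one door closed in each group and never opens the car's door.
    red = car if car in picked_set else rng.choice(picked)        # your remaining door
    blue = car if car not in picked_set else rng.choice(unpicked)  # the other remaining door
    assert (red == car) != (blue == car)  # the car is behind exactly one of the two
    return red == car

def estimate_red(a, b, rounds=50_000, seed=1):
    """Fraction of simulated rounds in which the car is behind the red door."""
    rng = random.Random(seed)
    return sum(red_door_wins(a, b, rng) for _ in range(rounds)) / rounds

exact = Fraction(13, 63)
print(f"claimed: {float(exact):.4f}  simulated: {estimate_red(13, 63):.4f}")
```

The simulated figure should land close to 13/63, because the red door hides the car exactly when the car was among your 13 original picks.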
So when you see just two doors to pick between, remember that you might be missing a whole back story that can impact your choice. In fancier language: to make inferences based on observations, you need to know the process by which those observations were arrived at.
In my last post, I talked about this in the context of the so-called Boy-Girl Puzzle. This puzzle has a number of interesting variants, and I want to reinforce the importance of back story by touching on a couple of them.
The set-up: you’re at a school reunion, chatting at a reception with a friend who you find out has two kids. Some of your classmates’ kids are in the room as well, although you haven’t paid any attention to them because you’ve been focusing on the grown-ups. Now suppose that while you’re talking, your friend points toward a cluster of kids (you can’t tell which kid they’re pointing to) and says, “There’s my daughter.” The Boy-Girl Puzzle asks: what’s the likelihood that your friend has two kids of the same gender? Meaning, in this case, that their other kid is also a girl.
Imagine two scenarios:
Scenario 1. Since everyone brought kids to the reunion, the organizers have kindly put together some events to keep the kids entertained. Today there’s a basketball camp. But there isn’t enough gym space for all the kids, so this has been scheduled in two phases: a girls’ session from 4 to 5, and then a boys’ session from 5 to 6. Assume all the kids go to their respective session. It is now 5:15, the girls’ session let out a little while ago, and now all the girls are in the reception room while the boys are playing basketball.
Scenario 2. Same as Scenario 1, only the school has a really big gym, so the basketball camp is at the same time for everybody. It’s 5:15 again, and all the kids, girls and boys alike, are in the reception room.
Analyzing Scenario 1, let’s imagine that there are 400 two-kid families in your class, 100 for each of the four possible gender combinations: (Girl, Girl), (Girl, Boy), (Boy, Girl), and (Boy, Boy). (For now I’m ordering the kids by age, though any unambiguous ordering will do.) When the girls’ camp lets out, 400 girls make their way to the reception room. Now 300 of the 400 two-kid families have a daughter in the room. You just found out that your friend’s family is one of them. Of those 300 families, 200 have a girl and a boy, and 100 have two girls. So the probability that your friend’s other kid is a girl is 1-in-3.
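The Scenario 1 tally is small enough to write out directly. This is just a sketch of the counting above; the variable names are mine, and the 100-families-per-combination figure is the same illustrative assumption used in the text.

```python
from fractions import Fraction

# 100 families for each (older, younger) gender combination.
families = {("G", "G"): 100, ("G", "B"): 100, ("B", "G"): 100, ("B", "B"): 100}

# Every girl ends up in the reception room.
girls_in_room = sum(n * combo.count("G") for combo, n in families.items())

# Families whose parent can truthfully say "There's my daughter."
with_daughter = {combo: n for combo, n in families.items() if "G" in combo}
p_two_girls = Fraction(with_daughter[("G", "G")], sum(with_daughter.values()))

print(girls_in_room)                 # 400
print(sum(with_daughter.values()))   # 300
print(p_two_girls)                   # 1/3
```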
In Scenario 2, there are 400 girls and 400 boys from two-kid families in the reception room. To keep things simple, let’s assume that each of your classmates will eventually see both of their kids. Assume also that a parent with both a girl and a boy is just as likely to see the girl first as the boy. This means that if we list the kids in each family in the order that their parents see them, each of the four gender combinations ((Girl, Girl), etc.) is equally likely. For 200 of the 400 two-kid families, the parent will see a daughter first: 100 with (Girl, Girl), and 100 with (Girl, Boy). One of these parents is your friend, so the probability that their other kid is a girl is 1-in-2.
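Here is the same kind of sketch for Scenario 2. It starts from the birth-order families and splits each group in half according to which kid the parent happens to see first; that 50-50 split is the assumption stated above, and the names are mine.

```python
from fractions import Fraction
from itertools import product

girl_seen_first = Fraction(0)
girl_seen_first_two_girls = Fraction(0)

for family in product("GB", repeat=2):   # birth-order combinations, 100 families each
    for first_seen in (0, 1):            # assumed: either kid is equally likely to be seen first
        weight = Fraction(100, 2)
        if family[first_seen] == "G":
            girl_seen_first += weight
            if family == ("G", "G"):
                girl_seen_first_two_girls += weight

p_same_scenario2 = girl_seen_first_two_girls / girl_seen_first

print(girl_seen_first)     # 200
print(p_same_scenario2)    # 1/2
```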
Even if you believe me that back story is important, the different conclusions here might still seem a little weird. Aren’t you finding out the same information in each case? Why does it matter if you’re in a room full of girls vs. a room with girls and boys?
I find it helpful to think about this in terms of the possibilities that are being ruled out. You start out with a 50-50 chance that your friend has two kids of the same gender. When you’re in a room full of girls only, two things could happen. Your friend could eventually tell you they see their daughter, or, if they have two sons, they wouldn’t see any of their kids at all. So when you hear, “I see my daughter,” you’ve ruled out some (half) of the possible two-kids-of-the-same-gender scenarios. So of course the same-gender probability goes down.
Whereas, if both girls and boys are in the room, you will eventually hear either “I see my daughter” or “I see my son.” (The story ends as soon as your friend sees one of their kids.) “I see my daughter” rules out half the same-gender scenarios, but it also rules out half the different-gender scenarios (the ones where your friend sees their son before their daughter). So the same-gender probability remains the same, 50-50.
If having the same-gender probability bounce between 1-in-2 and 1-in-3 isn’t weird enough for you, you can actually try to make it land somewhere in between! (In the spirit of the Let’s Make a Deal example, you can imagine a basketball camp with different numbers of girls and boys.) But I want to take things in a slightly different direction. Let’s work off of Scenario 1, so we have a room full of girls, your friend with two kids tells you they see their daughter (1-in-3 chance at this point that your friend has two girls), but now let’s say your friend keeps talking. Suppose you hear your friend say…
Scenario 1A. “That girl’s my oldest kid.”
Scenario 1B. “That girl was born on a Tuesday.”
Now suppose I tell you that in one of these scenarios, the 1-in-3 probability of a same-gender pair changes, and in the other it doesn’t. Care to guess which is which? Be careful, the answer might not be what you think!
To analyze Scenario 1A, let’s go back to writing gender combinations in birth order. When we heard “I see my daughter” in Scenario 1, we ruled out all the (Boy, Boy) pairs. In Scenario 1A, we can rule out all 100 (Boy, Girl) pairs (oldest kid is a boy), and none of the (Girl, Boy) pairs. What about the (Girl, Girl) pairs? Well, if we assume that your friend will see their oldest daughter first half the time, and their youngest daughter first the other half, we can rule out half, or 50, of the (Girl, Girl) pairs. So we have 150 possible pairs: 100 (Girl, Boy) and 50 (Girl, Girl). (Symmetrically, if your friend had said “That girl’s my youngest kid,” you would have gotten the other 150 pairs with a girl: 100 (Boy, Girl), and the other 50 (Girl, Girl).) Of the 150 pairs, 100 are opposite-gender and 50 are same-gender. So the same-gender probability is still 1-in-3.
We already saw a version of this analysis in my last post. It basically comes down to this: the same-gender-pair probability depends on what the alternative to “My older kid is a girl” is. If the alternative is “My older kid is a boy,” the probability is 1-in-2. If, as here, the alternative is “My younger kid is a girl,” the probability is 1-in-3.
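The Scenario 1A tally can be sketched the same way. The helper `chance_seen_girl_is_oldest` is a name I made up for illustration; it encodes the assumption that a parent of two girls is equally likely to spot either daughter first.

```python
from fractions import Fraction

# Families still possible after "I see my daughter" in Scenario 1, in birth order.
after_daughter = {("G", "G"): 100, ("G", "B"): 100, ("B", "G"): 100}

def chance_seen_girl_is_oldest(family):
    """Chance that the daughter the friend spots is the older kid."""
    older, younger = family
    if older != "G":
        return Fraction(0)       # the older kid is a boy
    if younger == "G":
        return Fraction(1, 2)    # assumed: either daughter equally likely to be spotted
    return Fraction(1)           # the only daughter is the older kid

kept = {fam: n * chance_seen_girl_is_oldest(fam) for fam, n in after_daughter.items()}
p_same_1a = kept[("G", "G")] / sum(kept.values())

print(sum(kept.values()))   # 150
print(p_same_1a)            # 1/3
```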
Now let’s move on to Scenario 1B. To make the arithmetic we’re about to do easier, let’s throw out 8 of our 400 families (two for each gender combination), so now we have 98 pairs of kids under each of (Girl, Girl), (Girl, Boy), and (Boy, Girl). (There are also 98 under (Boy, Boy), but they don’t count since one of your friend’s kids is a girl.) Now you ask: what difference could it make that the girl you saw was born on a Tuesday? Well, let’s see which of the above pairs you can rule out once you know this.
We assume that kids are as likely to be born on any day as any other. So 1/7 of all kids are born on Monday, 1/7 on Tuesday, etc. Of the 98 (Girl, Boy) pairs, the girl was born on Tuesday 1/7 of the time. That makes 14 (Tuesday Girl, Boy) pairs, and 84 (non-Tuesday Girl, Boy) pairs.
Similarly, of the 98 (Boy, Girl) pairs, there are 14 (Boy, Tuesday Girl) pairs and 84 (Boy, non-Tuesday Girl) pairs.
The (Girl, Girl) pairs are different, though. In 14 of them, the older girl was born on Tuesday, and in 14 of them, the younger girl was born on Tuesday. Does that make 28 pairs with a Tuesday girl? Not quite, because there’s overlap. How much? Well, in 1/7 of the 14 (Tuesday Girl, Girl) pairs, the younger girl was born on Tuesday too. 1/7 of 14 is 2. So there are 2 (Tuesday Girl, Tuesday Girl) pairs, 12 (Tuesday Girl, non-Tuesday Girl) pairs, and 12 more (non-Tuesday Girl, Tuesday Girl) pairs — 26 in all. Taking 26 away from 98, we have 72 (non-Tuesday Girl, non-Tuesday Girl) pairs.
The important thing here is that unlike Scenario 1A, in which we got to rule out half the opposite-gender pairs and half the same-gender pairs, in Scenario 1B we ruled out 6/7 of the opposite-gender pairs (168 of 196) but only 72 of the 98 (Girl, Girl) pairs, which is a bit more than 5/7. If we pare down the opposite-gender pairs more than the same-gender pairs, the same-gender probability should go up. We have 14+14+26 = 54 pairs with a Tuesday Girl, and 26 of them are (Girl, Girl) pairs, so the same-gender probability is now 26/54 = 13/27 (almost a half!).
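To double-check the 13/27, here is a brute-force sketch that lists every equally likely (gender, weekday) combination for two kids and keeps the families the tally above keeps, namely every family with at least one Tuesday-born girl. The names and the numeric label for Tuesday are mine.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1                               # arbitrary label for one of the 7 weekdays
kids = list(product("GB", range(7)))      # 14 equally likely (gender, weekday) kids

def has_tuesday_girl(family):
    """True if at least one kid in the pair is a girl born on Tuesday."""
    return any(kid == ("G", TUESDAY) for kid in family)

# All 14 * 14 = 196 ordered (older, younger) pairs, filtered as in the text.
matching = [fam for fam in product(kids, repeat=2) if has_tuesday_girl(fam)]
two_girls = [fam for fam in matching if all(gender == "G" for gender, _ in fam)]

print(len(matching), len(two_girls))              # 27 13
print(Fraction(len(two_girls), len(matching)))    # 13/27
```

The 392 families in the text are just two copies of these 196 combinations, which is why the counts there come out to 54 and 26.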
What drove things here was that in Scenario 1B, the additional condition (born on a Tuesday) is non-exclusive: either or both kids in the pair could be born on a Tuesday. Whereas, in Scenario 1A, only one kid in the pair could be the older kid. The non-exclusivity means that the additional information (the kid you see was born on a Tuesday) is more restrictive for the opposite-gender pairs, where you know which kid it applies to (the one girl in the pair), than for the same-gender pairs, where it applies to the girl you see, but that girl could be either of the two girls in the pair.
Still, I don’t blame you if you find Scenario 1B kind of a head-scratcher. Or, if you don’t, and you have an alternate explanation that you like, please write it down in the comments!