# Cathy’s Book is Out!

Cathy O’Neil’s book Weapons of Math Destruction is out, and it’s already been shortlisted for a National Book Award! Here is a review of the book that I posted on Amazon.com:

So here you are on Amazon’s web page, reading about Cathy O’Neil’s new book, Weapons of Math Destruction. Amazon hopes you buy the book (and so do I, it’s great!). But Amazon also hopes it can sell you some other books while you’re here. That’s why, in a prominent place on the page, you see a section entitled:

Customers Who Bought This Item Also Bought

This section is Amazon’s way of using what it knows — which book you’re looking at, and sales data collected across all its customers — to recommend other books that you might be interested in. It’s a very simple, and successful, example of a predictive model: data goes in, some computation happens, a prediction comes out. What makes this a good model? Here are a few things:

1. It uses relevant input data. The goal is to get people to buy books, and the input to the model is what books people buy. You can’t expect to get much more relevant than that.
2. It’s transparent. You know exactly why the site is showing you these particular books, and if the system recommends a book you didn’t expect, you have a pretty good idea why. That means you can make an informed decision about whether or not to trust the recommendation.
3. There’s a clear measure of success and an embedded feedback mechanism. Amazon wants to sell books. The model succeeds if people click on the books they’re shown, and, ultimately, if they buy more books, both of which are easy to measure. If clicks on or sales of related items go down, Amazon will know, and can investigate and adjust the model accordingly.
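To make the idea concrete, here is a toy co-purchase recommender in Python. It is a minimal sketch under simple assumptions (a short list of hypothetical orders, with other books ranked by raw co-occurrence counts), not a description of how Amazon actually computes its list; the function name `also_bought` and the book labels are made up for illustration.

```python
from collections import Counter

def also_bought(orders, item, top_n=3):
    """Toy "Customers Who Bought This Item Also Bought" list (hypothetical helper).

    `orders` is a list of sets of item names; every other item is ranked by
    how many orders contain it together with `item`.
    """
    counts = Counter()
    for order in orders:
        if item in order:
            counts.update(other for other in order if other != item)
    return [other for other, _ in counts.most_common(top_n)]

# Hypothetical purchase histories, one set of books per order.
orders = [
    {"WMD", "Book B"},
    {"WMD", "Book B", "Book C"},
    {"WMD", "Book C"},
    {"WMD", "Book B"},
    {"Book B", "Book D"},
]

print(also_bought(orders, "WMD"))     # ['Book B', 'Book C']
print(also_bought(orders, "Sequel"))  # []
```

Because the ranking is built purely from past orders, a title nobody has bought yet, such as a brand-new sequel, gets an empty list and never shows up in anyone else's list either.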

Weapons of Math Destruction reviews, in an accessible, non-technical way, what makes models effective — or not. The emphasis, as you might guess from the title, is on models with problems. The book highlights many important ideas; here are just a few:

1. Models are more than just math. Take a look at Amazon’s model above: while there are calculations (simple ones) embedded, it’s people who decide what data to use, how to use it, and how to measure success. Math is not a final arbiter, but a tool to express, in a scalable (i.e., computable) way, the values that people explicitly decide to emphasize. Cathy says that “models are opinions expressed in mathematics” (or computer code). She highlights that when we evaluate teachers based on students’ test scores, or assess someone’s insurability as a driver based on their credit record, we are expressing opinions: that a successful teacher should boost test scores, or that responsible bill-payers are more likely to be responsible drivers.
2. Replacing what you really care about with what you can easily get your hands on can get you in trouble. In Amazon’s recommendation model, we want to predict book sales, and we can use book sales as inputs; that’s a good thing. But what if you can’t directly measure what you’re interested in? In the early 1980s, the magazine US News wanted to report on college quality. Unable to measure quality directly, the magazine built a model based on proxies, primarily outward markers of success, like selectivity and alumni giving. Predictably, college administrators, eager to boost their ratings, focused on these markers rather than on education quality itself. For example, to boost selectivity, they encouraged more students, even unqualified ones, to apply. This is an example of gaming the model.
3. Historical data is stuck in the past. Typically, predictive models use past history to predict future behavior. This can be problematic when part of the intention of the model is to break with the past. To take a very simple example, imagine that Cathy is about to publish a sequel to Weapons of Math Destruction. If Amazon uses only purchase data, the Customers Who Bought This Also Bought list would completely miss the connection between the original and the sequel. This means that if we don’t want the future to look just like the past, our models need to use more than just history as inputs. A chapter about predictive models in hiring is largely devoted to this idea. A company may think that its past, subjective hiring system overlooks qualified candidates, but if it replaces the HR department with a model that sifts through resumes based only on the records of past hires, it may just be codifying (pun intended) past practice. A related idea is that, in this case, rather than adding objectivity, the model becomes a shield that hides discrimination. This takes us back to Models are more than just math and also leads to the next point:
4. Transparency matters! If a book you didn’t expect shows up on the Customers Who Bought This Also Bought list, it’s pretty easy for Amazon to check if it really belongs there. The model is pretty easy to understand and audit, which builds confidence and also decreases the likelihood that it gets used to obfuscate. An example of a very different story is the value added model for teachers, which evaluates teachers through their students’ standardized test scores. Among its other drawbacks, this model is especially opaque in practice, both because of its complexity and because many implementations are built by outsiders. Models need to be openly assessed for effectiveness, and when teachers receive bad scores without knowing why, or when a single teacher’s score fluctuates dramatically from year to year without explanation, it’s hard to have any faith in the process.
5. Models don’t just measure reality, but sometimes amplify it, or create their own. Put another way, models of human behavior create feedback loops, often becoming self-fulfilling prophecies. There are many examples of this in the book, especially focusing on how models can amplify economic inequality. To take one example, a company in the center of town might notice that workers with longer commutes tend to turn over more frequently, and adjust its hiring model to focus on job candidates who can afford to live in town. This makes it easier for wealthier candidates to find jobs than poorer ones, and perpetuates a cycle of inequality. There are many other examples: predictive policing, prison sentences based on recidivism, e-scores for credit. Cathy talks about a trade-off between efficiency and fairness, and, as you can again guess from the title, argues for fairness as an explicit value in modeling.

Weapons of Math Destruction is not a math book, and it is not investigative journalism. It is short — you can read it in an afternoon — and it doesn’t have time or space for either detailed data analysis (there are no formulas or graphs) or complete histories of the models she considers. Instead, Cathy sketches out the models quickly, perhaps with an individual anecdote or two thrown in, so she can get to the main point — getting people, especially non-technical people, used to questioning models. As more and more aspects of our lives fall under the purview of automated data analysis, that’s a hugely important undertaking.

# How I Learned to Stop Worrying and Love Pythagoras

Maybe the best thing about the Pythagorean theorem is how it puts math and non-math people on a pretty equal footing. We all know what it says (right triangle, squares of sides, hypotenuse), we all agree it’s Important Math with a capital M… and most of us don’t have much idea, if any, why it’s true. Seriously. Ask another math person if you don’t believe me. If you’re lucky, they might point you to a picture that looks more or less like this:

This is sometimes called a proof without words, but here are a few words to guide you, just in case. We’ve got the usual notation: a and b are sides of a right triangle, c is the hypotenuse. We build a super-square with side length a + b and break it up in two ways. On the right, we divide it into one (white) sub-square with side length a (area $a^2$), another with side b (area $b^2$), and four (colored) copies of the triangle. On the left, we rearrange the four triangles so that their complement is a new white sub-square with side c (area $c^2$). White area on the left = white area on the right, so $c^2 = a^2 + b^2$.
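The same bookkeeping, written as equations: each colored triangle has legs a and b, so its area is $\frac12 ab$, and the two ways of dividing the super-square give

$(a+b)^2 = a^2 + b^2 + 4 \cdot \frac12 ab$ (right) and $(a+b)^2 = c^2 + 4 \cdot \frac12 ab$ (left).

Taking away the four triangles, $2ab$ in total, from both sides leaves $c^2 = a^2 + b^2$.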

Lovely as this is, it feels like a nifty conjuring trick. The Pythagorean equation is the most direct thing in the world, $a^2 + b^2 = c^2$, and the best we can do is to rearrange triangles inside a big square? Surely there must be a way to cut up $c^2$ into $a^2$ and $b^2$ directly.

Well, there is! I first ran across what I’m about to show you a few weeks ago, loved it, and was surprised that (1) I hadn’t seen it before and (2) it doesn’t seem to be widely known, though the idea actually goes back to Euclid. Perhaps you’ll feel the same way once you see it. Here goes:

The set-up. What’s so special about right triangles? Well, one thing is that they have an amazing self-similarity property. Draw a line segment out from the vertex with the right angle toward the hypotenuse and perpendicular to it. It divides our original right triangle up into two smaller ones:

Let’s stick to the same notation we had before. Let c be the hypotenuse of our original right triangle, running along the bottom. Let a and b be the sides, a the one on the left, b the one on the right. Then a is the hypotenuse of the green right triangle on the left, and b is the hypotenuse of the blue right triangle on the right. Obviously (foreshadowing), the areas of the two smaller triangles add up to the area of the original.

The self-similarity property is that the two smaller right triangles are both similar to the original one! In other words, all three triangles have the same three interior angles, which means that you can rotate and scale each one into any of the others. Put simply, all three triangles have the same shape. Can you see why? Let’s compare those interior angles: the big triangle has a right angle of 90 degrees, and two others, which we will call $\theta$ (say the one on the left, in the green triangle) and $\phi$ (on the right, in the blue triangle). The key point is that the angles of any triangle have to add up to 180 degrees, so two angles of a triangle always determine the third. The green triangle has an angle of $\theta$ on the left that it inherits from the original big triangle, and a 90 degree angle in the middle, so its third angle, at the top, must be $\phi$. (Essentially: the green triangle and the big triangle have two angles in common, so they must have all three in common.) Similarly, the blue triangle has an angle of $\phi$ on the right that it inherits from the original triangle, and a 90 degree angle in the middle (again, two angles in common), so its third angle must be $\theta$. Same angles, same shape.

The pay-off. Stare for a minute at those three similar triangles, with the areas of the two smaller ones adding up to the area of the bigger one. Wouldn’t it be great if the triangle with hypotenuse a had area $a^2$, the one with hypotenuse b had area $b^2$, and the one with hypotenuse c had area $c^2$? It’s not true, of course. But it’s almost true! If you read my last post, you know that in fact the area of a right triangle with hypotenuse c and interior angle $\theta$ is

$\frac14 \cdot \sin(2 \theta) \cdot c^2$.
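One quick way to check this formula: the legs adjacent to and opposite $\theta$ have lengths $c \cos\theta$ and $c \sin\theta$, and $\sin(2\theta) = 2 \sin\theta \cos\theta$, so the area is

$\frac12 \cdot (c \cos\theta) \cdot (c \sin\theta) = \frac12 \cdot \sin\theta \cos\theta \cdot c^2 = \frac14 \cdot \sin(2 \theta) \cdot c^2$.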

The green and blue triangles have exactly the same interior angles as the big one. So their areas are given by the same exact formula, with c replaced by a and b, respectively. The areas have to add up, so we have:

$\frac14 \cdot \sin(2 \theta) \cdot a^2 + \frac14 \cdot \sin(2 \theta) \cdot b^2 = \frac14 \cdot \sin(2 \theta) \cdot c^2$.

Now just divide out the common factor of $\frac14 \cdot \sin(2 \theta)$, and you’re left with

$a^2 + b^2 = c^2$.
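As a numerical sanity check, here is a short Python sketch. The function names and the choice of coordinates (hypotenuse along the x-axis, left end at the origin) are assumptions made for illustration. It drops the perpendicular from the right-angle vertex, computes all three areas directly, and confirms that each one equals $\frac14 \cdot \sin(2 \theta)$ times the square of its own hypotenuse, and that the two small areas add up to the big one.

```python
import math

def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r, via the shoelace formula."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def quarter_sin_formula(hyp, theta):
    """Area of a right triangle with hypotenuse `hyp` and acute angle `theta`."""
    return 0.25 * math.sin(2 * theta) * hyp ** 2

def pythagoras_by_areas(a, b):
    """Split the right triangle with legs a, b along its altitude and compare areas."""
    c = math.hypot(a, b)
    theta = math.atan2(b, a)                          # angle at the left end of the hypotenuse
    left, right = (0.0, 0.0), (c, 0.0)                # hypotenuse along the x-axis
    top = (a * math.cos(theta), a * math.sin(theta))  # the right-angle vertex
    foot = (top[0], 0.0)                              # foot of the perpendicular from `top`
    big = triangle_area(left, right, top)
    green = triangle_area(left, foot, top)            # hypotenuse a
    blue = triangle_area(foot, right, top)            # hypotenuse b
    # Same formula with the same theta for all three (sin 2*phi = sin 2*theta,
    # because phi = 90 degrees - theta); only the hypotenuse changes.
    assert math.isclose(big, quarter_sin_formula(c, theta))
    assert math.isclose(green, quarter_sin_formula(a, theta))
    assert math.isclose(blue, quarter_sin_formula(b, theta))
    assert math.isclose(green + blue, big)
    return green, blue, big

print(pythagoras_by_areas(3, 4))  # approximately (2.16, 3.84, 6.0)
```

Dividing each area by the common factor $\frac14 \cdot \sin(2 \theta)$ turns the last check into $a^2 + b^2 = c^2$.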

What just happened? The way Pythagoras’s equation fell out of equating areas may seem like a bit of a magic trick too, but it’s actually based on the very fundamental idea of scale invariance. To recap: we (1) wrote a completely explicit formula for the area of a right triangle, (2) equated formulas corresponding to equal areas, and (3) found that the bulk of the formulas, everything except the part corresponding to the Pythagorean theorem, went away. The key to understanding all that is this picture:

It shows the area of a right triangle embedded in the area of the corresponding square, with the hypotenuse matching one side of the square. The precise formula for the ratio of the areas, $\frac14 \cdot \sin(2 \theta)$, doesn’t matter so much — what matters is that when we blow this picture up or down, the ratio of the areas doesn’t change. That’s scale invariance. If the two smaller triangles add up to the big triangle, then the squares corresponding to the smaller triangles have to add up to the square corresponding to the big triangle. And that’s exactly the Pythagorean theorem.

I like to think of $a^2$, $b^2$, and $c^2$ as units of area corresponding to each triangle. In a nutshell, the Pythagorean theorem decomposes a right triangle into two smaller, similar ones, and says that if the triangles add up, the units of area have to add up too. It’s deep, it’s direct, and I’ll never forget it. How about you?

# For Pi Day 3/14/16: Triangle to Circles, and Back Again

Around this time last year, we heard a lot about 3/14/15 being a once-in-a-century Pi Day, because the date, including the year, matches the first few digits of pi (3.1415…). But the next digit after 3.1415 in the decimal expansion is 9, giving us 3.14159… So pi is a lot closer to 3.1416 than it is to 3.1415, which is why I want to wish you a Happy once-in-a-century Pi Day today, 3/14/16!

Pi fascinates us because it appears in so many unexpected places. Since pi is ultimately about circles (it’s defined, more or less, as the circumference of a circle of diameter 1), what this really means is that circles appear in many unexpected places. I want to show you one.

You need something with a corner (I used a Rubik’s cube), a cable or string of some kind, and two fixed, firm endpoints (I used the ends of a towel rod). Stretch the cable between the two endpoints across the corner. Now move the corner from one side to the other between the two endpoints, keeping the cable stretched across all three, like this:

Question: what’s the shape that the corner traces out? A triangle, a circle, a parabola, an ellipse, something else?

I told you there was a circle coming, so, yes, it turns out to be a circle. (Well, a semicircle, tracing out 180 degrees, or pi!) I find this a little surprising, because this set-up is quite different from a compass, which is based on spinning a fixed length around a center. Here, the length of cable stretched across the corner extends and then contracts as you move the corner from one side to the other. One good reason to try this out physically is that you can really feel the extension and contraction!

Here is an animation that might start to convince you:

The mathematical fact afoot here is Thales’s theorem, which says that if you put two points at opposite ends of a circle (so the line between them is a diameter), then the line segments connecting those points with any third point on the circle will meet at a right angle. To give you an idea of why this is true, and play with pi some more, here’s a picture:

The idea here is that if O is the center of the circle, then the line segments OA, OB, and OC all have the same length, because they’re all radii of the circle. This makes the two triangles AOB and BOC isosceles: each triangle has two equal sides, hence two equal angles, as illustrated in the picture. With this notation, the angles of triangle ABC are $\alpha$ at A, $\beta$ at C, and $\alpha + \beta$ at B, so they sum to $2 \alpha + 2 \beta$. But we know that the sum of angles of a triangle is 180 degrees (pi again!). Dividing by 2, we find that $\alpha + \beta$, the angle at B, must be 90 degrees, which is Thales’s theorem.
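Here is a quick numerical check of Thales’s theorem, as a sketch. It assumes coordinates of our own choosing: A and C at $(-r, 0)$ and $(r, 0)$, the ends of a diameter, and B on the circle at polar angle t. It computes the angle at B from the dot product of the vectors from B to A and from B to C; the name `angle_at_b` is just for illustration.

```python
import math

def angle_at_b(r, t):
    """Angle ABC in degrees for A = (-r, 0), C = (r, 0), B = (r cos t, r sin t), 0 < t < pi."""
    ax, ay = -r, 0.0
    cx, cy = r, 0.0
    bx, by = r * math.cos(t), r * math.sin(t)
    bax, bay = ax - bx, ay - by   # vector from B to A
    bcx, bcy = cx - bx, cy - by   # vector from B to C
    cos_b = (bax * bcx + bay * bcy) / (math.hypot(bax, bay) * math.hypot(bcx, bcy))
    return math.degrees(math.acos(cos_b))

for t in (0.3, 1.0, 2.5):
    print(round(angle_at_b(1.0, t), 9))  # 90.0 each time
```

Wherever B sits on the circle, the dot product works out to zero (up to rounding), so the angle at B is always 90 degrees.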

One last thing: this picture also leads to a formula for the area of the right triangle in terms of the length of the hypotenuse. Let d be this length, i.e., the diameter extending from A to C. Look at triangle BOC again. The sum of its angles must also be 180 degrees, and since we know from above that $2 \alpha + 2 \beta = 180$, we find that the missing angle, between segments OB and OC, must be $2 \alpha$. This means the height of our triangle is $\sin (2 \alpha)$ times the radius of the circle, or $\sin (2 \alpha) \cdot d/2$. Since the area of a triangle is half the base times the height, we find

${\rm{Area}} = \frac14 \cdot \sin(2 \alpha) \cdot d^2.$
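The area formula can be checked the same way. In this sketch (coordinates and names again chosen for illustration), B sits at polar angle t, so t is the angle BOC; the code measures $\alpha$ at A directly, confirms that $2\alpha = t$, and compares half the base times the height with $\frac14 \cdot \sin(2 \alpha) \cdot d^2$.

```python
import math

def thales_area(d, t):
    """Return (alpha, direct_area, formula_area) for the right triangle ABC
    inscribed in a circle of diameter d, where t = angle BOC and 0 < t < pi."""
    r = d / 2
    ax, ay = -r, 0.0                         # A and C are the ends of the diameter
    bx, by = r * math.cos(t), r * math.sin(t)
    alpha = math.atan2(by - ay, bx - ax)     # angle at A, between AC and AB
    direct_area = 0.5 * d * by               # half of base AC times the height of B
    formula_area = 0.25 * math.sin(2 * alpha) * d ** 2
    return alpha, direct_area, formula_area

alpha, direct, formula = thales_area(d=2.0, t=1.0)
print(math.isclose(2 * alpha, 1.0), math.isclose(direct, formula))  # True True
```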

This last formula is actually connected in a very interesting way to the Pythagorean theorem. But that’s another blog post.

# Probability For Us Dummies 2: More Boy-Girl Variants

I want to disturb you with probability and its (apparent) paradoxes some more.

Let’s warm up with something that looks really straightforward. (Uh-oh!) You are shown a red door and a blue door, and told there is a prize behind one of them. What’s the likelihood that the prize is behind red as opposed to blue?

You probably said 1-in-2 for each door, and you may well be right. Or maybe you were playing Let’s Make a Deal, you originally picked the red door from among three (say red, green, and blue), the host showed you the prize wasn’t behind the green door, so now you think it’s 1-in-3 for red and 2-in-3 for blue.

But, depending on the back story, the probabilities could be literally anything. I mean it. Pick any fraction a/b, where a and b are positive integers and a < b. For concreteness, say a = 13, b = 63, so your fraction is 13/63. Now let’s play Let’s Make a Deal with 63 doors. It works like this: there’s a car behind one door, and you get to pick 13 doors out of 63. I’ll open 12 of the 13 doors that you picked, and 49 of the 50 doors you didn’t pick. I’ll make sure the car is not behind any of the doors I open.

Of the two remaining doors, say the one you picked is red and the one you didn’t pick is blue. Then the probability that the car is behind the red door is 13/63 (or a/b). We can play the same game for any fraction you give me.
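A short Monte Carlo simulation makes the game concrete. This is a sketch under two assumptions the text leaves open: the car is placed uniformly at random, and when the car is not among your picks, the host leaves one of your picked doors closed at random. Because the car is placed at random, we can label your picks as doors 0 to a-1 without loss of generality. The function name and the default trial count are arbitrary choices.

```python
import random

def red_door_probability(a, b, trials=200_000, seed=1):
    """Monte Carlo estimate of P(car behind the red door) in the a-of-b door game.

    Doors 0..a-1 are the ones you picked. The host opens every door except one
    picked door (red) and one unpicked door (blue), never revealing the car.
    """
    if not 0 < a < b:
        raise ValueError("need 0 < a < b")
    rng = random.Random(seed)
    red_wins = 0
    for _ in range(trials):
        car = rng.randrange(b)
        if car < a:
            red, blue = car, rng.randrange(a, b)  # car is among your picks
        else:
            red, blue = rng.randrange(a), car     # car is among the doors you skipped
        assert car in (red, blue)                 # the host never opens the car's door
        red_wins += red == car
    return red_wins / trials

print(red_door_probability(13, 63), 13 / 63)
```

The estimate lands close to 13/63, about 0.206, and `red_door_probability(1, 3)` recovers the familiar 1-in-3 from the three-door game.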

So when you see just two doors to pick between, remember that you might be missing a whole back story that can impact your choice. In fancier language: to make inferences based on observations, you need to know the process by which those observations were arrived at.

In my last post, I talked about this in the context of the so-called Boy-Girl Puzzle. This puzzle has a number of interesting variants, and I want to reinforce the importance of back story by touching on a couple of them.

The set-up: you’re at a school reunion, chatting at a reception with a friend who you find out has two kids. Some of your classmates’ kids are in the room as well, although you haven’t paid any attention to them because you’ve been focusing on the grown-ups. Now suppose that while you’re talking, your friend points toward a cluster of kids (you can’t tell which kid he’s pointing to) and says, “There’s my daughter.” The Boy-Girl puzzle asks: what’s the likelihood that your friend has two kids of the same gender? Meaning, in this case, that their other kid is also a girl.

Imagine two scenarios:

Scenario 1. Since everyone brought kids to the reunion, the organizers have kindly put together some events to keep the kids entertained. Today there’s a basketball camp. But there isn’t enough gym space for all the kids, so this has been scheduled in two phases: a girls’ session from 4 to 5, and then a boys’ session from 5 to 6. Assume all the kids go to their respective session. It is now 5:15, the girls’ session let out a little while ago, and now all the girls are in the reception room while the boys are playing basketball.

Scenario 2. Same as Scenario 1, only the school has a really big gym, so the basketball camp is at the same time for everybody. It’s 5:15 again, and all the kids, girls and boys alike, are in the reception room.

Analyzing Scenario 1, let’s imagine that there are 400 two-kid families in your class, 100 each for each of the four possible gender combinations: (Girl, Girl), (Girl, Boy), (Boy, Girl), and (Boy, Boy). (For now I’m ordering the kids by age, though any unambiguous ordering will do.) When the girls’ camp lets out, 400 girls make their way to the reception room. Now 300 of the 400 parents of a two-kid family have a daughter in the room. You just found out that your friend is one of them. Of those 300 parents, 200 have a girl and a boy, and 100 have two girls. So the probability that your friend’s other kid is a girl is 1-in-3.

In Scenario 2, there are 400 girls and 400 boys from two-kid families in the reception room. To keep things simple, let’s assume that each of your classmates will eventually see both of their kids. Assume also that a parent with both a girl and a boy is just as likely to see the girl first as the boy. This means that if we list the kids in each family in the order that their parents see them, each of the four gender combinations ((Girl, Girl), etc.) is equally likely. For 200 of the 400 2-kid families, the parent will see a daughter first: 100 with (Girl, Girl), and 100 with (Girl, Boy). One of these parents is your friend, so the probability that their other kid is a girl is 1-in-2.
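Both scenarios can be checked by exact enumeration instead of counting families. In this sketch (the names are illustrative), each of the four (older, younger) combinations has weight 1/4, and we weight it by the chance that your friend says "There’s my daughter": in Scenario 1 that happens exactly when the family has a girl, and in Scenario 2 it happens when the first kid spotted is a girl, each kid being equally likely to be spotted first.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) gender combinations.
FAMILIES = list(product(("Girl", "Boy"), repeat=2))

def same_gender_given_daughter(girls_only_room):
    """P(both kids share a gender | friend says "There's my daughter")."""
    same = total = Fraction(0)
    for kids in FAMILIES:
        if girls_only_room:
            # Scenario 1: the friend spots a daughter exactly when they have one.
            p_daughter = Fraction(1) if "Girl" in kids else Fraction(0)
        else:
            # Scenario 2: the first kid spotted is a girl.
            p_daughter = Fraction(kids.count("Girl"), 2)
        weight = Fraction(1, 4) * p_daughter
        total += weight
        if kids[0] == kids[1]:
            same += weight
    return same / total

print(same_gender_given_daughter(True))   # 1/3
print(same_gender_given_daughter(False))  # 1/2
```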

Even if you believe me that back story is important, the different conclusions here might still seem a little weird. Aren’t you finding out the same information in each case? Why does it matter if you’re in a room full of girls vs. a room with girls and boys?

I find it helpful to think about this in terms of the possibilities that are being ruled out. You start out with a 50-50 chance that your friend has two kids of the same gender. When you’re in a room full of girls only, two things could happen. Your friend could eventually tell you they see their daughter, or, if they have two sons, they wouldn’t see any of their kids at all. So when you hear, “I see my daughter,” you’ve ruled out some (half) of the possible two-kids-of-the-same-gender scenarios. So of course the same-gender probability goes down.

Whereas, if both girls and boys are in the room, you will eventually hear either “I see my daughter” or “I see my son.” (The story ends as soon as your friend sees one of their kids.) “I see my daughter” rules out half the same-gender scenarios, but it also rules out half the different-gender scenarios (the ones where your friend sees their son before their daughter). So the same-gender probability remains the same, 50-50.

If having the same-gender probability bounce between 1-in-2 and 1-in-3 isn’t weird enough for you, you can actually try to make it land somewhere in between! (In the spirit of the Let’s Make a Deal example, you can imagine a basketball camp with different numbers of girls and boys.) But I want to take things in a slightly different direction. Let’s work off of Scenario 1, so we have a room full of girls, your friend with two kids tells you they see their daughter (1-in-3 chance at this point that your friend has two girls), but now let’s say your friend keeps talking. Suppose you hear your friend say…

Scenario 1A. “That girl’s my oldest kid.”

Scenario 1B. “That girl was born on a Tuesday.”

Now suppose I tell you that in one of these scenarios, the 1-in-3 probability of a same-gender pair changes, and in the other it doesn’t. Care to guess which is which? Be careful, the answer might not be what you think!

To analyze Scenario 1A, let’s go back to writing gender combinations in birth order. When we heard “I see my daughter” in Scenario 1, we ruled out all the (Boy, Boy) pairs. In Scenario 1A, we can rule out all 100 (Boy, Girl) pairs (oldest kid is a boy), and none of the (Girl, Boy) pairs. What about the (Girl, Girl) pairs? Well, if we assume that your friend will see their oldest daughter first half the time, and their youngest daughter first the other half, we can rule out half, or 50, of the (Girl, Girl) pairs. So we have 150 possible pairs: 100 (Girl, Boy) and 50 (Girl, Girl). (Symmetrically, if your friend had said “That girl’s my youngest kid,” you would have gotten the other 150 pairs with a girl: 100 (Boy, Girl), and the other 50 (Girl, Girl).) Of the 150 pairs, 100 are opposite-gender and 50 are same-gender. So the same-gender probability is still 1-in-3.
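The Scenario 1A count can be reproduced by exact enumeration as well. This sketch assumes, as above, that each family has weight 1/4 and that a friend with two daughters spots each of them first equally often; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def same_gender_given_older_daughter():
    """Scenario 1A: room of girls only, and the daughter spotted is the older kid."""
    same = total = Fraction(0)
    for older, younger in product(("Girl", "Boy"), repeat=2):
        daughters = [rank for rank, kid in (("older", older), ("younger", younger))
                     if kid == "Girl"]
        for spotted in daughters:
            # Family weight 1/4, split evenly among the daughters who could be spotted.
            weight = Fraction(1, 4) / len(daughters)
            if spotted == "older":
                total += weight
                if older == younger:
                    same += weight
    return same / total

print(same_gender_given_older_daughter())  # 1/3
```

Scaling the weights by 400 families recovers the counts in the text: 50 same-gender pairs out of 150.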

We already saw a version of this analysis in my last post. It basically comes down to this: the same-gender-pair probability depends on what the alternative to “My older kid is a girl” is. If the alternative is “My older kid is a boy,” the probability is 1-in-2. If, as here, the alternative is “My younger kid is a girl,” the probability is 1-in-3.

Now let’s move on to Scenario 1B. To make the arithmetic we’re about to do easier, let’s throw out 8 of our 400 families (two for each gender combination), so now we have 98 pairs of kids under each of (Girl, Girl), (Girl, Boy), and (Boy, Girl). (There are also 98 under (Boy, Boy), but they don’t count since one of your friend’s kids is a girl.) Now you ask: what difference could it make that the girl you saw was born on a Tuesday? Well, let’s see which of the above pairs you can rule out once you know this.

We assume that kids are as likely to be born on any day as any other. So 1/7 of all kids are born on Monday, 1/7 on Tuesday, etc. Of the 98 (Girl, Boy) pairs, the girl was born on Tuesday 1/7 of the time. That makes 14 (Tuesday Girl, Boy) pairs, and 84 (non-Tuesday Girl, Boy) pairs.

Similarly, of the 98 (Boy, Girl) pairs, there are 14 (Boy, Tuesday Girl) pairs and 84 (Boy, non-Tuesday Girl) pairs.

The (Girl, Girl) pairs are different, though. In 14 of them, the older girl was born on Tuesday, and in 14 of them, the younger girl was born on Tuesday. Does that make 28 pairs with a Tuesday girl? Not quite, because there’s overlap. How much? Well, in 1/7 of the 14 (Tuesday Girl, Girl) pairs, the younger girl was born on Tuesday too. 1/7 of 14 is 2. So there are 2 (Tuesday Girl, Tuesday Girl) pairs, 12 (Tuesday Girl, non-Tuesday Girl) pairs, and 12 more (non-Tuesday Girl, Tuesday Girl) pairs — 26 in all. Taking 26 away from 98, we have 72 (non-Tuesday Girl, non-Tuesday Girl) pairs.

The important thing here is that unlike Scenario 1A, in which we got to rule out half the opposite-gender pairs and half the same-gender pairs, in Scenario 1B we ruled out 6/7 of the opposite-gender pairs but only 72/98 (about 5/7) of the same-gender pairs. If we pare down the opposite-gender pairs more than the same-gender pairs, the same-gender probability should go up. We have 14+14+26 = 54 pairs with a Tuesday Girl, and 26 of them are (Girl, Girl) pairs, so the same-gender probability is now 26/54 = 13/27 (almost a half!).
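The counting above can be reproduced by brute force. This sketch follows the text's assumptions: every weekday is equally likely, and it counts every pair in which at least one girl was born on a Tuesday. It uses 2 families for each of the 49 (older, younger) weekday combinations, so that each gender combination has 98 families. Which weekday number stands for Tuesday is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1             # weekdays are numbered 0..6; which one is Tuesday is arbitrary
FAMILIES_PER_CELL = 2   # 2 families x 49 weekday combinations = 98 per gender combination

def tuesday_girl_counts():
    """Count families per (older, younger) gender combination with at least one Tuesday girl."""
    counts = {}
    for genders in product(("Girl", "Boy"), repeat=2):
        counts[genders] = sum(
            FAMILIES_PER_CELL
            for days in product(range(7), repeat=2)
            if any(g == "Girl" and d == TUESDAY for g, d in zip(genders, days))
        )
    return counts

counts = tuesday_girl_counts()
print(counts)
# {('Girl', 'Girl'): 26, ('Girl', 'Boy'): 14, ('Boy', 'Girl'): 14, ('Boy', 'Boy'): 0}
print(Fraction(counts["Girl", "Girl"], sum(counts.values())))  # 13/27
```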

What drove things here was that in Scenario 1B, the additional condition (born on a Tuesday) is non-exclusive: either or both kids in the pair could be born on a Tuesday. Whereas, in Scenario 1A, only one kid in the pair could be the older kid. The non-exclusivity means that the additional information (kid you see was born on a Tuesday) is more restrictive in the opposite-gender pairs, when you know which kid it applies to (the girl in the pair) than in the same-gender pairs, when you know it applies to the girl you see, but that girl could be either of the girls in the pair.

Still, I don’t blame you if you find Scenario 1B kind of a head-scratcher. Or, if you don’t, and you have an alternate explanation that you like, please write it down in the comments!

# Probability For Dummies (And We’re All Dummies)

Sometimes it feels like probability was made up just to trip you up. My undergraduate advisor Persi Diaconis, who started out as a magician and often works on card shuffling and other problems related to randomness, used to say that our brains weren’t wired right for doing probability. Now that I (supposedly!) know a little more about probability than I did as a student, Persi’s statement rings even truer.

I spent a little time this weekend thinking about why probability confuses us so easily. I don’t have all the answers, but I did end up making up a story that I found pretty illuminating. At least, I learned a few things from thinking it through. It’s based on what looks like a very simple example, first popularized by Martin Gardner, but it can still blow your mind a little bit. I actually meant to have a fancier example, but my basic one ended up being more than enough for what I wanted to get across. (Some of these ideas, and the Gardner connection, are explored in a complementary way in this paper by Tanya Khovanova.) Here goes.

Prologue. Say you go to a school reunion, and you find yourself at a dimly-lit late evening reception, talking to your old friend Robin. You haven’t seen each other for years, you’re catching up on family, and you hear that Robin has two children. Maybe the reunion has you thinking back to the math classes you took, or maybe you’ve just been drinking too much, but for some reason, you start wondering whether Robin’s children have the same gender (two boys or two girls) or different genders (one of each). Side note: if you’ve managed to stay sober, this may be the point at which you realize that you’ve not only wandered into a reunion you’re barely interested in, you’ve wandered into a math problem you’re barely… um, well, anyway, let’s keep going.

The gender question is pretty easy to answer, at least in terms of what’s more and less likely. Assuming that any one child is as likely to be a girl as a boy (not quite, but let’s ignore that), and assuming that having one kid be a girl or boy doesn’t change the likelihood of having your other kid be a girl or boy (again, probably not exactly true, but whatever), we find there are four equally likely scenarios (I’m listing the oldest kid first):

(Girl, Girl)    (Girl, Boy)    (Boy, Girl)    (Boy, Boy)

Each of these scenarios has probability 25%. There are two scenarios with two kids of the same sex (total probability 50%), and two scenarios with two kids of opposite sexes (total probability also 50%). Easy peasy.

But things won’t stay simple for long, because you’ve not only wandered into a school reunion and a math problem, you’ve also wandered into a…

Really. So you’re at the reunion, still talking to Robin, only you might be sober, or you might be drunk. Which is it?

Sober Version: You and Robin continue your nice lucid conversation, and Robin says: “My older kid is a girl.” Does the additional information change the gender probabilities (two of the same vs. opposites) at all?

This one looks easy too, especially given that you’re sober. Now that we know the older kid is a girl, things come down to the gender of the younger kid. We know that having a girl and having a boy are equally likely, so two of the same vs. opposite genders should still be 50-50. In terms of the scenarios above, we’ve ruled out the last two scenarios and have a 50-50 choice between the first two.

But now let’s turn the page to the…

Drunk Version: You and Robin have both had more than a little wine, haven’t you? Maybe Robin’s starting to mumble a bit, or maybe you’re not catching every word Robin says any more, but in any case, in this version what you heard Robin say was, “My umuhmuuh kid is a girl.” So Robin might have said older or younger, but in the drunk version, you don’t know which. What are the probabilities now? Are they different from the sober version?

Argument for No: Robin might have said, “My older kid is a girl,” in which case you rule out the last two scenarios as above and conclude the probabilities are still 50-50. Or Robin might have said, “My younger kid is a girl,” in which case you would rule out the second and fourth scenarios but the probabilities would again be 50-50. So it’s 50-50 no matter what Robin said. It doesn’t make a difference that you didn’t actually hear what it was.

Argument for Yes: Look at the four possible scenarios above. All we know now is that one of the kids is a girl, i.e., we’ve only ruled out (Boy, Boy). The other three are still possible, and still equally likely. But now we have two scenarios where the kids have opposite genders, and only one where they have the same gender. So now it’s not 50-50 anymore; it’s 2/3-1/3 in favor of opposite genders.

Both arguments seem pretty compelling, don’t they? Maybe you’re a little confused? Head spinning a little bit? Well, I did tell you this was the drunk version!

To try to sort things out, let’s step back a little bit. Drink a little ice water and take a look around the room. Let’s say you see 400 people at the reunion who have exactly two kids. I won’t count spouses, and I’ll assume that none of your classmates got together to have kids. That keeps things simple: 400 classmates with a pair of kids means 400 pairs of kids. On average, there’ll be 100 classmates for each of the four kid gender combinations. One of these classmates is your friend Robin.

Now imagine that each of your classmates is drunkenly telling a friend about which of their kids are girls. What will they say?

• The 100 in the (Boy, Boy) square would certainly never say, “My umuhmuuh kid is a girl.” We can forget about them.
• The 100 in the (Girl, Boy) square would always say, “My older kid is a girl.”
• The 100 in the (Boy, Girl) square would always say, “My younger kid is a girl.”
• The 100 in the (Girl, Girl) square could say either. There’s no reason to prefer one or the other, especially since everyone is drunk. So on average, 50 of them will say “My older kid is a girl,” and the other 50 will say, “My younger kid is a girl.”

Altogether, there should be 150 classmates who say their older kid is a girl, 150 who say their younger kid is a girl, and 100 who don’t say anything because they have no girls.

In the drunk version, where we don’t know what Robin said, Robin could be any of the 150 classmates who would say “My older kid is a girl.” In that case, 100 times out of 150, Robin’s two kids have opposite genders. Or Robin could be any of the 150 classmates who would say, “My younger kid is a girl,” and in that case again, 100 times out of 150, Robin’s two kids have opposite genders.
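The classmate count can be reproduced in a few lines. This is a sketch, assuming (as above) exactly 100 classmates per combination and an even 50/50 split among the (Girl, Girl) classmates; the names `COUNTS`, `speakers`, and `tally` are made up for illustration. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

# Assumed setup from the thought experiment: 100 classmates per (older, younger) pair.
COUNTS = {("Girl", "Girl"): 100, ("Girl", "Boy"): 100,
          ("Boy", "Girl"): 100, ("Boy", "Boy"): 100}

def speakers(pair, n, said):
    """Expected number of the n classmates with `pair` who say 'My <said> kid is a girl.'"""
    older, younger = pair
    if older == "Girl" and younger == "Girl":
        return Fraction(n, 2)  # either statement is equally natural: even split
    if said == "older":
        return Fraction(n) if older == "Girl" else Fraction(0)
    return Fraction(n) if younger == "Girl" else Fraction(0)

def tally(said):
    """Return (speakers, speakers with opposite-gender kids, P(opposite | statement))."""
    total = sum(speakers(p, n, said) for p, n in COUNTS.items())
    opposite = sum(speakers(p, n, said) for p, n in COUNTS.items() if p[0] != p[1])
    return total, opposite, opposite / total

for said in ("older", "younger"):
    total, opposite, p = tally(said)
    print(said, total, opposite, p)
```

Either statement leads to 150 speakers, 100 of them with opposite-gender kids, so the drunk-version answer doesn't depend on which word Robin mumbled.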

This analysis is consistent with the Argument for Yes, and leads to the same conclusion: there is a 2-in-3 chance (200 times out of 300) that Robin’s kids have opposite genders. But it seems to agree with the spirit of the Argument for No as well! It looks like knowing Robin was talking about the older kid actually didn’t add any new information: that 2-in-3 chance would already hold if Robin had soberly said “My older kid is a girl” OR if Robin had just as soberly said “My younger kid is a girl.”

But now something seems really off. Because now it’s starting to look like our analysis of the sober version, apparently the simplest thing in the world, was actually incorrect. In other words, now it seems like we’re saying that finding out Robin’s older kid was a girl actually didn’t leave the gender probabilities at 50-50 like we thought. Which is just… totally… nuts. (And not at all sober.) Isn’t it?

Not necessarily.

Here’s the rub. In the sober version, the conversation could actually have gone a couple of different ways:

Sober Version 1.

ROBIN: I’ve got two. My older kid is a junior in high school, plays guitar, does math team, runs track, and swims.

YOU: That’s great. Girls’ or boys’ track? The girls’ track team at my kids’ school is really competitive.

ROBIN: Girls’ track. My older kid is a girl.

Sober Version 2.

YOU: I teach math and science, and I’m really interested in helping girls succeed.

ROBIN: That’s great! Actually, if you’re interested in girls doing math, you might be interested in something that happened to one of my kids. My older kid is a girl, and…

Comparing Versions. In both versions, it looks like you ended up with the same information (Robin’s older kid is a girl). But the conclusions you get to draw are totally different!

Let’s view things in terms of your 400 classmates in the room. In Sober Version 1, the focus is on your classmate’s older kid. The key point is that, in this version of the conversation, in the 100 scenarios in which both of your classmate’s kids are girls, you would hear “my older kid is a girl” in all of them. Of course in the 100 (Girl, Boy) scenarios, you would hear “my older kid is a girl” as well. That makes for 200 “my older kid is a girl” scenarios, 100 of which are same-gender scenarios. The likelihood that both kids are girls is 50-50.

Whereas in Sober Version 2, the focus is on girls. In the 100 scenarios in which both of your classmate’s kids are girls, you should expect to hear a story about the older daughter about half the time, and the younger daughter the other half. (Perhaps not exactly, because the older kid has had more time to have experiences that become the subject of stories, but I’m ignoring this.) Combining this with the 100 (Girl, Boy) scenarios, we get 150 total “my older kid is a girl” scenarios. Only 50 of them are same-gender scenarios, and the likelihood that both kids are girls is only 1-in-3.
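The difference between the two sober conversations comes down to how likely you were to hear “My older kid is a girl” in each scenario. Here is a minimal sketch, assuming the four scenarios are equally likely and modeling Version 2 as Robin telling a story about a daughter chosen uniformly at random (the “about half the time” above); the function names are hypothetical. Because the prior is uniform, it cancels, and only these likelihoods matter.

```python
from fractions import Fraction
from itertools import product

PAIRS = list(product(["Girl", "Boy"], repeat=2))  # (older, younger), equally likely

def hear_v1(pair):
    """Version 1 (focus on the older kid): statement made iff the older kid is a girl."""
    return Fraction(1) if pair[0] == "Girl" else Fraction(0)

def hear_v2(pair):
    """Version 2 (focus on girls): story about a daughter picked uniformly at random."""
    daughters = [i for i, kid in enumerate(pair) if kid == "Girl"]
    if not daughters:
        return Fraction(0)
    # Index 0 is the older kid; the statement is about her with this probability.
    return Fraction(daughters.count(0), len(daughters))

def p_same_given_statement(likelihood):
    """Bayes with a uniform prior: P(same gender | heard 'My older kid is a girl')."""
    heard = sum(likelihood(p) for p in PAIRS)
    heard_same = sum(likelihood(p) for p in PAIRS if p[0] == p[1])
    return heard_same / heard

print(p_same_given_statement(hear_v1))  # 1/2
print(p_same_given_statement(hear_v2))  # 1/3
```

Scaled up by 100 per scenario, Version 1 gives the 200 “older kid is a girl” classmates and Version 2 gives the 150, matching the counts above.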

Why Probability Makes Us All Dummies. Probability is about comparing what happened with what might have happened. Math people have a fancy name for what might have happened: they call it the state space. What we see in this example is that when you talk about everyday situations in everyday language, it can be very tricky to pin down the state space. It’s hard to keep ambiguities out.

Even the Sober Version, which sounds very simple at first, turns out to have an ambiguity that we didn’t consider. And when we passed from the Sober Version to the Drunk Version, we got confused because we implicitly took the Sober Version to be Version 1, with a 200-person state space, while we took the Drunk Version to be like Version 2, with a 150-person state space. In other words, in interpreting “My older kid is a girl” vs. “One of my kids is a girl,” we fell into different assumptions about the background. I think this is what it means that our brains aren’t wired right to do probability: it’s incredibly easy for them to miss what the background assumptions are. And when we change the state space without realizing it by changing those background assumptions, we get paradoxes.

Note: while I framed what I’ve been calling the Drunk Version (one of my kids is a girl) in a way that makes Version 2 the natural interpretation, it can also be reframed to sound more like Version 1. In that case, the Argument for No in the Drunk Version is fully correct, and the probabilities are 50-50. From a quick online survey, I’ve found this point made in a few places, including Wikipedia and the paper I linked at the start. I haven’t seen anyone else note that what I’ve been calling the Sober Version (my older kid is a girl) can also be framed in multiple ways. Just more proof that it’s really easy to miss background assumptions!

Another point of view on this is in terms of information. The Sober vs. Drunk versions confused us because it looked like we had equivalent information (one of the kids is a girl) but ended up with different outcomes. In fact, we didn’t have equivalent information; in the Sober Version, there was even an essential ambiguity in what information we had! The point here is that just knowing the answer to a question (my older kid is a girl) usually isn’t the full story when it comes to probability problems. We need to know the question (Is your older kid a girl? vs. Is one of your kids a girl?) as well. The relevant information is a combination of a question and a statement that answers it, not a statement (or set of statements) floating on its own.

# How to Count

The other day I saw a math question disguised as a baseball trivia question. Here it is:

How many states don’t have major league baseball teams?

Let’s see: there’s Alaska, Arkansas,… Sure, it might be hard to list them all, but why am I calling this a math question?

Well, it doesn’t ask us to list all the states without baseball teams; it asks us to count them. Of course you can count things directly, by listing every one, but that’s not always as easy as it might seem. Maybe you can list all 50 states off the top of your head, and keep track as you go along of which ones don’t have teams, but I’m pretty sure I’d overlook a few states. (I thought I could go through the states alphabetically, but I gave up once I thought I’d gotten through the A’s, and I’d already forgotten Alabama!)

So how do you count things indirectly, without listing them? For a start, let’s reframe the question:

How many states DO have major league baseball teams?

If we can answer one question, we can answer the other: if, say, 20 states out of 50 have teams, then 30 don’t. But doesn’t the question feel a little easier when you ask it this second way? Pause here with me for just a moment: why is that?

One reason is that relatively few states have teams, and the ones that do are likely to be the better-known ones, so if you were going to try to count by listing, listing the states that have teams is probably easier than listing the ones that don’t. But the real reason the alternate formulation helps is that you don’t have to count by listing — at least not by listing states. You could count by listing teams.

The Red Sox play in Massachusetts — that’s one state. The Yankees play in New York — that’s a second. The Giants play in California — a third. The A’s play in California too, but we already counted that. And so on.

We can make this process a little more organized if we use the structure of the baseball leagues. There are 30 major league baseball teams and they are currently divided evenly into two leagues: 15 in the American League, 15 in the National. Each league has 3 divisions — East, Central, and West — and each division has 5 teams. In other words: 30 teams broken up into 6 divisions of 5.

Doesn’t it feel a lot easier to go through 6 divisions of 5 than to go through 50 states? Let’s do it. I write this off the top of my head, in real time:

AL East: Boston Red Sox (MA, 1), New York Yankees (NY, 2), Baltimore Orioles (MD, 3), Toronto Blue Jays (Canada, not a state), Tampa Bay Rays (FL, 4)

AL Central: Kansas City Royals (MO, 5), Detroit Tigers (MI, 6), Cleveland Indians (OH, 7), Minnesota Twins (MN, 8), Chicago White Sox (IL, 9)

AL West: Oakland A’s (CA, 10), Houston Astros (TX, 11), Texas Rangers (TX, repeat state), Los Angeles Angels (CA, repeat state), Seattle Mariners (WA, 12)

NL East: Washington Nationals (DC, not a state), New York Mets (NY, repeat state), Philadelphia Phillies (PA, 13), Miami Marlins (FL, repeat state), Atlanta Braves (GA, 14)

NL Central: St. Louis Cardinals (MO, repeat state), Pittsburgh Pirates (PA, repeat state), Milwaukee Brewers (WI, 15), Cincinnati Reds (OH, repeat state), Chicago Cubs (IL, repeat state)

NL West: Arizona Diamondbacks (AZ, 16), Colorado Rockies (CO, 17), San Diego Padres (CA, repeat state), Los Angeles Dodgers (CA, repeat state), San Francisco Giants (CA, repeat state)
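The division-by-division tally is easy to double-check in code. Here is a minimal Python sketch, with the team-to-state assignments transcribed from the lists above (so they reflect the leagues as they stood when this was written, using team nicknames only); `None` marks the two teams outside the 50 states. The 6 × 5 structure doubles as a completeness check before anything is counted.

```python
# Team -> state pairs transcribed from the division lists above.
# None marks teams that don't play in one of the 50 states.
DIVISIONS = {
    "AL East": [("Red Sox", "MA"), ("Yankees", "NY"), ("Orioles", "MD"),
                ("Blue Jays", None), ("Rays", "FL")],
    "AL Central": [("Royals", "MO"), ("Tigers", "MI"), ("Indians", "OH"),
                   ("Twins", "MN"), ("White Sox", "IL")],
    "AL West": [("Athletics", "CA"), ("Astros", "TX"), ("Rangers", "TX"),
                ("Angels", "CA"), ("Mariners", "WA")],
    "NL East": [("Nationals", None), ("Mets", "NY"), ("Phillies", "PA"),
                ("Marlins", "FL"), ("Braves", "GA")],
    "NL Central": [("Cardinals", "MO"), ("Pirates", "PA"), ("Brewers", "WI"),
                   ("Reds", "OH"), ("Cubs", "IL")],
    "NL West": [("Diamondbacks", "AZ"), ("Rockies", "CO"), ("Padres", "CA"),
                ("Dodgers", "CA"), ("Giants", "CA")],
}

# Use the structure (6 divisions of 5 teams) to catch a forgotten team.
assert len(DIVISIONS) == 6
assert all(len(teams) == 5 for teams in DIVISIONS.values())

# Enumerate over teams; a set keeps each state only once, so repeats drop out.
states_with_teams = {state for teams in DIVISIONS.values()
                     for _, state in teams if state is not None}

print(len(states_with_teams), 50 - len(states_with_teams))  # 17 33
```

The set comprehension is the “enumerate over the second entry” idea: we walk the 30 team-state pairs and let the set collapse them down to distinct states.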

And there you have it: 17 distinct states with teams, so 33 states without. And while this problem isn’t winning anybody the Fields Medal, it does illustrate two very important principles of counting, and math in general:

1. Find and use correspondences. When we asked which states have teams, we set up an implicit correspondence between states and teams. A way to make that correspondence more explicit is to reframe the question yet again, this time in terms of team-state pairs:

How many pairs (S, T) are there, where S is a state, T is a team that plays in that state, and no state appears more than once?

This might sound needlessly complicated, but math people actually like to talk this way! (Remember the definition of relations and functions the first time you saw it? Your eyes probably glazed over; mine sure did.) We use this language because it brings to the surface the duality inherent in the set-up: states and teams are paired. When you have pairs, you get to choose how to enumerate them: over the first entry, or over the second. And in this case, the second is the way to go, because…

2. More structure is better. The set of states seems sort of amorphous. You can try to break it up into regions (New England, Mid-Atlantic, Midwest,…), but it’s not totally clear how to do it. Whereas the set of baseball teams has a very clear structure: six by five. I lied in one place when I told you I was listing baseball teams in real time. When I got to the NL Central, I put down three of the five teams, and then spaced on what the other two were. But I knew there had to be five, and I knew about where they should be geographically. I remembered the other two within a minute.

Counting has a rich and noble history. Also a fancier name: combinatorics. And while the subject, perhaps like much of math, might seem like a bag of tricks when you first encounter it, it has some clear guiding principles. Look for structures, and try to transform your problem so you can make use of those structures. These principles are at work all over, so keep an eye out for them!

# Ee-ther/Ai-ther: Calling the Whole Thing Off at the Science Museum

The papers (the Boston Globe, Time, and others) were abuzz yesterday about a supposed error in a math exhibit at the Boston Museum of Science. Most of the interest in the story came from the fact that the issue — described as a minus sign instead of a plus sign in a formula for the golden ratio — was pointed out by a 15-year-old. Frustratingly, none of the articles I saw included any actual math, though if you’re familiar enough with the golden ratio, you might guess even from the very brief description above that the fuss was probably about a difference of convention rather than any kind of serious mistake.

And, right on schedule, today the Globe reports that the exhibit is correct after all. So what’s going on?

Let’s start with what we mean by “golden ratio.” I’ve posted about it before, in the context of ratios of successive Fibonacci numbers, which have the golden ratio as their limit. Here’s a picture:

[Figure: a square of side a and a small a × b rectangle, which together form a big (a + b) × a rectangle]

In this picture, the small rectangle (with one side of length a and the other of length b) and the big rectangle (with one side of length a + b and the other of length a) are supposed to be similar, meaning that the ratios of their sides are the same. In other words, if you write the length of the longer side on top, a/b = (a+b)/a. You could also put the length of the shorter side on top and get an equivalent equation: b/a = a/(a+b). Either way, cross-multiplying, we get:

$$a^2 = b(a+b),$$

or

$$a^2 - ab - b^2 = 0.$$

This equation has infinitely many solution pairs (a, b). You could find them using the quadratic formula, in one of two ways. If you treat a as the variable, you can solve for it in terms of b:

$$a = \frac{b \pm \sqrt{b^2 + 4b^2}}{2} = b \cdot \frac{1 \pm \sqrt{5}}{2}.$$

But the equation is pretty symmetrical, and you can also solve for b in terms of a (rewriting it as b² + ab − a² = 0):

$$b = \frac{-a \pm \sqrt{a^2 + 4a^2}}{2} = a \cdot \frac{-1 \pm \sqrt{5}}{2}.$$

We need to pare down our solutions just a bit. Knowing that a and b are both lengths of rectangle sides, we should make sure they are both positive. Since √5 > 1, both 1 − √5 and −1 − √5 are negative, so we throw out those roots, leaving us with

$$a = b \cdot \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad b = a \cdot \frac{\sqrt{5} - 1}{2}.$$
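As a numerical sanity check on the two quadratic-formula solutions and the positivity filter, here is a small sketch that fixes one side at length 1 (any positive length would do, since everything scales). The `quadratic_roots` helper is hypothetical and assumes a nonzero leading coefficient and a nonnegative discriminant, both of which hold here.

```python
import math

def quadratic_roots(p, q, r):
    """Both real roots of p*x**2 + q*x + r = 0 (assumes p != 0 and q*q - 4*p*r >= 0)."""
    s = math.sqrt(q * q - 4 * p * r)
    return [(-q + s) / (2 * p), (-q - s) / (2 * p)]

# Solve a² − ab − b² = 0 for a, with b = 1: coefficients (1, −b, −b²).
b = 1.0
a_roots = quadratic_roots(1, -b, -b ** 2)

# Solve the same equation for b, with a = 1: b² + ab − a² = 0, coefficients (1, a, −a²).
a = 1.0
b_roots = quadratic_roots(1, a, -a ** 2)

# Side lengths must be positive, so keep only the positive root in each case.
a_pos = [x for x in a_roots if x > 0]
b_pos = [x for x in b_roots if x > 0]
print(a_roots, a_pos)
print(b_roots, b_pos)
```

Each quadratic has exactly one positive root: (1 + √5) / 2 ≈ 1.618 for a when b = 1, and (√5 − 1) / 2 ≈ 0.618 for b when a = 1.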

Once we know this, it’s easy to talk about ratios of sides. The ratio of the longer side to the shorter side is a/b. Taking the equation a = b·(1 + √5) / 2 and dividing both sides by b, we see that a/b = (1 + √5) / 2 = 1.61803… And the ratio of the shorter side to the longer side is b/a, which by similar logic is just (√5 − 1) / 2 = a/b − 1 = 0.61803… (We can also deduce b/a = a/b − 1 directly from the initial equation a/b = (a+b)/a, because (a+b)/a is just 1 + b/a.)

Pictorially, if the square in our initial picture is 1 × 1 (a = 1), then b = (√5 − 1) / 2 = 0.61803… (the short side of the small rectangle), and a + b = (1 + √5) / 2 = 1.61803… (the long side of the big rectangle).
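A short numerical check of these relationships, in the same spirit: the two candidate “golden ratios” are reciprocals of each other and differ by exactly 1, and with a = 1 the side lengths in the picture come out as stated. This is only a floating-point sanity check, so it compares with `math.isclose` rather than exact equality.

```python
import math

long_over_short = (1 + math.sqrt(5)) / 2  # a/b, the more common convention
short_over_long = (math.sqrt(5) - 1) / 2  # b/a, the opposite convention

# The two conventions are reciprocals, and b/a = a/b − 1.
assert math.isclose(long_over_short * short_over_long, 1.0)
assert math.isclose(long_over_short - short_over_long, 1.0)

# The picture with a = 1: b is the short side of the small rectangle,
# a + b is the long side of the big rectangle.
a = 1.0
b = a * short_over_long
print(round(b, 5), round(a + b, 5))  # 0.61803 1.61803

# Similarity: long side over short side is the same for both rectangles.
assert math.isclose(a / b, (a + b) / a)
```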

So what is the golden ratio? Well, which ratio do you want — long side to short side or short to long? Do you say tom-ay-to or tom-ah-to? Which of the two we call golden is unimportant; what matters is that the picture, and all the math around the ratio, are the same either way. Which should we take as the golden ratio? Ee-ther! Or maybe ai-ther!

We think of math as being about deduction and absolute right answers, but it is also full of decisions and conventions. Sometimes the decisions make a difference: we decide to make .9999… equal to 1 (by deciding on certain rules for doing math with infinite sums), and this has consequences across the subject (decimal representations are no longer unique). But sometimes the decisions are only conventions, just a way of fixing language or notation and no more, and don’t matter very much.

We do, however, need to keep track of what conventions we’re using. The 15-year-old in the news stories probably learned that the golden ratio is (1 + √5) / 2, which is the more common formulation. Then, at the Science Museum, he saw this (photo from the latest Globe article):

It looked wrong; he was sure it should say (√5 + 1) / 2, not (√5 − 1) / 2. But read the fine print: the short side divided by the long side. That ratio is indeed (√5 − 1) / 2, as the display claims. The Science Museum just happened to frame its display in terms of the opposite ratio from the one the student learned. There’s nothing wrong with that, but we need to be aware that which version of the ratio we use is a mathematical convention for us to choose, not a mathematical fact set in stone.