So I was reading my Facebook feed on the bus ride home, with the typical smattering of kid pictures, and I started to think about how there are many people in my life that I don’t get to see that much, but Facebook lets me watch their kids growing into people, with sparks in their eyes and faces that remind me of their parents, and that really feels quite moving. And then I came home and went upstairs to drop off my work things, and my own growing son ran to me enthusiastically, and then he dropped his pants and mooned me.


The Art of Teaching

I had mixed feelings about this Atlantic article, which focuses on the importance of promoting students’ joy in learning. On one hand — duh. On the other, in this age of standards, skills, grit, self-control, and college and career readiness, the point is worth making. But in making it, it’s important not to set the joy of learning in opposition to the hard work. The two are really partners.

When I taught, I always thought my first job was to create a spark, to give my students a sense of the importance, the fun, and, yes, the joy of math. Without that, learning is a slog for everyone, students and teachers alike, and you rarely get anyone’s best work. But once you have joy and excitement, you still have to harness them. You have to get students to challenge themselves by thinking in new ways and trying hard problems — which can involve frustration and at least the temporary suspension of joy. The art of teaching is balancing these. You balance them by making a promise: here’s something hard, but if you stick with it, you’ll figure it out. I’ll help you through it, and when you do get it, you’ll feel the joy again, sometimes even more deeply than you did before.

Done right, joy leads to hard work, which leads to success and joy, and the cycle starts all over again. It takes courage to make those promises, and skill to keep them, and the best teachers make and keep them over and over again.

Real Life Rock Top 1: Finding Greil Marcus Online

When I was a student at Berkeley in the 90’s, sometimes I would walk home instead of taking the BART. I lived a few miles north of campus, about an hour’s walk if you didn’t stop anywhere, but it always took longer because of the bookstores (and the bagel shop, and the scenery) along the way. On one of these walks, I fished a $5 copy of Greil Marcus’s “In the Fascist Bathroom” out of the remainder bin in Half Price Books on Solano Ave. I can’t guarantee you that I read it cover to cover that night, but I’m pretty sure I couldn’t go to bed before getting through all the essays on Elvis Costello, and The Mekons, and The Clash, and Gang of Four, and Springsteen, and The Go-Go’s… so pretty much cover to cover. There were insightful points all over the place, but more importantly, Greil cared so much that you just couldn’t put the book down.

I’ve been a big fan of his writing ever since. So it was a treat to find out that a lot of it is now available online. I don’t know who runs the site, but it has a range of his pieces, from the 60’s and 70’s to today, and keeps adding more. It also links to his monthly Real Life Rock Top 10 column, now hosted at Barnes and Noble. A great intro is this extensive conversation with Elvis Costello from the early 80’s, which was one of the pieces that grabbed me that first night. Two thoughtful, sincere, creative people talking, and taking life and music seriously.


You Can’t Separate Models From Data

This Atlantic article is a bit highfalutin’, but if you make it past all the metaphors at the beginning, you’ll get to see some good examples of a very important idea: you can’t separate a model from the data that goes into the model. In particular, constraints on the input data become constraints on the model.

A story: my first non-academic job was Modeling Guy at a start-up that was building technology to generate movie recommendations, similar to Netflix or Amazon. (Just so you know how long ago this was, we were going to have recommendation kiosks at video stores! Then the Internet crash happened.) Recommendation models all work in pretty much the same way: the model finds people whose taste (in movies, books, music, whatever) is similar to yours, and recommends movies to you that those people have liked but you might not have seen yet. The basic input data to a model like this is preference information. In less fancy language, you need to know what movies different people like.
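To make the "find people with similar taste" idea concrete, here is a minimal sketch in Python. It is not the model from the start-up; the ratings, the names, the similarity formula, and the scoring rule are all made up for illustration.

```python
# Hypothetical ratings on a 1-5 scale: person -> {movie: rating}.
ratings = {
    "you":   {"Airplane!": 5, "Alien": 4, "Grease": 1},
    "alice": {"Airplane!": 5, "Alien": 5, "Grease": 1, "Heat": 5},
    "bob":   {"Airplane!": 1, "Alien": 2, "Grease": 5, "Heat": 2, "Cats": 5},
}

MIDPOINT = 3  # ratings above this count as "liked", below as "disliked"


def similarity(a, b):
    """Taste similarity in (0, 1]: higher when a and b rate shared movies alike."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    mean_gap = sum(abs(ratings[a][m] - ratings[b][m]) for m in shared) / len(shared)
    return 1.0 / (1.0 + mean_gap)


def recommend(person):
    """Rank movies the person hasn't rated by similarity-weighted votes."""
    scores = {}
    for other in ratings:
        if other == person:
            continue
        weight = similarity(person, other)
        for movie, rating in ratings[other].items():
            if movie in ratings[person]:
                continue  # already seen
            # A like from a similar person pushes the score up; a dislike pulls it down.
            scores[movie] = scores.get(movie, 0.0) + weight * (rating - MIDPOINT)
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("you"))  # ['Heat', 'Cats']
```

Alice's taste matches yours closely, so her love of Heat outweighs Bob's lukewarm rating of it, and it also beats Cats, which only Bob (whose taste is far from yours) liked.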

One thing I wanted to account for in my model was that there are multiple movie genres, and people might have similar tastes in some but not others. (You and I could both like pretty much the same comedies, but maybe you like musicals and I hate them. No, really, I hate them.) To make this work, I needed enough data to be able to model preferences in each genre, not just overall. It wasn’t enough to know, for each person, 10 or 20 movies that they liked; I needed to know a few comedies each person liked, a few mysteries, a few musicals (if any), etc. Which meant I needed a larger dataset overall, because there are a lot of genres.

Now, it wasn’t so hard to collect this data. We made a long list of movies, made sure we included a decent number from every genre we wanted to cover, and had people rate the movies on our list. (You can give people a long list, because they usually still remember a movie well enough to rate it long after they saw it.) Long story short, I had enough data to do what I wanted to do — model each genre separately — and my model seemed to work pretty well. (We tested against models that lumped all the movies together, and mine did better.) What I want to highlight is that if I hadn’t been able to collect as much data, my fine-grained approach probably wouldn’t have worked at all. If I only had a small dataset, I wouldn’t have been able to say anything about what was going on inside each genre, and grouping people based on all the movies lumped together would have been a better bet. The model wouldn’t have been very precise, but it would have used the little data I did have more efficiently.

The upshot is that models depend on data, and data availability (quantity and quality) is always a real-world issue, not just a math issue. A model may make perfect sense in theory, but work badly in practice if reality gets in the way of gathering the data you need to run the model.

Education data is a great example here. There’s a class of models for measuring teacher and school performance, known broadly as value-added models (VAM). The idea is to try to isolate how much “value” a teacher or school adds to students’ learning, where learning is usually measured through test scores. Regardless of what you think about standardized testing, you should know that the modeling here is extremely challenging! The problem is that it’s very hard to break out the impact of a teacher or school from all the other, “external” factors that might affect a kid’s test scores (genetics, at-home support and preparation, attendance, schools attended in the past, just to name a few). To do this, you need a model to estimate a kid’s “expected” test score based on all the external factors. (The “value added” by the school or teacher is then supposed to be captured as the difference between this model-based expected score and the actual score.)
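Here is a deliberately oversimplified sketch of the "expected vs. actual" idea, in Python. Real value-added models are far more elaborate; this toy version assumes the only external factor available is last year's score, and every number and teacher label in it is invented.

```python
from statistics import mean

# Hypothetical records: (teacher, prior-year score, this-year score).
students = [
    ("T1", 60, 68), ("T1", 70, 77), ("T1", 80, 86),
    ("T2", 60, 62), ("T2", 70, 73), ("T2", 80, 84),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def value_added(records):
    """Average (actual - expected) score per teacher."""
    slope, intercept = fit_line([prior for _, prior, _ in records],
                                [score for _, _, score in records])
    gaps = {}
    for teacher, prior, score in records:
        expected = intercept + slope * prior  # the model-based "expected" score
        gaps.setdefault(teacher, []).append(score - expected)
    return {teacher: mean(g) for teacher, g in gaps.items()}


print(value_added(students))
```

With this made-up data, T1 comes out about 2 points above expectation and T2 about 2 points below. Notice how much the result leans on the one input we happened to have: anything the model can't see (home support, attendance, and so on) gets silently credited to, or blamed on, the teacher.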

To build such a model, you need to model the external factors, which means you need a huge amount of input data. You certainly want a history of past test scores (hard to collect if a kid has moved around between schools where different tests are given; hard to interpret even if you happen to have the data). You likely want to know something about income (typically, eligibility for reduced-price school lunch programs is used as a proxy for this; at my kids’ school, this data was apparently wrong for a couple of years). And you probably want to know something about support at home, out-of-school activities, and lots of other variables. Well, good luck! The worst part is that the data gaps tend to be the biggest in the poorest schools (fewer resources to collect data, more kids going in and out, making the data problem harder to begin with). These are precisely the schools where it’s most important to model the challenges the kids face, and yet the data isn’t there to do it.

There’s starting to be a backlash against standardized testing, and against measuring teachers and schools by the results of those standardized tests. And there’s also a backlash to the backlash, with supporters of the VAM framework arguing that it’s the most objective measure of teacher performance and kids’ progress. But models and measures based on data that’s not there, and can’t be filled in, aren’t objective at all.

Fractions for Cooks (with Special Bonus Recipe)

For a little weekend fun, here’s another real life math problem you or your kids may enjoy. This one is about fractions. I like it because it’s very concrete, covers a lot of ground, and might have something to interest you regardless of how old you are. Here goes:

  1. I have a recipe that uses 3 eggs and half a cup of flour. (True story; for actual recipe, see below.) I want to use 4 eggs instead of 3, and I need to measure out the right amount of flour for the recipe. My Pyrex measuring cups with fractional increments on the side are in the dishwasher and I don’t feel like cleaning them (I told you, true story). But I do have these:
    [Photo: the measuring spoons.] That is, four large measuring spoons, with sizes 1/4 cup, 1/3 cup, 1/2 cup, 1 cup. How can I measure out the flour I need?
  2. How can I measure out the flour I need if I’m going to use just one egg?
  3. What’s the smallest unit of flour I can measure out accurately with these measuring spoons?
  4. If I have measuring spoons of sizes 1/n_1, 1/n_2, …, 1/n_k (OK, this is getting kind of fictional), what’s the smallest unit of flour I can measure out accurately?

We talked about the first two of these at the math workshop for 3rd-6th graders that I ran in Montclair this fall (see here). The first is about addition or multiplication of fractions (1/3 + 1/3 = 2/3 or 2 × 1/3 = 2/3). It’s easy enough as long as you take the time to figure out how much flour you actually need (which wasn’t easy for my kids at first). The second one is about subtraction, and I like it because it lets you explain the fractional equation at the heart of the problem (1/2 – 1/3 = 1/6) with something very concrete: 3 eggs – 2 eggs = 1 egg. When we did it at the workshop, that made as much sense to the kids (maybe even more) as cutting up a pie into 6 slices.

The third and fourth problems are about common denominators and least common multiples. You actually need what’s usually thought of as higher math (Euclidean algorithm) to solve the last one fully, but both of them are meant to encourage experimenting. No matter how old you are, you can get started and see how far you get. You might even discover the Euclidean algorithm on your own!
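Spoiler alert: if you'd like to check your answers, here is a small Python sketch. It assumes "measuring" means adding and removing whole scoops (so 1/2 minus 1/3 is allowed), and the function names are mine. The shortcut formula, one over the least common multiple of the spoon denominators, is compared against a brute-force search over small numbers of scoops.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # Python 3.9+


def flour_for(eggs, base_eggs=3, base_flour=Fraction(1, 2)):
    """Scale the recipe's flour (1/2 cup for 3 eggs) to a different egg count."""
    return base_flour * Fraction(eggs, base_eggs)


def smallest_unit(denominators):
    """Shortcut: smallest positive amount you can build from spoons of size 1/n."""
    return Fraction(1, lcm(*denominators))


def brute_force_smallest(denominators, max_scoops=3):
    """Try every mix of up to max_scoops added or removed per spoon."""
    best = None
    for counts in product(range(-max_scoops, max_scoops + 1),
                          repeat=len(denominators)):
        total = sum(Fraction(c, n) for c, n in zip(counts, denominators))
        if total > 0 and (best is None or total < best):
            best = total
    return best


spoons = [4, 3, 2, 1]                # 1/4, 1/3, 1/2 and 1 cup
print(flour_for(4))                  # 2/3  (= 1/3 + 1/3)
print(flour_for(1))                  # 1/6  (= 1/2 - 1/3)
print(smallest_unit(spoons))         # 1/12 (= 1/3 - 1/4)
print(brute_force_smallest(spoons))  # 1/12
```

The brute-force search only finds the amount; working out which scoops to add and remove for a general set of spoons is where the Euclidean algorithm comes in.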

Here’s the recipe, by the way. It’s for our current favorite weekend breakfast item, Russian cheese pancakes, known as syrniki (“syr” is Russian for cheese):


1½ lbs of farmer cheese (e.g., 3 half-pound packs)
3 eggs
½ cup of flour

Mix ingredients together in a large bowl.

Cover a cutting board with flour. Take out a handful of cheese mixture and roll it in the flour on the cutting board to make a ball-shaped dumpling. Repeat until all the mixture is used up (usually makes 20 or so dumplings). If your cutting board is well-covered with flour, the dumplings shouldn’t stick to it.

Heat a skillet over medium to high heat and cover the bottom with a layer of oil. Flatten each dumpling so it’s about half an inch thick, and place in the skillet (depending on the size of the skillet, you’ll likely be able to fit between 6 and 10 dumplings at once). Cook on each side until light brown (2-4 minutes a side). Serve covered with sour cream and/or sugar. Yum!

How Poker Explains Who Saw The Interview Where

Apparently I still can’t stop talking about The Interview and how you can figure out who saw it where (theater vs. video). I thought for sure I was done after this post, but then I came up with a few more takes on the problem. I really wasn’t kidding when I said that a huge part of thinking like a math person is being able to look at problems from multiple points of view! The first two approaches below are inspired by a discussion of a very similar problem here (h/t Mark Lakata).

The setup again: The New York Times told us that The Interview made $15M over Christmas weekend, from a total of 2M paid customers, each of whom either rented it for $6 or bought a ticket for $15. We want to know how many rented and how many bought tickets.

1.  We know that if all 2M people rented, the total revenue would have been $12M. To reach the actual revenue of $15M, we need to make an additional $3M by converting some renters into ticket buyers. Changing a single renter into a ticket buyer nets an additional $9 (from $6 to $15). The number of conversions we need to make is then the total incremental revenue, $3M, divided by the incremental revenue per conversion, $9. So we make $3M / $9 = 1/3 M conversions and end up with 1/3 M ticket buyers. (This is essentially another form of shifting the sliding scale from my last post.)

2.  For a mathematically equivalent, but more vivid version of this argument, imagine a poker game with 2M players (!) and a $6 ante. When everyone antes up, we have $12M in the pot. Cards are dealt. The first bet is (an additional) $9. Some players see the bet, put in $9, and stay in; the rest fold. If we find that over this round, $3M more was added to the pot, for a total of $15M, we can deduce that the number of players who put in $9 more and stayed in (they correspond to the ticket buyers, who paid $15 in total) has to be $3M / $9 = 1/3 M. This formulation suggests a longer poker game, with multiple rounds of betting. Which in turn would correspond to a larger algebraic system, with more equations and more unknowns. Rentals, full price tickets, matinees, anyone?

3.  If you like to think in terms of averages, you can observe that if in total 2M people paid $15M, then the average person paid $7.50. If some people paid $6 and others paid $15, and if the average of $7.50 is 1/6 of the way between $6 and $15, then $6 must contribute 5/6 of the mass of the average, while $15 contributes 1/6. So 5/6 of the people paid $6 (rented) and 1/6 paid $15 (bought tickets). This is very close to my original weighted average argument, but with a change of variables, so we’re averaging over people (between $6 and $15) rather than over possible outcomes (between $12M and $30M), which seems a little more intuitive.
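If you want to see the arguments above agree numerically, here is a quick Python sketch. The function names are mine, and the poker version is the same arithmetic as the conversion argument, so it isn't repeated. Exact fractions are used because the headline figures are rounded and the answer doesn't come out to a whole number of people.

```python
from fractions import Fraction

PEOPLE = 2_000_000      # paid customers
REVENUE = 15_000_000    # dollars
RENT, TICKET = 6, 15    # dollars per rental / per ticket


def buyers_by_conversion():
    """Start with everyone renting, then convert renters at $9 apiece."""
    extra = REVENUE - PEOPLE * RENT          # $3M above the all-rental baseline
    return Fraction(extra, TICKET - RENT)    # $3M / $9


def buyers_by_average():
    """Locate the average price paid on the scale from $6 to $15."""
    average = Fraction(REVENUE, PEOPLE)              # $7.50
    share = (average - RENT) / (TICKET - RENT)       # 1/6 of the way across
    return share * PEOPLE


print(buyers_by_conversion() == buyers_by_average())  # True
print(round(float(buyers_by_conversion())))           # 333333
```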

All three of these arguments are clean (I think), illuminating (they give you insight, one way or another, into why the answer is what it is), and elementary (you don’t need to know any math beforehand to come up with them). On the other hand, they aren’t exactly obvious. Moreover, as we’ve seen, they’re connected to some powerful mathematical techniques (solving systems of equations, changing variables, and so on). The connections work both ways. Having the general math techniques at your disposal makes it more likely that you’ll come up with your own nice take on the problem. Conversely, if you’re lucky enough to come up with such an approach from first principles, it’ll help you understand why the general techniques work in the first place. The insight can go either way; math does not have to flow in any particular direction. To paraphrase Andrew Wiles, who proved (with help) Fermat’s Last Theorem: when you do math, you make your way, blindfolded, through a door into a dark room, you feel your way around some, you might bump into some dead ends, you might stumble once or twice, but if you’re thoughtful and persistent, you end up with a pretty good map of the territory.

And then you open up the next door, and go on to the next room.

Think Like A Math Person II: More on The Interview

A few days ago, Joey deVilla wrote a very nice blog post about using algebra (a system of two linear equations in two unknowns) to figure out how many people rented The Interview as opposed to seeing it in a movie theater. I posted my own note about how you can still analyze this example algebraically even if you’re allergic to equations. Joey then added some extremely clear pictures illustrating what I was talking about. After seeing his drawings, I thought it might be illuminating to reconcile the two approaches a little more explicitly.

To review the setup, The New York Times told us that the movie made $15M over Christmas weekend, from both rentals and ticket sales to a total of 2M people, where each person either rented for $6 or bought a ticket for $15. We want to know how many rented and how many bought tickets. If we write r = number of rentals, s = number of tickets sold, we know that

r + s = 2,000,000          (Eq. 1: total number of rentals and sales was 2M),

6r + 15s = 15,000,000  (Eq. 2: total revenue from rentals and sales was $15M).

If all 2M people rent — put another way, if the people counted by r AND the people counted by s each pay $6 — the total revenue would be $12M. We can write an equation for this by multiplying Eq. 1 by 6, obtaining

6r + 6s = 12,000,000.   (Eq. 3)

Similarly, if the people counted by both r and s all pay $15 each (everyone buys a ticket at the theater), the total revenue would have been $30M:

15r + 15s = 30,000,000.

In reality, of course, the people counted by r paid $6 and the people counted by s paid $15. Eq. 2 represents that the resulting revenue was $15M, which is 1/6 of the way from $12M to $30M (picture by Joey):


From here, we can unlock the mystery of how many rented and how many bought tickets with one question: What if we reanchor this picture at 0? In other words, what if we slide the Seth Rogen sliding scale to the left so it starts at 0 rather than at $12M?

This amounts to pretending that everything costs $6 less. In that case, a rental would be free, and a movie ticket would cost $9 rather than $15. The scale would run from 0 to $18M: if everyone rented, the total revenue would be 0, and if everyone bought a ticket, the total revenue would be $18M ( = 2M × $9). We didn’t change the spread between the cost of a rental and a ticket (still $9), so the difference between the two extremes of the scale is still the same: $18M. We also didn’t change the number of renters and ticket buyers, so the actual revenue based on r rentals and s sales is still 1/6 of the way across the (shifted) scale, or $3M. You could also say that letting each of 2M people pay $6 less reduces your total revenue by $12M — from $15M to $3M (again). If we work with equations, this is exactly what we’re doing when we subtract Eq. 3 from Eq. 2 to obtain 9s = 3,000,000. It may look like just an operation on equations, but it has real physical meaning!

Now we’re ready to read off the answer. Looking at things in terms of equations, you would divide 9s = 3,000,000 by 9 to obtain s = 1/3 M. Looking at things in terms of sliding scales, you would say that if rentals are free and tickets cost $9, the total revenue is $3M, which is 1/6 of what it would have been if everyone bought a ticket. So how many of the 2M people bought tickets? Clearly 1/6 of them, or 1/3 M again!
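For anyone who likes to see the bookkeeping spelled out, here is a short Python sketch of both routes. The function names are mine; r and s are the same unknowns as in Eq. 1 and Eq. 2, and the last line plugs the answer back into Eq. 2 as a sanity check.

```python
from fractions import Fraction


def solve_by_elimination(people, revenue, low, high):
    """Solve r + s = people and low*r + high*s = revenue for (r, s)."""
    # Eq. 3 is low * Eq. 1. Subtracting it from Eq. 2 leaves
    # (high - low) * s = revenue - low * people, i.e. 9s = 3,000,000 here.
    shifted_revenue = revenue - low * people
    s = Fraction(shifted_revenue, high - low)
    r = people - s
    return r, s


def position_on_scale(people, revenue, low, high):
    """How far the actual revenue sits between 'all rent' and 'all buy'."""
    return Fraction(revenue - low * people, (high - low) * people)


r, s = solve_by_elimination(2_000_000, 15_000_000, 6, 15)
share = position_on_scale(2_000_000, 15_000_000, 6, 15)

print(share)                       # 1/6
print(s == share * 2_000_000)      # True: both routes give the same ticket count
print(6 * r + 15 * s)              # 15000000, so Eq. 2 checks out
```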

So we can think about and solve this problem in multiple ways, and relate the different approaches to each other. Does that matter, and if so, why?

It matters because this is how math is supposed to work. At its heart, math is not about learning procedures (long division, completing the square, elimination of variables), and it’s not about getting to an answer, though procedures can be useful tools and can also help increase our overall understanding, and answers are salutary, and sometimes necessary, outcomes. At its heart, math is about learning to order and make sense of the world, which means formulating good organizing principles and drawing logical conclusions from those principles. When you’re trying to order the world, having multiple ways to look at things isn’t a bug, it’s a feature. It means that if you’re looking at things one way and you’re stuck, you might make progress when you switch points of view. It means that if you like manipulating equations and I like averaging along a sliding scale, we can both figure out how many people bought tickets to see The Interview. And it means that when we compare our points of view, we can find connections between them and gain a deeper understanding.

When you walk into a research math department, you don’t (usually) find professors hunched over desks or computers, trying to find more digits of pi, or trying to solve rows of really complicated integrals. What you’ll usually find is people talking to each other, looking for connections. You might hear:

— Hey, haven’t seen you for a while, what have you been up to?

— Well, I found this funny formula, and I’m trying to figure out what it means. It reminds me of another formula I know that counts certain arrangements of hyperplanes in 4-dimensional space. You used to work on hyperplane arrangements, right? Think you could help me understand what it means?

— Sure, let me take a look. OK, this term here looks like it’s counting….

To paraphrase Joe Strummer: This is Math / This is how we feel.

But connections between math ideas don’t need to be limited to university math departments, or even to math blogs. Maybe you have a kid who’s learning how to multiply, divide, or solve equations differently from how you learned it. Maybe you’re a little frustrated: what’s wrong with the old way? The answer, in all likelihood, is nothing. But there’s probably nothing inherently wrong with the new way, either. And maybe if you and your kid sit down and try to reconcile the different approaches, you might find out that they’re related, and understand why they both get you to the same place, and gain new insights about each one.

If you’ve gotten this far, if you’ve managed to bear with Joey and me through multiple takes on who saw The Interview where, I bet you have a better feel for the underlying math than you did before you started — maybe even better than you imagined you could. Congratulations — that’s thinking like a math person. And now you too have every right in the world to make fun of the New York Times when it asks if algebra is necessary.