How Value Added Models are Like Turds

“Why am I surrounded by statistical illiterates?” — Roger Mexico in Gravity’s Rainbow

Oops, they did it again. This weekend, the New York Times put out this profile of William Sanders, the originator of evaluating teachers using value-added models based on student standardized test results. It is statistically illiterate, uses math to mislead and intimidate, and is utterly infuriating.

Here’s the worst part:

When he began calculating value-added scores en masse, he immediately saw that the ratings fell into a “normal” distribution, or bell curve. A small number of teachers had unusually bad results, a small number had unusually good results, and most were somewhere in the middle.

And later:

Up until his death, Mr. Sanders never tired of pointing out that none of the critiques refuted the central insight of the value-added bell curve: Some teachers are much better than others, for reasons that conventional measures can’t explain.

The implication here is that value-added models have scientific credibility because they look like math: they give you a bell curve, you know. That sounds sort of impressive until you remember that the bell curve is also the world’s most common model of random noise. Which is what value-added models happen to be.

Just to replace the Times’s name dropping with some actual math, bell curves are ubiquitous because of the Central Limit Theorem, which says that any variable that adds up many similar-looking but independent factors looks like a bell curve, no matter what the individual factors are. Take, for example, the number of heads you get in 100 coin flips. Each single flip is binary, but one flip doesn’t affect the next, so when you flip a coin over and over and count the heads, out comes a bell curve. Or how about height? It depends on lots of factors (heredity, diet, environment, and so on), and you get a bell curve again. The Central Limit Theorem is wonderful because it helps explain the world: it tells you why you see bell curves everywhere. It also tells you that random fluctuations that don’t mean anything tend to look like bell curves too.
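To see this for yourself, here is a minimal Python sketch of the coin-flip example. The function names, the number of trials, and the fixed seed are my own illustrative choices and have nothing to do with how any actual value-added model is computed. It uses only the standard library.

```python
import random
import statistics
from collections import Counter


def simulate_heads(num_flips=100, num_trials=10_000, seed=0):
    """Count heads in num_flips fair coin flips, repeated num_trials times."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs repeat
    return [
        sum(rng.random() < 0.5 for _ in range(num_flips))
        for _ in range(num_trials)
    ]


def share_within(values, center, half_width):
    """Fraction of values no farther than half_width from center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


counts = simulate_heads()
mean = statistics.mean(counts)
sd = statistics.pstdev(counts)

# Theory for 100 fair flips: mean 50, standard deviation sqrt(100 * 0.25) = 5.
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"share within 5 of 50:  {share_within(counts, 50, 5):.3f}")
print(f"share within 10 of 50: {share_within(counts, 50, 10):.3f}")

# Crude text histogram: one '#' per 20 trials that landed on that head count.
hist = Counter(counts)
for k in range(35, 66, 3):
    print(f"{k:3d} {'#' * (hist[k] // 20)}")
```

The histogram bulges in the middle and thins out at both ends, even though every individual flip is pure chance and tells you nothing about anything.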

So, just to take another example, if I decided to rate teachers by the size of the turds that come out of their ass, I could wave around a lovely bell-shaped distribution of teacher ratings, sit back, and wait for the Times article about how statistically insightful this is. Because back in the bad old days, we didn’t know how to distinguish between good and bad teachers, but the Turd Size Model™ produces a shiny, mathy-looking distribution — so it must be correct! — and shows us that teacher quality varies for reasons that conventional measures can’t explain.

Or maybe we should just rate news articles based on turd size, so this one could get a Pulitzer.

 

8 thoughts on “How Value Added Models are Like Turds”

  1. The link explaining how we know VAM are noise is broken…do you have another good link? And can you explain in brief? I suppose one way to start is by looking at whether teacher ratings persist from year to year…and how much they persist when a teacher switches schools…

  2. “Each single flip is binary, but when you flip a coin over and over, one flip doesn’t affect the next, and out comes a bell curve.”

    How can a binary operation come out in a bell curve?? It’s an either/or situation, a or b, heads or tails. The example doesn’t make sense to me as written. Please explain further. Gracias.

    • Right, each flip is heads or tails, meaning that if you flip a coin just once, you get either one head (half the time, so a 50% chance) or no heads (also 50% chance). If you flip twice (or flip two coins), you have a 25% chance of two heads, a 50% chance of one head, and a 25% chance of no heads. Flip three times, or three coins, and it’s a 12.5% chance of three heads, 37.5% chance of two heads, 37.5% chance of one head, and 12.5% chance of none. If you plot those probabilities, the curve is looking a little more bell-shaped. The central limit theorem says that the more flips, or the more coins, the more bell-like the probability curve gets.
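The percentages above can be checked exactly with a few lines of Python. This is only a sketch: the helper name `heads_distribution` is made up for illustration, and it assumes a fair coin.

```python
from math import comb


def heads_distribution(num_flips):
    """Exact probability of 0, 1, ..., num_flips heads with a fair coin."""
    total_outcomes = 2 ** num_flips  # every sequence of flips is equally likely
    return [comb(num_flips, k) / total_outcomes for k in range(num_flips + 1)]


for n in (1, 2, 3, 10):
    # Each list runs from "no heads" on the left to "all heads" on the right.
    print(n, [round(p, 4) for p in heads_distribution(n)])
```

For three flips this gives 12.5%, 37.5%, 37.5%, and 12.5%, matching the numbers above, and by ten flips the list already rises to a peak in the middle and falls off symmetrically.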

      • Can’t agree. Each time one flips a coin the chance is 50/50. The flips before or after have no effect. Those probabilities you cite are false probabilities. Now that doesn’t mean that they aren’t fun mathematically speaking, but there is only ever a 50/50 chance.

        It seems to me that what you are doing is taking the bell curve and modifying the definition of probability to fit that curve and satisfy some other purpose, that cannot be either inferred, induced or deduced.

      • Two different questions. Toss a coin three times. Each flip has a 50/50 chance of coming up heads, as you say. But once you’ve made the three flips, you can also count up how many heads you just got. What I’m saying above is that three heads out of three flips is less likely than two heads and a tail (for example). And if you flip a hundred times, then 50 heads is more likely than 40 heads, and a lot more likely than just 30 heads, and the curve that tells you just how much more likely turns out to look like a bell. If that’s not convincing, think of rolling two dice. For each single roll, each face (1-6) is as likely to come up as any other. But you’re a lot more likely to roll a 7 on two dice than you are to roll a 2, just because there are more ways to get to the 7.
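Both claims can be checked by counting. The sketch below assumes a fair coin and fair dice, and the names `prob_heads` and `dice_ways` are just illustrative.

```python
from collections import Counter
from itertools import product
from math import comb


def prob_heads(k, n=100):
    """Exact chance of exactly k heads in n fair flips."""
    return comb(n, k) / 2 ** n


# Each single flip is still 50/50; these are chances for the total count.
for k in (50, 40, 30):
    print(f"P({k} heads in 100 flips) = {prob_heads(k):.6f}")

# Two dice: count how many of the 36 equally likely rolls give each total.
dice_ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total in sorted(dice_ways):
    print(f"{total:2d}: {dice_ways[total]} way(s) out of 36")
```

There are six ways to roll a 7 and only one way to roll a 2, which is the whole reason 7 shows up more often. The same counting argument is what makes 50 heads more likely than 30.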
