How Value Added Models are Like Turds

“Why am I surrounded by statistical illiterates?” — Roger Mexico in Gravity’s Rainbow

Oops, they did it again. This weekend, the New York Times put out this profile of William Sanders, the originator of evaluating teachers using value-added models based on student standardized test results. It is statistically illiterate, uses math to mislead and intimidate, and is utterly infuriating.

Here’s the worst part:

When he began calculating value-added scores en masse, he immediately saw that the ratings fell into a “normal” distribution, or bell curve. A small number of teachers had unusually bad results, a small number had unusually good results, and most were somewhere in the middle.

And later:

Up until his death, Mr. Sanders never tired of pointing out that none of the critiques refuted the central insight of the value-added bell curve: Some teachers are much better than others, for reasons that conventional measures can’t explain.

The implication here is that value-added models have scientific credibility because they look like math — they give you a bell curve, you know. That sounds sort of impressive until you remember that the bell curve is also the world’s most common model of random noise. Which is what value-added models happen to be.

Just to replace the Times’s name dropping with some actual math, bell curves are ubiquitous because of the Central Limit Theorem, which says that any variable that adds up many small, similar-looking, independent factors looks like a bell curve, no matter what the individual factors are. For example, the number of heads you get in 100 coin flips. Each single flip is binary, but when you flip a coin over and over, one flip doesn’t affect the next, and out comes a bell curve. Or how about height? It depends on lots of factors: heredity, diet, environment, and so on, and you get a bell curve again. The Central Limit Theorem is wonderful because it helps explain the world: it tells you why you see bell curves everywhere. It also tells you that random fluctuations that don’t mean anything tend to look like bell curves too.
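
If you'd rather see the coin-flip claim than take it on faith, here is a minimal sketch in Python. The function names, the 10,000 repetitions, and the fixed seed are my own choices for illustration, not anything from the Times article or from Sanders's models. It repeats the 100-flip experiment many times and prints a crude sideways histogram of the head counts.

```python
import random
from collections import Counter


def count_heads(num_flips, rng):
    """Flip a fair coin num_flips times and return how many heads came up."""
    return sum(1 for _ in range(num_flips) if rng.random() < 0.5)


def simulate(num_trials=10_000, num_flips=100, seed=0):
    """Repeat the experiment num_trials times; tally how often each head count occurs."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    return Counter(count_heads(num_flips, rng) for _ in range(num_trials))


def print_histogram(tally, trials_per_mark=50):
    """Print a sideways text histogram, one row per head count in the observed range."""
    for heads in range(min(tally), max(tally) + 1):
        print(f"{heads:3d} {'#' * (tally[heads] // trials_per_mark)}")


tally = simulate()
print_histogram(tally)
```

The rows should bulge around 50 heads and thin out on either side, which is the bell shape, produced by nothing but coin flips.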

So, just to take another example, if I decided to rate teachers by the size of the turds that come out of their ass, I could wave around a lovely bell-shaped distribution of teacher ratings, sit back, and wait for the Times article about how statistically insightful this is. Because back in the bad old days, we didn’t know how to distinguish between good and bad teachers, but the Turd Size Model™ produces a shiny, mathy-looking distribution — so it must be correct! — and shows us that teacher quality varies for reasons that conventional measures can’t explain.

Or maybe we should just rate news articles based on turd size, so this one could get a Pulitzer.

 


Why the Prado is the Best

I am in my mid-40’s and have been to a lot of art museums, so I didn’t really expect to walk into another one and think, “Oh my god, this is head and shoulders above any place I’ve been to” (the Met, the Louvre, the Uffizi, the Barnes, etc.). But then I had never been to Madrid and the Prado until this week. Here are three reasons why it really is the best:

1. The great stuff is so, so great. Within minutes of walking in, I had wandered into someplace called room 49, which might be the best single collection of Italian Renaissance art you’ll ever see in a single room. With a bunch of glorious large and mid-sized Raphaels, like this one:

img_3047

I mean, can you imagine seeing something like this just a couple minutes after you walk in? When I came across it, I didn’t know yet that they don’t let you take photos, so I snapped this one before the guards gently told me the rules.

And oh yeah, the Prado has a really interesting copy of the Mona Lisa too.

2. It’s not gigantic, but the quality is very high throughout. There’s something astonishing, often many things, in practically every room. Some highlights: likely the deepest Goya collection anywhere, from all across his career, including a bunch of large, airy, light-colored early paintings called tapestry cartoons up on the top floor. A magnificent set of El Grecos. And don’t get me started on Velázquez, Zurbarán, Ribera, Bosch (again, the deepest set of his works I’ve seen in one place), Fra Angelico’s Annunciation, and Tintoretto. The main hall is anchored by a fantastic collection of Titian and Rubens, which… is absolutely great, but many of the other rooms are even better.

3. The way it just teems with life. So check out this graceful bit of Asian calligraphy, from a scroll I found in an out-of-the-way, empty room, with no guards to tell me not to take pictures:

img_3058

I mean, those hands!

OK, the reason why you haven’t heard about the Asian calligraphy scrolls in the Prado is that there aren’t any. Our angel with the graceful hands is actually part of a Spanish altarpiece, circa 1200:

img_3057

Or maybe it didn’t look like an Asian scroll to you. But the magic of the Prado is that when you see so many amazing pictures together in one place, they amplify and animate each other, they transform each other, they dance with each other and with you, and before you know it, all the distinctions you’ve ever learned between different genres and periods and schools melt away. Then only the art is left, direct, pure, and alive, and it fills you with life, and keeps dancing with you long after you leave the building. The Prado is the most joyful art museum in the world.

Springtime for Donald*

What a beautiful spring day in the Northeast yesterday! The kind of day that makes you, right now, wherever you are, city folks or out in the country, snuggled in quilts or riding the bus, just turn to the nearest Real American around you, even your own reflection in the mirror and . . . just . . . sing:

It’s springtime for Donald and Vladimir!
Winter for NATO and Ukraine
Putin’s Manchurian Candidate
Is helping Make Russia Great Again!

Oh, it’s springtime for Donald and Vladimir,
Winter for Syrian refugees
Sorry if your doctor’s Iranian,
And judges belong on their knees

Yes, it’s springtime for Donald on Twitter
Please get your facts from Kellyanne
Congress had better just smile and nod
When Spicer says it’s not a ban

It’s springtime for Donald and the alt-right!
The Times and the Post are fake news
Read Breitbart so you’ll be the first to know
When Bannon puts a ban on the Jews!

Now ev’rybody –

*Apologies to Thomas Pynchon (from whom I stole the intro) and Mel Brooks (who gave us this):

Music For Marchers

1. The Opportunity

In mid-September 2001, about a week after terrorists destroyed the World Trade Center towers, Laurie Anderson played a concert at Town Hall in New York City. Here is what she said when she began:

We want to dedicate our music tonight

to the great opportunity that we all have

to begin to truly understand

the events of the past few days,

and to act upon them,

with courage and with compassion,

as we make our plans

to live in a completely new world.

Opportunities we didn’t ask for, opportunities that we fear, are still opportunities. We missed the one we faced back then, but let’s recognize the one we face as a country today — and act upon it.

2. Let’s Renew Ourselves Now

Leonard Cohen, improvising in front of perhaps 600,000 people at the Isle of Wight festival in 1970:

I know it’s been cold, and I know it’s been damp.

I know you’ve been sitting all night long.

But let’s renew ourselves now, let’s renew ourselves now, let’s renew ourselves now.

3. Marchons, Marchons

I don’t care how many times you’ve watched this movie or this scene, it’s impossible to see it without feeling something. So watch it again. And march on, march on.

Then They Came for my Job — and No One was Left to Speak for Me

Everybody loves Martin Niemoller. Wikipedia will tell you he is famous for his “provocative poem about the cowardice of German intellectuals following the Nazis’ rise to power and subsequent purging of their chosen targets, group after group.” Here’s the original form of that poem, in case it hasn’t come across your Facebook feed lately:

First they came for the Socialists, and I did not speak out —
Because I was not a Socialist.

Then they came for the Trade Unionists and I did not speak out —
Because I was not a Trade Unionist.

Then they came for the Jews, and I did not speak out —
Because I was not a Jew.

Then they came for me — and there was no one left to speak for me.

It’s a lovely piece to quote: it reckons with the past, signals enlightenment and moral clarity, and challenges us all to live up to its ideals. You pretty much can’t help feeling a little more ethical and upright when you read it. Still, before we get too complacent, I’d like to ask you to accompany me, and Niemoller, on a short tour of recent American economic and political history. I want to see if we’ve really learned to speak up for others as well as we think we have. Humor me, you don’t really mind, do you?

First they came for the Socialists — this feels like a quaint anachronism now, but the Socialist party was highly relevant politically in the United States in the early 1900’s. The Socialists had two representatives in Congress and high water marks of 6% of the popular vote in the 1912 presidential election for socialist Eugene Debs and almost 17% in 1924 for socialist-supported progressive Robert La Follette. (By contrast, Gary Johnson won 3% and Jill Stein won 1% of the popular vote in 2016.) Within 25 years, discredited by American politics and world events, the Socialists were more or less irrelevant in the U.S., winning less than 0.1% of the presidential vote from the 1950’s on. Some people did speak out, but the tide was against them. You and I weren’t around then, in any case.

Then they came for the Trade Unionists — much of this happened in our lifetimes. Labor unions were highly influential in the United States for most of the 20th century, especially during what’s known as the Great Compression, the period following the New Deal reforms when income inequality declined dramatically. From Wikipedia:

This “middle class society” of relatively low level of inequality remained fairly steady for about three decades ending in early 1970s, the product of relatively high wages for the US working class and political support for income leveling government policies.

This is clear in this chart that tracks income inequality over time. This income leveling correlates well to labor union membership, which grew steadily during the New Deal, and eventually reached almost 35% — more than 1 in 3! — of salaried workers in 1954. Union membership gradually declined from that point. The best statistics start in 1983, when around 20% of the work force still belonged to unions, and they show a further decline of nearly half, to just 11% of the work force in 2015. For my generation (I am in my mid-40’s), and for the so-called new economy, I think we can say that we’ve played out Niemoller’s script pretty much as written: few of us were unionists, and few of us have paid attention as union influence declined throughout our working lives. (Uber rides are cheap and convenient, so who cares if the drivers have no rights and the company can raise its take from fares at will? I bet a hundred years ago there would have been a strike.)

Then they came for the Industrial Cities. Here I am talking about both big (Detroit, MI, population 1.85M in 1950) and small cities (Youngstown, OH, population almost 170K in 1960). Since then, both cities have shrunk by about 60%! The history of these cities (and many more like them) in the 20th century is a complex mix of economics, sociology, and racism, but here’s the world’s shortest introduction:

U.S. cities and industry in the North and Midwest grew rapidly in the late 1800’s and early 1900’s, powered by a huge wave of European immigrants. Then World War I created a labor shortage, and blacks from the South migrated north to fill the gap. This began the Great Migration of blacks to cities in the North and (eventually) the West. Apart from a short break during the Great Depression, the migration continued for the next 40-50 years, transforming the black population from 80% rural-20% urban to the exact opposite — 80% urban-20% rural — in little more than half a century.

As they came to the white-dominated cities, incoming blacks were steered to neighborhoods with lower economic opportunity and investment, away from whites, a morally bankrupt process known as redlining. (For a history of redlining that goes beyond Wikipedia, read Ta-Nehisi Coates’s mighty essay, The Case for Reparations.) Meanwhile, whites tended to concentrate in their own segregated parts of the same cities, and eventually to move away from the cities altogether as road networks improved (white flight). Some of the industrial jobs that brought blacks to the cities went to the suburbs with the white population, some began to go overseas, and some were lost to automation, with the latter trend showing no signs of abating. In just 20 years (1967-1987), Philadelphia, New York City, Detroit, and Chicago all lost over 50% of their manufacturing jobs. Some cities managed to reorient around the technology or financial sectors (Boston, New York), while others lost significant population and income that they’ve never regained. (For more on the relationship between lost industrial jobs and inner city poverty, you can read William Julius Wilson; here is a short summary paper.)

Who spoke out for these cities, for their minority populations and for their financially starved schools? Not those whites who moved to the suburbs to get away from blacks, and not the mainstream Democratic party of the 1980’s and 90’s, which decided it needed to distance itself from inner city concerns in order to win elections. Maybe you did?

Then they came for the Heartland. Or perhaps for the U.S. manufacturing economy, depending on whether you prefer to view things geographically or economically. This has been the subject of much debate since the election, because Trump’s margin of victory came from three “Rust Belt” states (PA, MI, WI) that he had been expected to lose, and because it’s believed that his margin of victory in those states came from dissatisfied white working class voters who were seeking change or lashing out, whether at bullying elites or at bullied minorities or immigrants. (This is probably a lousy explanation of voter dynamics in Pennsylvania.) Frustratingly, these discussions have devolved into arguing over whether these voters are motivated by economic or social-cultural factors (as though it were easy to separate the two) and whether they deserve sympathy (as though we’ve never seen anyone bully others while being bullied at the same time).

No matter how you view the politics here, we should be able to come to some kind of agreement on the economic facts. Here’s a recent article suggesting that the Rust Belt is not a struggling region, because other parts of the country are worse off: the states we are talking about are all in the middle third of U.S. states by median income. Sure, they’re not Connecticut or California, but they’re not Arkansas or Mississippi either. This is true enough as far as it goes, but it fails to consider decline. As a quick back-of-the-envelope exercise, I pulled up the income data and sorted the states top to bottom by median income as of 2015 and as of 1995. Here’s what I found:

  • The state with the biggest decline in ranking over the last 20 years (from #5 to #28)? Wisconsin.
  • The state with the second biggest decline (from #14 to #31)? Michigan.
  • Five of the ten states with the biggest drop in ranking form a contiguous region at the heart of the Rust Belt: Wisconsin, Michigan, Illinois, Indiana, and Ohio. (There are no other clusters; the other five states are completely non-contiguous.)
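
For anyone who wants to redo this back-of-the-envelope exercise, here is a rough sketch of the ranking comparison in Python. The state names and income figures below are made-up placeholders, not the census data I used, and the function names are just for illustration; plug in the real median income tables for the two years to reproduce the rankings above.

```python
def rank_by_income(median_income):
    """Map each state to its rank, where 1 is the highest median income."""
    ordered = sorted(median_income, key=median_income.get, reverse=True)
    return {state: position for position, state in enumerate(ordered, start=1)}


def rank_changes(income_then, income_now):
    """Return (state, old_rank, new_rank) tuples, biggest fall in rank first."""
    then, now = rank_by_income(income_then), rank_by_income(income_now)
    changes = [(state, then[state], now[state]) for state in then if state in now]
    return sorted(changes, key=lambda c: c[2] - c[1], reverse=True)


# Placeholder numbers for illustration only; these are NOT real census figures.
income_1995 = {"State A": 52_000, "State B": 48_000, "State C": 45_000, "State D": 41_000}
income_2015 = {"State A": 50_000, "State B": 58_000, "State C": 55_000, "State D": 53_000}

for state, old_rank, new_rank in rank_changes(income_1995, income_2015):
    print(f"{state}: #{old_rank} -> #{new_rank}")
```

Note that this compares rank, not income itself: a state can fall in the ranking even if its median income rises, as long as other states rise faster.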

At this point, these Rust Belt jobs are more likely lost to automation than to other states or countries (though there is some of the latter) or incoming immigrants from lower-income countries (virtually none coming into the Midwest). So it’s hardly clear what to do. Still, there’s not much sympathy for these people’s plight on my Facebook feed, where most people (1) have college or graduate degrees and professional occupations, (2) live in desirable areas, and (3) are frustrated and terrified, as I am, to see Trump in power.

Then they came for — OK, who’s next? Are you? What job will you have in twenty years, or what job will your children have, when:

  • Economic opportunity appears to be declining, not increasing, across generations.
  • The technology sector, which is not very labor intensive, and also finds it easy to move jobs around or offshore, is coming to dominate the (non-service) economy. (Just to give you an idea, Facebook is the 7th most valuable company in the world and employs only 13,000 people, a fraction of the workforce of the industrial behemoths of 50 years ago, whose employees numbered in the hundreds of thousands.)
  • Automation is increasing, and potentially applies to more and more jobs, both in and out of the tech sector, as technology itself becomes ever more capable.
  • The gig economy is rising, and there doesn’t appear to be an obvious limit on what kind of jobs can be gig-ized. (Gig workers have even less leverage than long-term employees.)
  • The financial industry, which has grown rapidly over the last several decades, provides little economic value relative to the human capital invested in it, and might be ripe for contraction. You could speculate that the same might be true of marketing as well (another industry where too many of the well-educated settle), or at least that it could be done more cheaply. If our news stories can be replaced by crap written on the cheap in Eastern Europe, why not the ads?

Shifting into my ominous Rod Serling voice: with the manufacturing economy gone, both in the cities and out, with more and more people shifted into low-wage service jobs, with labor as powerless as it seems it’s ever been, with our economy sliced up into strata, and the humans in each one of those strata cut away in turn — who will be left to speak for you when they come for you? And how will you feel with Niemoller’s warning, turned into prophecy now, running on a loop over and over, not just on your Facebook feed but inside your head?

Elvis Costello Resists


 

At the start of October, I went to see Elvis Costello at Town Hall, a small theater near Times Square with a history of hosting political meetings. You can see framed programs from some of those meetings, hazy, fading memories of the past, as you walk into the building:

img_2192

The show had been loosely billed as a review of Elvis’s life and career, aligned with the memoir he released last summer, and I was looking forward to a few hours spent listening to him resculpting his songs in new and deeply satisfying ways. (I still felt warm from the time I saw him carry this off several years ago, at Carnegie Hall of all places, when he opened the show by playing the entire first side of My Aim is True, sang songs from all over the map, and told the kind of stories you might hear from the cool uncle you wish you had.)

For the first half hour, the show felt much like that night at Carnegie Hall. The stage was dominated by a giant TV, which showed 80’s Elvis videos as we waited for him to come on, then, once he did, ran a slide show of old family photos (hazy snapshots from the past again) and pictures from his early tours behind him as he played (solo, with acoustic guitar). He gave us both old and recent standards, culminating in a lovely rearrangement of Everyday I Write the Book. But as I burrowed deeper into my plush chair, getting comfortable for a couple more hours with Elvis’s fabulous songbook, he walked stage right, sat down at the piano, and smashed the time machine to bits.

In a halting, almost shattered voice, accompanied by the most austere, jagged piano figures you could imagine, he sang Shipbuilding, perhaps his greatest song. Written at the time of the Falklands war, Shipbuilding is a biography of out-of-work men in the English shipyards, waiting for the war to restart the shipyards and bring them jobs,

A new winter coat and shoes for the wife
And a bicycle on the boy’s birthday

and also for the warships that emerge from the shipyards to take those boys away, and bring them to their graves. “Is it worth it,” Elvis asks, and then asks again in a later verse that I hadn’t known before,

A small bunch of flowers is all that you get
And a box to bury the baby

I had heard Elvis sing the song before, but not like this. I wish I knew how to describe the cocktail of regret and loss and compromise and inevitability I heard in his voice, but all I can do is point you to another song: if you listen to Springsteen’s Highway Patrolman, you’ll catch the same emotions when you hear Bruce’s voice trail off as he sings When it’s your brother, sometimes you look the other way at the end of the second verse. Hearing Elvis now, you could see shipyards, but also car factories on the outskirts of town, and mom and pop stores on Main Street, all of them filled with people’s dread of decisions they might know they’ll come to regret but still can’t avoid making. By the time the song’s resolution — diving for dear life, when we should be diving for pearls — exploded from the stage, you felt the full weight of the present day, of this election. Somehow, in just four minutes, Elvis had managed to make you feel the dilemmas of Trump voters deep down in your bones, more convincingly than any analysis I’ve read before or since the election. But he was only getting started.

He sang Deep Dark Truthful Mirror, an answer record in some ways, about facing up to the consequences of whatever choices you’ve made. Then, still at the piano, he started to tell us about a musical he was working on with the playwright Sarah Ruhl. It is an adaptation of the 1957 film A Face in the Crowd, which tells the story of the media-driven rise and fall of an American demagogue. If hearing Elvis sing Shipbuilding wasn’t enough to convince you he was addressing today’s political climate, the new songs he had written for the musical made it even plainer. He sang as the anti-hero, promising the angry and ignored to be their champion. He sang silly jingles that scoot back and forth across your screen and sell you silly things. And he sang for and with the faces in the crowd, all of us looking ourselves in the mirror and trying to find some common ground. The songs were theatrical, raw, maybe still works in progress — but they felt necessary and Elvis got them across. I am no fan of musicals, but I can’t wait to see what these songs will be like, and what they will mean, by the time they hit the stage, in the full blossoming of Trump’s America.

And then the finale. Given the range of Elvis’s songbook, it’s ironic that the song he is most associated with in the public imagination, What’s So Funny ’Bout Peace, Love, and Understanding, is one he didn’t even write. But it is, and you won’t feel like you got your money’s worth unless you hear him play it, will you? Still, I’m not sure the song ever felt like it had more at stake than when Elvis sang it at Town Hall to close this show. And, as he sang it, the giant TV behind him projected a late 70’s, Armed Forces-era poster with the headline of the evening, Elvis’s answer to the America of Donald J. Trump, the next president of the United States: DON’T JOIN!

img_2190

Soon it was over, and it was hard to believe we actually had to leave the building. But we did, and as I walked downstairs, the faded programs on the walls, from Town Meetings of long ago, were suddenly as sharp and timely as anything you might see on TV or on your Facebook feed. The past is not dead, not even past, as Faulkner said, and now all of us have many more town meetings to go to.

The Models Were Telling Us Trump Could Win

Nate Silver got the election right.

Modeling this election was never about win probabilities (i.e., saying that Clinton is 98% likely to win, or 71% likely to win, or whatever). It was about finding a way to convey meaningful information about uncertainty and about what could happen. And, despite the not-so-great headline, this article by Nate Silver does a pretty impressive job.

First, let’s have a look at what not to do. This article by Sam Wang (Princeton Election Consortium) explains how you end up with a win probability of 98-99% for Clinton. First, he aggregates the state polls, and figures that if they’re right on average, then Clinton wins easily (with over 300 electoral votes I believe). Then he looks for a way to model the uncertainty. He asks, reasonably: what happens if the polls are all off by a given amount? And he answers the question, again reasonably: if Trump overperforms his polls by 2.6%, the election becomes a toss-up. If he overperforms by more, he’s likely to win.

But then you have to ask: how much could the polls be off by? And this is where Wang goes horribly wrong.

The uncertainty here is virtually impossible to model statistically. US presidential elections don’t happen that often, so there’s not much direct history, plus the challenges of polling are changing dramatically as fewer and fewer people are reachable via listed phone numbers. Wang does say that in the last three elections, the polls have been off by 1.3% (Bush 2004), 1.2% (Obama 2008), and 2.3% (Obama 2012). So polls being off by 2.6% doesn’t seem crazy at all.

For some inexplicable reason, however, Wang ignores what is right in front of his nose, picks a tiny standard error parameter out of the air, plugs it into his model, and basically says: well, the polls are very unlikely to be off by very much, so Clinton is 98-99% likely to win.

Always be wary of models, especially models of human behavior, that give probabilities of 98-99%. Always ask yourself: am I anywhere near 98-99% sure that my model is complete and accurate? If not, STOP, cross out your probabilities because they are meaningless, and start again.
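
To see how much the headline number depends on that one parameter, here is a deliberately crude sketch in Python. It is not Wang's actual model: I am assuming the polling error is normally distributed around zero, that Clinton wins whenever Trump overperforms by less than the 2.6-point toss-up threshold mentioned above, and the candidate standard deviations in the loop are hypothetical values I picked for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clinton_win_probability(tossup_margin, error_sd):
    """P(polling error stays below the toss-up margin) under a Normal(0, error_sd) error."""
    return normal_cdf(tossup_margin / error_sd)


TOSSUP_MARGIN = 2.6  # Trump overperformance (in points) that makes the race a toss-up

# Hypothetical standard deviations for the polling error, in points.
for error_sd in (1.0, 2.0, 3.0, 4.0):
    p = clinton_win_probability(TOSSUP_MARGIN, error_sd)
    print(f"assumed polling-error SD {error_sd:.1f} pts -> P(Clinton wins) = {p:.1%}")
```

Under these assumptions, a standard deviation of about one point gives a probability above 99%, while four points gives something closer to three in four. Same polls, same model, wildly different headline, all from a number nobody can really estimate.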

How do you come up with a meaningful forecast, though? Once you accept that there’s genuine uncertainty in the most important parameter in your model, and that trying to assign a probability is likely to range from meaningless to flat-out wrong, how do you proceed?

Well, let’s look at what Silver does in this article. Instead of trying to estimate the volatility as Wang does (and as Silver also does on the front page of his web site; people just can’t help themselves), he gives a careful analysis of some possible specific scenarios. What are some good scenarios to pick? Well, maybe we should look at recent cases of when nationwide polls have been off. OK, can you think of any good examples? Hmm, I don’t know, maybe…

brexit-headlines

Aiiieeee!!!!

Look at the numbers in that Sun cover. Brexit (Leave) won by 4%, while the polls before the election were essentially tied, with Remain perhaps enjoying a slight lead. That’s a polling error of at least 4%. And the US poll numbers are very clear: if Trump overperforms his polls by 4%, he wins easily.

In financial modeling, where you often don’t have enough relevant history to build a good probabilistic model, this technique — pick some scenarios that seem important, play them through your model, and look at the outcomes — is called stress testing. Silver’s article does a really, really good job of it. He doesn’t pretend to know what’s going to happen (we can’t all be Michael Moore, you know), but he plays out the possibilities, makes the risks transparent, and puts you in a position to evaluate them. That is how you’re supposed to analyze situations with inherent uncertainty. And with the inherent uncertainty in our world increasing, to say the least, it’s a way of thinking that we had all better start becoming really familiar with.
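
Here is what a bare-bones stress test might look like in Python. This is a toy, not Silver's model: the only input taken from the discussion above is the 2.6-point toss-up threshold, the scenario sizes are the polling misses quoted earlier, and the half-point "too close to call" band is a hypothetical cutoff of my own.

```python
TOSSUP_MARGIN = 2.6  # Trump overperformance (in points) that makes the race a toss-up
TOSSUP_BAND = 0.5    # hypothetical: anything this close to the threshold is too close to call


def outcome(trump_overperformance):
    """Play one polling-error scenario through the toy model and name the result."""
    cushion = TOSSUP_MARGIN - trump_overperformance
    if abs(cushion) <= TOSSUP_BAND:
        return "toss-up"
    return "Clinton wins" if cushion > 0 else "Trump wins"


# Scenarios: size of the polling miss in Trump's favor, in points.
scenarios = {
    "polls are exactly right": 0.0,
    "miss the size of 2008": 1.2,
    "miss the size of 2012": 2.3,
    "Brexit-sized miss": 4.0,
}

for name, miss in scenarios.items():
    print(f"{name:26s} ({miss:.1f} pts): {outcome(miss)}")
```

No probabilities anywhere, and that is the point: you pick scenarios you have reason to take seriously, run each one through, and look at what comes out.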

The models were plain as day. What the numbers were telling us was that if the polls were right, Clinton would win easily, but if they were underestimating Trump’s support by anywhere near a Brexit-like margin, Trump would win easily. Shouldn’t that have been the headline? Wouldn’t you have liked to have known that? Isn’t it way more informative than saying that Clinton is 98% or 71% likely to win based on some parameter someone plucked out of thin air?

We should have been going into this election terrified.