A Firsthand Account of Microsoft’s Employee-Ranking System

Aug. 26 2013 11:56 AM

Tales of an Ex–Microsoft Manager

Outgoing CEO Steve Ballmer’s beloved employee-ranking system made me secretive, cynical, and paranoid.

There were seven or eight of us managers in the conference room, all peers, as well as our own manager. The conference rooms all had large tables, whether heavy varnished oak or cheap plywood. The chairs were the sort that let you lean back with increasing tension, and they had a few levers underneath the seat that raised and lowered it and adjusted the back. I’m a fidgeter, so I played with them a lot during meetings.

David Auerbach

David Auerbach is a writer and software engineer based in New York. His website is http://davidauerba.ch.

We were mostly white, and all men. Each of us had between three and six “direct reports”: nonmanager programmers whom we oversaw. We were the direct reports of our manager. There were lots and lots of managers at Microsoft; it was the only path to advancement, so the company structure became more and more steeply vertical. Once or twice a year, we would all get together and decide how good each of our reports was, by ordering them from best to worst.

The system was called the stack rank.

Following Friday’s news of Microsoft CEO Steve Ballmer’s imminent retirement, postmortems of his lackluster 13-year reign have pointed to stack ranking (which, to be entirely fair, predated him) as both a cause and a symptom of the corporation’s decline. As a software developer and later development lead at Microsoft from 1998 to 2003, I had to evaluate others and be evaluated myself under this system. And I can say that yes, stack ranking is as toxic for innovation and integrity and morale as media reports made it out to be, and then some.

Each report’s name was written on an index card and put on the table. It was a two-step process. First, reports were broadly sorted into four buckets: excellent, good, mediocre, and awful. Then, within each bucket, people were paired for comparison and bubbled up or down. Managers would argue whether a particular report was better or worse than some other manager’s report in the same bucket. Our manager would adjudicate the debates. Some managers were better fighters than others.

A manager’s goals here are twofold. First, you want to place your best people as high as possible on the ladder. This will help them get big bonuses, promotions, and raises, and thus keep them happy and less likely to leave your group or the company. Second, if you’re unfortunate enough to have weak reports, you want either to help them by placing them sufficiently high that they don’t get dinged too badly come the annual review, or to throw them to the wolves and let them get ranked low. If you give up on them, they’ll be put on a dead-end track that marks them as more or less useless. The object is to get them out from under you and make them someone else’s problem.

I was lucky not to have any weak reports. I had been encouraged to take one at an earlier point, but frenetically talked my way out of it, using all the managerese I could summon to argue that he didn’t belong on my team.

My reports seemingly did fine. I got them all into the top two buckets. After the stack rank was over, our boss, who’d overseen the whole meeting, took our stack rank and then went to another, similar meeting one level up, where now we would be stacked and ranked behind our backs, then mixed in with the stack we’d just created, cruelly shuffled like a deck of human cards.

Eventually, the vice presidents would have to bargain among themselves for how many bonuses and raises they would get for their entire organizations, then ration them out according to the stack rank. People would then be assigned one of three grades: 4.0 (Above Average), 3.5 (Average), and 3.0 (Below Average). The very rare 4.5 got you a set of steak knives; a 2.5 meant you were fired (more or less). I’m not sure if Alan Turing himself could have gotten a 5.0.

The stack rank was harmful. It served as an incentive not to join high-quality groups, because you’d be that much more likely to fall low in the stack rank. Better to join a weak group where you’d be the star, and then coast. Maybe the executives thought this would help strong people lift up weak teams. It never worked that way. More often, it just encouraged people to backstab their co-workers, since a co-worker’s loss was your gain.

The stack rank was a zero-sum game: one person could excel only by the amount that others were penalized. And it was applied at every level of the organization. Even if you were in a group of three high performers, it was very likely that one of you would be graded Above Average, one Average, and one Below Average. Unless your manager was a prick or an idiot or both, the ordering would reflect your relative skills, but that never came as too much comfort to the hard-working schlub who just wasn’t as good as the other two.

This was my problem. I had three reports, A, B, and C, and they neatly fit into three categories: C was good, B was great, and A was fantastic. They were all nice and retiring sorts—they weren’t self-promoters, which put them at a disadvantage at Microsoft—and I did want to do well by them. Based on their position in the stack rank, I thought that this would have been a fair assessment of them relative to the company in general:

My Ideal Distribution

  • A: Above Average
  • B: Above Average
  • C: Average

Above Average would get A and B nice bonuses and raises, while C might get a small raise and a decent bonus with an Average. That didn’t happen. My manager told me baldly that this was how it would go:

The Actual Distribution

  • A: Above Average
  • B: Average
  • C: Below Average

My desired rankings were out of the question, since my manager would then have had to steal that extra Above Average from some other manager. I thought that B could live with Average (we were all well-compensated, after all), but rating C as Below Average hurt.

So I argued for C, and my manager said there was exactly one alternative:

The Alternative Distribution

  • A: Average
  • B: Average
  • C: Average

But A had been at the very top of the stack! How could A do worse than people we’d all agreed were weaker programmers? I gave up and let C take the Below Average. This is the zero-sum game at work.
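The arithmetic behind the three distributions can be sketched in a few lines of Python. This is only an illustration under a loud assumption: that a team's allotment behaves like a fixed total of grade points equal to an all-Average team. The article does not say the budget was computed this way; the names `GRADES`, `total_points`, `fits_budget`, and `budget` are hypothetical. Only the grade values (4.0, 3.5, 3.0) and the three distributions come from the text.

```python
# Grade values as given in the article.
GRADES = {"Above Average": 4.0, "Average": 3.5, "Below Average": 3.0}


def total_points(distribution):
    """Sum the grade points across one team's distribution."""
    return sum(GRADES[grade] for grade in distribution.values())


def fits_budget(distribution, budget):
    """True if the distribution spends no more points than the budget allows."""
    return total_points(distribution) <= budget


ideal = {"A": "Above Average", "B": "Above Average", "C": "Average"}
actual = {"A": "Above Average", "B": "Average", "C": "Below Average"}
alternative = {"A": "Average", "B": "Average", "C": "Average"}

# ASSUMPTION (not from the article): the team's budget equals what an
# all-Average team of three would consume.
budget = GRADES["Average"] * 3

for name, dist in [("ideal", ideal), ("actual", actual), ("alternative", alternative)]:
    print(name, total_points(dist), fits_budget(dist, budget))
```

Under that assumption, the actual and alternative distributions both spend exactly the budget, while the ideal one overspends by a full point. That extra point is the Above Average that would have had to come from another manager's team.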

I still feel bad about this.

Then I had to explain things to my reports. This illustrated another problem with the system: It destroyed trust between individual contributors and management, because the stack rank required that all lower-level managers systematically lie to their reports. Why? Because for years Microsoft did not admit the existence of the stack rank to nonmanagers. Knowledge of the process gradually leaked out, becoming a recurrent complaint on the much-loathed (by Microsoft) Mini-Microsoft blog, where a high-up Microsoft manager bitterly complained about organizational dysfunction and was joined by a chorus of hundreds of employees. The stack rank finally made it into a Vanity Fair article in 2012, but for many years it was not common knowledge, inside or outside Microsoft. It was presented to the individual contributors as a system of objective assessment of “core competencies,” with each person being judged in isolation.

When review time came, and programmers would fill out a short self-assessment talking about their achievements, strengths, and weaknesses, only some of them knew that their ratings had been more or less already foreordained at the stack rank. The ones who knew could sometimes be recognized by their flip comments on their performance reviews, like the hot-tempered guy who wrote every year in “Areas to Improve,” “I will try to be less of an asshole.”

They were exceptions, though. If you did know about the stack rank, you weren’t supposed to admit it. So you went through the pageantry of the performance review anyway, arguing with your manager in the rhetoric of “core competencies.” The managers would respond in kind. Since the managers had little control over the actual score and attendant bonus and raise (if any), their job was to write a review to justify the stack rank in the language of absolute merit. (“Higher visibility” was always a good catch-all: Sure, you may be a great coder and work 80 hours a week, but not enough people have heard of you!)

Strangely, this charade would sometimes happen even between managers and their managers, both pretending that they didn’t know about the stack rank that they had recently participated in. This kind of bad faith is more common than you might think. I saw it most vividly in a certain number of party-liners who seemed wholly oblivious to the dissonance between the performance review and the stack rank, as though the two would always magically line up, even though they never did.

This sort of organizational dissembling skews your psyche. After I left Microsoft, I was left with lingering paranoia for months, always wondering about the agendas of those around me, skeptical that what I was being told was the real story. I didn’t realize until the nonstacked performance review time at my new job that I’d become so wary. At the time—inside Microsoft—it just seemed the only logical way to be.
