As long as the status remains EVERYTHING_IS_GREAT, the computer will continue going through the “if” statements and, because there’s no DANGER, it skips the “goto fail” statements.
Except it doesn’t. What the computer actually makes of the code above is this:
OSStatus status = EVERYTHING_IS_GREAT;
if ((status = DoSomeSecurityStuff) is DANGER)
    goto fail;
if ((status = KeepDoingSecurityStuff) is DANGER)
    goto fail;
goto fail;
if ((status = DoTheMostImportantSecurityStuff) is DANGER)
    goto fail;
An “if” statement only controls the first statement after it. So if a second statement follows, like a second “goto fail,” that statement executes every time, no matter what the “if” decided. In other words, the computer always jumps to “fail” before it reaches that last “if” statement, the one with DoTheMostImportantSecurityStuff in it, so that check never executes (which is why I crossed it out). Even worse, if KeepDoingSecurityStuff returns EVERYTHING_IS_GREAT, then status is still EVERYTHING_IS_GREAT at that point, even though DoTheMostImportantSecurityStuff never happened. And so the security check “succeeds” when it shouldn’t have.
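To see the effect in real C rather than my simplified version, here is a minimal sketch. Everything in it is a hypothetical stand-in, not Apple's code: the Status type, the three lowercase check functions, and the most_important_ran flag exist only for illustration. The last check is rigged to report DANGER, so any correct verification has to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the status values used in the article. */
typedef int Status;
enum { EVERYTHING_IS_GREAT = 0, DANGER = -1 };

/* Records whether the final check was ever called. */
static bool most_important_ran = false;

static Status do_some_security_stuff(void) { return EVERYTHING_IS_GREAT; }
static Status keep_doing_security_stuff(void) { return EVERYTHING_IS_GREAT; }
static Status do_the_most_important_security_stuff(void) {
    most_important_ran = true;
    return DANGER; /* rigged: this check rejects the connection */
}

/* Same shape as the bug: a duplicated goto that no "if" controls. */
static Status verify_buggy(void) {
    Status status = EVERYTHING_IS_GREAT;
    if ((status = do_some_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
    if ((status = keep_doing_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
    goto fail; /* always runs, so the check below is never reached */
    if ((status = do_the_most_important_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
fail:
    return status;
}

/* Identical, minus the duplicated line. */
static Status verify_fixed(void) {
    Status status = EVERYTHING_IS_GREAT;
    if ((status = do_some_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
    if ((status = keep_doing_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
    if ((status = do_the_most_important_security_stuff()) != EVERYTHING_IS_GREAT)
        goto fail;
fail:
    return status;
}

/* Exercises both versions; call it from main to run the comparison. */
void demo(void) {
    most_important_ran = false;
    assert(verify_buggy() == EVERYTHING_IS_GREAT); /* wrongly "succeeds" */
    assert(!most_important_ran);                   /* last check skipped */
    assert(verify_fixed() == DANGER);              /* correctly fails */
    assert(most_important_ran);
}
```

The buggy version reports success without ever calling the final check, while the version without the stray line reports DANGER as it should. The only difference between the two functions is that one line.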
How on earth did that second “goto fail” get added? Some conspiracy buffs have suggested it’s a clever National Security Agency hack to enable them to spy on everyone that much more easily. I don’t buy it. I also don’t think that an engineer made the change in a manual edit; I hope not, at any rate. Incremental changes to crucial security code are generally made with great care, and in any professional environment, the changes are always reviewed by another engineer. Slip-ups like this one are more common when a programmer isn’t actually working seriously on the code but doing some sort of maintenance or housekeeping on it.
Most likely (based on this diff, or set of changes, between the nonbugged and bugged versions of the code), the extra line crept in when a programmer did a partially automated merge of two different versions of this source code, and the resulting file was approved without anyone looking too closely at it. Code bases contain different “versions” and “branches” of code so that people can work on long-term projects and short-term fixes without stepping on one another’s toes. From time to time, you have to merge the changes made between branches. So if I were at Apple, I might have to merge two different versions of this file, and I’d look it over to make sure it was OK. I might not pay too close attention to every change, because I’d assume that a change simply reflected a difference between the two versions, and not the introduction of new, buggy code. I could be wrong, but this is mission-critical code, and I cannot believe Apple was going at it with a hacksaw.
Preventing bugs like these is one of the biggest challenges of software engineering, and this incident should make it pretty damn clear why. A single extra line of code compromised the security of millions and millions, and no one caught it for more than a year.
Even in a rigorous environment of code reviews, automated testing, and high-quality development, bugs like this do slip through. The sheer complexity of today’s code makes it very difficult to catch everything. Some people have bashed the code, saying that it’s too sloppy and careless. While I’m not thrilled with all the stylistic and structural choices, I think the code is reasonably good by today’s standards. Apple wouldn’t have released the code as open source if it weren’t good, and even if Apple had, there would have been quite an outcry from the open-source community once its members looked it over and found it to be garbage. The people who wrote it knew what they were doing. The bug is indeed the result of negligence, but the sad and scary thing is that, for software developers, this is not a “What were they thinking?” moment. It’s an “Oh my God, that could have been me” moment.