The Optimizer

How Google’s Jeff Dean became the Chuck Norris of the Internet.

Jan 23, 2013, 10:20 AM

Jeff Dean of Google.

Courtesy Google.

“The speed of light in a vacuum used to be about 35 mph. Then Jeff Dean spent a weekend optimizing physics.”—Jeff Dean Facts

Jeff Dean facts aren’t, well, true. But the fact that someone went to the trouble to make up Chuck Norris-esque exploits about Dean is remarkable. That’s because Jeff Dean is a software engineer, and software engineers are not like Chuck Norris. For one thing, they’re not lone rangers—software development is an inherently collaborative enterprise. For another, they rarely shoot cowboys with an Uzi.

Nevertheless, on April Fool’s Day 2007, some admiring young Google engineers saw fit to bestow upon Jeff Dean the honor of a website extolling his programming achievements. For instance:

Compilers don’t warn Jeff Dean. Jeff Dean warns compilers.
Jeff Dean writes directly in binary. He then writes the source code as documentation for other developers.
When Jeff Dean has an ergonomic evaluation, it is for the protection of his keyboard.
Jeff Dean was forced to invent asynchronous APIs one day when he optimized a function so that it returned before it was invoked.

Here’s a true Jeff Dean fact: You have to be a computer whiz to understand most of the jokes that people tell about Jeff Dean. (For those interested, Business Insider offers helpful explanations of some of the more popular ones.) But if his fake accomplishments are hard to understand without a real computer-science background, his real accomplishments are even more abstruse. The programs that Dean was instrumental in building—MapReduce, BigTable, Spanner—are not the ones most Google users associate with the company. But they’re the kind that made Google—and, consequently, much of the modern Web as we know it—possible. And the projects he’s working on now have the potential to revolutionize information technology once again.

When you think of the people who built today’s Web, you probably conjure founders and CEOs: Tim Berners-Lee, Marc Andreessen, Larry Page and Sergey Brin, maybe Mark Zuckerberg. That makes sense: Each of those people invented a product or framework that shaped how we use the Internet.

Meanwhile, in the shadows of these giants—all of whom have graduated from day-to-day gruntwork—are legions of faceless developers who tap away at keyboards every day to build the products and systems we all use. In the tech world, more so than in most other industries, those employees are far from interchangeable. A great accountant might save you 5 percent on your taxes. A great baseball player will reach base just a bit more often than a mediocre baseball player. But a great software developer can do in a week what might take months for a team of 10 lesser developers—the difference is exponential rather than marginal. That’s not a Jeff Dean fact; it’s conventional wisdom in Silicon Valley, which is why the best companies go to such great lengths to attract top talent.

Dean arrived at Google in mid-1999 having already earned a reputation as one of the country’s top young computer scientists. Growing up when home computing power was just blossoming, Dean says he was always looking for ways to push the limits of what you could do on a given machine. As a high schooler, he wrote software for analyzing vast sets of epidemiological data that he says was “26 times faster” than what professionals were using at the time. The system, called Epi Info, has been adopted by the Centers for Disease Control and translated into 13 languages. And as a Ph.D. student in computer science, he worked on compilers, programs that translate source code into a language that a computer can readily execute. “I’ve always liked code that runs fast,” he explains matter-of-factly.

But Dean has always been restless in his interests, and he didn’t want to work on compilers his whole life. So he left academia and landed less than three years later at Google, which had only about 20 employees at the time. (According to Steven Levy’s book In the Plex, the search startup saw Dean as something like a prized draft pick.) After contributing some important early work to Google News and AdSense, the advertising product that would rewrite the rules of the Internet economy, he turned his attention to one of the company’s core problems: scale.

Google’s founding ideas came from Page and Brin, world-class developers in their own right. In the late 1990s they built PageRank, an algorithm for returning the most relevant results to a given search query. The focus on relevance put Google on a course to surpass Yahoo, AltaVista, and the day’s other leading search engines. But as the upstart grew in popularity, it faced a tremendous computing challenge. “We couldn’t deploy machines fast enough” to keep up with demand, Dean recalls.

So Dean, working with fellow standout programmer Sanjay Ghemawat and others, did what he had done in high school with Epi Info: found software solutions to what seemed like hardware problems. Ghemawat helped lead a team that built the Google File System, which allowed for huge files to be efficiently distributed across thousands of cheap servers. Then Dean and Ghemawat developed a programming tool called MapReduce that allowed developers to efficiently process gargantuan data sets with those machines working in parallel. Much as a compiler allows a programmer to write code without worrying about the nitty-gritty of how the CPU will process it, MapReduce allowed Google’s developers to tweak the search algorithm or add new computations without having to worry about how to parallelize the operation or handle equipment failures.
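For readers who want to see the shape of the idea, here is a minimal single-machine sketch of the map-and-reduce pattern, using word counting as the example. It is an illustration only, not Google's implementation: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the real system's distribution across servers and its handling of failures are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: combine all the values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Hypothetical driver: run every mapper, group values by key, then reduce.

    A real framework would spread the map and reduce calls over many
    machines and retry failed work; this sketch runs them in one loop.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


documents = ["the cat sat", "the cat ran", "a dog sat"]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
```

The point of the pattern is that the programmer writes only the two small functions; the driver decides where and how they run, which is the part a production system would parallelize.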

Dean and Ghemawat’s approach was so powerful that, when they explained it to the public in a seminal 2004 research paper, it quickly became an industry standard. Today it underlies, among other things, Hadoop, the open-source framework that has helped make “big data” a buzz phrase in industries ranging from online travel to energy exploration. And while Google is beginning to move beyond MapReduce for some of its core operations, Dean says he still sees a big spike in usage when a new crop of summer interns arrives and begins working on new projects.

MapReduce is a good example of what Google co-founder Page is talking about when he talks about “10x”—doing things 10 times better, not 10 percent better than they’ve been done before. MapReduce didn’t make one type of operation a little faster. It allowed every programmer at Google to do things they might never have attempted otherwise.

Several of Dean’s other projects have had similarly exponential effects. Building on Google File System, he and Ghemawat helped create a distributed data storage system called BigTable that could handle petabytes of data. (A petabyte is 1 million gigabytes.) Then they went further and developed Spanner, which has been called the “world’s largest single database.” Thanks to an innovative approach to timekeeping, Spanner “stretches across the globe while behaving as if it’s all in one place,” in the words of Wired’s Cade Metz. In other words, it can keep information consistent across a worldwide network of data centers even though a given update may take longer to travel to some locations than others. Metz adds, “Before Spanner was revealed, many didn’t even think it was possible.”

At this point, the real facts about Jeff Dean may be starting to sound a bit like fake Jeff Dean facts. Dean himself laughs at the phenomenon, calling it “a little embarrassing, but flattering too.” The thing to keep in mind, he says, is that his real accomplishments are almost always the product of collaboration.

Almost every morning, he comes into work at the GooglePlex in Mountain View, Calif., and sits down for coffee with the same core group of people. “We’ve made 20,000 cappuccinos together” over the years, he estimates. These people don’t all work together. In fact, as Google has grown, some have moved to different buildings on opposite sides of the campus. But when they get together to dish about what they’re doing, their problems spark ideas in one another, Dean says. These coffee talks are what has enabled Dean to put his expertise in optimization, parallelization, and software infrastructure to work on such a wide array of projects. That and healthy doses of ambition and confidence. “He’s always very enthusiastic and optimistic about how much we can get done,” says Ghemawat, his longtime collaborator. “It’s hard to discourage him.”

His latest endeavors may be a good indicator of where Google is headed next. Last year, he collaborated with Stanford machine-learning expert and Coursera co-founder Andrew Ng to help one of Ng’s grad students, Quoc Le, perform a groundbreaking experiment in unsupervised machine learning. The study, performed at the secretive Google X “skunkworks,” put 16,000 processors to work studying YouTube videos without human oversight—and they came out with a pretty good idea of what a cat looks like. That might sound like a lot of computers producing a fairly basic result. But it could help lay the groundwork for the next generation of artificial intelligence, with potential applications ranging from “personal assistant” technologies like Google Now to image-search functions that could come in handy for Project Glass.

The Jeff Dean of “Jeff Dean facts” could probably invent things like that just by typing zeroes and ones on his special keyboard. The real Jeff Dean admits he isn’t a machine-learning expert but says he’s eager to help out with his skills in building scalable, high-performance systems.

Contrary to what the Jeff Dean facts imply, Dean says simply sitting down to write the perfect program is rarely the best way to tackle a problem. Instead, his process often begins with back-of-the-envelope calculations to find the optimal trade-off between quality and speed for a given process. “In a lot of these areas, from machine translation to search quality, you’re always trying to balance what you can do computationally with each query,” he says. “Maybe you can’t afford the ideal [solution], but if we can approximate it in a certain way, you can get 98 percent of the benefit with 1 percent of the computation.”

Dean does this so often that he has developed a list of “numbers that every computer engineer should know”: things like how many milliseconds it takes to send a packet from California to Amsterdam and back (about 150). Keep those in mind, he says, and many times “in 20 minutes on a whiteboard you can figure out which of three designs is going to be better.” Can’t do calculations that fast? That’s OK, Dean says. “Convert everything to rough powers of two and then it’s easier to multiply.”
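The two habits described above can be sketched in a few lines. The first part computes a physical lower bound for the California-to-Amsterdam round trip; the second shows why rounding to powers of two makes multiplication easy, since the exponents simply add. The distance and the speed of light in fiber are assumptions chosen for illustration, not figures from Dean, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_KM_PER_S = 299_792   # speed of light in a vacuum
ASSUMED_DISTANCE_KM = 8_800         # assumption: rough great-circle distance
ASSUMED_FIBER_FRACTION = 2 / 3      # assumption: light in fiber travels at about 2/3 c


def round_trip_ms(distance_km: float, speed_km_per_s: float) -> float:
    """Time in milliseconds to cover the distance twice at the given speed."""
    return 2 * distance_km / speed_km_per_s * 1000


def exponent_of_two(x: float) -> int:
    """Exponent of the power of two closest to x on a log scale."""
    return round(math.log2(x))


# Part 1: lower bounds on the round trip, ignoring routing and switching.
vacuum_ms = round_trip_ms(ASSUMED_DISTANCE_KM, SPEED_OF_LIGHT_KM_PER_S)
fiber_ms = round_trip_ms(
    ASSUMED_DISTANCE_KM, SPEED_OF_LIGHT_KM_PER_S * ASSUMED_FIBER_FRACTION
)

# Part 2: estimate 1,000 x 1,000,000 by adding exponents instead of multiplying.
records_exp = exponent_of_two(1_000)       # 1,000 is close to 2**10
users_exp = exponent_of_two(1_000_000)     # 1,000,000 is close to 2**20
estimate = 2 ** (records_exp + users_exp)  # 2**30, within about 7% of 10**9
```

Under these assumptions the bounds come out near 59 ms in a vacuum and 88 ms in fiber, so a quoted figure of about 150 ms is plausible for a real network path that is longer than the great-circle route and adds equipment delays.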

If Dean has a superhuman power, then, it’s not the ability to do things perfectly in an instant. It’s the power to prioritize and optimize and deal in orders of magnitude. Put another way, it’s the power to recognize an opportunity to do something pretty well in far less time than it would take to do it perfectly. In Silicon Valley, that’s much cooler than shooting cowboys with an Uzi.
