How Google’s Jeff Dean became the Chuck Norris of the Internet.
Dean and Ghemawat’s approach was so powerful that, when they explained it to the public in a seminal 2004 research paper, it quickly became an industry standard. Today it underlies, among other things, Hadoop, the open-source framework that has helped make “big data” a buzz phrase in industries ranging from online travel to energy exploration. And while Google is beginning to move beyond MapReduce for some of its core operations, Dean says he still sees a big spike in usage when a new crop of summer interns arrives and begins working on new projects.
MapReduce is a good example of what Google co-founder Larry Page means when he talks about “10x”—doing things 10 times better than they’ve been done before, not 10 percent better. MapReduce didn’t make one type of operation a little faster. It allowed every programmer at Google to do things they might never have attempted otherwise.
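The idea behind the model is simple to sketch: a programmer supplies just a “map” function and a “reduce” function, and the framework handles splitting the data, grouping intermediate results by key, and running everything in parallel across thousands of machines. A minimal single-machine toy version (not Google’s implementation, just an illustration of the shape of the model), using the word-counting example from Dean and Ghemawat’s 2004 paper:

```python
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    """Toy, single-machine sketch of the MapReduce model: run the mapper
    over every record, shuffle its output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):        # "map" phase
            groups[key].append(value)
    return {key: reducer(key, values)            # "reduce" phase
            for key, values in groups.items()}

# The paper's canonical example: counting word occurrences.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def word_reducer(word, counts):
    return sum(counts)

counts = mapreduce(["the cat sat", "the cat ran"], word_mapper, word_reducer)
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

The point of the design is that the programmer never writes the distribution, fault-tolerance, or shuffling logic; in the real system, the loop above is replaced by thousands of machines doing the same two phases in parallel.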
Several of Dean’s other projects have had similarly exponential effects. Building on Google File System, he and Ghemawat helped create a distributed data storage system called BigTable that could handle petabytes of data. (A petabyte is 1 million gigabytes.) Then they went further and developed Spanner, which has been called the “world’s largest single database.” Thanks to an innovative approach to timekeeping, Spanner “stretches across the globe while behaving as if it’s all in one place,” in the words of Wired’s Cade Metz. In other words, it can keep information consistent across a worldwide network of data centers even though a given update may take longer to travel to some locations than others. Metz adds, “Before Spanner was revealed, many didn’t even think it was possible.”
At this point, the real facts about Jeff Dean may be starting to sound a bit like fake Jeff Dean facts. Dean himself laughs at the phenomenon, calling it “a little embarrassing, but flattering too.” The thing to keep in mind, he says, is that his real accomplishments are almost always the product of collaboration.
Almost every morning, he comes into work at the Googleplex in Mountain View, Calif., and sits down for coffee with the same core group of people. “We’ve made 20,000 cappuccinos together” over the years, he estimates. These people don’t all work together. In fact, as Google has grown, some have moved to different buildings on opposite sides of the campus. But when they get together to dish about what they’re doing, their problems spark ideas in one another, Dean says. These coffee talks have enabled Dean to put his expertise in optimization, parallelization, and software infrastructure to work on such a wide array of projects. That and healthy doses of ambition and confidence. “He’s always very enthusiastic and optimistic about how much we can get done,” says Ghemawat, his longtime collaborator. “It’s hard to discourage him.”
His latest endeavors may be a good indicator of where Google is headed next. Last year, he collaborated with Stanford machine-learning expert and Coursera co-founder Andrew Ng to help one of Ng’s grad students, Quoc Le, perform a groundbreaking experiment in unsupervised machine learning. The study, performed at the secretive Google X “skunkworks,” put 16,000 processors to work studying YouTube videos without human oversight—and they came out with a pretty good idea of what a cat looks like. That might sound like a lot of computers producing a fairly basic result. But it could help lay the groundwork for the next generation of artificial intelligence, with potential applications ranging from “personal assistant” technologies like Google Now to image-search functions that could come in handy for Project Glass.
The Jeff Dean of “Jeff Dean facts” could probably invent things like that just by typing zeroes and ones on his special keyboard. The real Jeff Dean admits he isn’t a machine-learning expert but says he’s eager to help out with his skills in building scalable, high-performance systems.
Contrary to what the Jeff Dean facts imply, Dean says simply sitting down to write the perfect program is rarely the best way to tackle a problem. Instead, his process often begins with back-of-the-envelope calculations to find the optimal trade-off between quality and speed for a given process. “In a lot of these areas, from machine translation to search quality, you’re always trying to balance what you can do computationally with each query,” he says. “Maybe you can’t afford the ideal [solution], but if we can approximate it in a certain way, you can get 98 percent of the benefit with 1 percent of the computation.”
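One common way to get that kind of trade-off (an illustrative sketch, not a description of any specific Google system) is sampling: instead of computing a statistic over every record, compute it over a small random fraction and accept a tiny error.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical workload: a statistic over a million records.
data = [random.random() for _ in range(1_000_000)]

exact = sum(data) / len(data)                  # 100 percent of the computation
sample = random.sample(data, len(data) // 100) # 1 percent of the work
approx = sum(sample) / len(sample)

print(exact, approx)  # the two estimates typically agree to within a percent or two
```

The 1 percent sample costs a hundredth of the work, and for many aggregate queries the answer it gives is close enough that users never notice the difference.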
Dean does this so often that he has developed a list of “numbers that every computer engineer should know”—things like how many milliseconds it takes to send a packet from California to Amsterdam and back at the speed of light (about 150). Keep those in mind, he says, and many times “in 20 minutes on a whiteboard you can figure out which of three designs is going to be better.” Can’t do calculations that fast? That’s OK, Dean says. “Convert everything to rough powers of two and then it’s easier to multiply.”
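The whiteboard arithmetic Dean describes might look something like this sketch. The ~150 ms transatlantic round trip comes from the text; the disk-seek figure and the two candidate designs are illustrative assumptions.

```python
# Rough latency figures in milliseconds. The transatlantic round trip is the
# ~150 ms number from the text; the 10 ms disk seek is a typical ballpark
# figure for a spinning disk (an assumption for this sketch).
DISK_SEEK_MS = 10
TRANSATLANTIC_RTT_MS = 150

# Design A: answer a query with four disk seeks in a local data center.
design_a = 4 * DISK_SEEK_MS

# Design B: answer from memory, but fetch one value from a European data center.
design_b = 1 * TRANSATLANTIC_RTT_MS

print(design_a, design_b)  # 40 vs. 150: the local-disk design wins

# Dean's powers-of-two trick: 150 ms is roughly 2**7 = 128 ms, and
# 4 seeks is 2**2 * 10 ms, which makes the mental multiplication easy.
```

A few lines of this kind of arithmetic, done to the nearest power of two, is often all it takes to rule out a design before anyone writes a line of production code.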
If Dean has a superhuman power, then, it’s not the ability to do things perfectly in an instant. It’s the power to prioritize and optimize and deal in orders of magnitude. Put another way, it’s the power to recognize an opportunity to do something pretty well in far less time than it would take to do it perfectly. In Silicon Valley, that’s much cooler than shooting cowboys with an Uzi.