Jason Kottke: How does a computer program compare to the Web's best link blogger?

Will these machines steal your job?
Sept. 27 2011 7:20 PM

Will Robots Steal Your Job?

We wrote a computer program that replicates Jason Kottke. How does Robottke compare to the Web's best link blogger?

Jason Kottke.

I asked my colleague Chris Wilson to build a robotic Jason Kottke because the human version looked like he could use a break. Kottke launched his eponymous site in 1998, back when blogging was just getting started. The Web has changed enormously in that time, but Kottke.org remains as vital as ever. Part of why I like the site is that Kottke and I share many interests: science, technology, Michael Lewis, typography, The New Yorker, David Foster Wallace, The Wire, and North Korea. But lots of people blog about those subjects. Kottke's genius is his ability to find links that elude the rest of us. Every day, I come across hundreds of URLs on Twitter, Facebook, email, and dozens of blogs. But of the five or six links Kottke posts on a typical day, two or three are brand new to me, and almost every one points me toward an awe-inspiring corner of the Web.

There used to be many Kottkes. During the heyday of blogging, between the late 1990s and about 2006, Kottke-like "link blogs" were a primary format. Sites like Jorn Barger's Robot Wisdom served a useful purpose for readers (there was new stuff online every day, and you went to a link blog to find the best of it) while allowing authors to demonstrate their curatorial mastery over the Web.

There are still several popular link blogs today; Boing Boing, Daring Fireball, and The Drudge Report are canonical examples. But the format is on the decline. Barger, for instance, no longer blogs—instead, he tweets and shares links using his RSS reader. Sites like Reddit, Digg, and Hacker News also allow readers to submit links and rank them by votes. Other services scan blogs, Twitter, and Facebook in search of the most popular stuff (see Gabe Rivera's Techmeme, Memeorandum, and Mediagazer). Several times a day, I get an email from a wonderful service called Summify that sends me the most-shared links among the people I follow on Twitter. Kottke himself now spends less time blogging than building a link-collecting service called Stellar, which lets you follow the things people liked on Twitter, Flickr, YouTube, and Vimeo.

It's obvious why these new systems have superseded Kottke-style blogs. The handmade link blog depends on the ability of one person, or a handful of people, to sort a never-ending stream of new content. "Some days, you just don't want to do it," Kottke says. "You look at so much stuff every day and it all becomes kind of the same, all equally interesting or uninteresting. It's hard to maintain that sense of discovery, that little hit that you get when you find something that you haven't seen before. I've posted 15,000, maybe 20,000 links since I started. I've been whittling down the discovery space of things that are going to be new and interesting."

In computer science parlance, Kottke doesn't scale. That's a shame. While services that collect popular stuff online are useful, they lack any editorial sensibility. The links on Techmeme and Summify represent a horde's view of the Web. The material on Kottke represents one guy's indispensable take. The Web ought to have both kinds of aggregators, but I'd love to see more people starting link blogs that offer a clear editorial vision. But how do you get more of something so hard to do?

Enter Robottke. Over the last few weeks, Chris Wilson has been building a machine that aims to automatically generate links you might find on Kottke.org. Robottke isn't meant to replace flesh-and-blood Kottke; we just want to come up with a list of items that Jason Kottke might link to each day. 

You can check out Robottke here. How does it work? We began by crawling all the sources that Jason Kottke is likely to look at every day: we look at all the sites he links to, and all the stuff that people he follows on Twitter are sharing. The hard part is choosing the best, most Kottke-like links from Robottke's collection. It's helpful that the human Kottke meticulously tags all of his posts with keywords. When Robottke finds a link, it searches for topics that it knows Kottke likes; the more it finds, the higher the article ranks.
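
To make the ranking step concrete, here is a minimal sketch in Python of the "more topics found, higher rank" idea. It is not Robottke's actual code: the function names, the topic list, and the example.com links are all hypothetical, and a real version would presumably weight topics and filter out stale pages.

```python
import re

def score_link(text, liked_topics):
    """Count how many liked topics appear in the text (case-insensitive, whole words)."""
    lowered = text.lower()
    return sum(
        1
        for topic in liked_topics
        if re.search(r"\b" + re.escape(topic.lower()) + r"\b", lowered)
    )

def rank_links(candidates, liked_topics):
    """Sort (url, text) pairs so the link matching the most topics comes first."""
    return sorted(
        candidates,
        key=lambda pair: score_link(pair[1], liked_topics),
        reverse=True,
    )

# Hypothetical topics, standing in for the keywords Kottke tags his posts with.
liked_topics = {"typography", "science", "the wire", "north korea"}

# Hypothetical crawled links: (url, page text).
candidates = [
    ("https://example.com/bike", "The science of how a riderless bike stays balanced"),
    ("https://example.com/stats", "Traffic stats page for a blog"),
    ("https://example.com/fonts", "Typography and science in North Korea"),
]

for url, text in rank_links(candidates, liked_topics):
    print(score_link(text, liked_topics), url)
```

Counting matches this crudely also shows why a keyword-heavy page with no real story behind it can fool the ranking.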

The results can be charming: As I write this, Robottke's top link is to a video that explores the scientific mystery of how a bike can stay balanced when it rolls along without a rider. Robottke can also miss the mark wildly: Other links in today's edition of Robottke include a 2010 Washington Post op-ed about the cost of the Iraq war (too old, too political for Kottke), an interview with Michael Moore (also too political), and a link to the traffic stats page of the Cynical-C Blog (oops—that tricked our algorithm).

My verdict: Robottke is far from perfect, but it's not bad, either. With some more effort and artificial-intelligence expertise, I believe Robottke stands a good chance of succeeding.

When I asked Jason Kottke what he thought, he was diplomatic. "Um," he began, and then paused for a few seconds. "I think some of them are good and some are way off. The biggest thing is that it's not necessarily about the links. People tell me that what they like about my site is what I say about the links—it's how I present the links, how I sell them in a certain sense. And this doesn't do a very good job of that at all."

Kottke is right. Robottke can't write, it can't do headlines, and it can't find the best excerpt to quote. Well, it can't, yet. We'd love to get people to help us improve Robottke; if you're a coder who's got an interest in artificial intelligence and the media, contact Chris Wilson for Robottke's source code. (FYI, the source is in Python.)

We need your help, because the flesh-and-blood Kottke is slowing down. "I started building Stellar because I was basically burned out on blogging," he says. In 2006 and 2007, he would spend nearly all of his waking hours in search of links, and he always found himself frustrated. "Then, over a period of 6 to 8 months, I suddenly started getting a lot more efficient," Kottke says. He attributes the change to experience—"maybe I hit 10,000 hours"—and to new tools, like Twitter. "I found that I could do the site in much less time—now I'm spending only a couple of hours a day on Kottke.org," he says. "Now there are fewer items a day, and I'm writing a little bit less about each thing, and I don't often take the time to really dig into something that's a little more interesting the way I used to."

When I asked if he plans to wind down Kottke.org, he answered cryptically. "I would probably stop working on it as a job, but I don't think I would ever stop blogging," he said. "I feel a certain responsibility to people who are reading, and it is something that I like doing, but I don't like doing it full time, all day anymore."

That's the advantage of Robottke: It will never get burned out.

***

In this series, I've been looking at machines that can replace high-skilled human professionals. I've already covered my dad's profession, pharmacy, and my wife's medical specialty, pathology. Now, it's time to look in the mirror. As a writer, I like to think of myself as having uniquely human skills. I write columns about stuff that human readers care about, and in doing so I try to elicit human emotions—joy, fascination, fury. Machines can't yet mimic this sort of creativity. But as I surveyed efforts to automate journalism—and as we learned from our work on Robottke—I found that my job may not be beyond the capacities of a robot.

I go through the same routine every weekday morning. First, I launch a half-dozen technology news sites and open up dozens of stories. As I scan each headline and skim a bit of the story, an algorithm executes in my head: Is this interesting enough to write about? If not, skip. When I do encounter a story that appeals to me, my brain launches another script: Can I come up with a unique take? Will people care about this? My process isn't conscious, and it's not always so systematic. Often I'll get distracted by email or the phone, and if I've been up all night tending to the baby, I'm likely to spend more time goofing off on Facebook than looking for something to write about. But these are human failings; if I were a robot, I'd follow the algorithm perfectly—and likely come up with better story ideas as a result.

This is the theory behind current efforts to automate journalism: Whether we admit it or not, many writers loosely follow a script when they work. How do you write a wire-service-style news story? You start by noting the most important thing that happened—what the jury found, who got killed, which team won. Then you get into the particulars. Structural rules even dominate more high-minded genres. The former New Yorker writer Dan Baum once passed along this bit of advice from his editor at the vaunted magazine: "You can use any narrative structure you like. … Just know that when I get it, I'm going to take it apart and make it all chronological." So, here's the algorithm to write a New Yorker story: 1) Say what happened first. 2) Say what happened next. 3) If there's more stuff to say, repeat 2); otherwise, end.
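
Taken literally, that three-step recipe is just a loop over events sorted by time. The sketch below is a tongue-in-cheek illustration in Python; the function name and the sample events are made up for the example, and nothing here writes actual prose.

```python
def tell_chronologically(events):
    """Return event descriptions in time order, following the three-step recipe."""
    remaining = sorted(events, key=lambda event: event[0])
    story = []
    # Step 1 handles the earliest event; step 2 handles each one after it.
    while remaining:  # Step 3: if there's more stuff to say, repeat; otherwise, end.
        _when, what = remaining.pop(0)
        story.append(what)
    return story

# Hypothetical events as (order, description) pairs, deliberately shuffled.
events = [
    (3, "The verdict came in."),
    (1, "The trial began."),
    (2, "Witnesses testified."),
]

print(" ".join(tell_chronologically(events)))
```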

At the moment, no computer on earth can write anything like a New Yorker story. But machines can do a serviceable job with more rote pieces. Look at this excerpt from a site called FriscoFan.com, which posts a game recap every time the San Francisco Giants play:

            Giants Batter Rockies 3-1

Ryan Vogelsong worked an impressive seven innings and Mike Fontenot hit a triple to put the Giants ahead in the third as San Francisco slipped past Colorado, 3-1.

With the game scoreless in the bottom of the third, the Giants grabbed the lead when Fontenot drove in Andres Torres and Vogelsong after hitting a triple off starter Jhoulys Chacin.

Vogelsong's record improved to 13-7 on the year after a top-notch outing in which he surrendered no runs on four hits with four strikeouts and no walks. Santiago Casilla worked 2/3 of an inning to pick up the save, his sixth of the season, allowing no runs on no hits while punching out one and walking no one. Chacin got the loss (11-14) and gave up two runs on eight hits with three strikeouts and one walk over seven innings.

That story was written by a computer. Everything on FriscoFan is written by machines. The site was created by a company called Automated Insights (formerly known as StatSheet), which runs nearly 400 other sites like FriscoFan, each one devoted to covering Major League Baseball or NCAA basketball teams. Automated Insights generates stories just as a human hack would. It starts by looking at the stats. Then, it comes up with the perfect tone—if a hometown favorite was defeated, the "writer" will sound crestfallen. Next, the machine consults a huge database of phrases in search of words that match the story told by the numbers. If one team beats another soundly, you get something like this headline: "Giants Batter Rockies."
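
The stats-to-tone-to-phrase pipeline described above can be sketched in a few lines of Python. This is an assumption-laden toy, not Automated Insights' system: the phrase table, the `blowout_margin` threshold, the `headline` function, and the sample scores are all invented for illustration.

```python
# Illustrative phrase library keyed by (tone, closeness); every entry is made up.
PHRASES = {
    ("upbeat", "blowout"): "{winner} Batter {loser}",
    ("upbeat", "close"): "{winner} Slip Past {loser}",
    ("crestfallen", "blowout"): "{loser} Routed by {winner}",
    ("crestfallen", "close"): "{loser} Fall Just Short Against {winner}",
}

def headline(home, home_runs, away, away_runs, blowout_margin=4):
    """Build a headline from the score, written from the home team's point of view."""
    if home_runs == away_runs:
        raise ValueError("this sketch only handles games with a winner")
    # Step 1: look at the stats.
    winner, loser = (home, away) if home_runs > away_runs else (away, home)
    margin = abs(home_runs - away_runs)
    # Step 2: pick a tone; a hometown loss sounds crestfallen.
    tone = "upbeat" if winner == home else "crestfallen"
    # Step 3: find a phrase that matches the story the numbers tell.
    closeness = "blowout" if margin >= blowout_margin else "close"
    text = PHRASES[(tone, closeness)].format(winner=winner, loser=loser)
    return f"{text} {max(home_runs, away_runs)}-{min(home_runs, away_runs)}"

print(headline("Giants", 9, "Rockies", 2))  # hypothetical lopsided score
print(headline("Giants", 1, "Rockies", 3))  # hypothetical hometown loss
```

A production system would draw on far more statistics and a much larger library, but the shape is the same: numbers in, canned phrases out.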

Automated Insights CEO Robbie Allen

Automated Insights' stories are far from original, but they are also not obviously robotic. (I've read a lot of text written by humans that's not nearly so coherent.) And what the machines lack in originality, they make up for in price and speed. Robbie Allen, the founder and CEO of Automated Insights, hires a handful of writers who spend their days coming up with phrases like "top-notch outing" to add to the library. Through automation, he can turn their efforts into an amazing number of articles. "Last year during college basketball season, we generated 64,000 stories," Allen says. "I did a calculation and it would have taken 100 writers writing four stories a day to get similar results."
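
Allen doesn't spell out his calculation, but his figures imply a duration. A quick check in Python, using only the numbers he quotes (the variable names are ours, and the assumption is that every writer files every day):

```python
stories = 64_000                 # stories generated during the season, per Allen
writers = 100                    # hypothetical human staff, per Allen
stories_per_writer_per_day = 4   # per Allen

daily_output = writers * stories_per_writer_per_day  # 400 stories a day
days_needed = stories // daily_output                # 160 days of nonstop filing

print(daily_output, days_needed)
```

In other words, the human newsroom would need 160 days with no days off to match the machine's output.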

It's not just that Automated Insights has managed to replace human writers; it has managed to surpass them. Because human sportswriters are expensive, you'd never pay one to spend his time following a single small team that doesn't have a large following. Automated Insights can follow every team. And when it branches out to other kinds of content later this year (it just received a pile of funding to do so), it will be able to create narratives for every company's stock market fluctuations, or blurbs about every weather event in every small town in America. Automated Insights' main competitor, a firm called Narrative Science, has similarly broad ambitions.

And yet there's something missing in Allen's approach: creativity. While I recognize that there are algorithms that power my work, my columns are also peppered with stuff computers can't master—amazing wordplay, irresistible arguments, cutting wit. Two of my most popular Slate articles this year—a rant about people who erroneously use two spaces after a period, and a rant about the snobs who write letters to NPR—weren't inspired by my rigorous daily Web searching. They just jumped into my mind out of the blue. (This happens to me at least once a week in the shower.)

Could a computer have come upon these topics? Not today. If we define human creativity as a kind of invention meant to please other human beings, machines would seem ill-suited for the task—computers are good at copying, not at coming up with wholly new things. A piece of software can insert the phrase "top-notch outing" in an article. It can't decide that "top-notch outing" is a lame sportswriting chestnut and come up with another phrase instead.

None of the experts on computers and language that I consulted could tell me how computers would make the leap from mimicry to creativity. But they did point me to recent developments that show a few small steps in that direction. A team at Google has made some progress on teaching a machine to translate poetry, a task that's difficult even for human translators. Chloe Kiddon and Yuriy Brun, researchers at the University of Washington, meanwhile, taught a machine to recognize when it would be correct to say, "That's what she said!"

Looking at what computers can do now and how fast they're progressing, Automated Insights' Allen is convinced that the machines can't be stopped. "Because it's algorithmic, it can only get better over time," he says. "Within five years, our writing will be as good as the best sportswriters out there. And it's going to be difficult for humans to compete." It's really hard for me to imagine that Allen will be right. And yet, five years is a really long time for computers. I'm not sure it would be smart to bet against him.