Most videos on the Internet should be shorter, but most people don't know how to edit video. This problem haunts otherwise worthy Web videos of all kinds (even duckling rescues aren’t immune), and the onslaught of amateur video from Facebook and sophisticated new apps is poised to make it worse.
Plenty of developers have engineered a solution: “automatic” video editors that ingest footage and package it more or less on their own. Google offers auto-editing features like music and automated scene selection for videos uploaded to Google Plus, and services like Magisto rely on an algorithm that uses footage and user choices to create slick, ready-made edits. Videolicious has made it into newsrooms at the Washington Post, enabling reporters to assemble quick videos from a variety of photos and rough footage.
This summer, Instagram users have embraced Hyperlapse, and that may signal a marquee moment for this form: The app builds ready-to-share time lapses from video shot on an iPhone, stabilizes the footage for unsteady hands, and produces short movies with a “cinematic feel,” as Instagram likes to put it. Early uses are not always revelatory—dancing clouds from airplane windows are the new dewy sunsets—but the technology shows how users are already responding to the automated video editors of the future.
But these services still rely on people to make at least some of the decisions in the editing process, especially in adding sounds and effects that give the automated videos their proto-professional feel. What if they cut out the human editor altogether? What if videos were edited in real time, even as someone was shooting them? That notion animates Bin Zhao, a graduate student in machine learning at Carnegie Mellon University, who began demonstrating a system called LiveLight this summer with CMU professor Eric Xing. The technology relies on an algorithm that “watches” a video only once, building what Zhao called a “visual dictionary” to analyze the footage. The process relies on visuals, not sound, using movement and novelty cues to build an instant summary of the footage. Here’s a rough demonstration:
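The single-pass “visual dictionary” idea can be sketched in miniature. The toy Python below is an illustration of the general approach, not CMU’s actual algorithm: frames are stood in for by small feature vectors (a real system would compute color or motion features from pixels), and a frame makes the summary only when it looks unlike everything the running dictionary has already seen.

```python
import math

def novelty_summarize(frames, threshold=0.5):
    """One pass over `frames` (each a list of floats standing in for
    per-frame visual features). A frame joins the summary when it is
    far from everything in the running dictionary."""
    dictionary = []   # representative feature vectors seen so far
    summary = []      # indices of frames judged novel
    for i, feat in enumerate(frames):
        # distance to the nearest dictionary entry (infinite if empty)
        dist = min((math.dist(feat, d) for d in dictionary),
                   default=float("inf"))
        if dist > threshold:          # nothing similar seen yet: keep it
            summary.append(i)
            dictionary.append(feat)   # remember it for later comparisons
    return summary

# A static scene (repeated features) interrupted by one sudden change:
clip = [[0.1, 0.1]] * 5 + [[0.9, 0.9]] + [[0.1, 0.1]] * 3
print(novelty_summarize(clip))  # → [0, 5]
```

Because the dictionary is built as the footage streams by, the summary exists the moment recording stops, which is what lets LiveLight promise near-real-time edits.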
LiveLight is at an earlier stage than some of its competitors, and for now it feels like another tool to “cut the boring parts” out of long videos. Zhao told me, for instance, that he’s already testing its most obvious application: detecting suspicious movement in an otherwise static security feed.
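That security-feed case is the degenerate, easiest version of the problem: when almost nothing changes, plain frame differencing is enough to flag the interesting moments. A minimal sketch, with grayscale frames reduced to rows of pixel intensities for illustration:

```python
def flag_motion(frames, threshold=10):
    """Return indices of frames that differ noticeably from the frame
    before them, as in a mostly static surveillance feed."""
    flagged = []
    for i in range(1, len(frames)):
        # total absolute pixel difference between consecutive frames
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        if diff > threshold:
            flagged.append(i)
    return flagged

static = [[5, 5, 5]] * 4
intruder = [[5, 80, 5]]  # one frame with a bright moving object
print(flag_motion(static + intruder + static))  # → [4, 5]
```

Both the arrival and the departure of the object trip the detector, since each produces a large frame-to-frame change.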
But the long game is more interesting. LiveLight could take hours of footage from Google Glass, GoPros, smartwatches, and other wearable devices, then process it in almost real time, giving a raw summary of a just-completed event. “If we deploy this app onto Google Glass, by the time the video is finished recording—maybe within 30 seconds or one minute—the video will be waiting to view,” Zhao said. He imagines a fascinating if eerie future where LiveLight can aid in a kind of selective self-surveillance: “The system will pull out the most interesting things that happened that day—a video summary, a video diary, a video journal of what has happened.” It’s easy to see the potential for digesting a day of travel or meetings, not to mention applications for police departments.
Zhao gave me the keys to try LiveLight out, and the tool is still decidedly scrappier than those larger ambitions will require. An hour of video can take well over an hour to process at this point because of limited back-end computing power. Nevertheless, I fed it some rudimentary raw footage, and it worked as advertised, cutting the video down based on new scenes and sudden movements. They were the same moments I'd singled out myself in the video I uploaded.
At a colleague's suggestion, and with Zhao’s reluctant permission, I also trolled the system a little bit, feeding it more complicated (and non-amateur) video to see how it would grapple with complex footage. I was pleasantly surprised. I tried it on ClickHole’s parody viral video “This Stick Of Butter Is Left Out At Room Temperature; You Won't Believe What Happens Next,” featuring three full hours of butter slowly melting in a dish. LiveLight was unfazed, spitting out a short video summary and this charming, undeniably accurate GIF:
I also gave LiveLight several other videos, from short footage to feature-length movies, and the canniest result I got was for Stanley Kubrick’s 2001: A Space Odyssey. The movie seemed an obvious choice to pit against an all-knowing algorithm, and LiveLight came up with a creepy, not-half-bad summary of the movie’s visual arc:
Perhaps grudgingly, LiveLight does allow humans to tinker with its edits, offering a ranked list of important moments that can be used to add cut sequences back into the final product. The team also plans to develop the product for near-term consumer and security applications (which explains why Carnegie Mellon’s research is supported by groups as varied as Google and the Air Force Office of Scientific Research). But Zhao said the prospect of automatic, indexed visual libraries of our life events is the technology’s most alluring possibility. At its most basic, he said, the tool is “a way to be able to organize people’s videos. Right now they are like the dark matter of the Internet.”
That may be true, but LiveLight suggests automated video editors may not just create and organize “social movies” for us—they may mean trusting an algorithm with how we process the memories themselves. Even tools as simple as Hyperlapse are creating a narrative. A personal, all-day video feed may give us pause for obvious reasons, but perhaps the spookier implication is that a machine would be helping us decide what to remember—and what to forget.