God Help Us, These Researchers Are Using Reddit to Teach a Supercomputer to Talk

Aug 17, 2016 4:44 PM

Can machines use Reddit to get smarter? We’re pretty sure humans can’t.
Photo by Noam Galai/Getty Images for TechCrunch

OpenAI, the nonprofit backed by Elon Musk and Peter Thiel, wants to teach technology to talk. It has enlisted a supercomputer named DGX-1 to help train its machine learning systems. (What are machine learning systems? MIT Technology Review describes them alluringly as a “network of crudely simulated neurons” that use data to glean “a probabilistic understanding of conversation.”) DGX-1 can feed prodigious amounts of natural language to OpenAI’s robotic reticulation, which then takes the input as a model for its own “speech.” All the student-teacher pair needs in its quest for cocktail-chatter mastery is source material.

Source material—that sounds easy enough! Did the researchers prescribe a steady diet of luminous prose from English’s marquee authors? Did they plunder the canon for Martin Luther King Jr.’s oratory, Virginia Woolf’s collected letters, and Tennessee Williams’ plays?

Nope. “We’re training,” said OpenAI research scientist Andrej Karpathy in a press release, “on entire years of conversations of people talking to each other on Reddit.”

Oh boy.

To recap: Of all the possible linguistic corpora on earth, these scientists have decided to expose their learning systems to a discourse that usually ends with someone calling someone else a fat gay loser cuck and comparing him to Hitler. And then the second guy cracks a xenophobic, sexually explicit joke about the first guy’s mom. And then the first guy pretends to solve the Boston Bombing.

Have we learned nothing from Tay, the Microsoft chatbot that spewed foul racist garbage after only a few hours of interacting with trolls on Twitter? Sure, Reddit models a colloquial tone, as Sophie Kleeman at Gizmodo points out, and its many communities discuss a wide range of subjects, but it is also frequently the boneyard where all grace and decency go to die. Will OpenAI’s learning systems absorb strategies for choosing careers and college majors, or only gain expertise in nihilistic lulz and platform-specific acronyms? At the very least, success on this front would break new ground: The researchers would have created the only brain ever to get smarter by reading Reddit.