We Need a Nuremberg Code for Big Data

The citizen’s guide to the future.
June 20 2013 7:17 AM

The world of social-engineering surveillance is growing rapidly.

German toddlers of the "Frogs" group play in the garden at the Spreekita Kindergarten in Berlin May 3, 2007.

Photo by John MacDougall/Getty Images

Recent revelations about the federal government’s PRISM program have sparked widespread debate about the benefits and harms of state surveillance of Americans in the name of national security. But what about the surveillance we submit to in the service of more mundane activities, like improving children’s vocabularies or increasing student engagement in the classroom? This growing world of social-engineering surveillance has garnered far less attention and controversy but poses significant challenges to the future of privacy.

This spring, the city of Providence, R.I., won the grand prize in the Bloomberg Philanthropies’ Mayors Challenge, an annual competition that invites the leaders of cities to propose innovative solutions to urban problems. Providence will use the $5 million prize money to launch Providence Talks, a project targeting the so-called “word gap.” The program draws on the work of psychologists Betty Hart and Todd Risley, whose research in the 1990s on parent-child communication concluded that by the age of 3, lower-income children had heard 30 million fewer words than their better-off peers, leaving them at a disadvantage as they entered school.

Providence Talks hopes to bridge that gap, but with a technological twist. Instead of clipboard-wielding researchers fanning out into a small number of homes, as Hart and Risley did in the 1990s, Providence Talks participants—that is, infants and toddlers—will be constantly surveilled by recording devices provided by LENA, a company that specializes in language environment analysis. For 16 hours, one day a month, the kids will wear little recording devices. LENA has even devised special clothing for its research subjects, including a dapper pair of green overalls and a sweet pink pullover (festooned with the LENA company logo) to safely house the recording devices—prompting images of crafty toddlers engaging in spontaneous acts of civil disobedience by dumping applesauce in the clothing’s “high-tech pockets.”


The program targets low-income families eligible for home visits under the state’s Universal Newborn Screening program and ideally will begin recording children at birth. After analyzing the data, researchers will create evaluations, which will be passed along to the social workers and nurses who meet monthly with families. Then, the social workers will offer strategies for improving the way lower-income parents talk to their kids, such as pointing out everyday objects and responding to infants’ vocalizations. But the program has larger ambitions: As the mayor of Providence wrote in his proposal, “We believe these data will be useful for city managers as well. Aggregate data on block and neighborhood level household auditory environments would allow us to direct existing early childhood resources with a level of precision and thoughtfulness never before possible.”

Setting aside the project’s logistical challenges (will families, hyperaware of the recording devices, end up overcompensating, thus skewing the results?) and the many issues it raises with regard to social class, it is surprising how little discussion of privacy and consent the project has prompted. The program is described as “free, confidential, and completely voluntary,” and according to LENA’s descriptions of its technology, it is possible for the recorder to encrypt what it records, although it’s not clear that encryption will be used for Providence participants. And in their proposal to Bloomberg Philanthropies, city officials claimed that the recordings would be deleted after being analyzed by LENA’s software. Of course, those data are proprietary, and LENA has so far stated no intention of making them publicly available to other researchers.

More worrisome, however, is the lack of concern about how state surveillance of private citizens—even in the interest of “improving” those citizens—is increasing with little public debate about the challenges such interventions pose to freedom and autonomy. Research has demonstrated that teaching low-income families to talk more to their children yields positive results, but why is intrusive technological surveillance necessarily better than simply having social workers emphasize that during home visits? Even if digital surveillance can provide a bit more detail about a family’s conversational patterns, is this extra information worth the cost in terms of the money spent on the technology and the loss of privacy? If Providence’s pilot project yields rich data and good results for families, will the state of Rhode Island make it mandatory for anyone applying for government assistance?

You don’t have to live in Providence to be the subject of social-engineering surveillance. If you are on Facebook, have enrolled in a MOOC, or use electronic textbooks, you are also, perhaps unwittingly, one of a growing number of human subjects used in Big Data research. MOOCs such as edX, Coursera, and Udacity are all engaged in large-scale data collection of the students registered in their courses, ostensibly for the purpose of improving course offerings. But such information can also be sold to third parties.

At universities such as Texas A&M, professors who use digital textbooks developed by Silicon Valley startup CourseSmart can track whether or not their students are reading and annotating their digital textbooks diligently enough (not unlike what your Kindle or Nook is doing if you read on it). As Evgeny Morozov observed about CourseSmart last year, “This data may seem trivial but once merged with other data—say, their Facebook friends or their Google searches—it suddenly becomes very valuable to advertisers and potential employers.” CourseSmart generates an “engagement index” to assess student performance based on the tracking data; more than 3 million students use its textbooks, which generate masses of data for the company. As a recent story in the New York Times revealed, the efforts of students who take their notes by hand or in a file on their personal computer instead of in the digital text itself are not included in the engagement index, leaving them vulnerable to giving their instructors the impression that they aren’t spending serious time with the course textbook—even if they are.