How to Be Good

Why you can’t teach human values to artificial intelligence.

April 20, 2016, 8:30 AM

Elon Musk, the co-founder of luxury electric U.S. car maker Tesla, speaks at the StartmeupHK Venture Forum in Hong Kong on Jan. 26.
Philippe Lopez/Getty Images

If you encountered a robot on the street, you would want it to give you the right of way instead of just rolling over your foot, right? Making room for a passerby is simple, but it’s just one of the many human “values” that we seek to make our increasingly prolific machine creations obey.

Computer scientists like Stuart Russell and technologists in companies building advanced artificial intelligence platforms say that they want to see A.I. “provably aligned with human values.” A scientist at the A.I. startup Anki recently assured Elon Musk and others that A.I. will be “friend”—not “foe.”

At first glance, little of this is objectionable. We have been conditioned ever since Isaac Asimov’s famous Three Laws of Robotics to believe that, without our moral guidance, artificially intelligent beings will make erroneous and even catastrophically harmful decisions. Computers are powerful but frustratingly dumb in their inability to grasp ambiguity and context. Russell and others want to ensure that computers make the “right” decisions when placed in contact with humans; these range from the simple right-of-way norm described above to more complex and fraught issues, such as deciding whose life to prioritize in a car accident.

However, Russell and others ignore the lessons of the last time that we seriously worried about how to interpret the ways in which machines embedded in society reflect human beliefs and values. Twenty years ago, social scientists came to the conclusion that intelligent machines will always reflect the knowledge and experiences of the communities they are embedded within. The question is not whether machines can be made to obey human values but which humans ought to decide those values.

Russell, in a 2015 interview with Quanta magazine, acknowledged the challenge of ensuring that A.I. respects human values but also expressed some cautious optimism:

It’s a deliberately provocative statement, because it’s putting together two things—“provably” and “human values”—that seem incompatible. It might be that human values will forever remain somewhat mysterious. But to the extent that our values are revealed in our behavior, you would hope to be able to prove that the machine will be able to “get” most of it. There might be some bits and pieces left in the corners that the machine doesn’t understand or that we disagree on among ourselves. But as long as the machine has got the basics right, you should be able to show that it cannot be very harmful.

Russell goes on to argue that machines could learn approximations of human values from observing us and the cultural and media products we produce. Of course, the question then becomes: Which human values? Psychologists Joseph Henrich, Steven Heine, and Ara Norenzayan published a study in 2010 showing that broad claims about basic human psychology and behavior generated from experiments held in Western, Educated, Industrialized, Rich, and Democratic (or WEIRD) societies do not generalize outside of them.

Henrich and his colleagues are not alone; social psychologists have long pointed to the empirical existence of cultural differences in thoughts and opinions about the nature of life between the West and the Rest. Not to fear, Russell counters. “[M]achines should err on the side of doing nothing in areas where there’s a conflict of values. … If you want to have a domestic robot in your house, it has to share a pretty good cross-section of human values.” Russell believes that a machine can observe how humans make complex tradeoffs and learn from our example. But he may want to brush up on his A.I. history.

Decades ago, state-of-the-art A.I. was knowledge-based expert systems. To build an expert system, a design team would painstakingly translate an expert’s knowledge of a domain into a reasoning method, so it would make decisions based on the facts (“Future Tense is a technology and society blog on Slate”) and rules (“You should always read Future Tense whenever a new post is published”) in the relevant domain. During the age of expert systems, many pondered the same questions that Russell and others do now—what does a machine know, and can it know what we know?

In 1985, Steve Woolgar penned a call to his fellow sociologists to make a “sociology of machines.” The idea here wasn’t to teach ethics to machines, but to use A.I. to settle some of the most contentious social science debates about how to theorize human behavior. If A.I. was possible, Woolgar reasoned, “it would vindicate those philosophies that hold that human behavior can be codified and reduced to formal, programmable, and describable sequences.” Additionally, in 1994 a group of social scientists under the aegis of the National Science Foundation called for research in “artificial social intelligence” to study how A.I. could be harnessed to better understand the nature of human society and behavior from a social scientific perspective.

However, the project to merge social science and artificial intelligence ran into more than a few bumps in the road. In a 1990 book, sociologist Harry Collins suggested why. Collins argued that every community “knows” certain tacit things that are difficult if not impossible to fully represent computationally. Or, in other words, “computers can act intelligently to the degree that humans act mechanically.” Lots of human activities (like voting, greeting, praying, shopping, or writing a love letter) are “polymorphic,” that is, socially shaped by an understanding of how society expects the action to be performed. Often, we execute these contextual activities mechanically because no one bothers to question their status as socially expected behavior. One paper on modeling the evolution of norms was appropriately titled “learning to be thoughtless.”

As Collins pointed out, computers acquire human knowledge and abilities from the fact that they are embedded in human social contexts. A Japanese elder care personal robot, for example, is only able to act in a way acceptable to Japanese senior citizens because its programmers understand Japanese society. So talk of machines and human knowledge, values, and goals is frustratingly circular.

Which brings us back to Russell’s optimistic assumptions that computer scientists can sidestep these social questions through superior algorithms and engineering efforts. Russell is an engineer, not a humanities scholar. When he talks about “tradeoffs” and “value functions,” he assumes that a machine ought to be an artificial utilitarian. Russell also suggests that machines ought to learn a cross-section of human values from human cultural and media products. So does that mean a machine could learn about American race relations by watching the canonical pro-Ku Klux Klan and pro-Confederacy film The Birth of a Nation?

But Russell’s biggest problem lies in the very much “values”-based question of whose values ought to determine the values of the machine. One does not imagine too much overlap between hard-right Donald Trump supporters and hard-left Bernie Sanders supporters on some key social and political questions, for example. And the other (artificial) elephant in the room is the question of what gives Western, well-off, white male cisgender scientists such as Russell the right to determine how the machine encodes and develops human values, and whether or not everyone ought to have a say in determining the way that Russell’s hypothetical A.I. makes tradeoffs.

It is unlikely that social scientists like Collins and others can offer any definitive insights about these questions. However, it is even less likely that Russell and his peers can avoid them altogether through technical engineering efforts. If only the problem were indeed just how to engineer a system to respect human values; that would make it very easy. The harder problem is the thorny question of which humans ought to have the social, political, and economic power to make A.I. obey their values, and no amount of data-driven algorithms is going to solve it.

This article is part of the artificial intelligence installment of Futurography, a series in which Future Tense introduces readers to the technologies that will define tomorrow. Each month from January through June 2016, we’ll choose a new technology and break it down.

Future Tense is a collaboration among Arizona State University, New America, and Slate.