AI Research Is in Desperate Need of an Ethical Watchdog

More social scientists are using AI with the aim of solving society’s ills, but they don’t have clear ethical guidelines to prevent them from accidentally harming people.

About a week ago, Stanford University researchers posted online a study on the latest dystopian AI: They'd made a machine learning algorithm that essentially works as gaydar. After training it with tens of thousands of photographs from dating sites, the algorithm could perform better than a human judge in specific instances. For example, when given photographs of a gay white man and a straight white man taken from dating sites, the algorithm could guess which one was gay more accurately than actual people participating in the study.* The researchers’ motives? They wanted to protect gay people. “[Our] findings expose a threat to the privacy and safety of gay men and women,” wrote Michal Kosinski and Yilun Wang in the paper. They built the bomb so they could alert the public about its dangers.

Alas, their good intentions fell on deaf ears. In a joint statement, LGBT advocacy groups Human Rights Campaign and GLAAD condemned the work, writing that the researchers had built a tool based on “junk science” that governments could use to identify and persecute gay people. AI expert Kate Crawford of Microsoft Research called it “AI phrenology” on Twitter. The American Psychological Association, whose journal was readying their work for publication, now says the study is under “ethical review.” Kosinski has received e-mail death threats.

But the controversy illuminates a problem in AI bigger than any single algorithm. More social scientists are using AI with the aim of solving society’s ills, but they don’t have clear ethical guidelines to prevent them from accidentally harming people, says ethicist Jake Metcalf of Data & Society. “There aren’t consistent standards or transparent review practices,” he says. The guidelines governing social experiments are outdated and often irrelevant—meaning researchers have to make up ad hoc rules as they go.

Right now, if government-funded scientists want to research humans for a study, the law requires them to get the approval of an ethics committee known as an institutional review board, or IRB. Stanford’s review board approved Kosinski and Wang’s study. But these boards use rules developed 40 years ago for protecting people during real-life interactions, such as drawing blood or conducting interviews. “The regulations were designed for a very specific type of research harm and a specific set of research methods that simply don’t hold for data science,” says Metcalf.

For example, if you merely use a database without interacting with real humans for a study, it’s not clear that you have to consult a review board at all. Review boards aren’t allowed to evaluate a study based on its potential social consequences. “The vast, vast, vast majority of what we call ‘big data’ research does not fall under the purview of federal regulations,” says Metcalf.

So researchers have to take ethics into their own hands. Take a recent example: Last month, researchers affiliated with Stony Brook University and several major internet companies released a free app, a machine learning algorithm that guesses ethnicity and nationality from a name with about 80 percent accuracy. They trained the algorithm using millions of names from Twitter and from e-mail contact lists provided by an undisclosed company—and they didn't have to go through a university review board to make the app.

The app, called NamePrism, allows you to analyze millions of names at a time to look for society-level trends. Stony Brook computer scientist Steven Skiena, who used to work for the undisclosed company, says you could use it to track hiring tendencies across swaths of industry. “The purpose of this tool is to identify and prevent discrimination,” says Skiena.
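
The workflow Skiena describes is, at bottom, batch classification followed by aggregation. The minimal Python sketch below is hypothetical and does not reproduce NamePrism's actual interface: predict_ethnicity is a stand-in for whatever classifier or approved API access a study would actually use, and ethnicity_breakdown is an illustrative helper, not part of the tool.

    # Hypothetical sketch: classify a batch of names, then examine the
    # aggregate breakdown rather than any individual prediction.
    from collections import Counter

    def predict_ethnicity(name):
        # Stand-in for a name-classification service such as NamePrism.
        # The real service's endpoint and response format are not shown here;
        # a study would substitute its approved access method.
        raise NotImplementedError("replace with a real classifier or API call")

    def ethnicity_breakdown(names):
        # Share of each predicted group among a list of names (e.g., new hires).
        counts = Counter(predict_ethnicity(n) for n in names)
        total = sum(counts.values())
        return {group: count / total for group, count in counts.items()}

Comparing such a breakdown across companies or across years is the kind of society-level trend analysis Skiena has in mind.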

Skiena's team wants academics and non-commercial researchers to use NamePrism. (They don’t get commercial funding to support the app’s server, although their team includes researchers affiliated with Amazon, Yahoo, Verizon, and NEC.) Psychologist Sean Young, who heads the University of California’s Institute for Prediction Technology and is unaffiliated with NamePrism, says he could see himself using the app in HIV prevention research to efficiently target and help high-risk groups, such as minority men who have sex with men.

But ultimately, NamePrism is just a tool, and it’s up to users how they wield it. “You can use a hammer to build a house or break a house,” says Princeton University sociologist Matthew Salganik, author of Bit by Bit: Social Research in the Digital Age. “You could use this tool to help potentially identify discrimination. But you could also use this tool to discriminate.”

Skiena’s group considered possible abuse before they released the app. But without having to go through a university IRB, they came up with their own safeguards. On the website, anonymous users can test no more than a thousand names per hour, and Skiena says they would restrict users further if necessary. Researchers who want to use the app for large-scale studies have to ask for permission from Skiena. He describes the approval process as "fairly ad hoc." He has refused access to businesses and accepted applications from academics affiliated with established institutions who have proposed "what seem to be reasonable topics of study." He also points out that names are public data.

The group also went through an ethics review at the company that provided the training list of names, although Metcalf says that an evaluation at a private company is the “weakest level of review that they could do." That's because the law does not require companies to follow the same regulations as publicly funded research. “It’s not transparent at all to you or me how [the evaluation] was made, and whether it’s trustworthy,” Metcalf says.

But the problem isn’t NamePrism itself. “This tool by itself is not likely to cause a lot of harm,” says Metcalf. In fact, NamePrism could do a lot of good. The problem is the broken ethical system around it. AI researchers—sometimes with the noblest of intentions—don’t have clear standards for preventing potential harms. “It’s not very sexy,” says Metcalf. “There’s no Skynet or Terminator in that narrative.”

Metcalf, along with researchers from six other institutions, has recently formed a group called Pervade to try to mend the system. This summer, they received a three-million-dollar grant from the National Science Foundation, and over the next four years, Pervade wants to put together a clearer ethical process for big data research that both universities and companies could use. “Our goal is to figure out, what regulations are actually helpful?” he says. But until then, we’ll be relying on the kindness—and foresight—of strangers.

*Correction at 1:26 p.m. on 9/19/2017: An earlier version of this story misstated the accuracy of the Stanford algorithm.