Scientists Are Just as Confused About the Ethics of Big-Data Research as You

Institutional review boards have long governed research ethics, but do they need to evolve in the digital age?

When a rogue researcher last week released 70,000 OkCupid profiles, complete with usernames and sexual preferences, people were pissed. When Facebook researchers manipulated how stories appear in News Feeds for a mood contagion study in 2014,¹ people were really pissed. OkCupid filed a copyright claim to take down the dataset; the journal that published Facebook’s study issued an “expression of concern.” Outrage has a way of shaping ethical boundaries. We learn from mistakes.

Shockingly, though, the researchers behind both of those big data blowups never anticipated public outrage. (The OkCupid research does not seem to have gone through any kind of ethical review process, and a Cornell ethics review board that did look at the Facebook study declined to review it because of the limited involvement of two Cornell researchers.²) And that shows just how untested the ethics of this new field of research is. Unlike medical research, which has been shaped by decades of clinical trials, the risks—and rewards—of analyzing big, semi-public databases are just beginning to become clear.

And the patchwork of review boards responsible for overseeing those risks is only slowly inching into the 21st century. Under the Common Rule in the US, federally funded research has to go through ethical review. Rather than one unified system, though, every university has its own institutional review board, or IRB. Most IRB members are researchers at the university, most often in the biomedical sciences. Few are professional ethicists.

Even fewer have computer science or security expertise, which may be necessary to protect participants in this new kind of research. “The IRB may make very different decisions based on who is on the board, what university it is, and what they’re feeling that day,” says Kelsey Finch, policy counsel at the Future of Privacy Forum. There are hundreds of these IRBs in the US—and they’re grappling with research ethics in the digital age largely on their own.

Medical Origins

The Common Rule and the IRB system were born out of outrage, too—though over a far graver mistake. In the 1970s, the public finally learned about the US government’s decades-long Tuskegee experiment, in which African-American sharecroppers with syphilis were left untreated so researchers could study the disease’s progression. The controversy led to new regulations on human subjects research conducted or funded by the US Department of Health and Human Services, rules that later spread to all federal agencies. Now, any institution that gets federal funding has to set up an IRB to oversee research involving humans, whether it’s a new flu vaccine or an ethnography of rug sellers in Turkey.

“The structure was very much developed out of health agencies for experimental research,” says Zachary Schrag, a historian at George Mason University and the author of a book on IRBs in the social sciences. But not all human research is medical in nature, and many social scientists feel the process is ill-suited for their research, where risks are usually more subtle than life or death.

Certain IRB requirements can seem ridiculous when applied to the social sciences. Informed consent statements, for example, often have the phrase “the alternative to participating is…” to allay a patient’s possible fears that refusing to participate would mean being denied medical treatment. But if you’re looking for volunteers to fill out a survey on test-taking habits, then the only way to complete the phrase is the glaringly obvious “the alternative to participating is not participating.”

Social scientists have been crying foul about IRBs for a while now. The American Association of University Professors has recommended boosting the number of social scientists on IRBs or establishing separate boards that only evaluate social science research. In 2013, it went so far as to issue a report recommending that researchers themselves get to decide whether their minimal-risk work needs IRB approval, which would also free up more time for IRBs to devote to biomedical research with life-or-death stakes.

New Risks

That’s not to say social science research in general—and big data social science research in particular—doesn’t carry risks. With new technology, a system that never quite worked is working even less.

Elizabeth Buchanan, an ethicist at the University of Wisconsin-Stout, sees Internet-based research entering its third phase, which raises new ethical questions. The first phase started in the ‘90s with Internet-based surveys, and the second with data from social media sites. Now, in the third phase, researchers can buy, say, Twitter data going back years and merge it with other publicly available data. “It’s in the intermingling that we can see the tension in the ethics and privacy,” she says.

Buchanan recently sat on an IRB where she reviewed a proposal to merge social media mentions of the street name for a drug with publicly available crime information. Technically, all of the information was public at some point—even if some of those tweets have since been deleted or locked behind a private account. But the act of combining that information could mean identifying the people behind crimes through routine research. The IRB ultimately approved the project. In such cases, says Buchanan, you have to weigh the social value of the research against the risk, and minimize that risk in the first place by, for example, stripping potential identifiers from any data released publicly.
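What “stripping potential identifiers” can look like in practice is easier to picture with a concrete, if invented, example. The short Python sketch below is purely illustrative—the column names, the salted-hash helper, and the toy data are hypothetical, not drawn from the study Buchanan reviewed. It drops free-text and location fields and replaces account handles with one-way hashes before a merged dataset like the tweets-plus-crime-reports one would be shared publicly. Even then, as the re-identification work described next shows, scrubbing direct identifiers is no guarantee when quasi-identifiers remain.

```python
# Illustrative sketch only: hypothetical column names and toy data.
import hashlib

import pandas as pd


def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def prepare_release(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Drop free-text and location fields, then pseudonymize account handles."""
    out = df.drop(columns=["tweet_text", "gps_lat", "gps_lon"], errors="ignore")
    out["username"] = out["username"].map(lambda u: pseudonymize(u, salt))
    return out


# A toy merged table of drug-slang mentions and local crime reports.
merged = pd.DataFrame({
    "username": ["@user_a", "@user_b"],
    "tweet_text": ["mentions street name for a drug ...", "..."],
    "gps_lat": [44.97, 44.96],
    "gps_lon": [-93.26, -93.27],
    "crime_incident_id": ["2016-0001", "2016-0002"],
})

print(prepare_release(merged, salt="project-specific-secret"))
```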

The risks to participants can also be hard to predict as technology changes. In 2013, researchers at MIT found they could match names to publicly available DNA sequences based on information about research participants that the original researchers themselves posted online. The geneticist who figured this out? He used to be a white-hat hacker. “I think it’s really important for boards to have either a data scientist, computer scientist, or IT security individual,” says Buchanan. “That’s the reality now. It’s not just thinking about someone getting upset about questions on a survey.”

Or maybe other institutions, like the open science repositories asking researchers to share data, should be picking up the slack on ethical issues. “Someone needs to provide oversight, but the optimal body is unlikely to be an IRB, which usually lacks subject matter expertise in de-identification and re-identification techniques,” Michelle Meyer, a bioethicist at Mount Sinai, writes in an email.

Even among Internet researchers familiar with the power of big data, attitudes vary. When Katie Shilton, an information technology researcher at the University of Maryland, interviewed 20 online data researchers, she found “significant disagreement” over issues like the ethics of ignoring Terms of Service and obtaining informed consent. Surprisingly, the researchers also said that ethical review boards had never challenged the ethics of their work—but peer reviewers and colleagues had. Various groups like the Association of Internet Researchers and the Center for Applied Internet Data Analysis have issued guidelines, but the people who actually have power—those on institutional review boards—are only just catching up.

Outside of academia, companies like Microsoft have started to institute their own ethical review processes. In December, Finch at the Future of Privacy Forum organized a workshop called Beyond IRBs to consider processes for ethical review outside of federally funded research. After all, modern tech companies like Facebook, OkCupid, Snapchat, and Netflix sit atop troves of data that 20th-century social scientists could only have dreamed of.

Of course, companies experiment on us all the time, whether it’s websites A/B testing headlines or grocery stores changing the configuration of their checkout line. But as these companies hire more data scientists out of PhD programs, academics are seeing an opportunity to bridge the divide and use that data to contribute to public knowledge. Maybe updated ethical guidelines can be forged out of those collaborations. Or it just might be a mess for a while.

¹ UPDATE 5/23/16 1:30 PM ET: The story has been corrected to note that Facebook's mood contagion study manipulated how stories were prioritized in News Feeds.
² UPDATE 5/23/16 1:30 PM ET: This story has been updated to clarify that Cornell's institutional review board concluded the Facebook study did not need to undergo a full review because of the limited role of Cornell researchers in the study.