Humans Run Experiments, a Robot Writes the Paper

The future of automated scientific writing is upon us—and that’s a good thing.


In 2014, a researcher in France revealed a disturbing fact about the published scientific literature: At least 120 computer-generated manuscripts had made their way into academic conference proceedings, according to his analysis. Those robot-written papers, containing little more than strung-together buzzwords, had been created with a piece of software known as SCIgen, originally written on a lark by a trio of MIT graduate students in 2005. But in the years since, it seemed scientists had repurposed SCIgen to puff up their resumes and boost their professional status. This was understood to be a major scandal.
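SCIgen, for what it’s worth, isn’t doing anything mysterious: It builds its nonsense by recursively expanding a context-free grammar, choosing a random production rule at each step. Here’s a minimal sketch of that technique, with a toy grammar of my own invention standing in for SCIgen’s actual rules:

```python
import random

# A toy context-free grammar in the spirit of SCIgen (not its actual rules).
# Each nonterminal maps to a list of possible expansions; anything that
# isn't a key in the grammar is treated as a terminal word.
GRAMMAR = {
    "SENTENCE": [["We", "present", "NOUN_PHRASE", "for", "NOUN_PHRASE", "."]],
    "NOUN_PHRASE": [["ADJ", "NOUN"], ["ADJ", "ADJ", "NOUN"]],
    "ADJ": [["scalable"], ["stochastic"], ["ubiquitous"], ["Bayesian"]],
    "NOUN": [["methodologies"], ["epistemologies"], ["archetypes"]],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]
    words = []
    for part in random.choice(GRAMMAR[symbol]):
        words.extend(expand(part))
    return words

print(" ".join(expand("SENTENCE")))
# e.g. "We present stochastic methodologies for ubiquitous Bayesian archetypes ."
```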

For Klemen Zupancic, though, the scandal was a source of inspiration. “It got us thinking,” the 32-year-old molecular biologist and tech entrepreneur told me this week from his office in Slovenia. Zupancic is head of sciNote, a tech startup that builds tools to help scientists switch from pen-and-paper laboratory notebooks to more efficient online apps. (The company claims about 20,000 users, almost half of them in the U.S.) When he read about the infiltration of academic journals by robo-generated text, he realized that the same approach might be used for honest ends. If software can get scientific gobbledygook published, then maybe it can write a valid scientific paper, too. So his company set out to create a program that would do just that.

The result of this effort, called Manuscript Writer, came out in early November. It works by searching through a sciNote user’s references, data, and protocols, and then stringing bits and pieces together into a rough draft of a formal academic paper. And I mean a very rough draft: The software doesn’t even try to write a discussion section or interpret an experiment’s results, and based on what I’ve seen, the rest isn’t much better than what you’d get from SCIgen. Manuscript Writer constructs an introduction, for example, by pulling sentences and sentence fragments from a set of open-access references and laying them out in what appears to be no particular order.
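To make that concrete, here’s a rough sketch of how such an extractive first draft might be assembled. The data structures and the keyword matching are my own illustrative assumptions, not sciNote’s actual pipeline:

```python
# A hypothetical sketch of extractive draft assembly, in the spirit of the
# description above. Illustrative only; not Manuscript Writer's real logic.

def draft_introduction(references, keywords, max_sentences=8):
    """Pull sentences mentioning the project's keywords from open-access
    references and string them together into a rough introduction."""
    picked = []
    for ref in references:  # each ref: {"citation": str, "sentences": [str]}
        for sentence in ref["sentences"]:
            if any(k.lower() in sentence.lower() for k in keywords):
                picked.append(f'{sentence} ({ref["citation"]})')
            if len(picked) >= max_sentences:
                return " ".join(picked)
    return " ".join(picked)

refs = [{"citation": "Doe 2016",
         "sentences": ["CRISPR screens have transformed functional genomics.",
                       "Sample preparation remains a bottleneck."]}]
print(draft_introduction(refs, keywords=["CRISPR", "bottleneck"]))
```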

The sciNote system is likely to improve, though. In theory, its A.I. will learn from its mistakes by comparing users’ finished papers to the software’s first attempts. Given what we’ve already seen in automated journalism, it’s not so crazy to predict that the quality of science paper robo-prose will soon become much better than it is today. Perhaps we’ll even reach the point where it’s about as good (or about as bad) as the work of average human scientists.

Indeed, we should all be looking forward to that day. Humans may be essential when it comes to formulating theories to explain results, but the rest of scientific writing—from a paper’s introduction through its description of experiments, methods, and results—would likely benefit from automation.

It’s not as though the quality of academic prose could end up that much worse than it is today. In fact, leading scientists have long bemoaned the lousy writing of their peers: In 1908, for example, Francis Galton presented his “Suggestions for Improving the Literary Style of Scientific Memoirs” to the Royal Society of Literature. “I have had occasion to read many memoirs in manuscript, on subjects where I was fairly at home, in which there was nothing especially recondite,” he wrote then, “but the expressions used in them were so obscure, the grammar so bad, and the arrangement so faulty, that they were scarcely intelligible on a first reading. … The writers of them may have been, and probably were successful investigators, but their powers of literary exposition were of a sadly low order; so low that they could hardly be made to realize their deficiencies.”

Things have only gotten worse since Galton’s gripe. Once English settled in as the lingua franca of science, researchers around the world had to write in it regardless of their fluency. At the same time, the growth of the scientific enterprise after World War II, and its balkanization into increasingly specialized sub-disciplines, made research articles more technical and formalized. Around the mid-20th century, a dreary template for scientific papers began to spread throughout the sciences: First an Introduction, followed by the Methods used for the experiment, then a summary of the Results, and a section for Discussion. By the 1970s, this “IMRaD” format was nearly universal in the literature.

As these changes solidified, scientific writing became less a vehicle for rhetoric than a conduit for data. Papers started to look more like packets on a network. If all those packets were the same—algorithmic in their composition, unembroidered, boring—that might only make them more efficient. In 1900, papers in Science and Nature were about as accessible to a general audience as pieces in the New York Times, according to a lexical analysis cited in a 2003 feature in Nature by Jonathan Knight. But their readability steadily eroded—and their jargon thickened—as the years went by.
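The Nature feature doesn’t spell out which metric was used, but readability scores of this kind are simple to compute. Here, for illustration, is the standard Flesch reading ease formula, with a rough vowel-group count standing in for a real syllable counter:

```python
import re

def flesch_reading_ease(text):
    """Flesch reading ease: higher scores mean more accessible prose.
    Syllables are approximated by counting vowel groups, a common shortcut."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

print(round(flesch_reading_ease("The cat sat on the mat."), 1))  # easy: high score
print(round(flesch_reading_ease(
    "Polysyllabic metaphorical phraseology obfuscates exposition."), 1))  # hard: low score
```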

“We are now in a system that incentivizes sameness,” says Melinda Baldwin, author of Making Nature: The History of a Scientific Journal. “We’re in a system that doesn’t give people incentive to write beautiful scientific papers.” Individual researchers now feel pressure to produce a large quantity of publications, with less regard for style. The most important thing for them is to distribute their results as quickly, and as clearly, as they can.

Even now, scientists who aren’t comfortable in English, or who just want to save some energy, may outsource the writing of their manuscripts to paid professionals. Automated writing would be an even better fix for those with shaky language skills or busy schedules. With robo-writers at the keyboard, every article might end up looking more or less the same. But that’s a good thing: Distracting differences in scholars’ backgrounds, or defects in their style, would be averaged out across the literature. Each of Galton’s literary bugaboos—obscure expressions, poor grammar, and the faulty arrangement of ideas—could be instantly deleted from the literature, or patched over in a set of updates to the paper-writing software.

Naturally, this flattening effect would also average out any charm or wit. Take, for example, one of the most famous scientific articles ever published: James Watson and Francis Crick’s 1953 announcement in Nature of the structure of DNA. Even as this paper laid out one of the most consequential discoveries in the history of biology, its authors allowed themselves just a single understated boast—one that has been celebrated for its wryness ever since: “It has not escaped our notice,” wrote Watson and Crick, “that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”

What if their paper had been written automatically by Klemen Zupancic’s software or something similar? We’d have been deprived of the duo’s all-time classic kicker. The article, “Molecular Structure of Nucleic Acids,” would be as undistinguished in its writing, as insipid in its style, and as bereft of elegance as almost every other paper in the literature.

That could be a good thing, too.

If bad writing interferes with reading papers—if it slows down the sharing and transmission of experiments and data—then good writing poses the opposite danger: It can make a paper slick and its logic slippery; it sands away the friction that helps readers get a handle on the findings it describes. According to FiveThirtyEight’s Nate Silver, poor writing in a research paper makes it hard to spot mistakes—but good writing does the same. It’s easiest to find errors, he says, in “average” papers, where you’re not distracted by the prose.

Psychological experiments lend support to this idea. The more easily one can process a given statement, the more believable it seems, according to what’s been called the “fluency heuristic.” If that’s true, then poorly written papers would be disadvantaged—i.e., their findings deemed less credible—even when the underlying science was completely sound. And papers written in an easygoing, entertaining style could be taken as reliable, even when they’re based on insubstantial evidence. But if robots wrote our papers for us, we’d never have to worry about falling victim to this bias. Every manuscript would be written in an average way: not too bad and not too good.

Awareness of this problem, and a corresponding fear of eloquence in scientific writing, has been present since the earliest days of academic periodicals. One of the first dedicated scientific journals appeared in 1665, published by the Royal Society of London. In a history of the society published two years later, Thomas Sprat boasted of the group’s “constant Resolution, to reject all the amplifications, digressions, and swellings of style. … They have exacted from all their members a close, naked, natural way of speaking; positive expressions; clear senses; a native easiness: bringing all things as near the Mathematical plainness, as they can.”

Not every scientist subscribed to the fellows’ “constant Resolution,” though. While the Royal Society set off in one direction, valorizing plainness, other communities of scholars, at other points in history, were more inclined to fancy talk. Science communication scholar Alex Csiszar cites the naturalist Alexander von Humboldt: Like other proponents of romantic science, Humboldt argued that the “lucid exposition of the great phenomena of the universe” should be drawn from “the effusions of creative fancy.”

Csiszar says this tension over what it means to write a paper “well”—whether scientists should strive for clarity, beauty, efficiency, or something else—has been present all along. Since the late 19th century, many scientists have favored more succinct, uncreative prose. “A florid, roseate style, full of polysyllabic, metaphorical phraseology, distracts the reader’s attention,” wrote two medical journal editors in a style guide from 1925. “Seldom is it necessary in scientific writing to use other than simple, Anglo-Saxon terms to express an idea or to state a fact.”

Indeed, as research grows more data-focused, there’s an ever stronger case for dreary, formulaic prose. At this point we have at least 24 million references in the biomedical literature alone, and 15 million scientists actively writing papers. There’s simply too much knowledge for any single person to absorb, even in a single subfield of research, and even if the work were always written very clearly. To get a better handle on this corpus, then, we’ll increasingly rely on another piece of software—not a robo-writer but a robo-reader. These exist already: Scientists are automating their investigations of the literature, with bots that sort through millions of abstracts at a time. It’s here that we might find the greatest benefit from algorithmic text. If machines were writing up the papers from the start, machines would likely do a better job of understanding them, too.
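What might such a robo-reader look like? Here’s a minimal sketch that ranks abstracts against a query with TF-IDF scoring, a textbook retrieval technique; real literature-mining bots are far more sophisticated, and this code stands in for none of them in particular:

```python
import math
from collections import Counter

# A minimal "robo-reader": rank abstracts against a query using TF-IDF.
# Illustrative only; real tools add entity extraction, relation mining, etc.

def tokenize(text):
    return text.lower().split()

def rank_abstracts(abstracts, query):
    docs = [tokenize(a) for a in abstracts]
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    def score(doc):
        tf = Counter(doc)               # term frequency within one abstract
        return sum(tf[t] * math.log(len(docs) / df[t])
                   for t in tokenize(query) if t in tf)
    return sorted(range(len(abstracts)),
                  key=lambda i: score(docs[i]), reverse=True)

abstracts = ["kinase inhibitors in tumor growth",
             "survey of deep learning methods",
             "tumor suppressor kinase pathways"]
print(rank_abstracts(abstracts, "kinase tumor"))  # indices, best match first
```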

How much further could the robo-revolution go? Last year a group of researchers at the University of Trieste offered something new: the automated peer review. Inspired by the SCIgen prank, these scientists set out to build a tool for generating referee reports. The program will spit out a positive, negative, or neutral assessment of any given paper, depending on which mode you request. “It would be good if you can also talk about the importance of establishing some good shared benchmarks,” the computer told one author. “It would be useful to identify key assumptions in the modeling,” it advised another. When the scientists showed their fake reviews, intermixed with real ones, to a group of human readers, the computer-generated text was deemed more useful one-quarter of the time.
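It isn’t hard to imagine a toy version of the trick: Pick canned remarks that match the requested sentiment and wrap them in a report. The sketch below, with stock phrases adapted from the quotes above, is my own illustration, not the Trieste group’s actual system:

```python
import random

# A toy referee-report generator: sample boilerplate remarks matching the
# requested sentiment. Purely illustrative; not the Trieste researchers' tool.
REMARKS = {
    "positive": ["The paper is well organized and the results are convincing.",
                 "The experimental evaluation is thorough."],
    "negative": ["The contribution over prior work is unclear.",
                 "The evaluation lacks strong baselines."],
    "neutral":  ["It would be useful to identify key assumptions in the modeling.",
                 "It would be good to talk about the importance of establishing "
                 "some good shared benchmarks."],
}

def generate_review(title, mode="neutral", n_remarks=2):
    body = " ".join(random.sample(REMARKS[mode], n_remarks))
    return f'Review of "{title}": {body}'

print(generate_review("Molecular Structure of Nucleic Acids", mode="neutral"))
```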

Once robot referees have improved enough, they can peer review our robot-written papers. From there, the next step should be obvious: robot science journalists to robo-write surprising takes on the latest science news.