Science

Hips Do Lie

Why did the New York Times fall for a dubious study on what makes women great dancers?

Photo illustration by Natalie Matthews-Ramo. Images by iStock and Thinkstock.

The secret of good dancing has just been revealed—get this—by science! In a paper published last week in the journal Scientific Reports, a group of researchers in England has determined exactly which moves make a woman most attractive. It’s in the jiggle of her hips, these experts say, and in the asymmetric swinging of her arms and thighs.

You may have come across a write-up of these findings in the tabloids or the strait-laced science press. It even got 14 paragraphs last Thursday in the New York Times, perhaps in keeping with the newspaper’s pledge, made in January, to run material that is “softer, buzzier, and funnier.” Such viral spread should come as no surprise: This is just the sort of science story that shimmies to the top of newsfeeds. That is to say, it’s of little consequence, and it’s very likely wrong.

Those two traits aren’t unrelated. Indeed, they work like interlocking gears in the vast machine that churns out clicky science content on the web. If anything, a study’s silliness only serves as journalistic cover, by making it seem peevish to delve into the details. On other beats, reporters worry over facts that seem “too good to check,” but for the science press, a paper on the secret of good dancing—like one on how chocolate makes you thin—lands in an even more ignominious bracket. It’s not that it’s too good to check. It’s that it’s too dumb to deserve serious scrutiny.

For a great example of how this phenomenon works, let’s boogie back to six years ago, when the same lab, headed by psychologist Nick Neave of Northumbria University, put out its first investigation of sweet dance moves. Back then, the headline finding—one that ended up on ABC, NPR, the BBC, and many other places—referred to dancing men instead of women. A dude could maximize his sexiness, Neave’s paper argued, by varying the amplitude of his neck and trunk movements, and by speeding up the bending and twisting of his right knee.

Most coverage focused on the knee, since that was the surprise. Could one knee (and not the other) really work such magic? Well, no. Neave’s team had looked at only 19 dancing subjects (with 37 women rating them), while measuring dozens of variables that ranged from the variability in flexion of the left shoulder to the external rotation speed of the right ankle. With such a smorgasbord of data, drawn from such a paltry set of subjects, even random blips might appear to be significant. Indeed, when the same group produced a follow-up in 2013, this one involving 30 dancing men, it failed to replicate the first experiment’s results. Now it seemed as though the speed, variability, and amplitude of a man’s arm movements—not those of his neck or knee—mattered most.

Notwithstanding these obvious defects, the male dancing study went viral in the media multiple times. The second round of coverage, which began in 2014, seems to have come about when the U.K.’s Daily Mail decided to revive the story. (The Daily Mail is, by the way, among the world’s most prolific fonts of bogus science news, so unreliable that Wikipedia has banned it as a source.) From there the reanimated finding made its way to Business Insider, the Washington Post, and back onto network television.

I called up Neave two years ago to find out how and why this happened. “It seems to have a life of its own,” he told me then. “We’re not reporting a cure for cancer. It’s not that important, but it’s fun!”

What about the fact that its findings had by that time been cast in doubt? “It was a preliminary study,” said Neave. “We got some very interesting findings. We’ve built on that study, and 90 percent of what we’ve found since seems to corroborate what we found then.”

The problem was, the 90 percent that held up did not include the banner news about a dancer’s torso and his neck, let alone his knee. Neave acknowledged that when he’d done the same experiment with the larger set of subjects, the knee’s importance went away. “When you’ve run so many analyses with so many variables, you will get these ones that kind of crop up,” he said. I took that to mean that his finding on the knee had been a blip in his statistics—a false-positive result that had cropped up in an underpowered study involving dozens of comparisons.

To answer certain research questions, Neave continued, he’d learned that he would need to study 200 subjects at the very least. (Remember, that first study included just 19.) Still, he wasn’t willing to abandon what by 2015 looked to be a dubious finding. The effect of the knee, he told me, had been “nearly significant” in his failed replication. “I think it shouldn’t be discounted entirely, but it also should be treated with a little bit of caution.”

Well, then. There’s no such thing as “near significance” in science, and there are also no signs that Neave has been exercising any caution whatsoever in espousing his result. His paper on dancing women, for example, cites his previous research in saying “that male dance quality can be predicted by … the speed of movement of the right knee.” Another of his recent publications, from 2015, contends that “‘good’ dancers displayed … faster bending and twisting movements of their right knee.” And in the past few days, media accounts of the new study are re-reporting the same, debunked result.

I called Neave back this week to ask him why he still lingers on the empty claim about the knee. “My guess is that this was some kind of artifact in the first study,” he acknowledged once again, but he said that doesn’t make it unworthy of consideration. “That was a finding,” he said. “I mean, it was there, wasn’t it?”

“Well, sure, it was a finding,” I said, “but it was a finding that, through careful follow-up work, now appears to be invalid.”

“No, not necessarily,” he said. “We didn’t find it in one other study. But again, that might mean that it might not be there, or that it might be there. For the minute, it’s there, until we can conclusively prove that it’s not there.”

I didn’t know quite what to say to this. “But you have good reason to believe that the finding on the knee, at least, might be wrong,” I tried. “And you don’t know about the neck and torso … ”

“I don’t think there’s a huge contradiction there,” he said. “But you’re right: Every time we do these studies we find ever-so-slightly different things because we’ve got different samples and different ways of looking at things. My guess is that if we ran the study again, something else might pop out.”

In that case, why not wait until he’s landed on a more reliable result? “That is a fair point,” he said. “In that case, we should never publish anything—but we’re under massive pressure to publish.” If he does end up getting to a solid answer, Neave continued, he’d be happy to put out corrected findings. “I’m not ashamed to say that we might have done something wrong or reported something that was an artifact or statistical blip. I have no problem with that as long as there’s still a story to tell—and I think there would be.”

Papers from Neave’s lab are indeed suffused with science storytelling. When we first talked in 2015, he told me that he’d noted in his original publication that the knee thing “seemed a bit odd.” In fact, the paper contains no such caveat. Rather, it tries to explain the odd result by noting (in what seems to me a non sequitur) that 80 percent of individuals have evolved to be right-footed. This sort of explanation shows up again in the new paper on women’s dance moves. Perhaps the swinging of the hips and thighs serves to signal reproductive fitness, Neave and his fellow authors write. Hip swinging and asynchronous thigh movements “might be an emphatically feminine trait,” while another sign of good dancing, mildly asynchronous arm movements, signifies a high degree of motor control. The paper says the latter requires a careful balance, though: If a woman swings her arms with too little synchronicity, she could seem as though she had Huntington’s disease or Tourette’s.

Let’s put aside the stories, though. Is there any reason to believe the main results of this new study on women and dancing? Like the experiment on men, this was not question-based research but rather something more like data-mining from a shallow quarry. The sample for the new study comprised 39 female dancers. Neave’s team had several hundred people rate their dancing skills, and then asked two experts to compare the moves of the five highest- and lowest-rated women. By eyeballing motion-captured videos, the experts picked out 3 of 22 recorded joint angles that struck them as being most indicative of good dancing: those that measured movements of the women’s hips, arms, and thighs. The researchers then selected a few variables related to those joint angles and ran statistical tests to see how they corresponded with ratings given to all the dancers in the study.

In a conventional experiment—one designed to test a clear hypothesis—scientists would state a research question at the start, plan out their analysis ahead of time, and then begin collecting data. Here, Neave and his colleagues recorded their data first, then reverse-engineered their questions and analysis. There’s nothing wrong with this approach, per se, so long as it’s understood to be exploratory. In the paper, Neave and his co-authors acknowledge that their work was “data-driven,” as opposed to being based on “a priori assumptions.” But given that, we should understand that the evidence the paper provides is very weak. At best it offers a suggestion as to what the researchers might want to study next, in more detail. As with the paper on male dancing, its findings may not replicate.

Yet Neave seems ready to press on with more elaborate, convoluted studies—for example, one that looks at how women dance at different stages of their menstrual cycles.

Again, I asked if it wouldn’t be a wise idea to wait, at least until he’d confirmed his findings with a fresh, hypothesis-driven experiment.

“As I said, we’re psychologists,” he said. “We haven’t wrongly reported a cure for cancer or anything like that. You know, it’s not that important, is it really?”

I was forced to agree.

“We’ve reported some interesting stuff,” he continued. “We don’t fully understand everything that we’ve found. … We’ve never, ever stood up and said, ‘We have proven this.’ You can’t say that in psychology. It’s not physics. So we have these situations where you report something, and you hope that it’s true, but it might not be true. Then you change your mind; you tweak things around. That’s the nature of the game.”

If that’s the nature of the game, if researchers are indeed incentivized to sashay around the facts and call it science, then journalists—especially those at publications like the New York Times—need to do a better job of staying on their toes.