Sex and Significance
How the Heritage Foundation cooked the books on virginity.
It's a confusing time to be a confirmed virgin in America. In March, the Journal of Adolescent Health published a paper by sociologists Hannah Brückner and Peter Bearman, which found that adolescents who pledged to remain virgins until marriage had STD infection rates as young adults that were statistically indistinguishable from those of nonpledgers. Last month, Robert Rector and Kirk Johnson of the Heritage Foundation delivered two conference papers and a press release that accused Yale's Brückner and Columbia's Bearman of reaching an "inaccurate" conclusion that "misled the press and public."

Strong words, especially considering that the authors agree on many key points. The National Longitudinal Study of Adolescent Health, from which both papers draw their data, shows that, while most pledgers don't keep their promises, they do wait longer than nonpledgers before having sex and have fewer partners. (Of course, it's very difficult to know what proportion, if any, of this difference is a result of the pledge. Teens with a pre-existing belief in virginity are presumably more likely to make virginity pledges and more likely to stay virgins. A stronger study would compare adolescents who pledged with adolescents who didn't pledge but would have, had the option been available.)
Still, it's reasonable to expect the pledgers to be at reduced risk for STDs. The numbers in the adolescent health study seem, at first, to bear that out. Urine samples taken in early adulthood showed 5.8 percent of pledgers and 6.9 percent of nonpledgers tested positive for gonorrhea, chlamydia, or trichomoniasis. The quarrel between the two papers centers on how to interpret these percentages. It's an argument about math, not sex. And mathematically, the Heritage paper comes up short.
The debate here is one over statistical significance, a mathematical measure of the persuasiveness of an experiment. Suppose you flip a coin 10 times. You'd probably guess that the coin is equally likely to land heads or tails; that's called the null hypothesis. But if your first 10 flips came up all heads or all tails, you might doubt that null hypothesis; if the coin were fair, the chance of getting the same result 10 straight times would be one in 512, or 0.002. If the first 10 flips yielded at least eight heads or at least eight tails, you'd have less reason to think something was amiss; a fair coin produces an outcome that lopsided in 56 out of 512, or 0.11, of cases.
Statisticians would say that the first result has a significance level (or p-value) of 0.002, and the second result has a p-value of 0.11. (Note: The p-value is the probability of obtaining the observed result, supposing that the null hypothesis were true. It is not the probability that the null hypothesis is true!) It's statistical custom to call a result statistically significant if its p-value is at most 0.05. That means 10 heads in a row is significant evidence that the coin is biased; eight out of 10 isn't.
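The coin-flip arithmetic above is easy to verify for yourself. Here's a minimal sketch in Python (the function name is mine, not from either paper), counting the outcomes of 10 fair flips that are at least as lopsided as the one observed:

```python
from math import comb

def p_value_at_least(k, n=10):
    """Two-sided p-value: the chance that a fair coin, flipped n times,
    lands at least k times on the same side (heads or tails)."""
    # Count outcomes with k or more heads, then double to cover
    # k or more tails. (For k > n/2 the two events are disjoint,
    # so doubling is exact.)
    extreme = sum(comb(n, h) for h in range(k, n + 1))
    return 2 * extreme / 2**n

print(p_value_at_least(10))  # 2/1024, i.e. one in 512, about 0.002
print(p_value_at_least(8))   # 112/1024 = 56/512, about 0.11
```

The first result falls below the 0.05 threshold; the second doesn't, which is the whole point of the example.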
If you called a coin unfair every time you got eight heads in 10 throws, you'd make a lot of false accusations. That's something that statisticians, a conservative crowd, want to avoid. The choice of 0.05 as a threshold is arbitrary, but statisticians have used 0.05 for decades, and holding to a common benchmark imposes a healthy impartiality on our analyses. (You can find a clear and thorough discussion of this point here. Disclosure: The author is my wife's first cousin's husband.)
No matter which paper you look at, the difference in tested STD rates between pledgers and nonpledgers fails the test of statistical significance. Brückner and Bearman say the p-value is 0.15; Rector and Johnson say it's somewhere between 0.06 and 0.13. Even so, the latter authors blast the former for writing that the STD rate among pledgers "does not differ" from that of nonpledgers.
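The test both sets of authors are running is the standard comparison of two proportions. The article doesn't give the group sizes, so the counts below are purely hypothetical, chosen only to match the 5.8 percent and 6.9 percent rates quoted above; the point is to show the mechanics of the calculation, not to reproduce either paper's exact p-value. A sketch of a pooled two-proportion z-test in Python:

```python
from math import erf, sqrt

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for the null hypothesis that two groups
    share the same underlying infection rate (pooled z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # rate if the null is true
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Hypothetical group sizes (NOT from the study), matching the
# article's rates: 5.8 percent of pledgers, 6.9 percent of
# nonpledgers tested positive.
p = two_proportion_p_value(29, 500, 207, 3000)
print(p)  # well above the 0.05 threshold
```

With samples of this rough size, a gap of about one percentage point simply isn't large enough to clear the significance bar, which is consistent with the p-values both papers report.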
But Brückner and Bearman's language is the conventional way of describing differences that aren't statistically significant. As another paper puts it, "If a variable is not statistically significant, it means that the variable has no statistically discernable difference between the coefficient value and zero, so there is no effect." That language comes from a 2001 paper by Rector, Johnson, and Patrick Fagan that claims race alone has no effect on poverty rates; finding an effect with a p-value of 0.15, the authors threw out the result. Four years later, they've adopted a more forgiving stance toward insignificant results—at least the ones that support the Heritage Foundation's policy stances.
Rector and Johnson claim their analysis is the more thorough one because "Bearman and Brückner used only one STD measure (the presence of three STD's in urine samples)," while "the present paper analyzes five STD measures based on urine samples, STD diagnoses, and STD symptoms." But Rector and Johnson's four new measures, all of which yield favorable results for pledgers with p-values less than 0.05, rely on subjects' self-reports. That adolescents who took virginity pledges might be less likely to report STD symptoms is a possibility Rector and Johnson don't seem to consider, but they should. When it comes to sex, people often lie. Urine samples don't.
Worse still is the just-plain-wrong claim that Brückner and Bearman use only one STD measure. They use two: the gonorrhea-chlamydia-trichomoniasis test cited above and a urine test for human papilloma virus. In Rector and Johnson's paper, the HPV test has disappeared. It's hard for me to see any reason but the obvious one: The HPV test also showed no statistically significant difference—but pledgers scored a little higher than nonpledgers.
What's particularly frustrating about Rector and Johnson's paper is how easy it would be to write a better one. They could have pointed out that, while no effect was found in this study, that's no proof the effect isn't there and that a larger study, or one designed to demonstrate the effectiveness of pledges and not just detect correlations, might yield more conclusive results. Instead, they damage their own credibility by attacking the perfectly reasonable methodology of Brückner and Bearman's article.
It's telling that Rector and Johnson accuse Brückner and Bearman of waging a "campaign against abstinence education." They seem to believe that the two sociologists have a predetermined conclusion in mind and will hack and knead the data as necessary to support it. In other words, in a failure of imagination, they see their fellow authors as versions of themselves.
Jordan Ellenberg is a professor of mathematics at the University of Wisconsin. His book How Not To Be Wrong is forthcoming. He blogs at Quomodocumque.
Photograph © Royalty-Free/Corbis.