DNA math: If police find a genetic “match,” that doesn’t mean they have the right suspect.

# If Police Find a DNA “Match,” That Doesn’t Mean They Have the Right Suspect

June 5 2013 4:57 PM

# Doubt and the Double Helix

On Monday, the Supreme Court narrowly upheld a Maryland law allowing the state to collect DNA samples from people arrested for violent crimes and burglary. The decision vastly expands law enforcement’s power to collect Americans’ genetic data, to the openly expressed horror of Antonin Scalia and the other dissenters.

It’s nice to imagine a world in which cracking a case means grabbing a fabric swatch from the crime scene, scanning it with the help of CheekSwab.gov, and then getting a report with the criminal’s name, address, photo, and last 10 tweets. But it’s not going to be that easy. Simple example: You get DNA from a hair found at the scene of the crime and find six usable places in the genome to test. The chance that any given person is a genetic match at those six places is pretty small, say 1 in 5 million. Now you run the sample through your database and you’re a happy detective because you find just one match. We got him! And when you try the case, the number “1 in 5 million” is going to be front and center. When the DA rips open his dress shirt at the culminating moment of his closing statement, “1 in 5 million” is what’s printed on the tank top underneath.

That’s how I imagine it, anyway.

But that number, impressive as it is, isn’t the right one. What the DA is telling the jury is that there’s a 1 in 5 million chance that an innocent person would have DNA that matched the sample. In other words:

1. If a person has nothing to do with the crime, what’s the chance that person’s genes match the ones in the sample?

But that’s not what we want to know, is it? We want to know the chance that the defendant before us, the guy who matched the sample, is innocent. And that’s a different question:

2. If a person’s genes match the ones in the sample, what’s the chance she has nothing to do with the crime?

Flipping a probability question like this is apt to change the answer. For instance, if a person is from China, the chance they’re from Yunnan Province is pretty small. But if a person is from Yunnan Province, the chance they’re from China is 100 percent.

The formal way to traverse the gap between these two questions is Bayes’ theorem. But I want to do this a bit more informally.

Remember that our hypothetical DNA database is pretty big; say it includes genetic material from 10 million people. That any individual will match the DNA sample is fantastically improbable, but given 10 million chances, the odds that somebody in the database matches the sample are pretty good. In fact, on average, there should be two matches, at least one of whom is definitely innocent of the crime! The bigger the database, the more poor innocent saps are likely to get fingered by the matching algorithm. That means the answer to question 2 can be big (like 1 in 2) even when the answer to question 1 is really, really small (like 1 in 5 million).
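The arithmetic behind that paragraph can be checked in a few lines. This is a minimal sketch using only the hypothetical figures from the example (a 1-in-5-million match chance and a 10-million-person database); the variable names are illustrative, and it assumes every profile in the database is an independent trial, which real databases with relatives in them only approximate.

```python
# Hypothetical figures from the example above, not real forensic data.
match_prob = 1 / 5_000_000   # chance an unrelated person matches at the six tested places
database_size = 10_000_000   # people in the hypothetical database

# Average number of people in the database who match by coincidence.
expected_matches = database_size * match_prob

# Chance that at least one person matches purely by coincidence,
# assuming each profile is an independent trial (an assumption).
p_some_coincidence = 1 - (1 - match_prob) ** database_size

print(f"expected coincidental matches: {expected_matches:.1f}")
print(f"chance of at least one coincidental match: {p_some_coincidence:.3f}")
```

With two matches on average and at most one actual culprit, a match picked at random is innocent at least half the time, which is where a figure like 1 in 2 comes from.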

That doesn’t mean DNA testing is necessarily unfair and immoral. To avoid false positives, states can and do make rules requiring a certain large number of genetic loci to be tested before DNA evidence can be admitted. And there’s nothing wrong with using the results of DNA-trawling as exploratory evidence; if 10 people in your database match the sample, you’d better check whether any of those 10 people knew the victim, or mysteriously disappeared the day after the crime. It’s like getting an anonymous tip—you recognize there’s a chance it’s wrong, but that doesn’t mean you don’t investigate.

What is dangerous is the use of a DNA hit from a large database to determine guilt all by itself. And that’s not the spooky math future—that’s already happening.

In their fascinating and wide-ranging book Math on Trial, Leila Schneps and Coralie Colmez tell the story of Diana Sylvester, a nurse who was raped and murdered in San Francisco in 1972. Sylvester’s murder went unsolved, the evidence was boxed up and put away, and the case was more or less forgotten. In 2003, the SFPD opened up the box as part of a program to revisit cold cases with new genetic technology. When they ran what was left of the physical evidence against their database of 338,000 convicted sex offenders, they found a single match: a 72-year-old named John Puckett. Puckett was tried and convicted of first-degree murder, by a jury who’d been warned that only one in 1.1 million people carried the DNA markers shared by Puckett and the decades-old sample.

This, remember, answers the first of the two questions: If a person has nothing to do with the crime, what’s the chance that person’s genes match the ones in the sample? But by itself, it tells us nothing at all about the probability that Puckett is innocent. The judge, though, excluded the argument that it was reasonably likely for an innocent person in the database to have matched the sample by chance, feeling that any more mathematics would confuse the jury. Even the fact that Puckett had been identified as a suspect purely by database search wasn’t admitted at trial. Puckett remains imprisoned today.

Is he really guilty? The defense generated its own estimate that there was a 1-in-3 chance Puckett was innocent. That guess, which was barred from testimony, was, to be frank, hardly more mathematically correct than the prosecution’s favored number. For one thing, the DNA match might easily have fingered someone who wasn’t born, or was a child, in 1972; it didn’t. It might have identified someone whose recorded sex offenses were nonviolent; Puckett had two rape convictions and had served eight years in prison. Moreover, the fact that, in this case, more than 300,000 known sex offenders other than Puckett were ruled out as the murderer has some evidential value in itself. So the prosecution had some reason to think the DNA was pointing them in the right direction.
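For reference, the naive database arithmetic for this case looks like the sketch below. It uses only the two figures quoted above (338,000 profiles searched, a 1-in-1.1-million match chance) and assumes independent profiles. Whether the defense derived its 1-in-3 estimate this way is an assumption, not something stated here; the result simply lands in the same neighborhood.

```python
# Figures quoted in the account of the Puckett case.
random_match_prob = 1 / 1_100_000  # the number presented to the jury
database_size = 338_000            # profiles in the searched database

# Average number of coincidental matches expected from the search.
expected_coincidental = database_size * random_match_prob

# Chance the search turns up at least one coincidental match,
# assuming each profile is an independent trial (an assumption).
p_at_least_one = 1 - (1 - random_match_prob) ** database_size

print(f"expected coincidental matches: {expected_coincidental:.3f}")
print(f"chance of at least one coincidental match: {p_at_least_one:.3f}")
```

Note what this number is and is not: it is the chance that trawling the database produces some innocent hit, not the chance that Puckett in particular is innocent, because it ignores everything else known about him.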

But that doesn’t change the fact that the only number the jury saw was “1 in 1.1 million”—the tiny odds that are not the odds of Puckett’s innocence. After taking into account all the information they had access to, Schneps and Colmez came up with a figure of their own: a 1-in-70 chance that Puckett is innocent. Whether a 1-in-70 chance of innocence constitutes “reasonable doubt” is debatable, but it can’t be denied that 1 in 70 is meaningfully different from 1 in 1.1 million.
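To see how far the two questions can drift apart, here is a rough Bayes' theorem sketch. It is not the calculation Schneps and Colmez performed, which is not detailed here. It assumes, hypothetically, that before the DNA test the culprit was equally likely to be any one of `pool_size` people, and that the true culprit always matches. The function name and the pool sizes are made up for illustration.

```python
def chance_innocent_given_match(pool_size: int, random_match_prob: float) -> float:
    """P(innocent | match), assuming the culprit is one of pool_size
    equally likely people and the true culprit always matches."""
    prior_guilty = 1 / pool_size
    prior_innocent = 1 - prior_guilty
    # Bayes' theorem:
    # P(innocent | match) = P(match | innocent) * P(innocent) / P(match)
    p_match = 1.0 * prior_guilty + random_match_prob * prior_innocent
    return random_match_prob * prior_innocent / p_match


random_match_prob = 1 / 1_100_000  # the figure the jury saw

# Hypothetical pool sizes, chosen only to show the trend.
for pool_size in (2, 1_000, 100_000, 1_000_000):
    p = chance_innocent_given_match(pool_size, random_match_prob)
    print(f"pool of {pool_size:>9,}: chance of innocence = {p:.6f}")
```

The same "1 in 1.1 million" yields a negligible chance of innocence when there are only two plausible suspects and a substantial one when the pool runs to a million people, so the match statistic alone cannot settle the question.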

Maybe this doesn’t matter. If we get to the future that some fear and some relish, where every one of us is sequenced in a DNA database, it will be a pretty big strike against you when your genes come up triple cherries against the bodily substance found at the scene of the crime and no other living human in the world can say the same.

It will be, at least, overwhelmingly convincing evidence that you are the person whose DNA is in the sample. But whether that sample is actually from the crime scene is another question, as Farah Jama found out when he was jailed in Australia for a crime he didn’t commit—a crime that may not have even taken place.

A woman was found unconscious at a bar, and a swab taken at a nearby rape crisis center found sperm that DNA testing identified as Jama’s. That sounds like an open-and-shut case, and that’s how it sounded to the jury, too.

But here’s what actually happened. Another woman, who had Jama’s semen in her hair, had been at the center the previous day; both women were seen by the same doctor. The cleaning protocols used there were perfectly good for preventing infection, but far from sufficient to eradicate traces of DNA-bearing material from the examination room. Jama’s DNA was on the swab, but it had never been on the woman from the bar. The state government of Victoria prepared a thorough account of the case and its aftermath. There was no other evidence connecting Jama to the crime, and no one had reported seeing a college-aged Somali kid wandering around a bar where, demographically speaking, he would have stood out.* He was convicted anyway, and spent 15 months in jail before the conviction was quashed.

That’s the danger of impressive-sounding numbers. As the precision of our genetic endeavors mounts, the astronomical figures we can intone in court improve: “There’s no more than a 6-in-a-vigintillion chance that this DNA could have come from an innocent man, your honor!” And those sorts of numbers have a way of crowding out everything else going on in our minds. But we shouldn’t let them. There’s only so much reasonable doubt we can genetically engineer away.

*Correction, June 6, 2013: This story originally misstated the mechanism by which Farah Jama’s DNA was used to convict him erroneously. The contamination of a testing swab happened at a rape crisis center, not the Victorian Institute for Forensic Medicine.