The young woman’s body was found in a ditch in Ohio in 1981. Detectives nicknamed her “Buckskin Girl” after the fringed leather jacket she wore, and for 37 years, no one knew her real name.

 

Then, in March 2018, the detectives got a break. Two amateur sleuths had used a new method to compare the DNA of Buckskin Girl to records in a public genetic database called GEDmatch. In just four hours, they found a lead, and detectives soon confirmed that Buckskin Girl was a 21-year-old Arkansas woman named Marcia King.

King’s mother had not changed her phone number since Marcia had gone missing decades before. “She was always hoping her daughter would come home,” says Colleen Fitzpatrick of Fountain Valley, California, one of the two genealogists who helped to solve King’s case.

Identifying Buckskin Girl was the first public success for a new technique, forensic genetic genealogy, that is changing the practice of criminal investigation. In April 2018, detectives revealed that they had used genetic genealogy to pinpoint a prime suspect in the notorious Golden State Killer case, arresting then-72-year-old Joseph James DeAngelo Jr. for rapes and murders committed in California from 1974 to 1986.

Investigators have since used the method to identify at least 24 more suspects. In December 2018, an Indiana man, 59-year-old John D. Miller, became the first criminal known publicly to have been convicted with the aid of genetic genealogy. On December 21, he was sentenced to 80 years in prison after pleading guilty to molesting and murdering an eight-year-old girl in 1988.

Adding to a public pool of data

Forensic genetic genealogy compares the genetic profile of a crime suspect or victim to genetic profiles of people in public databases — primarily, the volunteer-run GEDmatch, where customers of any genetic-testing company can upload their information. Genealogists search for a target’s relatives in GEDmatch, then pursue these leads to pinpoint their John or Jane Doe’s identity.

An artist’s rendition of Buckskin Girl, whose body was found in Ohio in 1981. She was identified last year as Marcia King of Arkansas. The case was solved with the assistance of the DNA Doe Project, using genetic genealogy.

CREDIT: NATIONAL CENTER FOR MISSING & EXPLOITED CHILDREN

The method is exploding in popularity. One firm that offers a genetic genealogy service to law enforcement officials, Parabon of Reston, Virginia, says it has looked into more than 200 cases. Other genealogists, like Fitzpatrick, volunteer their skills to help resource-strapped law enforcement agencies solve crimes. Fitzpatrick has recruited a network of amateurs to work on cold cases as part of the DNA Doe Project, which strives to put names to unidentified victims of homicides, accidents and suicides.

Geneticists and genealogists say the method will only become more powerful in coming years as more people publicly share their genetic data and forensic geneticists refine their investigative tools. One group of researchers has calculated that most people of European ancestry in the United States could already be identified through genetic genealogy databases. Work is also underway to find new ways to use genetic traits to predict physical characteristics, which might harness DNA data to further pinpoint suspects. Given the potential reach of the technique, some lawyers and geneticists are pushing for guidelines to prevent excessive privacy intrusions; they point to similar regulations that govern the use of existing law enforcement databases.

“The interesting thing is the scale of these databases — and how different types of genetic data could be used,” says population geneticist Graham Coop of the University of California, Davis. “The size of these databases is increasing and these issues are only going to get stronger in the next few years.”

Sorting out the DNA evidence

Forensic scientists have long used genetics to help catch criminals. In the United States, the FBI’s National DNA Index System contains DNA profiles of more than 17.7 million people who have been arrested for, or convicted of, committing crimes, compiled from individual states’ criminal DNA databases. These databases store information on short tandem repeats, or STRs (sections of DNA that repeat themselves over and over at spots in the genome), near 20 genes. The number of repeats at each gene varies among individuals, so STR profiles can be used to verify people’s identities.

To use such a criminal DNA database, investigators first determine a suspect’s STR profile from blood, semen or tissue from a crime scene, then compare it against database records. But there’s a limit to the power of STR searches: They can only yield matches to individuals or close relatives, such as a suspect’s parent, child or sibling.
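To make the STR comparison concrete, here is a minimal Python sketch of how an evidence profile might be checked against database records. It is an illustration only, not the FBI's software: the function names, record IDs and repeat counts are all hypothetical, and only three loci are shown where a real profile would cover about 20.

```python
from typing import Dict, List, Tuple

# An STR profile maps a locus name to the two repeat counts observed there
# (one per chromosome copy). All values below are made up for illustration.
Profile = Dict[str, Tuple[int, int]]


def normalize(profile: Profile) -> Profile:
    """Sort each pair of repeat counts so that (15, 17) equals (17, 15)."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def matching_loci(a: Profile, b: Profile) -> int:
    """Count loci present in both profiles where both repeat counts agree."""
    a, b = normalize(a), normalize(b)
    return sum(1 for locus in a if locus in b and a[locus] == b[locus])


def find_exact_matches(evidence: Profile, database: Dict[str, Profile]) -> List[str]:
    """Return IDs of records that agree with the evidence at every evidence locus."""
    needed = len(evidence)
    return [rid for rid, rec in database.items()
            if matching_loci(evidence, rec) == needed]


# Hypothetical crime-scene profile and two hypothetical database records.
evidence = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}
database = {
    "rec-001": {"D3S1358": (17, 15), "vWA": (16, 18), "FGA": (24, 21)},
    "rec-002": {"D3S1358": (15, 17), "vWA": (14, 18), "FGA": (21, 24)},
}

print(find_exact_matches(evidence, database))  # ['rec-001']
```

With so few markers, a record either agrees almost everywhere or it does not, which is one way to see why such searches reach only the individual or very close relatives.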

GEDmatch, by comparison, can be far more powerful. That’s because it contains information on 100,000 to 600,000 genetic markers — tiny variations in DNA sequence called single nucleotide polymorphisms (SNPs), the same genetic markers profiled by consumer genetic companies such as 23andMe and Ancestry.com.

Screenshot of DNA profile matches from the volunteer-run GEDmatch database. Users of any genetic genealogy service can upload their data here, making it the most comprehensive resource for genetic genealogists.

CREDIT: COURTESY OF CURTIS ROGERS

To access GEDmatch, a user must first buy her or his SNP profile from a consumer genetics firm, then upload that profile to GEDmatch. GEDmatch then runs a software program to compare the DNA profile to that of every other user in the database, and reports back which other users share significant portions of DNA. The user can then contact other GEDmatch users and hunt down additional information to discover how the match fits into his or her family tree.
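The comparison step can be pictured with a simplified Python sketch. This is not GEDmatch's actual algorithm, which is not described here. It assumes each SNP is coded as 0, 1 or 2 copies of a variant, treats two people as possibly sharing DNA at a SNP unless one is 0 and the other is 2, and reports users whose longest unbroken compatible stretch passes a threshold. The user IDs, genotypes and threshold are hypothetical; real tools work with hundreds of thousands of SNPs and measure shared segments in genetic distance rather than SNP counts.

```python
from typing import Dict, List, Tuple


def longest_compatible_run(g1: List[int], g2: List[int]) -> int:
    """Longest stretch of consecutive SNPs where two people could share an allele.

    Genotypes are coded as 0, 1 or 2 copies of the variant allele.
    Only the pairing 0 versus 2 rules out a shared allele at that SNP.
    """
    if len(g1) != len(g2):
        raise ValueError("profiles must cover the same SNPs in the same order")
    best = run = 0
    for a, b in zip(g1, g2):
        if abs(a - b) == 2:
            run = 0  # opposite homozygotes: the shared stretch ends here
        else:
            run += 1
            best = max(best, run)
    return best


def report_hits(target: List[int], database: Dict[str, List[int]],
                min_run: int) -> List[Tuple[str, int]]:
    """Return (user, run length) for users passing the threshold, longest first."""
    scored = [(uid, longest_compatible_run(target, g)) for uid, g in database.items()]
    hits = [(uid, run) for uid, run in scored if run >= min_run]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical ten-SNP profiles.
target = [0, 1, 2, 2, 1, 0, 0, 1, 2, 1]
database = {
    "user_a": [0, 1, 1, 2, 1, 0, 1, 1, 2, 2],
    "user_b": [2, 1, 0, 2, 1, 2, 0, 1, 0, 1],
}

print(report_hits(target, database, min_run=5))  # [('user_a', 10)]
```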

Law enforcement officials and genealogists follow a similar procedure, with one difference: Rather than submitting their own genetic profiles to GEDmatch, they submit data from an unidentified crime suspect or victim. GEDmatch reports back a list of “hits” — users who share DNA with the unidentified target. Genealogists then use data from genealogical websites and public-record databases to build family trees for the GEDmatch hits, looking for leads to their target’s identity. Criminal investigators must then confirm that DNA from one of their potential suspects matches DNA found at the scene of the crime.

A GEDmatch user’s records can thus reveal information about many other relatives who have not uploaded their DNA to the site — even third and fourth cousins, far more distant than what’s obtained from STR-based criminal DNA databases. In the case of Buckskin Girl, for example, GEDmatch linked the victim to a woman who had a first cousin once removed, long missing and presumed dead. The missing cousin was Marcia King.

A process neither quick nor cheap

The speediness of the King case isn’t typical; it can take genealogists hundreds of hours to scour family trees in GEDmatch and other websites to find a lead to a person’s identity. As a result, it’s expensive and law enforcement officials rarely turn to it. Parabon, which has four genetic genealogists on staff, charges $1,500 for an initial assessment of each case and about $3,500 for a full analysis.

But several trends could make the technique more powerful and routine within the next few years, including the growth of the GEDmatch database and continued improvements in genetic technology. GEDmatch already contains 1.2 million profiles, with disproportionately large numbers of records from people with European American heritage, reflecting the demographics of consumer genetic testing customers. Indeed, geneticists have calculated that the majority of the Caucasian US population is already findable via GEDmatch records.

In June, for instance, geneticist Yaniv Erlich of the Israel-based genetic genealogy company MyHeritage calculated that sixty percent of US Caucasians have a third cousin or closer relative in GEDmatch, and are thus identifiable by genealogical sleuthing. Erlich also calculated that a genetic database containing records from five million individuals (two percent of Caucasians living in the United States) could identify any US-based person with European heritage. The consumer genetics companies are not far from that landmark: To date, more than twelve million people have used the companies’ services, although only a fraction have gone on to make the data publicly available by uploading them to GEDmatch. But Curtis Rogers, GEDmatch’s cocreator, notes that 1,500 to 2,000 new records are added to the site each day.

“The question is whether you can eventually get GEDmatch big enough that everyone can find a cousin in it,” says geneticist Doc Edge at the University of California, Davis. “I’d say you can get there for sure, and I would say we’re pretty close.”
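A back-of-the-envelope calculation gives a sense of what "pretty close" could mean. The Python sketch below uses only figures quoted in this article (1.2 million profiles now, 1,500 to 2,000 added per day, a five-million-record threshold) and makes two loud assumptions: that uploads continue at a constant rate, and that the five-million figure can be applied to the database as a whole even though not every record comes from a person of European ancestry. The function name is hypothetical, and the result is an illustration, not a forecast.

```python
def days_to_reach(current: int, target: int,
                  added_per_day: int, removed_per_day: int = 0) -> float:
    """Days until a database grows from `current` to `target` records.

    Assumes constant daily additions and removals, which is a simplification.
    """
    net = added_per_day - removed_per_day
    if net <= 0:
        raise ValueError("the database is not growing at these rates")
    return max(0, target - current) / net


CURRENT_PROFILES = 1_200_000   # GEDmatch size quoted in the article
TARGET_PROFILES = 5_000_000    # threshold from Erlich's calculation

slow = days_to_reach(CURRENT_PROFILES, TARGET_PROFILES, added_per_day=1_500)
fast = days_to_reach(CURRENT_PROFILES, TARGET_PROFILES, added_per_day=2_000)

print(f"{fast / 365:.1f} to {slow / 365:.1f} years")  # 5.2 to 6.9 years
```

Faster uptake of consumer testing would shorten that range, and a rise in removals would lengthen it; the optional `removed_per_day` argument lets a reader try either case.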

Go back just a few generations and there are huge numbers of people on family trees — so genetic genealogy requires a great deal of sleuthing.

CREDIT: COLUMBIA UNIVERSITY

Not everyone agrees we are that far along. Ellen Greytak, who leads the bioinformatics team at Parabon, says that Erlich made some assumptions — for instance, that investigators will know a suspect’s birth year. And even if people can, in theory, be identified from a distant relative’s GEDmatch record, it takes a lot of genealogy legwork. “Ninety-nine percent of the work happens after GEDmatch,” Greytak says.

Still, it’s undeniable that the pool of potential matches will grow as users continue adding records. So, too, will the pool of criminal cases that might be amenable to this technique, spurred by improvements in computational tools.

Extracting answers from degraded DNA

Law enforcement has primarily used forensic genetic genealogy to solve cold cases: old crimes that have gone unsolved for many years. Evidence collected years ago has often degraded to the point where it’s difficult to extract enough useful DNA for analysis. But forensic geneticists are now deploying methods developed by basic-science researchers for analyzing old DNA, such as computational techniques for reconstructing genomes of ancient humans. That allows them to extract information from severely damaged crime-scene samples.

In one bizarre case, in 2014 US marshals were trying to identify a man who committed suicide in an Ohio apartment in 2002 and, investigators soon learned, had been living under an assumed name. One of the marshals asked Fitzpatrick to help identify the man using DNA from his Y chromosome. Y-chromosome DNA is usually found in males only, so “Y-DNA” testing can be paired with database searches to guess men’s last names — because both are passed down from fathers to sons. Fitzpatrick had used the method to help investigators in dozens of prior cold cases.

The strange story of Robert Ivan Nichols
July 30, 2002: A suicide victim is found in an Ohio apartment. The man’s body is too decomposed to lift fingerprints and the remains are cremated. The unknown man, Mr. X, had $82,000 in his bank account.
Efforts to contact next of kin reveal Mr. X was living under a stolen identity. His social security number belongs to a boy, Joseph Newton Chandler, who died in a car crash in 1945. An address for Mr. X’s supposed sister, listed on rental paperwork, is just an empty lot. The few leads go nowhere, and the case goes cold.
2014: Twelve years later, U.S. Marshals pick up the case while looking for fugitives who have escaped from Alcatraz. Efforts to identify Mr. X ramp up again. Investigators unearth a key piece of evidence: Mr. X had been diagnosed with cancer before his death and the hospital has a tissue sample.
Much of the DNA in the hospital sample is degraded, but the team isolates enough to create a profile of Mr. X based on DNA from his Y chromosome. Y chromosomes are passed from fathers to sons, like last names.
The Y-DNA profile of Mr. X shows 17 regions on the Y chromosome where short strings of DNA letters repeat. Called short tandem repeats, or STRs, these snippets are usually 1 to 6 DNA letters long and repeat up to 50 times at a given spot. Close male relatives typically have the same number of repeats at a given location.
2018: A forensic genealogist joins the case and compares the Y chromosome profile of Mr. X to hundreds of thousands in databases. One profile matches at all 17 spots: a man with the last name of Nicholas. Last names typically follow the male line, like Y chromosomes, so Mr. X could be a Mr. Nicholas or something similar. But the match’s lineage has far too many descendants to identify Mr. X.
Investigators sequence Mr. X’s whole genome — not just the Y chromosome. Despite the old, degraded DNA, they manage to construct a new genetic profile for Mr. X made up of the hundreds of thousands of places where the human DNA sequence varies by a single DNA letter. These differences are known as SNPs (single nucleotide polymorphisms).
The team uploads Mr. X’s SNP profile to the public genetic database GEDmatch, which finds matches based on shared chunks of SNP-containing-DNA. Mr. X’s profile matches hundreds of distant relatives. After months of research into public records and websites like Ancestry.com, the team creates a family tree of 15,000 people. More clues are needed to locate Mr. X in its branches.
The team tries a second round of DNA sequencing, using the last of Mr. X’s degraded tissue sample. When the two rounds of SNP data are combined, GEDmatch finds some new relatives, which focus the investigation on one section of the tree: a couple, Silas Nichols and Alpha Schreiber, who had four sons. Three sons have died. There is no death date listed for the fourth son, Robert Ivan Nichols.
A flurry of research into Robert Ivan Nichols unearths his birth certificate and a surprising detail: the birth certificate has the same street address, 1823 Center Street, of the fake sister Mr. X listed on his rental application.
Investigators contact Phil Nichols, who they now believe is the son of Mr. X. The father-son resemblance is striking; their DNA matches. Phil Nichols reports that his father had abandoned the family and that contact ended in 1965. The case is solved, in part: What Robert Ivan Nichols was running from, and why he changed his identity, remains a mystery.

CREDITS: WIKIPEDIA; KNOWABLE MAGAZINE; U.S. MARSHALS SERVICE; NATIONAL HUMAN GENOME RESEARCH INSTITUTE

SOURCE: DNA DOE PROJECT

The man’s body had been cremated, so officials gave Fitzpatrick a DNA sample from a tumor biopsy the man had undergone before he died. By comparing the dead man’s Y-DNA to entries in Y-DNA genetic genealogy databases, and tracking the last names of men in family trees, Fitzpatrick was able to give the marshals a real name to work with — Nicholas, or something similar — but it wasn’t enough. There were thousands of descendants to check out.
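The surname-guessing step can be sketched in a few lines of Python. This is a simplified illustration rather than Fitzpatrick's actual workflow: it assumes a Y-DNA database can be represented as a list of (surname, profile) pairs, and it tallies the surnames of entries whose repeat counts agree with the target's. The surnames and repeat counts are invented, and four markers stand in for the 17 used in the case.

```python
from collections import Counter
from typing import Dict, List, Tuple

YProfile = Dict[str, int]  # marker name -> number of repeats


def surname_candidates(target: YProfile,
                       records: List[Tuple[str, YProfile]],
                       max_mismatches: int = 0) -> List[Tuple[str, int]]:
    """Tally surnames of database entries whose Y-STR profile fits the target.

    An entry fits if it differs from the target at no more than
    `max_mismatches` markers (a missing marker counts as a mismatch).
    """
    tally: Counter = Counter()
    for surname, profile in records:
        mismatches = sum(1 for marker, repeats in target.items()
                         if profile.get(marker) != repeats)
        if mismatches <= max_mismatches:
            tally[surname] += 1
    return tally.most_common()


# Hypothetical target and database entries.
target = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
records = [
    ("Alder", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Alder", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Birch", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Cedar", {"DYS19": 15, "DYS390": 23, "DYS391": 11, "DYS393": 12}),
]

print(surname_candidates(target, records))                    # [('Alder', 2)]
print(surname_candidates(target, records, max_mismatches=1))  # [('Alder', 2), ('Birch', 1)]
```

As in the case described here, a surname is only a lead: it narrows the search to a paternal line that may still contain thousands of men.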

Fitzpatrick then met genealogist Margaret Press of Sebastopol, California, and the pair founded the DNA Doe Project. They decided to try the more complex task of using genetic material not from the Y chromosome but from all of the others, to find people through GEDmatch records. This seemed like a good test case, and the Ohio marshals agreed, so in 2017 the man’s tissue sample went for broader genetic sequencing, and then for computational analysis by Greg Magoon, a bioinformatician with Aerodyne Research Inc. in Massachusetts. Just 12 percent of the man’s genome yielded usable data, but Magoon used computational methods developed for analyzing ancient DNA to convert this tiny amount of information into SNP data. Fitzpatrick and Press uploaded the information to GEDmatch.

It took two tries and a lot of follow-up genealogical research, but the work finally led investigators to an Ohio man named Phillip Nichols. He was shown a photo of the mystery man and identified him as his father, Robert Ivan Nichols, who had abandoned his family in the 1960s.

Genetic genealogists say that as the science of sequencing and bioinformatics continues to advance, so will their ability to generate leads from imperfect DNA samples. “Researchers are getting better and better at reconstructing the genome, and that helps when we’re working with degraded DNA,” says Press.

“A year ago, we never would have predicted we would be where we are now.”

Forensic genealogists Colleen Fitzpatrick (left) and Margaret Press (right) of California founded the DNA Doe Project, which uses genetic genealogy to help law enforcement solve crimes.

CREDIT: SARA PRESS PHOTOGRAPHY

Sketching suspects with genetics

The type of information that geneticists can extract from forensic samples is also changing, and moving beyond just establishing DNA matches. There’s been an explosion in the study of predicting people’s traits, such as physical appearance, from genetic data, by matching gene variants and SNP patterns to those traits. Researchers are farthest along at predicting pigmentation such as hair, eye and skin color, and freckling — GEDmatch already has an eye-color prediction tool. Parabon offers a service that predicts face shape, and is developing algorithms to predict hair characteristics such as texture and curliness.

So far, some of these tools are crude; Fitzpatrick says that Parabon’s facial reconstructions are “rather generic.” But as scientists gain access to larger datasets, the accuracy of these trait profiles could increase, says forensic geneticist Susan Walsh of Indiana University-Purdue University Indianapolis. “You look in the mirror, and you can see something that we could try to predict,” she says.

In the case of the Golden State Killer, GEDmatch predicted that he would have blue eyes, and another website called Promethease predicted that he would be bald prematurely. Investigators working the case used these traits along with other lines of evidence, such as his Italian heritage, to narrow down the pool of potential suspects to DeAngelo.

Encouraged by successes in solving old crimes, investigators are already using genetic genealogy and trait prediction in active investigations, before leads dry up. In July 2018, Utah police charged 31-year-old Spencer Glen Monnett for the rape of a woman in April based on DNA he left at the crime scene — traced back to him through genetic genealogical work by Parabon.

Quality control — and privacy

Some lawyers and geneticists say forensic genetic genealogy is moving too fast. They are worried about the lack of professional, technical and legal guidelines for the field, which they say could endanger public privacy to an unprecedented degree.

Anyone claiming expertise in genetic genealogy can hang out a shingle and offer services to law enforcement officials, says UK-based genetic genealogist Debbie Kennett. “We’ve never had our work subjected to the same rigor as science,” she says of genetic genealogists. If they generate faulty leads, that can waste time and money and needlessly harass innocent people.

Kennett cites the case of the Golden State Killer, where investigators first used Y-DNA matching to find potential relatives of the suspect. That led police to persuade an Oregon judge to order a 73-year-old man in poor health, living in a nursing home, to provide a DNA sample to investigators. That man turned out to have nothing to do with the case. Whole-genome DNA matching is supposed to be more precise than Y-DNA matching, but Kennett says the example illustrates that genetics is not a foolproof investigative tool.

Golden State Killer suspect Joseph James DeAngelo Jr. appears in court in April 2018.

CREDIT: TRIBUNE CONTENT AGENCY LLC / ALAMY STOCK PHOTO

The issues could become more complex as the science advances. Currently, researchers use a few specific genetic markers to predict a suspect’s physical appearance. But some geneticists are using technologies such as machine learning, in which computer algorithms identify individuals based on hundreds, thousands or millions of genetic data points. Researchers might not even know which genetic variants the software uses to predict, say, facial morphology, and thus would have no way to evaluate the plausibility of a putative connection.

“We’ll have some sense of how the answer is obtained, but we won’t necessarily have full transparency,” says computer scientist Vikas Pejaver of the University of Washington. “The question is, how confident are we about that answer? And at what point does that answer become actionable, as opposed to being too intrusive and causing a lot of concern for people who have nothing to do with the case?”

In fact, some lawyers say that the depth and reach of information revealed by genetic genealogy databases might already violate constitutionally protected civil liberties. Courts have ruled that databases of crime suspects’ DNA do not violate the Fourth Amendment right against unreasonable search and seizure, because arrestees lose some of their privacy rights. But GEDmatch users and their relatives have no such “diminished expectation of privacy,” says law professor Natalie Ram of the University of Baltimore, Maryland — in other words, they have reason to expect that their personal information, including genetic, demographic and medical information, will not be available to law enforcement officials.

GEDmatch does inform users who post their data that law enforcement officials might be using the database too, but distant family won’t know that. “There’s a good argument that the user’s consent does not extend to hundreds or thousands of genetic relatives who are now findable through a relative’s DNA,” Ram says.

Ram also points out that states regulate how law enforcement officials can use criminal DNA databases for “familial searches” — ones attempting to identify suspects through their relatives’ genetic information. In California, Texas and Virginia, investigators can run familial searches in these databases for serious crimes but only after they have exhausted other investigative avenues; Maryland and Washington, DC, forbid familial searches.

Legal debates

But there are no similar guidelines governing the use of genealogical databases for such searches, and Ram thinks that state and federal regulators should create some. “I do think these are serious crimes that law enforcement should try to solve, but I have concerns about the lawfulness and constitutionality of these kinds of tools,” she says.

Phil Nichols of Cincinnati, Ohio, talks to reporters in 2018. His father, Robert Ivan Nichols, abandoned the family in the 1960s and soon ceased all contact. Genetic genealogy revealed his final fate: Robert Ivan Nichols was an unidentified man who had committed suicide in 2002 in an Eastlake, Ohio apartment and had been living under a stolen identity for decades.

CREDIT: JOHN CANIGLIA, PLAIN DEALER / BARCROFT MEDIA

Other lawyers don’t think law enforcement’s access to genetic genealogy databases should be limited. They argue that the information is voluntarily disclosed, the type of information revealed about health risks is limited, and the information is used only to generate leads that investigators must then follow up.

In the Golden State Killer case, for example, once investigators got DeAngelo’s name, they still had to take a fresh sample of his DNA to confirm that it matched DNA from the crime scene. Such requirements reduce the risk of wrongful convictions, says lawyer David Kaye of the Pennsylvania State University School of Law in University Park.

Kaye agrees that law enforcement’s use of genetic genealogy should receive the same types of oversight already applied to other forensic technologies — DNA-testing labs should be licensed and accredited, and statistical methods for inferring relationships via DNA should be validated. But, he says, “whether states need laws or regulations to limit the investigations to certain kinds of cases is not so clear.”

The science of forensic genetic genealogy is still extremely new. Geneticists hope that publicity around newly solved cases will lead to a more informed public dialogue about genetics and privacy: People need to understand that if they put their DNA into a public database, it may lead the police to their door, or to the doors of their children, grandchildren or distant relatives.

For now, most GEDmatch users seem to have accepted that their data can be used in criminal investigations. Some 100 to 200 users per day remove their data from the site, says GEDmatch cofounder Rogers, but that rate hasn’t changed since users realized that law enforcement could be using their data this way. On the contrary, “we get overwhelming support,” he says. One letter he received was from a woman whose father was a serial killer; she wrote that she was glad that her data might help to convict other criminals.

Even so, Erlich says that researchers should see this as a teachable moment about the power of DNA data. He has long argued that researchers should be more proactive about explaining to people how much information is contained within their genomes, so they can make educated decisions, such as whether to share it with databases like GEDmatch.

“It’s about informing the individual that DNA can be identifiable information,” he says. “This is not going away.”