Analysis of the world's largest set of genome data from pregnant women, totaling 141,431 expectant mothers from across China, has uncovered unsuspected associations between genes and birth outcomes, including the birth of twins and a woman's age at first pregnancy.

The analysis also allowed researchers to reconstruct the recent movement and intermarriage of different ethnic groups in China and promises to help identify genes that make people susceptible to infectious diseases.

"It's amazing that this is even possible—that you can take these massive samples and do association mapping to see what the genetic variants are that explain human traits," said co-author Rasmus Nielsen, a professor of integrative biology at the University of California, Berkeley, who oversaw the computational analysis performed by researchers at BGI in Shenzhen, China.

It's even more amazing because the researchers sequenced, on average, only 10% of each mother's genome. Instead of a small set of high-quality genomes, they relied on cheaper tests that yield large numbers of partial, lower-quality genomes, and still discovered new genetic links.

The mothers-to-be had provided blood samples to be tested for fetal chromosomal abnormalities, primarily Down syndrome. This technique, called cell-free fetal DNA testing, a form of non-invasive prenatal testing, is possible because mothers have DNA from their unborn child floating in their bloodstream. With rapid shotgun sequencing, labs can break up all the free-floating DNA in the blood and sequence just enough of the bits to diagnose Down syndrome.

Though not yet widespread in the United States, non-invasive prenatal testing is common in China: 70% of such tests worldwide have been performed in China. Sampling the mother's blood can be done early and risk-free, whereas standard prenatal testing in the U.S. involves amniocentesis or chorionic villus sampling, both of which require obtaining fetal cells from inside the uterus and risk harming the unborn child.

BGI was paid by maternity hospitals to conduct these tests but obtained informed consent from each mother to also analyze the partially sequenced genomes for research purposes, maintaining anonymity. All the analyses were performed in China, and the data is hosted in the China National GeneBank.

The data analysis revealed, for example, that variation in a gene called NRG1 is linked to a greater or lesser incidence of twins. One variant of the gene is more common in mothers with twins and is associated with hyperthyroidism, tightening a link between thyroid function and twinning that had previously been seen in mice.

A variant of another gene, EMB, was associated with older first-time mothers. The analysis also pulled out several genes that had not previously been associated with height and body mass index.

Perhaps most interesting, Nielsen said, is what sequencing of all the DNA in maternal blood tells us about viruses circulating through the body, and thus the link between viruses and genes that determine susceptibility to disease.

A variation in one gene, for example, was associated with a higher concentration of herpesvirus 6 in a mother's blood. Herpesvirus 6 is the most common cause of the relatively benign baby rash called roseola, but a high "viral load" correlates with more severe symptoms. People with Alzheimer's disease also have higher levels of herpesvirus 6 in their brains.

"Most people are infected by herpesvirus six at some point in their life, but some people seem to be less affected than others. We have now found a human genetic variant that helps control the severity of the infection," Nielsen said. "This is quite interesting because we don't know much about the genetic variants that control why some people seem more susceptible to viral infection and not others."

More correlations remain to be discovered. The BGI team to date has sequenced genomes from more than 3 million pregnant women, many of them accompanied by information on the mothers' and babies' health that can be used to find genetic associations.

Sequencing by imputation

To find genes associated with human traits such as height and weight, researchers typically sequence a small number of genomes thoroughly, from hundreds to thousands, and scan them for sequence variations that are more common in people with the trait. The gold standard now is to sequence each genome 60 times to ensure accuracy given inherent errors in the sequencing process. Even if each genome is sequenced a mere 20 times, which is good but not great, it still gets expensive.
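The idea of scanning for variants that are "more common in people with the trait" can be illustrated with a toy calculation. The sketch below is not the study's method; it assumes a simple Pearson chi-square test on a 2x2 table of allele counts, and both the function name `allele_chi_square` and the counts are made up for illustration.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are people with and without the trait; columns are counts of the
    variant allele and the reference allele. All row and column totals
    must be non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the variant were unrelated to the trait.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the variant appears on 60 of 200 chromosomes in people
# with the trait and on 40 of 200 chromosomes in people without it.
chi2, p = allele_chi_square(60, 140, 40, 160)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A real genome-wide scan repeats a test like this at millions of positions, so it needs a far stricter significance threshold than a single test would.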

The new study relies on only partial genomes, which are cheaper to get, but massive numbers of them. On average, about one-tenth of each mother's genome was sequenced, because that is all that is necessary for a doctor to diagnose a chromosomal anomaly in the fetus. For example, Down syndrome, or trisomy 21, is caused by three rather than two copies of chromosome 21.
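A back-of-the-envelope calculation shows why massive numbers of partial genomes can add up. The sketch below rests on assumptions that are not in the study description: that "one-tenth of the genome" corresponds to a mean sequencing depth of about 0.1, and that reads land at random positions (a Poisson model). The constants and function names are illustrative.

```python
import math

N_MOTHERS = 141_431      # sample size reported in the article
DEPTH_PER_MOTHER = 0.1   # assumption: mean depth of roughly 0.1 per genome


def fraction_covered(mean_depth):
    """Expected fraction of positions hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-mean_depth)


def pooled_depth(mean_depth, n_samples):
    """Average number of reads per position, summed across all samples."""
    return mean_depth * n_samples


covered = fraction_covered(DEPTH_PER_MOTHER)
pooled = pooled_depth(DEPTH_PER_MOTHER, N_MOTHERS)
equivalent_20x = pooled / 20  # same raw read volume as this many 20x genomes

print(f"fraction of one genome covered: {covered:.3f}")
print(f"pooled depth per position: {pooled:.0f}")
print(f"read volume equal to about {equivalent_20x:.0f} genomes at 20x")
```

Under these assumptions each position is read thousands of times across the cohort, even though any single mother contributes almost nothing at that position. The comparison with 20x genomes covers raw read volume only: because the reads are spread thinly over many people, an individual's genotype at a given position has to be inferred statistically rather than read directly.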

The researchers also found that many Chinese had genetic variants common among Indians, Southeast Asians and, along the route of the old Silk Road, Europeans. Nielsen is currently working with his BGI colleagues to analyze the genomes of 1 million Chinese women who underwent non-invasive prenatal testing.