
53,831 genomes analysed

Rare diseases: huge dataset brings new insights

Researchers at the University of Maryland School of Medicine (UMSOM) and their colleagues have published a new analysis of genetic sequencing data from more than 53,000 individuals, primarily from minority populations.

The early analysis, part of a large-scale program funded by the National Heart, Lung, and Blood Institute (NHLBI), examines one of the largest and most diverse data sets of high-quality whole genome sequencing, a method that reads a person’s complete DNA. It provides new genetic insights into heart, lung, blood, and sleep disorders and how these conditions affect people from diverse racial and ethnic backgrounds, who are often underrepresented in genetic studies.

The analysis has now been published in the journal Nature.


The program, called Trans-Omics for Precision Medicine (TOPMed), seeks to understand the genetic variations that occur among individuals both in nuclear families and in populations from diverse ethnicities residing on different continents. The project’s ultimate goal is to improve the diagnosis, treatment, and prevention of the most common conditions that lead to disability or death.

“We have already identified some surprising new insights,” said study corresponding author Timothy O’Connor, PhD, Associate Professor of Medicine & Endocrinology at the Institute for Genome Sciences (IGS) at UMSOM. For example, the team identified more than 400 million genetic variations, but 97 percent of them are extremely rare, occurring in less than one percent of the population. Gene variations or variants can occur by random chance when genes get recombined or mutate. “Most of the time, these variants mean nothing,” said Dr. O’Connor, “but they can provide a new understanding of mutational processes and recent human evolutionary history.”

The TOPMed team includes more than 180 researchers from leading institutions in genomics worldwide, who have been compiling huge datasets in systematic and defined ways to increase knowledge about diversity in genetic studies. Since the program’s launch in 2014, TOPMed investigators have been adding whole genome sequencing and “omics” analysis (which includes the study of genetic and molecular profiles, such as proteins) to research studies in order to better understand how variations affect different organ systems, giving rise to disease in, for example, the heart and lungs.

Ancestry, genetic diversity and rare-variant genetic relatedness across the TOPMed studies

Image source: Taliun et al., Nature 2021 (CC-BY 4.0)

In the new paper, the researchers pointed out that the program “aims to identify causal genetic variants and how they interact with the environment, to characterize disease and its molecular subtypes, to understand differences in disease across diverse ancestries, and to establish a foundation for personalized disease prediction, prevention, diagnosis, and treatment.” Braxton Mitchell, PhD, Professor of Medicine at UMSOM, and Jeffrey O’Connell, PhD, Associate Professor of Medicine at UMSOM, were co-authors on this paper. 

TOPMed, the largest sequencing project to date, has an overarching mission of understanding global genetic diversity and has identified more than 400 million gene variants. Since joining the TOPMed program in 2016, UMSOM researchers have published valuable new insights on genetic diversity, including sequencing data in the initial flagship paper on the first 53,831 TOPMed samples.

The increasing diversity of the population samples will help investigators learn more about how specific diseases affect different ethnic populations around the world. In addition, the group has established uniform standards for sequencing performed on a massive scale. These standards maximize the integrity of the data: the large group of international researchers uses uniform methods while continuing to add other “omics” analyses, such as the study of metabolic differences.


In addition to enabling detailed analysis of the combined genomic and health data for sequenced samples, TOPMed has enhanced the analyses of genotyped samples through a new reference panel that now includes over 97,000 individuals. The TOPMed imputation reference panel is publicly available for review and input of new genetic data by researchers.

The first stage of the data release in the study demonstrated greater diversity in sampling, which will be invaluable to the international group in learning more about the diseases affecting these populations. Because of the vast sample sizes and the longitudinal scope of many of the population samples, the investigators were able to demonstrate that the rare variants represent recent and potentially deleterious changes that can impact protein function, gene expression, or other biologically important elements.

“This is a major effort to rectify the underrepresentation of minority participants in genomic studies and tracks with a broader mission within the School of Medicine to increase diversity in clinical trials,” said E. Albert Reece, MD, PhD, MBA, Executive Vice President for Medical Affairs, UM Baltimore, the John Z. and Akiko K. Bowers Distinguished Professor and Dean, University of Maryland School of Medicine. “This hopefully will move the genomics field closer to extending personalized medicine for all patients.”

Cashell Jaquish, Ph.D., an NHLBI program officer for TOPMed and a corresponding author on the paper, agrees. “The NHLBI’s TOPMed program is a huge resource for the scientific community. We didn’t really know what genomic variation looked like in diverse groups until now. This new study represents truly historic findings, and we look forward to continued research studies in this area as we move toward personalized medicine.”

Source: University of Maryland School of Medicine

