Today, the latest issue of Nature, a leading academic journal, features the Genome Aggregation Database (gnomAD) on its cover. This public catalog, developed by an international team of researchers, is the largest database of human genetic variants to date, bringing together genome or exome sequencing data from more than 140,000 people.
Four related papers appear in this issue of Nature, with further related papers in its sister journals Nature Medicine and Nature Communications. In today's article, the Academic Ethnos team will walk readers through the gnomAD database, a milestone in human genetics research.
(Cover image: Sigrid Knemeyer and Hang Yu Lin, SciStory LLC)
The advent of gene sequencing technology allows us to read all of a person's genetic information, the human genome. However, understanding the physiological function of genes is a greater challenge than sequencing them. Little is known about the function of most genes in the human genome.
One way to reveal a gene's function is to observe what happens when it is mutated. Variants that inactivate the protein a gene encodes are called loss-of-function variants. But such variants are rare in populations, which means that a very large sample of genomes is needed to discover them and assess the consequences of each one. That is what large-scale databases are for.
The Genome Aggregation Database (gnomAD) project aggregates data from a variety of large-scale population sequencing projects to identify loss-of-function variants.
Prior to the gnomAD project, scientists in 2016 unveiled the Exome Aggregation Consortium (ExAC) database, which included data on more than 60,000 exomes, that is, mainly the DNA segments (exons) in the genome that directly code for proteins. According to an overview article in Nature, the new gnomAD not only brings together 125,748 whole-exome sequences but also contains 15,708 whole-genome sequences. This increase in size and scope allows more samples and more complex genetic variants to be systematically recorded, and makes it possible to study variation beyond protein-coding sequences.
gnomAD is larger in size and scope than ExAC and supports the interpretation of a richer set of genetic variants (Image source: Reference 5)
The team identified a total of 443,769 predicted loss-of-function (pLoF) variants, which are predicted to disrupt the normal function of the proteins they encode. The researchers went on to classify genes by how well they tolerate such variants, along a spectrum from little effect on physiological function to serious health problems, in order to better identify the genes that cause common and rare genetic diseases.
In the second paper, the researchers focused on the clinical interpretation of rare genetic variants. Why can some people carry pLoF variants in genes that should not tolerate them, yet show little effect? The researchers point out that when a gene is transcribed, differences in RNA splicing produce different transcript isoforms of the same gene, and that some exons are expressed at very low levels. If a person carries a pLoF variant in a key gene, the variant is more likely to lie in an exon with limited expression, which minimizes its effect.
But variants that affect other transcript isoforms can lead to specific diseases. For example, a mutation in a gene that encodes a calcium channel can cause a rare disease called Timothy syndrome. Different transcript isoforms of the mutated gene are expressed in different tissues, resulting in multisystem disorders in patients.
To address this, the researchers developed a new metric that quantifies the expression of the transcripts affected by a genetic variant, creating a data set that could help in the genetic diagnosis of rare diseases and in analyzing the burden of rare variants in multisystem diseases.
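The idea behind such an expression-based metric can be illustrated with a minimal sketch. This is not the researchers' implementation: the function name, the transcript labels, and the expression values are hypothetical, and the calculation simply assumes that we know how strongly each transcript isoform of a gene is expressed in a tissue and which isoforms contain the variant.

```python
def proportion_expressed(transcript_expression, transcripts_with_variant):
    """Fraction of a gene's total expression that comes from
    transcript isoforms containing the variant (illustrative only)."""
    total = sum(transcript_expression.values())
    if total == 0:
        # Gene not expressed in this tissue: report zero rather than divide by zero.
        return 0.0
    affected = sum(
        level
        for transcript, level in transcript_expression.items()
        if transcript in transcripts_with_variant
    )
    return affected / total


# Hypothetical expression levels of three isoforms of one gene in one tissue.
expression = {"tx_A": 90.0, "tx_B": 8.0, "tx_C": 2.0}

# Variant in an exon used only by the weakly expressed isoform tx_C.
low = proportion_expressed(expression, {"tx_C"})

# Variant in an exon shared by the two dominant isoforms.
high = proportion_expressed(expression, {"tx_A", "tx_B"})
```

Under these made-up numbers, the first variant touches only a small share of the gene's expression, while the second touches nearly all of it, which is the kind of distinction the paragraph above describes.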
The third paper in the series explores how to identify candidate drug targets using the database of human loss-of-function variants. The researchers reported several key findings. First, genes that do not tolerate loss-of-function variants (i.e. essential genes) can still be successful targets for the development of inhibitors. Specifically, when some individuals are found to carry two pLoF variants in a particular gene, that gene may be a good drug target. Second, loss-of-function variants in most genes are very rare, and the researchers show that inferring such variants produces many false calls, so collecting definitive evidence would require cohorts 1,000 times larger than the gnomAD sample size.
In a fourth paper published simultaneously in Nature, the researchers analyzed nearly 15,000 whole-genome sequences from the gnomAD database to create a catalog of structural variation.
Structural variation (SV) refers to the rearrangement of large segments of DNA on chromosomes, such as deletion, duplication, insertion, translocation, and inversion. Such variants are an important cause of many genetic diseases and cancers. The researchers note that this rich catalog of 433,000 SVs “has a wide range of uses in population genetics, disease-related research, and diagnostic screening.”
This massive genome sequencing and analysis effort has produced the most comprehensive data and tools yet for understanding human genetic variation, according to an accompanying commentary in Nature. gnomAD has made these data and tools public. This valuable genetic resource will change the way we interpret individual genomes, providing important information for understanding human biology and disease and for assessing rare and common genetic diseases.