On March 2, local time, researchers from Peking Union Medical College of the Chinese Academy of Medical Sciences, the Chinese Center for Disease Control and Prevention, the University of California, Los Angeles, the University of Pittsburgh, and Hunan University jointly published a major study on the evolution of the new coronavirus, “Mutations, Recombination and Insertion in the Evolution of 2019-nCoV,” on bioRxiv, a bioscience preprint website.
By analyzing mutations in the virus, the study offers an explanation for why the new coronavirus is significantly more infectious.
The team further estimated that most proteins of the human new coronavirus (2019-nCoV) and the bat RaTG13 virus diverged between 2005 and 2012, while the human SARS virus and bat SARS-like coronaviruses diverged between 1990 and 2002.
In addition to point mutations, they believe there is potential evidence that recombination is also a mechanism of evolution in 2019-nCoV. Their results suggest that the 2019-nCoV S protein may have been derived from a pangolin coronavirus, rather than from the bat coronavirus RaTG13. During the evolution of 2019-nCoV, there may have been a recombination between a RaTG13-like coronavirus strain and a pangolin-like coronavirus strain.
The paper's authors include Wu Guizhen, Party secretary of the China CDC's Institute for Viral Disease Control and Prevention; Tan Wenjie, a researcher at the same institute; Jiang Taijiao, director of the Biomedical Big Data Center of the Chinese Academy of Medical Sciences and assistant director of the Suzhou Institute of Systems Medicine; and Cheng Genhong, a professor in the Department of Microbiology, Immunology and Molecular Genetics at the University of California, Los Angeles.
The team collected and analyzed 120 2019-nCoV genome sequences, including 11 new genomes from Chinese patients. They found that although 2019-nCoV and human and bat SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus) are highly homologous in overall genomic structure, they evolved into two groups of viruses with different receptor entry characteristics through potential recombination in the receptor binding domain (RBD).
The team found that 2019-nCoV has a unique four-amino-acid insertion (PRRA) between the S1 and S2 domains of the spike protein (S protein), which could be a cleavage site for furin or TMPRSS2 (transmembrane serine protease 2). Previous studies have shown that coronavirus S proteins can be cleaved by proteases, which triggers fusion between the viral and cell membranes. The flexibility of this priming and fusion-triggering mechanism greatly modulates the pathogenicity and tropism of different coronaviruses.
The team suggested that the potential recombination of the RBD, together with the presence of a unique furin protease cleavage site, could explain the significant increase in the infectivity of the new coronavirus.
Currently, 2019-nCoV has infected more than 77,000 people and killed 2,400 people worldwide (as of the time the paper was written). Phylogenetically, its genome is most similar to the bat SARS-like coronavirus strain RaTG13, which was first isolated in Yunnan, China, in 2013.
The researchers say there has been a lot of research on the new coronavirus so far, but the mechanisms behind its infectivity and molecular evolution remain unclear. The study examines the evolution, specificity, and possible mechanisms of infectivity of this beta-coronavirus, providing comprehensive insights into the evolution and spread of 2019-nCoV.
They believe that further tracking of genomic mutations in 2019-nCoV strains isolated from patients at different locations and at different points in time could provide insight into the molecular evolution of this fast-spreading virus.
Mutations during the spread of 2019-nCoV
The team compared the infection rate of 2019-nCoV with those of two earlier beta-coronavirus outbreaks: the SARS virus in 2002 and the MERS (Middle East Respiratory Syndrome) virus in 2012.
2019-nCoV spreads much faster than SARS and MERS. To date, more cases of 2019-nCoV have been confirmed than during the entire SARS outbreak in 2002.
Over the past 18 years, scientists have published a number of genetic sequences of coronaviruses, including SARS strains isolated from different countries during the 2002 SARS outbreak, and many MERS strains from Middle Eastern countries such as Saudi Arabia and the United Arab Emirates.
The team collected and sequenced 11 full-length 2019-nCoV genomes from new patients in several Chinese cities, including Wuhan.
Phylogenetic analysis showed that these 11 new 2019-nCoV strains are closely related to other 2019-nCoV strains, which as a group are more homologous than human SARS, MERS and other coronaviruses.
At the amino acid level, they show only a small number of random mutations relative to the consensus sequence when compared with the corresponding amino acid sequences of human and bat viruses.
To identify new genetic mutations, the researchers used the new coronavirus strain EPI_ISL_402125 as the root and built a phylogenetic tree from the full genomes of all 120 new coronavirus strains available in GISAID (Global Initiative on Sharing All Influenza Data), updated through February 18, 2020.
The team found that, based on the nucleotides at positions 8517 and 27641, 2019-nCoV isolates could be divided into two main groups.
Phylogenetic tree of 120 2019-nCoV full-length genomes, including 11 new genomes (highlighted by asterisks); G1 and G2 are marked in red and blue, respectively
All virus strains in group 1 (G1) have thymine at position 8517 and cytosine at position 27641, the same as the corresponding nucleotides in SARS, while group 2 (G2) strains have cytosine at 8517 and thymine at 27641.
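The grouping rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes each genome is already aligned to the same coordinate system the study used (so that positions 8517 and 27641 are directly comparable), and the helper `make_toy_genome` and its placeholder sequences are hypothetical.

```python
# Diagnostic positions reported in the article (1-based).
# Assumption: genomes are aligned to the study's coordinate system.
G1_PATTERN = {8517: "T", 27641: "C"}
G2_PATTERN = {8517: "C", 27641: "T"}


def classify_group(genome: str) -> str:
    """Return 'G1', 'G2', or 'unassigned' for an aligned genome string."""

    def matches(pattern: dict) -> bool:
        # Every diagnostic position must exist and carry the expected base.
        return all(
            len(genome) >= pos and genome[pos - 1].upper() == base
            for pos, base in pattern.items()
        )

    if matches(G1_PATTERN):
        return "G1"
    if matches(G2_PATTERN):
        return "G2"
    return "unassigned"


def make_toy_genome(base_8517: str, base_27641: str, length: int = 29900) -> str:
    """Hypothetical helper: a placeholder sequence of 'N' with only
    the two diagnostic bases filled in."""
    seq = ["N"] * length
    seq[8517 - 1] = base_8517
    seq[27641 - 1] = base_27641
    return "".join(seq)


if __name__ == "__main__":
    print(classify_group(make_toy_genome("T", "C")))  # T at 8517, C at 27641
    print(classify_group(make_toy_genome("C", "T")))  # C at 8517, T at 27641
```

Returning "unassigned" for any other combination keeps sequencing gaps or ambiguous bases from being forced into one of the two groups.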
Epidemiological data from the above two groups of viruses show that the earliest G1 strain (EPI_ISL_406801) was collected in Wuhan on January 5, 2020, while the earliest G2 strains were isolated in Wuhan on December 24, 2019.
The presence of both groups in the same city suggests that they were co-circulating, with their evolution converging early in the outbreak. Within each group, the researchers also observed other mutations shared by multiple strains of the virus.
Based on these potential lineages and the time and place each strain was identified, the researchers created a “mutation tree” diagram to track individual shared mutations and show the relationships between different isolates.
For example, the five strains identified in Guangdong Province from January 10 to January 15 this year belong to group 1 and share the same mutation at nucleotide position 28578, indicating that they may have been transmitted by the same person. Similar viruses may have spread to three cases found in Japan between January 29 and January 31, which carry an additional mutation at nucleotide position 2397, and a similar virus may have been transmitted to a case found in the United States on January 22, where the virus has an additional mutation at nucleotide position 10818.
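The idea behind tracking shared mutations can be illustrated with a small sketch that indexes isolates by the mutated positions they have in common. The isolate names and their mutation sets below are hypothetical toy data; only the positions 28578, 2397 and 10818 come from the article, and this is not the method the researchers used to build their diagram.

```python
from collections import defaultdict


def group_by_shared_mutations(strain_mutations: dict) -> dict:
    """Map each mutated position to the sorted list of strains carrying it,
    keeping only positions seen in more than one strain."""
    carriers = defaultdict(list)
    for strain, positions in strain_mutations.items():
        for position in positions:
            carriers[position].append(strain)
    return {
        position: sorted(strains)
        for position, strains in carriers.items()
        if len(strains) > 1
    }


# Hypothetical isolates (names and assignments are illustrative only).
toy_isolates = {
    "guangdong_a": {28578},
    "guangdong_b": {28578},
    "japan_a": {28578, 2397},
    "japan_b": {28578, 2397},
    "usa_a": {28578, 10818},
}

if __name__ == "__main__":
    for position, strains in sorted(group_by_shared_mutations(toy_isolates).items()):
        print(position, strains)
```

In this toy data, the mutation at 28578 links all five isolates, while 2397 singles out the two Japanese isolates as a sub-branch; a mutation carried by only one isolate is not reported as shared.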
The team also noted that the G10818T mutation is interesting because it is shared by several separate virus strains in both group 1 and group 2, and it results in an L3606F amino acid change in the orf1ab polyprotein.
G10818T is shared by several separate virus strains in Groups 1 and 2
It is not clear whether the mutation at position 10818 shared by group 1 and group 2 viruses confers any growth advantage, but pangolin and bat coronaviruses also carry an L3606V substitution at the same position.
According to the tree diagram, both groups of viruses have spread to most countries and regions where 2019-nCoV cases have been reported, with few exceptions, indicating that both groups can spread quickly.
Although it remains to be determined whether these two 2019-nCoV groups evolved before or after the virus was transmitted from animals to humans, both groups were first discovered in Wuhan and then spread to other regions of China and to other countries, the team said in the discussion.
2019-nCoV vs. Pangolin Coronavirus
The virus strain most closely related to the new coronavirus is the beta-coronavirus RaTG13, which was previously isolated from horseshoe bats in China. Additional phylogenetic analyses of specific viral proteins (e.g. orf1a, S protein, matrix, and nucleocapsid) using nucleotide sequences found that the RaTG13 strain was also closely related to other bat SARS-like coronavirus strains.
The researchers further estimate that most 2019-nCoV and RaTG13 virus proteins diverged between 2005 and 2012, while the human SARS virus and bat SARS-like coronaviruses diverged between 1990 and 2002.
“Our evolutionary clock analysis estimates that 2019-nCoV diverged from RaTG13 and human SARS-CoV about 12 and 30 years ago, respectively,” the researchers concluded in the discussion. “In addition to point mutations, there is potential evidence that recombination is also a mechanism of evolution in 2019-nCoV.”
When comparing full-length S-protein sequences, the researchers found that 2019-nCoV shared 39 percent homology with human and bat SARS viruses, compared with 29 percent with MERS or other coronaviruses.
Notably, the team found that 2019-nCoV and pangolin coronavirus shared almost identical amino acid sequences in the RBD (amino acids 315-550) of the S protein, whereas RaTG13 did not.
Amino acid homology of pangolin CoV (blue line) and RaTG13 / 2013 (green line) compared with 2019-nCoV; the conserved receptor binding domain (RBD) is highlighted in yellow
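A comparison of this kind, where similarity to two candidate relatives is plotted along the protein, can be sketched as a sliding-window identity calculation. The code below is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, gaps are written as "-", and the short toy peptides are hypothetical rather than real S-protein data. It is not the authors' analysis.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, non-gap columns in which a and b agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def windowed_identity(query: str, reference: str, window: int = 5, step: int = 1):
    """Return (start, identity) pairs, with 1-based window start positions."""
    return [
        (i + 1, percent_identity(query[i:i + window], reference[i:i + window]))
        for i in range(0, len(query) - window + 1, step)
    ]


# Hypothetical toy peptides: ref_a matches the query in the first half,
# ref_b matches it in the second half.
query = "ACDEFGHIKL"
ref_a = "ACDEFWWWWW"
ref_b = "WWWWWGHIKL"

if __name__ == "__main__":
    print(windowed_identity(query, ref_a, window=5, step=5))
    print(windowed_identity(query, ref_b, window=5, step=5))
```

A region where the best-matching reference switches from one virus to the other, as in the toy data, is the kind of pattern that raises the possibility of recombination; on its own it does not prove it.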
To confirm this finding, the team compared pangolin CoV sequences previously published by other groups with previously isolated but unpublished pangolin CoV sequences.
Based on sequence comparison and phylogenetic analysis, the researchers found that the consensus sequence of 2019-nCoV had the highest homology with BetaCoV/Pangolin/Guangdong/P2S/2019 (EPI_ISL_410544), while additional mutations, insertions and deletions were found in the pangolin CoV strains isolated in Guangxi.
Next, the researchers used the maximum likelihood (ML) method to examine the phylogenetic relationship among the consensus S-protein sequence of 2019-nCoV, 25 representative CoV strains (including Hu-CoV, SARS and MERS) and 5 new pangolin CoV strains.
The results suggest that the 2019-nCoV S protein may have been derived from pangolin coronavirus, rather than the bat coronavirus RaTG13, although both may be in the same lineage as bat-SARS-CoV or bat-SL-CoVZC542.
They suggest that the overall genomic structure of 2019-nCoV is most homologous to RaTG13, but that the RBD of the S protein is most homologous to pangolin CoV, a difference suggesting that, during the evolution of 2019-nCoV, a recombination may have occurred between a RaTG13-like coronavirus and a pangolin-like coronavirus strain.
The researchers further examined all amino acid mutations in the genome. They found that when comparing pangolin-like coronaviruses with 2019-nCoV, areas in nsp (non-structural protein) 14 and 15 shared continuous sequences (Figure 3D) in addition to RBD.
Two evolutionary branches and a unique furin protease cleavage site
To further assess the relationship between 2019-nCoV and other SARS coronaviruses, the researchers analyzed the RBM (receptor binding motif) of 2019-nCoV and of different human and bat SARS viruses, and observed that they could be clearly divided into two distinct evolutionary branches.
Evolutionary Branch I viruses include 2019-nCoV, pangolin CoV, and 12 types of bat SARS (bat SARS CoV I, such as RaTG13) and human SARS viruses.
Evolutionary Branch II viruses contain 49 bat SARS viruses (bat SARS CoV II), such as ZXC21 and ZC45, which share approximately 90% nucleotide and amino acid homology with 2019-nCoV.
The main difference between the two evolutionary branches is that the RBM of Evolutionary Branch II viruses is shorter than that of Evolutionary Branch I viruses by two regions of 5 and 13-14 amino acids.
Previous studies have shown that the 13-14 amino acid region of the SARS virus RBM forms a unique loop structure stabilized by a disulfide bond between two cysteine residues. Although the amino acid sequence of 2019-nCoV in this loop region is very different from that of the human SARS virus, both cysteine residues are conserved.
Interestingly, all viruses known to use ACE2 (angiotensin-converting enzyme 2) as an entry receptor belong to branch I, while all bat SARS viruses that do not use ACE2 as an entry receptor belong to branch II. As a result, the team predicted that branch I viruses, including 2019-nCoV, could infect host cells through human ACE2, while branch II viruses could not.
Previous studies have shown that the SARS virus uses human ACE2 as a receptor to infect host cells, while the MERS virus uses DPP4 as its receptor. Recent indications are that ACE2 is also an entry receptor for 2019-nCoV, although other host cell factors such as TMPRSS2 may also be involved.
Although the overall genome sequences of evolutionary branch II viruses are homologous to 2019-nCoV, there have been no reports of viruses in that branch using ACE2 as an entry receptor.
Therefore, this study not only highlights the key role of the RBM in determining entry receptor specificity, but also raises an interesting question about how homologous strains of beta-coronavirus can alter their tropism through mutations such as insertion, deletion, or recombination in the RBM.
Notably, the team also found that 2019-nCoV has a unique four-amino-acid insertion (681-PRRA-684) in the S protein, at nucleotide positions 23619-23632.
Interestingly, this PRRA insertion in the S protein of 2019-nCoV creates a potential cleavage site, RRAR, for the mammalian furin protease.
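How an insertion can create such a site is easy to show with a short motif scan. The sketch below searches a peptide for the minimal furin recognition pattern R-X-X-R, which RRAR satisfies; treating R-X-X-R as the motif is an assumption made for illustration (stricter variants exist), the flanking residues in the example fragments are illustrative and not taken from the article, and a sequence match by itself says nothing about whether cleavage actually occurs.

```python
import re

# Assumption: minimal furin-like motif R-X-X-R.
# The lookahead allows overlapping candidate sites to be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(peptide: str, first_residue: int = 1):
    """Return (position, motif) pairs; position is the residue number of the
    first R, counted from `first_residue` for the start of the peptide."""
    return [
        (match.start() + first_residue, match.group(1))
        for match in FURIN_MINIMAL.finditer(peptide.upper())
    ]


# Illustrative fragments around the S1/S2 boundary. The numbering assumes the
# fragment starts at residue 679 so that PRRA falls on 681-684 as in the article.
with_insertion = "NSPRRARSVAS"
without_insertion = "NSRSVAS"  # the same fragment with PRRA removed

if __name__ == "__main__":
    print(find_furin_like_sites(with_insertion, first_residue=679))
    print(find_furin_like_sites(without_insertion, first_residue=679))
```

With the insertion present, the scan reports RRAR starting at residue 682 (the inserted R-R-A plus the arginine that follows); with PRRA removed, no R-X-X-R pattern remains in the fragment.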
To understand the uniqueness of this insertion, the team compared sequences of SARS-CoV strains from humans, cats and bats, and found that the insertion was unique to 2019-nCoV. When they compared 2019-nCoV with other members of the coronavirus family, however, they identified similar insertions between the S1 and S2 domains of the S protein.
Previous studies have shown that the S protein of the virus can be cleaved by proteases, which triggers fusion between the viral and cell membranes. The flexibility of this priming and fusion-triggering mechanism greatly modulates the pathogenicity and tropism of different coronaviruses.
However, this proteolytic cleavage has not previously been detected in SARS-CoV. Introducing a cleavage site into SARS-CoV leads to cleavage of the S protein and enhances membrane fusion activity. In addition, introducing a cleaved S protein into a SARS-CoV pseudovirus also allows it to enter the host cell directly.
Based on previous sequencing and structural analysis, the 2019-nCoV S protein is predicted to interact with the ACE2 receptor, triggering fusion with the host cell membrane and initiating infection. Therefore, mutations or insertions that alter the S1-S2 subunits can significantly affect viral infection.
The team speculated that the PRRA insertion could cause the S protein to be cleaved, triggering a viral fusion event.
While the exact mechanisms behind such a high infection rate remain to be studied, the team believes their data show that RBD recombination and the insertion of a unique furin or TMPRSS2 cleavage site between the S1 and S2 domains may explain why this emerging virus is significantly more contagious than other SARS- and MERS-related beta-coronaviruses.