The National Science Review recently published a study presenting the largest genome analysis of the novel coronavirus. The study offers insight into the virus's two subtypes, their transmission abilities, and their ability to cause disease of differing severity.
In the article titled ‘On the origin and continuing evolution of SARS-CoV-2’, published in the National Science Review, Xiaolu Tang, Changcheng Wu, Xiang Li, et al. report that the virus has evolved into two subtypes, L and S, which differ significantly in their geographical distribution and in the proportion of the population they have affected. The authors also speculate that the two subtypes may differ widely in their transmission ability and in the severity of the disease they cause.
For this study, the researchers used 103 coronavirus genomes from the available public database. They strongly recommend a larger sample size to confirm the speculations made in the study. If the hypothesis is confirmed, it could help clarify different aspects of the new virus and improve the treatment of the resulting pneumonia. It should also be noted that, in the absence of patient data and more genome data, the researchers could not combine the genome data with case analysis for a more in-depth study.
Key points from the study
The analysis below is based on 103 coronavirus genomes sourced from the public database available to the researchers. Their analysis showed:
- Two subtypes of the virus, L and S
- Across the strains, 149 mutation sites were found, most of which had arisen recently
- 101 strains belonged to one of the two subtypes; the subtypes are distinguished by the base at site 28144 of the viral RNA genome
- The L type carries a T at this site (the codon encodes leucine, Leu), and the S type carries a C (the codon encodes serine, Ser)
- Comparing these with other coronaviruses, the authors found that the S type lies closer to the bat-derived coronavirus on the phylogenetic tree
From this, the researchers reasoned that the S type is relatively older and has had more time to spread, so it would be expected to account for more strains.
However, contrary to this expectation, the L type accounted for 70% of the strains and the S type for only 30%. In addition, each L-type strain carried more newly arisen mutations than the S-type strains.
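The subtype assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and the toy genomes are hypothetical, and the sketch assumes each genome has already been aligned to the reference so that site 28144 can be read off directly.

```python
from collections import Counter

# Site (1-based) that separates the two subtypes, as reported in the study.
SUBTYPE_SITE = 28144

# Base at that site -> subtype: T (leucine codon) is L, C (serine codon) is S.
BASE_TO_SUBTYPE = {"T": "L", "C": "S"}


def base_at(aligned_genome: str, site: int = SUBTYPE_SITE) -> str:
    """Return the base at a 1-based site of an already aligned genome."""
    return aligned_genome[site - 1].upper()


def classify_strain(aligned_genome: str) -> str:
    """Return 'L', 'S', or 'unassigned' from the base at site 28144."""
    return BASE_TO_SUBTYPE.get(base_at(aligned_genome), "unassigned")


def subtype_proportions(genomes: list[str]) -> dict[str, float]:
    """Share of L and S among the strains that could be assigned."""
    counts = Counter(classify_strain(g) for g in genomes)
    assigned = counts["L"] + counts["S"]
    if assigned == 0:
        return {"L": 0.0, "S": 0.0}
    return {"L": counts["L"] / assigned, "S": counts["S"] / assigned}


def toy_genome(base: str) -> str:
    """Hypothetical placeholder genome: filler bases with `base` at site 28144."""
    return "A" * (SUBTYPE_SITE - 1) + base


# Toy input (NOT the study's data): six T, three C, one ambiguous base.
toy_genomes = [toy_genome(b) for b in "TTCTNCTTTC"]
print(subtype_proportions(toy_genomes))
```

Strains with an ambiguous base at the site are left unassigned rather than forced into a subtype, which mirrors the article's note that only 101 of the 103 strains fell into one of the two groups.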
Pertinent questions that arise from the study
SARS-CoV-2 surfaced in late December 2019 in Wuhan, China. In the early days of the outbreak the L type was the more common of the two, even though the S type is evolutionarily older and less aggressive.
Why does the relatively young L-type new coronavirus produce more strains?
The L type showed more mutations, and the researchers speculated that its virulence may also be greater.
Comparing the proportions of the S and L types before and after January 7, the researchers found that the share of the L type among the strains had decreased while that of the S type had increased.
They hypothesised that human intervention was responsible: patients infected with the L type were more likely to show symptoms and to be subjected to medical intervention, which may have placed severe selective pressure on the L type.
The S type, which is evolutionarily older and less aggressive, might have increased in relative frequency due to relatively weaker selective pressure.
Are there new mutations?
Researchers examined positions 8782 and 28144 in each virus strain. In the majority of patients, each site showed only a single base (C or T), consistent with infection by either the L or the S subtype.
A virus sample isolated from an American patient who had travelled to Wuhan before diagnosis showed a mix of C and T bases at both sites. This suggests he may have been infected with both subtypes, although the possibility of new mutations could not be ruled out.
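To make the idea of a "mix of C and T bases" at one site concrete, here is a hedged Python sketch that flags a site as mixed from sequencing read counts. The article does not say how the researchers detected the mix, so everything here is an assumption for illustration: the function names, the 20% read-fraction threshold, and the toy counts are all hypothetical.

```python
def bases_present(read_counts: dict[str, int], min_fraction: float = 0.2) -> set[str]:
    """Bases supported by at least `min_fraction` of the reads at one site.

    The 0.2 default is an arbitrary illustrative threshold, not a value
    taken from the study.
    """
    total = sum(read_counts.values())
    if total == 0:
        return set()
    return {base for base, n in read_counts.items() if n / total >= min_fraction}


def describe_site(read_counts: dict[str, int], min_fraction: float = 0.2) -> str:
    """Summarise one site as 'C only', 'T only', 'mixed C/T', or 'undetermined'."""
    present = bases_present(read_counts, min_fraction)
    if present == {"C", "T"}:
        return "mixed C/T"
    if present == {"T"}:
        return "T only"
    if present == {"C"}:
        return "C only"
    return "undetermined"


# Toy read counts at site 28144 (hypothetical, NOT patient data).
single_subtype_sample = {"T": 97, "C": 3}   # a few stray C reads, below threshold
mixed_sample = {"T": 52, "C": 48}           # both bases well supported

print(describe_site(single_subtype_sample))
print(describe_site(mixed_sample))
```

The same check would be applied to the second site. A sample that is mixed at both sites is compatible with co-infection by the two subtypes, but, as the article notes, it does not by itself exclude new mutations.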
Is the new virus bat-derived?
At the whole-genome level, SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13) differ in only about 4% of their nucleotides. At neutral sites, however, the study found a difference of 17%, which means the divergence between the two viruses is greater than estimated earlier. Even so, a genome-level molecular evolution analysis confirms that the new coronavirus most closely resembles the RaTG13 strain, which was isolated from bats, although from a different regional source. The study also confirmed that the pangolin coronavirus strain was quite different from SARS-CoV-2.
Given that previous studies put the genome-wide nucleotide difference between bat RaTG13 and the novel coronavirus at only 4%, the researchers argue that a more precise comparison, restricted to neutral positions in the genome, is needed.
They point to the third base position of codons, reasoning that these sites are largely free of natural selection pressure, whereas changes at most other positions in protein-coding regions also change the amino acid sequence. Compared with mutations at neutral sites, mutations that change an amino acid are subject to stronger negative selection. Any analysis that pools these different kinds of sites across the genome will therefore artificially shrink the apparent difference between genomes.
After comparing neutral sites, the researchers concluded that the difference between the novel coronavirus and RaTG13 is larger than previously estimated. In fact, the genetic distance between the two is 14 times the difference between humans and chimpanzees.
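The effect of pooling all sites versus looking only at third codon positions can be shown with a small Python sketch. It is a simplification under stated assumptions: it treats every third codon position as neutral, which is only an approximation, and it is not the method used in the study. The function names and the two toy sequences are hypothetical.

```python
def fraction_different(seq_a: str, seq_b: str, positions) -> float:
    """Share of the given 0-based positions at which two aligned sequences differ."""
    positions = list(positions)
    if not positions:
        return 0.0
    diffs = sum(1 for i in positions if seq_a[i] != seq_b[i])
    return diffs / len(positions)


def compare_coding_sequences(seq_a: str, seq_b: str) -> dict[str, float]:
    """Compare two aligned, in-frame coding sequences at all sites and at
    third codon positions only (used here as a rough stand-in for neutral sites)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("expected two aligned, in-frame sequences of equal length")
    all_sites = range(len(seq_a))
    third_sites = range(2, len(seq_a), 3)  # indices 2, 5, 8, ... are third positions
    return {
        "all_sites": fraction_different(seq_a, seq_b, all_sites),
        "third_codon_sites": fraction_different(seq_a, seq_b, third_sites),
    }


# Toy sequences (hypothetical): five codons that differ only at third positions.
toy_a = "ATGGCTCTGAAAGGT"  # ATG GCT CTG AAA GGT
toy_b = "ATGGCCCTAAAGGGT"  # ATG GCC CTA AAG GGT

print(compare_coding_sequences(toy_a, toy_b))
```

In this toy case the three differences all fall on third codon positions, so the pooled figure (3 of 15 sites) is much smaller than the third-position figure (3 of 5 sites). That is the same direction of effect the researchers describe when contrasting the 4% genome-wide difference with the 17% difference at neutral sites.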
Analysing the spike gene of the new coronavirus
The spike gene (which encodes the protein the coronavirus uses to bind the ACE2 receptor on mammalian cells) was found to differ greatly in sequence from that of other coronaviruses. It is speculated that the spike gene mutates at a higher rate. However, it is also possible that the spike gene of the novel coronavirus recombined with that of the coronavirus carried by the pangolin. According to the researchers, the likely explanations are convergent evolution and the relatively high mutation rate that is characteristic of this coronavirus.
The researchers therefore suggest that these results be weighed together in any further analysis. In fact, there is a strong and urgent need for comprehensive studies that combine genomic data, epidemiological data and chart records of the clinical symptoms of COVID-19.
Tang X, Wu C, Li X, et al. On the origin and continuing evolution of SARS-CoV-2. National Science Review. 2020 Mar.
This story is part of our Global Content Initiative, where we feature selected stories from our Global network which we believe would be most useful and informative to our doctor members. This version is a literal translation of the Chinese version.