The COVID-19 pandemic offered an unprecedented glimpse into the evolution of its causative virus, SARS-CoV-2. It has been estimated that since the outbreak in late 2019, the virus has explored every possible missense mutation at every site of its polypeptide chain. The spike protein exhibits the largest sequence variation, with many individual mutations affecting target recognition, cellular entry, and endosomal escape of the virus. Among these, changes to charged amino acid residues have been shown to influence spike protein binding to different cell surface receptors, its non-specific electrostatic interactions with the environment, and its structural stability and conformation. Moreover, recent studies revealed a significant increase in the total positive charge on the spike protein during SARS-CoV-2 evolution, especially in the initial period of the pandemic.
We performed a detailed analysis of changes in dissociable amino acids across more than 2600 different SARS-CoV-2 lineages, tracing the evolutionary patterns of change in the spike protein charge. Our results show that the previously observed trend toward increasing positive charge essentially stopped with the emergence of the early Omicron variants. We also demonstrate that the patterns of change in the number of ionizable amino acids on the spike protein are characteristic of related lineages within the broader clade division of the SARS-CoV-2 phylogenetic tree. Furthermore, we show that the spike protein exhibits a prominent preference for lysine residues over arginine residues; the lysine-to-arginine ratio increased at several points during spike protein evolution, most recently with BA.2.86 and its sublineages. The increased ratio is a consequence of mutations in different structural regions of the spike protein and is now among the highest observed for viral species in the Coronaviridae family. Overall, our work sheds light on the evolutionary changes in the number of dissociable amino acids on the spike protein of SARS-CoV-2, complementing existing studies and providing a stepping stone towards a better understanding of the relationship between the spike protein charge and viral infectivity and transmissibility.
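
As a rough illustration of the bookkeeping such an analysis involves, counting ionizable residues and the lysine-to-arginine ratio for a single sequence can be sketched in Python. This is a minimal sketch and not the pipeline used in this work: the residue set, the function name `charge_summary`, the nominal charge (lysine plus arginine minus aspartate plus glutamate, ignoring histidine and any pH dependence), and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter

# Residue types commonly treated as ionizable (assumption: the exact set
# used in the study may differ). One-letter amino acid codes.
IONIZABLE = "DEKRHYC"


def charge_summary(sequence: str) -> dict:
    """Count ionizable residues and derive simple charge-related summaries.

    Hypothetical helper for illustration only.
    """
    counts = Counter(sequence.upper())
    ionizable = {aa: counts.get(aa, 0) for aa in IONIZABLE}

    # Nominal charge: K and R counted as +1, D and E as -1.
    # Histidine, tyrosine, and cysteine are ignored here (simplification).
    positive = ionizable["K"] + ionizable["R"]
    negative = ionizable["D"] + ionizable["E"]

    # Lysine-to-arginine ratio; undefined (None) when no arginine is present.
    kr_ratio = ionizable["K"] / ionizable["R"] if ionizable["R"] else None

    return {
        "counts": ionizable,
        "nominal_charge": positive - negative,
        "kr_ratio": kr_ratio,
    }


# Toy sequence, not a real spike protein fragment.
summary = charge_summary("MKKRDEHKYC")
print(summary["counts"], summary["nominal_charge"], summary["kr_ratio"])
```

Applying such a function to the spike sequence of each lineage and comparing the results against a reference sequence would give per-lineage changes in residue counts; a quantitative charge estimate would additionally require pH-dependent dissociation, which this sketch leaves out.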