Five Challenges in the Field of Viral Diversity and Evolution

1 Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas (CSIC), Universitat de València, València, Spain
2 Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
3 Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
4 Institute of Environmental Science and Research, Wellington, New Zealand
5 Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
6 Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
7 Department of Microbial Ecology, Netherlands Institute of Ecology, The Royal Netherlands Academy of Arts and Sciences, Wageningen, Netherlands
8 Wadsworth Center, New York State Department of Health, Albany, NY, United States
9 Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
10 Laboratorio de Virología Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
11 Section Microbial Biotechnology, Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle, Germany
12 Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ, United States
13 Division of Public Health Laboratory Sciences, Hong Kong University, Hong Kong, China


INTRODUCTION
Viruses are extremely abundant, genetically diverse, and fast-evolving entities. They possess widely different replication strategies, genome organizations, and virion structures. Many viruses, particularly RNA viruses, evolve on short timescales that humans can directly observe (1). This rapid evolution not only helps us reconstruct past evolutionary events but may also allow us to predict future ones, anticipating important processes such as virulence evolution, vaccine escape, and drug resistance. For instance, the SARS-CoV-2 pandemic has highlighted the need to understand patterns of viral cross-species transmission, emergence, diversification, and adaptation to a novel host. However, evolution is an open-ended, complex, and inherently stochastic process, which challenges our ability to predict its course, and multiple challenges remain. For example, conclusions about viral evolution drawn in one context or timescale often fail to hold in other settings, as when short-term evolutionary rates are extrapolated to longer timescales. Achieving greater generality and consistency requires collecting more comprehensive data and building more flexible and integrative models to interpret them. Below, we discuss specific challenges underpinning the goal of achieving higher predictability, consistency, and comprehensiveness in the field.

ANTICIPATING VIRAL EVOLUTION
Predicting viral evolution is a long-sought goal, and human influenza virus strains, which require constant vaccine updates, have been a particular focus of such efforts (2). The advent of high-throughput sequencing has enabled genome-wide viral sequencing in real time during epidemics, as already demonstrated during the 2013-2014 West African Ebola epidemic (3) and the emergence of Zika virus in South America in 2015-2016 (4). With the SARS-CoV-2 pandemic in 2020, this approach reached massive proportions, with hundreds of thousands of full-genome sequences released within months (gisaid.org) (5). By linking epidemiological information with sequence data, it has been possible to track the early spread of new variants with enhanced transmissibility, such as lineage B.1.1.7, which emerged in the UK, and to anticipate their ability to outcompete other lineages and become dominant in different populations (6). Data collection on this scale constitutes a turning point in the field, making it feasible to predict certain aspects of viral evolution in the real world at a global scale.
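The idea of anticipating whether a variant will outcompete other lineages can be illustrated with a minimal sketch, assuming a simple two-lineage logistic model in which the variant has a constant per-day growth advantage s over the resident lineage. Under that assumption, the log-odds (logit) of the variant's frequency rises linearly with time with slope s, so s can be estimated by least-squares regression of logit(frequency) on sampling day. The frequencies below are hypothetical and noise-free, and this is an illustrative technique rather than the method used in the studies cited above; real analyses must also handle sampling noise, geography, and changing epidemiological conditions.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def growth_advantage(days, freqs):
    """Least-squares slope of logit(frequency) against time.

    Assumes a two-lineage logistic model, logit(f(t)) = logit(f0) + s * t,
    so the fitted slope approximates the per-day growth advantage s.
    """
    ys = [logit(f) for f in freqs]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    var = sum((t - mean_t) ** 2 for t in days)
    return cov / var


# Hypothetical weekly variant frequencies generated from logit(f) = -4 + 0.1 * t
days = [0, 7, 14, 21, 28, 35]
freqs = [1.0 / (1.0 + math.exp(-(-4.0 + 0.1 * t))) for t in days]

s = growth_advantage(days, freqs)
print(f"estimated growth advantage: {s:.3f} per day")
```

Working on the logit scale turns the sigmoidal rise in frequency into a straight line, which is why an ordinary linear regression is enough in this idealized setting.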
However, major challenges remain in this area. Although we can now forecast the dynamics of some viral variants, our ability to predict the emergence of new variants of concern is still limited. Progress in this direction will require additional tools, including the experimental evaluation of fitness landscapes and mutational pathways to assess the evolutionary potential of a viral strain (7). RNA viruses are particularly convenient systems for investigating this topic, since they have simple genomes, are easy to manipulate, and mutate rapidly. For instance, the recurrent emergence of certain variants both in laboratory evolution experiments and in nature can help us predict the evolution of some viral features (8,9).
Nevertheless, we are still far from achieving predictability in other settings, particularly concerning viral transmission from wild animals into humans, which is the main source of new, potentially pandemic viruses. Given that there are hundreds of thousands of wildlife viruses with zoonotic potential, yet only a handful of them have caused zoonotic events, it is currently not possible to say when and where a new virus will first infect humans and spread efficiently among them (10). However, risk factors such as local biodiversity, ecological disturbance, viral genetic features, receptor usage, and host tropism have been identified (11,12). In the not-so-distant future, we might achieve better predictability by combining viromics, machine learning, and experimental virology tools (13-15).

DEVELOPING MORE RELEVANT EXPERIMENTAL EVOLUTION SYSTEMS
Controlled laboratory experiments can provide useful information about the repeatability and predictability of viral evolution (16). Surprisingly, non-viral systems have often dominated this discussion (17), even though viruses are well suited to it, given their fast evolution and simple genomes. It is well established that, under strong directional selection, viral evolution in the laboratory often produces parallel evolution events at the sequence level, as shown for instance in a recent long-term HIV evolution experiment (18). This information can be used to identify candidate adaptive mutations, which can then be analyzed by genetic engineering. Advances such as deep mutational scanning and microfluidics help us generate viral diversity more efficiently in the laboratory and allow experimental evolution to be carried out in a more systematic and automated manner, accelerating adaptation and increasing experimental repeatability (7).
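Screening for parallel evolution across replicate lineages largely amounts to counting how many independent lines carry each mutation. The sketch below assumes hypothetical mutation calls from four replicate evolution lines (the mutation labels are made up for illustration) and reports the mutations found in at least two lines. Recurrence alone does not prove adaptation, since it can also reflect mutational hotspots, so such candidates would still need to be tested by genetic engineering.

```python
from collections import Counter


def parallel_mutations(lines, min_lines=2):
    """Return mutations observed in at least `min_lines` independent lineages.

    `lines` maps a replicate name to the collection of mutations called in it.
    Each mutation is counted at most once per lineage.
    """
    counts = Counter(m for muts in lines.values() for m in set(muts))
    # Sort by mutation name so the output order is reproducible
    return dict(sorted((m, c) for m, c in counts.items() if c >= min_lines))


# Hypothetical mutation calls from four replicate evolution lines
replicates = {
    "line_A": {"A123G", "C456T", "G789A"},
    "line_B": {"A123G", "T222C"},
    "line_C": {"A123G", "C456T"},
    "line_D": {"G901T"},
}

candidates = parallel_mutations(replicates)
print(candidates)
```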
By combining laboratory virus evolution with high-throughput mutagenesis or sequencing, virologists have characterized the distribution of mutational fitness effects in unprecedented detail, such that the effect of nearly every possible single mutation can now be measured (19). However, it remains unclear to what extent these experiments, which are typically carried out in simple cell culture systems, capture important aspects of viral evolution in nature. Certain selective pressures, such as receptor usage, are probably well modeled using cell cultures. However, evolution experiments are typically conducted using monolayers of tumor-derived cells, which tend to exhibit aberrant innate immune responses and thus offer an unnaturally permissive environment for viral spread. Cultures of non-tumor cells, as well as 3D cultures, organoids, and explants, should better reflect some of the selective pressures found in vivo (20,21). On the other hand, these systems are harder to implement and less efficient at selecting specific viral mutations, because they produce lower viral yields and, consequently, lower effective population sizes. Either way, we need to better link experimental evolution systems with virus diversity and evolution in nature by contrasting them with field data. Finally, we should acknowledge that most viruses identified in nature have not yet been isolated and thus cannot be cultured. Despite these major limitations, experimental strategies such as creating pseudotypes or replicons by de novo DNA or RNA synthesis help us investigate important aspects of some wildlife viruses in the laboratory, including receptor-dependent cell tropism and innate immunity evasion.
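In deep mutational scanning, a common way to score a mutation is to compare its change in frequency across a round of selection with that of the wild type. The function below is a minimal sketch of such an enrichment-ratio score, assuming sequencing read counts before and after a single passage and a pseudocount of 0.5 (an arbitrary choice made here). Published pipelines differ in normalization, error correction, and statistics, and all counts below are hypothetical.

```python
import math

PSEUDOCOUNT = 0.5  # arbitrary; keeps mutants lost after selection finite


def relative_fitness(pre_mut, post_mut, pre_wt, post_wt):
    """log2 enrichment of a mutant relative to wild type over one passage.

    Counts are sequencing reads before (pre) and after (post) selection.
    A score of 0 means neutral relative to wild type, negative values mean
    deleterious, and positive values mean beneficial.
    """
    mut_ratio = (post_mut + PSEUDOCOUNT) / (pre_mut + PSEUDOCOUNT)
    wt_ratio = (post_wt + PSEUDOCOUNT) / (pre_wt + PSEUDOCOUNT)
    return math.log2(mut_ratio / wt_ratio)


# Hypothetical (pre, post) read counts per mutant; wild type has 10,000 reads at both points
counts = {
    "mut_neutral": (200, 200),
    "mut_lethal": (200, 0),
    "mut_beneficial": (200, 800),
}
scores = {
    name: relative_fitness(pre, post, 10_000, 10_000)
    for name, (pre, post) in counts.items()
}

for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Because of the pseudocount, a mutant that disappears entirely receives a large negative but finite score rather than minus infinity.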

INTEGRATING VIRAL DYNAMICS AND EVOLUTION AT DIFFERENT SCALES
An important issue in the field is that viral evolutionary rates inferred from phylogenetic analyses strongly depend on the measurement timescale: rates estimated from recent isolates (e.g., seasonal outbreaks) are systematically higher than those estimated over longer time periods [e.g., from permafrost or archival samples; (22,23)]. This inconsistency needs to be addressed for viral phylogenetic trees to provide more reliable information about when different groups evolved. Similarly, relatively few studies to date have successfully linked evolutionary processes occurring at different levels, spanning the intra-host, inter-host, and community levels. Nevertheless, some important steps forward have recently been made in this direction, such as identifying common evolutionary events across these different spatiotemporal scales (24).
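One commonly discussed contributor to this time dependency, alongside purifying selection, is substitution saturation: as divergence increases, multiple substitutions at the same site become hidden, so observed differences accumulate more slowly than time passes. The sketch below assumes the Jukes-Cantor (JC69) substitution model and a hypothetical true rate of 10^-3 substitutions/site/year, and shows how a rate estimate that ignores multiple hits declines as the timescale lengthens. It is a toy illustration only; it does not model purifying selection or rate variation among sites, which are also thought to contribute.

```python
import math


def expected_p_distance(rate, years):
    """Expected proportion of differing sites under Jukes-Cantor (JC69).

    Two lineages that diverged `years` ago have accumulated
    d = 2 * rate * years substitutions per site in total; because later
    substitutions can hit already-changed sites, p saturates at 0.75.
    """
    d = 2.0 * rate * years
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def naive_rate(p, years):
    """Rate estimate that ignores multiple hits: p / (2 * years)."""
    return p / (2.0 * years)


TRUE_RATE = 1e-3  # hypothetical, substitutions/site/year
timescales = [1, 10, 100, 1000, 10000]  # years since divergence
apparent = [naive_rate(expected_p_distance(TRUE_RATE, t), t) for t in timescales]

for t, r in zip(timescales, apparent):
    print(f"{t:>6} years: apparent rate = {r:.2e} subs/site/year")
```

Over short timescales the apparent rate matches the true rate, whereas over long timescales the saturated distance (capped at 0.75) is divided by an ever larger time, so the estimate keeps falling.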
A particularly challenging issue is defining and measuring viral fitness at these different organizational levels. At certain stages of infection, or in a simplified laboratory setting, fitness is largely determined by the rate at which a virus can infect cells (which in turn depends on factors such as receptor binding, replication, and gene expression rates), whereas the ability to evade innate and adaptive immunity becomes increasingly important as the infection progresses. During transmission, other factors such as shedding from hosts and environmental stability become critical. There is a need to better link mechanistically well-defined aspects of the viral infection cycle (e.g., polymerase catalytic efficiency, receptor binding affinity, or capsid stability) to observable fitness at the population level (e.g., the basic reproduction number, R0).
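As a minimal illustration of how a change in one mechanistically defined trait might propagate to population-level fitness, consider the textbook expression for R0 in a homogeneous-mixing model with a fixed infectious period: the per-contact transmission probability multiplied by the contact rate and the duration of infectiousness. All parameter values below are hypothetical, and mapping traits such as shedding or immune evasion onto these terms is itself an assumption that real within-host and transmission data would need to justify.

```python
def basic_reproduction_number(p_transmit, contacts_per_day, infectious_days):
    """R0 under homogeneous mixing with a fixed infectious period.

    R0 = per-contact transmission probability
         * contacts per day
         * days spent infectious
    """
    return p_transmit * contacts_per_day * infectious_days


# Hypothetical baseline virus
baseline = basic_reproduction_number(p_transmit=0.05, contacts_per_day=10,
                                     infectious_days=5)
# Hypothetical mutant whose per-contact transmission is 20% higher
# (e.g., through increased shedding or environmental stability)
mutant = basic_reproduction_number(p_transmit=0.06, contacts_per_day=10,
                                   infectious_days=5)
# Hypothetical mutant that evades immunity and stays infectious one day longer
evader = basic_reproduction_number(p_transmit=0.05, contacts_per_day=10,
                                   infectious_days=6)

print(f"baseline R0 = {baseline:.2f}")
print(f"mutant R0   = {mutant:.2f}")
print(f"evader R0   = {evader:.2f}")
```

Because R0 is a simple product in this model, a proportional change in any single factor scales R0 by the same proportion, so quite different mechanisms (higher shedding versus a longer infectious period) can yield equivalent population-level fitness.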

CHARACTERIZING AND UNDERSTANDING THE VIROSPHERE MORE THOROUGHLY
Our knowledge of global viral diversity in nature is still very incomplete and biased. Classical virology has focused heavily on viral pathogens infecting humans, livestock, and crops. The metagenomics revolution has begun to reveal a much broader picture and suggests that we currently know <1% of the viruses existing in nature (25). Strategies to discover viruses from new or underrepresented viral groups more efficiently are needed. Currently, a large fraction of sequence reads from environmental samples (up to 90% in some cases) is categorized as "dark matter," that is, sequences that do not map to any known biological entity (26). Remote homology detection tools and approaches that do not rely solely on comparative analysis of nucleic acid sequences are therefore needed. For instance, prediction of viral capsid structures using artificial neural networks has recently contributed to identifying thousands of highly divergent new DNA viruses (27).
Characterizing the virosphere clearly goes beyond sequencing. We also need to address how different groups evolved from one another, how viral phylogenetic trees map to different ecosystems and hosts, why certain viral features have evolved recurrently in certain cases (e.g., multicomponent viruses mainly found in plants), how the ancestry of major viral groups is connected (e.g., whether insects served as evolutionary bridges between plant and animal viruses), and so on. The benefits of better characterizing and understanding the virosphere are manifold. For instance, we will achieve a broader picture of viral cross-species transmission, which should in turn inform viral emergence studies. We will also better identify factors involved in the transition from commensalism to parasitism.

IMPROVING OUR UNDERSTANDING OF VIRUS-VIRUS INTERACTIONS
The success of a viral infection sometimes depends not only on viral and host genotypes but also on the presence of other microorganisms, including other viruses, as revealed for instance by comparing the viromes of influenza-susceptible and influenza-resistant wild birds (28). Some virus-virus interactions are indirect, for instance when one virus modifies the immune status of the host, whereas in other cases viruses interact more directly, as with satellite viruses. Virus-virus interactions can also involve members of the same viral species, which can cooperate or cheat according to the principles of social evolution (29). Well-known instances of such interactions include the emergence of defective viral genomes at high multiplicities of infection (30), but also more recently discovered processes such as cooperative immunity evasion in animal viruses and bacteriophages (31,32). Moreover, viruses often undergo inter-host transmission or intra-host dissemination as pools of jointly dispersing viral particles, such as virion clusters enclosed in extracellular vesicles or virion aggregates (33), which promotes coinfection. Whereas virus-virus interactions typically take place when viruses share gene products within coinfected cells, these interactions can also be established at the intercellular level. For instance, some temperate bacteriophages regulate lysis at the population level using small diffusible molecules encoded in their genomes, a system that resembles bacterial quorum sensing (34). Also, some multicomponent viruses do not need to coinfect individual cells with particles containing each of the genome segments. Instead, viral products encoded by different segments are shared at the intercellular level in a process called distributed infection (35). The evolutionary stability of most of these processes depends on the spatial structure of viral infections, since this determines whether virus-virus interactions remain local or become systemic. Therefore, better understanding how viral infections progress anatomically and how this shapes spatial population structure remains another important research topic (36,37).
At a broader scale, coinfections are at the origin of horizontal gene transfer among viruses, although this process can also occur via gene transfer to the host and subsequent capture by other viruses. Either way, gene exchange plays a fundamental role in host range evolution, the virus-host arms race, and virus-host coevolution at large. Improving our knowledge of horizontal gene transfer in viruses is challenging for several reasons. For most viral taxa, the number of complete and closely related genomes is small, which hampers the inference of gene exchange events. In more thoroughly studied groups, such as mycobacteriophages, the emerging picture is that extensive horizontal gene transfer has produced an evolutionary network of mosaic viral genomes that is poorly described by classical bifurcating phylogenetic trees (38). Moreover, because of the high substitution rates that characterize viral genomes, homologous genes exchanged in the past often diverge beyond the point where accurate inferences can be made. Finally, the high proportion of open reading frames of unknown function restricts our ability to understand the functional consequences of genome plasticity.

CONCLUSION
The above list of challenges is far from exhaustive, yet it helps us discuss how certain active research areas might progress in the near future. Importantly, these challenges are clearly interconnected. For instance, improving systems for experimental evolution should help us make more accurate predictions about viral evolution in nature, and better characterizing the virosphere should improve our understanding of virus-virus interactions. Some recent advances arising from these synergies have driven profound changes in the virology and evolutionary biology communities, such as in the way viral species are defined or the realization that most viruses are probably not pathogenic. Research at the interface between these areas should also be encouraged, for instance by combining experimental virology and metagenomics. Ultimately, this should help make the study of viral evolution a more predictive, comprehensive, and consistent scientific discipline.

AUTHOR CONTRIBUTIONS
All authors reviewed and approved the final version of the manuscript.