Lassa virus isolates from Mali and the Ivory Coast represent an emerging fifth lineage

Previous imported cases of Lassa fever (LF) into the United Kingdom from the Ivory Coast and Mali, as well as the detection of Lassa virus (LASV) in the Mastomys natalensis population of Mali, have led to the suggestion that the endemic area for LF is expanding. Initial phylogenetic analyses place isolates from Mali and the Ivory Coast separately from the classical lineage IV isolates taken from Sierra Leone, Guinea, and Liberia. The number of available full genome sequences continues to increase, allowing a more complete phylogenetic comparison of the isolates from Mali and the Ivory Coast with the other existing isolates. In this study, we used a Bayesian approach to infer the demographic history of every LASV isolate for which the full sequence was available. Our results indicate that the isolates from Mali and the Ivory Coast group separately from the isolates of lineage IV and comprise a distinct fifth lineage. The split between lineages IV and V is estimated to have occurred around 200–300 years ago, which coincides with the colonial period of West Africa.


Introduction
Lassa virus (LASV) is the causative agent of Lassa fever (LF), a potentially fatal disease that affects as many as 100,000 people annually in endemic areas. Since the discovery of the virus in 1969, the endemic area for LASV has been mapped to the West African countries of Nigeria, Sierra Leone, Guinea, and Liberia (Ogbu et al., 2007). The primary natural host, Mastomys natalensis, is distributed throughout West Africa even though the endemic area of LF is comparatively restricted, and infected rodents are distributed focally within the endemic area (Demby et al., 2001; Lecompte et al., 2006). However, recent cases of LF in the West African countries of Mali and the Ivory Coast suggest that the endemic area is expanding (Atkin et al., 2009; Sogoba et al., 2012).
The virus genome consists of two segments, L and S, which encode the RNA-dependent RNA polymerase (LP), the matrix (Z) protein, the nucleoprotein (NP), and the glycoprotein precursor (GPC). Phylogenetic analyses of either partial or full-length LASV sequences have identified four LASV lineages, which correlate strongly with the geographic origin of the respective isolates (Bowen et al., 2000). While the lineages can be clearly distinguished from one another in phylogenetic analyses of each full-length gene (with the exception of the small Z protein), the relationships among individual lineages vary from gene to gene (Ehichioya et al., 2011). As full-length LASV sequences become more readily available, more extensive phylogenetic analyses using full-length genes over a longer time frame can be performed, allowing a more complete characterization of each lineage. LASV has been suggested to have arrived in the Sierra Leone region from Nigeria 150–250 years ago through movement during the colonial period, and the recent emergence of LASV in Mali and the Ivory Coast has been attributed to movement during the Sierra Leone civil war between 1991 and 2002 (Lalis et al., 2012). The prototypic strain from the Ivory Coast, AV, was reported in 2000 as a case imported into Germany (Gunther et al., 2000). Although cases of LF have been reported in the Ivory Coast and Mali, sequence data for isolates from these countries have only become available within the last few years (Safronetz et al., 2010, 2013). Although these sequences are now available, the genetic relationship of these isolates to the classical lineages has not been fully characterized. The purpose of this study was to determine whether a fifth lineage is emerging in Mali and the Ivory Coast, using analyses of the complete LP, NP, and GPC genes.
Using Bayesian analysis, we investigated the relationship of isolates from Mali and the Ivory Coast to all available isolates within the four traditional lineages.

Sequence Alignments
Full-length L and S segment sequences for each available isolate were retrieved from GenBank (Table 1) and imported into SeaView 4 (Gouy et al., 2010). Nucleotide sequences were aligned at the amino acid level using MUSCLE (Edgar, 2004) and then converted back to nucleotide sequences so that codon positions remained in register. The resulting alignments were trimmed to the ORFs of LP, NP, and GPC and exported as NEXUS files for phylogenetic analysis.
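The back-conversion step above (align as amino acids, then restore nucleotides) can be illustrated with a short sketch. SeaView performs this internally; the Python below is not SeaView's code but a minimal illustration of the same idea, assuming that each ORF length is a multiple of three, that the stop codon has been trimmed, and that the ungapped protein is exactly the translation of its ORF. The function names and toy sequences are hypothetical.

```python
from typing import Dict


def back_translate(aligned_protein: str, orf: str) -> str:
    """Map an aligned protein onto its ORF: each residue -> its codon, each gap -> '---'."""
    ungapped = aligned_protein.replace("-", "")
    # Assumption: the ORF encodes exactly the ungapped protein (no stop codon).
    if len(orf) != 3 * len(ungapped):
        raise ValueError("ORF length does not match the ungapped protein length")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    out = []
    k = 0  # index of the next unused codon in the ORF
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # a one-residue gap becomes a whole-codon gap
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


def back_translate_alignment(aligned_proteins: Dict[str, str],
                             orfs: Dict[str, str]) -> Dict[str, str]:
    """Apply back_translate to every sequence, keyed by isolate name."""
    return {name: back_translate(aln, orfs[name]) for name, aln in aligned_proteins.items()}


if __name__ == "__main__":
    # Toy sequences with hypothetical names, not real LASV data.
    proteins = {"isolate_A": "MK-L", "isolate_B": "MKAL"}
    orfs = {"isolate_A": "ATGAAACTG", "isolate_B": "ATGAAAGCTCTG"}
    for name, seq in back_translate_alignment(proteins, orfs).items():
        print(name, seq)
```

Running it prints `isolate_A ATGAAA---CTG` and `isolate_B ATGAAAGCTCTG`. Because every gap is three nucleotides wide, first, second, and third codon positions stay aligned across all sequences.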

Phylogenetic Analysis
Trees for LP, NP, and GPC were generated in MrBayes (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003), which uses a Bayesian Markov chain Monte Carlo (MCMC) algorithm to infer phylogenetic relationships. Rate variation across sites was modeled with the invariant-gamma (I+G) model. Four chains (one cold chain and three heated chains) were run, and the chain was sampled every 100 steps. Each analysis was run for 10,000,000 steps, with the burn-in set to 250,000 steps. The output was examined in Tracer version 1.6 (http://tree.bio.ed.ac.uk/software/tracer/) to confirm sufficient sampling for each data set.
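The MrBayes settings above can be written as a NEXUS command block. The sketch below renders one from Python; it is not the authors' actual input file. The commands `lset rates=invgamma`, `mcmc ngen=... samplefreq=... nchains=...`, and the `relburnin=no burnin=...` options of `sump`/`sumt` exist in MrBayes 3.2. The substitution matrix (`nst`) is not stated in the text, so it is left at the program default here, and the helper names are hypothetical. One detail worth making explicit is that with `relburnin=no`, MrBayes counts `burnin` in *samples*, not generations, so 250,000 generations at `samplefreq=100` corresponds to 2,500 samples.

```python
def burnin_in_samples(burnin_generations: int, samplefreq: int) -> int:
    """Convert a burn-in given in generations into the number of samples to discard."""
    if burnin_generations % samplefreq:
        raise ValueError("burn-in should be a multiple of samplefreq")
    return burnin_generations // samplefreq


def mrbayes_block(ngen: int = 10_000_000, samplefreq: int = 100,
                  nchains: int = 4, burnin_generations: int = 250_000) -> str:
    """Render a MrBayes block; nst is deliberately not set (unstated in the text)."""
    burnin = burnin_in_samples(burnin_generations, samplefreq)
    lines = [
        "begin mrbayes;",
        "  lset rates=invgamma;",  # invariable sites + gamma-distributed rates
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nchains={nchains};",
        f"  sump relburnin=no burnin={burnin};",  # burnin counted in samples
        f"  sumt relburnin=no burnin={burnin};",
        "end;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(mrbayes_block())
    # 10,000,000 / 100 = 100,000 samples per run, of which 2,500 are burn-in (2.5%).
    print(burnin_in_samples(250_000, 100))
```

Keeping the burn-in conversion in one helper avoids the common mistake of passing a generation count where MrBayes expects a sample count.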
Time-scaled trees for LP, NP, and GPC were generated using BEAST (Drummond and Rambaut, 2007; Drummond et al., 2012). BEAST employs a Bayesian MCMC approach to infer demographic histories, evolutionary rates, and divergence dates from serially sampled (dated) sequence data. Statistical uncertainty is reported as 95% highest posterior density (HPD) intervals. Analyses used the Bayesian Skyline Plot (BSP) model of population size change, which does not assume a pre-specified demographic model (Drummond et al., 2005), together with the uncorrelated lognormal (UCLN) relaxed clock model, which allows substitution rates to vary among lineages in the phylogeny (Drummond et al., 2006). The MCMC chain was run for 100 million steps and sampled every 5000th state. Tracer version 1.6 was used to confirm stationarity. TreeAnnotator version 1.7.1 (http://beast.bio.ed.ac.uk/software/TreeAnnotator) was used to summarize the BEAST output, and the maximum clade credibility (MCC) tree was estimated using mean node heights after discarding the initial 10% of generations.
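Two of the summaries above, discarding a 10% burn-in and reporting 95% HPD intervals, together with the effective sample size (ESS) that Tracer reports when checking sampling, can be sketched in a few lines. This is a minimal illustration, not Tracer's or BEAST's code. The ESS here is a simple autocorrelation estimate truncated at the first non-positive lag, which may differ numerically from Tracer's. The HPD is the shortest interval containing about 95% of the samples, following the approach of, for example, `coda::HPDinterval` in R. All function names are hypothetical, and the quadratic-time ESS loop is only meant for illustration.

```python
from typing import List, Sequence, Tuple


def drop_burnin(trace: Sequence[float], burnin_frac: float = 0.1) -> List[float]:
    """Discard the first `burnin_frac` of the sampled states."""
    return list(trace[int(len(trace) * burnin_frac):])


def effective_sample_size(x: Sequence[float]) -> float:
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first lag <= 0."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    var = sum(v * v for v in c) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(c[t] * c[t + lag] for t in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def hpd_interval(x: Sequence[float], prob: float = 0.95) -> Tuple[float, float]:
    """Shortest interval spanning about `prob` of the sorted samples."""
    s = sorted(x)
    n = len(s)
    k = max(1, min(n - 1, round(n * prob)))
    i = min(range(n - k), key=lambda j: s[j + k] - s[j])
    return s[i], s[i + k]


if __name__ == "__main__":
    # 100,000,000 steps sampled every 5,000 -> 20,000 states; a 10% burn-in leaves 18,000.
    states = 100_000_000 // 5_000
    print(states, states - int(states * 0.1))
```

The usage line prints `20000 18000`. A widely used rule of thumb is to treat parameters with an ESS below about 200 as under-sampled.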

Bayesian Analysis Indicates the Presence of a Fifth LASV Lineage
We first sought to determine whether there was evidence of a fifth lineage among LASV isolates using full-length NP sequences. The analysis that originally described the four classical lineages was based on partial NP sequences and was later confirmed using full-length LP, NP, and GPC sequences. While all four lineages were easily distinguished from one another using full-length sequences, the relationships among the lineages varied between genes (Bowen et al., 2000; Ehichioya et al., 2011). Therefore, to fully characterize the relationships of the isolates from Mali and the Ivory Coast to the classical isolates, we aligned the full-length open reading frames for GPC, NP, and LP and conducted a Bayesian MCMC phylogenetic analysis (MrBayes v3.2.5) for each gene.
Analyses of both the NP and GPC genes produced phylogenetic trees (Figure 1) that resemble the traditional grouping of the four lineages, with lineage I (Pinneo) forming the most basal lineage, followed by lineage II, lineage III (Nig08-A18, Nig08-A19), and lineage IV. In both trees, the isolates from Mali and the Ivory Coast are clearly distinct from lineage IV, with strong posterior probability support. However, analysis of full-length LP nucleotide sequences (Figure 1) places lineage II as the most basal lineage, followed by lineage I, lineage III, and lineage IV.
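In a MrBayes analysis, the support for a clade such as the Mali/Ivory Coast group is its posterior probability: the proportion of post-burn-in sampled trees that contain that clade. The sketch below computes this from plain Newick strings. It is an illustration, not MrBayes's `sumt` implementation. It assumes rooted trees, tip labels without whitespace or quotes, and that any NEXUS `translate` table has already been resolved, and it strips bracketed comments such as BEAST's `[&...]` annotations. The function names and the toy trees are hypothetical.

```python
import re
from typing import FrozenSet, Iterable, List, Set


def newick_clades(newick: str) -> Set[FrozenSet[str]]:
    """Return every clade of one rooted Newick tree as a frozenset of tip labels."""
    s = re.sub(r"\[[^\]]*\]", "", newick)  # drop [&...] comments
    s = re.sub(r"\s+", "", s).rstrip(";")  # assumes labels contain no whitespace
    clades: Set[FrozenSet[str]] = set()
    stack: List[List[str]] = [[]]  # tips collected under each open parenthesis
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "(":
            stack.append([])
            i += 1
        elif ch == ")":
            tips = stack.pop()
            clades.add(frozenset(tips))
            stack[-1].extend(tips)  # a closed clade's tips also belong to its parent
            i += 1
            while i < len(s) and s[i] not in ",()":  # skip node label / ":length"
                i += 1
        elif ch == ",":
            i += 1
        else:
            j = i
            while j < len(s) and s[j] not in ",():":
                j += 1
            stack[-1].append(s[i:j])  # tip label
            i = j
            while i < len(s) and s[i] not in ",()":  # skip ":length"
                i += 1
    return clades


def posterior_clade_probability(trees: Iterable[str], taxa: Iterable[str]) -> float:
    """Fraction of sampled trees in which `taxa` form a clade."""
    target = frozenset(taxa)
    tree_list = list(trees)
    if not tree_list:
        raise ValueError("no trees supplied")
    hits = sum(target in newick_clades(t) for t in tree_list)
    return hits / len(tree_list)


if __name__ == "__main__":
    # Three toy "posterior samples" (hypothetical, not LASV trees).
    samples = ["((A:0.1,B:0.2):0.05,(C:0.1,D:0.1):0.2);",
               "((A,C),(B,D));",
               "(((A,B),C),D);"]
    print(posterior_clade_probability(samples, ["A", "B"]))
```

Here {A, B} appears in two of the three sampled trees, so its posterior probability is 2/3.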
Within lineage IV, the Liberian isolates (Z148, Macenta, 1200LIB10) cluster together only in the LP analysis. In the NP analysis, 1200LIB10 is most closely related to the Sierra Leone isolates but still shares a recent common ancestor with Z148 and Macenta. Interestingly, 1200LIB10 clusters within the Sierra Leone clade in the GPC analysis. The Guinea rodent isolate BA366 is basal to the Liberian and Sierra Leone isolates in the NP analysis, but in the GPC analysis it shares a more recent common ancestor with the Sierra Leone strains than with Z148 and Macenta.
Five LASV strains fell into a separate grouping, designated lineage V: AV (Ivory Coast/Ghana), BambaR114 (Mali), KominaR16 (Mali), SorombaR (Mali), and SonombaR30 (Mali). These strains form a single, well-defined lineage with high posterior probability support. All isolates in this lineage originated from Mali and the Ivory Coast, suggesting that the lineage is geographically restricted, possibly because of geographical barriers or the distribution of a distinct haplotype of the rodent host M. natalensis.

The Fifth LASV Lineage Emerged during the Colonial Period
To determine when the lineage emerged, we performed a BEAST analysis using the trees obtained from MrBayes. By providing the year of isolation for each isolate, we were able to estimate when lineages IV and V diverged from their most recent common ancestor. Analysis of all three complete genes places the emergence of lineage V roughly 250 years ago (Figure 2), coinciding with movement throughout the region during the colonial period of West Africa. While both S segment genes estimate that the most recent common ancestor of lineages IV and V existed 200–300 years ago, the estimated range is much wider in the LP analysis (141–416 years ago). The most recent common ancestor of the lineage V group existed approximately 114 years ago (range, 30–225 years ago). Based on the S segment genes, movement of the virus from Nigeria to the Mano River region is estimated to have occurred roughly 300–500 years ago (Figure 2); however, the range for the LP gene is much wider (190–693 years ago). Westward movement of LASV from Nigeria likely occurred during the pre-colonial period of West Africa, between 1500 and 1700 AD, although the virus appears to have been circulating in Nigeria before 1300 AD.
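Because BEAST node ages are measured backward from the most recent isolate, translating an age range such as 141–416 years into calendar years means subtracting it from the latest isolation year, which reverses the order of the endpoints. The sketch below shows this arithmetic. The reference year 2012 is a hypothetical placeholder; the true value would be the latest isolation date in Table 1, which this text does not state. The function names are also hypothetical.

```python
from typing import Tuple


def age_to_year(age_years: float, most_recent_sample_year: float) -> float:
    """Node age (years before the most recent isolate) -> calendar year."""
    return most_recent_sample_year - age_years


def hpd_ages_to_years(hpd_ages: Tuple[float, float],
                      most_recent_sample_year: float) -> Tuple[float, float]:
    """Convert an age interval into (earliest, latest) calendar years.

    The older age maps to the earlier year, so the endpoints swap.
    """
    younger, older = sorted(hpd_ages)
    return (age_to_year(older, most_recent_sample_year),
            age_to_year(younger, most_recent_sample_year))


if __name__ == "__main__":
    REFERENCE_YEAR = 2012  # hypothetical; replace with the latest isolation year
    # LP interval for the lineage IV/V split, in years before the most recent isolate.
    print(hpd_ages_to_years((141, 416), REFERENCE_YEAR))
```

With the placeholder year, the LP interval maps to `(1596, 1871)`. Changing the reference year shifts both endpoints by the same amount.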
Additionally, the most recent common ancestor of LASV and Mopeia virus (MOPV) is estimated to have existed between 0 and 700 AD based on S segment gene analysis (Figure 2). A Nigerian origin of LASV is most likely, but LASV sampling remains heavily biased toward certain regions, so broader sampling is required to fully determine the origins and movements of this virus.

Discussion
This study represents the first phylogenetic analysis of LASV that includes every available isolate from the four traditional lineages, as well as every available isolate from Mali and the Ivory Coast. While the lineage hierarchy for both GPC and NP supports the results of the original analysis (Bowen et al., 2000), lineage I is not the most basal lineage in the LP analysis. However, this placement is not strongly supported by posterior probability, suggesting that the branching order of lineages I and II will be difficult to resolve with the LP gene until more strains are isolated. Closely related arenaviruses can reassort (Lukashevich, 1992), so a reassortment event between ancestral lineage I and lineage II strains is possible. This seems unlikely, as previous studies have not detected reassortment among LASV strains (Vieth et al., 2004; Emonet et al., 2006), although the lineage I isolate was not available at the time of those studies. Our results indicate that lineage I remains the most basal lineage based on full-length GPC analysis. This does not agree with the previous findings of Ehichioya et al., which placed lineage II as the most basal lineage for GPC. Recombination between arenavirus species has been described within the New World arenaviruses (Fulhorst et al., 1999; Weaver et al., 2000), which could explain different groupings for two genes on the same segment. However, no evidence of recombination was detected in that analysis (Ehichioya et al., 2011). It is possible that the different alignment method used in our analysis contributed to the different outcome.
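The conflict between the NP/GPC and LP branching orders discussed above can be made concrete by comparing the clade sets of the two lineage-level trees. The sketch below writes the two orders from the Results as nested tuples and counts the clades found in only one tree, a rooted Robinson–Foulds-style distance. This comparison is an illustration and was not performed in this study. On its own, topological conflict cannot distinguish reassortment or recombination from weakly supported nodes. The function and constant names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple


def clade_set(tree) -> Set[FrozenSet[str]]:
    """Clades of a rooted tree written as nested tuples with string tips."""
    clades: Set[FrozenSet[str]] = set()

    def collect(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        clades.add(members)
        return members

    collect(tree)
    return clades


def conflicting_clades(t1, t2) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Clades found only in t1, and clades found only in t2."""
    a, b = clade_set(t1), clade_set(t2)
    return a - b, b - a


def rooted_rf_distance(t1, t2) -> int:
    """Number of clades present in exactly one of the two trees."""
    only1, only2 = conflicting_clades(t1, t2)
    return len(only1) + len(only2)


# Lineage-level branching orders as described in the Results text.
NP_GPC_ORDER = ("I", ("II", ("III", ("IV", "V"))))
LP_ORDER = ("II", ("I", ("III", ("IV", "V"))))

if __name__ == "__main__":
    print(rooted_rf_distance(NP_GPC_ORDER, LP_ORDER))
    print(conflicting_clades(NP_GPC_ORDER, LP_ORDER))
```

The distance is 2. The only disagreement is whether lineage II or lineage I groups with lineages III–V, and the {III, IV, V} and {IV, V} clades are shared by both trees.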
The discrepancy between the GPC gene and the NP and LP genes in the topology of lineage IV is likely due to the larger number of lineage IV strains in the GPC gene tree. As full genome sequences of these strains become available, we expect these differences to be resolved.
Analysis of all three full-length genes supports the emergence of a fifth LASV lineage, which appears to have diverged from a common ancestor with lineage IV around 250 years ago. Conflict and the resulting human movement have been reported to perturb the relationship between the virus and its peridomestic natural host, M. natalensis. Movement of M. natalensis over large distances, for example by ship or with refugees during conflicts, can establish foci of transmission within the local M. natalensis population (Lalis et al., 2012). The fifth lineage may therefore have emerged as a result of human movement during the colonial period.

FIGURE 2 | … Figures S7–S12. Node ages, in years, are shown on the major nodes, with 95% HPD ranges in parentheses below the median node ages. Isolates are grouped by lineage, as indicated by the bars to the right of the trees. The reversed axis shows age, in years, before the most recent isolate.
FIGURE 3 | Map of LASV movement across West Africa. Based on the phylogenetic data, LASV has gradually spread westward, beginning in eastern Nigeria. Ehichioya et al. have previously illustrated the movement of LASV within Nigeria (Ehichioya et al., 2011). The lineage V isolates and the Liberian isolate BA366 are more closely related to lineage III than are the Sierra Leone isolates, suggesting that the virus was likely present in Mali, the Ivory Coast, and Liberia before establishing itself in Sierra Leone. The areas from which the isolates were collected are shaded in gray.
Compared with the Sierra Leone strains, the AV strain is more closely related to the Liberian BA366 strain and the Nigerian lineage III strains. Based on this relationship between the Nigerian isolates and the isolates of lineages IV and V, it appears likely that the virus spread gradually westward, establishing focal points of transmission in the Ivory Coast and Mali before arriving in Liberia, Guinea, and Sierra Leone. Although the estimates vary among the three genes, this migration likely occurred during the precolonial and colonial periods, possibly reaching Mali and the Ivory Coast 300–450 years ago and Guinea or Liberia around 250 years ago. This is consistent with previously calculated substitution rate estimates, which suggest that LASV spread across West Africa 300–800 years ago (Ehichioya et al., 2011). Andersen et al. recently performed a similar phylogenetic analysis using a large number of complete L segment sequences and reached almost identical conclusions, indicating a gradual movement of LASV across West Africa (Andersen et al., 2015). In both analyses, the virus is predicted to have arrived in Mali and the Ivory Coast a full century before its arrival in Sierra Leone (Figure 3).
In conclusion, this study reports that a fifth LASV lineage exists in Mali and the Ivory Coast and forms a sister group to the isolates of lineage IV. Although the virus appears to have been present in Mali for the last 200 years, it is notable that LF cases have only begun to surface within the last decade. This may reflect a lack of surveillance in the region, particularly in villages with limited access to healthcare. Regardless, the presence of a genetically distinct LASV lineage in this region will likely increase the genetic variability of an already diverse virus species. These findings highlight the importance of considering genetic diversity among LASV isolates when developing and testing treatments and vaccine candidates.