Multiple Introductions of Mycobacterium tuberculosis Lineage 2–Beijing Into Africa Over Centuries

The Lineage 2–Beijing (L2–Beijing) sub-lineage of Mycobacterium tuberculosis has received much attention due to its high virulence, fast disease progression, and association with antibiotic resistance. Despite several reports of the recent emergence of L2–Beijing in Africa, no study has investigated the evolutionary history of this sub-lineage on the continent. In this study, we used whole genome sequences of 781 L2 clinical strains from 14 geographical regions globally distributed to investigate the origins and onward spread of this lineage in Africa. Our results reveal multiple introductions of L2–Beijing into Africa linked to independent bacterial populations from East- and Southeast Asia. Bayesian analyses further indicate that these introductions occurred during the past 300 years, with most of these events pre-dating the antibiotic era. Hence, the success of L2–Beijing in Africa is most likely due to its hypervirulence and high transmissibility rather than drug resistance.


INTRODUCTION
Tuberculosis (TB) is mainly caused by a group of closely related bacteria referred to as the Mycobacterium tuberculosis Complex (MTBC). The MTBC comprises seven phylogenetic lineages adapted to humans and several lineages adapted to different wild and domestic animal species (Gagneux, 2018). The human-adapted lineages of the MTBC show a distinct geographic distribution, with some "generalist" lineages such as Lineage (L)2 and L4 occurring all around the world and others being geographically restricted "specialist" that include L5, L6, and L7 Stucki et al., 2016). Africa is the only continent which is home to all seven human-adapted lineages, including the three "specialist" lineages exclusively found on the continent. Current evidence suggests that the MTBC overall originated in Africa (Gagneux, 2018) and subsequently spread around the globe following human migratory events (Wirth et al., 2008;Comas et al., 2013). The broad distribution of some of the "generalist" lineages and their presence in Africa has been attributed to past exploration, trade, and conquest. For instance, an important part of the TB epidemics in sub-Saharan Africa is driven by the generalist Latin-American-Mediterranean (LAM) sublineage of L4, which is postulated to have been introduced to the continent post-European contact (Stucki et al., 2016;Brynildsrud et al., 2018).
Among the different human-adapted MTBC lineages, the L2-Beijing sublineage has been of particular interest (Merker et al., 2015). L2-Beijing has expanded and emerged worldwide from East Asia; its most likely geographical origin (Luo et al., 2015;Merker et al., 2015). In some parts of the world, the recent emergence of L2-Beijing has been linked to increased transmission (Yang et al., 2012;Holt et al., 2018), high prevalence of multidrug-resistant TB (MDR-TB) (Borrell and Gagneux, 2009), and to social and political instability, resulting into displacement of people and poor health systems (Eldholm et al., 2016). Increasingly, L2-Beijing is also being reported in Africa (Bifani et al., 2002;Affolabi et al., 2009;Gehre et al., 2016;Mbugi et al., 2016), and evidence suggests that L2-Beijing in African regions is becoming more prevalent over time (Cowley et al., 2008;van der Spuy et al., 2009;Glynn et al., 2010). Some authors have hypothesized that the introduction of L2-Beijing into South Africa resulted from the importation of slaves from Southeast Asia during the 17th and 18th centuries and/or the Chinese labor forces arriving in the 1900s (van Helden et al., 2002). Alternatively, in West Africa, the presence of L2-Beijing was proposed to reflect more recent immigration from Asia (Affolabi et al., 2009;Gehre et al., 2016). To a certain extent, the recent expansion of L2-Beijing in parts of Africa has been associated with drug resistance (Githui et al., 2004;Klopper et al., 2013) and higher transmissibility (Guerra-Assunção et al., 2015). In addition, a study in the Gambia showed a faster progression from latent infection to active TB disease in patient house-hold contacts exposed to L2-Beijing (de Jong et al., 2008).
Whilst L2-Beijing seems to be expanding in several regions of Africa, no study has formally investigated the evolutionary history of L2-Beijing on the continent. In this study, we used whole genome sequencing data from a global collection of L2 clinical strains to determine the most likely geographical origin of L2-Beijing in Africa and its spread across the continent.

Whole Genome Sequence Analysis and Phylogenetic Inference
We used a customized pipeline previously described to map short sequencing reads with BWA 0.6.2 to a reconstructed hypothetical MTBC ancestor used as reference (Comas et al., 2013). SAMtools 0.1.19 was used to call single nucleotide polymorphisms (SNPs), and these SNPs were annotated using ANNOVAR and customized scripts based on the M. tuberculosis H37Rv reference annotation (AL123456.2). For downstream analyses, we excluded SNPs in repetitive regions, those annotated in problematic regions such as "PE/PPE/PGRS" and SNPs in drug-resistance associated genes. Small insertions and deletions were also excluded from the analyses. Only SNPs with minimum coverage of 20x and minimum mapping quality of 30 were kept. All SNPs classified by Samtools as having frequencies of the major non-reference allele lower than 100% (AF1<1) within each genome were considered to be heterogeneous and were treated as ambiguities and excluded, and were otherwise considered fixed (AF1 = 1). Mixed infections or contaminations were discarded by excluding genomes with more than 1,000 heterogeneous positions and genomes for which the number of heterogeneous SNPs was higher than the number of fixed SNPs. In addition we excluded genomes for which the number of fixed SNPs would fall below Q1-1.5 IQR of all fixed SNPs considering all L2 genomes (Q1 being the first quantile and IQR the inner quantile range as calculated in R3.5.0). All genomes were typed for lineage and sub-lineage using SNP markers as defined in Steiner et al. (2014) plus Coll et al. (2014) and those showing simultaneously more than one marker were excluded. We concatenated fixed SNPs from the variable positions obtained, which yielded a 32,269 bp alignment. The alignment was then used to infer a maximum likelihood phylogeny using RAxML 8.3.2 with a general time reversible (GTR) model in RAxML and 1,000 rapid bootstrap inferences, followed by a thorough maximum-likelihood search (Stamatakis, 2006). The topology was rooted using the reference strain, H37Rv which belongs to Lineage 4.

Reconstruction of the Ancestral Geographic Range
To investigate the likely geographic origin of L2-Beijing strains in Africa, we inferred the historical biogeography of L2 using the RASP software (Yu et al., 2015) on a representative subset of 422 genomes due to software's sample limitation. We achieved this by randomly removing clustered genomes (i.e., those with 12 or less SNP distances) from the same country of origin, using hierarchical clustering implemented in pvclust package in R (Suzuki and Shimodaira, 2006) on a distance matrix of the 781 genome sequences. We then applied a Bayesian binary based method (BBM) in RASP to reconstruct geographical states at the ancestral nodes on the best-scoring ML tree inferred with RAxML using the 422 L2 genomes. We used geographical regions (according to United Nations geoscheme) as proxy for origins of the L2 strains. In total 14 regions were loaded as geographic distributions (indicated in Table S1). The ancestral reconstruction was performed with the Proto-Beijing clade as outgroup. We finally ran Bayesian analysis with 10 chains and 50,000 generations.

Stochastic Character Mapping
To determine the number of introduction events of L2-Beijing into African regions, we applied stochastic character mapping as implemented in SIMMAP (Bollback, 2006) on the 781 L2 phylogeny inferred from the best-scoring ML tree rooted on the Proto-Beijing clade, using the make.simmap function in phytools package 0.6.60 in R 3.5.0 (Revell, 2012;R Core Team, 2018). Geographical origin of the L2 strains was treated as a discrete trait and modeled onto the phylogeny using ARD model with 100 replicates. This model allows unequal rates of state transition permiting independent region-to-region transfers. We summarized the results of the 100 replicates using the function summary in phytools package 0.6.60 in R (Revell, 2012). We referred to the resulting introductions as migration events "M, " and discuss only those introductions with 5 or more genomes.

Population Genetic Analyses Nucleotide diversity (pi)
We calculated the mean pair-wise nucleotide diversity per site (Pi) measured by geographic region. We excluded geographic regions represented by <20 genomes. Confidence intervals were obtained by bootstrapping through resampling using the sample function in R with replacement and the respective lower and upper confidence levels by calculating 2.5th and 97.5th quartiles. Resampling was additionally done using the smallest size of the geographical regions to account for the effect of different sample sizes.

Pairwise SNP distances
We used dist.dna function of ape package implemented in R (Paradis et al., 2004) to calculate pairwise SNP distances with raw mutation counts and pairwise deletions for gaps. Mean pairwise SNP distance to all strains of the same geographic population was calculated per strain and the distribution of the mean SNP pairwise distance for all strains plotted. The mean pairwise SNP distances were assumed not to be normally distributed and we therefore used Wilcoxon rank-sum test to test the differences among geographic regions. Additionally, we calculated pairwise SNP distances within African L2-Beijing populations for migration events with more than 10 genomes each.

Drug Resistance
To distinguish between drug-susceptible and drug-resistant strains, we used genotypic drug resistance molecular markers previously described (Steiner et al., 2014). We categorized strains into: susceptible as having no drug resistance specific mutations; monoresistant as having mutations conferring resistance to a single drug; MDR as having mutations conferring resistance to isoniazid and rifampicin; and extensively drug-resistant (XDR) as having mutations conferring resistance to fluoroquinolones and aminoglycosides in addition to being MDR (Table S2).

Bayesian Molecular Dating Data preparation and preliminary analysis
To estimate the historical period in which L2-Beijing was introduced to Africa, we performed a set of Bayesian phylogenetic analyses using tip-calibration (Rieux and Khatchikian, 2017). Among the 781 studied L2 strains, we had information on the year of sampling for 308. We performed all further analysis on this subset of 308 strains. We excluded all genomic positions that were invariable in this subset and all positions that were undetermined (missing data or deletions) in more than 25% of the strains, and obtained an alignment of 10,769 polymorphic positions.
In tip dating analysis it is important to test whether the dataset contains strong enough temporal signal (Rieux and Balloux, 2016). To do this, we performed a tip randomization test (Ramsden et al., 2008) as follows. We used BEAST2 v. 2.4.8 (Bouckaert et al., 2014) to run a phylogenetic analysis with a HKY + GAMMA model (Hasegawa et al., 1985), a constant population size prior on the tree and a strict molecular clock. Additionally, we used the years in which the strains were sampled to time-calibrate the tree, and we modified the extensible markup language (xml) file to specify the number of invariant sites as indicated by the developers of BEAST2 here: https://groups. google.com/forum/#!topic/beast-(strict_preliminary.xml). We ran three independent runs (245 million generations in total), and we used Tracer 1.7 (Rambaut et al., 2018) to identify the burn-in (8 million generations), to assess that the different runs converged, and to estimate the effective sample size (ESS) for all parameters, the posterior and the likelihood (ESS > 110 for all parameters). We then used TipDatingBeast (Rieux and Khatchikian, 2017) to generate 20 replicates of the xml file in which the sampling dates were randomly reassigned to different strains. For each replication, we ran the same BEAST2 analysis as for the original (observed) dataset (one run per replicate, 50 million generations, 10% burn-in). We used TipDatingBeast to parse the log files output of BEAST2 and compare the clock rate estimates for the observed data and the randomized replications. The estimates of the molecular clock rate did not overlap between the observed and the randomized dataset, indicating that there is a clear temporal signal and that we could proceed with further analysis ( Figure S2).

Model selection
To identify the clock model that best fits the data, we estimated the marginal likelihood of three different clock models: UCED and UCLD (Drummond et al., 2006), assuming a coalescent constant population size tree prior and the HKY model of nucleotide substitution. We used the Model selection package of BEAST2 to run a path sampling analysis (Lartillot and Philippe, 2006) following the recommendations of the BEAST2 developers (http://www.beast2.org/path-sampling/). We used the following settings: 100 steps, 4 million generations per step, alpha = 0.3, pre-burn-in = 1 million generations, burn-in for each step = 40% ( * PS.xml). For these analyses, we used proper priors as suggested by (Baele et al., 2012).

UCLD analysis
Since the model selection analysis indicated that the UCLD clock was the best fitting model, we repeated the analysis using the UCLD and the same settings used in the path sampling analysis, sampling every 10,000 generations. We ran three independent runs (800 million generations in total), we used Tracer 1.7 (Rambaut et al., 2018) to identify the burn-in (10 million generations), to assess that the different runs converged and to estimate the effective sample size (ESS) for all parameters, the posterior and the likelihood (ESS > 260 for all parameters) (UCLD_final.xml and Table S3).
We checked the sensitivity to the priors by running one analysis of 250 million generation sampling from the prior, and compared the parameter estimates with the analysis using the data. We observed the posterior distribution and the prior distribution of all parameters are very distinct ( Table S4), indicating that the parameter estimates are influenced by the data and not by the priors (Bromham et al., 2018). Additionally we repeated the dating analysis with a coalescent exponential population growth tree prior (UCLD + HKY + exponential growth) and with a GTR model of nucleotide substitution (UCLD + GTR + constant size). All these analyses resulted in similar estimates of the age of the introductions of L2 in Africa, thus showing that our results are not strongly influenced by the tree prior and the nucleotide substitution model (Table S6).
We repeated the tip randomization test with the UCLD model as described above (20 replicates, one run per replicate, 105 million generations per replicate or more, burn-in 10%), and again we found a temporal signal ( Figure S3).
To summarize the results, we sampled the trees from the three runs (5% burn-in corresponding to 10 million generations or more, sampling every 25,000 generation). We then summarized the 31,758 sampled trees, created a maximum clade credibility tree using the software TreeAnnotator from the BEAST2 package and used FigTree version 1.4.2 (http://tree.bio.ed.ac.uk/software/ figtree) for visualization ( Figure S4).

Phylogenetic Inference of L2 Strains
We analyzed a total of 781 L2 genomes originating from 14 geographical regions including Eastern and Southern Africa ( Figure S1 and Table S1). We focused on seven geographical regions that had more than 20 genomes each, and assigned the remainder to "Other, " including two genomes from Western Africa ( Figure 1A). The resulting phylogeny of L2 was divided into two main sublineages: the L2-proto-Beijing and L2-Beijing, supporting previous results (Luo et al., 2015;Shitikov et al., 2017). The L2-proto-Beijing was the most basal L2 sublineage and was restricted to East-and Southeast Asia. L2-Beijing, particularly the "modern" (also known as "typical") sublineage, was geographically widely distributed and included strains from Africa. We further characterized L2-Beijing using the recently described unified classification scheme for L2 (Shitikov et al., 2017).

The Population Structure of L2-Beijing in Eastern and Southern Africa
Our findings showed the population of African L2-Beijing to be heterogeneous (Figures 1B, 2 and Table S5). Most of the African L2-Beijing strains were classified into several groups within the "modern" sublineage, which included primarily the "Asian-African" sublineages (L2.2.4, L2.2.5 and L2.2.7), consistent with previous findings (Merker et al., 2015). We also identified the "ancient" (atypical) strains among the African L2-Beijing. Given that "ancient" L2-Beijing strains (L2.2.1-L2.2.3) are generally uncommon (Luo et al., 2015), it is interesting to observe such strains in both African regions. In several instances, African L2-Beijing strains did not fall into any of the previously defined groups (Figure 2). Of the two African regions studied here, East Africa had higher proportion of previously uncharacterized L2-Beijing strains (43/92, 46.7%).
In summary, our findings show that African regions harbored distinct L2-Beijing populations. This is unlike Eastern Europe and Central Asia, where L2-Beijing is dominated by a few highly similar strains (Casali et al., 2014;Eldholm et al., 2016). Of note, L2-Beijing strains typical Eastern Europe and Central Asia were completely absent from the African populations (Figure 2).

Genetic Diversity of L2-Beijing Strains Across Geographic Regions
The spatial distribution of L2-Beijing sublineages and the prevalence of "ancient" L2-Beijing strains observed in this study and previously (Luo et al., 2015;Merker et al., 2015), suggest that L2-Beijing has expanded worldwide from Asia. This view can further be supported by the measures of genetic diversity of L2-Beijing in the different geographical regions (Figure 3). As expected, East-and Southeast Asia contained the most genetically diverse L2 populations, which is consistent with previous results (Luo et al., 2015). Conversely, L2 populations in other geographies were less genetically diverse, suggesting recent dissemination of L2 to these regions. Within Africa, Southern Africa showed a higher diversity in L2-Beijing populations compared to Eastern Africa.
The genetic diversity within the African L2-Beijing populations not only reflects the number and variety of source populations but also local patterns of diversification that occurred after their introduction. Therefore, the higher genetic diversity of the L2-Beijing populations in Southern Africa compared to Eastern Africa likely reflects both aspects.

Multiple Introductions of L2-Beijing From Asia Into Africa
Based on our reconstructed phylogeny, African L2-Beijing strains clustered into several unrelated clades indicating multiple introductions into Africa (Figure 1B). We next investigated the most likely geographical origins of those introductions. As anticipated, our ancestral reconstruction using RASP estimated East Asia as the most likely origin of all L2 (posterior probability of 96.1%) and L2-Beijing (posterior probability 92.5%) ( Figure S5). Our data further indicate that L2-Beijing was introduced into Africa from East-and Southeast Asia on multiple occasions independently. Furthermore, we observed both direct introductions from Asia into Africa as well as subsequent dispersal within the continent (Figures S6, S7).
Using stochastic mapping, we estimated a total of 13 introductions or migration events (M1-M13) into Africa (Figure 4). Eight of the African L2-Beijing introductions originated from East Asia and five from Southeast Asia. Out of the 13 introductions, three (M3, M10, and M13) were present in both African regions analyzed here, suggesting initial introductions from Asia followed by subsequent spread within Africa. Overall, our analysis inferred more independent introductions into Southern Africa (n = 7, M1, M4, M7-9, M11 and M12, all of them with extant strains sampled in South Africa) than Eastern Africa (n = 3, M2 sampled in Malawi and Tanzania; M5 and M6 sampled in Kenya and in Malawi, respectively). Taken together, our data suggest that multiple migration events have shaped the populations of L2-Beijing in Africa.

Bayesian Molecular Dating
Different hypotheses have been formulated on the possible timing of the introduction of L2-Beijing into Africa (van Helden et al., 2002). Here we used tip-calibration to date the phylogenetic tree of L2 and estimate the age of its introduction to Africa. For these analyses, we identified 308 strains among the 781 for which the sampling year was known. These strains were sampled during a period of 19 years; 1995-2014 (Figure S8), were evenly distributed on the complete phylogenetic tree ( Figure S9) and included 40% members of the African L2-Beijing strains ( Figure S10). Eleven of the 13 African introductions were represented in this dataset (M1-M3 and M6-M13).
We performed a Date Randomization Test with a strict clock and with a relaxed clock. With both models we detected no overlap in the 95% credibility interval of the clock rate estimates of observed and randomized datasets indicating that there was sufficient temporal signal in the dataset to perform inference (see methods, Figures S2, S3). Further, We found that the UCLD clock had the highest marginal likelihood and a Bayes Factor of 27 with the second best fitting model, the strict clock (Table 1), indicating strong evidence in favor of the UCLD clock (Kass and Raftery, 1995).
We performed a phylogenetic analysis with BEAST2 using the UCLD clock. Under the UCLD model, the coefficient of variation (COV), which is a summary of the branch rates distribution (standard deviation divided by the mean), gives an indication on the clock-likeness of the data (Drummond et al., 2006). A coefficient of variation of zero indicates that the data fit a strict clock, whilst a greater COV indicates a higher heterogeneity of rates through the phylogeny. We obtained a mean COV of 0.22 (95% credibility interval= 0.1732, 0.2732), indicating a moderate level of rate variation across different branches and thus supporting the results of the path sampling analysis that favored the UCLD model.

Recent Origins of the African L2-Beijing Clades
We used the UCLD clock model to infer the clock rate and divergent times of the 308 L2 strains with known sampling dates and estimated a mean substitution rate of 1.34 × 10 −7 [95% Highest Posterior Density (HPD), 9.2867 × 10 − 8-1.7719 × 10 −7 ]. These estimates are in agreement with previously reported rates from epidemiological studies (Walker et al., 2013;Eldholm et al., 2016). However, the estimated rate by Eldholm et al. (2016) is relatively higher compared to our estimates, which likely reflects the fact that our dataset included the entire Lineage 2 as opposed to a only a single particular clade in the case of Eldholm et al. (2016). We estimated the most recent common ancestor (MRCA) of the extant L2-Beijing of the 308 strains to the year 1225 [95% HPD, 900-1519] (Figure S4). This estimate was slightly younger than the previous estimate for the whole of Lineage 2 by the study of (Bos et al., 2014), likely reflecting the difference in methodology. The latter study used ancient M. pinipeddii DNA recovered from pre-Columbian human remains to calibrate the phylogeny as opposed to tip-dating based on contemporary sampling dates. For each African clade, we estimated the year of introduction using the 0.975 quantile of the HPD of the age of the MRCA as the upper limit (most recent possible year) and the 0.025 quantile of the HPD of the divergence time between the closest non-African L2-Beijing strain (the closest outgroup) and the African clade of interest as lowest limit (most ancient possible year). This approach produced conservative estimates, while relying only on the age of the MRCA of the African clades would systematically underestimate the age of the introductions. Our estimates placed the earliest introductions of the African L2-Beijing (M1, M3, M7, and M12) in the eighteenth and nineteenth century (Figure 5 and Table S6). Four additional migration events (M6, M9, M10, and M11) were estimated to have occurred between the beginning of the nineteenth century and the first half of the twentieth century. Finally, the three most recent introductions to Africa happened in the second half of the twentieth century (M2, M8, and M13). Diversity patterns of the African clades exclusive to Easternand Southern Africa could further provide support for the recent introductions of African L2-Beijing. We thus calculated the pairwise SNP distances within the individual introductions to explore the local patterns of diversification associated with regional epidemics after the introductions. Although strains within Southern African introductions were relatively more distantly related, L2-Beijing strains from both African regions were on average 20 to 40 SNPs apart (Figures S11-S13). The latter thresholds were proposed to correspond to strains involved in transmission clusters of estimated 50 to100 years (Meehan et al., 2018), supporting the relatively recent introductions of L2-Beijing into the African continent.
Overall, these results indicate that the different introductions of L2-Beijing to Africa occurred over a period of 300 years. While the earliest introduction is unlikely to have happened after 1732-1874, the most recent is unlikely to have occurred before 1946-1980. However, our 95% HPDs show a wide range of uncertainty, which likely resulted from extrapolating several centuries back in time based on a comparably short calibration interval spanning only 19 years.

Introductions of L2-Beijing Into Africa Unrelated to Drug Resistance
Because of the repeated association of L2-Beijing with antibiotic resistance (Borrell and Gagneux, 2009), the emergence and dissemination of L2-Beijing strains has often been attributed to drug resistance. However, our estimated timing of these introductions suggest that African L2-Beijing strains were introduced prior the discovery of TB antibiotics, and thus must have involved drug-susceptible strains (Figure 5). To explore this question further, we assessed the drug resistance profiles of L2-Beijing strains linked to the various introduction events into the two African regions. We found that all the Eastern African populations contained only drug-susceptible strains and that approximately three-quarters of L2-Beijing strains in the Southern African populations were drug-susceptible, with the remaining being either mono-, multi-, or extensively drugresistant (Figure 6 and Figure S14). Taken together, these results suggest that the emergence of L2-Beijing in Africa, particularly in Eastern Africa, was not driven by drug resistance. Moreover, our data indicate independent acquisition of drug resistance for the resistant strains detected in the Southern African L2-Beijing population (Figure 6), which might partly contribute to the subsequent spread of L2-Beijing in Southern Africa but not in Eastern Africa.

DISCUSSION
This study investigated the most likely geographical origin of the L2-Beijing in Africa. In line with previous findings (Luo et al., 2015;Merker et al., 2015), we identified East Asia as the most likely place of origin of L2 and L2-Beijing. Our findings further revealed multiple independent introductions of L2-Beijing into Africa linked to separate populations originating from both Eastand Southeast Asia. Some of these introductions were followed by further onward spread of L2-Beijing within African regions. Finally, we demonstrate that most introductions of L2-Beijing on the continent occurred before the age of antibiotics. L2-Beijing has received much attention given its hypervirulence in infection models (Manca et al., 2001;Ribeiro et al., 2014), faster progression to disease and higher transmission potential in humans (de Jong et al., 2008;Holt et al., 2018), frequent association with drug resistance, and recent emergence in different regions of the world (Bifani et al., 2002;Borrell and Gagneux, 2009;Fenner et al., 2013). Several studies indicate L2-Beijing originated in Asia and spread from there to the rest of the world (Luo et al., 2015;Merker et al., 2015). Our results support this notion by identifying "Asia" as the most likely geographical origin of both L2 and L2-Beijing based on our ancestral reconstructions and the fact the L2-Beijing populations in Asia are much more diverse than in other regions. In addition, our findings show that L2-Beijing was introduced into Africa multiple times from both East-and Southeast Asia. The presence of L2-Beijing in South Africa has previously been proposed to be due to the importation of slaves from Southeastern Asia by Europeans in the seventeenth and eighteenth centuries followed by the import of Chinese labor-forces in the early 1900s (van Helden et al., 2002;Mokrousov et al., 2005). Our Bayesian dating estimates predicted the earliest introductions of L2-Beijing into Africa to have occurred in the eighteenth and nineteenth centuries, concurring with these proposed time periods. However, our findings also point to later introductions of L2-Beijing into the continent in the nineteenth and early twentieth centuries. The timings of the latest three introductions in the second half of the twentieth century coincide with the decolonization and post-colonial period in Africa when investments into infrastructure and other projects by Chinese enterprises substantially increased (Yuan, 2006;Rice, 2011). These activities also brought many Chinese workers to Africa during a time when TB was still very prevalent in China (Murray, 2018). Hence, many of these workers were likely latently infected with L2-Beijing and might have later reactivated (Pescarini et al., 2017). Overall, our findings suggest that L2-Beijing has emerged in Africa over the last 300 years and that these introductions have occurred sporadically ever since.
The repeated association of L2-Beijing with drug resistance (Borrell and Gagneux, 2009) has led some to propose that drug resistance is another reason why this sublineage might successfully compete against and eventually replace other M. tuberculosis genotypes (Parwati et al., 2010). However, the underlying reason for the association of L2-Beijing with drug resistance remains unclear (Borrell and Trauner, 2017), and it is also far from universal, with several reports from e.g., China and other regions finding no such association (Hanekom et al., 2007;Yang et al., 2012). Our results show that most introduction events of L2-Beijing into Africa pre-date the antibiotic era, and because of that, these introductions were most likely caused by drug-susceptible strains. The notion that the initial emergence of L2-Beijing in Africa was not driven by drug resistance is further supported by our findings that none of L2-Beijing strains from Eastern Africa strains analyzed here were drug-resistant. Of note, our observations suggest that drug resistance in South Africa was acquired via independent events post initial introductions from Asia. This is in sharp contrast to the situation in Eastern Europe and Central Asia, where L2-Beijing is highly prevalent but dominated by few recently expanded drug-resistant clones, which account for up to 60% of the L2-Beijing populations in some of these countries (Casali et al., 2014;Eldholm et al., 2016). The association of L2-Beijing with drug resistance in these regions were likely favored by the economic and public health crises that followed the collapse of Soviet Union (Luo et al., 2015;Merker et al., 2015).
Based on our finding that the original introductions of L2-Beijing into Africa involved drug-susceptible strains and that the prevalence of drug-resistant L2-Beijing in Africa overall is comparably low (WHO, 2017), we propose that some of the other characteristics of this sub-lineage, in particular its high virulence, high transmissibility and rapid progression from infection to disease, were responsible for the initial competitive success of L2-Beijing in Africa. Until recently, Africa was considered a Virgin Soil for TB and that TB was only introduced following European contact . However, this notion is incompatible with the current evidence supporting an African origin for the MTBC overall (Hershberg et al., 2008;Wirth et al., 2008;Comas et al., 2013). Indeed, Africa is the only continent to harbor all seven known human-adapted MTBC lineages, three of which are almost exclusively observed there (Gagneux, 2018), as well as several animal-adapted MTBC ecotypes, which affect several wild African mammals (Brites et al., 2018). Moreover, one study showed that TB epidemics on the continent were caused by many different "native" genotypes prior to foreign contacts , suggesting that TB existed in Africa before the initial European contact. Perhaps the clinical and epidemiological picture of TB in Africa was different at the time, characterized by a low virulent and slow progressing disease, which would have escaped the attention of European colonial officials reporting the absence of TB in Africa, at a time when the TB epidemic was raging in the cities of Europe and North America, killing up to 20% of the adult population (Comas and Gagneux, 2011). We hypothesized previously that rapid co-expansion of MTBC L2, L3, and L4 with their respective human host populations in China, India and Europe might have selected for higher virulence and shorter latency in these "modern" lineages (Hershberg et al., 2008). The emergence and expansion of these "foreign" genotypes including L2-Beijing into Africa as reported here, and as reported earlier for L4 (Stucki et al., 2016;Brynildsrud et al., 2018), demonstrate the ability of these lineages to successfully compete against the existing genotypes on the continent, likely as a result of their high transmissibility and rapid progression to disease. Following their initial establishment, poor TB treatment programs subsequently selected for drug resistance in L2-Beijing but also in other MTBC lineages including L4, which might have facilitated their further spread in countries such as South Africa .
Finally, the estimates of the TMRCA for L2 reported here are largely consistent with recent reports using similar tipdating analyses based on isolation dates (Eldholm et al., 2016), and support the notion that the MTBC overall is younger than what has been proposed in earlier studies, based on a hypothesized co-divergence of the human-adapted MTBC and modern humans since their migration out of Africa (Comas et al., 2013;Luo et al., 2015). This study is limited by the fact that we analyzed a globally diverse collection of L2 genomes available in public repositories. Hence, these strains might not be fully representative of the respective geographical regions. Moreover, our African L2-Beijing dataset came from convenient sampling and comprised L2-Beijing mainly from Eastern and Southern Africa, as whole genome data of L2-Beijing from the other African regions were unavailable at the time of the study. However, the representation of African L2-Beijing in our sample reflects the overall prevalence of this sub-lineage as recently reported for the continent (Mbugi et al., 2016;Chihota et al., 2018).
Moreover, although regions outside of Eastern-and Southern Africa were underrepresented, this is unlikely to invalidate our findings regarding the multiple independents of L2-Beijing into Africa, except by underestimating the number of true introductions.
In conclusion, this is the first study to address the geographical origins of L2-Beijing in Africa using whole genome sequencing data. Our findings indicate multiple independent introductions of L2-Beijing epidemics into Africa from East-and Southeast Asia during the last 300 years that were unrelated to drug resistance. The TB epidemics in Africa have remained fairly stable over the last few decades (WHO, 2017). However, Africa's population growth and increasing urbanization (driven by booming economies) are likely to have an impact on the future of TB in this continent, whether directly by e.g., facilitating transmission or indirectly by promoting new risk factors such as diabetes that increase TB susceptibility (Dye and Williams, 2010). It is therefore crucial to follow the TB epidemics in the continent very closely, especially those related to hypervirulent strains such as L2-Beijing, as these might take particular advantage of this expanding ecological niche (Cowley et al., 2008).

DATA AVAILABILITY
The xml files used for this study can be found here https://github. com/SwissTPH/TBRU_L2Africa.git.

AUTHOR CONTRIBUTIONS
LR, DB, FM, DS, LF, and SG planned the study. SL, BM, and JF performed the experiments. LR, DB, FM, SMG, SL, BM, CB, SB, KM, MB, LJ, KR, EJC, LD and LF contributed strains and prepared the data. LR, DB, FM, and SG analyzed the data. LR, DB, FM, and SG drafted the manuscript. All authors critically reviewed the manuscript.

ACKNOWLEDGMENTS
We would like to thank Sebastián Duchêne and Yan Yu for their technical support and Linda-Gail Bekker for contributing strains. All bioinformatics analyses were performed at the scientific computing core facility of the University of Basel, sciCORE (http://scicore.unibas.ch/). This work was supported by the Swiss National Science Foundation (grants 310030_166687 to SG), the European Research Council (309540-EVODRTB to SG) and SystemsX.ch This research was also partially supported (strain collection) by a funding supplement from the National Institutes of Allergy and Infectious Diseases (NIAID) under award numbers U01 AI069924 (IeDEA Southern Africa) and U01 AI069911 (IeDEA East Africa).