Population genomics and demographic sampling of the ant-plant Vachellia drepanolobium and its symbiotic ants from sites across its range in East Africa

The association between the African ant plant, Vachellia drepanolobium, and the ants that inhabit it has provided insight into the boundaries between mutualism and parasitism, the response of symbioses to environmental perturbations, and the ecology of species coexistence. We use a landscape genomics approach at sites sampled throughout the range of this system in Kenya to investigate the demographics and genetic structure of the different partners in the association. We find that different species of ant associates of V. drepanolobium show striking differences in their spatial distribution throughout Kenya, and these differences are only partly correlated with abiotic factors. A comparison of the population structure of the host plant and its three obligately arboreal ant symbionts, Crematogaster mimosae, Crematogaster nigriceps, and Tetraponera penzigi, shows that the ants exhibit somewhat similar patterns of structure throughout each of their respective ranges, but that this does not correlate in any clear way with the respective genetic structure of the populations of their host plants. A lack of evidence for local coadaptation in this system suggests that all partners have evolved to cope with a wide variety of biotic and abiotic conditions.


Introduction 31
The Whistling Thorn acacia, Vachellia drepanolobium (formally referred to as Acacia 32 drepanolobium), and its several species of resident ants comprise a well-studied symbiosis 33 that dominates savannahs throughout its range in Eastern Africa. The ants nest only in 34 hollow, swollen-thorn domatia of this Vachellia species, and receive extrafloral nectar from 35 the host tree. In return, the ants typically attack other organisms that venture onto the 36 trees, thereby deterring herbivores and potential enemies (Hocking 1970, Madden and 37 Young 1992). Because ants of different species compete to occupy V. drepanolobium 38 domatia, the system has been used to investigate possible mechanisms of species 39 coexistence (e.g., , Palmer 2004 focused on elucidating how the system as a whole responds to environmental perturbation 46 (Palmer et al. 2008, Riginos et al. 2015. 47 The full range of V. drepanolobium roughly spans from Tanzania in the south, 48 Uganda in the west and southern Ethiopia in the North, although it is most abundant across 49 the savannahs of Tanzania and Kenya, where it is restricted to black cotton soils. This 50 study examines V. drepanolobium and its ant associates at locations distributed throughout 51 Kenya. At each site, we describe both sides of the association, recording tree sizes and 52 densities, as well as which ants are present at each site and in what numbers. The sites 53 encompass a geographically diverse area: two are in Laikipia in north central Kenya,three 54 are in the highlands area south of Nairobi, three are in the Rift Valley, and two sites lie on 55 the edges (one western and one eastern) of the Rift. Together with this demographic 56 survey, we report the results of a population genomic study of V. drepanolobium and its 57 three obligate phytoecious (i.e., tree-dwelling) ant inhabitants, Crematogaster mimosae, C. 58 nigriceps, and Tetraponera penzigi. few cases, primarily addressing the population genetics of just one of the participants (e.g. 78 Quek et al. 2007). We thus do not know whether the congruence in population structure 79 that has so far been observed is a general feature of macrosymbioses. 80 A population genetic comparison of partners in the V. drepanolobium symbiosis is 81 particularly interesting because of the well-documented behavioral variation in the ant 82 partners. At a landscape scale, do we see signs of the same behaviors observed at Mpala, 83 particularly the hierarchy in short-range colonization ability? Do the different partners 84 show evidence of local adaption to regional environments, both abiotic (i.e., temperature, 85 rainfall), and biotic (i.e., the presence or absence of certain species or genotypes)? 86 Answering these questions would enable us to extend the substantial body of 87 research that has been done at sites in Laikipia, Kenya to sites throughout the range of V. million years ago as the African tectonic plate began to split apart, a process that continues 103 to this day (Chorowicz 2005). The difference between the relatively low-lying sites in the 104 Rift and the highlands sites outside the Rift can be as much as a kilometer (Table 1) At most of the sites, we surveyed a transect at least 0.15 ha in area, measuring every 110 tree greater than 1.0 cm diameter at 0.5 meters above the ground. At Koriema and Gilgil, 111 the patch of undisturbed habitat containing V. drepanolobium was smaller than this, so we 112 surveyed every V. drepanolobium tree in these locations, and then calculated the area they 113 occupied. Within these transects, we recorded the height, stem diameter at 0.5 meters 114 above the ground, and ant occupant of each tree. At the Mpala site, we surveyed 6.7 115 hectares of the Mpala CTFS-ForestGeo plot for tree number and ant occupant, and a 0.24 ha 116 subset of that area for tree number, ant occupant, and tree height and diameter. When the 117 canopies of two trees were entangled, we counted this as a single tree for our purposes, 118 since ants could move freely between them without having to leave the canopy; in these 119 cases, and when a single tree had multiple stems 0.5 meters above the ground, we recorded 120 the diameter of the single largest stem, and the height of the tallest point in the combined 121 canopy. 122

123
Collections 124 Between 2012-2014, mainly at the sites described above, we collected tissue from V. 125 drepanolobium and its three primary ant associates, C. mimosae, C. nigriceps, and T. penzigi. 126 These sites included five sites in the highlands east of the Rift Valley, and five in or adjacent 127 to the Rift (we were not able to include a sixth Rift site, Narok, in the population genomic 128 analysis because we surveyed this site after the other sites, in 2016, but we include the 129 demographics for this site; for sample sizes see File S1). We did not include a fourth ant 130 associate, Crematogaster sjostedti, in our population genetic study because we found it on 131 Extraction Kit (AutoGen). We used the Mouse Tail protocol for animal tissue. Throughout 149 our protocols, we followed the manufacturer's recommended protocols except where 150 described. For extractions of V. drepanolobium gDNA, we modified this to lyse the cell walls 151 by performing the lysis step using the CTAB buffer of Cullings (1992), to which was added 152 5 µL/mL β-mercaptoethanol. Genomic DNA was stored at -20° C before use. 153 The amount of genomic DNA was then increased by whole genome amplification, 154 using Multiple Displacement Amplification via the REPLI-g mini kit (Qiagen) in 15 or 20 µL 155 reactions. 156 We used the double-digest restriction-site associated DNA sequencing (RADseq) 157 protocol of Peterson et al. (2012), but modified their protocol in the following ways: we 158 started with an (amplified) genomic DNA mass of 150 ng, which we then digested with 159 restriction enzymes chosen to provide hundreds to thousands of markers in each species. 160 We used EcoRI-HF and BfaI (for the ants) or EcoRI-HF and MspI (for V. drepanolobium) 161 (New England Biolabs). Bead cleanups throughout the protocol were performed with a 162 MagNA bead solution which we made following the description of Rohland and Reich 163 (2012). For bead cleanups, we added 1.5X volume of MagNA beads, but otherwise followed 164 the protocol provided with Agencourt AMPure beads. We used the 48 inline indices for 165 EcoRI described in the Sequences-S1 spreadsheet in the supplement of Peterson et al. 166 (2012 We quality filtered reads using the FASTX-Toolkit version 0.0.13 180 (http://hannonlab.cshl.edu/fastx_toolkit/). For each read, the first seven basepairs, 181 including the EcoRI-HF restriction site and two often-low-quality bases, were removed 182 using the fastx_trimmer tool. The trimmed reads were then quality-filtered using the 183 fastq_quality_filter tool, removing any reads with a quality score of less than 25 at more 184 than 2% of bases. 185 We then aligned all reads for all individuals within each species using the 186 denovo_map.pl script of Stacks, allowing 2 mismatches between loci when processing a 187 single individual, and 2 mismatches when building the catalog. We also explored several 188 other values for these parameters; the values given above gave the best combination of a 189 larger number of alleles without large increases in heterozygosity which could indicate that 190 different loci were being improperly combined (see File S2 for details). To build the final 191 matrix of SNPs, we culled individuals for which sequencing had failed or had produced too 192 low coverage to be useful, removing roughly a third of the individuals of each species. We 193 called SNPs using the populations program of Stacks. We adjusted the stacks "-r" parameter 194 to produce a similar number of markers (around 1000) within each species, but with 195 variable amounts of missing data for each species. The -r parameter defines what 196 proportion of the individuals within a species must have a given SNP for that SNP to be 197 included in the data set. We filtered out sites with an unusually high heterozygosity (i.e., 198 over 0.6), required a minimum minor allele frequency of 0.02, and outputted only a single, 199 randomly-selected SNP per RADseq fragment. Although individuals varied in the degree to 200 which they had missing data, individuals with high and low amounts of missing data were 201 distributed fairly evenly across sites (File S3). 202 We also explored the effects of missing data on our results by producing two 203 additional SNP data sets: In one, the "r" parameter was kept constant at 0.6 across all 204 species, producing a variable number of SNPs for each species, but similar amounts of 205 missing data. We used the same individuals as in the first data set. In the final data set, we 206 used a subset of the individuals from the first two data sets, and included SNPs only if they 207 were found in 100% of individuals within each species. This produced a matrix with no 208 missing data. The results for these additional data sets were congruent with the original 209 data set, and are presented in File S3. 210 211

Population statistics 212
We divided the 10 sites into four populations based on their geography; these populations 213 also broadly corresponded to the genetic clustering found by DAPC (described below). As 214 described in Figure 1, these corresponded to northern and southern populations in and 215 around the Rift, and northern and southern populations in the highlands east of the Rift. 216 For C. mimosae, only a single colony was found in any of the Rift sites; we included this 217 colony in the southern highlands population. We used Stacks to calculate observed and 218 expected heterozygosity as well as θ (estimated as π). We calculated these values 219 individually for each population, using the entire fragment for each locus identified above. 220 We used Arlequin 3.5.2.2 (Excoffier and Lischer 2010) to calculate summary 221 statistics and pairwise FST between each population, after first translating Stacks' genepop-222 format output files into Arlequin using PGDSpider 2.0.9.0 (Lischer and Excoffier 2012) in 223 combination with a custom perl script. All such scripts used in this study are available on 224 Dryad. Since we analyzed several data sets with varying degrees of missing data, we set 225 Arlequin's internal missing-data threshold to 1 (i.e., to include all loci with any data). 226 227

Analysis of molecular variance (AMOVA) 228
We used an AMOVA to compare how much genetic variation was partitioned by the 229 four geography-based populations (FCT) versus how much population-level variation was 230 partitioned by the collection sites within each population (FSC). We used Arlequin to 231 perform locus-by-locus AMOVAs, presenting the average FSC and FCT across all loci. We also 232 report whether each measure is significantly greater than zero, as determined with 10,000 233 For V. drepanolobium, we tested whether trees assigned to a particular genetic 242 cluster were more commonly occupied by colonies of a particular ant species. For each 243 genetic cluster, we performed a multinomial exact test of goodness of fit test to determine 244 whether the distribution of ant occupants within each genetic cluster was different from 245 expected. To determine the expected ant-occupancy frequencies for each of the tree's 246 genetic clusters, we averaged the frequency of each ant species at the sites at which that 247 cluster was found, weighting each site by the number of trees from that cluster that were 248 present there. We considered only those sites where trees from multiple genetic clusters 249 coexisted, and we tested association with four categories of ant occupant: C. mimosae, C. 250 nigriceps, T. penzigi, and trees empty or occupied by a different species. When the expected 251 value was zero for one of these species of ant occupant, we dropped that species from the 252 test. We performed all tests using the xmulti function from the XNomial package in R 253 (Engels 2015). 254

Isolation by distance and environment 256
We then asked whether geographic structure was associated with environmental 257 differences among sites, which might suggest local adaptation to abiotic conditions. To and Precipitation of the Driest Quarter. In the case of the Lake Naivasha site, specimens 269 were collected from five nearby locations along the South Lake Road, which differed in 270 their WorldClim variables. We therefore treated them as separate sites for this analysis, 271 despite their relative geographic proximity. For each species, the climate data from all sites 272 were then scaled to have a mean of zero and standard deviation of one; environmental 273 distance was then estimated as the Euclidian distance between each site for these six 274 variables. Geographic distances between each site were calculated using the distGeo 275 function from the geosphere package (Hijmans 2016). Genetic distances between each site 276 were calculated using Nei's distance, as implemented by the dist.genpop function of 277 adegenet. 278

Site survey 281
Plots varied substantially along several axes (Table 1, Figure 2). Tree density varied 282 about five-fold, from 220 V. drepanolobium trees per hectare at Gilgil to a maximum of 1090 283 at the Isinya Road site; trees generally grew more densely in the eastern highlands than in 284 the sites within the Rift. Average tree height also varied; at most sites the average height 285 was 1-3 m, but at Koriema and Mogotio in the northwestern part of its range, V. 286 drepanolobium trees averaged 4-5 meters, and grew up to 10 meters in height. 287 Likewise, there was substantial variation in ant communities among sites. C. 288 sjostedti was only found on V. drepanolobium trees in the northeastern sites of Suyian and 289 Mpala. C. mimosae was more widespread, but commonly found only in the Eastern 290 Highlands sites. In and around the Rift, C. mimosae was found rarely at Narok, and outside 291 of that site, only a single colony was found, near the Lake Naivasha site. C. nigriceps and T. 292 penzigi, however, were found at every location throughout the studied range. Furthermore, 293 which ant was numerically dominant varied as well: for all four ant species, there was at 294 least one site at which that species occupied more trees than did any other ant species. 295 As we have a single time point for each site, we are unable to measure directly 296 interspecific dynamics such as the competitive hierarchy among the ants. However, we 297 were able to analyze an indirect measurement, namely, the average height of trees 298 We found a roughly similar pattern at our site in Mpala, with average tree heights 305 following the competitive hierarchy (Figure 3), except that C. mimosae occupied larger 306 trees than the more competitive C. sjostedti. In this case, however, the standard error of the 307 mean for C. sjostedti was quite large, so the true population mean may easily have been 308 greater than that of trees occupied C. mimosae. Moreover, competitive dynamics are known 309 to vary even within Mpala , so this may be a result of our survey taking place 310 in a different year and a different location within Mpala than the previous surveys. 311 We did not find, however, that a similar pattern was maintained across all sites. At 312 many sites, the largest trees were occupied by less competitively dominant ants: at the 313 Ngong Hills, for instance, the largest trees were occupied by T. penzigi, while at Kitengela, 314 the relationship between tree height and ant competitiveness was entirely reversed from 315 the pattern at Mpala. These patterns in data may indicate that, just as the competitive 316 hierarchy varies from site to site at Mpala, it also varies widely across the landscape. It may 317 also be the case that the competitive hierarchy at the other sites follows the canonical 318 hierarchy, but that what constitutes a valuable tree varies across sites. For example, C. 319 sjostedti and C. mimosae occupy the most valuable trees, which are the largest trees at 320 Mpala, but at other locations, the most valuable trees to the ants may be the medium-sized 321 ones. Either case strongly suggests that the dynamics of interspecific competition among 322 the ants is context dependent and varies strongly across the landscape. 323

DNA sequence alignment and base-calling 325
We successfully genotyped individuals of each species at between 1027 and 1122 326 loci, depending on the species (Table 2) However, in the Southern Highlands and Southern Rift sites, individuals belonging to both 371 clusters could be found even at the same site. 372 The ant species show a pattern that is quite different from that seen in their host 373 tree: in these species, genetic clustering has a stronger association with geography in each 374 case. Within each ant species, each of the four geographic areas (or two for C. mimosae) is 375 dominated by a single cluster that is rarely, if ever, found outside that region. The exception 376 to this is T. penzigi, where two genetic clusters are common in the Southern Rift sites, and 377 the Ngong Hills site is dominated by one of these clusters, rather than clustering with the 378 other Southern Highlands sites. Finally, the sole C. mimosae colony found in the Rift (at 379 Lake Naivasha) clustered with the individuals from the Southern Highlands sites. 380 We found no evidence that trees assigned to different genetic clusters were 381 associated with particular ant occupants (multinomial exact tests, p > 0.1 for all). For both V. drepanolobium and C. mimosae, we found no evidence of isolation by 391 distance or isolation by environment (p-values > 0.05). In the case of C. nigriceps, we found 392 evidence of both isolation by distance and isolation by environment (p-values < 0.05). For 393 T. penzigi, we found evidence of isolation by distance (p < 0.05 in all three data sets), but no 394 evidence of isolation by environment (p > 0.05 in all three data sets). 395 396 Discussion 397 The V. drepanolobium system shows considerable variability across its range in 398 central and southern Kenya. Much of this variation comes from comparing the sites in the 399 Eastern Highlands to the sites in the Rift Valley, where the tree is less common, patches are 400 smaller and sparser, and tree heights vary greatly. However, there is substantial variation 401 even among the eastern highlands sites, where densities, average heights, and tree biomass 402 vary by a factor of 2 or more. Nor is the variability confined to the plant; the different ant 403 species show equally broad differences. In particular, the four ant species found at Mpala 404 are commonly found together only in Laikipia; we did not find C. sjostedti outside of 405 Laikipia, and we found C. mimosae only rarely outside of the Eastern Highlands (although it 406 is also found at lower elevations in Tanzania: Hocking 1970). Even when comparing sites 407 with the same species of ants, how the trees were divided among the ants varied: among 408 our five highlands sites, numerical dominance varied, with the plurality, or even majority of 409 trees being occupied by any one of C. sjostedti, C. mimosae, or C. nigriceps, depending on the 410 site. We also found that the ant species occupying the tallest trees varied widely from site 411 to site, suggesting that not just the proportions of species, but also, potentially, the 412 outcomes of competition between them may vary from site to site. The variation we found 413 across sites in Kenya agrees with what Hocking (1970) described for four sites in Tanzania  414 and one in Kenya (Figure 2). He also found substantial difference across sites, with two of 415 his sites dominated by C. mimosae, and three sites split about evenly between C. mimosae 416 and C. nigriceps. His study and this study, combined, cover the bulk of the range in which V. 417 drepanolobium is commonly found, and show that inter-site variation is the norm for this 418 system. 419 The population structure was weaker in the ants than in the trees, whereas we found that 451 population structure was stronger in the ants than in the trees. Clearly much work remains 452 to determine whether any general patterns exist, but our results here suggest that the wide 453 diversity of patterns seen in macrobe-microbe mutualisms (e.g., lichens) is also seen in 454 large scale symbioses, like ant-plants. 455 We found little evidence that the genetic structure of V. drepanolobium and/or its 456 associated ants is associated with environmental differences. Only for C. nigriceps did we 457 find significant isolation-by-environment, suggesting that this ant species may be more 458 locally adapted to environmental conditions than are the other species. It will be 459 interesting to explore further to which environmental conditions particular populations of sjostedti and C. mimosae are rare or absent. This discovery is the more striking because the 486 V. drepanolobium system has been so well studied, producing many high-impact studies 487 over the course of several decades (e.g., Stanton et al. 1999 Philanthropic Trust and the National Science Foundation (NSF SES-0750480) to NEP. We 518 dedicate this paper to the memory of our friend, Gilfrid Powys, who generously made his 519 land available to us and supported our work in numerous ways throughout the duration of 520 this research. He will be greatly missed. 521 Data Accessibility: We will upload the DNA sequences used in this project to the NCBI 522 SRA. Scripts used in these analyses will be available on Dryad. 523 524    693 5 At Lake Naivasha, only T. penzigi was found at the site we surveyed. However, we found C. nigriceps and a 694 single colony of C. mimosae nearby along the South Lake Road. The -r parameter in stacks denotes the minimum proportion of individuals in which a SNP must be found in 698 order to be included in the data set. To ensure that variable levels of missing data were not biasing our 699 results, we also produced a data set with similar levels of missing data for each species (presented in File S3).

817
Population statistics 818 The three data sets produced broadly similar results, despite differences in the individuals 819 and SNPs making up each set (Tables S4.2, S4.3). 820 821 Analysis of molecular variance 822 As shown in Table S3.2, AMOVA tests revealed significant genetic variation partitioned 823 both among our geographic populations (FCT), and also among sites within those 824 populations (FSC). This was true for all species and all data sets. 825

832
FCT corresponds to the proportion of total genetic variation partitioned among the four populations. After each is a p-value from an AMOVA indicating 833 whether the partitioning of genetic variation is significantly greater than zero. 834 * No V. drepanolobium individuals from the SW population had high enough coverage to be included in data set 3. 835 ** All FSC and FCT values were significantly greater than zero, with p < 0.00001, except where noted. clusters (as well as a third cluster in data set 2) may be found even at the same site. 847

836
Each of the ant species shows a much stronger associations between genetic 848 clustering and geography. Across all data sets, each of the four geographic areas was 849 dominated by a single cluster that was rarely, if ever, found outside that region. The 850 exceptions to this were C. nigriceps, in which data set 2 showed two genetic clusters in the 851 Northern Highlands, and T. penzigi, where two genetic clusters were common in the 852 Southern Rift sites, and the Ngong Hills site was dominated by one of these clusters, rather 853 than clustering with the other Southern Highlands sites. In data set 3, the Southern Rift 854 sites and the Ngong Hills formed a single cluster, and the other two Southern Highlands 855 sites were each dominated by their own cluster. Finally, the representative individual from 856 the sole C. mimosae colony found in the Rift (at Lake Naivasha) clustered with the 857 individuals from the Southern Highlands sites. 858 We found no evidence that trees assigned to different genetic clusters were 859 associated with particular ant occupants (multinomial exact tests, p > 0.1 for all); this 860 equally the case for all three data sets. 861 862 Isolation by distance and environment 863 Within each of the four species, the three data sets produced similar results. For both V. 864 drepanolobium and C. mimosae, we found no evidence of isolation by distance or isolation 865 by environment in any of the data sets (all p-values > 0.05). In the case of C. nigriceps, we 866 population structure for each of the four species. We then repeated the previous runs, 917 using the same parameters and priors, except that we estimated the likelihood of the data 918 for each species under the characteristic population structure of each other species. To 919 provide a comparison, we also estimated the likelihood of the data for a single species 920 under its own characteristic population structure. When two species had different numbers 921 of populations (i.e., C. mimosae has only two populations, while C. nigriceps has four), we 922 reduced the data set of the species with more populations to allow the appropriate 923 comparison. 924 925

927
Results of the coalescent analysis 928 Population parameters estimated by fastsimcoal2 coalescent simulations revealed different 929 dynamics in the various species (presented in Tables S5.1 and S5.2). In this analysis, which 930 takes into account population growth and migration, population sizes in the V. 931 drepanolobium tree and the ant C. mimosae were large relative to C. nigriceps and T. penzigi, 932 more so than the previously-determined θ values would suggest. We also do not observe 933 that southern population of C. nigriceps and T. penzigi are particularly large, as suggested 934 by θ. 935 Figure S4.1: Populations parameters estimated by fastsimcoal2, for C. mimosae. Estimated parameters are shown in red. Population growth rates were calculated as log(Population size at divergence time/Population size at present)/Divergence time. Growth rates could be positive or negative depending on the estimates of population sizes. For simplicity, we assumed that the ancestral population(s) were static, without migration between them.