Selective Sweeps Lead to Evolutionary Success in an Amazonian Hyperdominant Palm

Despite the global importance of tropical ecosystems, few studies have identified how natural selection has shaped their megadiversity. Here, we test for the role of adaptation in the evolutionary success of the widespread, highly abundant Neotropical palm Mauritia flexuosa. We used a genome scan framework, sampling 16,262 single-nucleotide polymorphisms (SNPs) with target sequence capture in 264 individuals from 22 populations in rainforest and savanna ecosystems. We identified outlier loci as well as signal of adaptation using Bayesian correlations of allele frequency with environmental variables and detected both selective sweeps and genetic hitchhiking events. Functional annotation of SNPs with selection footprints identified loci affecting genes related to adaptation to environmental stress, plant development, and primary metabolic processes. The strong differences in climatic and soil variables between ecosystems matched the high differentiation and low admixture in population Bayesian clustering. Further, we found only small differences in allele frequency distribution in loci putatively under selection among widespread populations from different ecosystems, with fixation of a single allele in most populations. Taken together, our results indicate that adaptive selective sweeps related to environmental stress shaped the spatial pattern of genetic diversity in M. flexuosa, leading to high similarity in allele frequency among populations from different ecosystems.


INTRODUCTION
The geographic distribution of genetic diversity results from the interplay between demographic forces (i.e., genetic drift and gene flow), mutation, recombination, and natural selection (Wright, 1931). The study of adaptation to environmental conditions has led to an understanding of how species cope with climate and landscape changes (e.g., Evans et al., 2014;Gratani, 2014;Yeaman et al., 2014;Hornoy et al., 2015). Genome scan approaches have contributed to the understanding of adaptive responses (Savolainen et al., 2013;Tiffin and Ross-Ibarra, 2014), especially because local adaptation may affect genetic variation at specific loci (Smith and Haigh, 1974;Kaplan et al., 1989;Orr, 1998;Kawecki and Ebert, 2004), allowing for the identification of genomic regions that are tightly linked to or directly under natural selection. However, it is important to note that most adaptive traits are polygenic with large number of alleles with small effects (McKay and Latta, 2002;Pritchard and Di Rienzo, 2010). In these traits, selection may cause only small changes in allele frequencies, and the response to selection may be the outcome of allelic covariances among loci (McKay and Latta, 2002;Pritchard and Di Rienzo, 2010;Le Corre and Kremer, 2012).
Evidence of local adaptation has been identified in plants from temperate climates (Parchman et al., 2012;Yeaman et al., 2014Yeaman et al., , 2016Hornoy et al., 2015;Jordan et al., 2017), yet only recently have these approaches been applied to tropical species (e.g., Lanes et al., 2018;Collevatti et al., 2019;Brousseau et al., 2020). Despite recent advances, the detection of selection footprints using genome scans is still limited because of demographic history, which can constrain the detection of other types of selective forces, such as background, balancing, or purifying selection (Otto, 2000;Vitti et al., 2013). The detection of selection footprints may also be hindered by range shifts due to paleoclimatic changes, causing incomplete lineage sorting and high genetic differentiation . Thus, disentangling the effects of demographic history from selection in extant genetic diversity is important to better understand the evolution of species distributions, but remains a challenge.
Target sequence capture has been increasingly used to genotype large numbers of markers because it significantly reduces costs and effort compared with whole-genome sequencing and can be used to study the role of evolution and ecology at the genome-wide scale (Jones and Good, 2016;Silva-Junior et al., 2018;Andermann et al., 2019). In plants, the method has been applied successfully to robustly estimate population genetic parameters in groups such as maize (Fu et al., 2010), pine [Pinus taeda, (Neves et al., 2013); Pinus albicaulis (Syring et al., 2016)], poplars (Zhou and Holliday, 2012), and tropical trees and palms (e.g., Collevatti et al., 2019;Loiseau et al., 2019).
Palms have high levels of genetic and phenotypic variation and a long and rich fossil history (Baker and Dransfield, 2016). The family has a wide distribution across large geographic areas and latitudinal ranges, but climatic variables cause palms to be spatially restricted along temperature and other geographic and environmental gradients (Eiserhardt et al., 2013;Freitas et al., 2019). Mauritia flexuosa L. f. (Arecaceae) is one of the most hyperdominant species in Amazonia and is common in swampy environments (Rull, 1998;Ter Steege et al., 2013;Rull and Montoya, 2014). It has a large geographic range across different ecosystems (Figure 1): in lowland tropical forest in the Amazon and Orinoco Basins and the Guiana Shields, and in savannas in the Llanos and the Brazilian Cerrado.
The high abundance and wide geographic distribution of M. flexuosa raise the question of how the species reached such level of evolutionary success. The species is dioecious and windpollinated (Khorsand and Koptur, 2013). Fossil records and phylogeographic analyses of M. flexuosa show a capacity for resilience to climate change, at least through Quaternary range shifts (e.g., de Lima et al., 2014;Melo et al., 2018). Moreover, the species has high genetic differentiation among populations from different ecosystems and river basins as shown by chloroplast (de Lima et al., 2014) and microsatellite loci (Melo et al., 2018;Sander et al., 2018), suggesting that genetic drift may shape the current distribution of genetic variation in M. flexuosa. A genomewide scan allows for a comprehensive evaluation of the relative roles of genetic drift and selection in the spatial distribution of genetic diversity and provides an unprecedented window into the evolutionary success of this iconic Neotropical palm species.
Here, we test between local adaptation and genetic drift as a driver of the evolutionary success characterizing the high abundance and broad geographic range of the hyperdominant species M. flexuosa. We identify local adaptation through detection of outlier loci and correlation of allele frequencies with environmental variables that indicate ecosystem-based differences among populations. To identify selective sweeps and genetic hitchhiking events, we used likelihood ratio tests. For these tests, we sampled populations in three different ecosystems across its geographic distribution, genotyping 16,262 singlenucleotide polymorphisms (SNPs) with target sequence capture approach. Coalescent simulations were used to reconstruct the demographic history of the species based on putatively neutral SNPs in an attempt to disentangle the effects of demography (genetic drift and gene flow) and natural selection on the distribution of genetic diversity. Because of the contrasting environmental conditions of the different ecosystems, i.e., tropical rain forest and seasonal savanna, we expect to find local adaptation and high differentiation in allele frequencies among ecosystems.

Population Sampling
We sampled 264 individuals from 22 populations of M. flexuosa throughout its geographic range (Figure 1 and Supplementary Table S1 in Supplementary Appendix 1). Populations were sampled in Amazonia (11 populations), Llanos (two), and Cerrado (nine) ecosystems across different river basins: eight populations from the Amazon basin, one in the Caribbean, two in the North Atlantic, two in the Orinoco, three in the Paraná-Paraguay, one in the São Francisco, and five in the Araguaia-Tocantins (Figure 1  Each color represents an inferred genetic cluster (K = 5). The size of cluster chart section represents population coancestry for each cluster. The plot shows the best K = 5 found in six from 10 runs. The inset map shows the geographic distribution of M. flexuosa based on the occurrence records (black dots) from the Global Biodiversity Information Facility (https://www.gbif.org). The shape file of river basins was obtained from HydroBASINS v. 1c (Lehner and Grill, 2013). Details on the sampled populations are provided in Supplementary Table S1 in Supplementary Appendix 1.
(Universidade de Brasilia, Brasília, DF, Brazil). A shape file of river basins was obtained from HydroBASINS v. 1c (Lehner and Grill, 2013). Leaves of adults were collected, and geographic coordinates were taken for each individual.

SNP Genotyping
We used a set of 2,909 120-mer probe sequences targeting single copy loci derived from Cocos nucifera, Elaeis guineensis, Nypa fruticans, Phoenix dactylifera, and Sabal bermudana covering 169 single-copy nuclear genes for target sequence capture (Heyduk et al., 2016). M. flexuosa has 2n = 30 and genome size 1C = 4.7 pg (Röser et al., 1997). We extracted the DNA using Qiagen DNeasy Plant Mini kit (Qiagen, DK). Targeted DNA enrichment, capture, and sequencing were carried out by RAPiD Genomics LLC (Gainesville, FL, United States) using the SureSelectXT (Agilent Technologies, CA, United States) enrichment system and Illumina DNA sequencing on a HiSeq2000 instrument (Illumina, CA, United States). Genomic DNA libraries were prepared using the Agilent protocols, including DNA shearing followed by ends repair, 3 -end adenylation, adaptor ligation, and amplification following Neves et al., 2013. Libraries were pooled in equimolar amounts and sequenced in two lanes of an Illumina HiSeq2000 instrument (Illumina, CA, United States), 2 × 101-bp mode.

Genome-Wide Diversity and Genetic Structure
To characterize genome-wide diversity, we calculated the density of SNPs across all probes using a bin size of 100 bp, allele frequencies, and the percentage of missing data. We also inferred the ratio of transitions to transversions substitutions (Ts/Tv) and expected heterozygosity using VCFtools.
To characterize population genetic diversity across all SNP loci, we estimated expected heterozygosity under Hardy-Weinberg equilibrium (He) and inbreeding coefficient (f ) using Arlequin v. 3.5 (Excoffier and Lischer, 2010). We used a hierarchical analysis of molecular variance (AMOVA) implemented in Arlequin to estimate the genetic differentiation between ecosystems (F CT ) and among populations within ecosystems (F SC ). We also analyzed the differentiation among river basins (F CT ), among populations within river basins (F SC ), and inbreeding coefficient (F IS ). Significance levels of 0.05 for each estimate were determined with 10,000 permutations.
Further, we estimated genetic diversity and analyzed genetic structure in a subset of more stringently cleaned, and linkage disequilibrium (LD) pruned loci. We used PLINK 1.9 (Chang et al., 2015) to eliminate loci with more than 15% of missing data (-geno 0.15) and a minor allele frequency (MAF) lower than 1% (-maf 0.01). The LD pruning was also performed with PLINK with sliding windows of 50 SNPs, removing associated loci (R 2 > 0.30) using the option -indep-pairwise 50 5 0.3. The list of cleaned and LD-pruned SNPs was checked for SNPs with positive selection signal, and none were identified, based on Outflank and Bayenv2 results. Therefore, this list of potentially neutral and high-quality SNPs was used to estimate genetic diversity and to perform an AMOVA. Finally, we used Bayesian clustering implemented in fastSTRUCTURE software (Raj et al., 2014) to identify the most likely number of genetic clusters among M. flexuosa samples. The analysis was performed with number of groups K ranging from 2 to 22. We assessed the most likely number of clusters supported by the data in Structure Selector (Li and Liu, 2018), with a threshold of 0.5, following (Puechmaille, 2016). To generate output files, we used CLUMPAK v. 1.1 (Kopelman et al., 2015). For this set of potentially neutral loci, we analyzed whether populations are isolated by distance using a linear regression of linearized F ST on the logarithm of geodesic geographic distance. We also performed an autocorrelation analyses using Moran's I, implemented in the software SAM (Spatial Analysis in Macroecology; Rangel et al., 2010), to understand the effect of spatial scale in genetic differentiation.

Simulation of Demographic History
To test whether demographic dynamics in the Quaternary were responsible for the distribution of allele frequencies in extant populations of M. flexuosa, we modeled the demographic history and performed simulations for 3,500 potentially neutral loci. We simulated two demographic scenarios derived from ecological niche modeling (see de Lima et al., 2014;Melo et al., 2018): "Range Expansion" (smaller range size at the LGM compared to present day) and "Range Stability" (similar range size at the LGM and present day). We also simulated the scenario of "Multiple Refugia" (retraction of savanna-like vegetation during glacial periods leading to many refugia of different effective sizes) derived from paleovegetation reconstruction and fossil record. We simulated 22 demes using the software DIYABC v. 2.1 (Cornuet et al., 2014). For model calibration, we used the demographic parameters estimated using coalescent analyses following Melo et al. (2018). Population dynamics were simulated backward (see Supplementary Figure S1 in Supplementary Appendix 2) from the present (t = 0) to t = 525 generations ago (at 21 ka, using a generation time of 40 years; Melo et al., 2018), with effective size drawn from a uniform distribution with minimum Ne = 300 and maximum Ne = 3,000 (Supplementary Figure S1 in Supplementary Appendix 2).
Mean number of alleles, mean expected heterozygosity, and F ST were inferred from 600,000 simulations and compared with the observed values and the relative fits of the models using approximate Bayesian computation (Cornuet et al., 2014), also implemented in DIYABC. We used the relative proportion of each scenario in the simulated data set closest to the observed data set (hereafter, direct approach) and the logistic regression (hereafter, logistic approach) of each scenario probability on the deviation between simulated and observed summary statistics (Cornuet et al., 2008).

Genome Scans for Selection Footprints
To test for local adaptation, we used three different genome scan approaches to identify loci under selection. First, OutFLANK (Whitlock and Lotterhos, 2015) was used based on the expected χ 2 distribution of F ST in the absence of selection. To generate the distributions, we trimmed the F ST distribution at 5% and 10% and used a minimum expected heterozygosity of 0.10 and a false discovery rate <0.05.
Secondly, we used a Bayesian framework implemented in Bayenv2 (Coop et al., 2010;Günther and Coop, 2013) to identify local adaptation by estimating linear correlations between allele frequencies and environmental variables, while controlling for relationships among populations. Four bioclimatic variables from the WorldClim Global Climate database 1 with a spatial resolution of 30" (0.93 × 0.93 = 0.86 km 2 at the equator) were obtained for the 22 populations sampled. The four bioclimatic variables were identified by low collinearity in factorial analysis with Varimax rotation. These four variables explained 90.4% of the total environmental variation among the 22 populations (Supplementary Table S2 in Supplementary Appendix 1). The selected variables were as follows: annual mean temperature, mean diurnal range, precipitation of driest month, and precipitation of the wettest quarter.
We also obtained soil data related to soil fertility from the Harmonized World Soil Database v. 1.2 2 . Varimax factorial analysis selected four soil variables that explained 77.5% of the variation among populations (Supplementary Table S3 in Supplementary Appendix 1): topsoil salinity, topsoil organic carbon, topsoil silt fraction, and topsoil reference bulk density. Loci with adaptive selection signal were selected following Günther and Coop, 2013: high Bayes factor (>100.0), i.e., the likelihood ratio of the probability of the linear relationship hypothesis (between allele frequency and environmental variable) and the null hypothesis (no linear relationship) given the data and a Spearman correlation coefficient | ρ | > 0.15. These values were in the 0.1% quantile of the distribution of the data.
For the loci under selection, we calculated the shift in allele frequency between pairs of populations, i.e., the difference between the allele frequency of the population with highest and lowest value for each climatic and soil variable and for each SNP locus. For each variable, we obtained a vector of differences in allele frequency for the loci, and then, we plotted the density function using the function density implemented in the stat package of R v. 3.6.1 (R Core Team, 2016).
Finally, SweepFinder2 v. 1.0 (Degiorgio et al., 2016) was used to detect selective sweeps and genetic hitchhiking based on deviations of a neutral null hypothesis. We applied the composite likelihood ratio (CLR) to identify possible location of recent selective sweeps in each of the 22 populations, following Nielsen, 2005. We used only contigs containing loci with evidence of adaptive selection identified in Bayenv2. We assumed unknown polarity and considered that the lower frequency allele was the derived state. For each contig, the CLR function was calculated on a grid of 122 positions along the length of the probe sequence. The selective sweep CLR test statistic was the maximum composite likelihood value optimized over all possible positions, compared to the composite likelihood of the neutral null model as calculated by the program. We simulated new neutral data sets with population scaled mutation rate (θ) and population scaled recombination rate (ρ) typical for highly heterozygous tropical forest trees estimated from genome-wide data (Silva-Junior and Grattapaglia, 2015). We used θ/bp=0.018 bp −1 and p/bp=0.0011 bp −1 , assuming homogenous rates among regions. Samples with the same number of individuals in the real populations and sequence length of scaffolds were generated under Wright-Fisher model using the software ms (Hudson, 2002) with 100 replicates per scaffold. For each replicate, multiple alignments were performed using MAFFT v. 7.130b (Katoh and Standley, 2013), and variant sites were detected across the samples using SNP sites (Page et al., 2016). The composite likelihood inference was repeated for each replicate in SweepFinder2 using the same grid size as for the real data. Composite likelihood values were collected from all the replicates across all positions along the length of the simulated sequences totaling 10.7 million data points. A cumulative histogram of the values was used to obtain the neutral composite likelihood threshold of 95% (Supplementary Figure S2 in Supplementary Appendix 2). The value of 1.22 was used as the significance cutoff. The probable locations of sweeps were taken as the maximum composite values above the significance cutoff calculated for each scaffold.

Annotation of SNP Loci Under Selection
Putative SNP loci under selection were annotated against the genomes of Embryophytes of the Phytozome v. 12.1 (Goodstein et al., 2012) database 3 . The probe sequences were used in a BlastX, and in the few cases where no Blast hit was obtained, the E. guineensis and M. flexuosa transcriptomes were used (Supplementary Table S1 in Supplementary  Appendix 3). Affected genes were analyzed with respect to their functional annotation in terms of Gene Ontology, PFAM, and PANTHER categories obtained from the Blast best hits (Ashburner et al., 2000).

SNP Detection and Genotyping
Using target sequence capture, we obtained more than 37 Gbp from the 264 individuals. A total of 366,464,311 reads were sequenced, with a mean of 832,873 reads per individual. The sequencing generated 34,569 SNPs, which after data filtering resulted in 16,262 high-quality polymorphic SNPs, with an average call rate of 93.2%. The average per-sample call rate across all 264 individuals was 93.1%. The transitions-to-transversions ratio was 2.06. The average missing genotype per loci was 6.8% in the sample of 264 individuals, and 6.9% per individual sampled (Supplementary Figure S3 in Supplementary Appendix 2). The distribution of MAF showed the expected "L" shape distribution with a larger proportion of low frequency SNPs. The vast majority of the SNPs (82.2%) had MAF < 0.15, and the median MAF was equal to 0.079. The average read depth at SNPs coordinates across samples was 25.4x ± 34.5x, and the minimum depth of aligned reads to call a heterozygous genotype for a sample was three with the median equal to 26. To perform a genetic structure analysis for putatively neutral loci, the SNPs were further cleaned to remove loci in LD. The 3,479 loci after LD pruning had no selection signal. Therefore, these neutral LD pruned SNPs were used in a Bayesian clustering method, which supported six genetic clusters (K = 6) with low admixture in populations from different ecosystems (Figure 1 and Supplementary Figures S4, S5 in Supplementary Appendix 2; Supplementary Tables S4-S6 in Supplementary Appendix 1). However, no individual was assigned to cluster 6 with coancestry coefficient Q > 0.001, being impossible to display in a map due  Table S4 in Supplementary  Appendix 1). Thus, we chose the partition of K = 5 (Figure 1 and Supplementary Tables S5, S6 in Supplementary Appendix 1).

Genome-Wide Diversity and Genetic Structure
The 22 populations showed low and significant genetic differentiation for the 16,262 SNPs (F ST = 0.070, p < 0.001) and negative and significant inbreeding coefficient (F IS = −0.583, p < 0.001), showing higher frequency of mating between unrelated individuals than expected at random, as predicted for a dioecious species. Genetic differentiation was higher for the 3,470 putatively neutral loci (F ST = 0.104, p < 0.001), and the inbreeding coefficient was significant (F IS = −0.559, p < 0.001). Hierarchical AMOVA showed low, but significant differentiation among ecosystems (Amazonian, Llanos, and Cerrado), for the entire dataset (F CT = 0.015, p < 0.001) and for the 3,470 putatively neutral, LD-pruned loci (F CT = 0.063, p < 0.001). Differentiation among populations within ecosystems was low for the entire dataset (F SC = 0.064, p < 0.001) and for the putatively neutral loci (F SC = 0.067, p < 0.001). Differentiations among river basins (F CT = 0.055, p < 0.001) and among populations within river basins (F SC = 0.028, p < 0.001) were low but significant for the entire dataset, as well as for the 3,470 putatively neutral, LD-pruned loci (F CT = 0.080, p < 0.001; F SC = 0.041, p < 0.001). Genetic differentiation among pairs of populations was significantly correlated to geographical distance (r 2 = 0.290, p < 0.001, Supplementary Figure S6A in Supplementary  Appendix 2). Autocorrelation analysis showed a significant relationship between genetic differentiation and geographical distance for populations up to 1,500 km (Supplementary Figure S6B in Supplementary Appendix 2).

Simulation of Demographic History
The demographic simulations using putatively neutral loci, to analyze the role of genetic drift in shaping the spatial distribution of M. flexuosa genetic diversity, supported a scenario of "Multiple Refugia" as the most probable predictor of the observed genetic parameters using the logistic approach (Supplementary Figure S7

Genome Scans for Selection Footprints
The OutFLANK genome scan detected 45 outlier loci (Supplementary Table S8 Supplementary Figures S10, S11 in Supplementary Appendix 2). Climatic and soil variables varied among populations (Figure 2, Supplementary Table S11 in Supplementary Appendix 1), but did not show a clinal spatial pattern (Figure 2). Most loci under selection had small differences in allele frequency between populations with different climatic and soil conditions (Figure 3).
We detected hard selective sweeps within 28 of the 41 contigs identified from Bayenv2 (Supplementary Table S12 in Supplementary Appendix 1). These 28 contigs contained 134 positions that were likely targets of selective sweeps (Supplementary Table S12 in Supplementary Appendix 1). Most populations were fixed for the same allele with selection signal (Figure 4, see also Supplementary Figures S12, S13 in Supplementary Appendix 2).

Functional Annotation of SNPs With Selection Signal
Functional annotation indicated that the 45 outlier SNPs identified by OutFLANK are located in nine genes (Supplementary Table S8 in Supplementary Appendix 1). These genes are related to diverse biological processes, including housekeeping genes such as ribonuclease, ATP synthetase, Rab GDP dissociation inhibitors, and the GTPase family.
The 17 loci correlated to climatic variables are located in 16 genes (Supplementary Table S9 Appendix 3). The 38 genes with putative signals of adaptive selection are involved in a wide spectrum of biological processes such as ATP synthesis, DNA replication, mitotic checkpoint, and transcription factors (elongator acetyltransferase complex-ELP). Additionally, we found SNPs correlated to soil variables potentially affecting magnesium (Mg) transport (Mg transporter mrs2-4-related gene) and genes related to NAD(P) activity (Supplementary Tables S9, S10 in Supplementary Appendix 1).
SweepFinder2 analysis detected 28 SNP loci with selective sweeps signal located within 25 genes (Supplementary Table S12 in Supplementary Appendix 1). As expected, the list of genes is similar to and nested within those obtained with the Bayenv2 analyses. These include genes involved in cellular metabolism and DNA and RNA synthesis. We also found genes related to UVA radiation resistance, to ion transport through membrane, and Mg transporters. A shift is defined as the difference between the allele frequencies of the population with the highest and the lowest value for each climatic and soil variable. The density function was obtained for the vector of differences in allele frequency for the loci and calculated using the function density implemented in the stat package in R 3.6.0.
than 28 loci under natural selection, indicating evidence for selection as a driver of the evolutionary success of M. flexuosa, but did not support local adaptation. Despite the climatic and soil differences between Amazonia, the Llanos, and the Cerrado, we found only slight differences in allele frequency at SNPs with selection footprints based on Bayenv 2 analysis (Figure 3), and most populations showed fixation of the same allele with selection signal (Figure 4,  Supplementary Figures S12, 13 in Supplementary Appendix 2). The small shifts in allele frequency distributions among contrasting environments for most loci suggest selective sweeps of a single allele favored across most populations. We interpret these results as evidence for adaptation to the relatively stable microhabitat of M. flexuosa. Although the species occurs across highly variable macroclimatic conditions, from humid Amazonia rainforest to seasonally dry Cerrado savanna, the species is strictly associated to watercourses and swamps, named "veredas" in Cerrado, "moriches" in the Llanos in Colombia and Venezuela or "aguajales" in the Amazonian areas in Peru. In fact, the species is dependent on water and has physiological and morphological adaptations to germinate in swampy environments (Silva et al., 2014). Taken together, we suggest that the lack of or small differentiation in allele frequencies among populations from different ecosystems is due to the high specialization of M. flexuosa to swampy microhabitats, where selective sweeps maintain the same favorable alleles, despite the strong macroclimatic differences.
Bayesian clustering (Figure 1) showed differentiation among populations from different ecosystems in putatively neutral SNPs, but with some admixture between Amazonia and Cerrado, especially in populations from Araguaia/Tocantins, North Atlantic, and Caribbean river basins. In addition, AMOVA evidenced significant, but low differentiation among populations from different ecosystems, and isolation-by-distance is an important driver of genetic differentiation. Differentiation caused by genetic drift not only supports our hypothesis of homogenization among ecosystems due to selective sweeps, but also raises the hypothesis of migration spreading favorable alleles among populations leading to allele fixation (Yeaman and Whitlock, 2011;Savolainen et al., 2013).
Resilience to climate change can ensure evolutionary success (e.g., Whitlock and Barton, 1997). Although highly specialized to swampy microhabitats, M. flexuosa is resilient in the face of climatic change ensuring its evolutionary capacity. Suitable habitats for M. flexuosa may have been available during the Quaternary glacial cycling, mainly in the Amazon basin, and likely served as sources of migrants to colonize more unstable areas in Central Brazil during interglacial phases (de Lima et al., 2014;Melo et al., 2018). The demographic dynamics may have spread favorable mutations throughout populations leading to allele sharing of adaptive loci among populations. Similar patterns in allele frequency at adaptive loci were found in other widespread Neotropical tree species . Moreover, despite the macroclimatic differences among populations from different ecosystems, there is a lack of strong latitudinal cline variation in climatic variables in the Neotropics (Figure 2), which may preclude the detection of adaptive signals in many loci, even in widespread species such as M. flexuosa. In fact, we identified adaptive selection correlated with mean annual temperature and mean diurnal range in temperature. Despite variation among populations, these variables show no clear geographic trend (Figure 2). The lack of strong latitudinal clines may explain the contrasting results compared to temperate species, for which adaptive selection has been reported for traits with strong latitudinal clines, such as photoperiod response and temperature response-related traits (e.g., Parchman et al., 2012;Yeaman et al., 2014;Hornoy et al., 2015). Top soil organic carbon and salinity have small differences among populations, and although top soil silt fraction and bulk density show larger differences among populations, we identified a patchy distribution among ecosystems, which may further mask the detection of local adaptive selection in traits related to soil adaptation. Nevertheless, all loci with adaptive signal are correlated with these soil variables. In addition, because most adaptive traits have quantitative inheritance (Fisher, 1930;Wright, 1935), the response of M. flexuosa to environmental variation may be polygenic, and thereby selection may cause only small changes in allele frequencies, causing weak selection signature on a single locus Le Corre and Kremer, 2012).

Selective Sweeps and Adaptation to Environmental Stress
Genes potentially affected by SNPs with selection signal have fundamental roles in cells and are usually highly conserved during evolution. Our analysis was based on functional annotation; thus, the real gene function must be confirmed using experimental assay such as reverse transcriptase-polymerase chain reaction to confirm gene expression. For instance, some genes highlighted in our analysis code for transduction membrane proteins (Wang et al., 2003) or are members of the ATP-binding cassette (ABC) or enzymes with catalytic activity. Adenosine 5' triphosphate (ATP) and NADPH play a central role in metabolism, diffusing throughout the cell to the sites where energy is used for catalytic activities (Bonora et al., 2012). We also found adaptive signal for the proliferating cell nuclear antigen-like complex, which plays important roles in DNA replication and repair, including translesion synthesis, error-free damage bypass, break-induced replication, mismatch repair, and chromatin assembly (Boehm et al., 2016).
We identified genes under adaptive selection related to plant growth and adaptation to stress. The endomembrane trafficking protein MONENSIN SENSITIVITY1 (MonI), which is an important requirement for protein trafficking in cells and plant growth in Arabidopsis thaliana (Cui et al., 2014), was significantly correlated to variation in mean temperature diurnal range (Figure 4). The Mg transmembrane transport MRS2-4-related gene, essential for Mg homeostasis and the ability to adapt to a wide range of environmental conditions in A. thaliana (Oda et al., 2016), was correlated to top soil bulk density (Figure 4). This correlation indicates that M. flexuosa is adapted to variable and potentially stressful environments, enabling it to persist in adverse conditions.
We also found genes related to drought stress, such as the Rab scort protein geranyltransferase component, which is involved in abscisic acid and auxin signaling in A. thaliana (Johnson et al., 2005) and was correlated to mean temperature diurnal range (Figure 4). Interestingly, we found correlations with bioclimatic variables related to temperature, but not with precipitation, most likely because water via swampy habitat preference is required for M. flexuosa establishment. Thus, temperature variation among populations may impose higher constraints on the success of M. flexuosa.
It is important to note that populations from different ecosystems displayed the same fixed allele for these loci (Figure 4), indicating the role of selective sweeps shaping adaptation and spatial patterns in genetic diversity in M. flexuosa. On the other hand, we found weak evidence of local adaptation in different ecosystems, as evidence by few loci with exclusive alleles in just one ecosystem (Supplementary Figures S11, 12 in Supplementary Appendix 2). However, these loci potentially affect genes related to primary metabolism and are likely not directly related to local adaptation to climatic or soil conditions. Instead, they are most likely in LD with genes under selection.
Considering the number of loci scrutinized (16,262) and the transcriptome size of M. flexuosa (46,600 genes, Sarah et al., 2017), the number of loci detected with positive selection signal was low. It is important to note that the probe set did not cover all genes in M. flexuosa genome; thus, the number of loci that can be potentially detected is much lower than that in the transcriptome. It should be noted that the available methods to detect adaptive selection are based on the assumption that positive selection on a mutation leads to hard selective sweeps and therefore have little power to detect other types of selection such as background and balancing selection (Nielsen, 2005;Przeworski et al., 2005;Messer and Petrov, 2013). Thus, it is possible that our methodology allowed detection of only a fraction of loci under selection.

Low Neutral Differentiation Among Ecosystems
The putatively neutral SNPs showed low but significant genetic differentiation among populations across (F CT = 0.063) and within ecosystems (F SC = 0.064). In addition, the Bayesian clustering showed some admixture among populations, especially from Amazonia and Cerrado. These results contrast with the higher differentiation for microsatellite loci (F CT = 0.095; F SC = 0.137), most likely due to higher mutation rates at microsatellite loci, leading to higher heterozygosity and differentiation. While SNPs are biallelic and evolve under an infinite sites model, microsatellites are multiallelic loci and evolve under stepwise and two-phase mutation models (Valdes et al., 1993;Di Rienzo et al., 1994). However, our findings showed similar patterns of spatial distribution of genetic clusters based on SNPs and microsatellites (Melo et al., 2018), with low admixture among ecosystems, suggesting an important role for genetic drift in shaping the spatial distribution of genetic diversity in M. flexuosa. Further, microsatellites, chloroplast sequences, and putatively neutral SNPs recovered a demographic scenario of multiple refugia present during the last glacial maximum (LGM, 26-19 kya;de Lima et al., 2014;Melo et al., 2018), reinforcing the hypothesis of the role of Quaternary climate change influencing the distribution of genetic diversity in M. flexuosa. Given the high fragmentation across current landscapes, it is likely that the genetic structure of M. flexuosa between Amazonia and Cerrado will remain in the future because of low gene flow among populations.
We also found higher frequency of mating between unrelated individuals than expected at random (negative inbreeding coefficient). It is important to note that M. flexuosa is a dioecious species, which therefore requires outcrossing. However, the high frequency of heterozygotes may also be the outcome of genotyping errors, leading to an overestimation of heterozygous genotypes (Chen et al., 2017). We believe this is unlikely here because we used standard procedures for SNP filtering (see Silva-Junior et al., 2018) with a minimum depth of aligned reads of three (median 26). Coverage depth of 1-2x has been found sufficient for accurate allele calls (Sims et al., 2014).

CONCLUSION
Using target sequence capture SNP genotyping to generate data for over 16,000 SNPs, we unravel the evolutionary history of a hyperdominant palm species. Our findings suggest that selection together with migration led to the spread of alleles with higher fitness among populations from different ecosystems and promoted the evolutionary success of the widespread, highly abundant Neotropical palm M. flexuosa in swampy microhabitats.

DATA AVAILABILITY STATEMENT
Data and additional supporting information may be found in the online version of this article as supporting information. DNA datasets are available in GeneBank database (access number: BioProject ID PRJNA588981).

AUTHOR CONTRIBUTIONS
RC and CB conceived the experiment. RC and CB funded the work. WM conducted the experiment. WM, LV, EN, and RC performed the analysis including assembling, annotation, and curated the data. RC wrote the first draft of the manuscript. All co-authors contributed to the final version of the manuscript.

FUNDING
Field work and genetic data acquisition were supported by competitive grants from CNPq to RC and CB (PVE/CNPq project no. 407310/2013-4) and Biodiversity and Ecosystem Services in a Changing Climate (BECC) Strategic Research Area at the University of Gothenburg, to RC (Rede Cerrado CNPq/PPBio project no. 457406/2012-7), and CAPES/PROCAD (project no. 88881.068425/2014-01). WM was supported by a Ph.D. FAPEG fellowship and LV received a CAPES fellowship. RC has been supported by productivity grants from CNPq, which we gratefully acknowledge.