Spatial and Ecological Drivers of Genetic Structure in Greek Populations of Alkanna tinctoria (Boraginaceae), a Polyploid Medicinal Herb

Background and Aims: Quantifying genetic variation is fundamental to understand a species’ demographic trajectory and its ability to adapt to future changes. In comparison with diploids, however, genetic variation and factors fostering genetic divergence remain poorly studied in polyploids due to analytical challenges. Here, by employing a ploidy-aware framework, we investigated the genetic structure and its determinants in polyploid Alkanna tinctoria (Boraginaceae), an ancient medicinal herb that is the source of bioactive compounds known as alkannin and shikonin (A/S). From a practical perspective, such investigation can inform biodiversity management strategies. Methods: We collected 14 populations of A. tinctoria within its main distribution range in Greece and genotyped them using restriction site-associated DNA sequencing. In addition, we included two populations of A. sieberi. By using a ploidy-aware genotype calling based on likelihoods, we generated a dataset of 16,107 high-quality SNPs. Classical and model-based analysis was done to characterize the genetic structure within and between the sampled populations, complemented by genome size measurements and chromosomal counts. Finally, to reveal the drivers of genetic structure, we searched for associations between allele frequencies and spatial and climatic variables. Key Results: We found support for a marked regional structure in A. tinctoria along a latitudinal gradient in line with phytogeographic divisions. Several analyses identified interspecific admixture affecting both mainland and island populations. Modeling of spatial and climatic variables further demonstrated a larger contribution of neutral processes and a lesser albeit significant role of selection in shaping the observed genetic structure in A. tinctoria. Conclusion: Current findings provide evidence of strong genetic structure in A. tinctoria mainly driven by neutral processes. The revealed natural genomic variation in Greek Alkanna can be used to further predict variation in A/S production, whereas our bioinformatics approach should prove useful for the study of other non-model polyploid species.


INTRODUCTION
Polyploids are very frequent in flowering plants as at least 35% of extant species are of recent polyploid origin (Wood et al., 2009). Despite their high abundance in nature, economic, esthetic, and medicinal importance, polyploids are still rarely investigated in population genomics (van de Peer et al., 2017). One of the reasons is that such investigations remain challenging, due to uncertainty in estimating allele copy number (i.e., dosage) and, therefore, to accurately assess genotype and allele frequencies (Meirmans and van Tienderen, 2013;Dufresne et al., 2014;Meirmans et al., 2018). In addition, the genome complexity of polyploids is often exacerbated by heterogeneous inheritance patterns across chromosomes, ranging from fully tetrasomic to disomic inheritance (Dufresne et al., 2014;Meirmans et al., 2018).
Genotype likelihood-based methods (as implemented, e.g., in ANGSD; Nielsen et al., 2011;Korneliussen et al., 2014) that are accounting for uncertainties regarding genotype calling have been shown to greatly improve accuracy of population genetics parameters estimated from low to medium coverage NGS data in diploid species (Warmuth and Ellegren, 2019). Recently, an accurate Bayesian method for estimating polyploid genotypes at biallelic single-nucleotide polymorphisms (SNPs) based on likelihoods has been developed (Blischak et al., 2016(Blischak et al., , 2018 but has not yet been widely applied (but see, e.g., Brandrud et al., 2020). Restriction site-associated DNA sequencing (RAD-seq; Baird et al., 2008) is a popular method to generate NGS data and investigate genetic variation and population structure, especially for species not benefiting from an available suitable reference genome (Rochette and Catchen, 2017). However, only a limited number of RAD-seq studies have been employed to investigate genetic variation in non-model polyploid species (Zohren et al., 2016;Brandrud et al., 2017;Záveská et al., 2019;Wagner et al., 2020;Wang et al., 2021).
Genetic variation is a fundamental determinant of the capacity of populations, species, and ecosystems to persist and adapt in the face of environmental change (Hughes et al., 2008). More than a single gene pool often exists across a species' range within a spatial and/or ecological arrangement (Hartl and Clark, 1997;Jones and Wang, 2012). The intensity of gene flow between populations, together with the potential for divergent selection and neutral processes, including drift, is largely defining the degree of population genetic structure (Papadopulos et al., 2014). Geographically, limited dispersal among individuals produces a spatial genetic structure called isolation by distance (IBD; Vekemans and Hardy, 2004). IBD is quantified as the strength of correlation between genetic dissimilarity and geographic distance (Wright, 1943). However, an increased divergence between populations can be also observed as a result of ecologically-based divergent selection against maladapted migrants and hybrids, which, in turn, reduces effective gene flow (Nosil et al., 2009), commonly referred as ecologically driven isolation by environment (IBE; Wang and Bradburd, 2014). The distribution of intraspecific genetic variation often reflects geographical proximity, as well as the spatial heterogeneity in the ecological landscape suggesting that functional traits may show a similar pattern, mirroring selective forces (Fournier-Level et al., 2011). Hence, investigating the genetic structure of populations together with its drivers is key for understanding the evolutionary history of a group, and predicting its future, with important implications for biodiversity management and conservation.
In the current study, we used a ploidy-aware likelihoodbased genotyping to quantify genetic diversity and to investigate the population structure of the polyploid plant Alkanna tinctoria Tausch (Boraginaceae) across most of its distribution area in mainland Greece. In addition, as a potential outgroup species, we included the closely related A. sieberi A.DC (Chacón et al., 2019), a rare endemic species on Crete Island. A. tinctoria is a tetraploid perennial herb mainly distributed in Greece and other Mediterranean countries (Dimopoulos et al., 2013;Strid, 2016). However, in countries, such as Slovakia (Eliáš et al., 2015), Bulgaria (Petrova and Vladimirov, 2009), and Hungary (Király, 2007), A. tinctoria is considered as very rare. The species is of great interest to biochemists and pharmacists as its roots biosynthesize specialized secondary metabolites known as alkannin or its enantiomer shikonin (A/S; Papageorgiou et al., 1999). The red extracts from roots have a long history of curative usage, dating back to the period of the Greek physician Hippocrates (Papageorgiou et al., 1999;Assimopoulou et al., 2011). Several modern investigations have shown the effectiveness of A/S in wound healing, and as antibacterial and anticarcinogenic agents (Malik et al., 2016;Yan et al., 2019;Zhang et al., 2019). Although the bioactivity of A/S from A. tinctoria and related species has been much explored, its natural genetic diversity remains poorly investigated (Ahmad et al., 2019). We employed ploidyaware genotyping with RAD-seq, supported with cytogenetic investigations, to answer the following questions: (1) Are there intra-and interspecific ploidy differences? Do the genotypic data suggest auto-or allotetraploid origin for A. tinctoria? (2) How much genetic diversity is present in A. tinctoria in Greece and how is this diversity distributed? (3) What are the factors shaping the observed genetic structure? Given the medical relevance and historical importance of A. tinctoria our population genomic and cytogenetic appraisal serves as a valuable foundation for understanding its genetic and phenotypic variability.

Study Species
Herein, we focus on two medicinally important species within the tribe Lithospermeae (Boraginaceae): A. tinctoria with a wide range in Greece and A. sieberi, as the potential outgroup species, with a very restricted range (local endemic to Crete). On Crete, the two species have partly overlapping distributions (Strid, 2016). Both species commonly occur at low elevation (<1,000 m above sea level) and occupy similar habitats, such as rocky, sandy, disturbed, and other open places in phrygana, xeric grasslands and coastal habitats (Strid, 2016 Table S1) were collected from 14 localities of A. tinctoria across three mainland regions of Greece (North, Central, and South), as well as from additionally two A. sieberi localities found on the island of Crete (AT27, AT28; Figure 1). These areas roughly reflect four distinct phytogeographical regions of Greece, i.e., North-East -NE, East-Central -EC, Sterea Ellas -StE, and Kriti and Karpathos -KK (Dimopoulos et al., 2016;Strid, 2016 Table S1).
Total DNA was extracted from 40 to 50 mg of powdered leaf tissue using the method described in Lefort and Douglas (1999). DNA extracts were further cleaned and purified using the DNeasy PowerClean CleanUp Kit (Qiagen, Hilden, Germany) following manufacturers' instructions.

Cytogenetic Analysis
Fresh tissue of A. tinctoria was collected from three geographic regions of Greece (North, South, and Central Greece), while entire plants of A. sieberi individuals were collected from two sampling localities (Supplementary Table S1) and were grown under greenhouse conditions at the Austrian Institute of Technology, Tulln, Austria. A total of ten individuals of A. tinctoria were used for genome size assessment, plus five individuals of the outgroup species A. sieberi (Table 1). Fresh leaf tissue was subjected to flow cytometry using about 25 mg of fresh leaf tissue, which was co-chopped together with internal standards (Galbraith et al., 1983), either Pisum sativum "Kleine Rheinländerin" (4.42 pg./1C; Greilhuber and Ebert, 1994) or Capsicum flexuosum (7.44 pg./1C; unpublished) in Otto's isolation buffer (Otto et al., 1981). The isolate was filtered through a nylon mesh (30-50 μm), treated with RNase in the 37°C water bath for 30 min, stained with a propidium iodide containing Otto's staining buffer (Otto et al., 1981) and stored in the refrigerator until measurement. This was performed on CyFlow flow cytometers (Sysmex/Partec, Muenster, Germany) equipped

A B
FIGURE 1 | Sampling localities of Alkanna and its genetic structure in Greece. Alkanna tinctoria was sampled across Greece, and A. sieberi on Crete island (localities AT27 and AT28). Geographical coordinates of sampling locations are provided in Supplementary Table S1. Ancestry proportions inferred with TESS3 averaged for sampling locality are plotted as pie charts. Inset shows ancestry proportions from STRUCTURE as vertical bars where each vertical bar represents an individual and each color represents a genetic cluster. (A) K = 3, and (B) K = 5. Genetic structure estimates are based on RAD-seq data. For TESS3 and STRUCTURE, 7,935 and 1,000 SNPs, respectively, present across at least 90% individuals were used. The map of Greece was extracted from the WorldClim database (Hijmans et al., 2005).
Frontiers in Plant Science | www.frontiersin.org with lasers (532 nm; 30 mW Partec, Muenster, Germany, or 100 mW Cobolt Samba, Cobolt AB, Solna, Sweden). The 1C values were calculated based on a linear relationship between the G1/0 mean peak positions on the fluorescence axis of the standard and the objects. Chromosome numbers were analyzed for two individuals of A. tinctoria originating from Northern Greek locality AT03. Actively growing root meristems were harvested from young plants, pretreated with 0.002 M 8-hydoxyquinoline in darkness for 2 h at room temperature and 2 h at 4°C, and fixed in 3:1 ethanol:acetic acid. Briefly, root meristems were hydrolyzed in 5 N HCl for 20 min, washed with water, and stained with Schiff 's reagent (Merck, Darmstadt, Germany) for 1 h at room temperature in darkness. Root meristems were then squashed in a drop of 60% acetic acid under a cover slip, and the preparations were analyzed using Axioplan light microscope (Carol Zeiss, Vienna, Austria) equipped with a CCD camera. Images were captured using ZEN software (Carl Zeiss, Vienna, Austria).

Library Preparation and RAD-seq
Preparation of the RAD-seq libraries for a total of 148 accessions (n = 8-11 per sampling locality, average 9.3; Table 2;  Supplementary Table S1) was carried out following the protocol described in Paun et al. (2016) Catchen et al., 2013), rescuing barcodes and restriction sites (−r) with a maximum of one mismatch, and discarding reads with low-quality scores (−q) and any uncalled bases (−c). As no reference genome is available to date for Alkanna, we first assembled the RAD loci with Stacks. To optimize the de novo assembly, we followed Heckenhauer et al. (2018) and varied the value of minimum 1 http://vbcf.ac.at/ngs/ reads required to create a stack (m), the maximum mismatches allowed between stacks within individuals (M), and the number of mismatches allowed between sample loci (n) to maximize the number of obtained polymorphic loci, including a maximum of ten SNPs, and that were recovered for at least 80% of individuals. The assembly parameters were optimized on a subset of 60 individuals. The final assembly parameters chosen based on the criteria described above to produce the final catalogue were m = 3, M = 1, and n = 4. The RAD-seq loci from the catalogue have been extracted with the EXPORT_SQL.pl module of Stacks. They have been further used as individual contigs to generate a synthetic reference in FASTA format. Following the pipeline in Brandrud et al. (2020), the raw reads were mapped back to this synthetic reference using bowtie2 v.2.3.4.1 (Langmead and Salzberg, 2012) with default settings. Further processing (conversion to BAM, sorting by reference coordinates, and addition of read group information) was done using Picard v.2.20.6, 2 and realignment around indels was performed using the Genome Analysis Toolkit (GATK v.3.81;McKenna et al., 2010).
We further inferred tetraploid genotypes with the GATKbased approach implemented in EBG v.0.3.2-alpha (Empirical Bayes Genotyping in Polyploids; Blischak et al., 2018). This pipeline is able to refine polyploid genotypes starting from flat priors and has been shown to result in negligible levels of genotype errors for both auto-and allopolyploids for intermediate to high coverages. Briefly, as recommended we employed the GATK UnifiedGenotyper for variant calling and genotype estimations with ploidy set to four. Further, we filtered the GATK-generated tetraploid VCF file using filter-VCF.R (Blischak et al., 2018) to retain only biallelic SNPs present in at least 90% of individuals with a minimum quality threshold of 100 and a minimum read depth of five. The resulting polymorphic position was then used to extract base quality scores from the original BAM files using the mpileup command of SAMtools v.1.6 (Li et al., 2009;Li, 2011), to extract read counts and to estimate per site error rates with read-counts-from-VCF.py and, respectively, per-locus-err.py scripts provided with EBG (Blischak et al., 2018). Finally, we inferred final genotypes as alternative allele counts using the GATK model implemented in EBG.

Genetic Diversity and Population Structure of Greek Alkanna
The summary statistics for each sampling locality was assessed by estimating within-population genetic diversity [Hs, analog to the expected heterozygosity (He)] and inbreeding coefficient (G IS , analog to F IS ) in GENODIVE v.3.0 (Meirmans, 2020). In addition, nucleotide diversity (π) was estimated with a custom script (Supplementary Notes S1). Finally, we calculated the amount of private alleles for each sampling locality as well as for the genetic groups identified by clustering-based approaches. Private alleles calculations were corrected for a small sample size using the factor [(n + 1)/n] following Brandrud et al. (2020), where n is the number of individuals in the relevant group.
To further assess genetic relationships among the sampled individuals, we first estimated a pairwise relatedness with a method of moments as implemented in PolyRelatedness v.1.8 (Huang et al., 2016). This was then displayed as a heatmap of coancestry with the heatmap.2 function in the R package Gplots v.3.1.1 (Warnes et al., 2009). Next, we carried out a principal component analysis (PCA) based on the covariance of allele frequencies between individuals using GENODIVE (v.3.0; Meirmans, 2020). The first three PCs were visualized as scatter plots using the R package Ggplot2 v.3.2.1 (Wickham, 2016). Only SNPs with no missing data were used for the aforementioned analyses (Supplementary Table S2).
We further assigned individuals to genetic groups using two clustering-based approaches: TESS3 v.1.1.0 implemented in the R package tess3r (Caye et al., 2016) andSTRUCTURE v.2.3.4 (Pritchard et al., 2000). For TESS3, we used a dataset, including a single SNP per RAD-seq locus (in total 7,935 SNPs) with maximum 10% missing data, whereas 1,000 SNPs were randomly selected using a custom script (Supplementary Notes S1) from this dataset for STRUCTURE analyses. TESS3 is a spatially explicit clustering program, and therefore, it is more robust when geographic discontinuities are present in the sampling. Furthermore, TESS3 can accommodate data with thousands of SNPs and remains faster as compared to STRUCTURE. Importantly, like STRUCTURE, TESS3 can analyze polyploid data, which is otherwise currently not possible in other fast algorithms, such as FastStructure (Raj et al., 2014) or Admixture (Alexander and Lange, 2011). In TESS3, the clustering was performed from K1 to K10 with ten replicates of each K. The final choice of K was based on cross validation criteria. Our objective using STRUCTURE was to confirm the individual assignment inferred by TESS3. We ran STRUCTURE with 20 replicates for each K value ranging from K = 1 to K = 6, with a burnin period of 50,000 and 100,000 steps. The optimal number of ancestral populations from STRUCTURE results was determined using Evanno's ΔK method (Evanno et al., 2005) as implemented in STRUCTURE HARVESTER (Earl and vonHoldt, 2012), and the replicate runs were summarized using CLUMPAK online server 3 (Kopelman et al., 2015).
To quantify the strength of genetic differentiation among all sampling localities as well as among the K groups identified as above, we estimated pairwise F ST with GENODIVE (v.3.0). For F ST estimation between K groups, we only included those individuals where the individual assignment to respective groups was greater than 0.8 at the optimal value of K.

Spatial vs. Environmental Contribution to Genomic Divergence
To disentangle the roles of dispersal limitation vs. environmental drivers in shaping the genomic differentiation of A. tinctoria, we first identified the most important climatic variables for our target species using a gradient forest (GF) analysis in the R package gradientForest v.0.1-17 (Ellis et al., 2012). As a nonparametric, machine learning regression tree approach, GF evaluates the associations between genetic variants and other variables by partitioning allele frequency data at split values along the covariate gradients and defining the explained variability as split importance values. The overall importance of each covariate can then be assessed from GF plots which show the mean importance of the covariate weighted by R 2 (Bay et al., 2018;Jia et al., 2020;Zhang et al., 2020). For GF analysis, we retrieved 19 bioclimatic variables for our sampling locations from the WorldClim database (Hijmans et al., 2005) at a spatial resolution of 2.5 arc-minutes. To avoid collinearity among covariates across our data points, the Pearson correlation coefficient between the covariates was estimated using the cor function implemented in the R package stats (R Core Team, 2020). We retained only one variable from all pairs having absolute correlation values greater than 0.9 using the findcorrelation command implemented in the R package caret v 6.0-86 (Kuhn et al., 2016). Finally, a total of seven covariates were used in the GF analysis. The three most important covariates identified by GF were then used in downstream analysis.
Next, we performed simple and partial Mantel analyses to test whether geographical or environmental processes contributed to the observed genome-wide divergence among A. tinctoria populations. The analyses were performed on pairwise genetic distances among populations as linearized F ST expressed as (F ST /1 − F ST ) and log-transformed geographic or environmental distances. Partial Mantel test was performed by controlling the respective geographical or environmental distance matrices. These analyses were carried out using GENODIVE with 999 permutations.
To further quantify the contribution of IBD and IBE, we employed a multiple matrix regression with randomization (MMRR; Wang, 2013) supplemented with a commonality analysis (CA) as implemented in Cuevas et al. (2020). MMRR is a multiple regression method where distance matrices are used as input to assess the contribution of multiple predictor variables (i.e., geography and environment) to patterns of genetic divergence. The fit of the model is assessed by R 2 (coefficient of determination), where the statistical significance of the effect of the predictor variable (βn) and R 2 are estimated by randomized permutations of columns and rows of the dependent variable distance matrix (i.e., genetic distances). However, this approach cannot take into account the multi-collinearity among predictors which can affect the estimation of regression coefficients (βn), R 2 and values of p (Prunier et al., 2015). To overcome this limitation, we followed the implementation of CA suggested by Prunier et al. (2015) and Cuevas et al. (2020). This analysis decomposes the model coefficients into unique and common variance components. The unique effect represents the sole contribution of the predictor, while the common component quantifies the proportion of variances contributed by the collinearity among two or more predictor variables. Here, CA allows estimating the unique and common effect of geographical and environmental distances on genomic divergence (F ST ) while accounting for collinearity among explanatory variables. We ran MMRR using pairwise F ST as a response variable and geographical and environmental distances as predictor variables (model = F ST ~ GEO + ENV) with 1,000 permutations. Variance partitioning analysis was then performed using CA to quantify the common and unique effect of GEO and ENV using a R-script available at https://datadryad.org/stash/dataset/doi:10.5061/dryad. q573n5th7 (Cuevas et al., 2020).
In addition, we identified loci associated with environmental variables using BayPass v.2.2 (Gautier, 2015). In a nutshell, BayPass can identify loci under selection using an F ST -like statistic (XtX) that explicitly accounts for the confounding effect of population structure. To do so, BayPass first estimates the covariance matrix of population-level allele frequencies (Ω matrix in Gautier, 2015), which can capture even complex demographic histories. Then BayPass detects outlier loci among variants exhibiting the highest XtX values. BayPass can also identify variants associated with population-specific covariates, including climate or phenotypic data (Gautier et al., 2018;Leroy et al., 2020). The program estimates Bayesian factors (BF) by comparing models with and without covariates. Neutral calibrations of the XtX and BF metrics were performed using pseudo-observed datasets (PODS), following the best practices for BayPass (Gautier, 2015). PODS were generated for 100,000 SNPs using BayPass_utils.R following the recommendations of the BayPass manual. We then used conservative (XtX > 99% quantiles and BF > 99.9%) and very conservative quantiles for this neutral simulation (XtX > 99.9% quantiles and BF > 99.9%) as thresholds to define outliers in the real dataset. All analyses were run using the standard covariate model (STD) and default parameters, and with scaled covariates, as suggested by Gautier (2015).

Genome Size and Chromosome Counts
The genome size of A. tinctoria was found to vary from 1.21 to 1.28 pg. (1C-value), whereas greater variation was observed among the A. sieberi samples, with sizes from 1.08 to 1.21 pg. (Table 1). Neither species showed apparent classes of 1C values suggestive of multiple ploidies, nor were there substantial differences in 1C values of geographic relevance. This suggests that both species might share the same ploidy. These results are the first reported for A. tinctoria and A. sieberi. Within the genus, the only yet published species' genome size is 11.10 pg./1C in A. leiocarpa (Bou Dagher-Kharrat et al., 2013), which is the approximately 10-fold compared to our findings in A. sieberi and A. tinctoria. Finally, chromosome counts for two A. tinctoria samples were performed and were consistent with 2n = 30 chromosomes in this species (Supplementary Figure S1).

Polyploid Genotype Calling Supports Tetrasomic Inheritance
After demultiplexing, RAD-seq data included on average 1.2 (SD ± 0.5) million high-quality reads per individual.
The optimized de novo assembly resulted in a catalogue of 33,126 polymorphic RAD tags of a length of 94 bp. On average, 44.0% (SD ± 8.5%) of the reads mapped back to these contigs with a final average coverage across individuals of 16× (SD ± 6×). Filtering of variants obtained from the EBG pipeline based on missing data, and minor allele frequency yielded 16,107 high-quality SNPs. The distribution of tetraploid genotypic frequencies against their allelic frequencies (estimated from EBG genotypes) in A. tinctoria showed the modes of the distribution of heterozygous genotypes (i.e., Aaaa, AAaa, and AAAa) to be of similar heights in agreement with the hypothesis of tetratomic inheritance (Figure 2; Arnold et al., 2015;Lloyd and Bomblies, 2016; see also section "Discussion" below).

Higher Diversity and Lower Divergence in Northern Greece
To investigate the level and distribution of genetic variation across the sampled localities, we estimated summary statistics. The genetic diversity estimates (π and He) revealed that the northern sampling localities harbor comparatively higher amount of diversity than the other sampled localities of A. tinctoria ( Table 2), possibly due to a larger effective population size (Ne) and more limited drift effects in the North. Nevertheless, on average diversity estimates per locality differed only slightly between the two species suggesting similar breeding systems and dispersal capacities within populations. Inbreeding coefficients (F IS ) estimated for the sampling localities were all relatively close to zero ( More surprisingly, genetic divergence between A. sieberi localities was also high (F ST = 0.333); however, one of the A. sieberi localities (AT27) showed lower differentiation to A. tinctoria populations than to the second (AT28) locality of A. sieberi (Supplementary Table S3).

North-South Population Structure in Greek Alkanna
The relatedness-based heatmap allows to identify three main genetic groups (Figure 3A). Two main groups were specific of A. tinctoria hereby called North and Southern-Central. Within the latter group, the analysis allows to find evidence of substructure, both at the regional scale (Central vs. Southern) and even sometimes, more locally, between the different sampling sites (e.g., AT25; Figure 3A). The third main group consisted of the A. sieberi locality from central Crete (AT28), which showed the least relatedness to all A. tinctoria accessions ( Figure 3A). However, interestingly, all individuals from the western Cretan locality (AT27) that have been initially identified as A. sieberi showed intermediate relatedness to the individuals of A. tinctoria from Southern-Central group and to A. sieberi FIGURE 2 | Population genomic data consistent with tetrasomic inheritance. Genotype frequencies (y-axis) as function of allele frequencies (x-axis) in A. tinctoria. The inset shows raw plotting of genotype and allele frequencies of A. tinctoria. The estimates are derived from 16,107 SNPs present across all individuals. For tetrasomic inheritance, all five genotypes are expected to be observe at intermediate (0.5) allele frequency. In addition, the three middle peaks are expected to be of similar height. The pattern observed here is largely consistent with tetrasomic inheritance (see section "Discussion"). from the central Cretan locality (AT28), therefore suggesting interspecific admixture. One accession from A. tinctoria from South Greece (AT01) shows also increased co-ancestry with A. sieberi (indicated with an arrow in Figure 3A), suggesting recent gene flow between A. tinctoria and A. sieberi.
To further investigate population structure, we performed PCA (Figures 3B,C), which highlighted consistent patterns with the previously identified clusters in the coancestry heatmap. The first two PCs, which together explained 27.6% of the total variance, primarily distinguished individuals of A. sieberi from the central Cretan locality (AT28), and those of A. tinctoria from the North on one side and from the Southern-Central localities on the other. The individuals from the western Crete population (AT27), plus the accession from AT01 already identified as admixed based on the coancestry heatmap, were located between the A. tinctoria Southern-Central group and the cluster formed by A. sieberi accessions from AT28 ( Figure 3B), again consistent with admixture. The central Greek localities were well resolved from Southern Greece in particular by PC3, but this axis explains a lower proportion of the total variation (5.6%, Figure 3C).
We then used Bayesian clustering to further investigate these patterns of genetic structure. For that purpose, we used the spatially-explicit TESS3 and the non-spatially explicit, albeit widely used, STRUCTURE programs. Both the cross-validation score as implemented in TESS3 and the use of the Evanno's ΔK statistic support K = 3 as the best K value (Supplementary Figure S2), suggesting that the dataset can be best explained by three main genetic clusters. For the optimal structure (K = 3), the clustering of individuals is consistent with all previously described analyses (i.e., the presence of three genetic groups), including the likely interspecific admixed nature of AT27 ( Figure 1A). Since heatmap and PCA showed substructure within the Southern-Central group, we further evaluated the individual memberships at K values from K = 4 to K = 5. As expected, at these values of K, a further separation of Southern and Central A. tinctoria was observed (Supplementary Figure S3), whereas eastern Crete accessions of A. sieberi showed signatures of admixture between the Southern-Central group and the central Crete population at K = 4 (Supplementary Figure S3) and formed an independent group at K = 5 ( Figure 1B).
Finally, to evaluate the strength of genetic differentiation between the identified groups, we estimated F ST between the three genetic pools (K = 3) with well-assigned individuals (q > 0.80). We observed a significant genetic divergence between the three groups (p = 0.001). Genetic differentiation was highest between AT28 and mainland groups (North: 0.29 and Southern-Central: 0.31) and lowest between Southern-Central and North group (0.14).

Geographical and Environmental Influences on Genomic Divergence Within Alkanna tinctoria
To assess the role of environmental and geographical variables in the onset of genetic differentiation, we first identified the most important climate-related covariates using a random forest-based approach called gradient forest (GF). Among the seven non-collinear covariates tested with GF, the top three covariates were related to precipitation, including precipitation of coldest quarter (BIO19), precipitation of warmest quarter (BIO18), and annual precipitation (BIO12; Figure 4A). We then used the top three climatic variables identified by GF in Mantel statistics and genotype-environment association analysis. The pairwise genetic distances (F ST /1 − F ST ) between populations were strongly correlated with pairwise geographical distances (R 2 = 0.49, p = 0.001).
We also observed a milder, but still significant association between the genetic and the environmental distance matrices (R 2 = 0.24, p = 0.002; Figures 4B,C). In multiple comparisons, where we controlled for either distances, the geographic distances were strongly and highly significantly associated with genetic distances (Mantel's r = 0.63, p = 0.001), whereas the association with environmental distances was weaker albeit significant (Mantel's r = 0.31, p = 0.020). The multivariate model (MMRR), including both predictors (F ST ~ GEO + ENV), explained 55% (R 2 = 0.55) of the genomic variation within the Greek A. tinctoria (Table 3). Consistent with the results from the partial Mantel analysis, in MMRR, the geographic distances showed a significant and a higher β weight (β GEO = 0.037, p = 0.002; Table 3), explaining 75% of total variation in the model. A considerable proportion of this total variability was assigned as unique (unique 54%; common 21%; Table 3). The environmental distances had a lower β weight (β ENV = 0.025, p = 0.003), explaining 45% of the variation in the model (unique 24%; Table 3). Taken together, these results suggest that IBD has a stronger contribution to the overall genetic differentiation among A. tinctoria populations as compared to IBE.

Genotype-Environment Associations
Although the overall population structure appears more explained by migration-drift equilibrium, some loci could be affected by selection either directly, or through linkage. We therefore performed a F ST outlier scan and a genotype-environment association analysis. To identify non-neutral markers, we performed a classical genome scan using the XtX statistics, an F ST analog which explicitly accounts for the confounding effect of population structure, including IBD. Using BayPass, we identified 107 SNPs (0.7% of all SNPs) with conservative criteria, including 17 (0.11%) using a more conservative criterion.
Then, under BayPass, we assessed associations of three top bioclimatic variables and identified between up to 0.14% of the SNPs with BF > 99.9% depending on the variable. Interestingly, among the 107 previously detected F ST outliers, several SNPs were associated with precipitation: Six SNPs were associated with annual precipitation (BIO12; BF > 99.9%, XtX > 99%), including one SNP among the 17 strongest outliers. Similarly, six and four SNPs were associated with precipitation of warmest quarter (BIO18) and of coldest quarter (BIO19), including three and one SNPs among the 17 strongest outliers, respectively (Supplementary Figure S4). Among all these outliers, three SNPs were associated with both BIO12 and BIO19 (Supplementary Figure S4). Clinal variation of allele frequencies across each clinal gradient follows linear trends as shown in Supplementary Figure S5.

DISCUSSION
RAD-seq is being commonly employed to study the intra-and interspecific genomic divergence, to investigate the levels of genomic diversity and to resolve phylogenomic relationships of diverged lineages (Paun et al., 2016;Rochette and Catchen, 2017;Brandrud et al., 2020). However, the use of RAD-seq in non-model polyploid species is still challenging due to difficulties of inferring allele copy numbers (Clevenger et al., 2015;Wagner et al., 2020). Thus, only recently, RAD-seq has started to be used for analyzing population structure of non-model polyploid plant species (Zohren et al., 2016;Brandrud et al., 2017). Here, using an appropriate ploidy-aware genotype calling approach (Blischak et al., 2018;Brandrud et al., 2020), we investigated the main drivers of genomic diversity and divergence of Greek populations of A. tinctoria -a species of particular interest in the traditional and modern pharmacopeia (Papageorgiou et al., 1999;Assimopoulou et al., 2011), jointly analyzed with two populations of a potential closely related outgroup species, A. sieberi. Overall, we detect a marked regional structure along a latitudinal gradient and highlight an important role for physical distance between the investigated sites in shaping genetic diversity. However, we also identified loci with variation related to ecological pressures, in particular related to precipitation, suggesting that both levels can be at play in shaping the population structure in A. tinctoria.

Genome Size Variation and Fixed Dysploidy at the Tetraploid Level
Considering all available evidence, including the newly described chromosome number (2n = 30), previous microsatellite analyses where up to four alleles were observed per individual (Ahmad et al., 2019), and previously reported base chromosome numbers in the genus (x = 7 most frequently reported, but also x = 10, 11 and the derived secondary base number of x 2 = 15; Coppi et al., 2006;Weigend et al., 2016), we inferred that A. tinctoria is a dysploid at the tetraploid level (4x + 2) derived from lineages/species with x = 7. Indeed, x = 7 is one of the widespread in Lithospermae (Weigend et al., 2016). A tetraploid state is also strongly supported by the present genomic analyses, in particular by the similar frequencies of Aaaa, AAaa, and AAAa genotypes when the data are scored as tetraploid (Figure 2). On the other hand, the relatively consistent genome size estimates within A. tinctoria, and the lack of substantial genome size differences between A. tinctoria and A. sieberi (Table 1) suggests that the sampled accessions of these two species represent a single ploidy. This is in agreement with previous karyological findings for A. tinctoria (Grau, 1968;Kamari and Papatsou, 1973;Luque, 1990;Markova, 1995), supporting a single ploidy across its entire distribution range. In tribe Lithospermae, several tetraploid species have been identified (Luque, 1990;Korbecka et al., 2003;Coppi et al., 2006;Kolarčik et al., 2014;Weigend et al., 2016). Moreover, both allopolyploids and autopolyploids have been described in this tribe (Luque and Valdes, 1984;Kolarčik et al., 2014Kolarčik et al., , 2015Weigend et al., 2016). In the present study, two lines of genomic evidence led us to suggest that A. tinctoria might be an autopolyploid. First, in allotetraploids, especially those with well-differentiated parents and disomic inheritance, the proportion of duplex heterozygous genotypes (e.g., AAaa) is expected to be significantly higher than other classes of heterozygous genotypes (e.g., AAAa and Aaaa). At the other end of a continuum, autotetraploids with tetrasomic inheritance are expected to show the reversed trend. In A. tinctoria, we observed that triplex heterozygous genotype class (i.e., AAAa and Aaaa together) is approximately two-fold more frequent than the duplex heterozygous genotypes (Figure 2). Second, for loci with intermediate allele frequency, allopolyploids are expected to be largely represented by AAaa genotypes because of fixed differences between divergent subgenomes (Lloyd and Bomblies, 2016). In contrast, in autopolyploids, all five genotype classes are expected to be observed around this allele frequency (Lloyd and Bomblies, 2016). When A. tinctoria genotype proportions were plotted against allele frequencies, all five classes of genotypes could be seen at intermediate allele frequency (Figure 2). This distribution is comparable to that described for autotetraploid Arabidopsis arenosa (Arnold et al., 2015) and it is expected under a tetrasomic mode of inheritance (see Figures 1C,D in Lloyd and Bomblies, 2016). However, we note that in depth cytological investigation would be required to confirm the hypothesis put forward here.

Spatial Structure in Greek Alkanna Is Associated With a Latitudinal Gradient
Genetic diversity within localities of A. tinctoria was moderate and mostly homogenous across the sampling range, with higher values in the North ( Table 2). Although not directly comparable, a previous study also reported moderate albeit similar levels of genetic diversity at inter simple repeat markers (ISSR) in three other Alkanna species (Semerdjieva et al., 2020). The relatively higher genetic diversity and lower genetic differentiation between sampling localities ( Table 2; Supplementary Table S3) observed in the North could be a result of higher connectivity of stands and larger Ne leading to lower effect of genetic drift as compared to central Greece where A. tinctoria has a much more fragmented distribution. During field visits, we observed an absence of A. tinctoria from several regions located around East-Central Greece (e.g., Lárisa, Magnisia, and Fthiotida) probably due to a change of management practices (conversion into agricultural lands), or due to expanding urbanization over natural habitats. Reduced connectivity of suitable habitats could lead to genetic isolation (Toczydlowski and Waller, 2019) and thus expose populations to greater effects of genetic drift. Consistent with an increased effect of genetic drift in more isolated localities, we observed highest amount of F ST as well as reduced genetic diversity at locality AT25 in central Greece which appeared disconnected from other populations by vast agricultural landscape. Genetic structure analyses identified two divergent A. tinctoria clusters, localized around Northern and, respectively, Southern-Central Greece, with a weaker subdivision between South and Central accessions (Figures 1, 3), consistent with a latitudinal structuring of regional genetic variation. The strength of divergence was further confirmed by a significant F ST (0.14, p = 0.001) estimated between these main genetic pools. Although investigating different regions and spatial scales, other studies also found evidence for significant genetic structure in other Alkanna species (Wolff et al., 1997;Semerdjieva et al., 2020).
Genetic structure, such as that observed in A. tinctoria, is possibly explained by the establishment of populations from multiple glacial refugia as have been reported for other Balkan and Mediterranean regions (for a review see Nieto, 2014). However, this might be a less likely explanation for the Greek A. tinctoria, as currently available phylogeographic or phylogenetic studies on plants (Lakušić et al., 2013;Jaros et al., 2018;Janković et al., 2019) from Greek part of Balkan rather provide indications for an East-West genetic differentiation (Španiel et al., 2017;Falch et al., 2019). But what might have been the forces giving rise to the North-South clustering in Greek A. tinctoria? Although not explicitly tested in the current study, the identified genetic structure in mainland localities of A. tinctoria mirrors the Greek phytogeographic zones suggested in Flora Hellenica (Dimopoulos et al., 2013(Dimopoulos et al., , 2016Strid, 2016). For example, the northern lineage of A. tinctoria belongs to the North-East (NE) phytogeographic region, whereas the majority of the southern-central localities pertain to the Sterea Ellas (StE), except two populations which were collected at the boundary between East-Central (EC) and StE (AT17 and AT19), and between Peloponnisos (Pe) and StE (AT07 and AT08). Botanical exploration and biosystematics studies have shown consistent discontinuities in species composition, richness, and endemism patterns between StE and NE (Strid, 1996(Strid, , 2016Dimopoulos et al., 2013Dimopoulos et al., , 2016. These phytogeographical regions take into account biogeography and have been mainly defined by network biogeography (Kougioumoutzis et al., 2017), or comparative floristics (Strid, 1996). Hence, these divisions reflect the paleogeographic history of those regions and dispersal limitations of organisms owing to geographic as well as ecological barriers. In the case of Greece, this could include the complex geological history of the Aegean region (e.g., fragmentation of mainland and subsequent island formation including the Aegean archipelago), topographic heterogeneity (numerous mountain massifs and mountain ranges separated by rivers, valleys, plains, and lowlands), varied climatic (descending rainfall and ascending temperature gradients from north to south), and soil features (Dimopoulos et al., 2016;Perlès, 2016). Hence, these phytogeographic regions represent a conglomerate of many factors (including geological, geographical, and environmental covariates), which might have shaped the observed genetic differentiation in A. tinctoria.

Dispersal Limitation as a Primary Source of Genetic Divergence in A. tinctoria
The high relatedness of geographically proximal individuals, as well as the above-mentioned clustering patterns (Figures 1, 3), suggest that IBD might have played a major role in shaping the observed genetic differentiation. Indeed, we inferred a significant relationship between geographical and genetic distances even after controlling for climatic covariates ( Figure 4B; Table 3). In A. tinctoria, seed dispersal is limited to very short distances (Ulrych and Szabóová, 2010), with pollen exchange likely promoting almost exclusively gene flow between localities. Although no detailed study has been carried out on pollination in A. tinctoria, existing evidence suggests that pollination is performed by bees (Ulrych and Szabóová, 2010;Dafni et al., 2012), insects with limited potential for long-distance dispersal, especially in areas with complex topography, such as Greece, where plains are separated by mountains and sea. Such a pattern could therefore be consistent with IBD, i.e., a migration-drift equilibrium hypothesis. Furthermore, the southern Balkan Peninsula has been recognized as an important glacial refugium for Mediterranean biota (Médail and Diadema, 2009), opening the possibility that A. tinctoria existed in this region for a long period. A long local history together with limited dispersal, patchy distribution and topographic barriers, could have led to the observed pattern of IBD.

Climate as a Second Source of Population Structure in A. tinctoria
Climatic variables have been implicated as strong selective factors in plants (Joshi et al., 2001). Understanding the association of climatic covariates with the observed genetic divergence can prove useful in determining a species' ability to persist against varying climatic scenarios (Jia et al., 2020). Our study identified significant associations between environmental variables and genetic differentiation after controlling for spatial covariates ( Figure 4C; Table 3). Specifically, GF analysis indicated that the genetic structure across our sampling localities is strongly associated with precipitation characteristics (Figure 4A). The Mediterranean climate is mainly characterized by hot, dry summers and cold, comparatively wet winters (Joffre et al., 1999). In this environment, the precipitation regime imposes strong constraints on plants and can act as an important evolutionary driver (Galmés et al., 2007). A. tinctoria is habitually confined to sandy soils and rocky places, xeric grasslands, and coastal habitats, highlighting the sensitivity to water availability, and autumnal wetter periods have been shown to coincide with rigorous plant growth for this species (Pluhár et al., 2001). Moreover, rainfall during late winter is important for early spring flowering in many plant species (Moore and Lauenroth, 2017). Therefore, precipitation characteristics appear to play an important role in promoting local adaptation and shaping genetic differentiation in A. tinctoria. Additional support for this hypothesis is also offered by the identification of outlier loci associated with rainfall (Supplementary Figure S4). Although the identified loci are few and do not entirely deviate from the neutral expectations, some of these loci show substantial allele frequency changes associated with the respective covariates (Supplementary Figure S5), pointing to adaptive genetic differentiation.

Genomic Divergence and Introgression in Mainland-Island Congeners of Alkanna
Our investigation of genetic differentiation detected a high genetic differentiation between mainland A. tinctoria and A. sieberi accessions from one of the two sampled Cretan localities (AT28; Figures 1, 3; average F ST = 0.271, p = 0.001) after excluding putatively admixed individuals. Phylogenetic and historical biogeographical studies have shown that the diversification in islands mainly occurred via vicariant speciation that is taxa become isolated owing to formation of barriers (Comes et al., 2008;Cellinese et al., 2009;Poulakakis et al., 2015;Sfenthourakis and Triantis, 2017;Jaros et al., 2018). However, secondary colonization events and gene flow between divergent populations from mainland or neighboring islands during sea level fluctuations and recurrent land bridge formation have also been proposed in other species (Li et al., 2010;Zhai et al., 2012). Due to lack of dated phylogenomic studies in genus Alkanna and the absence of samples of A. tinctoria from Crete do not allow us to predict whether the observed genetic split between mainland-island congener is a consequence of strictly vicariant divergence, or resulted from the colonization event(s), as the later has been hypothesized for some Cretan (sub)species (Cellinese et al., 2009;Jaros et al., 2018). In the future, a more wide sampling of each taxon is needed to understand whether the observed pattern follows the allopatric model of vicariant divergence, or other processes (i.e., post-fragment colonization) are also at play.
Intriguingly, in addition to genetic divergence between mainland-island congeners, our data identify significant introgression between the divergent southern lineage of A. tinctoria and the Central population from Crete. We identified via traditional and Bayesian clustering that the investigated accessions from AT27 showed an admixed ancestry between the two species (Figures 1, 3). In addition, AT27 formed its own independent clade on the third PC, as well as in the STRUCTURE and TESS3 results at K = 5 (Figure 1B), but showed comparatively lower heterozygosity. In turn, this implies that the admixture event might be old and further point toward a complex demographic history where the admixed population might have experienced bottlenecks during colonization phase resulting in the reduction of genetic diversity.

CONCLUSION
In the present study, we provide evidence that only dysploid tetraploid A. tinctoria and A. sieberi are prevalent in Greece, the main distribution area for these species. Our further population genetic inferences start from ploidy-aware genotype calling and successfully employ a strategy that can be extended to other polyploid non-model plant species, for example, in order to develop hypotheses regarding mode of inheritance of polyploids. Overall, we found a moderate intrapopulation genetic diversity in A. tinctoria, but significant regional differentiation, together with clear evidence of admixture between mainland and island congeners. Furthermore, the strong genetic structure uncovered is in line with native phytogeographical divisions and appears mainly shaped as a result of dispersal limitations and drift, but also of isolation by environment, with precipitation as the main contributing factor. Finally, the genomic variation revealed here can be used to further inform and predict the structure of phenotypic variation (e.g., alkannin and shikonin production) in this important medicinal plant, prompting to future studies.

DATA AVAILABILITY STATEMENT
The raw sequence reads can be found at NCBI (https://www. ncbi.nlm.nih.gov/) under the accession number PRJNA705074.

AUTHOR CONTRIBUTIONS
MA, ES, CL, and OP planned and designed the research. NK collected the samples. ET and HW-S performed flow cytometry and chromosomal counts. MA prepared the RAD-seq libraries and analyzed the data with support from OP and TL. MA, ES, TL, and OP interpreted the results. MA and OP wrote the manuscript with the input from all authors. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
This article was dedicated to CL, a wonderful scientist and mentor that passed away too early. We thank Angela Sessitsch and our partners in MICROMETABOLITE project for obtaining the funding, Eleni Maloupa, Andreana Assimopoulou, and Nikolas Fokialakis for provision of plant material used in this study, Karin Hansel-Hohl for her support in the wet laboratory, Antonielli Livio and Silvia Madritsch for initial help with Bash and R scripts, and Paul Blischak for advices on EBG implementation.