The Two Sides of the Mediterranean: Population Genomics of the Black Sea Urchin Arbacia lixula (Linnaeus, 1758) in a Warming Sea

Global environmental changes may have a profound impact on ecosystems. In this context, it is crucial to gather biological and ecological information of the main species in marine communities to predict and mitigate potential effects of shifts in their distribution, abundance, and interactions. Using genotyping by sequencing (GBS), we assessed the genetic structure of a keystone species in the Mediterranean shallow littoral ecosystems, the black sea urchin Arbacia lixula. This bioengineer species can shape their communities due to its grazing activity and it is experiencing an ongoing expansion with increasing temperatures. The population genomic analyses on 5,241 loci sequenced in 240 individuals from 11 Mediterranean sampled populations revealed that all populations were diverse and showed significant departure from equilibrium. Albeit genetic differentiation was in general shallow, a significant break separated the western and eastern Mediterranean populations, a break not detected in previous studies with less resolutive markers. Notably, no clear effect of the Almería-Oran front, an important break in the Atlanto-Mediterranean transition, could be detected among the western basin populations, where only a slight differentiation of the two northernmost populations was found. Despite the generally low levels of genetic differentiation found, we identified candidate regions for local adaptation by combining different genomic analysis with environmental data. Salinity, rather than temperature, seemed to be an important driver of genetic structure in A. lixula. Overall, from a population genomics standpoint, there is ample scope for A. lixula to continue thriving and adapting in the warming Mediterranean.


INTRODUCTION
The Mediterranean is a sea under siege (Coll et al., 2012). In this highly anthropized basin, the sum of direct human impacts, introduced species, and ongoing global change is altering the communities, ecosystem services, health and economic value of the marine environment (Linares et al., 2020). This sea acts as a miniature ocean that has been impacted by climate change (Lejeusne et al., 2010), as illustrated by mass mortality events correlated with thermal anomalies (such as heat waves or lengthening of summer temperatures) (Rivetti et al., 2014;Garrabou et al., 2019), or multi-species collapses related to exceptional thermal conditions in the Levant basin (Rilov, 2016). Furthermore, the ongoing warming trend of the surface seawater of the Mediterranean (Shaltout and Omstedt, 2014) can favor the expansion of non-indigenous species with tropical affinities (Raitsos et al., 2010;Bianchi et al., 2018), and also the spreading of native warm-affinity biota toward northern waters of its basin, a process called "meridionalisation" of the Mediterranean Sea (Bianchi, 2007;Parravicini et al., 2015). Some examples of these native "northward travellers" are the ornate wrasse Thalassoma pavo (Lowe, 1843), the orange stony coral Astroides calycularis (Pallas, 1766) or the black sea urchin Arbacia lixula (Linnaeus, 1758) (Bianchi and Morri, 1994;Francour et al., 1994;Bianchi, 2007;Gianguzza et al., 2011;Musco et al., 2016).
It is imperative to gather biological and ecological information of the main actors, or keystone species, in Mediterranean marine communities to predict and mitigate potential effects of shifts in their distribution, abundance, and interactions (Vergés et al., 2014) as well as to understand the evolutionary, ecological and environmental drivers of population structure. Sea urchins are amongst the most important keystone species in shallow littoral communities, directly related to high-value ecological and societal services. Sea urchins are landscape-changing, bioengineer species whose grazing activity can dramatically impact the algal communities dominant in shallow waters (Lawrence, 1975;Sala et al., 1998;Ling et al., 2009). Sea urchins provide a landmark instance of tipping points, as density increases beyond given thresholds result in collapse of macrophyte assemblages and the generation of barren grounds (Filbee-Dexter and Scheibling, 2014). In turn, the tipping points depend on community characteristics, such as nutrient availability (Boada et al., 2017) or human-derived stressors (Ling et al., 2015). Barren grounds are resilient states, and a decrease of sea urchin abundance below the tipping point is typically insufficient to revert to the macroalgal-dominated landscapes (hysteresis, Ling et al., 2015).
The two dominant sea urchins in shallow littoral communities are the purple (or common) sea urchin Paracentrotus lividus and the black sea urchin Arbacia lixula (Gianguzza, 2020). Both have different biological and ecological characteristics relevant for their potential impact. They have different feeding habits: P. lividus feeds preferentially on macroalgae, while A. lixula is omnivorous and has a catholic diet (Wangensteen et al., 2011;Agnetta et al., 2013). The two species interact in the formation of barren patches of substrate (Bulleri et al., 1999;Bulleri, 2013;Agnetta et al., 2015), but several studies show that A. lixula can have a leading role in the creation and maintenance of these barrens (Bulleri et al., 1999;Bonaviri et al., 2011;Gianguzza et al., 2011). They also differ in their potential responses to ongoing warming, with P. lividus recognized as a temperate water species (Boudouresque and Verlaque, 2020) and A. lixula as a thermophilous species (Mortensen, 1935;Tortonese, 1965;Gianguzza, 2020). Several studies have confirmed experimentally the warm-affinity of A. lixula and the positive role of temperature in the reproduction and early lifestages of the species Privitera et al., 2011;Wangensteen et al., 2013a,b). Transcriptomic analyses also reinforced this view (Pérez-Portela et al., 2020). Whereas its abundance is higher in the warmest regions (and subregions) of the Mediterranean basin (Guidetti and Dulèić, 2007), its frequency in the coldest regions (north-western Mediterranean) is increasing in the last decades (Francour et al., 1994;Wangensteen et al., 2013a). In contrast, the temperate species P. lividus is less tolerant to warming temperatures (Spirlet et al., 2000;Shpigel et al., 2004;Rilov, 2016). Thus, a scenario of progressive replacement of the common sea urchin by the black sea urchin in the Mediterranean rocky habitats is envisaged in the short-middle term Privitera et al., 2011). This replacement can be further enhanced by the fact that P. lividus is more prone to predation than A. lixula (Guidetti, 2004;Guidetti and Dulèić, 2007;Gianguzza et al., 2010). In addition, P. lividus is subject to heavy harvesting for human consumption (Boudouresque and Verlaque, 2020), while A. lixula is not an appreciated food item.
Population genetic studies provide crucial information about the health status of species, their biogeography, and connectivity between populations, which are key parameters to understand their biology and to predict future changes (Burton, 2009). Wangensteen et al. (2012) used mitochondrial sequence data to analyze populations in the complete distribution range of A. lixula, and important differences were detected between populations in western Atlantic, eastern Atlantic, and Mediterranean. However, no differentiation was detected among Mediterranean populations. Using microsatellite data, Pérez-Portela et al. (2019) analyzed populations from the western Atlantic and Mediterranean, and confirmed an important phylogeographic break between these two areas, with no significant differentiation between western and eastern Mediterranean basins. This led to a picture of panmixia in the Mediterranean for this species, which is in accordance with its long larval lifespan (29-36 days, Pedrotti, 1993), providing a wide dispersal potential. Nonetheless, the ability to detect population genetic differentiation is influenced by the number of markers (Clusa et al., 2018). Consequently, population genomics offer the possibility to better evaluate connectivity and to identify environmental drivers of adaptation Torrado et al., 2020).
In the present study we use a panel of genome-wide markers to analyze the genetic structure of Mediterranean populations including evolutionary and environmental drivers of this keystone species. We used Genotyping-by-Sequencing (GBS) procedures to identify thousands of loci, both neutral and putatively under selection. Specifically, we want (1) to generate a set of genomic loci with polymorphic sites for this species, (2) to detect outlier loci and loci associated to environmental variables, (3) to assess the genetic structure of A. lixula populations in the Mediterranean Sea, including the transitional area at the Alboran Sea, and (4) to evaluate the present status of genetic diversity and population connectivity of the expanding black sea urchin in the Mediterranean Sea.

Sampling
Between 20 and 23 individuals of Arbacia lixula were sampled by scuba diving, from autumn 2014 to summer 2015, from 11 locations spanning the Mediterranean basin (for a total of 240 individuals, Table 1 and Figure 1). Western Mediterranean samples included three populations from the transitional zone of the Alboran Sea: Tarifa (TA), Torremuelle (TO), and La Herradura (LH), plus Carboneras (CA), Palos (PA), Xàbia (XA) and Colera (CO) from the remaining Iberian coastline and Cap Bon (CB) at the north-eastern margin of the Gulf of Tunis. Eastern Mediterranean localities were Murter (MU) and Otranto (OT) from the Adriatic Sea, and Dalyan (DA) from the eastern side of the Aegean Sea (Figure 1). This sampling spans ca. 3,100 km (shortest distance by sea) from TA to DA. The layout allows testing the influence of the main oceanographic barriers in the area (Pascual et al., 2017): The Almería-Oran front between Atlantic and Mediterranean, the Ibiza Channel separating North and South of the Iberian Peninsula, the Siculo-Tunisian divide between western and eastern Mediterranean, and the Otranto Strait separating the Adriatic Sea.
Sea urchins were dissected in situ to extract Aristotle's lantern and were stored in absolute ethanol. Between 10 and 15 mg of muscle tissue of the Aristotle's lanterns was used to extract the DNA using QIAamp R DNA Mini Kit (QIAGEN) following manufacturer's instructions.

Genotyping by Sequencing and Loci Selection
Between 0.58 and 3.02 µg (mean ± SD = 1.96 ± 0.43) of genomic DNA (with ratio 260/280 ranging from 1.61-2, and ratio 260/230 ranging from 1.23-2.77) were used to perform a Genotyping by Sequencing protocol (GBS, Elshire et al., 2011) at the "Centre Nacional d'Analisis Genòmica" (CNAG-CRG, Barcelona, Spain). We selected the restriction enzyme ApekI based on digestion profile sizes of the genome of A. lixula (data not shown). The resulting fragments were ligated to individual barcodes and to a common adapter. A PCR was performed for fragment enrichment before paired-end sequencing of 2 bp × 100 bp fragments in an Illumina HiSeq 2000 platform using 96 multiplexed individuals in each lane of a flow cell. Each individual was sequenced in different lanes to obtain a mean number of reads of ca. 4 million.
The GIbPSs toolkit (Hapke and Thiele, 2016) was run to analyze the raw Illumina sequences. This program has already been used in previous studies of marine invertebrates (Casso et al., 2019;Carreras et al., 2020) and has proven its suitability to analyze GBS paired-end sequences of non-model organisms. Preprocessing of the sequences (which include trimming, filtering and assembling of paired-end sequences), locus identification (formation of the svars, definition of loci and alleles), and analysis of inferred loci (checking for indels and discarding deeply sequenced loci) were carried out following methods in Carreras et al. (2020).
In short, all raw sequences were trimmed to 80 bp. Filtering values were set to discard sequences shorter than 32 bp, with N's or with a phred score quality lower than 22 in a sliding window of 5 bp. A minimum overlap of 5 bp was required to assemble forward and reverse sequences of a paired-end read. Duplicate sequences originated in loci shorter than the read length were identified and only the forward read was kept for further analyses . Svars were identified for each individual separately and grouped into loci using the default distance settings, requiring at least five sequences for a locus to be deemed as valid. Note that all the polymorphic sites in each locus were used for identifying alleles. We thus used sequence variants (multiallelic), where each allele at a given locus is defined as the whole sequence with all polymorphic sites, not just one SNP per locus (biallelic). Multiallelic markers have been shown to provide more information and resolution power than biallelic markers (Ryman et al., 2006), and empirical tests on population genomics of different marine organisms have demonstrated this superior resolution on GBS data (Casso et al., 2019;Carreras et al., 2020). A global loci catalog was assembled with all the alleles found in all loci when combining all the individuals. Loci flagged by the program as potentially including an indel or with more than two alleles per individual were discarded, and deeply sequenced loci (those with a median depth percentile above 0.2, likely corresponding to paralogous regions with multiple copies) were likewise removed. A final locus selection was done to retain only loci present in at least 170 individuals (70% of the total dataset).

Population Structure
The "pegas" R package (Paradis, 2010) was used to evaluate Hardy-Weinberg Equilibrium (HWE) for each locus in each sampled site, and significant departures were identified. A Bonferroni correction was not applied since it dramatically increases the probability for type II errors (Cabin and Mitchell, 2000;Moran, 2003). Instead, we applied a Benjamini and Yekutieli (B-Y) false discovery rate (FDR) correction for multiple comparisons (Benjamini and Yekutieli, 2001) as detailed in White et al. (2019) at an overall corrected 0.05 α-level. Any locus with significant departures from HWE in seven or more sampling sites (>60%) was removed from the final dataset. Pairwise genetic distances among sampling sites were assessed using the Weir and Cockerham (1984) estimator of F ST using the "hierfstat" R package, with 999 permutations to obtain the respective p-values (Goudet, 2005) and significance thresholds were set after B-Y FDR correction as detailed above. The results were plotted with heatmaps and dendrograms using the "gplots" R package (Warnes et al., 2016). Genetic diversity values (observed and expected heterozygosity and allelic richness) and F IS values per sampling site were calculated using the R packages "genetics" and "hierfstat" (Goudet, 2005;Warnes et al., 2019). The significance of the inbreeding coefficient was tested with function boot.ppfis and 999 permutations. If the generated values did not include 0, the corresponding F IS was deemed significant.
A discriminant analysis of principal components (DAPC) was carried out with the "adegenet" R package (Jombart, 2008) using localities as prior groupings. We used the function "xvalDapc" to determine the number of PCs to retain and three discriminant functions using localities as prior grouping information. In order to ascertain whether there is genetic differentiation between the two major groups of sampled sites, western (TA, TO, LH, CA, PA, CB, XA, and CO) and eastern (OT, MU, and DA) Mediterranean (see section "Results"), an analysis of the molecular variance (AMOVA) was performed, grouping populations in these regions, with the program Arlequin version 3.5.2.2 (Excoffier and Lischer, 2010). Two further AMOVAs were performed with the western Mediterranean populations to test for differentiation between the Alboran Sea and the remaining localities, and between the two northernmost western localities (XA and CO, see section "Results") and the remaining populations.
We also performed a STRUCTURE analysis, which implements a Bayesian clustering method to identify the most likely number of genetic groups (K) even in the presence of linked loci (Falush et al., 2003). Following Evanno et al. (2005), we carried out 10 runs per each value of K ranging from 1 to 11. We used the model of correlated allele frequencies and a burn-in of 50,000 steps followed by 500,000 Markov Chain Monte Carlo steps. We estimated the statistic K to infer the most likely number of groups using STRUCTURE HARVESTER (Earl and vonHoldt, 2012). The 10 runs of STRUCTURE for the most probable K were averaged using CLUMPP version 1.1.2 (Jakobsson and Rosenberg, 2007).
A Mantel test was done to test isolation by distance in our dataset using the "vegan" R package (Dixon, 2003;Oksanen et al., 2019) considering the shortest distance by sea among sampling localities, measured with Google Earth. In order to avoid the influence of any possible inter-basin genetic structure, the Mantel test was repeated considering only sampling sites of the western Mediterranean (TA, TO, LH, CA, PA, CB, XA, and CO).
A network analysis was done using the F ST matrix with the program EDENetworks (Kivelä et al., 2015). We used the F ST values among populations as the dissimilarity matrix and then transformed them into a network. We chose the automatic thresholding option, that sets the maximal distance values required to keep an edge just above the percolation threshold (at which the network breaks down into its main components). Finally, geographic coordinates of sampling populations were introduced to adjust the network to the actual geographic location.

Environmental Data
Both monthly mean sea surface temperature and salinity data were obtained from Copernicus Marine Environment Monitoring Service database (CMEMS 1 , product identifier: MULTIOBS_GLO_PHY_REP_015_002) using the R package "ncdf4" version 1.16 (Pierce, 2017) from January 1987 to December 2015 and averaged.
We used the envfit function of the vegan R package (Oksanen et al., 2019) to fit the environmental variables into the DAPC ordinations as vectors, whose direction indicated the main gradient of change in the environmental variable and whose length was scaled to reflect the correlation between the spatial configuration and the environmental variable. The significance of these correlations was assessed with 999 permutations of environmental variables and using the squared correlation coefficient (r 2 ) as a goodness of fit statistic.
In addition, for each abiotic variable, we computed distance matrices using the Euclidean distances for the mean, maximum, minimum and range of the variable. These distance matrices were compared individually with the F ST matrices using partial Mantel tests as implemented in "vegan." The geographic distance matrix was used as a controlling variable, as both temperature and salinity have an east-west pattern of variation in the Mediterranean (Coll et al., 2010) that can co-vary with geographic distance among localities (as it is mostly due to the longitudinal span of the samples).

Outlier Loci
Candidate outlier loci were identified using two different programs: Arlequin (Excoffier and Lischer, 2010) and BayeScan (Foll and Gaggiotti, 2008). The analysis with Arlequin was performed using 100,000 coalescent simulations and 100 demes in a finite Island model. Benjamini-Yekutieli FDR corrections were applied to the p-values as mentioned above. BayeScan search was implemented with a total of 150,000 iterations, with a burnin period of 100,000 and 40 pilot runs of 10,000 iterations each. Outlier loci in BayeScan were identified using a qval of 0.05. In order to avoid false positives only loci detected with the two programs were kept as candidate outliers.
Two different sample groupings were used to detect outlier loci: "All populations, " in which all samples were included and grouped by sampling site to obtain a general overview of the outlier loci across the sampled geographical range; and "West vs. East" grouping, in which all samples were included in two groups according to their geographical distribution, western or eastern Mediterranean. We created two datasets using the results of these analyses. First, we defined as the "outlier" dataset the one comprising the outlier loci found by any of the two groupings ("All populations" or "West-East"). Second, we defined a "neutral" dataset in a conservative way by removing from the general dataset the loci of the "outlier" dataset plus any outlier found in any of the two groupings by any of the two programs using a conservative α < 0.05 for the p-value provided by Arlequin and the q-value provided by BayeScan. In this way we made sure that no potentially selected locus was included in the "neutral" dataset.
The analyses of population structure carried out with the complete loci dataset (DAPC, F ST , comparisons with environmental variables) were repeated using only the "outlier loci" and the "neutral loci" datasets.

Redundancy Analysis
We used a Redundancy Analysis (RDA) (Forester et al., 2018) in order to identify candidate loci significantly associated to environmental variables (association loci). As this type of analysis needs a SNP biallelic dataset, we selected from our multiallelic variant dataset only the first polymorphic position (SNP) of each locus, using the option -uS 2 of GIbPSs (Hapke and Thiele, 2016). Given the restriction of the analysis that does not allow missing data, we imputed missing genotypes by replacing them with the most common genotype across all individuals. Finally, we identified significantly associated loci with the "rda" function of the R package "vegan" (Oksanen et al., 2019) using as response variable the matrix of SNP genotypes of all loci and the environmental variables as predictors.

Locus Identification and Diversity Estimates
A mean of 4,043,922 reads per individual were obtained. A dataset of 5,241 loci was retained after all filtering steps and used for the analyses. A genepop file with the allele composition of all individuals is available in Supplementary File 1. A list of all alleles with their sequences is presented in Supplementary File 2. In order to link loci names in both files an equivalence table is provided as Supplementary File 3. The raw sequence data was uploaded to the SRA repository (Bioproject PRJNA746276).
The length of the selected loci ranged from 35 bp to a maximum of 152 bp (mean ± SD = 124.44 ± 35.26 bp). Thus, there was a wide range of alleles per locus (from 2 to 190; mean ± SD = 25.77 ± 21.59), with 27 loci having only two alleles and 68 having more than 100 alleles (Supplementary Figure 1). Table 1 shows the main features of the sampled populations. The mean number of alleles per locus per sampling site was similar (mean ± SD = 7.20 ± 0.14), ranging from 6.84 (Colera) to 7.33 (Xabia). The number of private alleles per locus (mean ± SD = 1.20 ± 0.048) ranged from 1.127 (Colera) to 1.275 (Murter).
There was a deficit of heterozygotes at all sampling sites (Ho = 0.379 ± 0.002 and He = 0.441 ± 0.002, mean ± SD). F IS values were in all cases positive and low (mean ± SD = 0.141 ± 0.006), being nevertheless significant at all sites. Table 1) were very low in general with values near zero or even negative (mean = 0.00228). Nonetheless, comparisons among western (TA, TO, LH, CA, PA, CB, XA, and CO) and eastern (OT, MU, and DA) Mediterranean sampling sites showed the highest genetic distances (mean = 0.00464, Supplementary Table 1 and Figure 2), and were all significant. On the other hand, pairwise F ST comparisons within the two major regions had lower values (among western localities, mean = 0.0005; among eastern localities, mean = −0.0001) and were mostly not significant, albeit XA and CO, the two northernmost sites of the western Mediterranean, showed slightly higher genetic distances with the rest of the western Mediterranean sites (mean = 0.0012). The transitional localities found in the Gibraltar area and the Alboran Sea (TA, TO, and LH) did not show any significant differentiation with the remaining western Mediterranean populations, except for the comparison XA-TO. Likewise, the heatmap and the dendrogram based on F ST population pairwise distances showed two major groups, western and eastern Mediterranean, with XA and CO forming a clade slightly separated from the rest of the western sampling sites (Figure 2).

Pairwise F ST values (Supplementary
The two major groups are also evident in DAPC results (Figure 3). Western sites appear clustered in one extreme and the eastern localities in the other extreme of the first axis. Within the western group, the Alboran Sea populations appear concentrated in one extreme of the first axis and XA and CO are placed on the other extreme, while CB (the easternmost population in the western basin) occupied a central position in the western Mediterranean cluster. Further, AMOVA results (Supplementary Table 2) showed significant genetic differences between the western and eastern groups of populations, although the percentage of variation explained was low (0.41%, p < 0.001). Likewise, an AMOVA of the western populations comparing the Alboran Sea with the remaining localities showed no significant differentiation between groups (p = 0.085), while a comparison between XA and CO with the other western Mediterranean populations explained a small but significant percentage of variation (0.06%, p < 0.001). In all AMOVAs, no differentiation was found among populations within groups, and most variance was concentrated within individuals (Supplementary Table 2).
The change in K of the STRUCTURE analyses (Supplementary Figure 2) pointed to the existence of three genetic clusters in our dataset. The plot of the assignment probabilities (Figure 4) showed that one of the clusters was mostly restricted to the eastern Mediterranean, with some presence in the northern populations of the western basin. The other two groups were present throughout the western Mediterranean.
The Mantel test between geographic and genetic distances was highly significant (r = 0.816, p = 0.003, Supplementary Figure 3). This effect, however, was due to the separation between eastern and western basin populations, which had both wide intervening distances and genetic differentiation. When the analysis was repeated with only the western populations, no relationship was found (r = −0.144, p = 0.260).
The results of the network analysis, superimposed to the geographic layout ( Figure 5) again reinforced the existence of two main groups of localities, linked by a weak edge between XA and OT. Accordingly, these two localities had the highest betweenness centrality, which reflects the importance of a node in connecting other nodes of the network.

Environmental Data
The environmental vectors plotted in the DAPC ordination in Figure 3 show that the mean temperature had the shortest vector length, while mean, minimal, and maximal salinity were the longest vectors, indicating higher correlation of salinity variables with the obtained ordination. The fit of the environmental variables to the ordination scores is presented in Table 2. All variables considered had a significant fit with the ordination based on genetic distances. However, the r 2 coefficients were much higher for salinity measures (above 0.78 for mean, maximal, and minimal salinity).
As expected, when using partial Mantel tests to compare distance matrices correcting for the effect of geographic distance  ( Table 2) the correlations between genetic and environmental distance matrices were lower, but they were still significant for the mean and maximal salinity measures, thus confirming that salinity had an effect on the genetic make-up of the populations, over and above the effects of geographic distance per se. No significant effect of temperature variables was detected after controlling for distance.

Outlier Loci
A total of 25 loci were identified as outliers (putatively under selection): 13 loci were found when considering all populations, these same loci plus another 12 were found when joining populations in western and eastern groups. Additionally, we ended up with a set of 4,540 neutral loci after removing 701 loci, including the 25 outlier loci mentioned above and 676 additional loci found by any of the outlier detection methods applied with a non-corrected threshold of 0.05. Hereafter we will call them the "neutral" loci and the "outlier" loci datasets. Supplementary File 3 provides the IDs of the loci belonging to both datasets. The F ST values among populations obtained using these datasets are presented in Supplementary Table 3. As expected, the mean F ST was an order of magnitude higher when using outlier loci than when using the neutral dataset (0.0669 vs. 0.0022), although the pairwise comparisons found to be significant with the neutral and FIGURE 5 | Network of the populations overlapped on the geographic map. The populations marked in red or orange (XA and OT) share the edge linking the two parts of the network just above the percolation limit, and have therefore the highest betweenness centrality. Edge thickness and color reflect the strength of the edges. Colors follow a jet colormap, with cold colors indicating weaker edges and hot colors stronger edges. outlier loci are mostly the same as those found with all the loci (involving basically comparisons of the eastern Mediterranean with the other localities). The population structure revealed by the neutral and outlier loci was similar to that obtained with the complete loci dataset, as shown by heatmaps (Supplementary Figure 4) and DAPCs (Figure 6), with a major break between the populations of the west and east of the Mediterranean (with XA and CO appearing as a separate group in the western Mediterranean cluster, Supplementary Figure 4). Both ordinations showed a similar pattern of correlation with the environmental variables as the complete dataset. The envfit analyses revealed a significant fit of all environmental vectors with both loci datasets (Supplementary Table 4). As before, the r 2 values were much higher for salinity (with the exception of salinity range) than for temperature. When partial Mantel tests were run to filter out the effect of geographic distance on the correlation between genetic and environmental distance matrices, only salinity (mean and maximal) showed a significant correlation with the genetic differentiation (F ST ) measures (Supplementary Table 4), as already detected with the complete dataset ( Table 2).

Redundancy Analysis
The first two axes of the RDA including all the samples explained an accumulated 38.43% of the genetic variability found (Figure 7). The first axis explained most of this variability and sorted the samples along a longitudinal gradient related mainly to salinity-related variables, with eastern localities at one extreme and western localities at the other. The second axis was related to temperature variables. A total of 243 loci were found to be significantly associated to the environmental variables tested, being salinity variables the ones with the higher number of associated loci (Supplementary Figure 5). Eight of these loci were also detected as F ST outliers in the previous analyses.

DISCUSSION
We have assessed the genetic structure of a keystone species in Mediterranean shallow littoral communities, the black sea urchin Arbacia lixula. This information is particularly timely in view of the ongoing and foreseen expansion of this highly impactful species on the wake of increasing temperatures Wangensteen et al., 2013a,b;Visconti et al., 2017;Gianguzza, 2020;Pérez-Portela et al., 2020).
While previous studies using less resolutive markers (Wangensteen et al., 2012;Pérez-Portela et al., 2019) failed to detect any significant genetic structure within the Mediterranean, thus leading to a picture of a panmictic population in this Sea, we found a significant signal of genetic differentiation between the western and the eastern Mediterranean basins. This signal was detected in spite of a context of overall low F ST values. The genomic approach, using thousands of markers, has the potential to reveal even subtle patterns of genetic structure (e.g., Bradbury et al., 2010Bradbury et al., , 2015Milano et al., 2014;Carreras et al., 2017Carreras et al., , 2020. In addition, our approach is potentially more informative than using only one SNP per locus, as we took into account all variable positions, resulting in multiallelic (mean of 25.77 alleles/locus considering all individuals) rather than biallelic markers, which provides higher resolution (Ryman et al., 2006;Casso et al., 2019;Carreras et al., 2020).
There was a high genetic diversity within populations, with a mean of ca. 7.2 alleles per locus and over 5,900 private alleles in each population. High levels of genetic diversity have been also found with nuclear (Pérez-Portela et al., 2019) and mtDNA markers in Arbacia lixula (Wangensteen et al., 2012), as well as in other sea urchin species using genomic markers . We also detected that the observed heterozygosity was lower than expected, with F IS values positive and significant in all populations. This same result was obtained with microsatellites (Pérez-Portela et al., 2019). Positive inbreeding coefficients, even for species with planktonic development, are common in marine invertebrates (Addison and Hart, 2005;Olsen et al., 2020). Our results thus reinforce the view that the populations of A. lixula are not in general in Hardy-Weinberg Equilibrium and that some mechanism of assortative mating is at play, even in broadcast-spawning species with long-lived larvae, as found also in Paracentrotus lividus . We can only speculate on the causes of this lack of HWE, but unrecognized mating structures associated, for instance, to gamete recognition proteins (bindin) known in sea urchins (Zigler, 2008) and in the genus Arbacia in particular (Metz et al., 1998), can explain the pattern found. Further research is necessary to definitely settle this point.
The Mediterranean is a landmark model for analyzing genetic structure of marine species. It has several oceanographic discontinuities generating hydrodynamic units (Rossi et al., 2014;Torrado et al., 2021) that impact differentially on the connectivity of populations (Patarnello et al., 2007;Galarza et al., 2009;Pascual et al., 2017). In a meta-analysis of published records, Pascual et al. (2017) noted that oceanographic fronts had a stronger influence in species with high mobility or long-lived pelagic larval stages. This seemingly contradictory result was explained because species with low mobility and short larval lives remain close to the shore and are less affected by oceanographic features. In our case, A. lixula would be a species sensitive to the presence of fronts, given its broad scope for dispersal linked to long-lived (several weeks, Pedrotti, 1993) larvae. Notwithstanding, only a significant differentiation associated with the separation between eastern and western Mediterranean basins was detected in our study, and some subtle patterns were found within western populations (see below). The network analysis confirmed the separation between eastern and western basin networks, related only by a weak link between XA and OT. Of note here is that the Tunisian population (Cap Bon), right in the vicinity of the Sicily Channel break, clustered clearly with the western populations, which agrees with the fact that the Algerian current brings water to this area directly from the Alboran Sea zone (Millot, 2005). There was no indication of a separation of the Adriatic populations from the Levantine site (Dalyan), suggesting a minor role of the Otranto Strait break.
Although the Mantel test comparing genetic and geographic distances was not significant when restricted to the western basin populations, some structure could be detected among them. In particular, the heatmaps and clustering analyses showed some differentiation of the two northernmost populations of the western basin (XA and CO) from the rest. This separation is also reflected by a significant (albeit weak) differentiation in AMOVA. This result may point to a role of the Ibiza Channel front in the distribution of A. lixula. These two populations lie at the north of this divide, and have the highest maximal salinities of the western Mediterranean populations. A denser sampling design in the northwestern Mediterranean area is necessary to fully grasp the genetic structure of A. lixula in the area, and its relationship with oceanographic features.
On the other hand, the Almería-Oran front seems to play no significant role in shaping the distribution of A. lixula. This front, separating the Alboran Sea from the rest of the Mediterranean, is the true boundary between the Atlantic and Mediterranean for many species (Patarnello et al., 2007). Pérez-Portela et al. (2019) found a weak signal of differentiation between populations of the Alboran Sea and the rest of the Mediterranean. However, this was due mostly to an abnormal behavior of the Torremuelle (TO) population in the sampled year (2009). This population was shown to have significant genetic change over time, as is often the case in invertebrate species in the Alboran Sea area (Calderón et al., 2012;Pascual et al., 2016). Our AMOVA between Alboran Sea and the rest of western Mediterranean populations proved not significant, in agreement with Pérez-Portela et al. (2019). Thus, while there is a significant distinction between Atlantic and Mediterranean populations of the species (Wangensteen et al., 2012;Pérez-Portela et al., 2019), our results support the idea that the break for this species appears to be in the Gibraltar Strait and not in the Almería-Oran front (Wangensteen et al., 2012) as shown also in other species (Pascual et al., 2017).
When we analyzed the correlation between the ordination results based on genetic data and the main environmental variables, we found an overall significant role of both temperature and salinity, albeit the variance explained was much higher for salinity. However, in our sampling scheme there is a confounding effect of geographic distance as there is a longitudinal gradient of these variables in the Mediterranean (Coll et al., 2010). We thus factored out this unwanted effect using partial Mantel tests, which showed that only some salinity-related variables remained significant. Furthermore, Redundancy Analyses also indicated a significant correlation between these variables and specific loci, showing that they have a key role in the genomic structuring.
There is a well-demonstrated effect of temperature on biological and reproductive parameters of this species (Wangensteen et al., 2013a,b;Visconti et al., 2017), as well as in its transcriptional response (Pérez-Portela et al., 2020). However, our results indicate that another variable (salinity, to our knowledge unexplored so far) should be taken into consideration as a driver of the distribution and adaptation of A. lixula to a changing environment. This sensitivity to salinity changes might be the reason why A. lixula (like other regular sea urchin species) has so far been unable to colonize the Black Sea (Öztoprak et al., 2014).
A few loci (25 in total) were detected as outliers in the comparisons between all populations and the comparisons between western and eastern populations. The F ST values corresponding to these loci were much higher than those obtained with the neutral loci, but the overall structure and ordination results remained the same considering the complete dataset, the outlier loci, or the neutral loci. Contrary to expectation, the outlier loci (putatively under selection) did not show in general a higher correlation to environmental variables. Thus, if they are subject to selection, they are responding to unexplored variables.
The comparison with the results obtained using the same GBS technique on the common sea urchin P. lividus  is particularly revealing, as these two co-occurring sea-urchins interact in shaping the sublittoral communities, and a replacement of P. lividus by A. lixula as a result of warming is envisaged Privitera et al., 2011;Pérez-Portela et al., 2019, 2020. Both species had similar reproductive strategies involving ovipary and long-lived, planktotrophic larvae. The genetic diversity observed here (mean Ho = 0.379) is of the same order as the one detected for P. lividus (mean of 0.370), and the mean number of alleles per locus is also similar (7.20 vs. 7.34). The F IS values were higher for the P. lividus (mean of 0.252 against 0.141 for A. lixula) indicating a stronger deviation from HWE in the former. For the common sea urchin, the effective Atlanto-Mediterranean break appears to be in the Almería-Oran front (Duran et al., 2004;Calderón et al., 2008;Carreras et al., 2020). The Iberian populations analyzed by Carreras et al. (2020) were collected simultaneously with the present samples, so no temporal hydrographic feature can explain this result. The differentiation between the Alboran Sea populations and the rest of the Mediterranean is more clear-cut than in the case of A. lixula, and there is also a separation between western and eastern Mediterranean, thus populations of P. lividus within the Mediterranean may be more structured than those of A. lixula. While adaptation to salinity and temperature appeared as an important driver in the transition between Atlantic and Mediterranean for P. lividus, only temperature was found to have a significant influence within the Mediterranean . In contrast, in A. lixula salinity seems to play a relevant role in the species' genetic makeup in the Mediterranean.
In conclusion, previously unrecognized genetic structure was detected in A. lixula Mediterranean populations. Although differentiation is shallow, there is a significant break between western and eastern populations, and no clear effect of the transitional area (Alboran Sea) between Atlantic and Mediterranean was detected. Contrary to expectation, salinity, rather than temperature, may be an important driver of genetic adaptation in A. lixula. Over a substrate of high genetic diversity, the break detected can facilitate differential adaptation in the two basins. Overall, from a population genomics standpoint, there is ample scope for A. lixula to continue thriving and adapting in the warming Mediterranean.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI [accession: PRJNA746276].

AUTHOR CONTRIBUTIONS
XT, MP, and CP conceived and designed the study. XT, CP, OW, MP, and CC participated in the sampling. ÀG-C, MP and VO undertook the laboratory analysis. CC, ÀG-C, VO, MP and XT conducted the data analysis. XT, MP and CC wrote the manuscript with input from all the authors. All authors contributed to the article and approved the submitted version.

FUNDING
This research was funded by projects PopCOmics (CTM2017-88080, funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe" of the European Union) and MarGeCh (PID2020-118550RB, funded by MCIN/AEI/10.13039/501100011033) from the Spanish Government. This is a contribution from the Consolidated Research Group "Benthic Biology and Ecology" SGR2017-1120 (Catalan Government). Some localities were sampled under the European FP7 CoCoNet project (Ocean 2011-4, Grant Agreement #287844).

ACKNOWLEDGMENTS
We would like to thank Hector Torrado for help with the analysis of the temperature and salinity information. We would also like to thank the professionals from Antheus srl for sample collection in the Adriatic, especially Stanislao Bevilacqua and Giuseppe Guarnieri. We would also further like to thank Claudia Kruschel, Simonetta Fraschetti, and Tony Terlizzi who helped with logistics and sampling. We are indebted to Jamila Ben Souissi for providing the Tunisian samples.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2021. 739008/full#supplementary-material  Supplementary Table 3 | Values of F ST among populations considering the "outlier" and the "neutral" loci datasets. Below the diagonal are indicated the p_values (significant values after FDR correction shown in bold).
Supplementary Table 4 | Results of the envfit tests and the partial Mantel tests performed on the "outlier" and the "neutral" loci datasets.
Supplementary File 1 | Genepop file (Supp_Mat_1_ArbaciaGenPop_5241loci.txt) with the allele composition of the localities studied.
Supplementary File 2 | Plain text file (Supp_Mat_2_Arbacia_Loci_alleles.txt) with the sequence data of all alleles in all loci. The first column indicates the loci code, the second column indicates the allele code and the third column indicates the full allele sequence.
Supplementary File 3 | Text file (Supp_Mat_3_Loci_info.txt) containing loci information. The first column indicates the name of the loci as in the Genepop file, the second column indicates the name of the loci given by GibPss, the third column indicates the result of the outlier analysis and the last column the results of the RDA.