Evidence of Elevational Speciation in Kerteszia cruzii (Diptera: Culicidae) in the Ribeira Valley, São Paulo, Brazil

Kerteszia cruzii [former Anopheles (Kerteszia) cruzii] is a bromeliad mosquito widespread in the Brazilian Atlantic rainforest. In South-eastern Brazil, it plays an important role in malaria transmission because it was infected with at least four Plasmodium species. There is robust evidence that Ke. cruzii is a species complex. We used single nucleotide polymorphisms (SNPs) from a nextRAD sequence (nextera-tagmented, reductively amplified DNA) to investigate the genetic structure of Ke. cruzii in the Ribeira Valley, South-eastern Brazil. Furthermore, we verified whether the genetic structure was associated with forest cover, elevation, slope, and vegetation physiognomy. Our results showed two distinct lineages in the studied region associated with elevation and isolation by distance. The first lineage included samples from coastal localities and the second comprised specimens from inland or mountain sites. At one sampling locality (Esteiro do Morro in Cananéia municipality), both lineages are sympatric. These results are in accordance with previously published data that showed elevated stratification in Ke. cruzii. However, Fst values did not indicate the existence of cryptic or sister species in Ke. cruzii in this region, we concluded that elevational speciation probably occurs, and we hypothesized that differences in population structure found might be associated with the distribution of bromeliad species.


INTRODUCTION
Kerteszia cruzii occurs in areas of the Brazilian Atlantic rainforest and is abundant, depending on the abundance of the bromeliad phytotelmata. Formerly, Kerteszia was classified as a subgenus of the genus Anopheles until 2017, when the results of a phylogenetic analysis of the complete mitochondrial genome consistently showed Kerteszia as a monophyletic taxon placed outside the genus Anopheles (Foster et al., 2017). Consequently, Foster et al. (2017) elevated Kerteszia, Lophopodomyia, Stethomyia, and Nyssorhynchus to the genus level.
Females of Ke. cruzii are primarily sylvatic, presenting a low degree of sinanthropy (Forattini, 2002), can feed on humans, and are more abundant at the edges of forests where they usually perform their blood repast (Forattini et al., 1986;Medeiros-Sousa et al., 2019). Although the females of some populations are capable of blood-feeding at any time of the day, they usually present two activity peaks, one at twilight and the other at dawn (Forattini et al., 1986). Females that feed at dusk were found to live longer and, therefore, are more likely to survive and exceed the extrinsic incubation period of Plasmodium (Dalla Bona and Navarro-Silva, 2010).
Several studies have suggested that Ke. cruzii is a complex of species. Analyses of the banding pattern of the ovarian polytene chromosome showed that this species encompasses three sibling species in the Boracéia and Juquitiba municipalities in São Paulo state, Brazil (Ramírez and Dessen, 2000a;Ramírez and Dessen, 2000b). A broader study using samples from Rio de Janeiro (RJ), São Paulo (SP), Santa Catarina (SC), and Bahia (BA) states in Brazil demonstrated that the specimens from Bahia state are genetically distant from the remaining populations analyzed from the South and Southeast of Brazil (Carvalho-Pinto and Lourenço-de-Oliveira, 2004).
Subsequently, studies employing the timeless nuclear gene reaffirmed the findings of Carvalho-Pinto and Lourenço-de- Oliveira (2004) and suggested that in Itatiaia municipality, RJ state, Brazil, there are two putative species under the name of Ke. cruzii (Rona et al., 2010(Rona et al., , 2013. A study in Cananéia municipality in Southeastern Brazil employing wing geometric morphometrics and COI gene analyses showed that samples from hilltops differ from those found in the lowlands (Lorenz et al., 2014). Analyses employing the complete mitochondrial genome of four Kerteszia specimens, including four distinct populations of K. cruzii, showed differences in the codon composition from three localities in the South-eastern region (São Paulo and Cananéia both in São Paulo state; Itatiaia in RJ) of Brazil and one from South Brazil (Maquiné in the Rio Grande do Sul state) (Oliveira et al., 2016). Analyses using the cpr and clock nuclear genes revealed two lineages in the Serra do Mar region: one corresponding to samples from the low land coast and another corresponding to that from mountain specimens (de Rezende Dias et al., 2018). Recently, Kirchgatter et al. (2020), employing sequences from GenBank from NADH4 and COI, identified three genetic lineages of this species in Brazil corresponding to the Serra do Mar, Serra da Mantiqueira, and Serra da Cantareira.
Herein, we employed single nucleotide polymorphisms (SNPs) from a nextRAD sequencing (nextera-tagmented, reductively amplified DNA) to verify whether landscapes presenting heterogeneous vegetation physiognomies, with distinct vegetal coverage, can influence the genetic structure of Ke. cruzii. The main objectives of this study were to (1) test the genetic structure of Ke. cruzii in different gradients of vegetation cover in six locations described in Laporta et al. (2015); (2) to search for genetic signatures associated with distinct environmental variables.
The collection sites in Esteiro do Morro, Eldorado, Sete Barras, Toca da Onça, and Tapiraí are in transition landscapes between dense ombrophilous forests and rural environments, with forest cover gradients of 74.99% (Tapiraí), 65.37% (Esteiro do Morro and Sete Barras), and 44.66% (Eldorado). Sítio Itapuan and Pedrinhas presented heterogeneous landscapes. For example, in Sítio Itapuan, besides dense ombrophilous forest and the rural environment, restinga, water, and mangroves (97.49% of natural Atlantic Forest) are still present. The landscape in Pedrinhas is characterized as a transition between the urban environment, restinga vegetation, and mangroves (with 94.48% of the preserved natural vegetation) (Laporta et al., 2015) (Figure 1).
Field collections were performed using Shannon traps from 6:00 p.m. to 10:00 p.m. The mosquitoes were preserved in silica gel until morphological identification and DNA extraction. Specimens were morphologically identified using the dichotomous keys of Forattini (2002) and Consoli and Lourenço-de-Oliveira (1994) and confirmed by barcode COI amplification and sequencing (Bourke et al., 2018). Genomic DNA was extracted using the salt method described by Miller et al. (2007) and modified by Laporta et al. (2015) from 59 mosquito specimens (Table 1).

Next Generation Sequencing (NGS) and SNPs Detection
The nextRAD libraries were assessed by the SNPsaurus LLC Company, as in Russello et al. (2015). Briefly, the genomic DNA was first fragmented using Nextera reagent (Illumina, Inc.), which also ligated short adapters to the ends of the fragments; the reaction was scaled to fragment 3 ng of genomic DNA. Each fragment was then amplified using 25 cycles at 75 • C, with one of the primers matching the adapter and extending eight nucleotides into the genomic DNA with the selective sequence TGCAGGAG. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer can be efficiently amplified. Thereafter, the amplified fragments were sequenced on a HiSeq 4000 with one lane of 150 bp reads (University of Oregon). Genotyping analysis was performed with custom scripts (SNPsaurus, LLC) using bbduk (BBMap tools) 1 to trim the reads (Supplementary Material S1). A de novo reference was created after collecting 10 million reads equally from the samples; reads with counts less than 7 or greater than 700 were excluded from the analysis. The remaining loci were aligned to each other to identify allelic loci and collapsed allelic haplotypes to a single representative. Then, the reads were mapped to match the reference with a threshold of 95% using bbmap (BBmap tools), and genotype calling was performed with call variants in BBmap tools. Finally, the vcf archive was filtered using vcf tools 1 http://sourceforge.net/projects/bbmap/ (Danecek et al., 2011) to exclude: (1) alleles with frequencies less than 5%; (2) genotype calls under 50%; (3) samples with more than 50% missing data; (4) loci with more than 10% missing data; and (5) loci with a deviation of the Hardy-Weinberg equilibrium (P < 0.01).

Population Analyses
The vcf file was converted into the necessary formats to perform the remaining analyses in PGDspider v 2.0 (Lischer and Excoffier, 2012). To assess the genetic distances within individuals and groups of individuals, two matrices were generated using Nei's distances in R package v. 3.5.2, using StAMPP (Pembleton et al., 2013). The matrices were visualized in Splitstree4 v. 4.14.2 (Huson and Bryant, 2006). StAMPP was also employed to generate pairwise Weir and Cockerham's (1984) Fst matrixes. The statistical significance (P) of each value was determined using 100 permutation tests. To evaluate the genetic structure of the studied samples, Bayesian analysis in STRUCTURE software v. 2.3.4 (Pritchard et al., 2000) and principal component analysis (PCA) in R v. 3.5.1, using the adegenet package (Jombart, 2008;Jombart and Ahmed, 2011) was performed. STRUCTURE analysis was performed using Strauto (Chhatre and Emerson, 2017) in seven runs (K = 1-7) with ten replicates for each run. The Markov chain Monte Carlo (MCMC) was carried out for 1 million generations and a burn-in period of 100,000 for each run. The Evanno method (Evanno et al., 2005) was implemented in STRUCTURE Harvester (Earl and vonHoldt, 2012) to determine a suitable number of clusters. The ten replicates for the best K value were combined in CLUMPP v1.1.2 (Jakobsson and Rosenberg, 2007) under the algorithm Large K Greedy with 2,000 permutations, and the results were visualized in Distruct v 1.1 (Rosenberg, 2004). Both CLUMPP and distruct were used through the pipeline CLUMPAK (Kopelman et al., 2015) on the webserver http:// clumpak.tau.ac.il. The K-means clustering of the PCA analysis was determined for the Bayesian inference criterion (BIC), and then, we performed a discriminant PCA (dPCA) analysis.
Arlequin v. 3.5 software (Excoffier and Lischer, 2010) was employed to evaluate isolation by distance with 1,000 permutation tests and to perform a molecular variance analysis (AMOVA) (Weir and Cockerham, 1984) with 1,000 permutations. The latter was implemented in two ways: one considering the seven localities belonging to one group. At the same time, in the second, the populations were divided into two groups: one containing the lowland samples and the other containing the samples collected in the interior and mountains.

Genomic Signatures
We employed two distinct methodologies to detect the candidate loci. The first is based on differences in allelic frequencies in samples implemented by BayScan v 2.01 (Foll and Gaggiotti, 2008). This software uses the multinomial model Dirichlet and the reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm to obtain the posterior probability distributions. We used the default to perform this analysis, which uses 20 pilot runs with 5,000 interactions to adjust the distribution of the RJ-MCMC algorithm and a false discovery rate (FDR) value of 0.05.
The second approach was implemented in LFMM v.1.2 (Frichot et al., 2013). This methodology associates the allelic frequencies with environmental variables, using latent factor mixed models based on a Bayesian distribution, which can decrease FDR because it can estimate aleatory effects, which may be associated with genetic population events and isolation by distance. The number of latent factors was based on the STRUCTURE, PCA, and dPCA results. To decrease the FDR rates, we estimated the inflation factor according to the authors' suggestions. Based on the hypotheses of the study, the following variables were quantified in the landscape (100-km 2 ) surrounding the field collection points: (1) the mean elevation, (2) the mean terrain slope, and (3) the proportion of each vegetation cover (i.e., ombrophilous dense forest, restinga, and mangrove) (Supplementary Material S2). In addition to these variables, the distance (km) from the field collection point to the coast shore was also calculated ( Table 2).

RESULTS
After filtering, 4,523 loci per individual remained out of the original 19,906 loci from the genotype call. In addition, one sample from Sete Barras showed more than 50% missing data; thus, it was discarded from further analyses. The unrooted phylogenetic tree (Figure 2A), produced using the Nei's distance matrix with the individuals, showed two distinct groups: one with only lowland samples (from Pedrinhas -Ilha Comprida; Esteiro do Morro and Sítio Itapuan, both in Cananéia municipality), and another with all individuals from the inland (Tapiraí, Sete Barras, Eldorado, and Eldorado -Toca da Onça), and two individuals from Esteiro do Morro. The second network recovered using the same analytical approach as in the first round, but defining the populations of all seven sites sampled (Figure 2B), showed three distinct genetic groups. The first group included all specimens from inland sites, the second group included the Esteiro do Morro population only, and the third group consisted of the remaining lowland samples (Pedrinhas and Sítio Itapuan). The pairwise divergence (FST) results were relatively low. However, they were statistically significant, except between Eldorado and Sete Barras ( Table 3). The distances ranged from 0.001 (between Eldorado and Sete Barras populations) and 0.279 (between the Sítio Itapuan and Tapiraí populations). Among lowland specimens, the Fst values varied from 0.01 to 0.3, and the countryside samples ranged between 0.001 and 0.038. The best-fit K chosen by the Evanno method was K = 2. The Bayesian multilocus analysis from STRUCTURE showed Esteiro do Morro as the most heterogeneous population, while Sítio Itapuan was the most homogenous (Figure 3).
The first two principal components represent only 25% of the variability. However, the analysis showed the same tendency of Nei's distance and the Bayesian analysis of STRUCTURE software. The Y-axis clearly showed two groups: Pedrinhas, Esteiro do Morro, and Sítio Itapuan are in the negative quadrant, whereas Tapiraí, Eldorado, and Eldorado -Toca da Onça and Sete Barras, and three samples from Esteiro do Morro are in the positive quadrant (Supplementary Material S3). The most suitable K value chosen by the BIC is K = 2. As such, the results of the dPCA analyses clearly show two distinct groups Frontiers in Ecology and Evolution | www.frontiersin.org in the X-axis of the first discriminant variable (Figure 4). The AMOVA analyses showed that the variation among individuals (80.70%) was greater than among populations (19.27%) or when considering two groups (lowland × inland/mountain, 17.38%). Mantel's test suggests isolation by distance (regression coefficient = 0.57, P = 0.003).
The BayeScan analysis identified 18 outliers. Conversely, the LFMM analysis, which considered the environmental variables, showed that most putative loci under selection were associated with elevation (38 loci) and distance (27 loci). In comparison, 11 and 6 loci were associated with slope and mangrove, and eight  were associated with forest and restinga, respectively ( Table 4). In contrast, some loci were shared among environmental variables ( Table 4). Fourteen loci overlapped in both analyses all associated with distance in the LFFM analysis.

DISCUSSION
Multiple studies have indicated that the Ke. cruzii may represent a species complex (Ramírez and Dessen, 2000a;Ramírez and Dessen, 2000b;Carvalho-Pinto and Lourenço-de-Oliveira, 2004;Rona et al., 2010;Rona et al., 2013;de Rezende Dias et al., 2018;Kirchgatter et al., 2020). In this study, we used the nextRAD generation sequence approach to investigate the patterns of the genetic structure of Ke. cruzii in the Ribera Valley, Southeastern São Paulo state, Brazil. In addition, we verified the association between the genetic structure and landscape variables. The results of our analyses revealed the presence of two distinct lineages of this species in the studied regions and that they are associated with elevation and isolation by distance. The first lineage corresponds to lowland samples (from Pedrinhas, Sítio Itapuan, and Esteiro do Morro), and the second lineage is composed of specimens from inland and mountain sites (from Sete Barras, Eldorado, and Eldorado Toca da Onça, and Tapiraí). The results of PCA, Structure, and SplitTree analyses showed that in Esteiro do Morro, both lineages coexist in sympatry. Similar results were found by de Rezende Dias et al.  formed of specimens from the Serra do Mar mountain range. Likewise, specimens collected in Bocaina were clustered in both lineages and were found in sympatry. Unlike our results, however, their findings were not associated with isolation by distance, and the Fst values found between the lineages were higher (0.57) than the value found in our study (∼0.25, between lineages). Consequently, de Rezende Dias et al. (2018) strongly suggests the existence of two cryptic species under the name of Ke. cruzii in the study region.
Although the Fst values calculated using NGS datasets generated for samples employed in this study were relatively low, they were statistically significant, except for the values comparing populations from Eldorado and Sete Barras localities. The Fst values obtained from comparisons within lowland and inland and mountain samples were lower (ranging from 0.01 to 0.03 among lowland and 0.001 to 0.038 among inland and mountain samples) than between these regions (in which Fst varied from 0.1 to 0.28), evidence of restricted gene flow between these lineages. Taken together with LFMM analyses (Table 4), genetic differentiation was associated with elevation and distance, despite differences in vegetation type, forest cover, or slope.
Altitudinal stratification in Ke. cruzii was also verified by Lorenz et al. (2014) in the Cananéia municipality when using COI barcode sequences and wing geometric morphometrics. The COI haplotypes were very polymorphic; however, only two of the 60 haplotypes were shared by lowland and hilltop samples. Therefore, considering the results of Lorenz et al. (2014), de Rezende Dias et al. (2018, and our results, we can consider elevational speciation, an ecological speciation in which adaptive divergence leads to dichotomous low/high altitude distribution. Elevational speciation is well known and has been observed in birds, frogs, and plants (Badyaev and Ghalambor, 2001;Caro et al., 2013;Chapman et al., 2013;Funk et al., 2016). Usually, elevational speciation is studied and observed at high elevations (>1,000 m). However, we observed that small differences among altitudes (∼50 m) showed local adaptation in Ke. cruzii. Lowlands were also associated with high species richness and abundance in Culicidae, which uses bromeliads as larval habitats in Cananéia, southeast Brazil (Marques et al., 2012). Accordingly, even small variations, such as 200 m in elevation, can imply differences in the structure of Culicidae bromeliad communities. Similarly, bacterial and eukaryotic communities of phytotelma, on which the larval stages develop, also vary at low and high elevations (Gilbert et al., 2020;Malfatti et al., 2020). Additionally, Culicidae species distribution varies according to bromeliad species; for example, Culex (Microculex) spp. are more commonly found in Vriesea friburgensis than in Aechmea lindenii, whereas Wyeomyia incaudata and Wy. pilicauda are primarily associated with A. lindenii (Müller and Marcondes, 2006). Furthermore, altitude is an important factor for species distribution in the bromeliads of the Atlantic Rain Forest (Brandão et al., 2009;Fontoura et al., 2012). For example, Aechmea catendensis and Aechmea serragrandensis were found clustered in lowlands, while Echinocactus sessiliflorus and Aechmea guainumbiorum were found mainly in the Submontana region in South and Southeast Brazil. Conversely, Aechmea cephaloides is typical of the highlands (Fontoura et al., 2012). A similar pattern occurs within Vriesea, the most common genus of Brazilian bromeliad, with species distribution differing at distinct altitudinal ranges (Malfatti et al., 2020). Considering (1) the evolutionary relationship between Ke. cruzii and bromeliads, (2) the evidence that bromeliad species follow a pattern of elevational distribution, and (3) that culicids are distributed according to bromeliad species, we can hypothesize that the lineages found herein may be associated with different bromeliad species and their elevational distribution pattern.
Although our results showed two distinct lineages in the Ribeira Valley, São Paulo, Brazil, Fst values did not corroborate the existence of cryptic or sister species under Ke. cruzii in this region. The differences between these lineages were mainly associated with isolation by distance and elevation, more than the vegetation mosaic, slope, or forest cover. Therefore, we conclude that this species may be under elevational speciation in this region and hypothesize that it is likely associated with the distribution of bromeliad species. In order to confirm our hypothesis, further investigations need to be performed in the region, to verify potential association between bromeliads species and population structure in K. cruzii.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI SRA BioProject, accession no: PRJNA729988, https://www.ncbi.nlm. nih.gov/bioproject/PRJNA729988.

AUTHOR CONTRIBUTIONS
BD-S and MAMS conceived the study. BD-S conducted the field collection work, performed the NGS and population analyses, and wrote the manuscript. GL contributed with the environmental metrics and analyses. TO helped with laboratory and NGS analyses. All authors read and agreed with the final version of this manuscript.