ORIGINAL RESEARCH article

Front. Plant Sci., 15 February 2017

Sec. Plant Development and EvoDevo

Volume 8 - 2017 | https://doi.org/10.3389/fpls.2017.00138

Genetic Diversity and Population Structure of Two Tomato Species from the Galapagos Islands

  • 1. Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology Thuwal, Saudi Arabia

  • 2. Genomics of Plants Stress Unit, Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa and Instituto de Biologia Experimental e Tecnológica Oeiras, Portugal

  • 3. Department of Biology and Center for Genomics and Systems Biology, New York University New York, NY, USA

Abstract

Endemic flora of the Galapagos Islands has adapted to thrive in harsh environmental conditions. The wild tomato species from the Galapagos Islands, Solanum cheesmaniae and S. galapagense, are tolerant to various stresses, and can be crossed with cultivated tomato. However, information about genetic diversity and relationships within and between populations is necessary to use these resources efficiently in plant breeding. In this study, we analyzed 3,974 polymorphic SNP markers, obtained through the genotyping-by-sequencing technique, DArTseq, to elucidate the genetic diversity and population structure of 67 accessions of Galapagos tomatoes (compared to two S. lycopersicum varieties and one S. pimpinellifolium accession). Two clustering methods, Principal Component Analysis and STRUCTURE, showed clear distinction between the two species and a subdivision in the S. cheesmaniae group corresponding to geographical origin and age of the islands. High genetic variation among the accessions within each species was suggested by the AMOVA. High diversity in the S. cheesmaniae group and its correlation with the islands of origin were also suggested. This indicates a possible influence of the movement of the islands, from west to east, on the gene flow. Additionally, the absence of S. galapagense populations in the eastern islands points to the species divergence occurring after the eastern islands became isolated. Based on these results, it can be concluded that the population structure of the Galapagos tomatoes collection partially explains the evolutionary history of both species, knowledge that facilitates exploitation of their genetic potential for the identification of novel alleles contributing to stress tolerance.

Introduction

Biodiversity in the Galapagos Islands has inspired theories of adaptation and evolution, and increased our understanding of processes of population divergence and speciation (). The volcanic origin and tectonic activity of the Galapagos Islands makes them a unique site for studying the impacts of isolation and environment on diversification. The islands were formed at a volcanic hotspot in the Nazca Plate, which is moving east at approximately 59 km per million years (). The age of the islands can be estimated by their current distance from the hotspot: the western islands are millions of years younger than the eastern ones (). The eastward movement of the Nazca Plate causes its subduction beneath the South American plate, isolating previously interconnected islands and causing their eventual disappearance from east to west (). The isolation of the islands and constantly changing environmental conditions have allowed the adaptation and divergence of many species, differing morphologically and genetically from one island to the other ().

The vascular flora of the Galapagos Islands includes around 550 species, of which approximately 200 are endemic (). Of particular interest are two wild tomato species, Solanum cheesmaniae (formerly Lycopersicon cheesmanii) and S. galapagense (formerly L. cheesmanii forma minor), collectively termed “Galapagos tomatoes” (Figure 1). Both species of Galapagos tomatoes were first considered as one. However, based on clear morphological differences and molecular evidence from an allozyme analysis, described them as two different species. The adaptation of these wild germplasms to different environments, such as arid or saline soils, makes them a potential valuable source of genetic variation in terms of stress tolerance genes, which could be transferred into commercial varieties by introgression breeding (; ). However, to efficiently utilize this wild germplasm resource, it is necessary to understand the population structure and genetic variation (). This will assist breeders in selecting germplasm that are more diverse and prevent the less efficient crossing of accessions that are very closely related. It also makes screening of the wild germplasm more efficient by enabling the selection of highly diverse accessions.

FIGURE 1

). Accessions shown are S. cheesmaniae (LA0421), S. galapagense (LA0317), and S. lycopersicum (LA4345).

Next-generation sequencing technologies, such as geno-typing-by-sequencing (GBS) and “Diversity Arrays Technology” (DArTseq), now allow genome-wide fingerprinting without prior genome sequence information (). The GBS approach can be more informative than predesigned single nucleotide polymorphism (SNP) arrays when applied to wild germplasm because it is unbiased and includes information on rare alleles (, ; ). What makes DArTseq different from other GBS methods is their complexity reduction approach, targeted to select genome fractions with coding regions (). The restriction enzymes used in DArTseq for complexity reduction separate low copy sequences from the repetitive regions of the genome (). These low copy sequences are more informative for marker discovery, especially for breeding purposes (). Here, we genotyped 67 accessions of Galapagos tomatoes from the TGRC collection (Supplementary Table S1) using DArTseq and found that S. cheesmaniae and S. galapagense fall into distinctive clades. Further, we found that the accessions of S. cheesmaniae separate based on the island/region from which they originated, and the population structure can be linked to the geological movements of the islands. From this, it can be inferred a clear evolutionary sequence within the Galapagos tomatoes, revealed by molecular, rather than morphological means.

Materials and Methods

Plant Material and DNA Isolation

A total of 67 Galapagos tomato accessions – 40 S. cheesmaniae and 27 S. galapagense -, together with their passport data, were obtained through the Tomato Genetic Resources Center (TGRC) UC, Davis, CA, USA (Supplementary Table S1). In addition, two S. lycopersicum varieties (Heinz 1706, and Moneymaker) and one S. pimpinellifolium accession (LA0480), also obtained from TGRC, were used for comparison. The one S. pimpinellifolium accession was added to compare the Galapagos tomatoes to another wild tomato, while two S. lycopersicum varieties were added as references: Heinz 1706 is the variety for which the reference genome sequence was completed (); and Moneymaker is a popular commercial variety.

To break their dormancy, soften the seed coat, and promote germination, Galapagos seeds were treated with 2.7% sodium hypochlorite solution for 1 h (), then soaked in ddH2O for 1 h. Seeds were then placed in magenta boxes with 0.65% agar gel containing Murashige and Skoog salts for germination. The magenta boxes were kept in a Percival growth chamber at 26°C with a 16 h photoperiod. Seedling tissues, without the root, were harvested when their first true leaf started to emerge. Ten seedlings of the each accession were frozen in liquid nitrogen and ground using sterile mortar and pestle. From this ground tissue, DNA extraction was performed as indicated by DArT Pty Ltd (Canberra, ACT, Australia), in: http://www.diversityarrays.com/sites/default/files/pub/DArT_DNA_isolation.pdf, but with addition of β-mercaptoethanol along with the “fresh buffer.” Washing with CIAA was done three times before addition of isopropanol, and the pellets were dissolved in ddH2O. DNA quality and concentration were determined by electrophoresis in 0.8% agarose gel and spectrophotometry using a NanoDrop 2000 (Thermo scientific, Wilmington, DE, USA), and were normalized to a concentration of 100 ng/μL.

DArTseq Analysis

DArTseq analysis was performed by DArT Pty Ltd. For the purposes of complexity reduction, the gDNA samples were digested with PstI and TaqI restriction enzymes. Adapters were ligated to PstI ends and short adapter-ligated fragments were amplified. PstI-RE site-specific adapters were tagged with 96 different barcodes to run all DNA samples within a single lane on an Illumina Hiseq2000 (Illumina Inc., San Diego, CA, USA). PstI adapters included a sequencing primer site. Quality control was performed by filtering FASTQ files using 90% confidence limits for at least 50% of the bases and further filtering for barcode sequences. The filtered data was then split using a barcode-splitting script and the barcode was trimmed. After trimming the barcode, the average read length was 66 bp with a minimum length of 38 bp and maximum length of 70 bp. The sequences were aligned against a reference constructed by DArT Pty Ltd, from GBS data gathered from the same species, independent of the availability of the whole genome sequence. The short sequence tags were also aligned against the publicly available tomato genome () using Bowtie software (). All alignments were processed using an analytical pipeline developed by DArT Pty Ltd to produce “silicoDArT” tables and “SNP” tables (). SNP markers were scored 0/1 or 1/0 (homozygous) or 1/1 (heterozygous, scoring the presence of both alleles). The data are available in Supplementary Data Sheet S1.

SNP Filtering

PLINK () was used to filter out SNPs with more than 20% missing values and those that were actually or nearly monomorphic (i.e., with minor allele frequencies below 2.5%). Linkage disequilibrium (LD)-based SNP pruning was also performed with PLINK (), using a pairwise approach with a threshold of r2 = 0.8.

Population Structure

The smartPCA application included in the EIGENSOFT 6.0 package () was used to perform a PCA of the SNP dataset that was pruned based on LD. The outlier removal option was disabled.

STRUCTURE software () was run from command line using the admixture model, a burn-in period length of 500,000 and 250,000 MCMC iterations after burn-in. Five independent runs were performed for each K from K = 1 to K = 8. The best number of K was then chosen with the DeltaK method () by running the Structure Harvester software (). Inferred clusters were processed with CLUMPP software () to align the multiple replicate analyses of the same data set and obtain a consensus matrix. Lastly, the DISTRUCT software () was used to make a barplot of the Q matrix. The accessions were sorted by island and longitude in the barplot.

An analysis of molecular variance (AMOVA; ) of the SNP data was done using R package “poppr” () on the dataset of all Galapagos tomato accessions, as well as on S. cheesmaniae and S. galapagense datasets individually. Significance of the AMOVA and Phi statistic analyses was tested using 999 permutations with the R package “ade4” ().

Genetic Distance Tree

The genetic distance matrix was generated from the allelic data (3,974 diploid loci/SNPs) using the simple matching method as a dissimilarity index (), with pair-wise allele deletion when missing data was 40% or higher. The simple matching dissimilarity index was calculated as:

where dij is the dissimilarity between units i and j, L is the number of loci, π is the ploidy, ml is the number of matching alleles for locus l.

From the distance matrix, an unweighted Neighbor-Joining tree () was constructed using the Darwin 6.0 software (); branches were tested with 1,000 bootstraps. The tree root is the node of degree 2 in hierarchy.

Results

Genotyping by Sequencing and SNP Markers Discovery

We genotyped 67 Galapagos tomato accessions using the DArTseq service from DArT Pty Ltd. A total of 4,887 SNPs were identified in the sequenced fragments. After aligning to the tomato reference genome (), 282 SNPs aligned to more than two positions. Given that these could be suggestive of repetitive regions, or paralogous sequences, these SNPs were excluded. SNPs that aligned to different locations were considered to be individual SNPs (even if reported as one SNP by DArT). SNP filtering for SNPs with less than 20% missing values and with minor allele frequencies below 2.5%, resulted in 3,974 polymorphic SNP markers left for analysis. Based on their alignment to the tomato reference genome (), these markers are well distributed across the chromosomes (Supplementary Table S2), with median, mean, and standard deviation of distance between adjacent markers of 91, 338, and 844 kb, respectively (Figure 2A). Furthermore, when the data were separated for the individual species and the same parameters were used for SNP filtering, 2,820 SNPs were obtained for S. cheesmaniae, 1,448 SNPs for S. galapagense, and 3,905 SNPs for S. lycopersicum. In the case of S. lycopersicum, the minor allele frequency test was excluded due the low allele diversity between the two lines. SNP lists of all species were compared in a Venn diagram (Figure 2B), and found that S. cheesmaniae shares 71.5% of the SNPs with S. lycopersicum, while S. galapagense only shares 57.6% of the SNPs with S. lycopersicum. From the 2,820 SNPs kept for S. cheesmaniae, the 1,448 SNPs kept for S. galapagense, and the 3,905 kept for S. lycopersicum, only 360 SNPs are shared among all species and 60 SNPs are shared between the Galapagos tomatoes.

FIGURE 2

).

Germplasm Collection Is Largely Homozygous

We used the SNP markers to estimate the conserved homozygosity in each genotype. The proportion of homozygous markers in each genotype ranges from 96.1 to 99.4% in S. cheesmaniae and from 98.2 to 99.4% in S. galapagense. The high level of homozygosity could be caused by their propagation at the TGRC. It is also consistent with the early reports by Rick (), where he described the Galapagos tomatoes to be highly autogamous, due to their flower structure being adapted to self-pollination and claimed that natural populations exist in a virtually pure-line condition. also reported low levels of heterozygosity present in the wild specimens of Galapagos tomatoes collected during her expedition to the islands. This strict autogamy, has led to rapid fixation of alleles and accumulation of mutant genes, which contributed largely to the differentiation between populations (). Moreover, this germplasm collection could be used directly for further genetic studies without the need to develop inbred lines.

S. cheesmaniae and S. galapagense Can Be Clearly Differentiated by Genetic Analysis

To dissect the pattern of genetic variation among the accessions, we used both PCA from EIGENSOFT 6.0 () and Bayesian clustering from STRUCTURE software (). For PCA, the SNP data was further pruned based on LD, to obtain a subset of SNPs that are in approximate linkage equilibrium with each other. PLINK () was used to calculate LD, based on pairwise genotypic correlation with a threshold of r2= 0.8. LD-based pruning reduced the SNP marker collection to 2,428 SNPs.

The PCA showed a clear division between the accessions identified as S. galapagense and S. cheesmaniae, as well as from S. lycopersicum and S. pimpinellifolium (Figure 3A). The accessions belonging to S. galapagense cluster closely together (orange diamonds), while S. cheesmaniae accessions are also clustered (green circles), with the exception of two accessions: LA0531 and LA3124 (marked by an arrow), which could be admixtures. Individual PCAs of the accessions belonging to each of S. galapagense and S. cheesmaniae can be found in the Supplementary Materials (Supplementary Figure S2).

FIGURE 3

The groups obtained from the PCA were identical with those formed by an alternative clustering program, STRUCTURE (), which uses a Bayesian clustering approach to identify the number of populations (K) with the highest structure (Supplementary Figure S3). This is determined by plotting Delta K, based on the method of ; our DeltaK plot showed a peak at K = 3 (Supplementary Figure S4), suggesting the presence of three genetically distinct groups that differentiate the two Galapagos tomato species and the tomatoes native to mainland South America (Figure 3B). Once more, accessions LA0531 and LA3124 appear to be genetic admixtures. The admix nature of these two accessions was confirmed by the ancestry membership coefficients (Q), which show that LA0531 belongs to both cluster 1, formed by the rest of S. cheesmaniae accessions (Q = 0.674), and to cluster 2, formed by all S. galapagense accessions (Q = 0.325), whereas, LA3124 appears to be part of cluster 1 (Q = 0.432) and cluster 3, formed by S. lycopersicum and S. pimpinellifolium (Q = 0.568) (Supplementary Table S3). The collection notes of these two accessions, obtained from TGRC database1 (Supplementary Table S4), report morphological differences from the typical S. cheesmaniae since the time of their collection (Supplementary Figure S1), which confirm that they are hybrids and crossing did not occur during later seed propagation, but in the natural environment. It is also worth noting, that the two accessions are unique to Baltra and Santa Fe, respectively, two very small islands – only one accession was collected from each island (Supplementary Table S1).

The results from PCA and STRUCTURE are largely consistent. For both analyses we found that S. galapagense, S. cheesmaniae, S. lycopersicum, and S. pimpinellifolium accessions are clearly separated. Although the sampling size of S. lycopersicum and S. pimpinellifolium accessions is small, the three reference sequences clustered together in all of the analyses performed. Thus, they provide useful reference points to facilitate estimation of the genetic distance between the Galapagos tomato populations. For this purpose, a genetic distance matrix was generated using the simple matching method as a dissimilarity index (). From the distance matrix, an unweighted neighbor-joining tree () was constructed using Darwin 6.0 software (). The neighbor-joining tree revealed a clear differentiation between S. cheesmaniae and S. galapagense with a bootstrap support of 100% (Figure 4A).

FIGURE 4

Clustering of S. cheesmaniae Accessions Could Be Similar to the Age of Island Formation

In the genetic distance tree, the S. cheesmaniae accessions separate into two sub-clusters with a 100% bootstrap support, while the admixtures remained separated from the main S. cheesmaniae branches (Figure 4A). Likewise, the population structure inferences using STRUCTURE (Figure 4B) of S. cheesmaniae accessions show two sub-clusters (K = 2) and no further structure within the S. galapagense group (K = 1) (Supplementary Figure S5). S. galapagense showed no structure, even when using the No admixture model in STRUCTURE (; data not shown).

Interestingly, the division of the S. cheesmaniae cluster shows clear geographic structure. The grouping of the accessions matches their region of origin: the accessions collected in the western islands separate clearly from those collected in the eastern islands. This can be linked to the island formation timeline, as all the islands originated at the volcanic hotspot and then moved east with the Nazca plate which holds the Galapagos archipelago (). The eastern islands are therefore older than the western islands () and from this we can infer that those populations found in the eastern islands may be older than the ones found in the western islands; alternatively, they could be ancestors of the accessions in the west. This inference is supported by the comparison of Wright’s fixation index (FST) values () for each S. cheesmaniae cluster, obtained from the STRUCTURE analysis (). The mean FST-value of the accessions from the western islands (0.6239), is considerably higher than the mean FST-value of the accessions from the eastern islands (0.2790), which suggests the occurrence of a strong episode of genetic drift on those populations from the newer islands ().

Analysis of Genetic Variation in Galapagos Tomatoes

An AMOVA () was performed to examine patterns of genetic variation and to estimate variance components at the levels of species and accessions. AMOVA showed that 43.1% of the total variation in the Galapagos tomato populations was explained by differences between the two species; whereas 51% was explained by differences between accessions within the species (Table 1). This confirms that the two species are considerably different, but also there is great variation among the accessions within a species.

Table 1

DfSum SqMean SqVariance %SigmaPhiP-value
Galapagos tomatoes
Between species1106191061943.11620.430.001
Between accessions within species632546840451.01910.900.001
Within accessions65143922.15.9022.10.940.001
Total12937526291100375
S. cheesmaniae
Between region12806280624.367.90.240.001
Between accessions within region361453440468.91920.910.001
Within accessions3871818.96.7618.90.930.001
Total7518058241100279
S. galapagense
Between region14524525.259.990.050.142
Between accessions within region25864934687.11660.920.001
Within accessions2739414.67.6714.60.920.001
Total539495179100190

Analysis of Molecular Variance (AMOVA) and Monte-Carlo significance tests for the collection of 67 Galapagos tomatoes accession, the group of 40 S. cheesmaniae accessions and the group of 27 S. galapagense accessions.

Df, degrees of freedom; Sum Sq, sum of squares; Mean Sq, mean of squares.

With the purpose of investigating if the region of origin of each accession (east or west of the Galapagos archipelago) had any influence on the genetic variation within the species, further analysis of variance was performed using the region of origin as a factor for each species. These AMOVA revealed that while the most genetic variation occurs between accessions (68.9% in S. cheesmaniae and 87.1% in S. galapagense), there is a significant variation (24.3%) between regions of origin of the S. cheesmaniae accessions (east and west).

Discussion

We selected SNP markers to measure genetic variation, since SNPs are one of the most common types of genetic variation. Also, they are co-dominant markers, which allow us to estimate the homozygosity of the Galapagos tomatoes germplasm, and thus, their usefulness for genetic studies. The flower morphology of the Galapagos tomatoes suggested their autogamous nature and a high occurrence of inbreeding (). used an allozyme analysis (allelic determined variants of isozymes) to confirm that the variation between populations was greater than the variation within each population. This is consistent with our study, where the lowest percentage of homozygous SNPs was 96.1%. Furthermore, the AMOVA of the allelic data showed significant amount of genetic variance between species and also a significant amount of genetic variance between populations belonging to the same species.

Based on their morphology, S. cheesmaniae and S. galapagense can be clearly differentiated into two taxonomic groups (Figure 1), but results from genetic studies have been contradictory. showed that there was little variation within 54 analyzed accessions, while consistently differentiated between the taxonomic groups when analyzing 26 accessions. used AFLP analysis on 16 accessions and showed clear differentiation between the taxa (although both taxa were still considered as one species). used 3.3 kb SNP arrays on 34 accessions, but could not differentiate between the two taxa, suggesting they were likely to be morphotypes, rather than two species. However, the SNP array they used was based on sequence information from S. lycopersicum (). With current genomic technology, the GBS approach used in this work, DArTseq, enabled a robust genetic characterization of the wild Galapagos tomatoes. By using three different clustering methods – PCA, STRUCTURE and neighbor joining by genetic distance/dissimilarity – we clearly show that S. cheesmaniae and S. galapagense are two genetically distinct species (Figure 3).

In addition, we show that two S. cheesmaniae accessions from two very small islands, Baltra and Santa Fe, are admixtures, based on the calculated ancestry membership coefficients (Q) for clustering by STRUCTURE (Supplementary Table S4). One accession (LA3124) appears to be a hybrid with S. lycopersicum or S. pimpinellifolium; and another accession (LA0531) may be a hybrid with S. galapagense (Supplementary Figure S1). Their differences in morphology from the typical S. cheesmaniae are reported in the collection notes (Supplementary Table S4). Accession LA0531 comes from two different specimens collected and archived together, whose different morphologies were attributed to depauperation. This may have allowed introgression between S. cheesmaniae and S. galapagense. Accession LA3124 had big seeds, similar to those of red cherry tomato. Interestingly, LA3124 was previously classified as S. pimpinellifolium by Zuriaga et al. (2009), even though the passport data classified it as S. cheesmaniae. This highlights its closeness and possible admixture with S. pimpinellifolium or S. lycopersicum.

Solanum galapagense accessions are tightly clustered, indicating little genetic diversity within this species. These results are consistent with , who suggested the occurrence of strong genetic bottlenecks in S. galapagense during island colonization and recent adaptation. If S. galapagense populations were established from a small number of individuals that colonized the islands, genetic diversity has not recovered since the founder event. In addition, the isolation of the islands makes the genetic diversity susceptible to genetic drift ().

In contrast, S. cheesmaniae accessions showed clear differentiation between accessions that were collected in the western islands and those collected in the eastern islands. The clustering by region of origin revealed that those accessions in close genetic proximity are also close in geographical origin (Figure 4), which suggests a correlation between biodiversity and geography in the Galapagos archipelago. This is consistent with previous reports on biogeography of other endemic species from the Galapagos Islands, such as, marine iguanas, Darwin’s finches, and giant tortoises, among others (). The major factors influencing biogeography in the Galapagos Islands are the volcanic activity and plate tectonics (). The study of these, suggests a colonization from east to west (), from older to younger islands, supported also by the fact that ocean currents entering the islands from different directions, average an east-west direction (). The main islands originated at the volcanic hotspot west of the archipelago and slowly moved east, at approximately 59 km per million years, with the tectonic movement of the Nazca plate (). Newly formed islands at the hotspot are inhospitable, but as they move east, cool down, and erode, the arrival and establishment of life forms becomes possible (). At the same time, the subduction of the Nazca plate beneath the South American plate, has caused the oldest islands to drown (), forcing migration of the biodiversity to newer islands. From this, we infer that the eastward movement of the islands could have influenced the gene flow in S. cheesmaniae and their adaptation.

reported increased diversity in the Galapagos tomatoes from the western islands and especially from their western slopes. They attributed this increased diversity to the unusually higher precipitation in that area, as plant species richness in the Neotropics is known to be correlated with annual precipitation (). Concordantly, our results show that S. cheesmaniae populations in the western islands seem to have higher levels of genetic diversity than those in the eastern islands (Figure 4B). This may be because fewer populations of S. cheesmaniae exist in the eastern islands, but it could also be attributed to the founder effect of possible colonization events from western islands to eastern islands. East-to-west colonization permits the dispersal of species from the older islands to the newest islands as their volcanic activity decreases, soil develops, and they become habitable (). However, there is still some volcanic activity in the older islands that can destroy the island’s flora. Re-colonization after volcanism in the eastern islands, by a few individuals from the western islands, would also reduce genetic diversity in the eastern islands.

The division of S. cheesmaniae in two groups, has also been reported by , based on internode length, S. cheesmaniae “short” and S. cheesmaniae “long.” However, it is not yet clear if the genetic-geographical clustering coincides with the morphological separation. Given that made their own collection and only took samples in the central islands, their data and the results presented here cannot be compared. Further morphological characterization of the TGRC collection could confirm if the whole collection of S. cheesmaniae could be separated into “short” and “long” morphotypes and if this division would be consistent with the geographical division.

No substructure was found in the S. galapagense collection. This is likely to be due to a relatively recent divergence of this species. Interestingly, there are no reports of S. galapagense from any of the eastern islands. The combination of the distribution and lower genetic diversity leads us to hypothesize that S. galapagense is a relatively new species that diverged from S. cheesmaniae after the eastern islands became isolated.

According to estimates from , the eastern islands of San Cristobal, Santa Fe, and Española emerged approximately 2.4–4.0 million years ago, whereas, the first western islands with the presence of S. galapagense, Floreana and Santa Cruz, emerged somewhere around 1.1 and 2.3 million years ago. also estimated that Floreana and Santa Cruz were in peak volcanic activity two million years ago, making colonization impossible, while the volcanic activity of San Cristobal was on the decline at that time as it drifted away from the hotspot and became detached from Santa Cruz and Floreana. It was only one million years later that volcanic activity in Floreana and Santa Cruz decreased sufficiently for colonization to occur (). Thus, divergence time of S. galapagense can be estimated to have occurred roughly one million years ago, when Floreana and Santa Cruz, the last of the eastern islands without S. galapagense, became a suitable habitat for life. These estimates are consistent with previous reports by , who suggested that the initial radiation of the genus Lycopersicon occurred over seven million years ago, and that S. lycopersicum (then referred to as L. esculentum) and its closest relatives (which include S. galapagense (then referred to as L. cheesmanii, accession LA0483) and S. pimpinellifolium (then referred to as L. pimpinellifolium) diverged from a common ancestor approximately one million years ago. A more recent divergence time (between 0.19 and 0.29 million years ago) was suggested by for S. lycopersicum and S. galapagense. However, the estimates by are in reference to the Heinz 1706 variety, which, in the same publication, was found to contain several regions of significant introgressions from S. pimpinellifolium (), which could bias the estimates of species divergence time.

To our knowledge, there is not an estimate of S. cheesmanie divergence from a common ancestor. If the colonization of Galapagos Islands was east to west, then S. cheesmaniae could be an older species than S. galapagense, and could even be an ancestor to it. suggested that S. cheesmaniae could possibly be closer to the original stem line of the red-fruited species than any other member. He argued that their autogamous reproduction and the lack of competition in the Galapagos could have resulted in its preservation as an ancient biotype (). However, the lower genetic variation in S. cheesmaniae found in the older islands could be due to a founder effect, and colonization could have happened from west to east. If this was the case, S. cheesmaniae and S. galapagense could have diverged around the same time from the same ancestor.

The recent divergence of the tomato clade species and their close relationship has made their phylogenetic classification difficult, especially with the casual occurrence of interspecific hybridization (Zuriaga et al., 2009). Many attempts to determine the phylogeny have been made using diverse methods (; ; ; Zuriaga et al., 2009; ; ; ), but the results are not consistent. Most publications agree that there is a close relationship among the red-fruited group, which contains S. lycopersicum, S. cheesmaniae, S. galapagense, and S. pimpinellifolium. However, they have not yet reached a consensus on the relationship order within the red-fruited group. According to Zuriaga et al. (2009), using only a few accessions of each species in phylogenetic studies could be the cause of conflicting phylogeny results. Adding intraspecific variation to phylogenetic studies could improve the resolution. Our study provides a rich intraspecific dataset, which could be used to further characterize the phylogeny of the tomato clade. Meanwhile, we found a fine phylogenetic relationship between accessions (Figure 4A) and significant intraspecific structure in S. cheesmaniae, corresponding to the age of the island of origin of each accession (Supplementary Figure S6A). Interestingly, the accessions coming from the eastern islands of Santa Cruz and San Cristobal, are closely related within each island and separate from each other, while the more diverse accessions from the western islands, Fernandina and Isabela, are interrelated across the two islands. Additionally, S. galapagense accessions also group by island of origin (Supplementary Figure S6B), further demonstrating the influence of the islands’ biogeography on the Galapagos tomatoes gene flow.

To conclude, we propose a likely sequence of events for the diversification and speciation of wild tomatoes on the Galapagos Islands which is not only of evolutionary interest, in the classic evolutionary “laboratory” of the Galapagos Islands, but which also provides guidance for the strategic discovery of diversity, such as of novel stress tolerance alleles (; ), useful for future improvement of cultivated tomato, the largest horticultural crop globally.

Statements

Author contributions

YP, SS, SN, and MT designed research. YP and SS performed research. IP and SH contributed analytic tools. YP, SS, and MT discussed data analysis. YP analyzed data. YP, SS, and MT wrote the paper.

Funding

The research reported in this publication was supported by funding from King Abdullah University of Science and Technology (KAUST).

Acknowledgments

We thank Moonsun Hwang for the optimization of the DNA extraction protocol. We also thank Sarah Darwin, Susan McCouch, Andrzej Kilian, Helena Oakey, Brigitte Courtois, and Vanessa Robitzch for their valuable insights. Computer-related support from Luca Passone, and significant text editing input from Virginia Unkefer are also gratefully acknowledged.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.00138/full#supplementary-material

DATA SHEET S1

SNP data for Galapagos tomatoes. The table shown in the ‘SNPs’ worksheet contains all SNP data, obtained from DArTseq analysis, for 40 S. cheesmaniae accessions, 27 S. galapagense accessions, one S. pimpinellifolium accession and two S. lycopersicum varieties. The SNP markers initially scored 0/1 or 1/0 (homozygous) or 1/1 (heterozygous, scoring the presence of both alleles) were translated to the specific nucleotides found at each allele (A = adenine, T = thymine, G = guanine, C = cytosine, 0 = missing data). Each of the SNP markers has a unique identifier number. The data sheet contains 19 columns of information per SNP marker. The description of each column can be found in the same file, in the ‘SNPsMetadataDefinitions’ worksheet. To approximate the position of the SNP marker in the genome, the sequenced reads were aligned to the tomato genome version 9 downloaded from ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Slycopersicum/assembly/Slycopersicum_225.fa.gz. Aligner: blastn, E-value: 5E-05, Min. base identity: 80%.

References

  • 1

    ChesselD.DufourA. B.ThioulouseJ. (2004). The ade4 package: one-table methods.R News4510. 10.2307/3780087

  • 2

    ChristieD. M.DuncanR. A.McBirneyA. R.RichardsM. A.WhiteW. M.HarppK. S.et al (1992). Drowned islands downstream from the Galapagos hotspot imply extended speciation times.Nature355246248. 10.1038/355246a0

  • 3

    CourtoisB.AudebertA.DardouA.RoquesS.Ghneim-HerreraT.DrocG.et al (2013). Genome-wide association mapping of root traits in a japonica rice panel.PLoS ONE8:e78037. 10.1371/journal.pone.0078037

  • 4

    CruzV. M. V.KilianA.DierigD. A. (2013). Development of DArT marker platforms and genetic diversity assessment of the U.S. collection of the new oilseed crop lesquerella and related species.PLoS ONE8:e64062. 10.1371/journal.pone.0064062

  • 5

    DarwinC. (1859). On the Origin of the Species, or, the Preservation of Favoured Races in the Struggle for Life, 15th Edn.London: John Murray, 10.1016/S0262-4079(09)60380-8

  • 6

    DarwinS. C. (2009). The Systematics and Genetics of Tomatoes on the Galapagos Islands.Ph.D. dissertation, University College London, London.

  • 7

    DarwinS. C.KnappS.PeraltaI. E. (2003). Taxonomy of tomatoes in the Galapagos Islands: native and introduced species of Solanum section Lycopersicon (Solanaceae).Syst. Biodivers.12953. 10.1017/S1477200003001026

  • 8

    DodsworthS.ChaseM.SärkinenT.KnappS.LeitchA. R. A. (2016). Using genomic repeats for phylogenomics: a case study in wild tomatoes (Solanum section Lycopersicon?: Solanaceae).Biol. J. Linn. Soc.11796105. 10.1111/bij.12612

  • 9

    EarlD. A.vonHoldtB. M. (2012). STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method.Conserv. Genet. Resour.4359361. 10.1007/s12686-011-9548-7

  • 10

    EvannoG.RegnautS.GoudetJ. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.Mol. Ecol.1426112620. 10.1111/j.1365-294X.2005.02553.x

  • 11

    ExcoffierL.SmouseP. E.QuattroJ. M. (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.Genetics131479491. 10.1007/s00424-009-0730-7

  • 12

    FalushD.StephensM.PritchardJ. K. (2003). Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.Genetics16415671587. 10.1111/j.1471-8286.2007.01758.x

  • 13

    FirdausS.van HeusdenA. W.HidayatiN.SupenaE. D. J.MummR.de VosR. C. H.et al (2013). Identification and QTL mapping of whitefly resistance components in Solanum galapagense.Theor. Appl. Genet.12614871501. 10.1007/s00122-013-2067-z

  • 14

    GeistD. J.SnellH.SnellH.GoddardC.KurzM. D. (2014). “A paleogeographic model of the Galapagos Islands and biogeographical and evolutionary implications,” inThe Galapagos: A Natural Laboratory for the Earth Sciences, edsHarppK. S.MittelstaedtE.d’OzouvilleN.GrahamD. W. (Washington, DC: John Wiley & Sons, Inc), 145166. 10.1002/9781118852538.ch8

  • 15

    GentryA. H. (1982). Patterns of neotropical plant species diversity.Evol. Biol.15184. 10.1007/978-1-4615-6968-8_1

  • 16

    HeberleH.MeirellesG. V.da SilvaF. R.TellesG. P.MinghimR. (2015). InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams.BMC Bioinformatics16:169. 10.1186/s12859-015-0611-3

  • 17

    JakobssonM.RosenbergN. A. (2007). CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.Bioinformatics2318011806. 10.1093/bioinformatics/btm233

  • 18

    JamesK. E.SchneiderH.AnsellS. W.EversM.RobbaL.UszynskiG.et al (2008). Diversity arrays technology (DArT) for pan-genomic evolutionary studies of non-model organisms.PLoS ONE3:e1682. 10.1371/journal.pone.0001682

  • 19

    KamvarZ. N.TabimaJ. F.GrünwaldN. J. (2014). Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction.PeerJ2:e281. 10.7717/peerj.281

  • 20

    KoenigD.Jiménez-GómezJ. M.KimuraS.FulopD.ChitwoodD. H.HeadlandL. R.et al (2013). Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato.Proc. Natl. Acad. Sci. U.S.A.110E2655E2662. 10.1073/pnas.1309606110

  • 21

    LangmeadB.TrapnellC.PopM. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.Genome Biol.10:R25. 10.1186/gb-2009-10-3-r25

  • 22

    LawessonJ. E.AdsersenH.BentleyP. (1987). An updated and annotated check list of the vascular plants of the Galapagos Islands.Rep. Bot. Inst. Univ. Aarhus16174.

  • 23

    LucattiA. F.van HeusdenA. W.de VosR. C. H.VisserR. G. F.VosmanB. (2013). Differences in insect resistance between tomato species endemic to the Galapagos Islands.BMC Evol. Biol.13:175. 10.1186/1471-2148-13-175

  • 24

    LvJ.QiJ.ShiQ.ShenD.ZhangS.ShaoG.et al (2012). Genetic diversity and population structure of cucumber (Cucumis sativus L.).PLoS ONE7:e46919. 10.1371/journal.pone.0046919

  • 25

    MakiM. (1999). Genetic diversity in the threatened insular endemic plant Aster asa-grayi (Asteraceae).Plant Syst. Evol.21719. 10.1007/bf00984918

  • 26

    McCouchS. (2004). Diversifying selection in plant breeding.PLoS Biol.2:e347. 10.1371/journal.pbio.0020347

  • 27

    MerlenG. (2014). “Plate tectonics, evolution, and the survival of species,” inThe Galapagos: A Natural Laboratory for the Earth Sciences, edsHarppK. S.MittelstaedtE.d’OzouvilleN.GrahamD. W. (Hoboken, NJ: John Wiley & Sons, Inc.), 119144. 10.1002/9781118852538.ch7.

  • 28

    NesbittT. C.TanksleyS. D. (2002). Comparative sequencing in the genus lycopersicon. Implications for the evolution of fruit size in the domestication of cultivated tomatoes.Genetics162365379.

  • 29

    NuezF.ProhensJ.BlancaJ. M. (2004). Relationships, origin, and diversity of Galapagos tomatoes: implications for the conservation of natural populations.Am. J. Bot.918699. 10.3732/ajb.91.1.86

  • 30

    ParentC. E.CacconeA.PetrenK. (2008). Colonization and diversification of Galapagos terrestrial fauna: a phylogenetic and biogeographical synthesis.Philos. Trans. R. Soc. Lond. B Biol. Sci.36333473361. 10.1098/rstb.2008.0118

  • 31

    PeraltaI. E.PeraltaI. E.SpoonerD. M.SpoonerD. M. (2005). Morphological characterization and relationships of wild tomatoes (Solanum L. sect. Lycopersicon).Monogr. Syst. Bot.104227257.

  • 32

    PeraltaI. E.SpoonerD. M. (2001). Granule-bound starch synthase (GBSSI) gene phylogeny of wild tomatoes (Solanum L. section Lycopersicon [Mill.] Wettst. subsection Lycopersicon).Am. J. Bot.8818881902. 10.2307/3558365

  • 33

    PerrierX.Jacquemoud-ColletJ. P. (2006). DARwin Software. Available at: http://darwin.cirad.fr/

  • 34

    PriceA. L.PattersonN. J.PlengeR. M.WeinblattM. E.ShadickN. A.ReichD. (2006). Principal components analysis corrects for stratification in genome-wide association studies.Nat. Genet.38904909. 10.1038/ng1847

  • 35

    PritchardJ. K.StephensM.DonnellyP. (2000). Inference of population structure using multilocus genotype data.Genetics155945959. 10.1111/j.1471-8286.2007.01758.x

  • 36

    PurcellS.NealeB.Todd-BrownK.ThomasL.FerreiraM. A. R.BenderD.et al (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses.Am. J. Hum. Genet.81559575. 10.1086/519795

  • 37

    RickC. M. (1983). “Genetic variation and evolution of Galapagos tomatoes,” inPatterns of Evolution in Galapagos Organism, edsBowmanR. I.BersonM.LevitonA. (San Francisco, CA: American Association for the Advancement of Science), 97106.

  • 38

    RickC. M.FobesJ. F. (1975). Allozymes of Galapagos tomatoes: polymorphism, geographic distribution, and affinities.Evolution (N. Y).29443457. 10.2307/2407257

  • 39

    RomagosaI.Borràs-GelonchG.SlaferG.van EeuwijkF. (2013). “Genotype by environment interaction and adaptation,” inSustainable Food Production, edsChristouP.SavinR.Costa-PierceB. A.MisztalI.WhitelawC. B. A. (New York, NY: Springer), 846870.

  • 40

    RosenbergN. A. (2004). DISTRUCT: a program for the graphical display of population structure.Mol. Ecol. Notes4137138. 10.1046/j.1471-8286.2003.00566.x

  • 41

    RushD. W.EpsteinE. (1976). Genotypic responses to salinity: differences between salt-sensitive and salt-tolerant genotypes of the tomato.Plant Physiol.57162166. 10.1104/pp.57.2.162

  • 42

    SaitouN.NeiM. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees.Mol. Biol. Evol.4406425.

  • 43

    SokalR. R.MichenerC. D. (1958). A statistical method for evaluating systematic relationships.Univ. Kansas Sci. Bull.3814091437.

  • 44

    SpoonerD. M.PeraltaI. E.KnappS. (2005). Comparison of AFLPs with other markers for phylogenetic inference in wild tomatoes [Solanum L. section Lycopersicon (Mill.) Wettst.].Taxon544361. 10.2307/25065301

  • 45

    StricklerS. R.BombarelyA.MunkvoldJ. D.YorkT.MendaN.MartinG. B.et al (2015). Comparative genomics and phylogenetic discordance of cultivated tomato and close wild relatives.Peer. J.3:e793. 10.7717/peerj.793

  • 46

    The 100 Tomato Genome Sequencing ConsortiumAflitosS.SchijlenE.de JongH.de RidderD.SmitS.et al (2014). Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.Plant J80136148. 10.1111/tpj.12616

  • 47

    The Tomato Genome Consortium (2012). The tomato genome sequence provides insights into fleshy fruit evolution.Nature485635641. 10.1038/nature11119

  • 48

    TinkerN. A.KilianA.WightC. P.Heller-UszynskaK.WenzlP.RinesH. W.et al (2009). New DArT markers for oat provide enhanced map coverage and global germplasm characterization.BMC Genomics10:39. 10.1186/1471-2164-10-39

  • 49

    Víquez-ZamoraM.VosmanB.van de GeestH.BovyA.VisserR. G. F.FinkersR.et al (2013). Tomato breeding in the genomics era: insights from a SNP array.BMC Genomics14:354. 10.1186/1471-2164-14-354

  • 50

    WenzlP.CarlingJ.KudrnaD.JaccoudD.HuttnerE.KleinhofsA.et al (2004). Diversity Arrays Technology (DArT) for whole-genome profiling of barley.Proc. Natl. Acad. Sci. U.S.A.10199159920. 10.1073/pnas.0401076101

  • 51

    WenzlP.RamanH.WangJ.ZhouM.HuttnerE.KilianA. (2007). A DArT platform for quantitative bulked segregant analysis.BMC Genomics8:196. 10.1186/1471-2164-8-196

  • 52

    WrightS. (1951). The genetical structure of populations.Ann. Eugen.15322354. 10.1017/CBO9781107415324.004

  • 53

    ZamirD. (2001). Improving plant breeding with exotic genetic libraries.Nat. Rev. Genet.2983989. 10.1038/35103590

  • 54

    ZuriagaE.BlancaJ.NuezF. (2009). Classification and phylogenetic relationships in Solanum section Lycopersicon based on AFLP and two nuclear gene sequences.Genet. Resour. Crop Evol.56663678. 10.1007/s10722-008-9392-0

Summary

Keywords

genotyping-by-sequencing, Solanum cheesmaniae, Solanum galapagense, genetic diversity, biogeography, tomato, wild relatives, Galapagos Islands

Citation

Pailles Y, Ho S, Pires IS, Tester M, Negrão S and Schmöckel SM (2017) Genetic Diversity and Population Structure of Two Tomato Species from the Galapagos Islands. Front. Plant Sci. 8:138. doi: 10.3389/fpls.2017.00138

Received

14 November 2016

Accepted

23 January 2017

Published

15 February 2017

Volume

8 - 2017

Edited by

Hirokazu Tsukaya, University of Tokyo, Japan

Reviewed by

Seisuke Kimura, Kyoto Sangyo University, Japan; Yasunori Ichihashi, RIKEN, Japan; Koji Takayama, Museum of Natural and Environmental History, Shizuoka, Japan

Updates

Copyright

*Correspondence: Mark Tester,

This article was submitted to Plant Evolution and Development, a section of the journal Frontiers in Plant Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics