Genetic Characterization of the Apple Germplasm Collection in Central Italy: The Value of Local Varieties

In the last 50 years, intensive farming systems have been boosted by modern agricultural techniques and newly bred cultivars. The massive use of a few related cultivars has dramatically reduced the genetic diversity of local apple varieties, now confined to marginal areas. In Central Italy the limited spread of intensive fruit orchards has made it possible to preserve much of the local genetic diversity, but at the same time the coexistence of modern and ancient varieties has generated some confusion. The characterization and clarification of possible synonyms, homonyms, and/or labeling errors in old local genetic resources is an issue in the conservation and management of living collections. A total of 175 accessions provided by 10 apple collections, comprising mainly local varieties, some accessions of unknown origin, and well-known modern and ancient varieties, were genotyped with 19 SSRs and analyzed by STRUCTURE, Ward’s clustering and parentage analysis. We were able to identify 25 duplicates, 9 synonyms, and 9 homonyms. As many as 37 unknown accessions were assigned to well-known local or commercial varieties. Polyploids made up 20% of the accessions. Some markers were found to be significantly correlated with morphological traits, and the loci associated with fruit over color were related to QTLs for resistance to biotic stresses, aroma compounds, stiffness, and acidity. In conclusion, the gene pool of Central Italy seems to be rather consistent and highly differentiated compared with other European studies (FST = 0.147). The importance of safeguarding this diversity and the impact on the management of the germplasm living collection are discussed.


INTRODUCTION
Apple (Malus × domestica Borkh., family Rosaceae, tribe Pyreae, 2n = 2x = 34) is one of the most ancient and widespread fruit crops in temperate regions. Almost certainly the domesticated apple is the result of a long evolutionary process extending over thousands of years, and it seems several species have contributed to its gene pool. On the basis of genetic data and of fruit and tree morphology, the wild Asian species M. sieversii M. Roem is currently considered the main contributor to the Malus × domestica gene pool, and the Tian Shan Mountains (Central Asia) the center of origin (Velasco et al., 2010; Cornille et al., 2014). Furthermore, hybridizations with other wild apple species present along the Silk Route, such as M. baccata (L.) Borkh. in Siberia, M. orientalis Uglitz. in the Caucasus and M. sylvestris (L.) Miller in Europe, have produced the diversity currently present in the domesticated apple (Vavilov, 1926; Harris et al., 2002; Cornille et al., 2012, 2014; Gross et al., 2014). Despite the high genetic variability, the thousands of cultivars distributed throughout the world and the world-wide breeding programs, mainly based on organoleptic traits, aesthetic standards and disease resistance, the pool of genetic resources used by breeders has been limited to a few varieties such as 'Cox's Orange Pippin,' 'Golden Delicious,' 'Jonathan,' 'Red Delicious,' and 'McIntosh' (Noiton and Alspach, 1996). As a result, apple cultivation today is limited to closely related cultivars, and four of them, namely Golden Delicious, Gala, Red Delicious, and Idared, account for 48% of global production. Currently, the most important variety is Golden Delicious with 2.546 million metric tons, followed by Gala with 1.331 million metric tons and Idared with 1.111 million metric tons (Food and Agricultural Organization of the United Nation [FAO], 2016; WAPA, 2017).
This massive use of limited and related cultivars, combined with vegetative practices based on cuttings and grafting, has dramatically reduced apple genetic diversity and, hence, many interesting and well adapted traditional and local varieties, considered obsolete, were no longer cultivated and have been partly lost (Hammer et al., 2003).
Similar trends occurred in Italy, which with its 2.5 million t is the fifth-largest apple producer in the world and the second in Europe (Food and Agricultural Organization of the United Nation [FAO], 2016). Apple production is mainly concentrated in the North of Italy, in particular in Trentino Alto Adige, which with its 1.7 million t accounts for 67% of total Italian production, followed by Veneto (11%), Piemonte (6%), and Emilia-Romagna (6%). In these regions, as in the rest of the world and in other European countries, production is based mainly on intensive orchards with few commercial varieties: Golden Delicious, Gala, Red Delicious, Fuji, and Granny Smith (Assomela-CSO, 2017). In Center-South Italy intensive apple orchards are appreciably present only in Campania (3% of Italian production). Historically these areas have never been inclined to intensive cultivation of fruit in general and apples in particular, even though in the 1920s there were unsuccessful attempts to introduce improved varieties in intensive orchards (Albertini et al., 2015). In these regions, apart from a few family-run orchards, fruit production was, and currently remains, mainly directed toward self-consumption and local markets. Therefore, in Central Italy the almost complete absence of intensive apple cultivation allowed for the preservation of much of the existing genetic diversity, even though the limited coexistence of modern and local varieties and the evolution of the farming systems since the 1950s have led to the disappearance of several interesting and well-adapted ancient varieties. Many of these varieties, although of low productivity, were relatively stable under extreme environmental conditions, and their high genetic variability guaranteed reliable harvesting for local communities in the past (Albertini et al., 2015).
Over time, some of the modern, introduced varieties mixed with the autochthonous ones, widening the choice on the one hand, but generating some confusion regarding local genetic resources and their correct denomination on the other. Consequently, the characterization and clarification of possible synonyms, homonyms, and/or labeling errors in these old and local genetic resources is a fundamental and necessary step for the conservation and management of living collections. Indeed, the genetic variability and allelic diversity present in these old accessions could be of extreme interest in terms of response to selection in adaptation toward a changing environment (Caballero and García-Dorado, 2013). Therefore, even though such varieties are characterized by low fruit quality and yield, their allelic diversity could be essential for crop improvement, providing traits of interest for the development of new varieties.
The present research aims to understand the relationship between local accessions in Central Italy and old and new varieties, and to identify synonyms and homonyms, for a better management of the conservation and propagation of genetic resources.

Plant Material
One hundred and seventy-five accessions of Malus × domestica, mostly Italian, were included in this study and were provided by several collections: the National Center of Fruit Tree Germplasm (CREA, coded c01), Parco Tecnologico Agroalimentare dell'Umbria (3A-PTA, coded c02 and c03 as coming from two living collections), the Department of Agriculture, Forestry and Food Science of the University of Torino (DISAFA, c04), the Archeologia Arborea private collection (c05), the Department of Agricultural, Food and Environmental Sciences of the Polytechnic University of Marche (D3A, c06), Malva Rinaldi School in Torino (c07), the Department of Agricultural Science, University of Bologna (DipSA, c08), the Giardino Armonico of Bevagna private collection (c09) and Azienda Ortofrutticola Sett'Olmi, Perugia (c10). Details are reported in Table 1. Many of the 175 accessions were well documented by reliable historical sources. Those lacking reliable information were coded as Unknown. Therefore, based on the initial information, the 175 accessions used in this study were classified into 17 commercial varieties (CV) used as controls, 99 local varieties (LV), and 59 unknown accessions (UA).

Microsatellite Amplification
Total genomic DNA was purified from young leaves using the DNeasy 96 Plant Kit (Qiagen) according to the manufacturer's protocol. Twenty-one apple SSR primer pairs (Liebhard et al., 2002; Vinatzer et al., 2004; Silfverberg-Dilworth et al., 2005) distributed over the 17 apple linkage groups were used (Table 2). Primer sequences and allele ranges for validated loci were analyzed by Multiplex Manager (Holleley and Geerts, 2009) to determine the best sets of loci to combine in a multiplex protocol. Multiplex Manager was used with the option of grouping all validated loci within the minimum number of PCRs while avoiding allele range overlap and primer interactions.
PCRs were carried out in a final volume of 25 µl using 1× Type-it Microsatellite PCR Master Mix (Qiagen), 0.2 µM of each forward primer fluorescently labeled with 6-FAM or ROX dyes (Sigma) and of each unlabeled reverse primer, and 20 ng of template DNA. All amplifications were performed in a GeneAmp PCR System 9700 (Applied Biosystems, United States), with a denaturing step of 5 min at 95 °C followed by 30 cycles of 95 °C for 30 s, 57 °C for 90 s and 72 °C for 30 s, and a final elongation step of 30 min at 60 °C.
PCR products were separated and analyzed on a 3130 XL DNA Analyzer (Applied Biosystems). The size of the amplified products was determined on internal standard DNA (GeneScan 500 Liz, Thermo) and the scorable peaks were assigned by GeneMapper software v.4.0 (Applied Biosystems).

Data Analysis
For each locus, common PCR artifacts leading to genotyping errors were investigated. The presence of null alleles, large allele dropout and extreme stuttering was inferred by means of bootstrapping in Micro-Checker v2.2.3 (Van Oosterhout et al., 2004), based on 1000 bootstraps and a 95% confidence interval. A preliminary analysis detected a deviation due to the presence of null alleles at two loci (Hi23g02 and Hi03g02), which were therefore discarded. The statistical analysis was then based on the remaining 19 loci (Supplementary Table S2).
The statistical analysis of the SSR data matrix was carried out with Genodive (Meirmans and Van Tienderen, 2004) and SPAGeDi 1.2 (Hardy and Vekemans, 2002). Both software packages are able to analyze data files containing diploid and polyploid accessions together. The analysis included the number of alleles (Na) per locus, the effective number of alleles (Ne), the percentage of rare alleles (RA, alleles with a frequency < 0.01), the observed (Ho) and expected (He) heterozygosity (Nei, 1978), the inbreeding coefficient F, and the polymorphic information content (PIC) (Botstein et al., 1980) at each locus, determined using the following equation:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^{2} - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2\, p_i^{2}\, p_j^{2}$$

The analysis also included the probability of identity (PID) (Waits et al., 2001) and the probability of identity among sibs P(ID)sib (Evett and Weir, 1998), calculated as follows:

$$P_{ID} = \sum_{i} p_i^{4} + \sum_{i} \sum_{j>i} \left(2\, p_i\, p_j\right)^{2}$$

$$P_{(ID)sib} = 0.25 + 0.5 \sum_{i} p_i^{2} + 0.5 \left(\sum_{i} p_i^{2}\right)^{2} - 0.25 \sum_{i} p_i^{4}$$

where p_i and p_j are the frequencies of the ith and jth alleles and i ≠ j. Finally, the ability of each marker to discriminate two random cultivars was estimated by the Power of Discrimination (PD = 1 − PID) (Kloosterman et al., 1993).
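The per-locus indices listed above can be illustrated with a minimal Python sketch. It is not the Genodive or SPAGeDi implementation: it handles diploid genotypes only, it assumes the unbiased He estimator of Nei (1978) and F = 1 − Ho/He, and the function names and the toy genotypes are hypothetical.

```python
from collections import Counter
from math import prod


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes, e.g. [(100, 102), ...]."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def locus_stats(genotypes, rare_threshold=0.01):
    """Diversity indices for one SSR locus (diploid genotypes only)."""
    p = list(allele_frequencies(genotypes).values())
    n = len(genotypes)
    sum_p2 = sum(x ** 2 for x in p)
    sum_p4 = sum(x ** 4 for x in p)
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)  # assumed: unbiased estimator
    # sum_{i<j} 2 p_i^2 p_j^2 equals (sum p^2)^2 - sum p^4
    pic = 1 - sum_p2 - (sum_p2 ** 2 - sum_p4)
    # sum p^4 + sum_{i<j} (2 p_i p_j)^2 equals 2 (sum p^2)^2 - sum p^4
    pid = 2 * sum_p2 ** 2 - sum_p4
    return {
        "Na": len(p),
        "Ne": 1 / sum_p2,
        "RA": 100 * sum(1 for x in p if x < rare_threshold) / len(p),
        "Ho": ho,
        "He": he,
        "F": 1 - ho / he if he > 0 else float("nan"),
        "PIC": pic,
        "PID": pid,
        "PIDsib": 0.25 + 0.5 * sum_p2 + 0.5 * sum_p2 ** 2 - 0.25 * sum_p4,
        "PD": 1 - pid,
    }


def cumulative_pid(per_locus_pid):
    """Multi-locus probability of identity as the product over loci."""
    return prod(per_locus_pid)


# Hypothetical genotypes (allele sizes in bp) for four accessions at one locus
toy_locus = [(100, 102), (100, 100), (102, 104), (100, 104)]
stats = locus_stats(toy_locus)
```

The cumulative value is a simple product over loci, which is why a panel of moderately informative markers quickly yields a very small multi-locus PID.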
In order to run a cluster analysis on diploid and polyploid accessions together, SSR data were converted to a binary data matrix by assigning "1" to the presence of a defined allele and "0" to its absence. The binary data matrix was then used to estimate a distance matrix, and the 175 accessions were clustered by Ward's hierarchical method (Ward, 1963) and validated by 1000 bootstrap replicates using PAST software (Hammer et al., 2001). The analysis revealed the presence of several identical accessions with the same genetic profile. As a consequence, these were removed and the remaining 150 were used again as binary data for a cluster analysis based on Ward's method, and the results were compared with the Bayesian model-based clustering of STRUCTURE ver. 2.2.3 (Pritchard et al., 2000) using codominant data based on allele size. STRUCTURE implements a clustering method that assigns individuals and predefined populations to K inferred clusters, each characterized by a set of allele frequencies at Hardy-Weinberg equilibrium, based on estimates of the corresponding probabilities of membership to each group. The analyses were run on an admixture ancestral model with correlated allele frequencies, and the number of K clusters was determined first by simulating a range of K-values from 1 to 21 with 10 independent runs each. Since after K = 5 there were no other appreciable peaks, STRUCTURE was run again with twenty runs for K-values ranging from 1 to 11, using a burn-in of 300,000 iterations followed by 500,000 Markov chain Monte Carlo (MCMC) iterations for data collection. The best K-value was determined through the ΔK method (Evanno et al., 2005) by using the STRUCTURE HARVESTER ver. 0.6.193 website (Earl and vonHoldt, 2012).
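The conversion to presence/absence data and the detection of identical profiles can be sketched as follows. This is only an illustration: the accession names and allele sizes are hypothetical, the Euclidean distance is an assumption (the text does not name the distance measure), and the Ward's clustering and bootstrap steps performed in PAST are not reproduced.

```python
from math import sqrt


def to_binary_matrix(profiles):
    """Convert {accession: {locus: alleles}} into 0/1 presence vectors.

    Returns the ordered (locus, allele) column labels and one vector per
    accession; works for diploid and polyploid profiles alike.
    """
    columns = sorted({(locus, allele)
                      for loci in profiles.values()
                      for locus, alleles in loci.items()
                      for allele in alleles})
    matrix = {acc: [1 if allele in loci.get(locus, ()) else 0
                    for locus, allele in columns]
              for acc, loci in profiles.items()}
    return columns, matrix


def euclidean(u, v):
    """Euclidean distance between two binary vectors (assumed measure)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identical_groups(matrix):
    """Group accessions that share exactly the same binary profile."""
    groups = {}
    for acc, vector in matrix.items():
        groups.setdefault(tuple(vector), []).append(acc)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical profiles: acc_C carries a third allele at one locus
profiles = {
    "acc_A": {"CH01h01": (110, 114), "Hi22f12": (205,)},
    "acc_B": {"CH01h01": (110, 114), "Hi22f12": (205,)},
    "acc_C": {"CH01h01": (110, 114, 120), "Hi22f12": (205, 207)},
}
columns, matrix = to_binary_matrix(profiles)
duplicates = identical_groups(matrix)
```

Because each allele is scored only as present or absent, a triploid with three alleles at a locus is handled in the same way as a diploid, at the cost of losing allele dosage information.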
The genotypes were assigned to the groups according to their highest membership coefficient, considering a strong affinity when the assignment probability (qI) was ≥ 0.80 (Breton et al., 2008; Pereira-Lorenzo et al., 2008; Miranda et al., 2010; Urrestarazu et al., 2012). STRUCTURE is able to infer the genetic structure of both diploid and polyploid genotypes by using the recessive allele approach described by Falush et al. (2007). The membership of each accession to Ward's clusters and to the STRUCTURE groups was compared and discussed. The 92 accessions whose probability (qI) was ≥ 0.80 were grouped into K = 5 subsets of 7, 9, 29, 11, and 36 individuals. The goodness of fit of these 5 groups was investigated by the F statistics (FIS and FST) (Weir and Cockerham, 1984) and by the analysis of molecular variance (AMOVA), which estimates the fraction of the genetic variation among and within populations (Excoffier et al., 1992; Michalakis and Excoffier, 1996).
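The assignment rule described above amounts to a simple threshold on the membership coefficients. The sketch below assumes that the coefficients of each accession are available as a list of K values; the accession names and values are hypothetical.

```python
def assign_groups(q_matrix, threshold=0.80):
    """Assign each accession to the group with its highest membership
    coefficient, or label it 'admixture' if that coefficient is below
    the threshold."""
    assignments = {}
    for accession, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        assignments[accession] = best + 1 if q[best] >= threshold else "admixture"
    return assignments


# Hypothetical membership coefficients for K = 5
q_matrix = {
    "acc_A": [0.02, 0.01, 0.93, 0.02, 0.02],
    "acc_B": [0.38, 0.61, 0.00, 0.01, 0.00],
    "acc_C": [0.80, 0.05, 0.05, 0.05, 0.05],
}
groups = assign_groups(q_matrix)
```

Groups are numbered from 1 to K to match the numbering used in the text.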
The software FaMoz (Gerber et al., 2003) was used to carry out a parentage analysis, to look for possible genetic relationships (parents) among the entire sample of accessions and possibly to confirm the results of STRUCTURE and Ward's clustering. FaMoz calculates the logarithm of the likelihood ratio, or log of odds ratio (LOD score), as the likelihood of an individual being the parent of a given offspring divided by the likelihood of these individuals being unrelated (Meagher and Thompson, 1986). LOD scores for any potential parentage relationship with a value greater than zero were computed, giving statistical significance to the data. Possible parents determined by LOD scores and significance thresholds were probed among the 150 accessions characterized with the set of 19 SSRs. Through 100,000 simulations with a rate of mistyping errors of 0.1%, as described by Gerber et al. (2000), a LOD score threshold of 5.0 was found and used in our work (Supplementary Figure S1). For polyploid accessions a FaMoz control analysis was run on a 0-1 data matrix.
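A single-parent LOD score of the kind described above can be sketched for codominant diploid loci as follows. This is not the FaMoz code: it assumes natural logarithms, no mistyping error, and that the second allele of the offspring is drawn at random from the population; the function names and the example values are hypothetical.

```python
from math import log, inf


def transmission_prob(parent, allele):
    """Probability that a diploid parent (a, b) passes `allele` to an offspring."""
    return sum(0.5 for a in parent if a == allele)


def single_parent_lod(offspring, parent, freqs):
    """Sum over loci of ln[P(offspring | candidate is a parent) /
    P(offspring | candidate is unrelated)]."""
    total = 0.0
    for locus, (o1, o2) in offspring.items():
        p = freqs[locus]
        candidate = parent[locus]
        if o1 == o2:
            related = transmission_prob(candidate, o1) * p[o1]
            unrelated = p[o1] ** 2
        else:
            related = (transmission_prob(candidate, o1) * p[o2]
                       + transmission_prob(candidate, o2) * p[o1])
            unrelated = 2 * p[o1] * p[o2]
        if related == 0:
            return -inf  # candidate shares no allele with the offspring here
        total += log(related / unrelated)
    return total


# Hypothetical allele frequencies and genotypes at one locus
freqs = {"L1": {100: 0.5, 102: 0.25, 104: 0.25}}
lod_match = single_parent_lod({"L1": (100, 100)}, {"L1": (100, 100)}, freqs)
lod_weak = single_parent_lod({"L1": (100, 102)}, {"L1": (100, 104)}, freqs)
lod_excluded = single_parent_lod({"L1": (102, 102)}, {"L1": (100, 104)}, freqs)
```

In this simplified form a single incompatible locus excludes the candidate; allowing for a mistyping rate, as in the simulations cited above, would replace that hard exclusion with a penalty.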
Moreover, in order to look for correlations between molecular data and morphological traits, a non-parametric correlation analysis (Spearman) was carried out using SAS 9.1 (Cary, NC, United States).
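As an illustration of this step, the sketch below computes a Spearman rank correlation between the presence/absence of one allele and an ordinal trait score. It is a plain-Python stand-in for the SAS procedure, not the analysis actually run; the data are hypothetical and no P-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: allele presence (1/0) vs. an ordinal trait score
allele_present = [1, 1, 0, 0]
trait_score = [1, 2, 3, 4]
rho = spearman(allele_present, trait_score)
```

A negative coefficient here means that accessions carrying the allele tend to have lower trait scores.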

Genetic Diversity
The 19 nuclear SSRs were all polymorphic and produced scorable amplicons with a total of 278 alleles. The average number of alleles per locus was 14.6, ranging from 5 (Hi22f12) to 26 (CH05e03), but the number of effective alleles per locus was significantly lower (Ne = 5.94) (Table 3). With the exception of locus Hi22f12, all loci showed at least one genotype with three alleles. Loci CH02b03b and CH01h01 identified as many as 28 individuals with three alleles, while the other loci identified between 8 and 22 individuals. Even if several individuals showed one locus with a third allele, only those showing a third allele in at least 3 loci (Urrestarazu et al., 2012) were considered putative polyploids. In the present study 35 individuals (20%) were classified as polyploids (Table 1), as they showed a third allele at 4 to 13 loci.
Rare alleles were found at 17 loci and were more common as the number of alleles per locus increased. The percentage of rare alleles (frequency less than 1%) ranged from 12.5% at locus CH04c07 (2 out of 16 alleles) to 57% at locus CH-Vf1 (8 out of 14 alleles). No rare alleles were found at loci Hi22f12 and AU223657, where the ranges of allele size were the lowest, 16 and 13 bp, respectively (Table 3).
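The criterion used to flag putative polyploids (a third allele in at least 3 loci) can be written as a short check. The profiles below are hypothetical and the function names are illustrative only.

```python
def loci_with_third_allele(profile):
    """Count the loci at which an accession shows three or more distinct alleles."""
    return sum(1 for alleles in profile.values() if len(set(alleles)) >= 3)


def is_putative_polyploid(profile, min_loci=3):
    """Apply the 'third allele in at least min_loci loci' criterion."""
    return loci_with_third_allele(profile) >= min_loci


# Hypothetical SSR profiles (allele sizes in bp)
diploid_like = {
    "CH01h01": (110, 114),
    "CH02b03b": (77, 81, 95),  # a single three-allele locus is not enough
    "Hi22f12": (205, 205),
}
triploid_like = {
    "CH01h01": (110, 114, 120),
    "CH02b03b": (77, 81, 95),
    "CH05e03": (160, 172, 180),
    "Hi22f12": (205, 205),
}
```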
Except for CH-Vf1 and CH01c06, all other loci were not in Hardy-Weinberg equilibrium (P ranging from 0.05 to less than 0.001) and this was expected as the 175 individuals do not belong to a panmictic population. Mean observed heterozygosity (Ho) was 0.78, ranging from 0.16 (locus Hi22f12) to 0.91 (CH01g12 and CH04c07) (Table 3). Mean expected heterozygosity (He) was 0.81, denoting high variability and ranging from 0.65 (CH-Vf1) to 0.89 (Hi07h02 and Hi03a10). F coefficients ranged from −0.154 (CH-Vf1) to +0.784 (Hi22f12), but the latter was the only value significantly departing from the others, thus denoting high homozygosity.

TABLE 3 | Genetic diversity in terms of range of allele size (bp), number of alleles (Na), effective number of alleles (Ne), percentage of rare alleles (RA), observed (Ho) and expected (He) heterozygosity, inbreeding coefficient (F), polymorphic information content (PIC) and probability of identity (PID and PIDsib) of all 175 accessions of apple germplasm evaluated. † RA: percentage of rare alleles; an allele is defined as rare at a frequency < 0.01. ‡ The total value of the column is the cumulative P(ID) obtained as the product of the P(ID) of individual loci. The same applies to P(ID)sib.
The 19 loci showed PIC values ranging from 0.65 (CH-Vf1) to 0.89 (Hi07h02 and Hi03a10), which were all higher than 0.5 and therefore very informative (Botstein et al., 1980). The probability of identity (PID) of a locus is the probability that two individuals share the same genotype at that locus, while the power of discrimination (PD = 1 − PID) is the probability that two individuals have different genotypes at that locus. An overall mean value of PD = 0.98 (ranging from 0.924 to 0.997) indicates that the loci are polymorphic enough to discriminate individuals. Considering the profile of the 19 loci together, the probability of finding two identical individuals is indeed remote (PID = 2.2 × 10^−38); therefore, two individuals with the same profile at 19 loci are expected to be clones of the same genotype.
After the removal of 25 duplicates, the remaining 150 accessions were reanalyzed by Ward's clustering and analyzed by STRUCTURE. The results are reported in Figure 1, aligned so as to compare and possibly validate accession membership and population structure. On Ward's dendrogram, at a distance of 60 units, we found two main groups (P < 0.001). The upper group includes all local varieties, while the other includes several commercial varieties that were used in this study as controls: Annurca, Reinettes, Abbondanza, Golden Delicious, Golden Gala, Fuji, Cripps Pink, and Stark Delicious. Moreover, at a distance of 35 the dendrogram showed essentially five clusters.

Table footnote: * Please note that five other pairs of Unknown accessions were identical and only one of each pair was maintained for further analysis (see text in Results). Accessions in italics are putative polyploids.
Cluster 2 includes several ancient accessions from Umbria, such as Ruzza (#066 and #010) or Roggia (#128 and #018), which had already been described in the 16th century (Gallo, 1540). In this cluster there are two accessions named 'Conventina' (#064 and #127) from Gubbio (Umbria); the name refers to the Italian word for 'monastery,' where this variety has been grown since the Middle Ages, and it is known to be suitable for mountain areas (Tonini, 1930). Lastly, there is a subcluster of Piattuccia (#062) and Pianella (#050, #081), whose names are due to a flattened fruit shape.
Cluster 3, with 45 accessions, includes the subgroups of Spoletina (#067, #139), of Limoncella (#126, #061, and #141), of several polyploids (red labeled beside the name), the local varieties Sona (#011), Coccianese (#059) and several unknown accessions. Limoncella is an ancient variety common in the South of Italy whose name means 'small lemon'; Sona is common in Valnerina (Umbria) and its name derives from the rattling sound of the seeds inside the carpel detaching at maturity.
Although Cluster 4 is apparently uniform, it is possible to detect two subgroups: one includes all the diploid accessions that refer to Annurca, an ancient cultivar of Campania whose origins were already described by Pliny the Elder (Pasquale, 1876), and the other includes all the polyploids and refers to Panaia (#034) and Polsola (#073), a synonym of Panaia, a local variety from Tuscany that spread to Umbria around 1700 and later to Abruzzo (Gallesio, 1817/1839). In the Annurca subgroup there are two unknown accessions, #096 and #111, tightly joined together (P < 0.06), both collected in Umbria.
Cluster 5 is the most numerous. It includes 54 accessions and it is possible to distinguish the group of Reinettes, the group of Golden, that of Appiola, that of Abbondanza and that of Fuji, Cripps Pink, and Stark Delicious. The Reinette accessions (#131, #132, and #133) and those known as 'Mele Grigie' ('gray apples, ' from #019 to #024), provided by the Department of Agriculture of Torino, are very common in the North-West of Italy and are characterized by an acidic pulp and by a shrunken skin at maturity.
The plot of the average log-likelihood values for K ranging from 1 to 21 and the distribution of ΔK values (Evanno et al., 2005) according to K are shown in Supplementary Figure S3. Two peaks were found, the first corresponding to K = 2 and the second to K = 5; the hierarchical genetic structure was investigated at K = 5. A threshold value of qI ≥ 0.80 was used to assign individuals to the groups (Figure 1). With this threshold as many as 58 of 150 individuals were classified as admixture. At K = 5 STRUCTURE was able to define 5 groups, identified by the colors green, yellow, blue, red, and gray, respectively, which are represented side by side with those detected by Ward's clustering (Figure 1). By comparing the two grouping methods, and apart from the admixtures, it can be stated that there was a similar trend and correspondence. In Group 1 (green) only 7 out of 20 accessions were not classified as admixtures, and they all belong to Gelata and Cera (these are normally considered synonyms). In Group 2 (yellow) only 7 out of 18 accessions were classified at qI ≥ 0.80; of these, some belong to the group of Ruzza (a synonym of Roggia; #128, #066, #010, #018; Dalla Ragione and Dalla Ragione, 2011) and the rest to Piattuccia (#062, #110, #138), confirming their cluster closeness. In Group 3 (blue), the cluster of Spoletina, Sona and Limoncella, 28 out of 45 were correctly classified (qI ≥ 0.80). In Group 4 (red) 11 out of 13 accessions were correctly classified: the Annurca accessions (all diploids) and a set of triploids, all belonging to #034_Panaia. Group 5 (gray) includes several commercial varieties (Reinettes, Golden Delicious, Golden Gala, Abbondanza, Fuji, Cripps Pink, and Stark Delicious) and here as many as 35 out of 54 accessions were accurately classified.
Overall, by comparing the two methods, only two misclassified accessions emerged: the unknown #157, found in Ward's Cluster 3 but ascribed by STRUCTURE to Group 2 (yellow), and 039_Ciocarina Bianca, found in Cluster 5 but attributed by STRUCTURE to Group 3 (blue).
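The ΔK criterion of Evanno et al. (2005) used to locate these peaks can be sketched as follows. This is an illustration rather than the STRUCTURE HARVESTER code: it assumes that ΔK is the absolute second-order difference of the mean log-likelihoods divided by the standard deviation across replicate runs at that K, and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(ln_prob_runs):
    """Compute deltaK from {K: [Ln P(D) of each replicate run]}.

    deltaK(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), where L is the mean
    over runs; it is undefined for the smallest and largest K tested.
    """
    ks = sorted(ln_prob_runs)
    avg = {k: mean(ln_prob_runs[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 not in avg or k + 1 not in avg:
            continue
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        sd = stdev(ln_prob_runs[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical log-likelihoods from two replicate runs per K
ln_prob_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
dk = delta_k(ln_prob_runs)
best_k = max(dk, key=dk.get)
```

Since ΔK cannot be evaluated at the first K tested, it cannot by itself indicate K = 1, which is one reason the log-likelihood plot is inspected alongside it.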
Having established that 92 accessions showed a genetic structure, the five groups were compared in terms of number of alleles, expected and observed heterozygosity (Table 5) and F-statistics (Table 6). As expected, the mean number of alleles per locus in Groups 1, 2, and 4 was consistently lower than in Groups 3 and 5, since this value depends heavily on the number of accessions forming each group. Interestingly, in the former, less numerous groups, there was at least one locus with a fixed allele (Table 5), hence the indexes of diversity (He) were lower than those of Groups 3 and 5. As expected, the overall FIS of the 5 groups was slightly negative (outbreeding), consistently for the majority of the loci (14), except for CH05e03 (0.1045, P < 0.05) and especially for Hi22f12 (0.7914, P < 0.001), which was highly homozygous compared with the expected values. The overall FST-value across loci (0.1470, P < 0.001) is to be considered rather high, meaning that the 92 accessions were well structured, and it is close to the value (0.15) generally considered to be the threshold between moderate and great differentiation (Wright, 1978). However, upon closer inspection of the FST-values at each locus it is possible to note that, apart from those loci whose values are close to the mean, some exceeded it by at least 3 times the standard error; these were CH05c06, Hi22f12, and AU223657 with 0.1995, 0.2162, and 0.3074, respectively, all significant at P < 0.001. Some other loci showed values more than 3 times lower than the mean; they were CH04c07, CH01h01, CH-Vf1, and CH01f03b, all significant at P < 0.001. In particular, locus CH01h02, with an FST-value of 0.0403 (P < 0.05), denoted little differentiation.
All the results reported above were confirmed by AMOVA, where the variation within groups was 75.9% and the variation among groups was 24.1%, values different from those reported in the literature (Gasi et al., 2010; Urrestarazu et al., 2012, 2016; Pereira-Lorenzo et al., 2017).

Parentage Analysis
The parentage analysis was used (i) to investigate the origins of the 49 unknown accessions remaining after the removal of 10 that were found to be identical to well-known genotypes and (ii) to look for concordance with the results from STRUCTURE and Ward's clustering. Table 7 reports the 38 unknown accessions significantly related (LOD score > 5) to parents of known origin (LV and CV). As many as 27 of them showed full concordance with STRUCTURE and Ward's clustering. The most likely parents of 10 unknown accessions, classified by STRUCTURE into the Admixture group, were also classified by Ward's clustering in the same group. Lastly, 144_Unknown was included by STRUCTURE in Group 5, whereas its most likely parent, 033_Panaia, was classified differently by the other two analytic procedures (Ward's Cluster 3 and Admixture in STRUCTURE).
In particular, 051_Unknown was classified by STRUCTURE as admixture, showing a probability of 0.38 of being assigned to Group 1 and of 0.61 to Group 2; its likely parents were 001_Cerina and 062_Piattuccia, belonging to Cluster 1/Group 1 and to Cluster 2/Group 2, respectively. Since Ward's clustering assigned #051 to Cluster 1, it can be stated that STRUCTURE was more efficient in inferring its mixed genomic configuration.

Table 6 footnotes: SE 0.0460, 0.0127. ‡ FIS is the loss of heterozygosity due to inbreeding, FST is the loss of heterozygosity due to genetic drift; *, **, *** F-values significant at P < 0.05, 0.01, and 0.001, respectively; n.s. not significant.

Correlation of SSR Alleles and Some Morphological Traits
Correlations among morphological traits were not significant, except for over color vs. fruit rustiness (r = −0.3166, P < 0.001). At eleven of the 19 SSR loci, 15 of the 278 alleles were significantly (P < 0.001) correlated with five morphological traits (Table 8). Four alleles showed a significant negative correlation with time of eating maturity, so that the presence of these alleles was related to early maturity. CH03d12_128 and Hi03a10_199 were positively correlated with fruit shape. Similarly, pulp color was positively and significantly correlated with two alleles at two SSR loci related to aroma compounds (Dunemann et al., 2009), indicating a putative correlation between pulp color and aroma.
The over color was found to be negatively correlated with several alleles, with r-values ranging from −0.3049 (for CH02b03b_077) to −0.36308 (for CH_Vf1_127), meaning that their presence is in some way related to light colors (absence, yellow, and orange). Interestingly, the majority of the marker alleles detected for the over color were related to QTLs for resistance to biotic stresses, aroma compounds, stiffness, and acidity, indicating a possible correlation among these traits (Kenis et al., 2008). Lastly, only one allele (CH01f03b_137) was positively correlated with ground color.
Moreover, three (CH01g12, Hi03a10, and Hi04e04) out of eleven loci identified in the correlation test were found in at least two traits, while CH01g12 was detected in maturity, over color and pulp color.

DISCUSSION
The rapid spread of modern, intensive agricultural techniques during the last century was also accompanied by a rapid spread of newly bred cultivars characterized by greater productivity and uniformity. Although this trend was more intensive in annual crops, it did not spare fruit orchards. The oversimplification of the agricultural systems in favorable areas caused parallel changes in the agricultural economy, farming assets, rural culture, and agricultural landscapes as well. From a biological point of view this led to a significant reduction of crop biodiversity and a progressive genetic erosion, with the loss of many ancient, well-adapted local varieties. For this reason, over the last 50 years, the need to develop effective strategies for the conservation and management of genetic resources has become a fundamental issue. For this purpose germplasm banks have been established all over the world, operating at international, national and local levels. Italy, in compliance with EU directives, has promoted several Regional germplasm collections, with the aim of maintaining and preserving the autochthonous diversity. In Italy fruit germplasm, and apple genetic resources in particular, are conserved by National and Regional institutions (such as CREA, University of Bologna, Malva Rinaldi School of Torino, and many others), several of which kindly provided accessions that were used as controls in this study.
The area of Central Italy is characterized by hills, mountains, small valleys, and a variety of soil types (Corti et al., 2013) and exposures, generating many micro-environments. Rainfall amount and distribution also differ from the East to the West coast across the Apennines, as do temperatures, owing to altitudes ranging from sea level to 2000-3000 m a.s.l. (Longinelli and Selmo, 2003). Fruit in general, and apples in particular, were rarely grown in specialized and intensive cultivation over large areas. Taken together, these conditions can perhaps explain most of the rich diversity found in the apple germplasm of Central Italy, a picture difficult to find in other Italian regions.
Many of these local varieties are well adapted to specific agroclimatic conditions and often differ to some extent from the originals in terms of morphological and physiological traits, and have therefore been given different names. Often, the names were assigned on the basis of phenotypic traits, strongly influenced by the environment and agricultural practice, thus increasing the existing confusion about local genetic resources and their correct denomination. Hence the importance of characterizing the germplasm present in the national and regional germplasm banks, identifying duplicates and redundant accessions, and thereby simplifying the management and reducing the costs of living collections.
For this purpose, molecular markers and in particular SSR have been widely used in genetic diversity studies and clarified cases of synonymy and homonymy in core collections (Patzak et al., 2012;Urrestarazu et al., 2012Urrestarazu et al., , 2016Liang et al., 2015;Yun et al., 2015;Lassois et al., 2016). Following the detection of null alleles, two of the initial 21 loci (Hi23g02 and Hi03g02) were discarded. The remaining 19 showed a high degree of   polymorphism and discriminating power and allowed us to meet our objectives.
The pool of accessions studied here contained 20% polyploids, a value intermediate between the 8% reported by Urrestarazu et al. (2016) in a screening of a wide European collection and the values found in Spain: 34% by Pereira-Lorenzo et al. (2007), 29% by Ramos-Cabrer et al. (2007), and 24% by Urrestarazu et al. (2012).
In brief, our study showed that 25 accessions were duplicates, 9 had to be considered synonyms (Table 4), and 9 homonyms. Six accessions from the living collections of 3A-PTA Pantalla and 3A-PTA Casalina (Amerina c02_049/c02_087, Coccianese c02_058/c02_059, and Rosa in Pietra c02_147/c03_118) formed three pairs of duplicates, because within each pair the SSR profiles were identical at all 19 loci. San Giovanni (c02_013) and Ruzza (c02_012) also turned out to be identical to the corresponding accessions provided by the Department of Agricultural Science, University of Bologna (c08) and the National Center of Fruit Tree Germplasm (c01), thus confirming the reliability of the analysis.
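The duplicate criterion described above (identical SSR profiles at every locus) can be illustrated with a minimal Python sketch. The accession codes and allele sizes below are invented for illustration and are not data from this study, only two loci are shown for brevity, and the function name `find_duplicate_groups` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_groups(profiles):
    """Group accessions whose SSR genotypes are identical at every locus.

    profiles maps an accession code to a dict {locus: iterable of allele sizes}.
    Returns a sorted list of sorted accession lists, one per genotype shared
    by at least two accessions.
    """
    groups = defaultdict(list)
    for accession, genotype in profiles.items():
        # Build an order-independent, hashable key: loci sorted by name,
        # alleles sorted within each locus.
        key = tuple(
            (locus, tuple(sorted(alleles)))
            for locus, alleles in sorted(genotype.items())
        )
        groups[key].append(accession)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Toy data: hypothetical accessions and allele sizes.
toy_profiles = {
    "acc_A": {"CH01h01": (114, 118), "CH02c09": (248, 256)},
    "acc_B": {"CH01h01": (118, 114), "CH02c09": (248, 256)},
    "acc_C": {"CH01h01": (114, 120), "CH02c09": (248, 248)},
}

duplicate_groups = find_duplicate_groups(toy_profiles)
print(duplicate_groups)
```

Sorting the alleles within each locus makes the comparison independent of the order in which alleles were scored, and the same key works for accessions showing more than two alleles at a locus.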
Among the polyploids we found some accessions named Panaia or Polsola, which are synonyms (Dalla Ragione and Dalla Ragione, 2011). This ancient variety, whose local name 'Panaia' derives from 'bread basket,' was very common in Central Italy, and in the past two varieties were described by Gallesio (1817/1839): 'Panaia massima' and 'Panaia a frutto piccolo.' These denominations refer to the size of the fruit, and may explain the homonymy between #034 and #075 (polyploids with bigger fruit) vs. #033 (diploid with smaller fruit).
Another significant result of our study was the genetic identification of several unknown accessions. Ten of them were excluded as duplicates of well-known accessions. By using different statistical approaches (Cluster, STRUCTURE and Parentage analysis) it was possible to assign 37 more accessions to known commercial or local varieties.
Lastly, it is worth mentioning that the 92 accessions found by the Bayesian analysis were well-structured at K = 5, where the FST value indicates a high differentiation among subpopulations, much higher than those reported in the literature (Gasi et al., 2010; Urrestarazu et al., 2012, 2016), indicating that the material from Central Italy is a genetic pool worthy of safeguarding and conservation. In particular, we found that the FST values at some loci were strongly contrasting (see Table 6). The low FST values at loci CH04c07, CH01h01, CH-Vf1, CH01f03b, and CH01h02 suggest that homogenizing selection across subpopulations reduces differentiation, whereas the high FST values at loci CH05c06, Hi22f12, and AU223657 suggest that selection for local adaptation is creating differentiation. In these cases we found that allele 128 of CH-Vf1 is correlated with fruit over color and allele 137 of locus CH01f03b with fruit ground color, while allele 106 of CH05c06 and allele 235 of AU223657 are correlated with time of eating maturity (Table 8).
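To make the contrast between low and high per-locus differentiation concrete, the following Python sketch computes Nei's GST = (HT - HS) / HT for a single locus. This is a simple frequency-based analogue of FST, not the Bayesian estimate produced by STRUCTURE; the equal weighting of subpopulations, the function names, and the allele calls are all assumptions made for illustration.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a flat list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(subpop_alleles):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    subpop_alleles holds one flat list of allele calls per subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    freqs = [allele_frequencies(a) for a in subpop_alleles]
    # H_S: mean within-subpopulation expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in freqs) / len(freqs)
    # H_T: expected heterozygosity computed from the mean allele frequencies.
    all_alleles = set().union(*freqs)
    mean_freqs = {
        a: sum(f.get(a, 0.0) for f in freqs) / len(freqs) for a in all_alleles
    }
    h_t = expected_heterozygosity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele calls: each subpopulation fixed for a different allele.
fixed_differently = [[120, 120, 120, 120], [106, 106, 106, 106]]
# Hypothetical allele calls: identical frequencies in both subpopulations.
same_frequencies = [[120, 106], [120, 106]]

print(gst(fixed_differently), gst(same_frequencies))
```

A locus fixed for different alleles in different groups gives the maximum value, while a locus with the same allele frequencies everywhere gives zero, which is the pattern the low and high per-locus values above describe.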
Unexpectedly, the observed heterozygosity at locus Hi22f12 was significantly lower than the values at the other 18 loci (0.13 vs. 0.78), meaning that almost all individuals are homozygous at this locus. This is also the locus with the lowest number of alleles. The sequence of the Hi22f12 SSR locus was then used as a BLAST query against the NCBI nr database, and we found that Hi22f12 is located inside the transcription factor IIE subunit 1-like gene (XM_008376494.2) of Malus × domestica, at position 1187-1242. The E-value and the identity of the query against the reference gene were 2 × 10⁻⁶ and 85%, respectively. The sequence stability required by this gene may explain the low polymorphism, which is perhaps confined to introns. It would be interesting to extend this investigation to other germplasm collections.
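For reference, observed heterozygosity at a locus is the fraction of genotyped individuals carrying two different alleles. The Python sketch below shows the calculation for diploid genotypes; the genotypes are invented, the function name is hypothetical, and polyploid accessions would need separate handling.

```python
def observed_heterozygosity(genotypes):
    """Fraction of genotyped diploid individuals carrying two different alleles.

    genotypes is a list of (allele_1, allele_2) tuples; None marks missing data.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no genotyped individuals at this locus")
    heterozygous = sum(1 for a, b in scored if a != b)
    return heterozygous / len(scored)


# Hypothetical locus: three homozygotes, one heterozygote, one missing call.
toy_locus = [(218, 218), (218, 218), (218, 222), (218, 218), None]
ho = observed_heterozygosity(toy_locus)
print(f"Ho = {ho:.2f}")
```

Missing calls are excluded from the denominator, so the value reflects only individuals that were actually scored at the locus.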
In the present study Hi22f12 and some other SSR loci were fixed; in particular, loci CH05c06 and CH02c09 showed a fixed allele in Group 2 of STRUCTURE. Of these, CH05c06 is of particular interest, as it is associated with a major QTL for fruit titratable acidity (TA) detected in the Ma region (Liebhard et al., 2003; Kenis et al., 2008). In the cross 'Telamon × Braeburn,' this QTL was mapped on LG16, in an interval between the markers CH05e04z and CH05c06, and explained 20-34% of the observed variance (Kenis et al., 2008). The Ma gene controls the level of malic acid in apples and many other fruits (Maliepaard et al., 1998; Liebhard et al., 2003; Xu et al., 2012). Indeed, acidity is one of the most important fruit traits and, in apples, it strongly affects quality and organoleptic characteristics. The balance between sugars and acids is the basis of the taste and flavor of fruit (Wu et al., 2007; Zhang et al., 2010) and is therefore of utmost importance in breeding programs (Visser and Verhaegh, 1978). Moreover, this locus seems to be associated with a second QTL (M2), for the aromatic compound β-damascenone (Dunemann et al., 2009), a potent aroma present in apples (Cunningham et al., 1986; Fuhrmann and Grosch, 2002), in other fruits (peaches and grapes), and in beverages (coffee, beer, and wine), which is associated with descriptors such as "fruity-flowery," and in particular "apple" and "baked apple" (Pineau et al., 2007). In our study, allele 120 at locus CH05c06 is fixed in all accessions of Group 2 of STRUCTURE, the one hosting Conventina, Ruzza, Roggia, Rosona, and Piattuccia, all characterized by crispy, sugary, sour, and very aromatic pulp. In the same Group 2 we also found allele 248 of locus CH02c09 to be fixed; this locus is linked to a QTL (M1) on LG15 for the aromatic compound allylanil, which is associated with anise and licorice descriptors (Plotto and McDaniel, 2000; Dunemann et al., 2009).
Two other alleles in the present investigation, allele 162 at locus CH01c06 and allele 218 at locus Hi22f12, were also fixed, in Group 1 and Group 4 respectively, but no information on them was found in the Malus database.

CONCLUSION
In conclusion, this paper highlights the considerable genetic variability among the apple accessions recovered in Central Italy. The information obtained can be used to better manage large living collections of apple, a fruit tree of great nutritional interest.

AUTHOR CONTRIBUTIONS
EA, LC, and FV conceived the study. EA and GM designed and coordinated the experiments. EA, GM, and LC chose and provided the germplasm. GM and NF performed the lab experiments. GM, LR, and NF conducted the data analysis and wrote the manuscript, while EA and FV critically reviewed it.