Combining Ability and Molecular Marker Approach Identified Genetic Resources to Improve Agronomic Performance in Coffea arabica Breeding

Plant breeding aims to develop cultivars with good agronomic traits through gene recombination and elite genotype selection. To support Coffea arabica breeding programs and assist parent selection, molecular characterization, genetic diversity (GD) analyses, and circulating diallel studies were strategically integrated to develop new cultivars. Molecular markers were used to assess the GD of 76 candidate parents and verify the crossing of potential F1 hybrids. Based on the complementary agronomic traits and genetic distance, eight elite parents were selected for circulating diallel analysis. The parents and 12 hybrids were evaluated based on 10 morpho-agronomic traits. For each trait, the effects of general and specific combining abilities, as well as the averages of the parents, hybrids, and predicted hybrids, were estimated. Crosses that maximize the genetic gains for the main agronomic traits of C. arabica were identified. Joint analysis of phenotypic and molecular data was used to estimate the correlation between molecular GD, phenotypic diversity (PD), phenotypic mean, and combining ability. The selection of parents that optimize the allele combination for the important traits of C. arabica is discussed in detail.


INTRODUCTION
Coffee breeding programs aim to develop cultivars with agronomic and technological traits demanded by producers, combined with high productive potential, adaptation to different producing regions, and better cup quality (De Paiva Barbosa et al., 2019a). However, the genetic gain obtained through Coffea arabica selection is limited, mainly due to its low genetic variability (Setotaw et al., 2013). The recent origin, preferentially autogamous reproduction, and limited dispersion of the species are the primary reasons for this narrow genetic basis (Merot-L'anthoene et al., 2019; Scalabrin et al., 2020). Thus, efficient strategies to explore genetic variability are crucial for parent selection and the success of C. arabica breeding programs (Alkimim et al., 2017). An alternative would be to estimate and explore genetic variability through molecular markers (Sousa et al., 2019; Alkimim et al., 2020).
Molecular markers are of great use for genetic improvement and selection because they allow precise access to information at the DNA level (Ferrão et al., 2015). In C. arabica, these markers have been used successfully for germplasm and cultivar fingerprinting, genetic diversity (GD) analysis, genetic mapping, and marker-assisted selection (Missio et al., 2011; Alkimim et al., 2017; Setotaw et al., 2020). Molecular markers also facilitate the certification of controlled crosses in breeding programs. Coffea arabica is an autogamous species that is hybridized artificially. In this process, selfing must be prevented using a secure sterility system (Longin et al., 2012). However, female coffee plants are emasculated close to flower opening, when the stigma is ripe and pollination can proceed. Therefore, true hybrids are difficult to ensure because self-fertilization may occur before outcrossing is controlled. The resulting inaccurate progenies can adversely affect all later stages of a breeding program. These problems can be overcome with molecular marker assistance (Caballo et al., 2018; Chauhan et al., 2021). By comparing the allelic profile of the progeny with that of the parents, molecular markers quickly and accurately identify self-fertilized progenies, distinguishing them from hybrid progenies (Conceição et al., 2011; Stetter et al., 2016). This strategy is particularly important for breeding perennial and long-cycle species with low genetic variability, such as C. arabica.
Another strategy to explore GD among individuals and select genetic resources to be included in breeding programs is to use diallel crosses. Diallel analysis evaluates the general and specific combining abilities of the parents and predicts the average behavior of the hybrids (Kaushik and Dhaliwal, 2018; Maioli et al., 2020). This approach is widely used to identify elite parents for developing hybrids or cultivars for use in breeding programs, understand hybrid heterosis, and provide information about gene action (Moura et al., 2016; Pereira et al., 2017; Ofori and Padi, 2020; Olivo et al., 2020). For coffee, diallel studies were carried out to compare the performance of parent lines and hybrids of C. arabica (Cilas et al., 1998) and Coffea canephora (Cilas et al., 2003; Cilas and Bouharmont, 2005), and the data were used to assist the Cameroon breeding program (Cilas et al., 1998). In addition to its usefulness in breeding, diallel data have also been used to study the heritability of physical and mechanical properties of coffee wood (Cilas et al., 2006).
Among the different methods proposed for diallel crossing, circulating diallel (Kempthorne and Curnow, 1961) aims to reduce the number of hybrids to be evaluated and predicts the best unused hybrids. Thus, it is an efficient design and requires little mating efforts and experimental resources for plant material evaluation. It is of particular interest in the breeding programs for commercially important tree species (Tello et al., 2019), such as coffee.
Both molecular markers and diallel crosses have been used in different crops to analyze the relationships between genetic distance, agronomic performance, heterosis, and combining ability of hybrids. The results have been contrasting depending on the crop, genotypes, markers, and traits evaluated. Combining ability and genetic distance are highly correlated when related genotypes are crossed, as observed in the same heterotic group of corn (Makumbi et al., 2011) and related sunflower strains (Reif et al., 2013). Since C. arabica has a very narrow genetic base (Setotaw et al., 2013; Sousa et al., 2017b), the association of information on GD using molecular markers and the average behavior of genotypes based on morpho-agronomic traits may be useful in selecting parents.
This study aimed to use different approaches to assist the selection of parents to be introduced in coffee breeding programs for the main agronomic traits. Coffea arabica resources were selected based on a circulating diallel mating design for evaluating and predicting the performance of progenies in breeding programs. Diallel analysis and molecular marker genotyping were coupled to ensure the efficiency of coffee selection.

Genetic Material
For the GD study, 22 coffee genotypes were evaluated, corresponding to cultivars and elite accessions of C. arabica breeding programs developed in Brazil. More than one plant from each cultivar/accession was analyzed, totaling 76 coffee plants (Table 1). Eight genotypes were selected based on the complementary agronomic traits (Table 2) and genetic distance. They were crossed according to the circulating diallel model (Table 3). The potential hybrids were evaluated with molecular markers for controlled crossbreeding certification, and the true hybrids were planted.
The parents and hybrids were kept in the experimental area of the Department of Plant Pathology of the Universidade Federal de Viçosa (DFP/UFV), Brazil, located at 20°44′26″ S latitude, 42°50′54″ W longitude, and 665 m altitude. From 2013 to 2016, the temperature ranged from 5.4 to 37.5 °C, with an annual mean temperature of 20.3 °C (20.1 °C in the last 30 years) and an annual mean precipitation of 1,220.5 mm (1,289.0 mm in the last 30 years). The seedlings (in bags), with three pairs of leaves, were planted in January 2013 following a randomized block design containing 20 treatments (12 F1 hybrids and eight parents), with four replications and three plants per plot. The plants were arranged at a spacing of 3.0 × 0.70 m.

DNA Extraction and Amplification With SSR Markers
The genomic DNA of each plant was extracted from young and fully expanded leaves using a previously proposed method (Diniz et al., 2005). The quality and quantity of DNA were evaluated using a NanoDrop 2000 spectrophotometer. The DNA samples were genotyped using 12 SSR primers (Table 5). These markers can distinguish and show the unique molecular profile of the main Brazilian coffee cultivars (Sousa et al., 2017b), including those analyzed in our work. PCR amplification was performed in a 20 µl reaction mix containing 50 ng DNA, 1 U Taq DNA polymerase, 1× enzyme buffer, 1 mM MgCl2, 150 µM each dNTP, and 0.1 µM each primer, with the volume adjusted with sterile milli-Q water, in a PTC-200 (MJ Research) or Veriti (Applied Biosystems) thermocycler. The reaction conditions were: initial denaturation at 94 °C for 2 min; 10 touchdown cycles of denaturation at 94 °C for 30 s, annealing for 30 s at a temperature decreased by 1 °C each cycle (from 66 to 57 °C), and extension at 72 °C for 30 s; followed by another 30 cycles of denaturation at 94 °C, annealing at 57 °C, and extension at 72 °C, with 30 s for each step. The final extension was performed at 72 °C for 8 min. The amplified DNA was separated by electrophoresis in a 6% denaturing polyacrylamide gel and visualized using silver nitrate staining, according to a previously described protocol (Brito et al., 2010).
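The touchdown program described above can be written out step by step to check the number of cycles and the total hold time. The following is only a sketch: the function name `touchdown_program` and its parameters are hypothetical, and ramp times between temperatures are ignored.

```python
def touchdown_program(start_anneal=66, end_anneal=57, td_step=1, fixed_cycles=30):
    """Return the thermocycler steps as (label, temperature in C, seconds)."""
    steps = [("initial denaturation", 94, 120)]

    # Touchdown phase: the annealing temperature drops by td_step per cycle.
    anneal = start_anneal
    while anneal >= end_anneal:
        steps += [("denaturation", 94, 30),
                  ("annealing", anneal, 30),
                  ("extension", 72, 30)]
        anneal -= td_step

    # Fixed phase: all remaining cycles use the final annealing temperature.
    for _ in range(fixed_cycles):
        steps += [("denaturation", 94, 30),
                  ("annealing", end_anneal, 30),
                  ("extension", 72, 30)]

    steps.append(("final extension", 72, 480))
    return steps


program = touchdown_program()
annealing_temps = [temp for label, temp, _ in program if label == "annealing"]
touchdown_temps = annealing_temps[:10]          # 66, 65, ..., 57
total_minutes = sum(seconds for _, _, seconds in program) / 60
```

With the values given in the text, the touchdown phase spans exactly 10 cycles (66 down to 57 °C), and the hold times add up to 70 min before ramping is taken into account.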

Genetic Diversity Analyses
Molecular marker data were coded as codominant for the GD analyses. The dendrograms were constructed by the unweighted pair group method using arithmetic averages (UPGMA) in the MEGA software, version 7.0 (Kumar et al., 2016). The scores of the genetic dissimilarity matrix were obtained by the arithmetic complement of the weighted index in the GENES software (Cruz, 2013). The genetic distance was calculated as follows:

$$D_{ii'} = 1 - \frac{1}{2}\sum_{j=1}^{L} p_j c_j$$

Where: $D_{ii'}$: genetic distance between the accession pair i and i′; $c_j$: number of common alleles between accessions i and i′ at locus j; $p_j = a_j / A$: weight associated with locus j, where $a_j$: total number of alleles of locus j; A: total number of alleles studied; L: number of loci; and $\sum_{j=1}^{L} p_j = 1$.
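As an illustration of the arithmetic complement of the weighted index, the sketch below computes the distance between two plants. It assumes that each locus is coded as a pair of alleles (diploid-style codominant coding), that common alleles are counted as a multiset intersection, and that the similarity is halved so that identical plants have distance 0. The function name and the genotypes are hypothetical, not data from this study.

```python
from collections import Counter


def weighted_distance(geno_a, geno_b, alleles_per_locus):
    """Distance D = 1 - (1/2) * sum_j p_j * c_j between two genotypes.

    geno_a, geno_b: one (allele, allele) pair per locus.
    alleles_per_locus: a_j, the total number of alleles found at each locus.
    """
    total_alleles = sum(alleles_per_locus)            # A
    similarity = 0.0
    for locus_a, locus_b, a_j in zip(geno_a, geno_b, alleles_per_locus):
        p_j = a_j / total_alleles                     # weight of locus j
        # c_j: alleles shared by the two plants at this locus (0, 1, or 2)
        c_j = sum((Counter(locus_a) & Counter(locus_b)).values())
        similarity += p_j * c_j
    return 1.0 - 0.5 * similarity


# Hypothetical genotypes at three loci with 2, 2, and 3 alleles.
plant_x = [(1, 1), (1, 2), (1, 3)]
plant_y = [(1, 1), (2, 2), (2, 2)]
d_xy = weighted_distance(plant_x, plant_y, [2, 2, 3])   # 4/7
d_xx = weighted_distance(plant_x, plant_x, [2, 2, 3])   # 0 for identical plants
```

Loci with more alleles receive a larger weight, so a mismatch at a three-allele locus contributes more to the distance than a mismatch at a two-allele locus.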

Diallel Analysis
The individual analysis of variance was performed using the data of 10 traits evaluated in the parents and hybrids, according to the following model:

$$X_{(ij)k} = m + G_{ij} + b_k + \varepsilon_{ijk}$$

Where: $X_{(ij)k}$ = phenotypic score of the ij-th genotype in the k-th block; m = general average; $G_{ij}$ = effect of the ij-th genotype (parent, i = j, or hybrid, i ≠ j); $b_k$ = fixed effect of the k-th block; $\varepsilon_{ijk}$ = experimental error.
The superiority of the hybrid in relation to the others and/or the parents was evaluated using the Tukey test, with a 5% probability.
The general combining ability (GCA) and specific combining ability (SCA) were estimated using the parent and hybrid data, according to a previously described model (Kempthorne and Curnow, 1961). This diallel analysis was performed according to the following statistical model:

$$Y_{ij} = \mu + g_i + g_j + s_{ij} + \bar{\varepsilon}_{ij}$$

Where: $Y_{ij}$ = mean score of the hybrid combination ij (i ≠ j) or the i-th parent (i = j), with $Y_{ij} = \frac{1}{r}\sum_{k=1}^{r} X_{(ij)k}$, where r is the number of repetitions; $\mu$ = overall mean of the hybrid combinations; $g_i$ = effect of GCA of the i-th parent; $g_j$ = effect of GCA of the j-th parent; $s_{ij}$ = effect of SCA; and $\bar{\varepsilon}_{ij}$ = mean experimental error.
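To make the model concrete, the sketch below builds a circulant crossing plan for eight parents with three crosses each (12 hybrids) and fits the mean and GCA effects by ordinary least squares with a sum-to-zero constraint on the GCA effects. This is a simplified stand-in, not the estimation routine of the GENES software: the function names are hypothetical, only hybrid means are used, and the data are simulated without error so that the fit can be checked.

```python
def circulant_crosses(p=8, s=3):
    """Pairs (i, j) of a circulant diallel: each of p parents enters s crosses."""
    k = (p + 1 - s) // 2
    pairs = set()
    for i in range(p):
        for step in range(k, k + s):
            j = (i + step) % p
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_gca(means, p):
    """Least-squares fit of Y_ij = mu + g_i + g_j subject to sum(g) = 0."""
    rows, y = [], []
    for (i, j), value in means.items():
        row = [1.0] + [0.0] * p
        row[1 + i] += 1.0
        row[1 + j] += 1.0
        rows.append(row)
        y.append(value)
    rows.append([0.0] + [1.0] * p)      # constraint row: sum of g_i equals 0
    y.append(0.0)
    n = p + 1
    # Normal equations (A'A) x = A'y
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    aty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(n)]
    coef = solve(ata, aty)
    return coef[0], coef[1:]


# Simulated, error-free hybrid means from known (hypothetical) effects.
true_mu = 5.0
true_g = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2]
crosses = circulant_crosses()
means = {(i, j): true_mu + true_g[i] + true_g[j] for i, j in crosses}

mu_hat, g_hat = fit_gca(means, p=8)
# SCA is what remains after removing the mean and both GCA effects.
sca = {(i, j): means[(i, j)] - (mu_hat + g_hat[i] + g_hat[j]) for i, j in crosses}
```

Because the simulated means contain no SCA or error term, the fitted GCA effects match the true ones and every SCA residual is zero; with real plot means the residuals would carry the SCA estimates.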
The potential of hybrid combinations not obtained in the diallel (as the model was a circulating diallel) was predicted using the following equation:

$$\hat{Y}_{ij} = \mu + \hat{g}_i + \hat{g}_j$$

Where: $\hat{Y}_{ij}$ = predicted score of the hybrid ij; $\mu$ = general average; $\hat{g}_i$ and $\hat{g}_j$ = estimates of the general combining abilities of parents i and j.
To assess the existence of significant differences between the effects of GCA and SCA, confidence intervals with a 95% probability were calculated. In this procedure, the bootstrap approach was adopted with the establishment of 5,000 new data sets obtained from the resampling of the original data. These sets were again submitted to diallel analyses, generating estimates of the combining ability. The new estimates were ordered and, from the set, 5% of the extreme values were excluded (2.5% at each
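The resampling procedure can be sketched as a percentile bootstrap. In the study each resampled data set was submitted again to the diallel analysis; here, as a simplifying assumption, the estimator is just the mean of a few hypothetical plot values, and the function name `percentile_ci` is illustrative only.

```python
import random


def percentile_ci(values, estimator, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample, re-estimate, sort, trim tails."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        estimator(rng.choices(values, k=n))   # resample with replacement
        for _ in range(n_boot)
    )
    cut = round(n_boot * alpha / 2)           # estimates dropped from each tail
    return estimates[cut], estimates[n_boot - cut - 1]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical plot scores for one genotype.
scores = [4.8, 5.3, 5.1, 4.6, 5.5, 5.0, 4.9, 5.2]
lower, upper = percentile_ci(scores, mean)
excludes_zero = lower > 0 or upper < 0        # estimate treated as non-null
```

With 5,000 resamples and a 95% interval, 125 estimates are discarded from each end of the ordered set. An interval that excludes zero marks the estimate as statistically non-null, and two estimates whose intervals do not overlap are taken to differ.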

Correlation Analysis
Pearson's correlation coefficient among parental GD, parental phenotypic diversity (PD), parental SCA, and hybrid phenotypic mean was estimated. Genetic diversity was obtained from the genetic distance matrix of the parents, estimated using SSR markers. Phenotypic diversity was calculated by comparing the mean scores of each trait evaluated in the parents. The SCA was estimated using diallel analysis and the phenotypic means using the mean scores of each phenotypic trait evaluated in the hybrids. A network graph of the correlations was constructed as proposed previously (Rosado et al., 2017). The thickness of the lines represents the absolute score of the correlation. The width of the line was controlled by applying a cut-off score of 0.5, for easy graph visualization.

CLR
Coffee leaf rust incidence (score scale ranging from 1 to 5). 1, absence of pustules and hypersensitivity reactions; 2, few leaves with spore-free pustules ("flecks") and with hypersensitivity reactions; 3, few pustules per leaf with high spore production and poorly distributed; 4, average number of pustules per leaf, distributed in the plant with high spore production; 5, large number of pustules with high spore production and high defoliation of the plant. NOTE: Plants with score 1 or 2, resistant; 3-5, susceptible.

LM
Leaf miner infestation (score scale ranging from 1 to 5). 1, immune, leaves without any lesion; 2, leaves with few tapered lesions; 3, leaves with few and small lesions; 4, leaves with moderate infestation, typical lesions, and live larvae; 5, leaves with severe infestation, typical lesions, and live larvae.
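As a rough illustration of the correlation network described in this section, the sketch below computes Pearson's coefficient for every pair of variables and keeps only the pairs whose absolute correlation reaches the 0.5 cut-off; the retained value would set the line thickness. The variable names and numbers are hypothetical, and the published graph was built following Rosado et al. (2017), not with this code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def network_edges(variables, cutoff=0.5):
    """List (name_a, name_b, r) for pairs with |r| at or above the cut-off."""
    names = list(variables)
    edges = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson(variables[names[a]], variables[names[b]])
            if abs(r) >= cutoff:
                edges.append((names[a], names[b], round(r, 3)))
    return edges


# Hypothetical values for five crosses.
variables = {
    "GD": [0.2, 0.4, 0.5, 0.7, 0.8],       # molecular distance of the parents
    "SCA_Y": [0.1, 0.3, 0.4, 0.8, 0.9],    # SCA for yield
    "PD": [3.0, 1.0, 2.0, 1.0, 3.0],       # phenotypic distance of the parents
}
edges = network_edges(variables)
```

In this toy data set only the GD and SCA_Y pair passes the cut-off, so the graph would show a single thick line between those two nodes and leave PD unconnected.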

Genetic Diversity
Genetic diversity was analyzed based on the SSR marker data obtained by genotyping 76 C. arabica plants corresponding to 22 cultivars or elite accessions with potential use in the coffee breeding program. The markers were able to discriminate the cultivars, and the molecular profiles revealed polymorphism within cultivars/accessions. Since plants of the same cultivar/accession showed segregating genotypes, two analyses were performed. The first analysis used 22 individuals, one for each cultivar/accession, each representing the most frequent alleles for each SSR primer. This analysis aimed to verify the distance between cultivars and accessions. A total of 28 alleles were obtained from the selected plants. The number of alleles amplified by markers varied between two (CaEST-006, 029, 031, 071, 072, 045, 048, and SSR16) and three (CaEST-022, 040, 089, and SSR95), with an average of 2.3 alleles per locus. The highest genetic distance estimates were observed between Catiguá MG2 and Acauã (0.750), Catiguá MG2 and Topázio MG 1190 (0.696), Catiguá MG2 and Arara (0.643), Catiguá MG2  Table 1). The clustering analysis of the 22 plants, based on the genetic distance matrix estimated between the pairs of individuals, using the 12 SSR markers, resulted in a dendrogram with four groups: I, II, III, and IV. Group I was subdivided into two subgroups: I.a and I.b (Figure 1).
A second GD analysis was performed considering the genotypic data of 76 C. arabica plants, including one or more plants per cultivar/accession. The clustering analysis, based on the estimated genetic distance matrix between pairs of individuals, using 12 SSR markers, is shown in Figure 2  Vermelho IAC 144 and MGS Paraíso 2 was observed (Figure 2).

FIGURE 2 | Dendrogram based on the analysis of 12 SSR markers in 76 coffee trees (Table 1), obtained by the UPGMA technique, using the dissimilarity matrix of the weighted index arithmetic complement.

Analysis of the Circulant Diallel
To assist the coffee breeding program, the most important C. arabica plants were selected for diallel analysis. The parental selection was based on the importance and complementarity of the agronomic traits (Table 2) and on the GD approach. The GD allowed the selection of coffee genotypes in all groups: Oeiras MG6851 and Siriema in group I.a; Arara and UFV 311-63 in group I.b; Paraíso MG H419-1 and H 484-2-18-12 in group II; Acauã Novo in group III; and Catiguá MG2 in group IV (Figure 1). The potential F1 hybrids developed, using eight cultivars/accessions crossed in a partial diallel, were evaluated with SSR markers to determine whether the progenies were obtained from controlled crossbreeding or self-fertilization. All parents were genotyped with SSR primers, and polymorphic and informative markers were identified for each hybrid progeny. Informative markers have polymorphisms between the parents; in this case, each parent must amplify at least one different allele. Thus, hybrid progenies have alleles present in both parents (Supplementary Figure 1), whereas self-fertilized progenies have alleles present only in the female parent (Supplementary Figure 2). Only C. arabica plants confirmed by molecular markers as true hybrids of the parents used in the artificial hybridizations were included in the diallel study.
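The certification rule described above can be sketched for a single SSR marker as follows. The function name and the allele sizes are hypothetical. The sketch also assumes that a male-specific allele is always transmitted to a true hybrid, which holds when the male parent is homozygous at the marker; with a heterozygous male, a "self-fertilized" call from one marker alone would need confirmation with additional markers.

```python
def classify_progeny(female, male, progeny):
    """Classify one progeny at one SSR marker from allele sets (sizes in bp)."""
    female, male, progeny = set(female), set(male), set(progeny)
    male_specific = male - female          # alleles that only the pollen donor has
    if not male_specific:
        return "uninformative marker"      # parents cannot be told apart
    if progeny & male_specific:
        return "hybrid"                    # carries an allele from the male parent
    if progeny <= female:
        return "self-fertilized"           # only alleles of the female parent
    return "unexpected alleles"            # possible contamination or scoring error


# Hypothetical allele sizes at one marker.
calls = [
    classify_progeny({150}, {156}, {150, 156}),
    classify_progeny({150}, {156}, {150}),
    classify_progeny({150}, {150}, {150}),
    classify_progeny({150}, {156}, {150, 162}),
]
```

A progeny carrying an allele found in neither parent is reported separately, since it points to foreign pollen or a mislabeled plant instead of selfing.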
The circulant diallel scheme estimates the genetic parameters and selects the best parents and hybrids based on the GCA and SCA scores. Using this partial diallel approach, a sample of possible crosses was studied, and the potential of all hybrid combinations was predicted. Therefore, this analysis provided information about the parents using few crosses without any information loss, since all hybrids were estimated in the model.
The variance analysis for the 10 evaluated traits, as well as the means of the effects for GCA and SCA, are presented in Table 6. The treatment effects were significant for Vig, RFC, Y, MC, Fruit MU, and BES traits, showing a variability among the genotypes (parents and hybrids). General combining ability effects were significant for Vig, RFC, Y, MC, MU, CLR, and BES traits. These estimates provide information on gene concentration (favorable allele frequency) with additive effects in the parents. Specific combining ability effect was significant only for Vig, RFC, Y, and MC traits.
The GCA scores were estimated for each parent, and the two best scores for Vig, Y, MU, CLR, and BES (significant for GCA using analysis of variance) are highlighted in bold in Table 7. Low GCA scores are desired for CLR and BES traits, since the lowest score is associated with plants resistant to these diseases. The same was considered for MU, as the lowest score indicates the highest uniformity. The estimates of the SCA effects for each hybrid and parent are shown in Table 8, and the best cross combinations for Vig and Y, the most significant traits, are highlighted. The reliability of the GCA and SCA estimates was assessed from the confidence interval limits (Supplementary Table 2). In this analysis, if the interval does not include zero, the estimate is statistically non-null. In addition, the confidence intervals are useful to assess whether two estimates differ significantly, which is the case when their intervals do not overlap.
Catiguá MG2 × UFV 311-63 and Catiguá MG2 × Acauã Novo were the best identified crosses for Vig. In this analysis, not only the best SCA effect, but also the cross involving at least one parent with a high GCA, were considered. Although both GCA and SCA effects were significant for RFC and MC traits, they were not highlighted in the table, as fruit color and MC did not affect the preference for the developed cultivar, but the information must be available for the grower.
The mean performance of the obtained hybrids and the predicted mean scores of the hybrids not obtained in the partial diallel design were estimated for each trait (Table 9). The hybrids from crosses Catiguá MG2 × UFV 311-63 and Paraíso MG H419-1 × Arara showed the highest mean scores for Vig (8.085 and 7.750, respectively). Hybrids from Catiguá MG2 × Acauã Novo and Paraíso MG H419-1 × UFV 311-63 had the highest mean scores for the Y trait (5.250 and 4.960, respectively). Of the 10 evaluated traits, Vig, RFC, Y, MC, and CLR showed significant differences between the means of treatments, according to the Tukey test at the 5% probability level (Table 10).

BES, incidence of brown eye spot; LM, leaf miner infestation. The two best scores for Vig, Y, MU, CLR, and BES (significant for GCA using analysis of variance) are highlighted in bold.

BES, incidence of brown eye spot; LM, leaf miner infestation. The best crosses estimated based on the SCA, involving at least one parent with high GCA, for the main significant traits (Vig and Y) are highlighted in bold.
To assess the occurrence and degree of heterosis, graphs were drawn considering the means of the parents and hybrids based on seven traits (Figure 3). SC, RFC, and MC were not analyzed as these traits did not affect the cultivar preference. Higher levels of heterosis were found for Vig and Y. The data allow us to select hybrids that surpassed the average of their parents in some traits. Hybrid H15 showed high heterosis for Vig and Y, but low heterosis for the other traits, including negative heterosis for RFS. H16 also showed high heterosis for Vig and Y and lower disease incidence, but negative heterosis for MU and RFS. The performance of the hybrids is related to the combining ability of the parents for the traits of interest. Thus, to reach heterosis for Y and Vig, the best crosses are Paraíso MG H419-1 with UFV 311-63, H484-2-18-12, and Arara, as well as Catiguá MG2 × UFV 311-63. However, the hybrids showed lower MU and RFS and higher incidence of BES and LM than one of their parents. Hybrids with better MU than both parents were obtained with the crosses Paraíso MG H419-1 × H484-2-18-12 and Oeiras MG 6851 × Acauã Novo. Larger fruit size was found only with the hybrid from Oeiras MG 6851 × Siriema. In general, the incidence of disease (CLR and BES) and infestation of LM were higher in the hybrids compared with one of their parents.
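The comparison of a hybrid with the average of its parents can be expressed as mid-parent heterosis. The text reports this comparison graphically and does not give a formula, so the standard percentage form below is an assumption, and the trait means are hypothetical.

```python
def mid_parent_heterosis(f1_mean, parent1_mean, parent2_mean):
    """Percentage by which the F1 mean deviates from the mid-parent value."""
    mid_parent = (parent1_mean + parent2_mean) / 2
    return 100 * (f1_mean - mid_parent) / mid_parent


# Hypothetical trait means: the hybrid scores 5.0, its parents 3.0 and 5.0.
h_positive = mid_parent_heterosis(5.0, 3.0, 5.0)   # hybrid above the mid-parent
h_negative = mid_parent_heterosis(3.0, 3.0, 5.0)   # hybrid below the mid-parent
```

For traits scored so that low values are desirable, such as CLR, BES, and MU, a negative value is the favorable direction, whereas for Vig and Y a positive value indicates a hybrid that surpasses the parental average.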

Correlation Analysis
Correlation analysis allowed us to verify the relationship between the performance mean of the hybrids analyzed in the diallel model, considering each of the 10 morphoagronomic traits and the parental SCA, PD, and GD (based on molecular markers). The maximum score obtained from the correlation index was 0.93 (Figure 4). In general, PD was not correlated linearly with GD. Phenotypic diversity was also poorly correlated with SCA and the mean performance of the hybrids. The average score of the traits displayed a higher correlation with GD than with PD. Specific combining ability and the means of the hybrids for Vig and Y traits displayed the highest positive correlations with GD. In addition, the CLR trait was highly negatively correlated with Vig trait.

Genetic Diversity
The success of crop breeding programs lies in the efficient identification and incorporation of GD, while preserving the important economic traits of an individual plant (Swarup et al., 2020). To achieve this goal, breeders usually cross cultivated genotypes to avoid the linkage drag of wild genetic material, but they need to maintain diversity to address producer and consumer demands. In addition, higher GD in plants allows them to adapt to sudden environmental changes (Raza et al., 2019).
In this study, the DNA of 22 coffee cultivars and accessions with the potential to be used in breeding showed low diversity (mean: 2.3 bands/primer). The narrow genetic base of C. arabica has been reported worldwide, which is explained by the recent origin of the species with a single polyploidization event, autogamous reproduction system, and poor initial global distribution (Missio et al., 2011; Sousa et al., 2017a; da Silva et al., 2019; Jingade et al., 2019; Merot-L'anthoene et al., 2019; Sánchez et al., 2020; Scalabrin et al., 2020). Moreover, its genetic resources have been conserved in the field and, therefore, may be quickly eroded due to local hazards and global climatic change, worsening the genetic variability reduction (Legesse, 2020). For the Brazilian cultivars analyzed in this study, the limitation of low variability has been aggravated by the low number of plants introduced in the country and used in genetic breeding. It has been demonstrated that Brazilian C. arabica cultivars originate from a few parents (Setotaw et al., 2013). The genetic base of 121 cultivars released in Brazil between 1939 and 2009 was defined by 13 ancestors, among which seven ancestors contributed 97.55% of the genetic base. Low genetic variability of the 34 main C. arabica cultivars planted in Brazil has been confirmed in another study (Sousa et al., 2017b). Even with the recognized narrow genetic base of C. arabica plants available to be used in the breeding programs, informative data on GD were obtained through molecular markers, genetic distance matrix analysis, and dendrograms in this study. Bourbon Amarelo MG 0009, Topázio MG 1190, Catuaí Amarelo IAC 62, Ibairi IAC 4761, and Catuaí Vermelho IAC 144 were allocated to subgroup I.b (Figure 1).
This can be explained by the fact that these cultivars/accessions are susceptible to CLR (Legesse, 2020) and do not have the introgression of genes from other coffee species. Genes that confer resistance to CLR and other diseases and pests have been introgressed into C. arabica cultivars through interspecific hybrids. The chromosomal segment responsible for resistance that is introgressed in the cultivar increases the GD (Setotaw et al., 2013, 2020), which explains the separation of cultivars with no interspecific genome. The exception was observed for the cultivar Arara and accession UFV 311-63, which were allocated in subgroup I.b, although they are considered rust-resistant. Arara originated from the spontaneous hybridization between Obatã IAC 1669-20 and Catuaí Amarelo. Obatã IAC 1669-20 itself probably originated from the spontaneous hybridization between Sarchimor and Catuaí (Pereira and Oliveira, 2015). Successive backcrosses with cv. Catuaí may explain the genetic similarity to susceptible coffee trees.
The remaining cultivars/accessions were rust-resistant and distributed in the other groups of the dendrogram, except for the cultivar Arara and accession UFV 311-63, which were allocated in subgroup I.b, the group of rust-susceptible cultivars.
Rust-resistant C. arabica cultivars are generally derived from the HdT or Icatu interspecific hybrids between C. arabica and C. canephora (Del Grossi et al., 2013). These germplasms carry genes from the C. canephora species, which facilitates C. canephora genome introgression in C. arabica (Sousa et al., 2017a;Setotaw et al., 2020). These results demonstrate the potential of HdT and Icatu to expand the genetic base of C. arabica.
The polymorphism observed in the 76 C. arabica plants evaluated in our study, including the polymorphism within cultivars, must be considered for parent selection in C. arabica breeding programs. This information can be used to choose cultivars and individuals within cultivars to be crossed to explore the existing genetic variability and complementarity. For example, if a cross between IAC 125 RN and HdT MG 0357 is the choice due to the complementarity of traits of interest, the HdT MG 0357 plant n° 51 should not be selected, as it has been allocated to the same group as some IAC 125 RN plants. In contrast, plant n° 51 of HdT MG 0357 can be used, as they were allocated to different groups. These results demonstrate the high efficiency of SSR markers in assisting the selection of the best plants within each cultivar/accession, avoiding the selection of genetically similar plants.

Hybrid Identification by Molecular Markers
Coffea arabica plants were crossed considering their diversity and complementary traits and potential F 1 hybrids were obtained for C. arabica breeding. Before advancing to the next breeding generation, the true hybrids were identified. Confirmation of cross success and discrimination between parent genotypes and hybrids is essential for genetic breeding. During breeding, the obtained hybrids are prone to contamination by outcrossing with foreign pollen or physical admixtures (Carvalho, 2008;Krishna et al., 2020). In autogamous species, such as C. arabica, self-pollination is common before controlled outcrossing, preventing the transfer of desired traits in progenies. Therefore, cross certification and genetic purity testing of hybrids is a routine and essential approach to an efficient breeding program. Early hybrid identification is a limitation for coffee breeding (Sánchez et al., 2020), and molecular marker analysis can facilitate this process.
In this study, SSR markers were used to certify artificial crossing in a diallel scheme, and the desired crosses were detected in most analyzed hybrids. These results show the informative power of SSR markers in crossbreeding certification. This certification is of great importance in breeding programs, particularly of perennial species, such as C. arabica, as it eliminates unwanted genotypes and self-fertilized progenies early, thus saving time, financial resource, and labor.
Although the hybridizations were artificially made by hand, self-fertilized progenies were identified. Hybridization in autogamous plants generally consists of emasculating the flowers by removing the anthers a few days before pollination (Georget et al., 2019). In C. arabica, the flowers are hermaphrodite and autogamy occurs due to the phenomenon of cleistogamy, which occurs before the flower opens. In this case, emasculation must be performed before the flower opens and before it has been self-pollinated; however, the stigma must already be ripe. Thus, emasculation or crossing was probably performed after self-pollination in the self-fertilized F1 progenies.

Diallel Analysis
To efficiently select the best crosses for coffee breeding, a diallel analysis with eight coffee cultivars/accessions was performed in addition to testing the parent diversity. The diallel results also revealed the gene action involved in important agronomic characteristics. The variance analysis carried out for 10 morpho-agronomic traits showed additive and non-additive genetic variability, which were statistically significant for both GCA and SCA. For all evaluated traits, the GCA was higher than the SCA, indicating a high contribution of additive gene action in controlling the traits. The additive genes affect the proportion of phenotypic variation transmitted to successive generations and are therefore responsible for the performance of the genotypes in the progeny when homozygosity is reached (Silva et al., 2013). In autogamous breeding programs, plant selection is practiced in advanced generations of self-fertilization, maximizing genetic progress through the additive effects of genes (Hallauer et al., 2010). Classical C. arabica breeding is composed of genealogy and backcross breeding methods, and the mating system is applied for effective pure-line selection from selfing and elite genotype testing (Fanelli Carvalho et al., 2020). Thus, in autogamous species, such as C. arabica, GCA is important for breeders because it depends on additive variance, while SCA depends on the variance due to deviations in dominance.
A low GCA estimate, whether positive or negative, indicates that the parent does not differ from the general average. However, when these estimates are high, the parent is superior to the other parents of the diallel. Thus, to develop a hybrid, the best combinations are those with high SCA estimates whose parents have a high GCA estimate (Kaushik and Dhaliwal, 2018). In this study, Catiguá MG2 and Arara had high GCA estimates for the Vig trait (0.494 and 0.299, respectively), indicating that these genotypes are the most recommended for developing a base population for breeding aimed at enhancing Vig. However, the cross between these two parents produced a hybrid with a negative SCA score. Thus, crossing Catiguá MG2 and UFV 311-63 should be prioritized for Vig improvement, as both parents presented positive GCA estimates and the cross showed a high SCA effect. The diallel analysis also showed that the combination of Paraíso MG H419-1 and Arara has potential for Vig breeding, as it presented a high SCA effect score (1.278) and involves one parent with a high GCA estimate (Arara). Vig is an important trait for coffee, as it is positively correlated with Y (Severino et al., 2008; Pedro et al., 2011) and genotype adaptation, reflecting less depleted plants (Nadaleti et al., 2018).
As in other crops, one of the main objectives of coffee breeding programs is to increase productivity (De Paiva Barbosa et al., 2019a). Based on the diallel data, Acauã Novo and Catiguá MG2 had the highest GCA scores for Y (0.603 and 0.581, respectively). The hybrid from the cross between these two parents showed an estimated SCA score of 0.324. Thus, as the estimated SCA score for this hybrid was high and positive and involved the two parents with the highest GCA scores, the cross between Catiguá MG2 and Acauã Novo is recommended for this trait. Another potential cross would be between Catiguá MG2 and UFV 311-63, since the hybrid resulting from this cross had the highest estimated SCA score (0.909) and involved one of the parents with a high estimated GCA score.
UFV 311-63 and Oeiras MG 6851 displayed the best GCA scores for fruit MU (−0.241 and −0.179, respectively), with no significant difference in the SCA scores. The most negative scores are desirable for MU, since lower scores indicate higher fruit ripening uniformity. MU is related to coffee beverage quality, and in recent years there has been an increasing demand for the best quality coffees (De Paiva Barbosa et al., 2019a); moreover, MU contributes to labor reduction during harvest.
In addition to yield and quality, coffee breeding focuses on biotic stress resistance. In this study, we evaluated the parental resistance to CLR, BES, and leaf miner. Phenotypic evaluation was based on the incidence and infestation of these biotrophic agents. Thus, the plants that received the lowest score were the most resistant, implying that low GCA and SCA scores are desired. Arara and UFV 311-63 displayed the best GCA scores for CLR (−0.072 and −0.070, respectively), and no significant difference in the SCA scores. Thus, crosses involving UFV 311-63 and Arara have the greatest potential for resistance to CLR. UFV 311-63 and H 484-2-18-12 displayed the best GCA scores for BES (−0.268 and −0.178, respectively), and no significant difference in the SCA scores was obtained. Crosses involving at least one of these parents are recommended. Leaf miner infestation was similar in all parents studied.
The hybrids with the highest phenotypic means for the Vig and Y traits were obtained from the crosses recommended based on GCA and SCA evaluation (Catiguá MG2 × UFV 311-63 for Vig and Catiguá MG2 × Acauã Novo for Y). Moreover, the mean yield of these hybrids was higher than that of their parents. This result shows that hybrid prediction based on combining ability evaluation is an effective technique and can be particularly useful for breeding programs, since cross selection in the diallel scheme was based on genetic parameters.
In general, crosses involving Catiguá MG2, UFV 311-63, and Arara were the most promising. Hybrids originating from Catiguá MG2 were recommended for Vig and Y traits, and this cultivar also showed the highest genetic distance from almost all other C. arabica progenies analyzed according to the GD study (Figures 1, 2). These results show a potential relationship between GD, phenotypic mean, and combining ability. In addition, Catiguá MG2 has been used as a source to enhance cup quality and obtain specialty coffee (Alex et al., 2016; De Paiva Barbosa et al., 2019b). This cultivar has also been highlighted for its high resistance to CLR (Del Grossi et al., 2013) and moderate resistance to bacteriosis caused by Pseudomonas syringae (Fernandes et al., 2020).
As the crosses were made between fairly inbred parents, heterosis was also evaluated for the different traits. Hybrid vigor, or heterosis, represents the average superiority of a crossbred individual in relation to the average performance of its parents. Heterosis also depends on genetic differences between the parents being crossed. In our work, hybrids that surpassed the average of their parents in some traits were identified. Other studies showed the advantages of exploiting heterosis in C. arabica hybrids, with significant yield gains when compared to the average of the parents and the best parent (Bertrand et al., 2005, 2011). However, no hybrid in our work showed superiority for all evaluated agronomic characteristics. A higher level of heterosis was found for Vig and Y and, in general, the cultivar Paraíso MG H 419-1 was one of the parents of these hybrids.
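The two comparisons mentioned above, hybrid versus the average of the parents and hybrid versus the best parent, correspond to the usual percentage definitions of mid-parent and best-parent heterosis. The sketch below illustrates them; the function names and the numeric values are hypothetical and are not results from this study.

```python
def mid_parent_heterosis(f1, p1, p2):
    """Percentage superiority of the hybrid over the mean of its two parents."""
    mid_parent = (p1 + p2) / 2
    return 100 * (f1 - mid_parent) / mid_parent

def best_parent_heterosis(f1, p1, p2, higher_is_better=True):
    """Percentage superiority of the hybrid over its better parent."""
    best = max(p1, p2) if higher_is_better else min(p1, p2)
    return 100 * (f1 - best) / best

# Hypothetical trait means for a hybrid and its two parents.
hybrid, parent_a, parent_b = 12.0, 8.0, 10.0
print("mid-parent heterosis (%):", mid_parent_heterosis(hybrid, parent_a, parent_b))
print("best-parent heterosis (%):", best_parent_heterosis(hybrid, parent_a, parent_b))
```

For traits scored so that lower values are better, such as disease incidence, a negative value is the favorable outcome and `higher_is_better=False` selects the parent with the lower score as the reference.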

Correlation Analysis
In breeding programs, genetically contrasting cultivars/accessions containing relevant agronomic traits are mostly crossed. Thus, through gene recombination, it is possible to obtain genetic gains that enable the success of breeding programs. Nevertheless, the ideal situation is the involvement of genetically diverse parents with good proven performance in crosses, whenever possible (Ramalho et al., 2013). To assist this strategy, it is important to evaluate cultivars and elite accessions and test the relationship among their GD (obtained here by molecular markers), PD, phenotypic mean of their hybrids, and combining ability. If these parameters are correlated, a complex parameter could be estimated using a simple and low-cost analysis. For example, for Vig and yield (Y), the cross between Catiguá MG2 and UFV 311-63 was recommended based on the scores of general and specific combining abilities obtained through diallel analysis. Genetic distance matrix analysis indicated that these two genotypes had a high genetic dissimilarity score (distance = 0.589).
The joint analysis of molecular and phenotypic data was especially important in our work. The diallel analysis was performed based on phenotypic data from a single year, the first year with significant yield (3.4 years after planting). Since some evaluated traits are quantitative, data from additional years would be required to confirm the performance of the hybrids and parents. However, the high correlation between GD and phenotypic data allowed the early selection of genetic materials, which is essential for coffee, a perennial crop with a long reproductive cycle.
Network correlation indicated that PD was not linearly correlated with GD in the eight C. arabica genotypes evaluated in the circulating diallel (Figure 4). Phenotypic diversity was also poorly correlated with specific combining abilities and mean performance of hybrids in all evaluated traits. However, hybrid performance and parent-specific combining ability were closely correlated with GD. These results corroborate those obtained for corn (Makumbi et al., 2011) and sunflower (Reif et al., 2013). The correlation between molecular marker-based GD and combining ability was evaluated in genetically related and unrelated groups of sunflowers (Helianthus annuus L.) (Reif et al., 2013). A strong correlation was observed for related genotypes, but not for unrelated genotypes. For tropical test lines of corn (Zea mays L.), a high correlation was also observed when related genetic materials within the same heterotic group were crossed.
Previous studies have demonstrated a high correlation between the combining ability estimated using diallel crosses and the genetic distance estimated using molecular markers when genetically related genotypes are crossed. Coffea arabica plants are genetically related due to their narrow genetic base and, therefore, reduced genetic variability (Sousa et al., 2017a), which can maximize the correlation between these parameters. The correlation among molecular GD, hybrid performance, and SCA can help in cultivar selection for the genetic improvement of C. arabica. The use of molecular GD allows the early selection of C. arabica plants in the breeding program, which is particularly important for perennial and long-cycle species (Sousa et al., 2019).
The Vig and Y traits were the most positively correlated with GD. In addition, CLR incidence was strongly negatively correlated with Vig. These traits have been the focus of most coffee breeding programs. The 12 SSR markers used to assess GD were pre-selected in other studies (Pestana et al., 2015; Sousa et al., 2017b) because they are highly polymorphic, which is particularly important in this species given its narrow genetic basis. Thus, it can be inferred that GD evaluation-based parent selection for the genetic improvement of C. arabica is a useful technique, since the molecular distance was highly correlated with the phenotypic means of the hybrids and parent combining ability.
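The kind of relationship examined in this section can be illustrated by pairing each cross with the molecular distance between its two parents and correlating that distance with the SCA estimate of the cross. The sketch below uses a plain Pearson coefficient rather than the network correlation reported here, and the distance matrix, the SCA values, and the function name `pearson_r` are all hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical symmetric genetic distance matrix for four parents.
gd = np.array([
    [0.0, 0.2, 0.5, 0.6],
    [0.2, 0.0, 0.4, 0.3],
    [0.5, 0.4, 0.0, 0.1],
    [0.6, 0.3, 0.1, 0.0],
])

# Hypothetical crosses (pairs of parent indices) and their SCA estimates.
crosses = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
sca = np.array([-0.3, 0.4, 0.8, 0.2, 0.0, -0.5])

# Distance between the two parents of each cross, in the same order as sca.
gd_per_cross = np.array([gd[i, j] for i, j in crosses])

r = pearson_r(gd_per_cross, sca)
print(f"correlation between parental GD and SCA: {r:.3f}")
```

With as few crosses as in a circulating diallel, a coefficient of this kind carries wide uncertainty, so it is better read as a screening aid than as a precise estimate.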

CONCLUSIONS
Molecular marker-based GD analysis allows a detailed assessment of the genetic distance between and within coffee cultivars/accessions. Using molecular markers is an efficient approach to assist parent selection and true hybrid identification when developing a segregating C. arabica population for breeding, and therefore to efficiently increase GD in C. arabica cultivars. The circulating diallel cross is an effective technique for the genetic improvement of C. arabica, and selecting crosses based on general and specific combining abilities can be very useful for obtaining promising breeding populations.
Coffea arabica breeders can increase GD through strategic molecular marker integration, crossbreeding certification, and the diallel approach, while preserving the economically important traits of individual crops. Using this strategy, elite genetic resources can be included in breeding programs and new cultivars may be developed in response to rapid shifts in global coffee cultivation conditions and resources due to climate change and new demand from coffee producers and consumers.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ACKNOWLEDGMENTS
We would like to thank Editage (www.editage.com) for English language editing.