Uncovering Candidate Genes Controlling Major Fruit-Related Traits in Pepper via Genotype-by-Sequencing Based QTL Mapping and Genome-Wide Association Study

All modern pepper accessions are products of the domestication of wild Capsicum species. However, due to the limited availability of genome-wide association study (GWAS) data and selection signatures for various traits, domestication-related genes have not been identified in pepper. Here, to address this problem, we obtained data for major fruit-related domestication traits (fruit length, width, weight, pericarp thickness, and fruit position) using a highly diverse panel of 351 pepper accessions representing the worldwide Capsicum germplasm. Using a genotype-by-sequencing (GBS) method, we developed 187,966 genome-wide high-quality SNP markers across 230 C. annuum accessions. Linkage disequilibrium (LD) analysis revealed that the average length of the LD blocks was 149 kb. Using GWAS, we identified 111 genes that were linked to 64 significant LD blocks. We cross-validated the GWAS results using 17 fruit-related QTLs and identified 16 causal genes thought to be associated with fruit morphology-related domestication traits, with molecular functions such as cell division and expansion. The significant LD blocks and candidate genes identified in this study provide unique molecular footprints for deciphering the domestication history of Capsicum. Further functional validation of these candidate genes should accelerate the cloning of genes for major fruit-related traits in pepper.


INTRODUCTION
Pepper (Capsicum species), like other Solanaceae family members including tomato and potato, is a New World genus with a primary center of diversity in Bolivia and Peru (Nee et al., 2006). Capsicum comprises more than 30 species, and the domestication of five of these species in the Americas, including the economically important plants C. annuum, C. baccatum, C. chinense, C. frutescens, and C. pubescens, dates back to 6,000 BC (Moscone et al., 2007;Paran and Van Der Knaap, 2007;Cheng et al., 2016). Peppers are referred to as capsicum, pimento, sweet pepper, red pepper, cayenne pepper, bird's eye pepper, jalapenos, or habaneros based on fruit shape and pungency (Moscone et al., 2007;Babu et al., 2011), and have various uses as vegetables, seasonings, ornamental plants, and medicinal crops. The easy cultivation of pepper has led to their widespread use worldwide, especially in tropical regions (Moscone et al., 2007;Babu et al., 2011). The majority of wild forms of Capsicum spp. display perennial herbaceous growth, with a small, erect, deciduous growth habit and red, pungent, soft-fleshed fruits (Paran and Van Der Knaap, 2007).
Among Solanaceae species, domestication-related traits have been described for tomato (Bai and Lindhout, 2007;Giovannoni 2018), potato (Li et al., 2018), and eggplant (Doganlar et al., 2002;Meyer et al., 2012). These traits are generally referred to as "domestication syndrome" because they can be used to distinguish cultivated crops from their progenitors (Doebley et al., 2006). The domestication syndrome traits are not fully elucidated for pepper. During domestication, Capsicum spp. might have been selected for fruit morphology and pungency (Babu et al., 2011;Che and Zhang, 2019). Other pepper domestication traits include a non-deciduous habit, fruit that remains on the plant until harvest, and pendent fruit orientation (Kaiser, 1935 Paran andVan Der Knaap, 2007). However, underlying genes are largely known.
Genetic and genomic analyses of cultivated crops and wild relatives have provided evidence for domestication by revealing selection footprints in the key genes controlling domestication traits (Zeder et al., 2006;Stitzer and Ross-Ibarra, 2018). Recent genetic and archaeological studies have revealed the spatiotemporal origins and processes underlying the domestication of these traits and have allowed domestication traits to be divided into two types based on the underlying genes. Some domestication traits are controlled by genes called 'domestication genes' that were subjected to early selection of major-effect QTLs, while other traits are controlled by genes that were selected later to produce diversified, improved crops; these genes are called 'improvement genes' (Pickersgill, 2007). Wang and Bosland (2006) published a comprehensive summary of genetic studies on Capsicum genes performed from 1912 to 2006 that lists 292 genes for various traits in pepper, including morphological and physiological traits, male sterility, and resistance to nematodes, diseases, and herbicides.
By contrast, due to advancements in next-generation sequencing (NGS) techniques and the availability of newer populations of tomato, six representative gene families were identified to control fruit size in this crop, including the Cell Number Regulator (CNR), Cytochrome P450 A78 class (CYP78A), IQ domain, Ovate Family Protein (OFP), YABBY, and WOX gene families (Monforte et al., 2014;Lin et al., 2014;Sacco et al., 2015;Soyk et al., 2017). Candidate genes belonging to these families such as CNR, SlKLUH, SUN, OVATE, FAS, and LC have been cloned, and their roles in regulating fruit elongation, locule number, and fruit shape have been well characterized (Xiao et al., 2008;Guo and Simmons, 2011;Rodriguez et al., 2011;Chakrabarti et al., 2013). These findings from tomato were successfully utilized for QTL mapping and downstream gene analysis in pepper, shedding light on the complex genetic architecture and genomic regions that govern these quantitatively inherited traits (Ramchiary et al., 2014;Wang et al., 2015;Chunthawodtiporn et al., 2018;Colonna et al., 2019).
Since the release of the first reference genome of pepper (Kim et al., 2014), genome-wide association study (GWAS) has been used to analyze only a few traits in pepper such as fruit weight (Nimmakayala et al., 2016a), capsaicinoid contents (Nimmakayala et al., 2016a;Han et al., 2018), peduncle length (Nimmakayala et al., 2016b), and fruit size and shape (Colonna et al., 2019) using diverse pepper germplasms. Combined QTL mapping and GWAS has been utilized to avoid identifying false-positive QTLs or associations for major fruit-related traits in pepper.
The goals of the current study were to (1) explore the correlations among five important fruit-related traits in pepper and (2) determine the significant genetic regions or genes governing genetic variations in the major fruit-related traits with strong evidence for selection during domestication. We obtained high-quality SNPs via genotype-by-sequencing (GBS) and subjected them to GWAS. We obtained candidate genes underlying the QTLs detected by GWAS and characterized their functions, laying the foundation for further functional validation and cloning of candidate genes for major fruit-related traits in pepper.

Plant Materials
A collection of 351 Capsicum accessions known as the 'pepper GWAS population' was used for analysis, comprising four major domesticated species including C. annuum (230 accessions), C. baccatum (48 accessions), C. chinense (48 accessions), and C. frutescens (25 accessions) ( Table S1). Among the accessions in the pepper GWAS population, 250 were previously selected as a core set representing the genetic diversity of more than 4,600 accessions from 97 countries (Lee et al., 2016).

Phenotypic Evaluation and Correlation Analysis
The pepper GWAS population was planted in a greenhouse at the Rural Development Administration (RDA)-Gene bank Jeonju, Republic of Korea (35°49′51.3" N, 127°03′47.1" E). Over a three-year period (2015-2017), six plants per accession were randomly planted, and the phenotypes of three plants per accession were evaluated. Five domestication traits were evaluated using a randomized block design, including four quantitative traits (fruit length [FL], width [FWd], weight [FWg], and pericarp thickness [PT]) and one qualitative trait (fruit position [FP]). All quantitative traits were measured using an electronic scale and a ruler. FP was scored as 1 to 3 (1 = erect, 2 = declining like a pendant, and 3 = intermediate

Genomic DNA Extraction and GBS
Genomic DNA was extracted from the samples using the CTAB method (Lee et al., 2017;Siddique et al., 2019;Solomon et al., 2019) and diluted to 80 ng/µl in distilled water. GBS libraries were constructed via double digestion with two sets of restriction enzymes (PstI/MseI and EcoRI/MseI) as previously described (Han et al., 2018;Siddique et al., 2019;Solomon et al., 2019). The digested DNA was ligated to adapters and amplified with 'TA' primers. The libraries were pooled in five tubes. The contents of the tubes were sequenced in separate lanes using the HiSeq 2000 platform (Illumina, San Diego, CA) at Macrogen (Seoul, Republic of Korea).

Reference-Based SNP Calling and Construction of the SNP Set
Raw 101-bp reads from the libraries were trimmed to a minimum length of 80 bp and filtered to a quality score >Q30. The filtered reads were aligned to the C. annuum 'CM334' reference genome v.1.6, http://peppergenome.snu.ac.kr (Kim et al., 2017) using the Burrows-Wheeler Aligner program v.0.7.12 (Li and Durbin, 2010). For SNP calling and filtering, the GATK Unified Genotyper v.3.3-0 was used (Depristo et al., 2011). The SNP set was constructed using three steps: prefiltering, imputation, and major filtering. First, pre-imputed SNPs were filtered to removed mono and tri-allelic SNP types and SNPs with a call rate >0.1. After pre-imputation filtering, SNPs with missing data were imputed using the FILLIN method in TASSEL (Bradbury et al., 2007). To obtain SNPs of suitable quality, hapSize was applied to obtain sequences ranging from 100 to 8,000 with two minSites (25, 50) and two minPres (250, 500). The final selected imputation option was dependent on the best option of regression (R 2 ) values and the imputed ratio of minor and major alleles. Finally, the major filtering step was performed under the following conditions: minor allele frequency >0.05, SNP coverage >0.6, and inbreeding coefficient (IF) >0.8.

Population Structure (Q) and Linkage Disequilibrium (LD) Estimations
To identify population stratification, principal component analysis (PCA) was performed using the 'pcaMethods' library in R software (Stacklies et al., 2007). The values of each PC were used as variables in the GWAS. The LD block of the GWAS population was estimated using PLINK v.1.9 (Chang et al., 2015) with the following settings: '-no-parents -no-sex -blocks no-pheno-req no-small-max-span -blocks-inform-frac 0.95blocks-max-kb 2000 -blocks-min-maf 0.05 -blocks-recombhighci 0.9 -blocks-strong-highci 0.98 -blocks-strong-lowci 0.7'. The calculated LD block intervals were used to search for candidate genes for specific traits.

Genome-Wide Association Study (GWAS) and Candidate Gene Identification
GWAS based on the compressed mixed linear model (CMLM) was conducted using the R package of Genomic Association and Prediction Integrated Tool (GAPIT) (Lipka et al., 2012) with the forward model selection using the Bayesian information criterion (BIC). The significance threshold −log 10 P-value of the GWAS was determined using Bonferroni (1936) correction (FDR P-value < 0.05) based on the number of independent SNPs in a population. The candidate genes inside LD regions with significant SNPs were investigated. Gene prediction was performed based on gff file v.2.0 of the CM334 v.1.6 reference genome (http://peppergenome.snu.ac. kr), and the function of each gene was predicted using Blast2GO (Götz et al., 2008) based on deduced protein sequences. To detect the physical positions of previous QTLs or orthologous genes from other species, BLAST searches of the pepper genome from the NCBI database (https://www.ncbi. nlm.nih.gov) were performed. To predict the molecular and biological functions of genes, the NCBI, Solanaceae (https:// solgenomics.net), and Arabidopsis databases (https://www. arabidopsis.org) were used.

QTL Mapping Using Recombinant Inbred Lines
To validate the GWAS results, recombinant inbred lines (RILs) were used for QTL mapping as described by Han et al. (2016). All information about the plant materials and phenotypes in this study were described in Han et al. (2016), but the genotyping results were altered using a more recent version of the reference genome (C. annuum 'CM334' v.1.6, http://peppergenome.snu.ac. kr). In brief, 120 F 7 -F 10 RILs derived from a cross between pungent C. annuum 'Perennial' and non-pungent C. annuum 'Dempsey' were grown for 3 years (2011, 2012, and 2014) in two locations: Anseong (2011 and 2012a) and Suwon, Korea (2012b and 2014). After SNP calling, sequencing reads were aligned to C. annuum 'CM334' v.1.6. A modified sliding window approach was used to investigate recombination breakpoints and to construct a bin map of the RILs. Bins were used as markers to construct a genetic map using the Carthagene program (De Givry et al., 2005) with default threshold values. All detailed Lee et al.

Pepper GWAS of Fruit-Related Traits
Frontiers in Plant Science | www.frontiersin.org July 2020 | Volume 11 | Article 1100 options were adapted from Han et al. (2016) and Siddique et al. (2019). Of the 18 reported traits (Han et al., 2016), we utilized major four fruit domestication-related traits (FL, FWd, FWg, and FP) for QTL analysis. Composite interval mapping (CIM) was performed with Windows QTL Cartographer 2.5 . The phenotypic values of each trait in the respective years and locations were analyzed separately to detect QTLs. The log of odds (LOD) threshold was determined by performing 1,000 permutation tests with 5% probability (P) for each trait, and the proportion of phenotypic variation (R 2 ) for each QTL was estimated. The 95% confidence interval was used to represent the location of each QTL.

SNP Discovery and Population Structure (Q) of a Pepper GWAS Population
We aligned sequences derived from GBS to the C. annuum cv. CM334 reference genome v. 1.6, http://peppergenome.snu.ac.kr (Kim et al., 2017). GBS genotyping of 351 accessions (Table S1) with two sets of libraries constructed using double digestion with two sets of restriction enzymes generated 8,717,361 SNPs. The GBS generated data is available in National Agricultural Biotechnology Information Center (NABIC, https://nabic.rda. go.kr/, ID= NV-0630-000001) Trimming and filtering-out of SNPs with a quality score <30, call rate <10%, and mono or triallelic SNPs types resulted in 1,869,524 SNPs (Table S2). To avoid potential errors in the interpretation of the GWAS results, the missing genotypes were imputed using the FILLIN method in the TASSEL package. Accordingly, approximately 26% of genotypes were imputed to minor alleles, and 21 and 59% of genotypes were imputed to hetero and major alleles, respectively, with a regression (R 2 ) value of 0.82. Using this imputed genotype, the final filtering step was performed under the following conditions: MAF >0.05, SNP coverage >0.6, and IF >0.8. This step resulted in a set of 507,713 high-quality SNPs, which were evenly distributed on 12 chromosomes ( Figure 1A, Table S2). Each SNP marker generated from this SNP set was named according to its physical position in the pepper reference genome. As the genetic structure of a population can strongly affect the results of GWAS, we performed principal component analysis (PCA) to analyze population stratification. This analysis yielded four genetic clusters, with 22.26% (PC1) and 13.47% (PC2) of the genotypic variance in the first and second axes. Each cluster was well clustered by species, including C. annuum, C. baccatum, C. chinense, and C. frutescens. Although some accessions showed slight admixture, there were no conspicuous sub-clusters in the structure ( Figure 1C).

Phenotypic Diversity of Major Fruit-Related Domestication Traits in the GWAS Population
We evaluated 351 Capsicum accessions from four species with maximum genetic diversity (Lee et al., 2016) for three years to assess the range of phenotype variation of the five fruit-related domestication traits (FL, FWd, FWg, PT, and FP). We detected a consecutive reduction in the mean values of the quantitative traits during the three years of the experiment, except for FWd, which had a higher value in 2016 (25.1 mm) compared to 2015 (24.3 mm) and 2017 (23.7 mm). While the maximum average FL value (72.4 mm) was obtained in 2015, the minimum value (66.9 mm) was obtained in 2017. Similarly, FWg and PT showed the highest average values in 2015 (22.9 g and 2.5 mm, respectively), followed by 2016 (19.9 g and 2.0 mm, respectively) ( Table 1). Three species showed either erect (40%) or pendant (60%) FP, whereas C. frutescens showed all erect FP. Intermediate FP was also observed in all species except C. frutescens ( Figure 2B). High broad-sense heritability (H 2 ) values were recorded for FL (0.8), FWd (0.81), FWg (0.83), and PT (0.72) ( Table 1).
All five fruit-related domestication traits showed significant positive correlations (P = 0.01). Specifically, highly strong positive correlations were detected between FWg and FWd (r = 0.91), followed by FWd and PT (r = 0.90) and FWg and PT (r = 0.88). Although FL had slightly lower positive correlations with these three traits (FWd, FWg, PT), it was the most highly correlated with FP (r = 0.46) compared to the three other traits ( Figure 2A). As the GWAS population was clustered by species, we performed ANOVA, which validated the variability in the traits among species (Figures 2B-F, Tables S3 and S4). This analysis uncovered significant variation between species groups for the five traits, ranging from 9.63 to 22.68 (F, with p = 0.00), with a mean difference of 0.08 to 0.17 (h 2 ), which also supported the differences among species ( Table S4). Most of the traits showed the greatest

Linkage Disequilibrium (LD) Pattern and GWAS of the C. annuum Cluster
The minimize the confounding effect of interspecies variation and the corresponding false-positive errors, among the entire population set used in the experiment, we selected the C. annuum cluster, as it contained a sufficient number of accessions with high levels of phenotypic and genotypic diversity without any interrelated population stratification ( Figures 1D, E). Since the five fruit-related traits showed high broad-sense heritability (H 2 ) values ( Table 1), indicating that genetic factors were the major determinants of the observed phenotypic variability, we subjected all traits to GWAS using the C. annuum CM334 v.1.6, http://peppergenome.snu.ac.kr reference genome. In the C. annuum cluster, 187,966 high-quality SNPs were filtered for use in GWAS following the criteria described above ( Figure 1B, Table S2). Using this SNP set, we compared the common and unique patterns of genetic variation in adjacent marker pairs of each chromosome by performing LD analysis throughout the genomes of the C. annuum accessions. We identified 12,234 LD blocks, with an average of 1,020 per chromosome. The average block size was 149 kb, each containing an average of 14 SNPs ( Table S5). The LD blocks were named based on their order on each chromosome.
We detected high variation for all fruit-related traits over the three years of analysis except for the qualitative trait (FP), which was observed for only one year. Common SNPs that were consistently correlated for at least two years of investigation exceeding the significance threshold (−log 10 P >6.575) were used to describe our results ( Figures S1 and S2, Table S6). Accordingly, a total of 178 common SNPs were identified, including 1 for FL, 148 for FWd, 28 for FWg, and 1 for PT. For FP, 52 significant SNPs from accessions with pendant, erect, and intermediate phenotypes were used for analysis (Table S6).
In detail, for marker-trait associations per year, eight significant SNPs were identified for FL; of these, three SNPs (S03_19218749, S03_19218759, and S03_19254384) on chromosome 3 were detected only in 2017. Two SNPs were identified on chromosome 4, including S04_211838587 only in 2016 and S04_211848210 in all three years of the study. Two and one additional SNP (S05_12080328, S07_214135504, and S11_72669050) each on chromosomes 5, 7, and 11 were identified only in 2015 and 2017, respectively ( Figure S1). SNP S04_211848210, which is associated with LD block H04-0562 on chromosome 4 in the region between 211.8 Mb and 211.9 Mb (spanning an interval of 64,916 bp), was expected to be highly correlated with FL, as it was consistently detected throughout the experiment ( Figures 3A, B). For FWd, we detected 281 significant SNPs throughout the experimental period, with an average -log 10 P-value ranging from 6 to 10.57 ( Figure S1). Most of the significant SNPs (98.9%) were located on chromosome 9 from 67.4 to 171 Mb and were linked to 26 LD blocks ( Figure  S3). In 2015, nine unique significant SNP positions (eight on chromosome 9 and one on chromosome 7) were detected. There were 123 unique SNPs in 2016, all on the middle and distal regions of chromosome 9. While 146 common SNPs were detected in 2015 and 2016, only two SNPs were identified on chromosome 12. Notably, two SNPs (S09_133144036 and S09_136634573) associated with H09-0745 and H09-0756 were consistently identified in all three years of the study (Figure 4, Table S6).
A genome-wide association scan also revealed 52 significant SNPs associated with the variation in FP ( Table S6). Most of the significant SNPs were located on chromosome 12, while three SNPs were detected in the 220 Mbp region on chromosome 3, and two SNPs were located at 161 and 198 Mbp on chromosome 5, respectively. Inside of chromosome 12, except for two SNPs detected in a 143.3 Mbp region, most significant SNPs were detected near the 211 to 219 Mbp region, with the highest association detected in the H12-0566 block area ( Figures 3F-H).

QTLs of Major Fruit-Related Domestication Traits in the RILs
To confirm the GWAS results, we examined QTLs for four major fruit-related domestication traits (FL, FWd, FWg, FP) using 120 RILs derived from a cross between C. annuum 'Perennial' and C. annuum 'Dempsey' (PDRIL); in these lines, 86 QTLs for 17 horticultural traits were previously mapped (Han et al., 2016). The only difference in the technique used in the current compared to the previous study is that here, we used the genetic map developed from the more recent version of the reference genome (CM334 v.1.6, http://peppergenome.snu.ac. kr). Based on the reference genome, we used 444,405 SNPs from 120 RILs and both parental lines to construct a binmap. Using a sliding window approach, all SNPs were grouped into 2,050 bins ( Figure S4, Table S8). The average length of the bins was 0.55 Mb, ranging from 100 kb to 83.5 Mbp. The total genetic distance of the bin map was 1,123.6 cM ( Table S9).
Using the same phenotypic information, a total of 17 QTLs were identified ( Table 2). Each QTL was named based on an abbreviation of the trait name and the chromosome number following 'PD_'. For each trait, three to five QTLs were detected, which were distributed throughout chromosomes 2, 3, 4, 5, 7, 9, 10, and 12. The phenotypic variation (R 2 ) explained by each QTL ranged from 8.3% (PD_FP9) to 38.5% (PD_FWd4).
Based on the mapping results, five minor QTLs for FL were detected on chromosomes 2, 3, 5, and 9 in one environment. For FWd, four minor QTLs were identified on chromosomes 3, 4, and 7. All major and minor QTLs for FWg were detected on chromosome 7 at 29.6 to 30.7 cM, explaining more than 18.4% (LOD >7.5) of the phenotypic variation (R 2 ) among four environments. For FP, three QTLs (PD_FP9, PD_FP10, PD_FP12.1) were commonly identified in two different environments but explained less than 10% of the phenotypic variation. However, the two remaining QTLs detected on chromosome 12 at 41.3 to 44.7 cM explained higher phenotype variation (>18%) in one environment. Except for PD_FWd3, all QTLs for FL, FWd, and FWg had positive additive effects, meaning that RILs with the maternal genotype had higher values, while all five QTLs for FP showed negative additive effects.
Using the same criteria as Han et al. (2016), QTLs detected in more than two environments with threshold R 2 values of 10% were considered to be major QTLs. Only one QTL, PD_FWg7.1, was identified as a major QTL for FWg, with R 2 (%) values ranging from 19.7 to 30.2. This large variation in R 2 values  indicates that FWg is highly affected by genotype × environment interactions in this population.

Candidate Genes Influencing Major Fruit-Related Domestication Traits Under Selection
Based on the GWAS results, we selected 64 significant LD blocks and 230 SNPs related to the five major fruit-related domestication traits to predict candidate genes using Blast2GO. Among the 111 genes identified in the significant LD blocks, 1, 70, 39, and 16 genes were correlated with FL, FWd, FWg, PT, and FP, respectively, with some duplication (Table S7). Based on their predicted functions and communality for two or more closely related traits, 16 genes appeared to have close correlations with major fruit-related domestication traits. First, a gene (CA.PGA v.1.6.scaffold517.20) located in the 211 Mb region of chromosome 4 in H04-0562 was strongly associated with FL. This gene, which is annotated as lowaffinity sulfate transporter 3-like, is located approximately 1.7 kb from S04_211848210. A gene in the same mapping region at a 4 7 k b d i s t a n c e f r o m S N P _ 2 1 1 8 4 8 2 1 0 , C A . P G A v.1.6.scaffold517.21, is annotated as Agamous-like MADS-box protein AGL104; this gene appears to be an important regulator of FL (Figures 3A, B).
In the 227 Mb region of chromosome 4, a single gene, CA.PGA v.1.6.scaffold1239.15, was detected for both PT and FWg. This gene, encoding growth-regulating factor 1-like, and is be closely linked with SNP S04_227983120, a significant SNP located inside the 2nd exon ( Figures 3C-E and 5A-C). The varieties carrying the G allele had heavier fruits with thicker pericarps than varieties carrying the C allele ( Figures 3E  and 5C).
Nine candidate genes are predicted to regulate FWg ( Figure  5). Of these, CA.PGAv.1.6.scaffold1239.15, which regulates PT, as described above, is located on chromosome 4 and contains a significant SNP inside its coding region ( Figures 5B, C).
Moreover, CA.PGAv.1.6.scaffold422.15, which is annotated as peroxidase 41-like, is located 229.8 kb away from the significant SNP S06_194967541, which was consistently identified all three years of the experiment ( Figure 5E). The seven remaining genes, which were commonly identified with FWd-associated genes, play roles in plant immunity and defense mechanisms ( Figures 5F-M).

DISCUSSION
GWAS is often used to explore the genetic basis of complex traits in field-grown and horticultural crops due to its efficient detection of many natural allelic variations underlying phenotypic diversity (Brachi et al., 2011). Despite its successful use, however, it is still difficult to link the trait of interest to causal genes due to the widespread existence of population structure inside the diversity panels (Pritchard et al., 2000;Zhang et al., 2009). Population stratification and cryptic relationships can generate spurious associations between phenotypes and unlinked SNPs, leading to false positives (Ioannidis, 2005;Moonesinghe et al., 2007). In the current study, we identified 64 significant LD blocks linked to fruit-related traits and uncovered 16 candidate genes as major genes related to pepper domestication.
Pepper germplasm accessions have been divided into subclusters based on species, geographical origin, fruit characteristics, or different routes of introduction (Nicolai et al., 2013;Lee et al., 2016). Similar to previous reports, we identified four distinct sub-populations of pepper based on Capsicum species classification. To improve the reliability and credibility of the association results, we performed GWAS using only the C. annuum sub-cluster, which contains a large number of accessions, with great phenotypic variability but without any strong population stratification. We generated 187,966 genomewide high-quality SNP markers from the C. annuum sub-cluster of the GWAS population using the GBS method. The LD blocks had an average size of 149 kb, indicating that at least 23,490 genome-wide SNPs are required for GWAS in pepper. Based on the estimated LD block size, the number SNP markers generated in this study is sufficient for GWAS in pepper. We analyzed marker-trait associations for five major fruitrelated domestication traits (FL, FWd, FWg, PT, FP) by GWAS. As a result, we identified 111 candidate genes within the 65 LD blocks. Of these, we selected 16 genes as strong candidate causal genes regulating fruit morphology according to the following criteria: 1) developmental genes known to be related to domestication in other plants; 2) genes within LD blocks containing significant SNPs detected in all three years of the study; and 3) SNP-containing genes associated with more than two traits.
T h r e e g e n e s ( C A . P G A v . 1 . 6 . s c a f f o l d 5 1 7 . 2 1 , CA.PGAv.1.6.scaffold3.10, CA.PGAv.1.6.scaffold5.16), which are annotated as Agamous-like MADS-box protein AGL104, transcription repressor OFP12-like, and leucine-rich repeat and IQ domain-containing protein 1-like isoform X3, respectively, satisfied the first criterion, as they belong to the MADS domain subfamily, Ovate Family Protein (OFP) family, and IQ domain family, respectively. The OFP and IQ domain gene families include the well-known ovate and sun genes in tomato (Rodriguez et al., 2011). A nonsense mutation in the ovate gene is responsible for the development of pear-shaped fruit instead of oval-shaped fruit in tomato (Wang et al., 2007;Schmitz et al., 2015). In Arabidopsis, this gene regulates the production of a gibberellic acid (GA) biosynthesis enzyme to control cell elongation (Wang et al., 2007). AGAMOUS-like (AGL) transcription factors, which belong to the plant type I MADS domain subfamily, regulate reproductive development. A number of AGL transcription factor genes are specifically expressed in the central cell of the female gametophyte and endosperm in Arabidopsis (Bemer et al., 2010). Two genes a s s o c i a t e d w i t h F P ( C A . P G A v . 1 . 6 . s c a ff o l d 1 3 6 8 . 1 , CA.PGAv.1.6.scaffold1387.3) are thought to be important candidates due to their regulation by auxin. The gene CA.PGAv.1.6.scaffold1387.3, which is annotated as BIG GRAIN 1-like A, is homologous to an auxin transport protein gene in Arabidopsis. This gene controls the adaxial-abaxial polarity of the pedicel (Yamaguchi et al., 2007), making it a good candidate gene for FP.
F i v e g e n e s ( C A . P G A v . 1 . 6 . s c a f f o l d 5 1 7 . 2 1 , CA.PGAv.1.6.scaffold517.20, CA.PGAv.1.6.scaffold422.15, CA.PGAv.1.6.scaffold5.32, and CA.PGAv.1.6.scaffold5.22) were chosen as candidate causal genes based on the second criterion: these genes are annotated as Agamous-like MADSbox protein AGL104, low-affinity sulfate transporter 3-like, peroxidase 41-like, elongation factor 1-beta-like, and uncharacterized protein LOC107842678, respectively. The first two genes, which are closely related to FL, are located at 211 Mbp on chromosome 4. In detail, CA.PGAv.1.6.scaffold517.20, a member of sulfate transporter family group 2, might be involved in the internal transport of sulfate between cellular or subcellular compartments within the plant (Hawkesford, 2003). Although sulfate is essential nutrient required for the biosynthesis of a wide range of sulfur-containing compounds, the functions of these genes in plants are unclear (Saito, 2000).
CA.PGAv.1.6.scaffold1239.15 encodes a Growth-regulating factor (GRF) that is correlated with both PT and FWg. GRFs are plant-specific transcription factors that were originally identified for their roles in stem and leaf development. Recent studies have highlighted their importance in other central developmental processes including flower and seed formation, root development, and the coordination of growth processes under adverse environmental conditions (Omidbakhshfard et al., 2015). We subjected the results of our phenotypic survey (conducted for three years to examine morphological traits in pepper) to Pearson correlation (r) analysis, which also supported the GWAS results.
A comparison of the QTLs mapped based on the PDRIL and GWAS results from the GWAS population revealed only one common genetic area associated with FP. Region 141.6 to 144.6 Mbp on chromosome 12 contains three QTLs (PD_FP12.1, PD_FP12.2, PD_FP12.3) and two signifi cant SNPs (S12_143380249, S12_143380271). Inside this common area, two genes were identified (CA.PGAv.1.6.scaffold18.1, and CA.PGAv.1.6.scaffold172.10); these genes are annotated as Lascorbate oxidase and ELKS/Rab6-interacting/CAST family member 1 isoform X1, respectively. Although the functional relevance of these candidate genes requires further validation, based on their putative functions, they represent strong candidate genes involved in pepper domestication. Among the 17 detected QTLs, one major QTL for FWg, PD_FWg7.1, spanning around 68.9 to 73.6 Mbp on chromosome 7 was identified. In this position, we were able to detect a relatively high peak than the surrounding area in GWAS. However, the P-values of those SNPs (-log10 P-value <2.9) did not pass a significant threshold. Some QTL positions for FWg and FWd were corresponding to QTLs reported by Wu et al. (2019). Unexpectedly, however, most QTLs or significant SNPs in QTL analysis and GWAS for fruit traits were not common. This may be due to several reasons including Beavis effect, differences in models or fundamental differences analysis as suggested Hansson et al. (2018).
In summary, we successfully used GWAS to identify genes responsible for major fruit-related traits in pepper. The significant haplotypes identified in this study provide unique molecular footprints for developing markers for pre-breeding or genomic selection. Future functional validation of the candidate genes identified in this study should provide additional targets for the improvement of major horticultural traits in pepper via breeding.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in The National Agricultural Biotechnology Information Center http:// nabic.rda.go.kr/, ID NV-0630-000001.