Characterization of the USDA Cucurbita pepo, C. moschata, and C. maxima germplasm collections

The Cucurbita genus is home to a number of economically and culturally important species. We present the analysis of genotype data generated through genotyping-by-sequencing of the USDA germplasm collections of Cucurbita pepo, C. moschata, and C. maxima. These collections include a mixture of wild, landrace, and cultivated specimens from all over the world. Roughly 1,500 to 32,000 high-quality single nucleotide polymorphisms (SNPs) were called in each of the collections, which ranged in size from 314 to 829 accessions. Genomic analyses were conducted to characterize the diversity in each of the species. Analysis revealed extensive structure corresponding to a combination of geographical origin and morphotype/market class. Genome-wide association studies (GWAS) were conducted using both historical and contemporary data. Signals were observed for several traits, but the strongest was for the bush (Bu) gene in C. pepo. Analysis of genomic heritability, together with population structure and GWAS results, was used to demonstrate a close alignment of seed size in C. pepo, maturity in C. moschata, and plant habit in C. maxima with genetic subgroups. These data represent a large, valuable collection of sequenced Cucurbita that can be used to direct the maintenance of genetic diversity, to develop breeding resources, and to help prioritize whole-genome re-sequencing.


Introduction
The Cucurbitaceae (Cucurbit) family is home to a number of vining species mostly cultivated for their fruits. This diverse and economically important family includes cucumber (Cucumis sativus), melon (Cucumis melo), watermelon (Citrullus lanatus), and squash (Cucurbita ssp.) (Ferriol and Pico, 2008). Like other cucurbits, squash exhibit diversity in growth habit, fruit morphology, metabolite content, and disease resistance, and have a nuanced domestication story (Paris and Brown, 2005;Chomicki et al., 2020). The genomes of Cucurbita ssp. are small (roughly 400 Mb), but result from complex interactions between ancient genomes brought together through an allopolyploidization event (Sun et al., 2017). These factors make squash an excellent model for understanding the biology of genomes, fruit development, and domestication. Within Cucurbita, three species are broadly cultivated: C. maxima, C. moschata, and C. pepo (Ferriol and Pico, 2008). Few genomic resources have been available for these species; however, draft genomes and annotations, along with web-based tools and other genomics data, are emerging (Yu et al., 2023). Already, these resources have been used to elucidate the genetics of fruit quality, growth habit, and disease resistance, as well as to increase the efficiency of cucurbit improvement (Montero-Pau et al., 2017;Zhong et al., 2017;Kaźmińska et al., 2018;Xanthopoulou et al., 2019;Wu et al., 2019a;Hernandez et al., 2020). However, there has yet to be a comprehensive survey of the genetic diversity in the large, diverse Cucurbita germplasm panels maintained by the USDA within the National Plant Germplasm System.
Germplasm collections play a vital role in maintaining and preserving genetic variation. These collections can be mined by breeders for valuable alleles. They can also be used by geneticists and biologists for mapping studies (McCouch et al., 2020). Like many other orphan and specialty crops, there has been little effort put into developing community genetic resources for squash and other cucurbits. The Cucurbit Coordinated Agricultural Project (CucCAP project) was established to help close the knowledge gap in cucurbits (Grumet et al., 2021). This collaborative project aims to provide genomics resources and tools that can aid in both applied breeding and basic research. The genetic and phenotypic diversity present in the USDA watermelon, melon, and cucumber collections has already been explored as part of the CucCAP project, partially through the sequencing of USDA germplasm collections and development of core collections for whole-genome sequencing (Wang et al., 2018;Wu et al., 2019b;Wang et al., 2021). The diverse specimens of the USDA squash collections have yet to be well characterized at the genetic level. An understanding of squash diversity requires an appreciation of the elaborate system used to classify squash.
The classification system used in squash is complex. Squash from each species can be classified as either summer or winter squash depending on whether the fruit is consumed at an immature or mature stage; squash consumed at the mature stage are winter squash (Loy, 2004). Squash are considered ornamental if they are used for decoration, and some irregularly shaped, inedible ornamental squash are called gourds. Gourds, however, include members of Cucurbita as well as some species from Lagenaria, and as a result, not all gourds are squash (Paris, 2015). Many squash are known as pumpkins; the pumpkin designation is a culture-dependent colloquialism that can refer to Jack O' Lantern types, squash used for desserts or, in some Latin American countries, to eating squash from C. moschata known locally as Calabaza (Ferriol and Pico, 2008). Cultivars deemed pumpkins can be found in all widely cultivated squash species. Unlike the previous groupings, morphotypes/market classes are defined within species. For example, a Zucchini is reliably a member of C. pepo and Buttercups are from C. maxima. Adding to the complexity of their classification, the Cucurbita species are believed to have arisen from independent domestication events and the relationships between cultivated and wild species remain poorly understood (Kates et al., 2017).
C. pepo is the most economically important of the Cucurbita species and is split into two different subspecies: C. pepo subsp. pepo and C. pepo subsp. ovifera (Xanthopoulou et al., 2019). Evidence points to Mexico as the center of origin for pepo and the southwest/central United States as the origin of ovifera. The progenitor of ovifera is considered by some to be subsp. ovifera var. texana, whereas subsp. fraterna is a candidate progenitor for pepo (Kates et al., 2017). Europe played a crucial role as a secondary center of diversification for subsp. pepo, but not subsp. ovifera (Lust and Paris, 2016). Important morphotypes of pepo include Zucchini, Spaghetti, Cocozelle, Vegetable Marrow, and some ornamental pumpkins. C. pepo subsp. ovifera includes summer squash from the Crookneck, Scallop, and Straightneck group, and winter squash such as Delicata and Acorn (Paris et al., 2012).
The origin of C. moschata is more uncertain than that of C. pepo; it is unclear whether C. moschata has its origin in South or North America (Chomicki et al., 2020). Where and when domestication occurred for this species is also unknown; however, it is known that C. moschata had an India-Myanmar secondary center of origin where the species was further diversified (Sun et al., 2017). C. moschata plays an important role in squash breeding as it is cross-fertile to various degrees with C. pepo and C. maxima, and can thus be used as a bridge to move genes across species (Sun et al., 2017). Popular market classes of C. moschata include cheese types like Dickinson, which is widely used for canned pumpkin products, Butternut (Neck) types, Japonica, and tropical pumpkins known as Calabaza (Ferriol and Pico, 2008).
C. maxima contains many popular winter squash including Buttercup/Kabocha types, Kuri, Hubbard, and Banana squash (Ferriol and Pico, 2008). This species also sports the world's largest fruit, the giant pumpkin, whose fruit are grown for competition and can reach well over 1,000 kg (Savage et al., 2015). Although this species exhibits a wide range of phenotypic diversity in terms of fruit characteristics, it appears to be the least genetically diverse of the three species described (Kates et al., 2017). C. maxima is believed to have a South American origin, and was likely domesticated near Peru, with a secondary center of domestication in Japan and China (Nee, 1990;Sun et al., 2017).
In this study, we set out to characterize the genetic diversity present in the USDA Cucurbita germplasm collections for C. pepo, C. moschata, and C. maxima. We present genotyping-by-sequencing (GBS) data from each of these collections, population genomics analysis, results from genome-wide association studies (GWAS) using historical and contemporary phenotypes, and a suggested core panel for re-sequencing.

Plant materials and genotyping
All available germplasm were requested from USDA cooperators for C. maxima (534 accessions from Geneva, NY), C. moschata (314 accessions from Griffin, GA), and C. pepo (829 accessions from Ames, IA). Seeds were planted in 50-cell trays and two 19 mm punches of tissue (approximately 80-150 mg) were sampled from the first true leaf of each seedling. DNA was extracted using Omega Mag-Bind Plant DNA DS kits (M1130, Omega Bio-Tek, Norcross, GA) and quantified using the Quant-iT PicoGreen dsDNA Kit (Invitrogen, Carlsbad, CA). Purified DNA was shipped to Cornell's Genomic Diversity Facility for GBS library preparation using protocols optimized for each species. Libraries were sequenced at either 96, 192, or 384-plex on the HiSeq 2500 (Illumina Inc., USA) in single-end mode with a read length of 101 bp.

Variant calling and filtering
SNP calling was conducted using the TASSEL-GBS V5 pipeline (Glaubitz et al., 2014). Tags produced by this pipeline were aligned using the default settings of the BWA aligner (Li and Durbin, 2009). Raw variants were filtered using BCFtools (Danecek et al., 2021). SNPs were retained if they were biallelic and had a minor allele frequency (MAF) ≥ 0.05 and missingness ≤ 0.4. Nine genotypes were removed from C. maxima based on missing data and preliminary PCA results. One genotype was removed from C. pepo (see Supplemental Info S1). Variants were further filtered for specific uses as described below.
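The filtering thresholds above can be illustrated with a minimal sketch. This is not the BCFtools command used in the study; it is a plain-Python stand-in that assumes genotypes are already coded as 0, 1, or 2 copies of the alternate allele (so biallelic status is implied by the coding), with None for missing calls. The function name and the toy data are hypothetical.

```python
def filter_snps(genotypes, min_maf=0.05, max_missing=0.4):
    """Return indices of SNPs that pass the MAF and missingness thresholds.

    genotypes: list of SNP rows; each row holds one call per accession,
    coded as 0, 1, or 2 alternate-allele copies, or None if missing.
    """
    kept = []
    for idx, calls in enumerate(genotypes):
        observed = [g for g in calls if g is not None]
        missing_rate = 1 - len(observed) / len(calls)
        if not observed or missing_rate > max_missing:
            continue
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            kept.append(idx)
    return kept


# Hypothetical toy data: three SNPs scored on ten accessions.
snps = [
    [0, 1, 2, 0, 1, None, 0, 2, 1, 0],              # polymorphic, 10% missing
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],                 # monomorphic, MAF = 0
    [1, None, None, None, None, None, 0, 2, 1, 0],  # 50% missing
]
passing = filter_snps(snps)
```

With the thresholds reported above, only the first toy SNP is retained; relaxing max_missing to 0.5 would also admit the third.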

Population genomics analysis
ADMIXTURE (Alexander and Lange, 2011), which uses a model-based approach to infer ancestral populations (k) and admixture proportions in a given sample, was used to explore population structure in each dataset. ADMIXTURE does not model linkage disequilibrium (LD); thus, marker sets were further filtered to obtain SNPs in approximate linkage equilibrium using the "--indep-pairwise" option in PLINK (Purcell et al., 2007) with an $r^2$ threshold of 0.1, a window size of 50 SNPs, and a 10 SNP step size. All samples labeled as cultivars or breeding material were removed from the data prior to running ADMIXTURE. These samples were removed to prevent structure created through breeding from appearing as ancestral populations. After the model was trained on data without the cultivars, ancestral populations were assigned to the cultivars using the program's projection feature. Cross-validation was used to determine the best k value for each species. Briefly, ADMIXTURE was run with different values of k (1-20) and the cross-validation error was reported for each k. The most parsimonious k value with minimal cross-validation error was chosen for each species.
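One way to make the trade-off between parsimony and cross-validation error explicit is sketched below. The tolerance rule is an assumption introduced for illustration only; the text does not state a numeric criterion, and the cross-validation errors shown are invented values, not results from this study.

```python
def choose_k(cv_errors, tolerance=0.0):
    """Pick the smallest k whose CV error is within `tolerance` of the minimum.

    cv_errors: dict mapping k -> cross-validation error.
    tolerance: hypothetical slack that favours fewer ancestral populations.
    """
    best = min(cv_errors.values())
    candidates = [k for k, err in cv_errors.items() if err <= best + tolerance]
    return min(candidates)


# Invented CV errors with a shallow curve after k = 6.
cv = {1: 0.62, 2: 0.55, 3: 0.51, 4: 0.49, 5: 0.485,
      6: 0.470, 7: 0.474, 8: 0.472, 9: 0.468, 10: 0.469}

strict_k = choose_k(cv)                          # global minimum
parsimonious_k = choose_k(cv, tolerance=0.005)   # smallest k near the minimum
```

With zero tolerance the function returns the global minimum; a small tolerance selects the earlier, nearly equivalent k, which is the kind of judgment described for a flat error curve.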
Principal components analysis (PCA) was used as a model-free way of determining population structure. PCA was conducted using SNPRelate (Zheng et al., 2012) on the same LD-pruned data used by ADMIXTURE.
Linkage disequilibrium was calculated in each germplasm panel using VCFtools (Danecek et al., 2011) with the settings "--geno-r2 --ld-window 1000". Filtered, but not pruned, data were used for the LD calculation.
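The genotype-based LD statistic can be sketched as the squared Pearson correlation between two SNPs coded as 0, 1, or 2 alternate-allele copies. The snippet below is a minimal illustration under that coding, not a reimplementation of VCFtools; the function name and example vectors are hypothetical.

```python
def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two genotype vectors.

    Calls are 0/1/2 dosages; accessions missing (None) at either SNP are skipped.
    Returns NaN when either SNP is monomorphic among the shared accessions.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)


linked = r_squared([0, 1, 2, 0], [0, 1, 2, 0])      # identical SNPs
unlinked = r_squared([0, 2, 0, 2], [0, 0, 2, 2])    # independent SNPs
```

Averaging such values over marker pairs binned by physical distance gives the kind of decay curve used to compare LD across panels.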

Analysis of phenotypic data
Historical data were obtained from the USDA Germplasm Resources Information Network (GRIN; www.ars-grin.gov) for C. maxima, C. pepo, and C. moschata. All duplicated entries were removed for qualitative traits, where categories are mutually exclusive, leaving only samples with unique entries for analysis. Phenotypic data from two traits in C. pepo, adult and nymph squash bug damage, were transformed using the Box-Cox procedure. Contemporary phenotypic data were collected from a subset of the C. pepo collection grown in the summer of 2018 in Ithaca, NY. Field-grown plants were phenotyped for growth habit at three different stages during the growing season to confirm bush, semi-bush, or vining growth habit. Plants that had a bush habit early in the season but started to vine at the end of the season were considered semi-bush.
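For readers unfamiliar with the Box-Cox procedure, the transform itself can be sketched as follows for a fixed power parameter. How lambda was estimated for the squash bug traits is not stated in the text, so the lambda values and damage scores below are purely illustrative assumptions.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limiting case of the power transform is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


damage_scores = [1, 2, 4, 9]             # hypothetical right-skewed scores
sqrt_like = box_cox(damage_scores, 0.5)  # lambda = 0.5 behaves like a square root
log_like = box_cox(damage_scores, 0)     # lambda = 0 is the log transform
```

In practice lambda is usually chosen by maximum likelihood so that the transformed values are as close to normal as possible.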

GWAS
Variant data were filtered to MAF ≥ 0.05 and missingness ≤ 0.2, and then imputed prior to association analysis. LinkImpute (Money et al., 2015), as implemented by the TASSEL (Bradbury et al., 2007) "LDKNNiImputationHetV2Plugin" plugin, was used for imputation with default settings. Any data still missing after this process were mean imputed. The GENESIS (Gogarten et al., 2019) R package, which can model both binary and continuous traits, was used for conducting the associations. All models included the first two PCs of the marker matrix as fixed effects and modeled genotype effect ($u$) as a random effect distributed according to the kinship ($K$) matrix ($u \sim N(0, \sigma^2_u K)$). Binary traits were modeled using the logistic regression feature of GENESIS. The kinship matrix was calculated using A.mat from rrBLUP (Endelman, 2011) with mean imputation.
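A marker-based kinship matrix of the kind used as the covariance of the random genotype effect can be sketched as below. This is a VanRaden-style estimator written in plain Python on markers coded -1, 0, 1; whether it matches the defaults of A.mat in every detail is an assumption, and the three-accession example is invented.

```python
def additive_relationship(markers):
    """Additive relationship matrix from markers coded -1, 0, 1.

    markers: list of accessions, each a list of marker dosages (no missing data).
    Markers are centered by twice the allele frequency and the cross-product
    is scaled by 2 * sum(p * (1 - p)).
    """
    n = len(markers)
    m = len(markers[0])
    # Allele frequency per marker after shifting dosages to the 0/1/2 scale.
    freqs = [sum(row[j] + 1 for row in markers) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    centered = [[row[j] + 1 - 2 * freqs[j] for j in range(m)] for row in markers]
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / scale
             for k in range(n)]
            for i in range(n)]


# Hypothetical panel: two opposite homozygotes and one heterozygote.
toy = [[1, 1, -1], [-1, -1, 1], [0, 0, 0]]
kinship = additive_relationship(toy)
```

In this toy panel the two opposite homozygotes receive a negative relationship coefficient, and the fully heterozygous accession sits at the panel mean.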

Genomic heritability
An estimate of genomic heritability ($h^2_G$) (de los Campos et al., 2015) was calculated for all ordinal and quantitative traits using a model equivalent to the one used for GWAS, but without fixed effects. Variance components from the random genetic effect ($\sigma^2_u$) and error ($\sigma^2_e$) were then used to calculate the heritability as $h^2_G = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}$.
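As a small worked example, the ratio of genetic variance to total variance described above can be computed as follows. The variance components are made-up numbers chosen for illustration, not estimates from this study.

```python
def genomic_heritability(var_genetic, var_error):
    """Genomic heritability as genetic variance over total (genetic + error) variance."""
    if var_genetic < 0 or var_error < 0:
        raise ValueError("variance components must be non-negative")
    total = var_genetic + var_error
    if total == 0:
        raise ValueError("total variance is zero")
    return var_genetic / total


# Hypothetical variance components for two traits.
h2_high = genomic_heritability(3.0, 1.0)   # mostly genetic variance
h2_low = genomic_heritability(1.0, 4.0)    # mostly residual variance
```

A value near 1 indicates that the markers capture most of the phenotypic variance, whereas a value near 0 indicates that residual variation dominates.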

Synteny of the putative Bu region in C. pepo and C. maxima
A candidate gene for dwarfism (bush phenotype), Bu, in C. maxima was identified in a previous study and named Cma_004516 (Zhang et al., 2015). The gene ID in the Cucurbit Genomics Database corresponding to Cma_004516 was identified by using the BLAST tool to align the primer sequences used for RT-qPCR in the previous study (Zhang et al., 2015) against the C. maxima reference genome. The synteny analysis was done by using the Synteny Viewer tool to compare C. maxima chromosome 3 with C. pepo chromosome 10 and searching for an ortholog of the candidate gene. The physical position of the C. pepo ortholog was identified by searching for the gene using the Search tool. All tools used in the analysis can be found on the Cucurbit Genomics Database at cucurbitgenomics.org/v2/.

Identification of a core collection
Subsets representative of each panel's genetic diversity were identified using GenoCore (Jeong et al., 2017) with the filtered SNP sets. The GenoCore settings were "-cv 99 -d 0.001".
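The idea of selecting accessions until a target share of the panel's alleles is represented can be sketched with a simple greedy heuristic. This is a simplified illustration and not GenoCore's actual algorithm; the function name, accession labels, and genotypes are all hypothetical.

```python
def greedy_core(genotypes, coverage=0.99):
    """Greedily pick accessions until `coverage` of (marker, allele) pairs is captured.

    genotypes: dict mapping accession name -> list of marker calls
    (any hashable coding; None marks a missing call).
    """
    alleles = {acc: {(j, g) for j, g in enumerate(calls) if g is not None}
               for acc, calls in genotypes.items()}
    universe = set().union(*alleles.values())
    target = coverage * len(universe)
    covered, core = set(), []
    remaining = dict(alleles)
    while len(covered) < target and remaining:
        # Sorting first makes ties resolve alphabetically and reproducibly.
        best = max(sorted(remaining), key=lambda acc: len(remaining[acc] - covered))
        if not remaining[best] - covered:
            break
        covered |= remaining.pop(best)
        core.append(best)
    return core


# Hypothetical panel of five accessions scored at four markers.
panel = {
    "PI_A": [0, 0, 0, 0],
    "PI_B": [0, 0, 0, 0],   # duplicate of PI_A, adds nothing new
    "PI_C": [2, 2, 0, 0],
    "PI_D": [0, 0, 2, 2],
    "PI_E": [2, 0, 0, 0],
}
core = greedy_core(panel)
```

Redundant accessions such as the duplicate in this toy panel are never selected, which is why a small fraction of a collection can capture nearly all of its allelic diversity.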

Genotyping
Each Cucurbita ssp. collection was genotyped using the GBS approach. The collections comprised 534 accessions for C. maxima, 314 for C. moschata, and 829 for C. pepo. Figure 1 shows the geographical distribution of accessions broken down by species. C. maxima and C. moschata constitute the majority of accessions collected from Central and South America, whereas C. pepo accessions are more prevalent in North America and Europe. C. pepo had the highest number of raw SNPs (88,437) followed by C. moschata (72,025) and C. maxima (56,598). After filtering, C. pepo and C. moschata had a similar number of SNPs, around 30,000, whereas C. maxima had an order of magnitude fewer filtered SNPs (1,599). This discrepancy may be an artifact of using PstI, a rarer-cutting restriction enzyme previously optimized for GBS of C. maxima [46], rather than ApeKI, which was used for C. pepo and C. moschata. The number and distribution of SNPs across chromosomes are shown in Table 1. Maps of SNP distribution for each species are shown in Supplemental Figure S1.

Population structure and genetic diversity
Filtered SNPs were used for population structure analysis. Available geographical, phenotypic, and other metadata were retrieved from GRIN and were used to help interpret structure results. Results from model-based admixture analysis are shown in Figure 2A. These data support 10 ancestral groups (K=10) in C. pepo, 6 in C. moschata, and 6 in C. maxima. The number of groups was based on the cross-validation error output of ADMIXTURE shown in Figure 3. For C. pepo and C. moschata, a clear minimum was reached. The optimal k for both roughly agreed with the number of known morpho-market classes and/or subspecies. In C. maxima, a local minimum was reached at k = 6 followed by a slight decrease after k = 8. For the sake of parsimony, and consistency with known morpho-market classes in C. maxima, a k of 6 was chosen. Population structure was driven mostly by geography, except in C. pepo where the presence of different subspecies was responsible for some of the structure. Commonalities among structure groups are described in Table 2. The first two principal components (PCs) of the marker data are shown in Figure 2B. As with the model-based analysis, PCA showed geography as a main driver of population structure with accessions being derived from Africa, the Arab States, Asia, Europe, North America, and South/Latin America. PC1 in C. pepo separates C. pepo subsp. ovifera, which has a North American origin, from subsp. pepo.
Ancestry proportions from admixture analysis were projected onto cultivars/market types identified in the accessions. Cultivars were grouped according to known market class within species to help identify patterns in ancestry among and between market classes. Key market types identified in accessions from C. pepo include Acorn, Scallop, Crook, Pumpkin (Jack O' Lantern), Zucchini, Marrow, Gem, and Spaghetti; Neck, Cheese, Japonica, and Calabaza in C. moschata; and Buttercup, Kabocha, Hubbard, and Show (Giant squash) in C. maxima. These groupings are shown in Figure 4.

FIGURE 1
Geographical distribution of the USDA Cucurbita ssp. collection. The size of the pie chart is scaled according to the number of accessions. Sector areas correspond to the proportion of the three.

In general, members of each market class exhibit similar ancestry proportions. In C. pepo, market classes from the two different subspecies had distinct ancestry patterns. For example, Acorn, Scallop and Crook market classes are all from subsp. ovifera and all of these classes had similar ancestry proportions with roughly 20% of ancestry from the wild ovifera. In contrast, market classes within subsp. pepo had a small percentage of ancestry from wild ovifera and more ancestry in common with European and Asian accessions. In C. moschata, the Neck, Cheese, and Calabaza market classes showed very similar ancestry patterns, whereas the Japonica class was more distinct. Relative to C. pepo and C. moschata, the C. maxima cultivars were less differentiated from one another.
Results from linkage-disequilibrium analysis are shown in Figure 5. Similar trends are seen across species. In general, LD decays to zero once the distance between markers reaches more than 2 megabases (Mb). C. pepo maintains a higher LD, with an average R-squared between markers of 0.1 even beyond 2 Mb.

Analysis of phenotypic data
All historical phenotypic data from GRIN were compiled for analysis. Only traits with ≥ 100 entries were considered for further analysis. Filtering resulted in 26 traits for C. pepo, 5 for C. moschata and 16 for C. maxima. Traits spanned fruit and agronomic-related characteristics, as well as pest resistances. The number of records for a given trait ranged from 108 to 822, with an average of 270. Fruit traits included fruit width, length, surface color and texture, and flesh color and thickness. Agronomic data included plant vigor and vining habit, and several phenotypes related to maturity. Pest-related traits included susceptibility to cucumber beetle and squash bug in C. pepo and Watermelon mosaic virus (WMV) and powdery mildew (PM) in C. maxima. Supplemental Figure S2 shows the distribution for each quantitative trait. Phenotypic data were superimposed over the first two PCs in each species to visualize correspondence between population structure and phenotype. Results are shown in Figure 6. In C. pepo, seed size was almost completely confounded with subspecies, with subsp. ovifera having mostly small seeds and subsp. pepo having larger seeds (Figure 6A). In C. moschata, maturity was confounded with population structure (Figure 6B). In C. maxima, plant habit was confounded with population structure (Figure 6C).

Genomic heritability
An estimate of genomic heritability was calculated for all quantitative and ordinal traits and is shown in Table 3. In C. pepo, seed weight and morphological traits such as fruit length and width had very high (> 0.7) heritability estimates. Disease and insect resistance traits had lower heritabilities, ranging from 0.181 to 0.228. Trends were similar in both C. moschata and C. maxima, with C. maxima having lower heritability estimates across the board.

Genome-wide association and synteny analysis
Genome-wide association studies were conducted for all traits using a standard mixed-model K + Q analysis. A weak signal was detected in C. moschata on chromosome 3 for fruit length. Weak signals were detected in C. maxima for fruit ribbing on chromosome 17 and green fruit on chromosome 20. Five phenotypes were significantly associated with SNPs in C. pepo: bush/vine plant architecture on chromosome 10 using contemporary and historic data, fruit flesh thickness on chromosome 2, green fruit on chromosomes 2 and 19, and a non-significant, but clear signal for flesh color on chromosome 5. Weaker associations are shown in Supplemental Figure S3 with corresponding Q-Q plots in Supplemental Figure S4. The top five SNPs associated with each trait are shown in Supplemental Table S1.
The bush/vine phenotype in C. pepo exhibited the strongest signal. The signal was present in both the historical and contemporary data. The historical data consisted of 404 records and the contemporary data had 292 records. The two data sets overlapped by 92 accession records. Manhattan plots for the Bu gene GWAS results are shown in Figure 7A, along with corresponding Q-Q plots in Figure 7B. The genomic region corresponding to the signal was extracted and used for comparison against the candidate gene for dwarfism in C. maxima, CmaCh03G013600. The gene Cp4.1LG10g05740 on chromosome 10 in C. pepo was found to be orthologous to CmaCh03G013600 and coincides with the region significantly associated with the bush/vine plant architecture phenotype identified by GWAS in the C. pepo collection.

Development of a core collection
A core set of accessions that covered over 99% of total genetic diversity was identified in each of the panels. Roughly 5%-10% of the accessions were required to capture the genetic diversity in the panels (see Figure 8). This amounted to 117 accessions in C. pepo, 72 in C. moschata, and 72 in C. maxima.

FIGURE 3
Cross-validation error plots used to pick the optimum k value for admixture analysis. The k value that balances minimizing cross-validation error and parsimony was chosen for the final analysis. The chosen k is labeled with a red point.

Discussion
Cucurbita pepo, C. moschata, and C. maxima exhibit a wide range of phenotypic diversity. This diversity is evident in the GRIN phenotypic records for these species. We have demonstrated that there is also a wide range of genetic diversity through genotyping-by-sequencing and genetic analysis of available specimens from the germplasm collections. Thousands to tens of thousands of whole-genome markers were discovered for each species. Clustering of samples and admixture analysis produced results that align closely with known secondary centers of origin in all species. This was especially clear in our analysis of the C. pepo collection. Cucurbita pepo has its origin in the New World, with a secondary center of diversification in Europe. This pattern was conspicuous in our PCA. Analysis of the admixture patterns within common market classes mirrored the results of the broader diversity panel. For example, it is well known that the Acorn, Scallop and Crook type C. pepo were primarily developed in the Americas, whereas Zucchini, Marrow, and Gem squash were developed in Europe. Thus, it is not surprising that Acorn, Scallop, and Crook types have a large proportion of subsp. ovifera in their background. Likewise, the Neck, Cheese, and Calabaza types have their origins in the Americas, whereas the Japonica type has more shared ancestry with Asian landraces. The various C. maxima market classes were less distinct from one another. Morphologically, many of the classes (Buttercup, Kabocha, and Kuri) are very similar, so it is not surprising that their admixture proportions are similar.

FIGURE 4
Ancestry coefficients projected on cultivars from each species. Results are shown grouped by market/varietal class. Colors correspond to the groups in Figure 3A.

FIGURE 5
Curves showing $r^2$ value as a measure of LD on the y-axis and pair-wise distance between markers in megabases on the x-axis.

FIGURE 6
PCA plots with phenotypes superimposed over them for: (A) 100 seed weight in C. pepo; (B) maturity in C. moschata; (C) plant habit in C. maxima.

Linkage decay curves showed a common pattern across all species, with the correlation between markers falling off precipitously around 2 Mb. Relative to the other two species, C. pepo had a higher baseline LD. This is likely due to the presence of two distinct subspecies, subsp. ovifera and subsp. pepo, in the C. pepo panel. In general, the three Cucurbita species studied have much higher LD than other outcrossing species, such as maize. Studies in maize have shown that LD drops off within kilobases rather than megabases in diverse accessions (Yan et al., 2009). This suggests that the effective population size of Cucurbita species is much smaller than that of other agricultural species, and is consistent with studies looking at smaller panels in Cucurbita (Xanthopoulou et al., 2019). Although we have fewer markers in C. maxima, it is likely that the number of markers is sufficient to pick up major population structure in the panel given the extent of LD and clear results observed in the PCA.
Our GWAS analysis using contemporary and historic plant habit data led to the mapping of a locus on chromosome 10 associated with the bush/vine phenotype. Notably, the contemporary and historical data were collected largely on different accessions, with fewer than half of the accessions shared between them. The two associations therefore represent validation across two largely distinct panels. This locus is likely the bush gene (Bu) locus that has been finely mapped to this location in previous C. pepo studies (Xiang et al., 2018;Ding et al., 2021). Although our GWAS hit does not constitute a novel gene association, it does demonstrate that the Bu locus, previously mapped in biparental populations, is also the primary driver of the bush phenotype in diverse germplasm. Thus, this locus is likely to have utility across a wide array of germplasm.

FIGURE 7
GWAS results for the Bu gene in C. pepo. (A) shows the association results using the historical data with accompanying Q-Q plot. (B) shows the results using the contemporary data set. Both analyses supported an association on chromosome 10.

We also demonstrated that this locus is syntenic with the bush gene previously mapped in C. maxima (Zhang et al., 2015). Recent work has also identified a bush gene in C. moschata, and underscores the importance of this trait for productivity in cucurbits (Wang et al., 2022). There are many other developmental and morphological traits shared across Cucurbita (Paris and Brown, 2005). Our results demonstrate the power of leveraging information across species within Cucurbita, and suggest the potential of transferring knowledge from the more studied C. pepo to C. moschata and C. maxima.
Few clear signals were detected for traits outside of plant habit in C. pepo. The goal of the USDA GRIN collection is to maintain genetic diversity, not necessarily true breeding stocks. Given that each species is out-crossing, there is inevitably heterogeneity in stocks. Heterogeneity was undoubtedly a complicating factor in our study. There would be a great benefit from phenotyping and genotyping stocks purified from the USDA collection; however, such an experiment was well outside the scope of this study. A further complicating factor of GWAS is trait architecture. Traits with a more complex architecture are not amenable to GWAS analysis, as complex traits are often governed by many loci of small effect. These traits are better targets for prediction using genome-wide markers (Meuwissen et al., 2001). We assessed the ability of whole-genome markers to capture trait variability by calculating genomic heritability for all quantitative and ordinal traits. These estimates were high for many of the morphological and agronomic traits in each species. Yet, no major loci were detected for these same traits via GWAS. This points towards these traits having a more complex trait architecture. The moderate to high genomic heritability observed for morphological traits in this study is consistent with other estimates in squash (Hernandez et al., 2020).
High genomic heritability estimates with no significant association is a hallmark of more complex traits. A complex trait architecture is not the only explanation, though. Confounding of a phenotype with population structure can lead to a similar outcome: the K + Q model will remove the association, but the genomic heritability will remain high. We observed that seed weight in C. pepo, maturity in C. moschata, and plant habit in C. maxima were strongly associated with population structure (see Figure 6). Association of plant habit with population structure in C. maxima helps explain why we were unable to recapitulate the known major effect Bu locus. A good approach for future studies hoping to elucidate loci underlying these traits with the germplasm panels presented would be to form biparental or multiparental populations across genetic groups to break up structure. A similar approach was used to map genes related to cucurbitacin content associated with subspecies in C. pepo (Brzozowski et al., 2020).
Our data provide many genome-wide markers that could be used to develop marker panels for breeding applications, as has been done in other crops (Arbelaez et al., 2019). Possible breeding applications include marker-assisted selection, marker-assisted backcrossing, and purity assessment of seedstock using a low-density panel, whereas a medium-density panel could be developed for routine genomic selection (Cerioli et al., 2022). Our clustering of samples based on marker data suggests that geography is a key driver of overall population structure. When projecting ancestry proportions onto cultivars of known market classes, the ancestry proportions were relatively similar within each market class grouping. Although there is genetic diversity within each species, this diversity is constrained within market classes. This suggests that crosses between these market classes would greatly increase the amount of genetic diversity to be leveraged in breeding efforts. Crossing between market classes would come at the cost of bringing in characteristics that are undesirable for the specific morphotype associated with a market class. This cost could be mitigated through the use of markers to recover the morphotype expeditiously during pre-breeding (Cobb et al., 2019).
Our data provide a useful starting point for future studies. In the case where traits are common in the panel, the panel can be phenotyped for a trait of interest and combined with marker data and insight provided by our study. We demonstrated this approach in our association analysis of the bush gene. In the case of a rare phenotype, such as a resistance gene, subsets of the germplasm and markers should be used to develop custom populations. Plant introductions (PI) are frequently used as source parents in mapping studies and for germplasm improvement, as was the case for mapping Phytophthora capsici resistance and developing resistant breeding lines (LaPlant et al., 2020;Vogel et al., 2021). We found some traits that had high heritability, such as morphological traits, but we were not able to find any associations. Genomic prediction rather than association may be the best approach for these traits. In other cases, it may be necessary to break population structure through crossing, as we observed with seed weight in C. pepo, maturity in C. moschata, and plant habit in C. maxima. Certain applications, such as the creation of a hapmap or diversity atlas, require higher density resequencing data. Our GenoCore analysis provides subsets that will be useful in these efforts.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
CH wrote the first draft. MM, ZF, and RG provided project oversight. CH, JF, and KB conducted data analysis. KR and JL assisted with data curation and germplasm selection. MM, RG and ZF designed the experiment. JL contributed data. All authors contributed to the article and approved the submitted version.