Genetic mapping identifies genomic regions and candidate genes for seed weight and shelling percentage in groundnut

Seed size is not only a yield-related trait but also an important measure to determine the commercial value of groundnut in the international market. For instance, small size is preferred in oil production, whereas large-sized seeds are preferred in confectioneries. In order to identify the genomic regions associated with 100-seed weight (HSW) and shelling percentage (SHP), the recombinant inbred line (RIL) population (Chico × ICGV 02251) of 352 individuals was phenotyped for three seasons and genotyped with an Axiom_Arachis array containing 58K SNPs. A genetic map with 4199 SNP loci was constructed, spanning a map distance of 2708.36 cM. QTL analysis identified six QTLs for SHP, with three consistent QTLs on chromosomes A05, A08, and B10. Similarly, for HSW, seven QTLs located on chromosomes A01, A02, A04, A10, B05, B06, and B09 were identified. BIG SEED locus and spermidine synthase candidate genes associated with seed weight were identified in the QTL region on chromosome B09. Laccase, fibre protein, lipid transfer protein, senescence-associated protein, and disease-resistant NBS-LRR proteins were identified in the QTL regions associated with shelling percentage. The associated markers for major-effect QTLs for both traits successfully distinguished between the small- and large-seeded RILs. QTLs identified for HSW and SHP can be used for developing potential selectable markers to improve the cultivars with desired seed size and shelling percentage to meet the demands of confectionery industries.


Introduction
Groundnut or peanut is a self-pollinated, allotetraploid (AABB; 2n = 4x = 40) leguminous oilseed crop with a genome size of ~2.7 Gb (Bertioli et al., 2019). Presently, groundnut is cultivated globally on 36.18 million hectares, yielding 71.68 million tonnes of pods in the year 2020 (FAOSTAT, 2020). Groundnut seeds are highly nutritious; 100 g of groundnuts contains proteins (16 g), oil or fat (49 g), carbohydrate (26 g), and dietary fibres (9 g) (Parmar et al., 2022). The estimated demand for edible oil based on current population projections and per capita consumption is likely to be 240 million tonnes by 2050, which is nearly twice the current requirement (Corley, 2009). Improvement of yield and quality traits is the major objective of many groundnut breeding programs (Pandey et al., 2020a). Groundnut yield is influenced by hundred-seed weight (HSW), shelling percentage (SHP), and number of seeds per pod. Among them, HSW and seed number per pod are important yield-attributing traits that are positively correlated with yield per plant. Being a quantitative trait, seed weight is controlled by multiple genes and is also influenced by the environment. Therefore, understanding the regulation of seed size has always been an important area of research for groundnut improvement.
In the pre-genomic era, efforts on genetic mapping for seed weight produced large QTL intervals, making identification of the key candidate genes very difficult (Varshney et al., 2009; Mondal & Badigannavar, 2019). QTLs for seed size and pod size in cultivated and wild relatives of groundnut were discovered using an advanced backcross population (Fonceka et al., 2012). Consistent QTLs for seed weight on chromosomes A07 and B06 and for shelling percentage on A10 and B06 were identified using an SSR-based genetic map (Chen et al., 2017). During the last few years, since the availability of groundnut diploid genomes (Bertioli et al., 2016; Chen et al., 2016) and tetraploid genomes (Bertioli et al., 2019; Chen et al., 2019; Zhuang et al., 2019), several sequencing-based trait mapping efforts have been reported to fine map the genomic regions for key traits in groundnut. The discovered candidate genes include those associated with leaf rust and late leaf spot resistance (Pandey et al., 2017a), stem rot resistance (Dodia et al., 2019), fresh seed dormancy (Kumar et al., 2020), shelling percentage (Luo et al., 2019a), bacterial wilt resistance (Luo et al., 2019b), early leaf spot and late leaf spot resistance (Agarwal et al., 2018), tomato spotted wilt virus resistance (Agarwal et al., 2019), and yield-related traits (Jadhav et al., 2021). In the post-genomic era, a significant number of genomic resources at the genome and transcriptome level have been developed in groundnut (Pandey et al., 2020a). The transcriptome map for subsp. hypogaea was developed to understand the differential expression of genes at various growth stages (Clevenger et al., 2016). Moreover, another gene expression atlas for subsp. fastigiata was developed for 20 tissues at various developmental stages, ranging from the seedling stage to the maturity stage (Sinha et al., 2020).
The development of high-density SNP chips in groundnut allowed the construction of high-density genetic maps that helped to saturate the large QTL intervals (Pandey et al., 2017b). SNP arrays were successfully used to dissect the yield-related traits (Pandey et al., 2020b), root-knot nematode resistance (Ballén-Taborda et al., 2019), stem rot resistance (Luo et al., 2020), late leaf spot resistance (Han et al., 2018; Zhang et al., 2020), fresh seed dormancy (Wang et al., 2022), leaf chlorophyll content (Zou et al., 2022), salinity tolerance (Zou et al., 2020a), background genome recovery during a marker-assisted backcross selection in groundnut (Shasidhar et al., 2020), and germplasm diversity analysis (Nabi et al., 2021). A specific locus amplified fragment sequencing (SLAF-seq)-based high-density genetic map and phenotyping under multiple environments identified a total of 27 QTLs for seed weight, seed length, and width. Recently, US-based nested association mapping (NAM) populations genotyped with a 58K SNP array discovered the genomic regions associated with seed and pod weights in groundnut. The SSR and SNP array-based genetic map identified the major genomic region on chromosome B06 and a homologous region on chromosome A07/B07. It has also been reported that in the US mini-core collection, the QTL on chromosome A05 is conserved with major effects on groundnut seed size. Similarly, in Chinese germplasm, the QTL on chromosome A05 was identified for the seed number per pod. Recently, the 58K SNP array was used to identify genomic regions associated with the seed aspect ratio (length-to-width ratio) using GWAS on the US mini-core and Korean germplasm (Zou et al., 2020b). Although a large number of studies have been conducted globally to identify the QTLs for groundnut seed weight, there are limited reports on candidate genes or diagnostic markers for genomics-assisted breeding to improve seed weight.
Therefore, in order to identify the genomic regions and candidate genes associated with HSW and SHP, we developed a recombinant inbred line (RIL) population (Chico × ICGV 02251). The parental genotypes included a large-seeded cultivar "ICGV 02251" and a small-seeded germplasm line "Chico." A 58K high-density SNP array was used to genotype the RIL population along with the two parents to construct a dense genetic map. The genetic map, along with genotyping data and multi-season phenotyping data, was used to identify the genomic regions associated with HSW and SHP in groundnut.

Plant material and phenotyping
A RIL population (Chico × ICGV 02251) comprising 352 RILs was developed by crossing Chico and ICGV 02251 and advanced using the single-seed descent (SSD) method. Both parents were allotetraploid (AABB) Arachis hypogaea subsp. fastigiata. The male parent, ICGV 02251, a late-maturing Virginia bunch, has a significantly higher HSW and larger pod size than the female parent. Chico is an early-maturing Spanish bunch, a selection from PI 268661 that was released in 1973 by the United States Department of Agriculture (USDA) in Georgia, Virginia, and Oklahoma (Bailey and Hammons, 1975). The RIL population was phenotyped for three seasons at ICRISAT, Patancheru, Hyderabad (India) during post-rainy 2013-14 (S1), rainy 2014 (S2), and rainy 2019 (S3). During each season, the RILs and parental lines were planted in three replications with a spacing of 30 × 10 cm in two rows of 2 m, with standard agronomic practices. The weather data for monthly average high and low temperatures and rainfall (mm) for the years 2013, 2014, and 2019 are shown in Supplementary Figure S1. The weight of 100 mature groundnut seeds from each RIL was measured as hundred-seed weight (gm). For shelling percentage, 100 gm of pods were shelled, and the seed weight from these pods (gm) was divided by the pod weight and multiplied by 100 to give the shelling percentage (%). The multi-season phenotypic data for SHP and HSW on 352 RILs and both parents were used for the identification of genomic regions associated with HSW and SHP.
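The shelling-percentage calculation described above can be written as a short function. This is only a minimal sketch: the function name and the example seed weight are illustrative assumptions and are not taken from the study's data.

```python
def shelling_percentage(seed_weight_g: float, pod_weight_g: float = 100.0) -> float:
    """Shelling percentage = seed weight / pod weight x 100."""
    if pod_weight_g <= 0:
        raise ValueError("pod weight must be positive")
    return seed_weight_g / pod_weight_g * 100.0


# Hypothetical example: 100 g of pods that yield 69.5 g of seed
shp = shelling_percentage(69.5, 100.0)
print(f"SHP = {shp:.1f}%")
```

Because a fixed 100 g pod sample is shelled, the seed weight in grams is numerically equal to the shelling percentage.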

DNA extraction and genotyping with an "Axiom_Arachis" array
DNA from 352 RILs and both parents was extracted using the NucleoSpin Plant II kit (Macherey-Nagel, Düren, Germany). The DNA quality was analysed on 0.8% agarose gel, and concentration was measured using a NanoDrop 8000 spectrophotometer (Thermo Scientific). An Affymetrix GeneTitan® platform was used to genotype the RIL population with the 58K SNP "Axiom_Arachis" array (Pandey et al., 2017b). Initially, target probes for the 352 samples were prepared using at least 20 µL of DNA at a concentration of 10 ng/µL. The samples were then amplified, fragmented, and hybridized on the array chip, followed by single-base extension through DNA ligation and signal amplification, according to the procedure explained in the Affymetrix Axiom 2.0 Assay manual (axiom_2_assay_auto_workflow_user_guide.pdf) (Pandey et al., 2020a). Axiom_Arachis is an SNP array developed for genotyping genetic populations in groundnut for trait mapping and association mapping (Pandey et al., 2017b).

SNP allele calling and quality analysis
We used the "Best Practices" workflow to perform quality control (QC) analysis of samples to select only those that pass the QC test for further downstream analysis. The "Sample QC" workflow was used to produce genotype calls for the samples that passed the QC test. The "Genotyping" workflow was used to perform genotyping on the imported CEL files regardless of the sample QC metrics. Before making the genotyping calls, samples that did not pass the QC were removed as their inclusion may reduce the quality of the analysed results. Finally, the "Summary Only" workflow was used to produce a summary containing details on the intensities for the probe sets for use in copy number analysis tools. It also allows exporting the SNP data after the analysis is completed for downstream analysis. The genotyping data with a total of 58,233 SNPs for 352 RILs were extracted from Axiom Analysis Suite as explained in Pandey et al. (2017b) (Supplementary Table S1).

Construction of a genetic map using RIL population (Chico × ICGV 02251)
The 58,233 SNPs were filtered to remove monomorphic markers and markers with a high proportion of missing data (>30%), and only the 10,236 SNPs that were polymorphic between the parental genotypes, ICGV 02251 and Chico, were retained. The selected SNPs were subjected to the chi-square (χ²) test to determine the goodness-of-fit to the expected 1:1 segregation ratio; highly distorted markers were filtered out and not considered for the linkage map construction. Finally, after stringent filtration, informative SNPs were used for the construction of a genetic map. The alleles of ICGV 02251 were coded as "AA," Chico as "BB," and heterozygotes as "H." JoinMap (v4.0) software was used for the construction of a genetic map. Kosambi's mapping function was used to convert the recombination frequencies into map distances in centimorgans (cM) (Kosambi, 1944). A total of 20 linkage groups were constructed individually using a logarithm of the odds (LOD) score threshold of 3.0 and a recombination frequency (rf) threshold (z) of 50%. MapChart software was used to draw the final genetic map with the marker positions (Voorrips, 2002).
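The two calculations named above, the χ² goodness-of-fit test against a 1:1 ratio and Kosambi's mapping function, can be sketched as follows. This is an illustration rather than the JoinMap implementation. The function names, the example allele counts, and the use of the 5% critical value (3.841 for one degree of freedom) are assumptions, since the text states only that highly distorted markers were removed.

```python
import math


def chi_square_1to1(n_aa: int, n_bb: int) -> float:
    """Chi-square statistic (1 d.f.) for deviation of AA:BB counts from 1:1."""
    total = n_aa + n_bb
    if total == 0:
        raise ValueError("no genotype calls")
    expected = total / 2.0
    return ((n_aa - expected) ** 2 + (n_bb - expected) ** 2) / expected


def is_distorted(n_aa: int, n_bb: int, critical: float = 3.841) -> bool:
    """Flag a marker as distorted; 3.841 (p = 0.05, 1 d.f.) is an assumed cutoff."""
    return chi_square_1to1(n_aa, n_bb) > critical


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 25 * ln((1 + 2r) / (1 - 2r))."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical marker scored in 352 RILs (heterozygous and missing calls ignored)
print(chi_square_1to1(200, 152), is_distorted(200, 152))
print(round(kosambi_cm(0.10), 2))
```

For small recombination frequencies the Kosambi distance is close to 100 × r, and it grows faster than r as r approaches 0.5, which accounts for interference between crossovers.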

Identification of QTLs for seed weight and shelling percentage
Multi-season phenotyping data for seed weight and shelling percentage generated during S1, S2, and S3 were used together with the genetic map and genotyping data for QTL analysis. The inclusive composite interval mapping-additive (ICIM-ADD) algorithm implemented in inclusive composite interval mapping (ICIM) software was used for identification of the main-effect QTLs (Meng et al., 2015). QTLs with >10% phenotypic variance explained (PVE) were considered major QTLs; the remaining were considered minor QTLs. Because groundnut seed weight is a complex trait that is also affected by the environment, we carried out epistatic QTL (Q × Q) and environmental-effect QTL (E-QTL) analyses in ICIM. The ICIM-EPI algorithm from ICIM software was used for epistatic QTL analysis to understand the combined effect of any two genomic regions on seed weight and shelling percentage. The environmental-effect QTLs were identified using multi-environment trials (METs) in ICIM. A LOD threshold score of 3.0 was used as the minimum significance level for the main-effect QTLs, epistatic QTLs, and environmental-effect QTLs. The major QTL regions were validated using extreme RILs for both HSW and SHP from the RIL population. A total of 15 extremely small-seeded and 15 extremely large-seeded lines were used for the validation of QTL regions. To validate the major QTLs, the RILs were grouped by the alleles of the flanking markers of each QTL, and the mean phenotypic values of the groups were compared.
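The thresholds and the validation step described above can be sketched in a few lines. This is a hedged illustration and not the ICIM software's logic. The function names and the RIL phenotype values are hypothetical, and only the LOD threshold (3.0) and the major-QTL cutoff (>10% PVE) come from the text.

```python
from statistics import mean

LOD_THRESHOLD = 3.0  # minimum significance level stated in the text
MAJOR_PVE = 10.0     # QTLs with PVE above this value are called major


def classify_qtl(lod: float, pve: float) -> str:
    """Label a QTL as 'major', 'minor', or 'not significant'."""
    if lod < LOD_THRESHOLD:
        return "not significant"
    return "major" if pve > MAJOR_PVE else "minor"


def mean_by_allele(records):
    """Average phenotype per flanking-marker allele ('AA' or 'BB')."""
    groups = {}
    for allele, value in records:
        groups.setdefault(allele, []).append(value)
    return {allele: mean(values) for allele, values in groups.items()}


# LOD and PVE of a QTL with 20.65% PVE and a LOD of 8.82
print(classify_qtl(lod=8.82, pve=20.65))

# Hypothetical HSW values (gm) for extreme RILs grouped by marker allele
extreme_rils = [("AA", 62.0), ("AA", 70.0), ("BB", 31.0), ("BB", 35.0)]
print(mean_by_allele(extreme_rils))
```

A clear difference between the two allele-group means in the extreme lines is what supports a flanking marker as a candidate for selection.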

Identification and expression analysis of candidate genes in QTL regions
Major QTL regions with >10% PVE for both seed weight and shelling percentage were targeted for candidate gene discovery. Candidate genes were mined in the QTL interval between the positions of the flanking markers on the physical map of the groundnut genome (https://peanutbase.org/). The expression data for the candidate genes were accessed from the Arachis hypogaea gene expression atlas for subsp. fastigiata (Sinha et al., 2020). The heatmaps for the expression data were generated using the R package "pheatmap" (Kolde, 2019).
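Mining candidate genes between two flanking markers amounts to an interval-overlap query on the physical map. The sketch below shows one way to express it. The `Gene` record, the gene identifiers, and all coordinates are hypothetical placeholders and are not PeanutBase annotations.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # physical start position (bp)
    end: int    # physical end position (bp)


def genes_in_interval(genes: List[Gene], chrom: str,
                      left_bp: int, right_bp: int) -> List[Gene]:
    """Return genes on `chrom` that overlap the flanking-marker interval."""
    lo, hi = sorted((left_bp, right_bp))
    return [g for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and flanking-marker positions
annotation = [
    Gene("gene_1", "B09", 1_000, 5_000),
    Gene("gene_2", "B09", 20_000, 24_000),
    Gene("gene_3", "A05", 2_000, 6_000),
]
hits = genes_in_interval(annotation, "B09", 4_000, 10_000)
print([g.gene_id for g in hits])
```

Sorting the two marker positions first means the query works regardless of which flanking marker is given first.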

Phenotypic variation for seed weight and shelling percentage in RIL population
Multi-season phenotyping data were generated during three seasons on the RIL population (Chico × ICGV 02251). ICGV 02251 was used as the male parent with HSW of 101.8 gm, while Chico (small-seeded cultivar) was used as the female parent with HSW of 35.0 gm. Furthermore, Chico has a high SHP of 80.5%, whereas ICGV 02251 (large-seeded cultivar) has a medium SHP of 69.5%. The average HSW in the RIL population was 61.5 gm during S1, 43.5 gm during S2, and 45.5 gm during S3, whereas the average SHP was 69.40% during S1, 66.7% during S2, and 68.75% during S3. The seed weight and shelling percentage were higher in season S1 than those in S2 and S3. The phenotypic data generated in three seasons for both HSW and SHP showed normal distribution on violin plots (Figures 1A, B).

Important features of SNP array-based genetic map
The filtered 6235 SNPs were used for genetic map construction. A total of 4199 loci were mapped on the A and B subgenomes with a total distance of 2708.36 cM. A total of 2036 SNPs did not show any linkage with the SNP markers in the generated genetic map. No attempt was made to map the unlinked SNPs in the final genetic map to avoid noise during QTL analysis. Among the 4199 SNP loci, 2343 loci were mapped on the A subgenome, whereas 1856 loci were mapped on the B subgenome, with a distance of 1456.96 cM and 1251.4 cM, respectively. The A and B subgenomes reached an average inter-marker distance of 0.66 and 0.67 cM/locus, respectively. The number of loci mapped per linkage group ranged from 128 (B10) to 447 (A04). The map length of individual linkage groups ranged from 97.34 cM (A02) to 202.5 cM (B04). The average inter-marker distance was maximum, 1.17 cM/locus, for the linkage group A01 and minimum, 0.47 cM/locus, for the linkage group A09 (Table 1; Figure 2).
Frontiers in Genetics frontiersin.org 04
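The whole-map figures reported above can be cross-checked with simple arithmetic. The short sketch below uses only numbers quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text
filtered_snps = 6235   # SNPs used for map construction
array_snps = 58233     # SNPs on the array
loci = {"A": 2343, "B": 1856}             # mapped loci per subgenome
length_cm = {"A": 1456.96, "B": 1251.40}  # map length per subgenome (cM)

total_loci = sum(loci.values())
total_length_cm = sum(length_cm.values())
unlinked_snps = filtered_snps - total_loci
overall_density = total_length_cm / total_loci  # cM per locus
mapped_pct = 100.0 * total_loci / array_snps    # share of array SNPs mapped

print(f"mapped loci: {total_loci}, unlinked SNPs: {unlinked_snps}")
print(f"map length: {total_length_cm:.2f} cM")
print(f"overall inter-marker distance: {overall_density:.2f} cM/locus")
print(f"array SNPs mapped: {mapped_pct:.1f}%")
```

The subgenome totals add up to the 4199 loci and 2708.36 cM reported for the whole map.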

Identification of the major-effect QTLs associated with seed weight and shelling percentage
A total of 13 main-effect QTLs were identified for HSW and SHP across the three seasons (S1, S2, and S3): six QTLs for SHP with 5.3%-15.8% PVE and LOD scores ranging from 2.51 to 7.16, and seven QTLs for HSW with 6.96%-21.29% PVE and LOD scores ranging from 3.9 to 11.7 (Table 2).

QTLs for hundred-seed weight
A total of seven QTLs were identified for HSW with a PVE range of 6.96%-21.29% and LOD scores ranging from 3.9 to 11.7. A single QTL (qHSW-A01.1) was identified on A01 with 20.65% PVE and a LOD score of 8.82.

Identification of epistatic interactions for seed weight and shelling percentage
A total of 375 epistatic QTLs were identified for shelling percentage and seed weight-related traits. A total of 42 epistatic QTLs were detected for SHP and 332 epistatic QTLs for HSW. The phenotypic variation explained by the epistatic QTLs identified for seed weight and shelling percentage ranged from 10.0% to 11.2% and from 10.58% to 27.09%, respectively.
[Figure: Main-effect QTLs identified for hundred-seed weight (HSW) and shelling percentage (SHP). The tracks from outside to inside illustrate (A) the 20 chromosomes of cultivated groundnut labeled A01 to A10 and B01 to B10, (B) QTLs identified for shelling percentage (SHP), and (C) QTLs identified for hundred-seed weight. Inner links represent epistatic (Q × Q) interactions. Green links represent the epistatic interactions for shelling percentage, and red links represent the epistatic interactions for seed weight.]
A total of 42 epistatic QTLs were detected for shelling percentage, and among them, eight had LOD scores of 3.05-3.55 and 10.65%-27.09% PVE during S1. The rest of the QTLs included 18 epistatic QTLs with LOD scores of 3.03-5.47 and 10.58%-26.31% PVE during S2 and 18 epistatic QTLs with LOD scores of 3.01-5.26 and 10.86%-25.14% PVE during S3. A major epistatic QTL for shelling percentage, with a PVE of 15.46% and a LOD score of 5.4, showed an interaction between genomic regions of chromosomes A02 and B06 (Figure 3; Supplementary Table S2).

Identification of environmental effect QTLs (Q × E) for seed weight and shelling percentage
A total of 15 environmental-effect QTLs were identified for seed weight and shelling percentage with LOD scores > 3.0. A total of seven environmental-effect QTLs were identified for seed weight with LOD scores of 4.11-9.56 and 4.81%-27.18% PVE. A major environmental-effect QTL (EqtlSW.B05.1) on B05 was identified for seed weight with a PVE of 10.03%. The same QTL region (qHSW-B05.1) was also identified as a main-effect QTL for HSW (Figure 4A). Three such E-QTLs for HSW showed high PVE, on chromosomes A01 (27.1%), B05 (10.0%), and B09 (12.3%).
Here, we concluded that the QTL region on chromosome B05 explains the higher phenotypic variance due to the environmental effect with partial effects from the background genome. A total of eight environmental-effect QTLs were identified for SHP with LOD scores of 3.68-9.32 and 3.96%-11.06% PVE. Two major E-QTLs for SHP, one each on chromosomes B06 and B10, were identified, with a PVE of 11.06% for the QTL on B10 (Figure 4B). We plotted QTL additive effects against additive-by-environment effects to find the QTLs that are highly influenced by the environment. We observed that both SHP and HSW showed higher (>10%) phenotypic variance explained by environmental effects (Supplementary Table S3).

Expression of potential candidate genes at seed and pod developmental stages
The two parents used for developing the RIL population were subsp. fastigiata; hence, we used the fastigiata gene expression atlas (Sinha et al., 2020) to study the tissue-specific expression of candidate genes identified for HSW and SHP. In the QTL region qHSW_B05.1, YSL-like protein (Araip.W1XHN) was highly expressed in embryo and seed tissues. We identified two isoforms of YSL-like protein (Araip.W1XHN and Araip.Y0XXQ). Acyl-CoA acyltransferase (Araip.FGM9R) showed higher expression in pod walls than in seed developmental stages. From the QTL region (qHSW_B06.1), two isoforms of spermidine synthase (Araip.JGK52 and Araip.J2AQK) showed contrasting expression. The expression of Araip.J2AQK was higher in flower, embryo, and seed developmental stages. In the QTL region (qHSW_B09.1), sugar transporters (Araip.4HV2H) were highly expressed in flowers. Seed maturation protein (Araip.U0WFW) was expressed at the time of maturity in seeds. The TIFY family protein (BIG SEED locus) (Araip.YK09Y) was highly expressed in embryo and seed developmental stages. The epidermal patterning factor (Araip.9SE5V) showed high expression in flowers, seeds, and shells. The expression of seed linoleate 9S-lipoxygenase (Araip.D6PZJ) is shown in Figure 6A (Supplementary Table S5).
In the case of shelling percentage, several disease-resistance genes were highly expressed, such as MYB, NBS-LRR, and NAC domain proteins. In the QTL region qShP_A05.1, the spermidine synthase (Araip.8Z3VM) and ATP citrate lyase (Aradu.48LC5) showed higher expression in all the seed and pod tissues. In the QTL region qShP_A08.1, the genes involved in the synthesis of cellulose were highly expressed. For instance, cellulose synthase (Aradu.GKF95) and fibre proteins (Aradu.7I425) showed high expression in all seed and pod developmental stages. Senescence-associated protein (Aradu.GR4IB) was highly expressed in pod walls at the time of maturity. In the QTL region qShP_B10.1, cold acclimation protein (Araip.RHZ53) was highly expressed in seeds and pods. The lipases (Araip.4V9G7) and NAC domains (Araip.UA0W9) were highly expressed in flowers and mature shells. The expression analysis showed that key genes in the SHP QTL regions are involved in the cellulose biosynthesis pathway. In addition, a group of disease-resistance genes were highly expressed in all SHP QTL regions (Figure 6B; Supplementary Table S6).

Development of KASP markers for hundred-seed weight in groundnut
We used multiple approaches for genetic dissection of groundnut seed weight. In a previous study, we used a NAM population to map the genomic regions associated with seed weight and pod weight in groundnut. A KASP marker (snpAH00173) on chromosome A05 at 101618480 bp was developed and validated on small- and large-seeded groundnut germplasms. A sequencing-based trait mapping approach, "QTL-seq," was also used to identify the genomic regions for HSW of groundnut. An overlapping genomic region was identified on chromosome B09 in the present genetic mapping study and QTL-seq analysis for groundnut seed weight. A total of four KASP markers (snpAH0031, snpAH0033, snpAH0037, and snpAH0038) were recently developed from the same population using the QTL-seq approach (Gangurde et al., 2021). Because of their high polymorphism, the seed-weight KASP markers were included in the quality control panel for use in confirmation of F1s and hybrid purity testing. Moreover, the KASP markers can also be used in marker-assisted selection breeding programs to improve the seed-weight trait in important groundnut cultivars. In this study, we discovered a novel genomic region on chromosome B09, containing important genes, such as BIG SEED locus and spermidine synthase (spds), associated with seed development.

Discussion
In the present study, a RIL population (Chico × ICGV 02251) was used for mapping the genomic regions associated with HSW and SHP in groundnut. Three seasons of phenotyping data and SNP array-based genotyping data were generated to identify the QTLs linked with HSW and SHP. We observed that HSW was comparatively higher in the post-rainy season than in the rainy seasons, as confirmed by repeated planting in two consecutive rainy seasons. This might be due to high disease pressure in rainy seasons that affects seed size in groundnut. A high-density 58K SNP array was used to construct a dense genetic map comprising 4199 SNP loci in a map distance of 2708.36 cM with an average inter-marker distance of 0.65 cM. Only 7.2% of SNP loci (4199 SNPs out of 58,233 SNPs on the array) were mapped on 20 linkage groups of groundnut. Genetic diversity analysis in groundnut reported that it has a very narrow genetic base; therefore, the construction of a very high-density genetic map in groundnut is very challenging (Pandey et al., 2012). The density of this genetic map was the highest when compared to the previous genetic maps constructed using the SNP array (Pandey M. K. et al., 2020) and genotyping by sequencing (GBS) (Dodia et al., 2019; Jadhav et al., 2021). Moreover, a genetic map with 3630 markers grouped in 2636 bins was used to identify the QTLs for groundnut seed weight. Genome-wide association analysis with DArT markers identified nine marker-trait associations for seed length and five for HSW, but due to the unavailability of an annotated reference genome for groundnut, the researchers could not identify candidate genes associated with HSW (Pandey et al., 2014). Earlier, genotyping with SSR markers was laborious and time consuming. Now, allelic SNP markers along with high-quality reference genomes allow genetic dissection of complex traits and the identification of candidate genes (Bertioli et al., 2019; Chen et al., 2019; Zhuang et al., 2019).
In our previous study, we successfully used the SNP array for mapping the genomic regions associated with seed and pod weight in groundnut using a NAM population. The seed size QTLs on chromosomes A05 and A07 were reported from two RIL populations (Luo et al., 2017; Luo et al., 2018). Interestingly, in this study, we identified a consistent major-effect QTL for shelling percentage on chromosome A05 with >10% PVE, and a major QTL on chromosome B05 was identified with 21% PVE and a LOD score of 7.7. In addition, a major-effect QTL was identified on chromosome B09 with 13.0% and 11% PVE. Therefore, the genomic region on chromosome B09 was targeted for the identification of candidate genes using PeanutBase (www.peanutbase.org).
We also demonstrated that the major effect of a QTL is not just because of the genetic background; sometimes, it might be due to the environmental effect or a combination of these two. A major QTL for HSW on chromosome B05 and two major QTLs for shelling percentage on chromosomes B06 and B10 exhibited a major additive-by-environment effect (Q × E). Similar findings have been reported on the background effect and QTL × Environment effect for yield traits in rice (Wang et al., 2014). Genes such as the BIG SEED locus and spermidine synthase, located in the QTL region on chromosome B09, negatively regulate seed weight. Gene cloning for the BIG SEED locus has been reported in Medicago truncatula, and the BIG SEED gene was isolated from soybean and overexpressed in the model legume, Medicago truncatula, which resulted in small seeds in transgenic lines (Ge et al., 2016). Spermidine synthase is also reported as a negative regulator of seed weight and size in rice (Tao et al., 2018). Therefore, we concluded that the BIG SEED locus and spermidine synthase genes can be targeted for genome editing to enhance the groundnut seed weight. In the QTL region on chromosome B09, genes such as lipid transfer protein, sugar transporter, seed maturation protein, epidermal patterning factor-like proteins, and seed linoleate 9S-lipoxygenase associated with seed growth and development were identified. Seed linoleate 9S-lipoxygenase is a fat-metabolising gene linked with seed oil content. The epidermal patterning factor-like proteins are associated with plant epidermal cell growth factors and widely reported as regulators of plant growth and development (Endo and Torii, 2019). A gene encoding serine threonine protein phosphatase, identified in almost all QTL regions of seed weight and shelling percentage, is associated with reactive oxygen species (ROS) metabolism, plant cold tolerance, and abscisic acid signalling (Hou et al., 2016).
We identified a copy of spermidine synthase in the QTL region (QShP_A05), identified for the shelling percentage on chromosome A05, along with receptor-like kinases (RLKs) which play a major role in plant growth and stress response (Cui et al., 2021).
In the present study, we identified several disease-resistance genes, such as disease-resistance protein (Aradu.G0IJA) and leucine-rich repeat receptor-like protein (Aradu.V4L4B), which belong to the NBS-LRR class, in the QTL regions of shelling percentage. In addition, MLO-like proteins, laccase, senescence-associated proteins, cold acclimation proteins, and F-box family proteins were identified in the QTL regions of shelling percentage. The groundnut shell is made up of cellulose and fibre; cellulose synthase, fibre proteins, and laccase were identified in the QTL regions for shelling percentage. SWEET genes encoding sugar transporters play a major role in plant growth and development (Gupta, 2020).
In this study, both parental genotypes used in developing the RIL population were from subsp. fastigiata. Therefore, we used a gene expression atlas developed from subsp. fastigiata (Sinha et al., 2020). From gene expression patterns, we observed that the isoforms of YSL7-like protein showed different gene-expression patterns. For instance, isoform Araip.W1XHN was highly expressed, whereas Araip.Y0XXQ was not expressed in seed and pod tissues. We identified multiple isoforms of spermidine synthases (Araip.Q00X2 and Araip.8Z3VM) in the major QTLs of seed weight. The BIG SEED locus, encoded by the protein TIFY 4B-like isoform, showed high expression in all seed developmental stages. In addition, we observed that the seed linoleate 9S-lipoxygenase (Araip.D6PZJ) and epidermal patterning factor-like proteins (Araip.9SE5V) are the most expressed genes in the seed tissues (Figure 6A). In the case of shelling percentage, the disease-resistance genes in the QTL regions were confirmed with the gene expression atlas. Almost all disease-resistance genes in the QTL regions of shelling percentage were highly expressed in pod walls and seed tissues. Interestingly, the spermidine synthase genes were identified in the QTL regions of both seed weight and shelling percentage. At the time of maturity, the senescence-associated protein (Aradu.GR4IB) was highly expressed in pod walls. Cellulose synthase, calcium-binding proteins, and NAC domain proteins were differentially expressed in seed and pod tissues.
The identified genes, particularly the spermidine synthase, BIG SEED locus, and seed linoleate 9S-lipoxygenase genes, can be targeted for functional validation and can be used in improving the seed weight and shelling percentage in groundnut. Furthermore, the QTLs will be validated on groundnut genotypes with diverse seed weights for their use in genomics-assisted breeding for improving HSW and SHP.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions
MP conceived the idea, conceptualized this research, and supervised the entire study. JP developed RIL population. SSG, SM, MV, and JP generated the phenotyping data. SSG generated the genotyping data and analysed the data. SSG and MP interpreted the results and wrote the manuscript. SP and DB carried out formal analysis and writing. MP, PS, RV, and BG revised the final manuscript.

Funding
The funding was received from Indian Council of Agricultural Research (ICAR) through ICAR-ICRISAT project, Department of Biotechnology (DBT), Government of India, MARS-Wrigley and Bill and Melinda Gates Foundation (BMGF), United States.