Genome-Wide Discovery and Deployment of Insertions and Deletions Markers Provided Greater Insights on Species, Genomes, and Sections Relationships in the Genus Arachis

Small insertions and deletions (InDels) are the second most prevalent sequence variations after SNPs and the most abundant structural variations in plant genomes. In order to deploy these genetic variations for genetic analysis in the genus Arachis, we conducted a comparative analysis of the draft genome assemblies of both diploid progenitor species of cultivated tetraploid groundnut (Arachis hypogaea L.), i.e., Arachis duranensis (A subgenome) and Arachis ipaënsis (B subgenome), and identified 515,223 InDels. These InDels include 269,973 insertions identified in A. ipaënsis against A. duranensis and 245,250 deletions in A. duranensis against A. ipaënsis. The majority of the InDels were of single bp (43.7%) and 2–10 bp (39.9%), while the remaining were >10 bp (16.4%). Phylogenetic analysis using genotyping data for 86 (40.19%) polymorphic markers grouped 96 diverse Arachis accessions into eight clusters, mostly by the affinity of their genomes. This study also provided evidence for the existence of the “K” genome, which is distinct from both the “A” and “B” genomes but more similar to the “B” genome. The complete homology observed between the tetraploid taxa A. monticola and A. hypogaea indicated a very similar genome composition. The above analysis has provided greater insights into the phylogenetic relationships among accessions, genomes, subspecies, and sections. These InDel markers are a very useful resource for the groundnut research community for genetic analysis and breeding applications.


INTRODUCTION
The systematic development of different types of molecular markers over three decades, including storage proteins, isozymes, random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), cleaved amplified polymorphic sequence (CAPS), simple sequence repeat (SSR), and start codon targeted polymorphism (SCoT) markers, has helped in conducting genetic studies in several crop species including groundnut (Bechara et al., 2010; Pandey et al., 2012b; Varshney et al., 2013; Liu et al., 2015). Among the different marker types, SSRs have dominated genetic studies in groundnut because of their multiple advantages, but they lack amenability to high-throughput genotyping (Gupta and Varshney, 2000). Therefore, the current emphasis is on developing single-nucleotide polymorphism (SNP) markers in groundnut because of their abundance throughout the genome and their amenability to high-throughput genotyping for genome-wide breeding applications (Varshney et al., 2009; Pandey et al., 2016). However, both SSRs and SNPs have drawbacks despite their usefulness. For instance, the complicated and heterogeneous mutation patterns of SSRs can sometimes mislead data analysis (Ellegren, 2004). Similarly, genotyping errors may arise from false bands and technical artifacts (null or false alleles, allelic dropouts, size homoplasy) (Pompanon et al., 2005). Furthermore, although many refined SNP genotyping methods are available (Syvänen, 2001), most of them are comparatively costly and need special instruments for high-throughput genotyping.
Insertions-deletions (InDels) are the second most prevalent and frequent structural variations detected across plant genomes (Yang et al., 2014; Lu et al., 2015). InDels originate from cellular events such as replication slippage, transposable element activity, and crossing-over (Moghaddam et al., 2014). InDels can have beneficial as well as deleterious effects at specific loci in the genome (Pearson et al., 2005). InDels are an efficient marker system for genetic studies in crops and a good addition to other sequence-based genetic markers, mainly because of desirable inherent genetic attributes such as co-dominant inheritance, multi-allelism, and genome-wide distribution. In addition to being easy to detect in the genome, InDel markers can be selected by their expected fragment size and validated in genetic populations/germplasm using simple and cost-efficient agarose gels. These features make InDels an appropriate marker system for various translational genomics studies in crop plants.
Groundnut (Arachis hypogaea L.) is an economically important, nutritious, and protein-rich oilseed crop grown in tropical and warm temperate regions of the world. The genus Arachis is indigenous to South America, includes 80 species, and its species are now widespread in >100 countries of the world (Krapovickas and Gregory, 1994; Valls and Simpson, 2005). Based on morphological variation, geographic distribution, and cross compatibility, these species are grouped into nine sections, namely Arachis, Caulorrhizae, Erectoides, Extranervosae, Heteranthae, Procumbentes, Rhizomatosae, Triseminatae, and Trierectoides. Of these, Arachis is the most diverse section, with 30 different species including A. hypogaea, the cultivated groundnut species. Section Arachis is also diverse at the ploidy level, with two tetraploids (A. hypogaea and A. monticola, 2n = 4x = 40), three aneuploids (2n = 2x = 18; A. decora, A. palustris, and A. praecox), and the remaining species diploid (2n = 2x = 20; Valls and Simpson, 2005). The cultivated tetraploid, A. hypogaea, is considered to have originated from two diploid species, namely Arachis duranensis (AA) and Arachis ipaënsis (BB) (Seijo et al., 2004). All the cultivated genotypes can be further grouped into subspecies (hypogaea and fastigiata), varieties (aequatoriana, fastigiata, hirsuta, hypogaea, peruviana, and vulgaris), and agronomic types (Spanish, Southeast-runner, Valencia, and Virginia) (Krapovickas and Gregory, 1994, 2007). Despite substantial variation in phenotype, the species displays low genetic diversity, as reported earlier in analyses using RFLPs, SSRs, and SNPs (Pandey et al., 2012b; Varshney et al., 2013). The availability of genome sequences for both diploid ancestors of tetraploid groundnut (Bertioli et al., 2016; Chen et al., 2016) has facilitated genome resequencing of diverse cultivars.
Therefore, this study reports the in silico discovery of large-scale, informative genome-wide InDels, the development and validation of InDel markers, and their deployment for phylogenetic analysis of the genus Arachis to understand genetic complexity at the genome and species levels.  Table 1). Genomic DNA was isolated from freshly collected unopened leaves using the modified CTAB method (Doyle and Doyle, 1987), and DNA was quantified following the protocol described in Mace et al. (2003).

Accessing Genome Sequences and Genome-Wide InDels Discovery
Considering the similarities between the two diploid subgenomes of tetraploid groundnut (Bertioli et al., 2016), the genome sequences of A. duranensis (accession V14167, A subgenome) and A. ipaënsis (accession K30076, B subgenome) were used for mining InDels following the methodology described by Yang et al. (2014). Briefly, the FASTA files for the A and B subgenomes were downloaded from the PeanutBase site (https://www.peanutbase.org/) and aligned using MUGSY software (Angiuoli and Salzberg, 2011). The alignment results were mined for InDel polymorphisms using the Perl scripts provided by Dr. Wencai Yang (China Agricultural University, Beijing, China). Only the reads mapped on the respective pseudomolecules of the A and B subgenomes were considered for InDel discovery. The 100 bp (base pair) sequences flanking the candidate InDels were taken from the A genome for insertions and from the B genome for deletions. Low-similarity sequences were removed by searching the extracted sequences against the respective genomes using the BLASTN program (Altschul et al., 1990) with an E-value cutoff of 1e-20. An in-house Perl script was used to extract the type (insertion or deletion), chromosomal position, and length of the InDels (Supplementary Table 2). Primers with the desired features, such as a 100-200 bp product size and a 20 bp optimum primer length, were designed from the extracted sequences using Primer3 software (Untergasser et al., 2012). The circular plots were constructed using the standalone version of Circos software (http://circos.ca/) developed at the Genome Sciences Centre in Vancouver, Canada (Krzywinski et al., 2009). The insertion and deletion densities were plotted at a window size of 100 Kb.
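The density calculation behind such a plot can be illustrated with a minimal sketch. It assumes that InDel start coordinates are already available per pseudomolecule as 1-based positions; the function name and the example coordinates are hypothetical and are not taken from the Perl scripts or the data used in this study.

```python
from collections import Counter

WINDOW = 100_000  # 100 Kb window, as stated in the text


def indel_density(positions, window=WINDOW):
    """Count InDels per fixed-size window on one pseudomolecule.

    positions: iterable of 1-based InDel start coordinates (bp).
    Returns {window_start_bp (0-based): count} for non-empty windows.
    """
    counts = Counter(((pos - 1) // window) * window for pos in positions)
    return dict(sorted(counts.items()))


# Hypothetical coordinates for illustration only
example = [1, 99_999, 100_000, 100_001, 250_000]
print(indel_density(example))
```

Windows without any InDel are simply absent from the result; a plotting step would need to fill them with zero.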

InDel Polymorphism and Genotyping
Primer pairs were first tested on four accessions/genotypes, namely ICG 8138 (A. duranensis), ICG 8206 (A. ipaënsis), TAG 24 (A. hypogaea), and GPBD 4 (A. hypogaea). Subsequently, polymorphic InDels were used for genotyping the set of 96 diverse accessions. Each PCR reaction mixture, with a total volume of 5 µl, contained ∼5 ng of template DNA, 2 picomoles of each primer (forward and reverse), 2 mM of each dNTP, 2 mM MgCl2, 1X buffer (Kapa Biosystems), and 0.1 U of Taq DNA polymerase (Kapa Biosystems). PCR was performed on an ABI 9700 thermal cycler (Applied Biosystems, USA) following the touchdown PCR program described in Pandey et al. (2012a) and Varshney et al. (2014). The amplified PCR products of all accessions were electrophoresed on 1.5% agarose gels for 2 h at 80-100 V and visualized under a gel documentation unit (Syngene). The amplified products were scored on the basis of presence (1) or absence (0) of alleles, and the fragment sizes were determined by comparison with the bands of a 100 bp DNA ladder.
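The presence/absence scoring can be sketched as follows. This is only an illustration under stated assumptions: the input layout (a set of observed fragment sizes per accession for one marker), the function name, and the accession labels are hypothetical, and treating an accession with no band as all zeros rather than as missing data is a simplifying assumption.

```python
def score_marker(observed):
    """Convert observed fragment sizes into a 0/1 allele matrix.

    observed: {accession: set of fragment sizes in bp} for one marker.
    Returns (alleles, matrix), where alleles is the sorted list of all
    fragment sizes seen and matrix[accession] is a list of 0/1 calls
    in the same order.
    """
    alleles = sorted({size for sizes in observed.values() for size in sizes})
    matrix = {
        acc: [1 if allele in sizes else 0 for allele in alleles]
        for acc, sizes in observed.items()
    }
    return alleles, matrix


# Hypothetical gel readings for one marker
readings = {"acc1": {150}, "acc2": {200}, "acc3": {150, 200}, "acc4": set()}
alleles, matrix = score_marker(readings)
print(alleles)
print(matrix)
```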

Population Structure and Phylogenetic Analysis
The population structure of the 96 Arachis accessions was analyzed from the genotyping data for polymorphic InDel markers using the program STRUCTURE 2.2 (Pritchard et al., 2000). The presumed number of groups (k) was set from 1 to 10 under the admixture and correlated allele frequency models, and five independent simulations were run for each value of k. Each simulation used standard parameters, with a burn-in length of 10,000 iterations followed by 10,000 MCMC (Markov Chain Monte Carlo) replications. The optimal k-value was estimated from the posterior probability [LnP(D)], and accessions were assigned to their representative group as described by Remington et al. (2001). The Shannon-Weaver diversity index was calculated using GenAlEx 6.5 (Peakall and Smouse, 2012). A UPGMA dendrogram of the 96 accessions, sections, and species of Arachis was constructed using DARwin 6 (Perrier et al., 2003). The dissimilarity matrix was generated from the binary data using Jaccard's coefficient with 1,000 bootstraps, and the dendrogram was generated using the UPGMA method. Dendroscope (Huson et al., 2007) was used to depict the relationships among them. Nei's (1972) genetic distance and PIC values were also calculated using PowerMarker 3.25 (Liu and Muse, 2005).
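For readers unfamiliar with Jaccard's coefficient, a minimal sketch of the pairwise dissimilarity on binary allele data is given below. It is not the DARwin implementation; the function names and the toy profiles are hypothetical, and returning 0.0 when two profiles share no scored band is an assumption made here to avoid division by zero.

```python
def jaccard_dissimilarity(x, y):
    """Jaccard dissimilarity between two equal-length 0/1 profiles.

    d = 1 - a / (a + b + c), where a is the number of alleles present
    in both profiles and b + c is the number present in only one.
    Shared absences (0/0) are ignored.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    mismatched = sum(1 for i, j in zip(x, y) if i != j)
    if shared + mismatched == 0:
        return 0.0  # assumption: two all-absent profiles are treated as identical
    return 1 - shared / (shared + mismatched)


def dissimilarity_matrix(profiles):
    """Pairwise Jaccard dissimilarities for {accession: 0/1 list}."""
    names = list(profiles)
    return {
        (p, q): jaccard_dissimilarity(profiles[p], profiles[q])
        for p in names
        for q in names
    }


# Hypothetical profiles for illustration only
toy = {"acc1": [1, 1, 0, 0], "acc2": [1, 0, 1, 0], "acc3": [1, 1, 0, 0]}
print(dissimilarity_matrix(toy)[("acc1", "acc2")])
```

A matrix of this kind is the input that a UPGMA clustering step would consume; the bootstrap resampling of markers is not shown.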

Discovery of Genome-Wide InDels in Two Diploid Ancestor Genomes
A total of 269,973 insertions and 245,250 deletions were identified by comparing the genome assemblies of both progenitor species, i.e., A. duranensis (A subgenome) and A. ipaënsis (B subgenome). The lowest number of insertions (14,054) was detected between homeologous pseudomolecules A08 and B08, while the highest number of insertions (36,732) was detected between homeologous pseudomolecules A03 and B03. Similarly, the lowest number of deletions (12,849) was detected between homeologous pseudomolecules A08 and B08, while the highest number of deletions (33,537) was detected between homeologous pseudomolecules A03 and B03 (Table 1, Figures 1, 2A,B). The number of deletions was lower than the number of insertions across corresponding pseudomolecules (Figures 1, 2A,B). The average size of insertions was 4.69 bp, with a range of 1-105 bp, while the average size of deletions was 4.50 bp, also ranging from 1 to 105 bp. Notably, the number of InDels decreased gradually with increasing InDel size. For instance, ∼43.7% of InDels were 1 bp in size and 39.9% were 2-10 bp, while the remaining 16.4% were >10 bp.
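The size-class summary can be reproduced from a list of InDel lengths with a short sketch such as the one below. The class boundaries follow the size classes reported in this study (1 bp, 2-10 bp, and larger), whereas the function name and the toy lengths are hypothetical.

```python
def size_class_percentages(lengths):
    """Return the percentage of InDels falling in each size class.

    lengths: non-empty list of InDel lengths in bp (positive integers).
    """
    if not lengths or any(n < 1 for n in lengths):
        raise ValueError("lengths must be a non-empty list of positive integers")
    counts = {"1 bp": 0, "2-10 bp": 0, ">10 bp": 0}
    for n in lengths:
        if n == 1:
            counts["1 bp"] += 1
        elif n <= 10:
            counts["2-10 bp"] += 1
        else:
            counts[">10 bp"] += 1
    total = len(lengths)
    return {label: round(100 * c / total, 1) for label, c in counts.items()}


# Totals reported in the text: insertions plus deletions
print(269_973 + 245_250)  # 515223

# Hypothetical lengths for illustration only
toy_lengths = [1, 1, 1, 1, 2, 5, 10, 11, 60, 105]
print(size_class_percentages(toy_lengths))
```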

InDel Marker Development and Their Validation
In order to develop user-friendly markers, InDels >50 bp were selected for primer design and marker development. A total of 5,698 such InDels (3,218 insertions and 2,480 deletions) were selected for further analysis. Out of the 5,698 selected InDels, primers could be designed for 5,519 InDels, i.e., 3,111 insertions and 2,408 deletions (Tables 3, 4). A total of 214 evenly distributed InDel markers with >50 bp size were selected for the amplification and polymorphism study in a set of diverse germplasm. For these selected InDels, the physical distance between two InDel markers ranged from 0.06 to 42.18 Mb, with an average of 6.54 Mb. Out of these 214 InDel markers, 86 (40.19%) were found to be polymorphic in the tested genotypes (Table 1). These polymorphic InDels, which showed precise amplification, clear amplicons, and length polymorphism, were further used to study the phylogeny of 96 diverse accessions. The polymorphic information content (PIC) values of the InDel markers ranged from 0.006 (Ai.B01_137090268) to 0.9951 (Ad.A01_10274031), with an average of 0.490.
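As an illustration of how a PIC value is obtained from allele frequencies, the sketch below uses the widely cited formula of Botstein et al. (1980), PIC = 1 - Σp_i² - ΣΣ2p_i²p_j². That this is the exact estimator applied by the software used in this study is an assumption, and the function name and example frequencies are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for one marker.

    freqs: allele frequencies at the marker, summing to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Hypothetical marker with two equally frequent alleles
print(pic([0.5, 0.5]))

# Polymorphism rate reported in the text: 86 of 214 markers
print(round(100 * 86 / 214, 2))
```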

InDel Markers Based Genetic Diversity among Seven Sections of Arachis Genus
The 96 Arachis accessions, belonging to 41 species from seven sections and two synthetics, were assessed for allelic diversity using genotyping data of 86 polymorphic InDel markers (Table 1; Figure 2C). In total, 174 alleles were identified (2.02 alleles per marker). Across the seven sections of the genus Arachis, the Shannon-Weaver diversity index was 0.26. Section Arachis exhibited the highest diversity, with a Shannon-Weaver diversity index of 0.46 (Table 2), followed by Erectoides and Procumbentes, with an average Shannon-Weaver diversity index of 0.34. The Triseminatae and synthetic accessions exhibited relatively low diversity, with only 0.48 and 0.49 different alleles, respectively, 1.09 effective alleles in both groups, and a Shannon-Weaver diversity index of 0.08 (Table 2). Within the seven sections, accessions from Procumbentes exhibited the highest diversity, with a Shannon-Weaver diversity index of 0.18, followed by the accessions of Triseminatae, Erectoides, and Heteranthae, with average Shannon-Weaver diversity indices of 0.16, 0.14, and 0.13, respectively. Among the accessions of section Arachis, low diversity was evident, with an average Shannon-Weaver diversity index of 0.09 (Table 3). The Shannon-Weaver diversity index was zero in 13 species, as each of these species was represented by a single accession.
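The diversity statistics reported here can be illustrated with a minimal sketch based on the standard definitions, I = -Σp_i ln p_i for the Shannon index and Ne = 1/Σp_i² for the number of effective alleles. How GenAlEx averages these quantities over loci and groups is not reproduced, and the function names and example frequencies are hypothetical.

```python
import math


def shannon_index(freqs):
    """Shannon information index, I = -sum(p * ln p), for one locus."""
    return sum(-p * math.log(p) for p in freqs if p > 0)


def effective_alleles(freqs):
    """Number of effective alleles, Ne = 1 / sum(p^2), for one locus."""
    return 1 / sum(p * p for p in freqs)


# Hypothetical locus with two equally frequent alleles
print(shannon_index([0.5, 0.5]))
print(effective_alleles([0.5, 0.5]))

# Mean alleles per marker reported in the text: 174 alleles over 86 markers
print(round(174 / 86, 2))
```

A locus fixed for one allele gives I = 0 and Ne = 1, which is why groups represented by a single accession show a diversity index of zero.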

InDel Markers Based Estimation of Genetic Relatedness among Different Arachis Species
The analysis of molecular variance (AMOVA) was performed to assess genetic differentiation among sections and species (Table 4). About 15% (P = 0.001) of the molecular variation was attributed to genetic differentiation between the sections, while the remaining 85% was among species within sections. These results indicated the presence of large genetic variability in the genus Arachis, both between and within sections and species. Of the 86 InDel markers, 32 markers amplified with a clear 50 bp difference between A. ipaënsis and  (H genome), while marker Ad.A10_77068321 (210 bp) could also amplify genotypes of the D genome of section Arachis (Supplementary Table 6).

Alleles Specific InDel Markers for Arachis Species and Genotypes
For the assessment of genetic diversity among species, unique alleles provide a good index for discriminating different species; the number of unique alleles in a population is an elementary estimate of genetic distinctiveness and of the degree of speciation of the species/accessions (Chen et al., 2008). Allele bands specific to 41 different species were scored and analyzed. For instance, marker Ai.B04_11232665 amplified a unique 300 bp allele in A. batizocoi of section Arachis (B genome) that was not amplified in any other species (Supplementary Table 7). In addition, some markers amplified specifically in some genotypes of different species within and among sections.
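Identifying such species-specific alleles from scored data amounts to finding alleles carried by accessions of exactly one species. The sketch below shows one way to do this; the data layout, function name, allele labels, and species names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def species_specific_alleles(calls, species_of):
    """Find alleles observed in accessions of exactly one species.

    calls: {accession: set of allele labels scored as present}.
    species_of: {accession: species name}.
    Returns {species: sorted list of alleles unique to that species}.
    """
    carriers = defaultdict(set)  # allele -> species in which it was seen
    for accession, alleles in calls.items():
        for allele in alleles:
            carriers[allele].add(species_of[accession])

    unique = defaultdict(list)
    for allele, species in carriers.items():
        if len(species) == 1:
            unique[next(iter(species))].append(allele)
    return {sp: sorted(found) for sp, found in unique.items()}


# Hypothetical labels in the form marker_size, for illustration only
calls = {
    "acc1": {"m1_300", "m2_150"},
    "acc2": {"m1_300"},
    "acc3": {"m2_150", "m3_200"},
}
species_of = {"acc1": "sp_X", "acc2": "sp_X", "acc3": "sp_Y"}
print(species_specific_alleles(calls, species_of))
```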

The Alleles of Cultivated Groundnut and Its Wild Relatives
Ascertaining the differences in genetic constitution between A. hypogaea and its diploid and tetraploid wild relatives is essential for understanding the evolution of cultivated groundnut. In addition to the diploid wild progenitor species (A. duranensis and A. ipaënsis), the study also included both tetraploid species, i.e., A. monticola (AABB) and A. hypogaea (AABB) (Supplementary

Phylogenetic Analyses to Establish Genetic Relatedness
The phylogenetic analysis grouped the 96 Arachis accessions into three clusters (I, II, and III) corresponding to the structure analysis (Supplementary Table 9; Figure 3). The analysis also grouped the accessions belonging to different genomic sections according to the affinity of their genomes. As expected, all the tetraploid genotypes grouped together, whereas the genotypes belonging to the two diploid progenitor genomes (A and B subgenomes) grouped separately (Figure 2D). Among the other genomes, "P" and "E" genomes clustered together, while "C," "H," and "T" genomes grouped together, showing their genomic similarity with each other. A small separate group was also formed by representative accessions from "Ex," "P," and "E" genomes together with the synthetics. In addition, some accessions from the "A" genome grouped with accessions of the "D" and "P" genomes. The "D" genome accessions grouped together with the "B" genome, indicating higher similarity to it than to the "A" genome. In contrast, and surprisingly, three B genome accessions belonging to A. hoehnei grouped with the cluster dominated by accessions from the A genome (Figure 2D).
For the analysis of pair-wise relationships between different sections of Arachis, a dendrogram based on Nei's distance was constructed. Erectoides and Procumbentes clustered together with a genetic distance of 0.011. Synthetics grouped separately, indicating a large distance from the other sections.

Figure caption: Arachis accessions were classified into three clusters (I, II, and III) by structure analysis, largely in agreement with the phylogenetic dendrogram. Red, green, and blue represent clusters I, II, and III, respectively. The proportion of each color in the horizontal bar represents the probability of assignment to the specific cluster. The names of accessions and taxonomical information are given next to the horizontal bars, starting with the accession number, followed by an abbreviated form of the species name and the respective genomes and sections.

All the tetraploid accessions were evaluated for differences and relatedness among cultivated genotypes, their wild relatives, and synthetic genotypes. Two separate groups were formed, one for the synthetics and one for the cultivated tetraploid genotypes. The second group, containing the cultivated genotypes, clustered according to subspecies, i.e., sub sp. fastigiata and sub sp. hypogaea. For greater understanding, hierarchical analysis was performed, which identified two major clusters. The first major cluster consisted of the synthetics and had a maximum distance of 0.643 from the second major cluster (Figure 4B, Supplementary Table 12). The second cluster was separated into three sub clusters: sub cluster I consisted of A. monticola, two A. hypogaea sub sp. fastigiata var. peruviana and, unexpectedly, one A. hypogaea sub sp. hypogaea var. hirsuta. Sub cluster II consisted of two A. hypogaea sub sp. fastigiata var. fastigiata with one A. hypogaea sub sp. fastigiata var. aequatoriana and one A. hypogaea sub sp. hypogaea var. hirsuta. Sub cluster III consisted of two genotypes from A. hypogaea sub sp. hypogaea var. hypogaea with one A. hypogaea sub sp. fastigiata var. vulgaris and one A. hypogaea sub sp. fastigiata var. aequatoriana.

DISCUSSION
Over the last decade, next-generation sequencing (NGS) technologies have revolutionized the availability of large-scale genetic markers and their deployment in trait discovery and breeding (Varshney, 2015; Pandey et al., 2016). InDel markers have been deployed in forensic and genetic studies in humans as well as in several plants/crops such as wheat, rice, barley, mustard, citrus, tomato, and Arabidopsis (Yang et al., 2014; Lu et al., 2015; Zhou et al., 2015). The availability of draft genome assemblies of the groundnut progenitors (Bertioli et al., 2016; Chen et al., 2016) has provided an excellent opportunity for initiating several genomic and genetic studies such as SNP discovery, gene prediction, gene expression, comparative genomics, genetic diversity, genetic mapping, and molecular breeding. In this context, this study developed large-scale genome-wide InDel markers and demonstrated their utility in resolving phylogenetic relationships among different species of the genus Arachis.

Large Scale Genome-Wide InDels, an Important Resource for Genetic Studies and Breeding Applications in Groundnut
This study discovered large-scale InDels by comparing the A subgenome (A. duranensis, accession V14167) and the B subgenome (A. ipaënsis, accession K30076), and detected 5,698 InDels of more than 50 bp, indicating an abundance of InDels for genetic and breeding studies in groundnut. Accuracy in InDel identification depends on the quality of the sequencing data and on the scheme and parameters used for data extraction. One- and two-base-pair InDels were not included between the A and B subgenomes in order to avoid overestimating small InDels due to sequencing errors, as experienced earlier in tomato by Yang et al. (2014). In groundnut, identification of InDels has been reported from expressed sequence tag (EST) data, which were used for studying genetic diversity in cultivated groundnut (Liu et al., 2015). This is the first report on developing large-scale genome-wide InDels using the sequence information of the diploid progenitors of groundnut. Groundnut is usually considered a crop with low genetic diversity, and several marker systems have been used over time for diversity analysis, but the number of available markers was small or suboptimal. In this study, we found 40.19% polymorphism (86 InDels) with these InDel markers, which was higher than that reported earlier for other markers, viz. start codon targeted polymorphism (SCoT) markers (38.2%) (Xiong et al., 2011), InDels developed from ESTs (33.3%) (Liu et al., 2015), AFLP markers (3.6%) (He and Prakash, 1997), RAPD markers (6.6%) (Subramanian et al., 2000), EST-SSR markers (10.4%) (Liang et al., 2009), and SSR markers (14.5%) (Zhao et al., 2012). These InDel markers are a good genomic resource for the groundnut research community and can be used in the majority of genetic and breeding applications in groundnut.

InDel Markers Provided Insights on Allele Diversity and Genetic Differentiation
For better exploitation of wild species genes in groundnut improvement programs, knowledge of genetic diversity in the Arachis germplasm is essential, as indicated in several such studies (Barkley et al., 2007; Angelici et al., 2008). Our findings indicated considerable diversity within section Arachis and its 23 species. Within this section, the most diverse accessions were those of A. hoehnei (B genome), followed by accessions of A-genome species (A. cardenasii, A. palustris, and A. villosa), which had the highest Shannon-Weaver diversity index values. A. hoehnei (B genome) and A. villosa and A. cardenasii (A genome; both resistant to rust, late leaf spot (LLS), and groundnut rosette) are placed in the secondary gene pool because they are cross-compatible and show chromosome pairing and hybrid fertility; for this reason, they were earlier considered probable subgenome contributors of A. hypogaea (Mallikarjuna et al., 2006). Although A. palustris is also highly diverse, it is an aneuploid species (differing in basic chromosome number) and would not be cross-compatible with A. hypogaea, which is a barrier to the introduction of desirable diverse characters (Lavia, 1998). Differences in genetic diversity within section Arachis and between species of other sections could be due to the following three reasons or their combination: (1) polyploidization creates a hurdle to gene flow from the related diploid species to the cultivated species (Young et al., 1996), (2) a combination of self-pollination and recent polyploidization from one or a few elite individual(s) of each diploid parental species (Halward et al., 1991), and (3) a narrow genetic base caused by the consistent use of elite cultivars and limited use of exotic germplasm in breeding programs (Knauft and Gorbet, 1989). This study showed high diversity, and researchers need to find better ways of broadening the genetic base by introgressing desired genomic segments from wild species into cultivated groundnut.

Allele Specific InDels to Different Arachis Subgenomes Supporting Existence of K Genome
Species-specific alleles give a species a unique identity in the population. Of the 86 InDel markers examined, 9 markers amplified 9 alleles that were specific to a genotype/accession and differed among Arachis sections. The unique alleles observed in the species/accessions indicate the various degrees of evolution and diversity of these species/accessions (Chen et al., 2008). We identified a few markers that were specific to particular genomes; for example, four markers amplified only in section Arachis (A, B, and AB genomes) and not in any other section (representing genomes other than A, B, and AB). This indicated the occurrence of new recombination events that might have shuffled the genomic sequences and created insertion and deletion sites during domestication. Likewise, some markers amplified in sections Caulorrhizae, Extranervosae, Heteranthae, and Triseminatae but not in any accession from section Arachis. These wild relatives of Arachis are endemic to South America, occurring in Bolivia, Argentina, Brazil, Paraguay, and Uruguay, and are a rich source of specific alleles (Valls et al., 1985). This suggested that some genes/alleles specific to them were lost after natural selection or domestication events. In contrast, some markers amplified in section Arachis as well as in sections Procumbentes and Heteranthae. This result revealed that some large genomic sequences are shared between sections Arachis and Procumbentes and between sections Arachis and Heteranthae; these sequences were not fully recombined during domestication and remained conserved. Likewise, two genotypes of A. batizocoi had a specific allele that was not found in any other genotype or species. Our study is in agreement with the earlier study (Leal-Bertioli et al., 2014), which claimed that this species might have another genome, "K," with more similarity to the "B" genome. To gain insights into the genetic constitution of A. hypogaea and its diploid and tetraploid wild congeners, we also considered specific alleles between them. We found five genotypes with specific alleles within section Arachis, all belonging to wild diploid species with one exception, i.e., A. monticola. This indicated that these alleles were highly conserved in wild relatives and emerged during the speciation and evolution of groundnut but were restricted to the wild species or lost in domestication events. These wild resources can be a good source for mobilizing specific alleles to diversify the genetic base of cultivated gene pools and to enrich economically significant traits.

InDels Established Genomic Affinities among Diverse Germplasm
The InDel-based phylogenetic study grouped all seven taxonomic sections based on their genomic affinities, with the exception of the synthetic genotypes, which grouped distinctly from all the sections. Section Arachis is considered the most diverse, containing both annual and perennial species with distinct chromosome numbers, karyotype structures, and ploidy levels; it formed a group apart but very close to Erectoides and Procumbentes (Krapovickas and Gregory, 1994). On the other hand, Heteranthae and Caulorrhizae grouped together, indicating that some species of sections Heteranthae and Caulorrhizae may be capable of producing hybrids with section Arachis while substantial genetic isolation persists with the other sections (Bertioli et al., 2016). According to Krapovickas and Gregory (1994), sections Extranervosae and Triseminatae are the most isolated sections; however, their evolutionary position is yet to be decided (see Stalker, 2017). This study confirms the above assumption for Triseminatae as the section most isolated from the remaining sections. On the other hand, this study contradicts it in the case of Extranervosae, which was found close to Arachis, Procumbentes, and Erectoides. Nevertheless, recent phylogenetic studies based on ribosomal DNA (rDNA) suggested sections Heteranthae, Extranervosae, and Triseminatae to be the most primitive and section Arachis to be the most recent, with sections Procumbentes, Caulorrhizae, Erectoides, Rhizomatosae, and Trierectoides found in between (Bechara et al., 2010; Wang et al., 2010). This study with the InDel markers indicates sections Heteranthae and Triseminatae, together with Caulorrhizae, to be the most primitive; sections Erectoides and Procumbentes to be intermediate; and section Arachis to be the most recent in origin.
The topology of the UPGMA dendrogram for all 44 species used in this study, generated from InDel markers, suggested that throughout the evolutionary course of Arachis to date, new recombinants have arisen through a regular and frequent deletion process. Possibly for this reason, all the species show higher affinity to the A genome than to the B genome representative A. ipaënsis. This also indicated that the other B genome species, which grouped close to the A genome representative A. duranensis, are distinct from A. ipaënsis because of greater differences in the accumulation of deletions in the genome. The analysis of InDel markers grouped species from the D, K, and F genomes together with A genome species. The position of A. benensis, distant from the B genome biological group, also supports the validity of the F genome assignment. The position of the D, K, and F genomes closer to the A than to the B genome is worthy of further investigation.
A. monticola: A True Wild Species Escaped away from Cultivation or Ancestor?
Regarding the relationships among tetraploid genotypes/accessions based on the InDel markers, both synthetic genotypes grouped separately, as they were newly created, indicating their diverse genetic makeup. In addition, the dendrogram clearly showed that A. monticola and cultivated A. hypogaea (all botanical types) grouped together with little or no genetic distance, owing to their close affinity. The complete homology observed here in the insertion and deletion sequences indicated that the tetraploid taxa A. monticola and A. hypogaea are very close and have a similar genome composition. The current finding is also consistent with the high rate of crossability and the fertile progenies reported by Krapovickas and Gregory (1994). However, the dendrogram of tetraploid species showed that A. monticola grouped separately from the other botanical types, which supports the view that it is a separate species from A. hypogaea. In earlier studies, A. monticola was regarded as a true wild species that had escaped from cultivation and was not considered a form of A. hypogaea (Krapovickas and Gregory, 1994; Bechara et al., 2010). This finding is also supported by studies of fruit structure, in which each seed is narrowly separated, on the basis of which A. monticola was considered a discrete species from A. hypogaea (Krapovickas and Gregory, 1994). This attribute is not found in any cultivated groundnut and is regarded as a primitive feature in the genus. These observations support the possibility that A. monticola is the immediate wild ancestor of A. hypogaea or an introgressive derivative between A. hypogaea and wild species, as reported in earlier studies (Gregory and Gregory, 1976; Moretzsohn et al., 2004; Koppolu et al., 2010).

CONCLUSION
InDels are the second most abundant variations across the genome after SNPs and can serve as genetic markers for conducting genetic studies in labs with small- to medium-scale genotyping facilities, especially when they show larger length polymorphism. This study successfully identified 515,223 InDels distributed across the groundnut genome and designed primers for 5,519 of the 5,698 InDels with >50 bp size. Further, selected InDel markers were validated for their functionality and usefulness in studying genetic relationships in a very diverse germplasm set. The information on InDel markers is a very useful genomic resource for the groundnut research community for use in an array of genetic and breeding applications.

AUTHOR CONTRIBUTIONS
MKV and MKP: performed most of the experiments; SMK: performed InDel marker discovery and primer designing; MS, TN, and YS: generated genotyping data on diverse germplasm panel; MKV, MS, and VG: analyzed genotyping data and conducted genetic analysis; MKV and MKP: interpreted the results; MKV: drafted the MS; MKP and RKV: improved the manuscript; MKP and RKV: conceived, designed, and supervised the study and finalized the manuscript.