Genome-Wide DArTSeq Genotyping and Phenotypic Based Assessment of Within and Among Accessions Diversity and Effective Sample Size in the Diverse Sorghum, Pearl Millet, and Pigeonpea Landraces

Germplasm should be conserved in such a way that the genetic integrity of a given accession is maintained. In most genebanks, landraces constitute a major portion of collections, and the extent of genetic diversity within and among landraces varies with the extent of outcrossing and the selection intensity applied by farmers. In this study, we assessed the level of diversity within and among 108 diverse landraces and wild accessions using both phenotypic and genotypic characterization. This included 36 accessions each of sorghum, pearl millet, and pigeonpea, conserved at the ICRISAT genebank. We genotyped 15 to 25 individuals within each accession, totaling 1,980 individuals, using the DArTSeq approach. This resulted in 45,249, 19,052, and 8,211 high-quality single nucleotide polymorphisms (SNPs) in pearl millet, sorghum, and pigeonpea, respectively. Sorghum had the lowest average phenotypic (0.090) and genotypic (0.135) within accession distances, while pearl millet had the highest average phenotypic (0.227) and genotypic (0.245) distances. Pigeonpea had average within accession distances of 0.203 (phenotypic) and 0.168 (genotypic). Analysis of molecular variance also confirmed the lowest variability within accessions in sorghum (26.3%), the highest in pearl millet (80.2%), and an intermediate level in pigeonpea (57.0%). The effective sample size required to capture maximum variability and to retain rare alleles during regeneration ranged from 47 to 101 for sorghum, 155 to 203 for pearl millet, and 77 to 89 for pigeonpea accessions. This study will support genebank curators in understanding the dynamics of populations within and among accessions and in devising appropriate germplasm conservation strategies, and will aid the utilization of these accessions for crop improvement.


INTRODUCTION
Plant genetic resources include landraces, wild and weedy relatives, and improved cultivars, which are of potential value as a resource for present and future generations. Landraces make up a major portion of the collections conserved in genebanks. They possess a multifaceted evolutionary history and vast diversity, primarily associated with humans and influenced by both natural selection and farmers' informal selection (Hawkes, 1983). The high variability and genetic diversity of landraces are well known. Harlan (1965) reported gene flow from weeds to landraces, and several other authors (Ellstrand et al., 1999; Jarvis and Hodgkin, 1999; Messeguer, 2003; Gompert and Buerkle, 2016) reported the transfer and diffusion of genes into landraces from various sources in both self- and out-crossing species. Harlan (1971) emphasized that landraces are genetically dynamic populations resulting from millennia of artificial and natural selection, Hawkes (1983) described landraces as highly diverse populations or mixtures of heterogeneous genotypes, and several other authors proposed definitions of landraces that explain their heterogeneity and genetic nature (Brown, 1978; Martin and Adams, 1987; Astley, 1991; Michaelis et al., 1991). Brown (1978), Frankel and Soulé (1981), and Bellon (2009) described within- and between-population genetic variation in landrace populations and explained that within-population diversity is mainly an effect of heterogeneity over space and time.
Although much of the literature emphasizes the high variability in landraces, only a few studies have investigated diversity among individuals within landrace accessions conserved in genebanks (Busso et al., 2000; Bhattacharjee et al., 2002), while a few others focused on diversity within landrace populations conserved in situ (Djè et al., 1999; Pressoir and Berthaud, 2004; Dreisigacker et al., 2005; Al Khanjari et al., 2007; Jones et al., 2008; Hagenblad et al., 2012; Kyratzis et al., 2019).
Therefore, understanding the diversity within landraces is essential to ensure that genebanks maintain the genetic integrity of a given accession, with its innate variability and diversity, without losing any rare allele variants. The major cause of allele loss in genebank accessions is genetic drift when accessions are regenerated with small sample sizes (Crossa, 1995). The mode of pollination is the key factor governing the frequencies of alleles among individuals of a population; it influences the variability, the amount of diversity, gene flow, and the population dynamics behind evolution. Hammer et al. (1996) explained the effect of mode of pollination on genetic erosion of landraces, Zeven (1998) explained the gradual attainment of homozygosity within inbreeding landrace populations, and Villa et al. (2005) explained the influence of mode of reproduction on the genetic structure of landraces. Genebanks apply various scientific strategies to preserve the inherent genetic variability within each accession. These strategies rest on population genetic considerations, mainly the mode of reproduction, allelic frequencies, the distribution of allelic variation, and the proportion of rare alleles. Rare alleles, however, are highly susceptible to random genetic drift and can be lost permanently (Ramanatha Rao and Hodgkin, 2002) when accessions are handled without adequate knowledge of the underlying population dynamics. Thus, appropriate conservation strategies with statistically estimated population sizes should be followed. In this study, we chose three crops that differ in pollination behavior, namely highly cross-pollinated pearl millet (>85%) (Burton, 1983) and often cross-pollinated sorghum (about 18%) (Barnaud et al., 2008) and pigeonpea (about 30%) (Saxena et al., 1990), to comparatively assess the within and between accession diversity.
Landraces of these crops possess large variability within accessions and were therefore chosen for this study.
Classical molecular markers used to assess the genetic diversity in these crops included SSR (Budak et al., 2003; Chandra-Shekara et al., 2007; Bashir et al., 2015), RFLP (Bhattacharjee et al., 2002; Govindaraj et al., 2009), ISSR (Kumar et al., 2006; Animasaun et al., 2015), RAPD (Chowdari et al., 1998; Chandra-Shekara et al., 2007), and SRAP (Xie et al., 2010). However, these markers had constraints such as a high genotyping cost per sample; most of these technologies are gel-based and lack the ability to rapidly analyze a large number of marker loci. Recent developments in high-throughput genotyping overcame these limitations. Technologies like DArTSeq, which combines DArT (Diversity Array Technology) with NGS (Next Generation Sequencing), offer genome-wide characterization of germplasm even without prior sequence information, while providing a low-cost platform for high-throughput marker genotyping. Several studies using DArTSeq for diversity and population structure assessment have been reported in various crops, demonstrating the potential of this technology (Pailles et al., 2017; Raman et al., 2017; Barilli et al., 2018; Edet et al., 2018; Ndjiondjop et al., 2018). What sets DArTSeq apart from other GBS (Genotyping By Sequencing) techniques is its complexity-reduction method, which targets genomic coding regions; the additional advantage of genotyping without prior sequence information extends its scope even to under-researched wild accessions. It also offers relatively better genome coverage with high reproducibility: because DArTSeq is performed at higher sequencing depths and uses strict filtering criteria, it generates markers with less missing data than other GBS approaches.
With this background, this study aims (i) to characterize geographically representative, diverse sorghum, pearl millet, and pigeonpea landraces and wild accessions genotypically (DArTSeq) and phenotypically, in order to comparatively investigate the extent of diversity within and among accessions, and (ii) to assess the minimum sample (population) size required to capture 95% of the alleles with an expected probability of 95%, based on the frequency of the rarest allele in each accession. This study is intended to benefit genebank curators in understanding the dynamics of populations within and among accessions and in devising proper sampling strategies (sample size) during regeneration, for effective genebank management and for the utilization of accessions in crop improvement. To the best of our knowledge, this study is the first of its kind; no studies were found that used NGS to investigate within accession diversity and sample size estimation, particularly for sorghum, pigeonpea, and pearl millet.

Plant Material
This study investigated a total of 108 geographically diverse accessions of sorghum, pearl millet, and pigeonpea (Supplementary Tables 1-3) (Figure 1), conserved at the ICRISAT genebank. The sorghum accessions included 31 landraces and 5 wild accessions, collected from 26 countries across 5 continents, and represented all 5 basic races and all 10 intermediate races as classified by Harlan and de Wet (1972). The pearl millet accessions consisted of 33 landraces and 3 wild accessions, collected from 19 countries across 2 continents, and the pigeonpea accessions included 36 landraces collected from 34 countries across 5 continents. All 108 accessions were raised in the field during the post-rainy season of 2018 at ICRISAT, Hyderabad, for phenotypic and genotypic characterization. Sorghum accessions were sown on black soil, whereas pearl millet and pigeonpea were sown on red soil. Each sorghum accession occupied three rows of 9 m length, spaced 75 cm between rows, with a plant-to-plant spacing of about 10 cm. Each pearl millet accession occupied four rows of 4 m length, spaced 75 cm between rows and 10 cm between plants. Each pigeonpea accession occupied two rows of 9 m length, spaced ∼75 cm between rows and 50 cm between plants.

DNA Extraction, Complexity Reduction and Genotyping
Individual plants within each accession of sorghum, pearl millet, and pigeonpea were labeled with a unique plant ID. Leaf samples were collected from 15 plants in each accession of sorghum and pigeonpea and from 25 plants in each accession of pearl millet, totaling 540, 900, and 540 samples in sorghum, pearl millet, and pigeonpea, respectively. Leaf samples were collected from 15-day-old seedlings of sorghum and pearl millet and from 2-month-old seedlings of pigeonpea. Collected leaf samples were sealed in zip-lock bags or collected in PCR plates with the corresponding plant ID for each sample, packed with ice cubes, and sent for DNA extraction on the same day. DNA extraction followed the procedure reported by Mace et al. (2003), and the extracted genomic DNA samples were sent to DArT Private Limited in Canberra, Australia (www.diversityarrays.com) for DArTSeq genotyping.

SNP Filtering
The SNP markers from DArTSeq were filtered with thresholds of 95% reproducibility, 80% call rate for markers, and 50% missing values over samples. The SNPs were not filtered for minor allele frequencies (MAF) in order to preserve the rare allele variants, which are the main focus of this study.
FIGURE 1 | Geographical distribution of the sorghum, pigeonpea, and pearl millet accessions used in this study over continents.
Frontiers in Plant Science | www.frontiersin.org
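The filtering criteria can be illustrated with a minimal Python sketch. It is not the authors' custom R script: the function name `filter_snps`, the 0/1/2 genotype coding with `None` for missing calls, the per-marker reproducibility list, and the order of the two steps (markers first, then samples) are all assumptions made for illustration. No MAF filter is applied, in line with the text.

```python
def filter_snps(genotypes, reproducibility, min_repro=0.95,
                min_call_rate=0.80, max_sample_missing=0.50):
    """Return indices of retained markers and samples.

    genotypes: list of markers, each a list of per-sample calls
               (0/1/2 alternate-allele dosage, None = missing).
    reproducibility: one reproducibility score (0-1) per marker.
    All names and the step order are illustrative assumptions.
    """
    n_samples = len(genotypes[0])

    # Step 1: keep markers passing reproducibility and call-rate thresholds.
    kept_markers = []
    for idx, (calls, repro) in enumerate(zip(genotypes, reproducibility)):
        call_rate = sum(c is not None for c in calls) / n_samples
        if repro >= min_repro and call_rate >= min_call_rate:
            kept_markers.append(idx)

    # Step 2: keep samples whose missing fraction over the retained
    # markers does not exceed the limit. No MAF filter is applied.
    kept_samples = []
    for s in range(n_samples):
        missing = sum(genotypes[m][s] is None for m in kept_markers)
        if kept_markers and missing / len(kept_markers) <= max_sample_missing:
            kept_samples.append(s)
    return kept_markers, kept_samples


toy_genotypes = [
    [0, 1, 2, 0, None],        # call rate 0.8, reproducibility 0.99
    [0, None, None, 1, None],  # call rate 0.4
    [2, 2, 1, 0, None],        # call rate 0.8, reproducibility 0.90
    [1, 1, 0, 0, None],        # call rate 0.8, reproducibility 0.96
]
toy_repro = [0.99, 1.00, 0.90, 0.96]
markers, samples = filter_snps(toy_genotypes, toy_repro)
```

In this toy input, the second marker fails the call-rate threshold, the third fails reproducibility, and the last sample is missing at every retained marker.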

Phenotypic Evaluation
To capture maximum phenotypic variability, all the individual plants within each accession were labeled with a unique plant ID, and data on both qualitative and quantitative traits (Supplementary Table 4) were recorded for all three crops throughout the growing season, following the respective crop descriptors (IBPGR and ICRISAT, 1993a,b,c). A large number of plants were phenotyped, including both the plants sampled for DArTSeq and plants that were not sampled. The total plant count for each accession ranged from 115 to 234 in sorghum and from 51 to 116 in pearl millet. However, in pigeonpea, only 35 accessions had a plant count over 10. Two accessions of pigeonpea had a plant count of <14, and the remaining accessions had plant counts between 21 and 33. Thus, only data from the 35 accessions of pigeonpea were used for phenotypic analysis.

Phenotypic Data Analysis
Descriptive statistics such as mean, standard deviation, and standard error were computed for quantitative traits to assess the spread and distribution of the data. Preliminary analysis of phenotypic data included investigating diversity among accessions using the mean and range values. This was followed by post-hoc tests, namely the Student-Newman-Keuls (SNK) test (Newman, 1939; Keuls, 1952) and Levene's test (Levene, 1960), to verify the statistical significance of differences between means and the homogeneity of variances, respectively. The Gower distance metric (Gower, 1971) was used for within accession diversity assessment using both quantitative and qualitative data. Pairwise distances between individual plants were subjected to the ward.D2 agglomerative clustering algorithm (Murtagh and Legendre, 2014) with 100 bootstraps. The same set of analyses was applied to all three crops. R software v.3.6.0 (R Core Team, 2019) was used with R-CRAN packages: "cluster" (Maechler et al., 2019) for Gower's distance computation, "fpc" (Hennig, 2020) and "pvclust" (Suzuki et al., 2019) for bootstrapped clustering, "agricolae" (de Mendiburu, 2013) for the SNK test, and "car" (Fox and Weisberg, 2019) for Levene's test.
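To make the Gower metric concrete, the following is a minimal Python sketch of how a mixed quantitative and qualitative trait table can be turned into pairwise distances. It is an illustration only, not the "cluster" package implementation used in the study; the function name `gower_matrix`, the `is_numeric` flags, and the use of `None` for missing observations are assumptions. Quantitative traits contribute the absolute difference scaled by the trait range, qualitative traits contribute 0 (match) or 1 (mismatch), and the contributions are averaged over the traits observed in both plants.

```python
def gower_matrix(plants, is_numeric):
    """Pairwise Gower distances between plants.

    plants: list of trait lists (numbers or category labels, None = missing).
    is_numeric: one boolean per trait, True for quantitative traits.
    """
    n_traits = len(is_numeric)

    # Range of each quantitative trait over all plants (None for qualitative).
    ranges = []
    for k in range(n_traits):
        if is_numeric[k]:
            vals = [p[k] for p in plants if p[k] is not None]
            ranges.append(max(vals) - min(vals) if vals else 0.0)
        else:
            ranges.append(None)

    n = len(plants)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total, used = 0.0, 0
            for k in range(n_traits):
                x, y = plants[i][k], plants[j][k]
                if x is None or y is None:
                    continue  # skip traits missing in either plant
                if is_numeric[k]:
                    total += abs(x - y) / ranges[k] if ranges[k] > 0 else 0.0
                else:
                    total += 0.0 if x == y else 1.0
                used += 1
            dist[i][j] = dist[j][i] = total / used if used else float("nan")
    return dist


# Toy example: plant height (cm) and a glume colour class.
toy_plants = [[100.0, "red"], [150.0, "red"], [200.0, "white"]]
d = gower_matrix(toy_plants, [True, False])
```

Averaging the entries of `d` that belong to plants of the same accession would give a within accession distance of the kind reported in the Results.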

Genotypic Data Analysis
DArTSeq-derived SNP data after filtering were used for analysis. Analysis of Molecular Variance (AMOVA) was computed as proposed by Excoffier et al. (1992), which partitions the total variance into within and among population variance components. AMOVA was carried out considering each accession (with 15 or 25 individuals) as a separate population. To test significance, the AMOVA results were subjected to Monte Carlo estimation of p-values with 99 permutations. Heterozygosity was estimated as reported by Nei (1973). For diversity assessment, the Euclidean-based modified Rogers' distance metric (Goodman and Stuber, 1983) was used to compute distances between individual plants, followed by ward.D2 agglomerative clustering (Murtagh and Legendre, 2014) to produce a dendrogram. The "clusterboot" function from the R-package "fpc" (Hennig, 2020) and the "aboot" function from the R-package "poppr" (Kamvar et al., 2015) were used to evaluate the clusters with 100 bootstraps. Shannon diversity ($H'$) (Shannon, 1948) was calculated for each accession using the formula

$$H' = -\sum_{i=1}^{A} p_i \ln p_i$$

where $p_i$ is the estimated frequency of the $i$-th allele in the whole sample and $A$ is the total number of alleles in the sample.
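The two diversity measures named above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' custom code: the function names are hypothetical, SNP genotypes are assumed to be coded as alternate-allele dosages (0/1/2, `None` for missing), the modified Rogers' distance is written in its usual form (square root of the summed squared allele-frequency differences divided by twice the number of loci), and the natural logarithm is assumed for Shannon diversity.

```python
import math


def modified_rogers_distance(g1, g2):
    """Modified Rogers' distance between two individuals at biallelic SNPs.

    g1, g2: alternate-allele dosages per SNP (0, 1, or 2; None = missing).
    """
    sq_sum, n_loci = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # use only loci called in both individuals
        p, q = a / 2, b / 2  # alternate-allele frequency in each individual
        # Both alleles of a biallelic locus contribute the same squared
        # difference: (p - q)^2 + ((1 - p) - (1 - q))^2 = 2 * (p - q)^2
        sq_sum += 2 * (p - q) ** 2
        n_loci += 1
    return math.sqrt(sq_sum / (2 * n_loci)) if n_loci else float("nan")


def shannon_diversity(allele_freqs):
    """Shannon index H' = -sum(p_i * ln p_i) over alleles with p_i > 0."""
    return -sum(p * math.log(p) for p in allele_freqs if p > 0)


mrd_opposite = modified_rogers_distance([0, 0], [2, 2])   # opposite homozygotes
mrd_same = modified_rogers_distance([0, 1, 2], [0, 1, 2])  # identical genotypes
mrd_mixed = modified_rogers_distance([0, 2], [1, 2])
h_even = shannon_diversity([0.5, 0.5])
```

Under this coding the distance is bounded between 0 (identical genotypes) and 1 (opposite homozygotes at every locus), which makes the within and between accession means directly comparable across crops.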
Population structure was assessed by DAPC (Discriminant Analysis of Principal Components) using posterior membership probabilities, and membership stability was assessed by estimating a-scores. Phenotypic and genotypic distance matrices were subjected to Mantel's correlation with permutation tests (Mantel, 1967). The minimum seed sample size required to capture 95% of alleles within an accession with 95% certainty, during sampling for regeneration, was calculated for each accession as reported by Crossa (1989). Consider the rarest biallelic locus (SNP), with two alleles $B_1$ and $B_2$ at frequencies $p_1$ and $p_2$, so that $p_1 + p_2 = 1$. The two possible failure outcomes are:
$k_1$: $B_1$ is not represented in the sample of $n$ gametes
$k_2$: $B_2$ is not represented in the sample of $n$ gametes
Because $k_1$ and $k_2$ cannot both occur when $p_1 + p_2 = 1$, the probability of getting at least one copy of each of $B_1$ and $B_2$ is

$$P(k_1^c \cap k_2^c) = 1 - P(k_1) - P(k_2) = 1 - (1 - p_1)^n - (1 - p_2)^n$$

All the above-mentioned analyses were performed using R software v.3.6.0 (R Core Team, 2019). Custom scripts were used for filtering, distance matrix computation, heterozygosity estimation, and seed sample size computation. Packages from R-CRAN and GitHub, such as "adegenet" (Jombart, 2008) and "ade4" (Dray et al., 2007), were used for computation of AMOVA and Mantel's test, respectively.
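The sample size calculation can be sketched as a direct search for the smallest number of gametes that satisfies the probability condition. The Python below is a minimal illustration under the biallelic model described above, not the authors' custom R script; the function names are hypothetical, and the conversion from gametes to seeds is left out because it is not specified here.

```python
def prob_both_alleles(p_rare, n):
    """Probability that a sample of n gametes contains both alleles of a
    biallelic locus whose rarer allele has frequency p_rare."""
    p_common = 1.0 - p_rare
    return 1.0 - (1.0 - p_rare) ** n - (1.0 - p_common) ** n


def min_sample_size(p_rare, certainty=0.95):
    """Smallest n for which both alleles are captured with the given certainty."""
    if not 0.0 < p_rare <= 0.5:
        raise ValueError("p_rare must lie in (0, 0.5]")
    n = 1
    while prob_both_alleles(p_rare, n) < certainty:
        n += 1
    return n


# Rarest observable allele when 15 diploid plants (30 gametes) are genotyped:
n_for_one_in_thirty = min_sample_size(1 / 30)
n_for_ten_percent = min_sample_size(0.10)
n_for_equal_freq = min_sample_size(0.50)
```

A linear search is used instead of a closed-form logarithm so that the small contribution of the second term is handled exactly; it also makes the dependence on the rarest allele frequency easy to tabulate.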

Descriptive Statistics and post-hoc Tests
The variations in the mean and range estimates indicated considerable variability among landraces and wild accessions of sorghum, pearl millet and pigeonpea. The SNK test indicated significant (p ≤ 0.05) mean differences among accessions (Supplementary Table 5). Levene's test indicated heterogeneous variances for all the quantitative traits in sorghum, pearl millet, and pigeonpea (Supplementary Table 6).

Phenotypic Diversity: Within and Between Accessions
Gower's phenotypic distance matrix (Gower, 1971) was computed to obtain pairwise distances between plants of all the accessions. Within accession distances varied from 0.038 to 0.141, 0.145 to 0.271, and 0.071 to 0.410 for sorghum, pearl millet, and pigeonpea, respectively. Hierarchical clustering was performed on Gower's phenotypic distances, and the number of clusters was set to the number of accessions in each crop from which the data were collected. Thus, dendrograms were cut at 36 clusters for sorghum and pearl millet and at 35 clusters for pigeonpea, on the assumption that the individuals within an accession cluster together. A cluster membership barplot was generated to visualize the distribution or migration of individuals of different accessions to different clusters. Cluster-wise stability was evaluated using the "clusterboot" function from the "fpc" package. The Jaccard coefficients between clusters of resampled data were >70 for 35 clusters in sorghum, 15 clusters in pigeonpea, and 16 clusters in pearl millet (Supplementary Table 7), and the remaining clusters showed values <70. The bootstrapped cluster dendrograms were plotted with approximately unbiased (AU) p-values and bootstrap probabilities (Supplementary Figures 1A-C) calculated using multiscale bootstrap resampling in the R-package "pvclust." Bootstrap values were low in some cases in pearl millet and pigeonpea; these low values are likely a combined outcome of high variability in the data, the large number of variable individuals, and the nature of the clustering algorithm. Supporting the high variability and the presence of valid clusters in the data, the "pvpick" function from the "pvclust" R-package yielded 66, 738, and 109 significant clusters in sorghum, pearl millet, and pigeonpea, respectively.
Thus, the large number of significant clusters within the studied accessions illustrates the high variability for the observed traits and ultimately reflects the high diversity within the studied landraces. In sorghum, all clusters except 7, 13, and 14 showed exclusive clustering of each accession into a singleton cluster (Figure 2A). In cluster 7, individuals of entries IS 8330 and IS 12965 were clustered together. The individuals of accession IS 2153 were distributed between two clusters (58 individuals in cluster 14 and 139 individuals in cluster 13). In pearl millet and pigeonpea, clustering patterns showed that in most accessions the individuals were not clustered uniquely and were mixed with other accessions. In pigeonpea, all the individuals of three accessions, ICP 9150, ICP 7035, and ICP 11485, were clustered in clusters 1, 10, and 14, respectively. In ICP 9124, all individuals except one were clustered in cluster 16 (Figure 2B). In pearl millet, no exclusive clusters were observed, and all 36 clusters showed mixing of individuals from different accessions (Figure 2C). However, the majority of individuals of the wild accessions (IP 21640, IP 21752, and IP 22039) were distributed in 3 clusters (C-1, C-2, and C-3), showing their phenotypic similarity.
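The Jaccard coefficient used for cluster-wise stability compares the membership of an original cluster with the clusters found after resampling. The Python sketch below shows only that core comparison; it is not the "clusterboot" implementation, and the function names and plant labels are hypothetical. In a bootstrap scheme of this kind, the best-match value would be computed for every resample and then averaged per original cluster.

```python
def jaccard(cluster_a, cluster_b):
    """Jaccard similarity between two clusters given as collections of plant IDs."""
    a, b = set(cluster_a), set(cluster_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(original_cluster, resampled_clusters):
    """Highest Jaccard similarity between one original cluster and any
    cluster obtained from a resampled data set."""
    return max(jaccard(original_cluster, c) for c in resampled_clusters)


# Hypothetical plant labels for illustration only.
original = ["P1", "P2", "P3", "P4"]
resampled = [["P1", "P2", "P3"], ["P4", "P5"], ["P6"]]
stability = best_match_jaccard(original, resampled)
```

Here the best-matching resampled cluster shares three of the four original members, so the coefficient is 0.75 on a 0 to 1 scale.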

Genotypic Diversity
After filtering, we obtained 45,249 SNPs from a total of 76,753 SNPs in pearl millet, 19,052 SNPs from a total of 38,898 SNPs in sorghum, and 8,211 SNPs from a total of 10,096 SNPs in pigeonpea. The SNPs displayed good coverage across the genome in all three crops (Figure 3). The number of SNPs per chromosome ranged from 909 to 2,988 over the 10 chromosomes of sorghum, from 5,086 to 6,639 over the 7 chromosomes of pearl millet, and from 121 to 755 over the 11 chromosomes of pigeonpea. The number of SNPs on each chromosome of sorghum, pearl millet, and pigeonpea is presented in Supplementary Table 8.

AMOVA
The analysis of molecular variance (Excoffier et al., 1992) was performed with each accession treated as a separate, predefined population. The proportion of molecular variance attributable to within accession variance was low in sorghum (26.3%), higher in pigeonpea (57.0%), and highest in pearl millet (80.2%) (Table 2; Figure 4). Variance among populations was high in sorghum (73.7%), low in pearl millet (19.8%), and intermediate in pigeonpea (43.0%).
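The partition into among and within accession components can be sketched for a single-level design from a matrix of squared pairwise distances. The Python below follows the standard sums-of-squares formulation of AMOVA and is offered as an illustration under stated assumptions, not as the R routine used in the study; the function name and the toy data are hypothetical, and the permutation test is omitted.

```python
def amova_percentages(sq_dist, pops):
    """Percent of variance among and within populations (single-level AMOVA).

    sq_dist: symmetric matrix of squared distances between individuals.
    pops: population (accession) label for each individual.
    """
    n = len(pops)
    groups = {}
    for idx, label in enumerate(pops):
        groups.setdefault(label, []).append(idx)
    n_pops = len(groups)

    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(sq_dist[i][j] for i in range(n) for j in range(i + 1, n)) / n

    # Within-population sum of squares, one term per population.
    ss_within = 0.0
    for members in groups.values():
        k = len(members)
        ss_within += sum(sq_dist[members[a]][members[b]]
                         for a in range(k) for b in range(a + 1, k)) / k

    ss_among = ss_total - ss_within
    ms_among = ss_among / (n_pops - 1)
    ms_within = ss_within / (n - n_pops)

    # Weighted average population size used to scale the among component.
    n0 = (n - sum(len(m) ** 2 for m in groups.values()) / n) / (n_pops - 1)

    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return 100 * var_among / total, 100 * var_within / total


# Toy data: one trait value per individual, two accessions of two plants each.
values = [0, 2, 10, 12]
sq = [[(a - b) ** 2 for b in values] for a in values]
pct_among, pct_within = amova_percentages(sq, ["A", "A", "B", "B"])
```

In the toy data the two accessions are far apart relative to their internal spread, so almost all variance falls among accessions, the pattern reported for sorghum; a crop like pearl millet would show the opposite split.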

Genotypic Diversity: Within and Between Accessions
Modified Rogers Distance (MRD) (Wright, 1978; Goodman and Stuber, 1983) was estimated between pairs of individuals. Pairwise MRD values within each accession were averaged; the overall mean genetic distance within each accession varied from 0.031 (IS 33844) to 0.342 (IS 18833), 0.181 (IP 9824) to 0.300 (IP 22039), and 0.040 (ICP 9150) to 0.393 (ICP 10889) in sorghum, pearl millet, and pigeonpea, respectively (Table 3). Three of the five wild accessions studied in sorghum, namely IS 18833 (0.342), IS 14485 (0.329), and IS 10897 (0.316), showed higher within accession distances than the studied landraces, while the other two wild accessions, IS 11005 (0.119) and IS 22428 (0.127), showed midrange values. In contrast, all the studied wild accessions of pearl millet, IP 21752 (0.273), IP 21640 (0.279), and IP 22039 (0.300), showed higher within accession distances than the mean distances of the pearl millet landraces studied. The average MRD among accessions was higher than the within accession distances. Comparing the three crops, higher scale
FIGURE 3 | SNP densities over chromosomes of sorghum (S-Chr 1-10, the letter "S" represents sorghum), pigeonpea (C-Chr 1-11, the letter "C" represents Cajanus), and pearl millet (PM-Chr 1-7, the letter "PM" represents pearl millet) after filtering for quality parameters. The x-axis represents SNP positions along each chromosome (MB) and the y-axis represents SNP densities over chromosomes.
Heterozygosity among accessions in sorghum varied from 0.019 in IS 31637 to 0.159 in IS 18833. Among five wild accessions studied in sorghum, three accessions had higher heterozygosity (Table 3).
Clustering based on MRD grouped the individuals of all the accessions into different clusters. Cluster-wise stability was assessed using the "clusterboot" function from the R-package "fpc" with 100 bootstraps. About 33 clusters in sorghum, 24 clusters in pigeonpea, and 19 clusters in pearl millet showed Jaccard coefficient values >70, and all the other clusters showed values <70. The Jaccard coefficient values of all the clusters are presented in Supplementary Table 9. Bootstrapped dendrograms (Supplementary Figures 2A-C) with 100 bootstraps were plotted using the "aboot" function provided in the R-package "poppr." The dendrogram was cut at 36 clusters, the number of accessions in the respective crops, on the assumption that the individuals of each accession should aggregate into a singleton cluster. A cluster membership bar-plot was also used to visualize this cluster partition and the migration of plants to different clusters. (Figure 5A), while the landraces IS 29508 and IS 29605 were found to be grouped in a single cluster. All other accessions of sorghum were found to have mixtures. In pigeonpea, ICP 2309, ICP 9124, ICP 7057, ICP 9877, ICP 11480, ICP 14059, ICP 13628, ICP 7035, ICP 9150, ICP 13828, and ICP 15122 showed perfect singleton clustering (Figure 5B), while other accessions showed overlapping of individuals of different accessions, which may be explained by heterogeneity arising from evolutionary gene flow or by the presence of admixtures in the respective accessions. In pearl millet, a completely distinctive and complex distribution of accessions into clusters was observed. The accessions IP 9824, IP 7468, IP 11577, IP 11677, IP 13363, and IP 19434 showed perfect singleton clustering (Figure 5C), while all other accessions were not clustered uniquely into singleton clusters, indicating heterogeneity within landraces and sharing of alleles between accessions.
The wild accessions of pearl millet showed an interesting clustering pattern: the individuals of accession IP 22039 were split between clusters 11 and 12, indicating a variable set of alleles or two subpopulations, and some individuals of accession IP 21752 clustered with individuals of accession IP 21640 in cluster 14, indicating some similar alleles between these two accessions.
TABLE 3 | Mean genotypic within and between accession distances, observed heterozygosity within accessions (heWs) and estimated seed sample sizes using the least DArTSeq-SNP allelic frequency in all the accessions of sorghum, pigeonpea, and pearl millet.

Shannon Diversity
The Shannon diversity was estimated for all 36 accessions in sorghum, pigeonpea, and pearl millet (

Relationship Between Phenotypic and Genotypic Distances
Mantel's correlation between phenotypic and genotypic distance matrices was positive and highly significant for sorghum (r = 0.45, p ≤ 0.01), pearl millet (r = 0.13, p ≤ 0.01), and pigeonpea (r = 0.19, p ≤ 0.01), indicating that the complementary use of molecular and phenotypic tools is a better approach for assessing genetic diversity.

Estimation of Seed Sample Size
Seed sample sizes required for regeneration to capture 95% of the alleles with an expected probability of 95% were estimated for each accession from the allelic frequencies of the DArTSeq-SNPs, using the model proposed by Crossa (1989). The estimated sample sizes are given in Table 3. Seed sample sizes ranged from 47 to 101 for sorghum, 155 to 203 for pearl millet, and 77 to 89 for pigeonpea. The required seed sample size rises steeply once the alternate allele frequency falls below 0.1 (Figure 6), indicating that much larger samples are needed to conserve alleles with frequencies below 0.1. The number of rare allelic variants or markers (frequency less than or equal to 5% within accessions) preserved in the recommended sample size for each accession of the three crops (Supplementary Table 11) ranged from 345 to 3,075 in sorghum, 231 to 878 in pigeonpea, and 3,444 to 6,726 in pearl millet.
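The steep rise at low allele frequencies follows from the biallelic probability model. As a worked sketch (the symbols $p$ for the frequency of the rarer allele and $n$ for the number of gametes sampled are used here for illustration, and the approximation is ours, not taken from the source), the requirement that both alleles be captured with 95% probability is:

```latex
\begin{align*}
1 - (1 - p)^{n} - p^{\,n} &\ge 0.95 \\
\intertext{For small $p$ the term $p^{\,n}$ is negligible, so}
(1 - p)^{n} &\le 0.05
\quad\Longrightarrow\quad
n \ge \frac{\ln 0.05}{\ln (1 - p)} \approx \frac{3}{p} \\
\intertext{which gives, for example,}
p = 0.10 &: \quad n \ge 29 \\
p = 0.05 &: \quad n \ge 59 \\
p = 0.01 &: \quad n \ge 299
\end{align*}
```

Under this approximation the required number of gametes grows roughly in inverse proportion to the rarest allele frequency, so halving the frequency about doubles the sample needed.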

DISCUSSION
Sorghum, pearl millet, and pigeonpea are the important food crops, providing food and income to a large population thriving in the arid and semi-arid tropics. However, in this era of modern agriculture, landraces of these crops are becoming prone to genetic erosion, (Hammer et al., 1996;Shewayrga et al., 2008;Pattanashetti et al., 2016). Most landraces, that were permanently extinct from the farmers' field over the course of agricultural development, are only available in genebanks' collections. As each landrace possess a unique genetic fingerprint of ages of acclimatization to diverse environmental conditions, they are considered as an indispensable source of genetic variations by plant breeders and can address a potential scope in the development of improved varieties with higher productivity, nutrients, and climate resilience, etc. . Thus, conserving landraces with their inherent genetic variability is crucial for ensuring food security in the near future and also for sustainable agriculture. ICRISAT genebank conserves about 42,000 accessions of sorghum, 24,000 accessions of pearl millet, and over 13,000 accessions of pigeonpea, wherein about 86% of sorghum and pearl millet collections and over 60% of pigeonpea collections are landraces. The main focus of genebank curator is to maintain the genetic integrity and diversity within accessions while regeneration. Hence, this study assessed the diversity within landrace accessions by phenotyping and genotyping a large number of plants within each accession and estimated the seed sample size required in order to conserve the inherent diversity.
Enormous variability was observed within and among landraces of sorghum, pigeonpea, and pearl millet. Molecular variance within accessions was observed to be low in sorghum (26.3%), highest in pearl millet (80.2%), while pigeonpea showing an intermediate within accession variance of 57.0%. Our results are in correspondence with previous works, by various authors (Tostain et al., 1987;Tostain and Marchais, 1989;Busso et al., 2000;Bashir et al., 2015) on pearl millet landraces, reported a high intra-population variation of 70-90% and higher observed heterozygosity of 0.77-0.82. However, Bhattacharjee et al. (2002) reported a low 30.89% within accession variability using RFLP markers in pearl millet, also the author addressed this low variability as a contradiction for a cross-pollinated crop like pearl millet and discussed various instances that would have caused this lower variability. In sorghum, Adugna (2014) reported a 54.44% molecular variance due to diversity within landrace populations that were conserved on farms in Ethiopia. No studies investigating landraces diversity within accessions were reported in sorghum and pigeonpea, while few studies are on landraces conserved on-farm that are continually evolving through outcrossing and selections (Djè et al., 1999;D'Andrea and Casey, 2002;Songok et al., 2010;Adugna, 2014;Bashir et al., 2015).
The phenotypic and genotypic within accession distances in pearl millet were shifted toward higher values, blurring the differentiation between within and between accession diversity. The density distributions of within and between accession distances in pearl millet showed this clearly, with the two densities merging in both the phenotypic and genotypic evaluations (Figures 7C,F). Pigeonpea, being often cross-pollinated, also showed overlapping within and among accession distances in both the phenotypic and genotypic evaluations (Figures 7B,E). In contrast, sorghum showed a clear separation of within accession distances from between accession distances in both the phenotypic and genotypic assessments (Figures 7A,D). The higher values and merging of between and within accession distances in pearl millet and pigeonpea reflect high phenotypic and genotypic heterogeneity within accessions, whereas the clear separation of the two densities in sorghum reflects the higher uniformity and homogeneity within its accessions.
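The comparison of within and between accession distances can be illustrated with a minimal sketch. It assumes a symmetric pairwise distance matrix among individuals and an accession label for each individual; the function name `within_between_means` and the toy values are hypothetical and are not taken from the study, which compared full density distributions and not only means.

```python
from itertools import combinations


def within_between_means(dist, labels):
    """Return (mean within-accession distance, mean between-accession distance).

    dist   : square, symmetric matrix (list of lists) of pairwise distances
    labels : accession label of each individual, in the same order as dist
    """
    within, between = [], []
    # Visit each unordered pair of individuals exactly once.
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(dist[i][j])
        else:
            between.append(dist[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Toy data (hypothetical): two accessions with two individuals each.
toy_labels = ["A", "A", "B", "B"]
toy_dist = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.4, 0.5],
    [0.5, 0.4, 0.0, 0.2],
    [0.6, 0.5, 0.2, 0.0],
]
w, b = within_between_means(toy_dist, toy_labels)
print(f"within = {w:.3f}, between = {b:.3f}")
```

In this toy case the within accession mean is well below the between accession mean, the pattern described for sorghum; in a pearl millet-like case the two values would be close to each other.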
Population structure analyses indicated that most of the accessions in sorghum were uniform enough for individuals of a single accession to cluster together as singleton clusters. In sorghum, accessions IS 29508 and IS 29605 were clustered into a single population in both DAPC and ward.D2 clustering with genotypic distances, indicating high similarity and common alleles in these two accessions. The accession IS 33844 showed high uniformity with a low within accession diversity (0.031), and a selection from this landrace has been released as a variety in India as 'Parbhani moti' (Upadhyaya and Vetriventhan, 2018). Pearl millet and pigeonpea showed higher heterogeneity within accessions, and most of their accessions showed mixed populations; only some accessions showed singleton clustering. The high population mixture in these crops corresponds to their pollination behavior and the sharing of alleles between populations. Landraces generally differ between populations depending on the intensity of selection imposed by farmers, their pollination mechanisms, the level of gene flow within and between populations, and the level of exchange of seed materials between farmers. Several authors have previously reported pollen flow between populations and the mixing of landrace populations in sorghum and pigeonpea (Songok et al., 2010; Kassa et al., 2012; Adugna, 2014; Westengen et al., 2014). Harlan (1965) reported gene flow from weeds to landraces, and several other authors (Ellstrand et al., 1999; Jarvis and Hodgkin, 1999; Messeguer, 2003; Gompert and Buerkle, 2016) reported the transfer of genes into landraces from various sources in both selfing and outcrossing species. Some studies also reported the mixing of populations through a considerable exchange of seeds within cultures or regions (Louette, 1997). The level of heterogeneity and diversity in landraces is crop-specific and associated with the mode of fertilization (Villa et al., 2005), and several authors (Hammer et al., 1996; Zeven, 1998) have described the influence of the mode of pollination on various population genetic factors over the course of landrace evolution. Hence, consistent with the effect of the mode of reproduction on diversity, the high degree of outcrossing in pearl millet (about 85%) (Burton, 1983) could produce higher diversity than in a low outcrossing crop such as sorghum (about 18%) (Barnaud et al., 2008) or an intermediate outcrossing crop such as pigeonpea (about 30%) (Saxena et al., 1990), although outcrossing rates vary with species.
FIGURE 7 | Comparison graph of densities of within and between accession distances. (A-C) denote the phenotypic distance distributions for sorghum, pigeonpea, and pearl millet, respectively, and (D-F) denote the genotypic within and between accession distance distributions estimated from DArTSeq-SNP data of sorghum, pigeonpea, and pearl millet, respectively.
Most of the accessions that showed relatively higher within accession genetic distances in sorghum and pearl millet were wild accessions. Thus, including wild accessions in this study allowed a comparative assessment with landraces and aided a better understanding of the effects of domestication and the different evolutionary forces that shaped the landraces. Historically, farmers conserved landraces on-farm and multiplied desirable phenotypes, which survived both natural and artificial selection. This farmers' selection led to local adaptation and variation within the landrace populations (Zeven, 1998). Teshome et al. (2016) studied the maintenance of landrace diversity in sorghum by farmers from different regions of Ethiopia and reported that farmers select narrowly for specific economic traits. Thus, the wild accessions, which have not undergone farmers' selection and are evolving under natural selection, are expected to be highly diverse.
Comparing the diversity of the three crops in our study, heterozygosity (Figure 8A) and phenotypic (Figure 8B) and genotypic (Figure 8C) within accession diversity were notably low for most sorghum accessions, intermediate for most pigeonpea accessions, and consistently near the maximal values for pearl millet. Similar to the molecular within accession distances, Shannon diversity estimates were highest for pearl millet, intermediate for pigeonpea, and lowest for sorghum (Figure 8D). However, in sorghum and pigeonpea, both highly diverse and highly uniform accessions, with maximal and minimal estimates of genotypic distance and Shannon diversity, were observed. Higher diversity estimates indicate higher variability within accessions. In pigeonpea, most accessions had molecular within accession distances <0.250, except two accessions, ICP 10880 (0.348) and ICP 10889 (0.393). Further investigation of the individual plant distances within these accessions showed that some individuals were divergent from all other individuals of the respective accession. In ICP 10880, two individuals were highly divergent from all other individuals, with mean distances of 0.410 and 0.434, and these individuals also clustered separately in hierarchical clustering. The same was true for ICP 10889, where some individuals were highly divergent from the others. In sorghum, most accessions had molecular within accession distances <0.250, except three wild accessions, IS 10897 (0.316), IS 14485 (0.329), and IS 18833 (0.342), and one landrace, IS 27325 (0.310). In the landrace IS 27325, the individuals were divided into three subgroups in hierarchical clustering.
Thus, the higher diversity in some landraces of sorghum and pigeonpea may be due to their pollination behavior, which ultimately influences population substructure. Lower outcrossing in these crops increases the probability that alleles become fixed within a few individuals, restricting some alleles to a small group within a landrace population and thus gradually, over generations, forming distinct subpopulations. These groups of individuals are not phenotypically variable enough to be considered separate populations, but they have acquired genetically distinct fingerprints over the course of evolution. Similar cases of extremely low and high diversity have been encountered previously. Zeven (1998) emphasized the low diversity and increased homozygosity in inbreeding accessions and explained how farmers' selection and sampling strategies reduce diversity in landrace populations. Adugna (2014) and Westengen et al. (2014) found both high and low within-population diversity in sorghum landraces cultivated in Ethiopia and reasoned that the low within landrace diversity could be due to farmers' sampling during migration, as farmers tend to carry few heads during migration and settlement.
Based on the level of diversity within each accession of the different crops, an appropriate conservation and regeneration strategy should be followed to conserve the genetic integrity and diversity of landraces. The ICRISAT genebank follows various pollination control and sampling strategies to maintain the genetic integrity and diversity within accessions while regenerating different crops. Theoretically, selfing is a good strategy to maintain genetic integrity and diversity in self-pollinated and often cross-pollinated crops (outcrossing >5%), because the effect of inbreeding depression is low and alleles are preserved within the population. In cross-pollinated species like pearl millet, sib mating is the best strategy to mimic random mating, and for this purpose the ICRISAT genebank performs cluster bagging (bagging a few panicles of different individuals of the same accession together), which reduces the effect of inbreeding depression. However, in both cases, an appropriate population size needs to be ensured during regeneration to capture rare alleles. Small sample sizes during the regeneration of landraces may lead to genetic drift, which results in the loss of some rare alleles. Crossa (1989), based on results from simulated populations, reported a practical system for maize regeneration, in which the ideal regeneration system involves equalizing the genetic contribution of parents and avoiding small population sizes. Crossa (1995) also suggested a practical seed sample size of 130-200 in monoecious crops for retaining the rare alleles at most loci. FAO standards specify a sample size of 30 individuals in a completely random mating population and 60 individuals for completely selfing species to capture 95% of the alleles that have a frequency >0.05 (FAO, 2014).
However, in sorghum, pigeonpea, and pearl millet, no detailed molecular studies using NGS tools had previously been done to determine the optimum population size required for regeneration. Therefore, following Crossa (1989), we estimated the minimum sample size needed to capture 95% of the SNP alleles spread throughout the genome with an expected probability of 95%, based on the frequency of the rarest allele in each accession. In our study, the required seed sample sizes were lower for sorghum (47-101) and pigeonpea (77-89) and higher for pearl millet (155-203). The sample size required to conserve the genetic integrity of germplasm depends largely on the frequency of the least common alleles or genotypes.
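The dependence of sample size on the rarest allele frequency can be illustrated with a simplified sketch. It assumes the standard single-allele retention model, in which an allele of frequency p is present at least once among g independently sampled gametes with probability 1 - (1 - p)^g; it is not the exact procedure of Crossa (1989) used in this study, and the function names are hypothetical. Random mating plants are assumed to contribute two independent gametes each, and fully selfed (homozygous) plants effectively one.

```python
import math


def min_gametes(p_rare, prob=0.95):
    """Smallest number of gametes g with 1 - (1 - p_rare)**g >= prob."""
    if not (0.0 < p_rare < 1.0 and 0.0 < prob < 1.0):
        raise ValueError("p_rare and prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - p_rare))


def min_plants(p_rare, prob=0.95, selfing=False):
    """Minimum number of plants under the simplified model.

    Assumption: a random mating diploid plant carries two independent
    gametes; a completely selfed, homozygous plant carries effectively one.
    """
    gametes = min_gametes(p_rare, prob)
    return gametes if selfing else math.ceil(gametes / 2)


# Allele frequency 0.05 retained with probability 0.95.
print(min_plants(0.05))                # random mating
print(min_plants(0.05, selfing=True))  # complete selfing
```

For an allele frequency of 0.05 and a probability of 0.95, this model gives 30 plants under random mating and 59 under complete selfing, close to the FAO figures of 30 and 60 cited above. Rarer alleles increase the requirement sharply, which is why the estimate is driven by the least frequent allele in an accession.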
In conclusion, sorghum, pigeonpea, and pearl millet accessions showed high within and among accession diversity, indicating that the regeneration strategies at the ICRISAT genebank are appropriate to ensure the genetic integrity of each accession. Information from this study will support genebank curators in understanding within accession variability and assist in devising scientific sampling strategies (sample size) for regeneration to maintain genetic integrity and variability. It could also help breeders, as users of the germplasm, to understand the population dynamics and subpopulation structure and to advance the material with appropriate breeding techniques.

AUTHOR CONTRIBUTIONS
VCRA, MV, SD, and AR contributed to conception and design of the study. VCRA, MV, VA, RS, VK, PS, and SR conducted field experiments and data collection. This work is part of VA's thesis research. SG supported student research as chairman. VA and MV performed the statistical analysis and wrote the first draft of the manuscript. VCRA reviewed and approved the first draft. All authors contributed to manuscript revision, read, and approved the submitted version.

FUNDING
We acknowledge funding support for this study from the Global Crop Diversity Trust (GCDT) and the Genebank Platform.