The Prehistoric Indian Ayurvedic Rice Shashtika Is an Extant Early Domesticate With a Distinct Selection History

Fully domesticated rice is considered to have emerged in India at approximately 2000 B.C., although its origin in India remains a contentious issue. The fast-growing 60-days rice strain described in the Vedic literature (1900–500 B.C.) and termed Shashtika (Sanskrit) or Njavara (Dravidian etymology) in Ayurveda texts including the seminal texts Charaka Samhita and Sushruta Samhita (circa 660–1000 B.C.) is a reliable extant strain among the numerous strains described in the Ayurveda literature. We here report the results of the phylogenetic analysis of Njavara accessions in relation to the cultivars belonging to the known ancestral sub-groups indica, japonica, aromatic, and aus in rice gene pool and the populations of the progenitor species Oryza rufipogon using genetic and gene genealogical methods. Based on neutral microsatellite markers, Njavara produced a major clade, which comprised of minor clades corresponding to the genotypic classes reported in Njavara germplasm, and was distinct from that were produced by the ancestral sub-groups. Further we performed a phylogenetic analysis using the combined sequence of 19 unlinked EST-based sequence tagged site (STS) loci with proven potential in inferring rice phylogeny. In the phylogenetic tree also the Njavara genotypic classes were clearly separated from the ancestral sub-groups. For most loci the genealogical analysis produced a high frequency central haplotype shared among most of the rice samples analyzed in the study including Njavara and a set of O. rufipogon accessions. The haplotypes sharing pattern with the progenitor O. rufipogon suggests a Central India–Southeast Asia origin for Njavara. Results signify that Njavara is genetically distinct in relation to the known ancestral sub-groups in rice. Further, from the phylogenetic features together with the reported morphological characteristics, it is likely that Njavara is an extant early domesticate in Indian rice gene pool, preserved in pure form over millennia by the traditional prudence in on-farm selection using 60-days maturity, because of its medicinal applications.


INTRODUCTION
Among the numerous rice (Oryza sativa L.) strains described in the seminal Sanskrit compendia of Ayurveda medicine Charaka Samhita (Van Loon, 2003) and Sushruta Samhita (Bhishagratna, 1907) (circa 660-1000 B.C.) (Ninivaggi, 2010), Shashtika (or Sastika), which is grown in Kerala and known locally by the Dravidian name Njavara (Burrow and Emeneau, 1961;Singh, 1992;Sreejayan et al., 2011), is a reliable extant strain in the present day rice gene pool. The Yajur Veda and Atharva Veda (circa 1900-500 B.C.) (Ninivaggi, 2010) were the first Sanskrit texts to describe a fast-growing rice that matures at 60 days, although the authors did not use the term Shashtika (Nene, 2005). While, the people of Kerala have ritualistically preserved Njavara over centuries, for its grains are indispensable in the preparation of several acclaimed Ayurveda medicines (Singh, 1992;Sreejayan et al., 2011), the other rice strains described in Ayurveda texts are either extinct or we are unable to recognize them in the present day gene pool (Singh, 1992). Njavara grains are used in several Ayurveda treatments since time immemorial (Singh, 1992;Sreejayan et al., 2011). The hall mark of Njavara is its short crop duration and it matures in 60 days (Bhishagratna, 1907;Van Loon, 2003;Sreejayan et al., 2011). Farmers traditionally identify and select Njavara on-farm based on its 60-days maturity (Sreejayan et al., 2011).
Even after decades of research, the debate continues about when, where, and how many times rice was domesticated in the past. Proponents of both single origin (Molina et al., 2011;Huang X. et al., 2012) and the multiple origin (Londo et al., 2006;Civán et al., 2016;Wang et al., 2018) models for rice domestication agree that the Indo-Gangetic plains are a major center for rice domestication. But they differ in their opinion with respect to the way by which rice was evolved in this region. As per the single origin model, fully domesticated rice was evolved in the Gangetic plains from the pre-domesticated or wild rice populations that existed here following the introgression of alleles from the fully domesticated japonica, which was dispersed towards the South Asia from Yangtze River basin in china (Molina et al., 2011;He et al., 2011;Huang X. et al., 2012;Yang et al., 2012). According to the multiple origin models, the fully domesticated rice was evolved in the Gangetic plains by the primary domestication from the local populations of the progenitor species O. rufipogon (Londo et al., 2006;Civán et al., 2016). The archeobotanical data support the occurrence of rice in the Gangetic plains as early as circa 6000 B.C. (Saxena et al., 2006;Fuller et al., 2010;Gross and Zhao, 2014). However, it is not certain whether the Oryza bulliform phytoliths obtained from archeological sites in the Gangetic plains are from the cultigens or the wild rice populations (Fuller et al., 2010;Gross and Zhao, 2014). Bulliform phytolith morphometrics indicate that the development of fully domesticated rice occurred in northern India only in circa 2000 B.C. following a contact-induced hybridization between ancestral indica and fully domesticated japonica (Fuller et al., 2010). Recent studies have increasingly shown that genetic structure of rice is more complex than thought before and emphasize the importance of local gene pool in fine tuning the evolutionary trajectories of rice (Travis et al., 2015;Kim et al., 2016;Roy et al., 2016;Wang et al., 2018).
The Ayurveda texts Charaka Samhita and Sushruta Samhita represent a compilation of oral tradition of Ayurveda knowledge existed in the ancient northern India (Ninivaggi, 2010). Presumably, medicinal applications of Njavara grains must be well known when these texts as well as Yajur Veda and Atharva Veda were redacted. The supposed time period of the origin of these Sanskrit literatures (circa 1900-500 B.C.) (Ninivaggi, 2010) indicates a possible overlap with the suggested time period of the emergence of the fully domesticated rice in India (circa 2000 B.C.) (Fuller et al., 2010). Although, the archeology-genetic studies have added new insight into the domestication profile of rice (Gross and Zhao, 2014), only little effort has been so far made to employ linguistic-genetic approach to address rice domestication issues. The Njavara germplasm has not been so far subjected to a phylogenetics analysis to infer its oldness as depicted in the pre-historic Sanskrit literatures. Such phylogenetics treatment on Njavara as well as other rice strains may also provide us vital insight into the domestication profile of rice in India.
Previously, we identified six distinct genotypes in Njavara germplasm under four morphotypes (Sreejayan et al., 2011). However, the genetic relationships of Njavara to the canonical sub-groups indica, aus, aromatic, and japonica (tropical and temperate) identified in global rice gene pool (Garris et al., 2005) are not yet known, although previously we reported that Njavara is morphologically and genetically distinct from syntopic traditional cultivars grown in Kerala (Varghese et al., 2014). Another important question about Njavara is its possible center of origin. Because, although extensively mentioned in Charaka Samhita (Van Loon, 2003) and Sushruta Samhita (Bhishagratna, 1907) compiled in northern India (Ninivaggi, 2010), Njavara is currently grown and utilized for Ayurveda treatments only in Kerala State, which is located at the southern most region of India.
In this study, we asked three questions: (1) What is the position of Njavara in rice gene pool, particularly with respect to the canonical sub-groups identified in global rice gene pool (Garris et al., 2005)? (2) Do phylogenetics analyses provide a clue regarding the oldness of Njavara as depicted in prehistoric Sanskrit literatures? and (3) Where is the possible center of origin of Njavara. We analyzed 213 Njavara individuals, and 376 rice cultivars and 47 accessions of O. rufipogon sampled from different rice growing Southeast Asian countries using microsatellite markers, gene genealogy, and chloroplast genotyping. The data were rigorously analyzed and examined the genetic and phylogenetic interrelationships between Njavara and other sample sets.

Plant Materials
In this study, we analyzed 636 samples in total, and they included 589 accessions of Asian cultivated rice O. sativa L. and 47 accessions of the progenitor O. rufipogon Griff. The sample details are as follows: (1) O. sativa cv. Njavara: We began this study with 100 collections of Njavara, including the 80 collections reported earlier (Sreejayan et al., 2011;Varghese et al., 2014). The geographic origin of the collections is shown in Figure 1A. Sreejayan et al. (2011) assessed the intracollection genetic heterogeneity in 28 Njavara collections and identified 42 genetically distinct individuals. In this study, we first performed the intra-collection heterogeneity assessment of the remaining 72 collections as described in the Supplementary  Table 1a and then identified 171 genetically distinct individuals from 70 collections. Two collections were duplicates and thus were discarded. Thus, the Njavara germplasm used in this study for genetic analysis consisted of 213 genetically distinct individuals including the 42 previously identified individuals (Sreejayan et al., 2011)  Together, the isozyme (Glaszmann, 1987) and microsatellite analyses (Garris et al., 2005) classified the global rice gene pool into five ancestral sub-groups: indica, aus, aromatic, and japonica (tropical and temperate). We obtained 44 well-classified rice cultivars belonging to the different sub-groups (Glaszmann, 1987;Garris et al., 2005)

Microsatellite Analysis
Genomic DNA was extracted from a single plant using a GenElute Plant Genomic DNA Miniprep Kit (Sigma Genosys, Bangaluru, India) following the manufacturer's instructions. We selected 340 microsatellite markers covering all the 12 chromosomes of rice (Supplementary Table 1b) from the public database Gramene 1 , and the oligonucleotide primers were custom synthesized by Sigma (Sigma Genosys, Bangaluru, India). We performed an initial evaluation of these loci as described in the Supplementary Table 1b. We selected 70 loci that yielded discrete, unambiguous, and polymorphic alleles (Supplementary Table 4a) and genotyped 589 rice samples (Supplementary Tables 2, 3a-c). The loci were amplified according to the methods of Schuelke (2000). In each PCR assay, this method uses three primers: a sequence specific forward primer with universal M13 sequence (TGTAAAACGACGGCCAGT) attached to its 5 end, a sequence specific reverse primer and the universal fluorescently labeled M13 primer. For assaying microsatellite loci, PCR was carried out in a 10-µl reaction volume containing 15-ng template DNA, 0.5 unit Taq DNA polymerase (Ampli Taq Gold, Applied Biosystems, Foster City, CA, United States), 1X PCR buffer containing 1.5 mM MgCl 2 , 2 µM each of dNTPs, 0.4 µM of M13 tailed sequence specific forward primer, 1.2 µM of FAM labeled M13 primer, and 1.6 µM of the sequence specific reverse primer. Cycling was performed in a PCR machine (Mastercycler R Gradient, Eppendorf, Germany) under the following conditions: initial denaturation at 94 • C for 5 min followed by 30 cycles of 30 s at 94 • C; 45 s at 56 • C, 45 s at 72 • C, followed by 8 cycles of 30 s at 94 • C, 45 s at 53 • C, 45 s at 72 • C, and a final extension at 72 • C for 10 min. Microsatellite alleles were detected following capillary electrophoresis of PCR products on an ABI Prism 3730 genetic analyzer (Applied Biosystems, Foster City, CA, United States). For capillary electrophoresis, the samples were prepared by mixing 1 µl of PCR product with 10 µl of Hi-Di formamide (Applied Biosystems, Foster City, CA, United States) and 0.1 µl of size standard [GENESCAN R 400 HD (ROX) size standard, Applied Biosystems, Foster City, CA, United States]. The mixture was denatured at 94 • C for 5 min and snap cooled on ice before subjecting to capillary electrophoresis. Allele sizing was performed by using the software GENESCAN 3.1.2 (Applied Biosystems, Foster City, CA, United States) using the "Local Southern Method" and default analysis settings by comparison with the internal size standard ROX 400. Allele calling and sorting was performed using the software GENEMAPPER V. 4 (Applied Biosystems, Foster City, CA, United States) and binning was performed manually.
The genetic diversity parameters, such as the number of alleles per locus, major allele frequency, gene diversity, heterozygosity, and polymorphism information content (PIC), were determined from the allelic data using the software PowerMarker Version 3.25 (Liu and Muse, 2005). We examined the population genetic structure of the rice samples using two methods: the genetic distance-based unweighted pair-group method with arithmetic average (UPGMA) clustering using PowerMarker (Liu and Muse, 2005) and model-based Bayesian clustering as implemented in the software STRUCTURE Version 2.3.4 (Pritchard et al., 2000). We performed the cluster analysis using the C. S. Chord genetic distance, and the resulting tree was viewed using the program TreeView Version 1.6.6. The tree was evaluated by bootstrapping on 1000 replications. The STRUCTURE analysis was run 10 times for each K (the number of clusters) from 1-20 using a model that allows for the admixture and correlated allele frequencies. The optimal K-value in the data set was determined according to the method of Evanno et al. (2005) as previously described (Varghese et al., 2014). STRUCTURE probabilistically generate ancestry coefficients totaled 1 for each genotype. Following the threshold used by Garris et al. (2005), we assigned a genotype with an ancestry coefficient > 0.8 into the corresponding population and treated a genotype as admixed if the coefficient was < 0.8. To determine the level of between-population molecular differentiation (pair-wise F ST ), we performed an AMOVA using the software Arlequin Version 3.0 (Excoffier et al., 2005).

Chloroplast Genotyping
We determined the chloroplast genotype by assaying the chloroplast locus ORF100 (Kanno et al., 1993) Table 3d). Following an initial standardization of different primer combinations, we finally chose a primer combination consisting of the forward primer (5 CAACCCACCCCATAAAATTG 3 ) reported by Garris et al. (2005) and the reverse primer (5 CAGCCGAGGTCGTGGTAAATC 3 ) used by Chuayjaeng and Volkaert (2006) for genotyping the ORF100 loci. PCR was carried out in a 10-µl reaction volume which contained 2X GoTaq colorless master mix (Promega, Madison, United States), 10 ng genomic DNA and 1 µM each of forward and reverse primers. DNA amplification was performed in Veritti R 96-Well Thermal Cycler (Applied Biosystems, Foster City, CA, United States). Cycling conditions were 95 • C for 2 min followed by 40 cycles of 30 s at 95 • C, 40 s at 50 • C, 40 s at 72 • C, and concluding extension at 72 • C for 5 min. The PCR assay produced a single amplicon at the ORF locus in the size range of 405-477 bp across different accessions used in the study. Initially, we sequenced the amplicon in a set of cultivars and confirmed the 69-bp deletion (deletion type chloroplast) in the smaller-sized amplicons and no such deletion (no-deletion type chloroplast) in the larger amplicons (Kanno et al., 1993;Garris et al., 2005;Chuayjaeng and Volkaert, 2006). Thereafter, we fractionated the PCR products on 1.5% agarose gels and directly scored the deletion or no-deletion alleles via a visual observation of ethidium bromide gels on a gel documentation system.

Phylogenetic and Gene Genealogy Analysis
We randomly selected 19 unlinked expression sequence tag (EST)-based sequence tagged site (STS) loci (Supplementary  Table 4b) from the pool of loci used previously for studying the phylogenetic relationship between rice and O. rufipogon. A single primer pair can retrieve sequence from a STS locus in both rice and O. rufipogon. These STS loci follow neutral expectations and are appropriate for a phylogenetic study (Caicedo et al., 2007;Huang P.U. et al., 2012). We sequenced the chosen loci in 186 samples, including 43 Njavara individuals, 12 Njavara-like individuals (Supplementary  Table 3d). We custom-synthesized oligonucleotide primers for PCR amplification of the STS loci with Sigma (Sigma Genosys, Bangaluru, India) (Supplementary Table 4b). PCR was carried out in a 10-µl reaction volume which contained 1X PCR buffer (150 mM Tris-HCl, pH-8; 500 mM KCl), 0.2 mM each dNTPs (dATP, dGTP, dCTP, and dTTP), 2.5 mM MgCl 2 , 20 ng genomic DNA, 1 unit of AmpliTaq Gold DNA polymerase enzyme (Applied Biosystems, Foster City, CA, United States), 0.1 mg/ml BSA and 4% DMSO, and 5 pM each of forward and reverse primers. DNA amplification was performed in a Mastercycler R Gradient Thermal Cycler (Eppendorf, Germany). Cycling conditions were 94 • C for 5 min followed by 40 cycles of 94 • C (30 s), 55 • C (30 s-1 min), 72 • C (30 s-1 min), and concluding extension at 72 • C (5 min). The PCR products were cleaned using ExoSAP-IT TM (USB Corporation, United States) and subjected to bidirectional sequencing using a BigDye R Terminator v3.1 Cycle sequencing kit (Applied Biosystems, Foster City, CA, United States), which used the primers employed for amplification. The cleaned sequenced PCR products were run on an ABI Prism 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, United States). The sequence quality was assessed using the software Sequence Scanner v1 (Applied Biosystems, Foster City, CA, United States). Sequence editing and alignment were conducted using the software Geneious R.6 (Kearse et al., 2012). We subjected the sequences to homology searches at the National Center for Biotechnology Information (NCBI) database using the Basic Local Alignment Search Tool (BLAST) algorithm to confirm the authenticity of the obtained sequence.
We estimated the level of nucleotide diversity (Watterson's θ and π) and the number of haplotypes (h) and haplotype diversity (Hd) for each locus across the 186 samples using the software DnaSP Version 5.1 (Librado and Rozas, 2009) which was also used to perform the test of deviation from neutral expectations separately for rice and O. rufipogon samples based on Tajima's D, Fu and Li's D, and Fu and Li's F * statistics. We reconstructed the phylogenetic relationships between the 186 samples based on the concatenated sequence of the 19 STS loci using the software PAUP * Version 4.0 (Swofford, 2002) following neighbor-joining method. Bootstrapping was performed (500 replicates) to assess the robustness of the clustering in the trees.
The genealogical network of haplotypes from genes can better resolve reticulations, such as hybridizations and introgressions, observed in the evolutionary trajectories of a taxon and interconnect the descendant genes with the persistent ancestors (Posada and Crandall, 2001). We constructed a statistical parsimony network for the STS haplotypes using the software TCS Version 1.21 (Clement et al., 2000). The statistical parsimony algorithm estimates the number of differences in single nucleotide substitutions between haplotypes and then generates a pair-wise distance matrix between haplotypes based on the probability of parsimony for pair-wise differences until the probability exceeds 0.95. The phylogenetic relationship between the haplotypes is depicted in the form of an acyclic graph (network).
NCBI accession numbers of DNA sequences generated in this study are provided in the Supplementary Tables.

Njavara Is Distinct From the Canonical Sub-Groups Known in Rice
First, we examined the genetic interrelationship between Njavara and other rice cultivars. We analyzed 589 rice individuals, including 213 genetically distinct individuals chosen from 100 Njavara collections (Figure 1A and Supplementary Tables 2, 3a-c). In addition to the canonical ancestral sub-groups (Garris et al., 2005), a UPGMA cluster analysis of 589 rice individuals ( Figure 1B Table 2). The pattern of population sub-divisions produced by the Bayesian algorithm STRUCTURE in a set of increasing K-value is given in Figure 1C. Initially, all the Njavara individuals together produced a single cluster at K = 2. Three clusters were discernible within Njavara germplasm at K = 8. Overall population subdivisions produced by the STRUCTURE were corresponded well with the clustering patters in UPGMA dendrogram at K = 15 (Figures 1B,C, Supplementary Figure 1 and Supplementary Table 2). Of the six genotypic classes identified earlier in the Njavara germplasm (Sreejayan et al., 2011;Varghese et al., 2014), five classes, namely, Short black I (SB I), Short black III (SB III), and Intermediate yellow (IY), produced distinct groups, and the Short yellow (SY), and Long yellow (LY) genotypes produced distinct sub-groups under one major node (hereafter referred to as the LY/SY group) in the UPGMA dendrogram and the STRUCTURE bar plot (Figures 1B,C, Supplementary  Figure 1 and Supplementary Table 2). The pair-wise Fstvalues between Njavara and the canonical sub-groups differed significantly (p < 0.001) (Supplementary Table 5b). In addition, the STRUCTURE analysis classified Njavara individuals into respective genotypic classes with a high probability (< 90%) (Supplementary Table 6), suggesting the genetic purity of Njavara individuals. Together, these results illustrate that the Njavara represents a clade distinct from the known ancestral sub-groups (Garris et al., 2005) in the rice gene pool.
A total of 43 individuals from the Njavara collections showed high genetic affinity to different canonical ancestral sub-groups ( Supplementary Figure 1 and Supplementary Tables 2, 6) and will hereafter be referred to as "Njavara like" individuals. Short black II, the sixth genotypic class identified in the previous study (Sreejayan et al., 2011), was identified as aus in the present study ( Supplementary Figure 1 and Supplementary Table 6). Hence, we included this class also under Njavara-like individuals.

Njavara Is Likely an Extant Early Domesticate
Next, we examined the phylogeographic evolutionary history of Njavara following a multiple gene genealogical approach. For this purpose, we retrieved sequences of a set of 19 unlinked ESTbased STS loci (Supplementary Table 4b) from 186 samples, including 139 rice individuals and 47 accessions of O. rufipogon (Supplementary Tables 2, 3a,c,d). The test of neutrality using different parameters (Tajima's D, Fu and Li's D, and Fu and Li's F * statistics) showed that the STS loci used in the study follow neutral expectations (Supplementary Table 7). We combined the sequence of 19 STS loci (Supplementary Table 7) and generated a neighbor-joining phylogenetic tree (Figure 2). In the phylogenetic tree, the individuals belonging to the Njavara genotypic classes SB I, SB III, and LY/SY were separated into distinct sub-clades and entered into three separate major clades consisting of aus cultivars, whereas the individuals belonging to the IY genotype produced another distinct sub-clade and entered into another major clade consisting of japonica and aromatic cultivars (Figure 2).
Genealogical analyses were conducted for each STS locus separately. The number of haplotypes detected at a locus in the 186 samples varied between five in STS 087 and 31 in STS 085 (Supplementary Table 7). For most loci, the genealogical analysis produced a central haplotype shared among most of the cultivars belonging to the ancestral sub-groups, Njavara accessions and a set of O. rufipogon populations originating mostly from Central India to Southeast Asian countries and a few peripheral ones separated by few mutational steps (Figure 3 and Supplementary  Tables 8, 9). The high frequency central haplotypes are most likely ancestral ones and are probably fixed in the rice during the domestication processes. Genetic analysis using domestication related genes shows introgression of domestication alleles from fully domesticated japonica into other ancestral groups during their domestication processes (Molina et al., 2011;He et al., 2011;Huang X. et al., 2012;Yang et al., 2012). The possible introgression of alleles from japonica into rice accessions used in the analysis could not be determined as we did not include the domestication alleles in the analysis. A close genetic relationship between the haplotypes phased from O. rufipogon and the other rice cultivars were evident in the network (Figure 3). Gene flow from self-pollinating rice to out crossing O. rufipogon is wide spread in natural habitats (Song et al., 2006;Kim et al., 2016) and this provides reticulate relationships between O. rufipogon and cultivated rice in phylogenetic and genealogical analysis (Londo et al., 2006;Huang X. et al., 2012).
Thus, the genetic and gene genealogical treatments performed in the present study together with the results of the morphological analysis reported previously (Varghese et al., 2014) reveal two characteristic features of Njavara. First, a cohesiveness is observed between the four genotypic classes of Njavara ( Figure 1B and Supplementary Figure 1) at the genetic and morphological levels despite their distinct positions in the phylogenetic tree (Figure 2). Second, Njavara retains a distinct and smaller morphological architecture (Varghese et al., 2014). Both of these genetic characteristics must represent a corollary of an on-farm stabilizing selection traditionally performed by farmers on Njavara using 60-days maturity as the selection criterion (Sreejayan et al., 2011;Varghese et al., 2014). Because, the stabilizing selection leads to morphological convergence by preventing the selected trait from differentiating across environments (Latta, 1998) and deepens the genetic differentiation between the selected and non-selected populations by enhancing the divergence in loci experiencing balancing selection and linked neutral loci (Charlesworth et al., 1997). The previous study has reported significantly smaller (p < 0.05) characteristics with respect to the yield-related traits, including grain dimensions in Njavara than the syntopic traditional cultivars (Varghese et al., 2014). The heading date and several yield-related traits are co-regulated by pleiotropic quantitative trait loci (QTL) in rice (Xue et al., 2008;Wei et al., 2010;Zhang et al., 2010;Yan et al., 2011;Cai et al., 2012;Varghese et al., 2014). Therefore, in addition to the heading date, the stabilizing selection performed by farmers of Njavara using 60-days maturity has concomitantly preserved the allele complements that govern the morphological architecture also since ancient times, resulting in the retention of the small plant architecture that is presumably a characteristics of early domesticates (Fuller et al., 2010;Castillo et al., 2016). Thus, Njavara provides an example of how an "unconscious selection" (Zohary, 2004) performed by farmers preserved the genetic purity and morphological characteristics of a cultivar over millennia. However, on what basis people in the ancient days have specifically chosen short duration cultivars emanating from different phylogenetically distinct groups for medicinal applications is not understood from the present data. A comparative global metabolome profiling between Njavara and other cultivars may provide evidence to such queries.
Intriguingly, the Njavara individuals invariably had a nodeletion chloroplast genotype (Figure 2 and Supplementary Table 2). Most of the aus cultivars entered into the major clade that consisted of the SB I and SB III genotypes were also of the no-deletion type (Figure 2 and Supplementary Tables 3a,c). The Njavara genotypes produced many haplotypes shared only with aus (Supplementary Tables 8, 9). Recent research has increasingly shown that aus is genetically complex (Travis et al., 2015) and its origin is distinct (Kim et al., 2016;Wang et al., 2016). Results of the present study together with the results of Civán et al. (2016), who found that aromatic may be a product of hybridization between japonica and aus, prompt us to infer that Njavara is closest to aus and the no-deletion cultivars related to aus, such as Njavara, constitute one of the earliest domesticates of rice in India.

Geographic Affinity of Njavara Lies With Central India-Southeast Asia Region
The genealogical analysis suggests that the geographic center of origin of Njavara encompasses Central India to Southeast Asia because its genetic affinity mostly lies with the O. rufipogon Short black III (SB III) produced distinct sub-clades and are labeled in the tree. The ancestral sub-groups aus, indica, aromatic, and japonica, and the Njavara like and O. rufipogon were also formed distinct clades and are color coded. The no-deletion chloroplast is indicated by a closed circle at the end of the node, and the deletion type chloroplast by an open circle. Bootstrap values > 50% is given at the respective nodes. Sample name is given at the end of each node. The individual name and number, as in Supplementary Table 2, is used to name the Njavara individuals. The cultivar name and IRGC number followed by the RGCB collection serial number, as in Supplementary Table 3, is used for naming traditional cultivar and O. rufipogon, respectively. R or Or before the RGCB collection serial number indicates rice or O. rufipogon, respectively.
populations from this region (Supplementary Tables 8, 9). This finding raises a pertinent question: how did a rice strain domesticated in Central India-Southeast Asian region reach Kerala and become preserved in this state? In ancient India, the Gangetic plains (Figure 1), which encompassed Magadha (modern Bihar state in India), the seat of the Iron Age Maurya dynasty (ca. 323-185 B.C.), were an epicenter of rice cultivation (Fuller et al., 2010) and Buddhist doctrines (Zysk, 1998). The development and spread of Buddhism, Ayurveda, and rice cultivation are entwined (Zysk, 1998;Shaw et al., 2007;Ninivaggi, 2010). Mauryan Emperor Asoka (ca. 269-23 B.C.), who spearheaded the spread of Buddhism and Ayurveda (Zysk, 1998;Ninivaggi, 2010) proclaimed that Buddhist ascetics required the importation and planting of medicinal herbs wherever these herbs were not found (Zysk, 1998). Rice was introduced to South India during the spread of Buddhism (Shaw et al., 2007). Buddhism flourished in Kerala between the third century B.C. and the eighth century A.D. (Menon, 2007). Ayurveda also flourished in these periods (Menon, 2007) because Ayurveda medicine and healing were integral parts of Buddhist monasticism (Zysk, 1998). The people of Kerala have been ritualistically preserving Njavara (Sreejayan et al., 2011) which may have reached this state concomitant with Ayurveda. Njavara grain-based treatments, such as Shashtikapindasweda (Sanskrit) or Njavarakizhi (Malayalam) and Shashtikaannalepa (Sanskrit) or Njavaratheppu (Malayalam), that belong to a group of treatments called Keraliya Panca Karma (Kerala specialty of Panca Karma treatments) and developed historically in Kerala by Ashtavaidyas are continued to be the acclaimed therapies for various illnesses even today (Singh, 1992;Sreejayan et al., 2011).

CONCLUSION
Our study circumscribed Njavara gene pool in rice genetic resources, and demonstrates the value of linguistic-genetic approaches in the domestication research of rice. Njavara is distinct from the known canonical ancestral subgroups in rice (Garris et al., 2005) and is originated in the central India-Southeast Asia region. It is likely a genetically pure extant early domesticate still being cultivated in India. Also, Njavara provide a rare example of traditional prudence in on-farm selection that preserved a cultivar in pure form over millennia because of its applications in healthcare. Njavara may constitute a good organism to investigate the genomic changes associated with domestication and on-farm evolutionary trajectories in rice. The circumscription of the Njavara gene pool achieved in this study may give an impetus to the discovery of genes for the biofortification of rice because, consistent with the traditional claims (Bhishagratna, 1907;Van Loon, 2003), empirical research has recently reported medicinally active and nutritionally rich molecules in Njavara grains (Deepa et al., 2008;Jung et al., 2014).

AUTHOR CONTRIBUTIONS
GT designed and supervised this study. MJ, RDR, and RM performed the microsatellite and STS genotyping. MRV and JB conducted the statistical analysis. GV, RY, ONS, BCP, JCR, and SLK collected the rice germplasm.

FUNDING
This research was supported by the intra-mural grant of the Rajiv Gandhi Center for Biotechnology.