Transcriptome Analysis Reveals Key Pathways and Candidate Genes Controlling Seed Development and Size in Ricebean (Vigna umbellata)

Ricebean (Vigna umbellata) is a lesser known pulse with well-recognized potential. Recently, it has emerged as a legume with endowed nutritional potential because of high concentration of quality protein and other vital nutrients in its seeds. However, the genes and pathways involved in regulating seed development and size are not understood in this crop. In our study, we analyzed the transcriptome of two genotypes with contrasting grain size (IC426787: large seeded and IC552985: small seeded) at two different time points, namely, 5 and 10 days post-anthesis (DPA). The bold seeded genotype across the time points (B5_B10) revealed 6,928 differentially expressed genes (DEGs), whereas the small seeded genotype across the time point (S5_S10) contributed to 14,544 DEGs. We have also identified several candidate genes for seed development–related traits like seed size and 100-seed weight. On the basis of similarity search and domain analysis, some candidate genes (PHO1, cytokinin dehydrogenase, A-type cytokinin, and ARR response negative regulator) related to 100-seed weight and seed size showed downregulation in the small seeded genotype. The MapMan and KEGG analysis confirmed that auxin and cytokinin pathways varied in both the contrasting genotypes and can therefore be the regulators of the seed size and other seed development–related traits in ricebeans. A total of 51 genes encoding SCF TIR1/AFB , Aux/IAA, ARFs, E3 ubiquitin transferase enzyme, and 26S proteasome showing distinct expression dynamics in bold and small genotypes were also identified. We have also validated randomly selected SSR markers in eight accessions of the Vigna species (V. umbellata: 6; Vigna radiata: 1; and Vigna mungo: 1). Cross-species transferability pattern of ricebean–derived SSR markers was higher in V. radiata (73.08%) than V. mungo (50%). To the best of our knowledge, this is the first transcriptomic study conducted in this crop to understand the molecular basis of any trait. It would provide us a comprehensive understanding of the complex transcriptome dynamics during the seed development and gene regulatory mechanism of the seed size determination in ricebeans.


INTRODUCTION
A rapid increase in the human population, which is expected to reach 9.7 billion by 2050, is one of the biggest challenges of this world (Gu et al., 2021). To ensure food and nutritional security to the ever-growing human population, it is extremely important to bring underutilized and neglected crops into mainstream agriculture. Owing to its short growth duration and ability to thrive well in stress conditions and various soil types, ricebean (Vigna umbellata) is one such crop which has huge potential to sustain food and nutritional security in most parts of the world (Pattanayak et al., 2019). It is a diploid (2n 2× 22), warmseason annual legume with a genome size of approximately 440 Mb (Kaul et al., 2019). Ricebean is mainly cultivated in Nepal, Bhutan, Northeast India up to Myanmar, Southern China, Northern Thailand, Laos, Vietnam, Indonesia, and East Timor (Tian et al., 2013), where it constitutes an important source of protein for the sizable population and contributes to household food and nutritional security. The observed protein content in ricebean is 25.57% with high concentration of various essential amino acids. Besides protein, ricebean grains also contain a significant amount of other nutrients such as carbohydrates, fiber, minerals, vitamins, and fatty acids. Moreover, ricebean is a rich source of unsaturated fatty acids like linoleic and linolenic acids (Katoch, 2013).
The aforementioned transcriptome-based gene expression analysis has provided a robust functional genomics resource for deciphering gene networks and candidate genes regulating various biological processes in crop plants. For minor crops with poorly characterized genomes, like ricebean, such detailed transcriptome analysis will provide comprehensive information about expression patterns of genes and molecular mechanisms governing traits of economic importance. This valuable information can further be employed for the development of functional markers for gene and QTL mapping. Therefore, in the present study, we conducted transcriptome analyses to investigate gene expression networks and identify the candidate genes controlling seed size variation in ricebean. RNA sequencing of two contrasting ricebean genotypes was performed at early development stages (i.e., 5 and 10 DPA). The study provides detailed insights into various gene networks and their potential roles in determining seed size. Furthermore, the study also identified simple sequence repeat (SSR) motifs that could be used for molecular mapping of seed size/weight and other related traits.

Plant Material and Growth Conditions
Seeds of two contrasting ricebean genotypes, namely, IC426787 (bold seeded) and IC552985 (small seeded) were obtained from ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi (Figure 1). On the basis of the 2-year trial (2018 and 2019), the average 100-seed weight of IC426787 and IC552985 was 13.20 and 3.87 gm, respectively. Plants were grown in a nethouse at ICAR-NBPGR, New Delhi (latitude: 28°38′56″N, longitude: 77°9′8″E, altitude: 228 mean sea level (msl)), during Kharif (rainy) season 2020. During pod filling, the minimum FIGURE 1 | Two contrasting genotypes of ricebean, that is, IC426787 (bold size) and IC552985 (small size), selected for the transcriptome analysis on the basis of their seed size. temperature ranged from 10.8 to 23°C, maximum temperature ranged from 30.4 to 36°C, and average RH% varied from 53 to 56. The ricebean pod filling duration varied from 20 to 30 DPA depending upon the genotype. Genotypes with smaller grain size took comparatively less pod filling time than the genotypes having larger grain size. Three biological replicates of pod samples were harvested from three full-grown plants of both genotypes at 5 and 10 DPA each. The seeds were separated and immediately frozen in liquid nitrogen and stored at −80°C for future use. A total of 12 samples were prepared for the construction of RNA libraries.

RNA Extraction, Library Preparation, and Sequencing
The Pure Link RNA Mini Kit (Ambion, United States) was used to extract RNA from the frozen samples. The total RNA quality was checked using the RNA 6000 Nano Kit (Agilent Technologies, United States) on a 2100 Bioanalyzer (Agilent Technologies, United States), with a minimum RNA integrity number (RIN) of 7. RNA concentrations were determined with a NanoDrop ND-8000 spectrophotometer (Nano-Drop Technologies, Thermo scientific, Wilmington, DE). RNA-Seq libraries for all samples were prepared using the NEBNext UltraII RNA library preparation kit for Illumina; Cat no: E7770 (New England Biolabs), according to manufacturers recommended protocol, and sequencing was done in a single HiSeq 4000 lane using 150 bp paired-end chemistry. Briefly, total RNA was used to purify poly (A) messenger RNA (mRNA) using oligo-dT labeled magnetic beads. Then, the isolated mRNA was fragmented into 200 to 500 bp pieces in the presence of divalent cations at 94°C for 5 min using an ultrasonicator. The cleaved RNA fragments were copied into first-strand cDNA using SuperScript-II reverse transcriptase (Life Technologies, Inc.) and random primers. After second-strand cDNA synthesis, fragments were end-repaired and A-tailed, and indexed adapters were ligated. The products were purified and enriched with PCR to create the final cDNA library. The tagged cDNA libraries were pooled in equal ratios and used for 2 × 150 bp paired-end sequencing on a single lane of the Illumina HiSeq 4000. Illumina clusters were generated and loaded onto the Illumina Flow Cell on the Illumina HiSeq 4000 instrument, and sequencing was carried out using 2 × 150 bp paired-end chemistry. After sequencing, the samples were demultiplexed, and the indexed adapter sequences were trimmed using CASAVA v1.8.2 software (Illumina Inc.).

Read Quality and Adapter Removal
Raw reads of ricebean were evaluated for their quality using FASTQC v0.11.8 package (http://www.bioinformatics.bbsrc.ac. uk/projects/fastqc/). Four parameters were considered: base quality score distribution, sequence quality score distribution, average base content per read, and GC distribution in the reads. Trimmomatic v0.36 was applied to remove the adapter and trim the low-quality reads (trimming includes reads with or without ambiguous sequence "N") using default parameters. To correct the random sequencing errors in Illumina RNA-Seq reads, Rcorrector v1.0.3 was used. Clean reads were also checked for their quality using FASTQC only.

RNA-Seq De Novo Assembly and Transcriptome Assessment
The obtained clean reads of all 12 samples were assembled using Trinity v2.4.0 with the paired-end model and default K-mer value of 25. The de novo assembly was merged and clustered using CDHIT v4.0 to get non-redundant sequences. Furthermore, these non-redundant sequences were made transcripts using the trinity in-built script. The clean reads of each sample were mapped back to the de novo assembled genome through BWA-mem software with default parameters. The BAM files were handled by samtools. The number of reads mapped to genes was calculated using samtools v0.1.19. The expression difference of each transcript between different samples was calculated using DESeq2 R package. False discovery rate (FDR) values less than 0.01 and |log2 (fold change)| ≥2 were considered significant differences at the expression level. The transcript abundance was normalized by the fragments per kilobase of transcript per million mapped reads (FPKM) value.

Gene Functional Analysis
To annotate the assembled transcripts, sequences were aligned by BLASTX (e-value <1e −5 ) to protein databases, including the nonredundant protein (NR) database, Swiss-Prot, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. A GO enrichment analysis was conducted for the transcripts according to biological process, cellular component, and molecular function ontologies using Blast2GO software (Liu et al., 2013;Calzadilla et al., 2016). The GO annotation functional classifications were determined using WEGO software for the distribution of gene functions (Ye et al., 2006). GO functional enrichment and KEGG pathway enrichment analysis were also tested at a significance cutoff of p-value. All the p-values were adjusted with the criterion of Bonferroni correction. We selected the corrected p-value of 0.05 as the threshold to determine significant enrichment terms of the gene sets. The MapMan analysis was also conducted to provide a graphical overview of the metabolic and regulatory pathways for the detected genes using the MapMan tool, and the mapping file of ricebean for all the samples was generated using the Mercator tool.

Candidate Gene Identification and Their Domain Analysis
The candidate genes for seed development-related traits were identified on the basis of similarity (BLASTX with similarity >80% and e-value <0.001) with genes responsible for similar traits in other species, including Arabidopsis, Phaseolus vulgaris, and Vigna species (V. radiata and V. angularis). Furthermore, the candidate genes were also validated in silico on the basis of their domain analysis. The amino acid sequences of the identified candidate genes were predicted and compared against the Pfam protein database using HMMER 3.0 (e-value ≤ 1e −10 ) to obtain candidate gene domain/family annotation information. A heatmap was also generated for the candidate genes on the basis of their expression in both the genotypes at different times of development. The heatmap was made using an in-house R script.

Simple Sequence Repeats Identification and Primer Design
The MIcroSAtellite (MISA) search engine (http://pgrc.ipkgatersleben.de/misa) was employed for the identification of SSRs. The minimum numbers of repeats used for selecting the SSRs were ten for mononucleotide-based loci, six for dinucleotide loci, five for trinucleotide loci, and three for all larger repeat types (tetra-to hexanucleotide motifs). For validation, 50 SSR motifs were randomly selected, that is, 25 for dinucleotide and trinucleotide each. The primers for these selected SSR motifs were designed based on flanking sequences using Primer3 software (http://sourceforge.net/projects/primer3) with targeted size of PCR products in the range between 100 and 300 bp, primer length between 18 and 22 bp, GC content between 40 and 70, and melting temperature of 50-60°C.

Simple Sequence Repeats Validation
DNA was isolated from young leaves of eight accessions of Vigna species including V. umbellata (6), V. mungo (1), and V. radiata (1) by following the protocol described in the DNeasy plant mini kit (QIAGEN, Hilden, Germany). DNA concentration was measured using a NanoDrop ™ 2000 spectrophotometer (Thermo, United States), and DNA quality was analyzed using 0.8% agarose gel. A working stock of DNA was (10 ng/µl) prepared with nuclease-free water for polymerase chain reaction (PCR) for SSR amplification.
For the SSR amplification, 20 µl reaction mixture containing 4 µl genomic DNA (40 ng), 10 µl Taq Polymerase 2X Master Mix (United States), 0.8 µl primers (10 pM), and 5.2 µl nuclease-free water were used. For the amplification, the following thermal conditions were carried out: initial denaturation of 94°C for 3 min, then 35 cycles of 94°C for 30 s, primer annealing at 55°C for 45 s, extension at 72°C for 1 min, and final extension for 10 min. PCR products were separated using high-resolution metaphor agarose gel (3%) electrophoresis. Furthermore, the dendrogram of the genotypes was generated using the hierarchical clustering algorithm in DARwin v6.0.21 software (https://darwin.cirad.fr/).

Transcriptome Sequencing and De Novo Assembly
To obtain a comprehensive transcriptome profile of ricebean 12 RNA libraries were sequenced, and a total of 94.35 Gb raw data were generated. For these 12 samples, approximately, 98.50-99.80% of reads passed the quality control, and 98.60-99.60% of the clean reads were mapped back to the de novo assembled ricebean genome. On average, raw data of the seed transcriptome at 5 DPA and 10 DPA had 50.33 and 48.66% GC content, respectively, while after trimming, the GC content of clean data at 5 DPA and 10 DPA was 48.66 and 49.33%, respectively, which is similar to the GC content reported in the previous study of ricebean (Chen et al., 2016; Table 1).
The obtained clean reads of all 12 samples were assembled using Trinity (v2.4.0) with default parameters. The assembled transcriptome consists of 218,486 super transcripts with an N50 value of 1,041. The number of transcripts generated in the current study is comparable to previous studies. In terms of N50, the ricebean had a higher N50 value than field pea (781) and chickpea (441) (Pradhan et al., 2014;Sudheesh et al., 2015) and less value than mungbean, common bean, and adzuki bean (Hiz et al., 2014;Chen et al., 2015b;Chen et al., 2016). These results indicate the good quality of ricebean transcriptome.
The lengths of the transcripts ranged from 201 to 15,828 bp, with an average length of 669 bp, which is less than other Vigna species like cowpea (871 bp) and mungbean (874 bp) but more than that of black gram (443 bp) (Chen et al., 2015bSouframanien and Reddy, 2015). Of these transcripts, 146,622 (67.11%) were 201-500 bp; 39,620 (18.13%) were 501-1,000 bp; 12,654 (5.79%) were 1,001-1,500 bp; 6,511 (2.98%) were 1,501-2,000 bp; 3,986 (1.82%) were 2,001-2,500 bp; 2,567 (1.17%) were 2,501-3,000 bp; and 6,526 (2.99%) were more than 3,000 bp in length ( Figure 2). The developed assembly showed ∼100% back mapping of total and important reads, and this shows that our assembly had vast and proper mapping quality for the generated reads. The high percentage of reads mapping back to the de novo assembled transcriptome is a quality metric that provides an assessment of assembly entirety (Hornett and Wheat, 2012).

Differential Expression Analysis
In this study, a comprehensive transcriptome analysis has been performed with the aim to reveal those gene expression changes that, independently of the genotype diversity, are involved in controlling seed size in ricebean. Comparative transcriptome analysis was performed between two genotypes with contrasting seed size at two time points, namely, 5 and 10 DPA. A similar type of study using two genotypes with a contrasting seed size has also been done in the peanut (Li et al., 2019b). The expression profile was checked for the individual genotypes across the time points (B5_B10, S5_S10) as well as between the genotypes at each time point (B5_S5, B10_S10) (Supplementary Table S1). False discovery rate (FDR) values less than 0.01 and |log2 (fold change) |≥2 were considered significant differences at the expression level.
While evaluating the expression difference individually for the bold genotype across the time point (B5_B10), 6,928 differentially expressed genes were identified. In B5_B10, the number of upregulated genes (6,284) were higher than downregulated genes (644), suggesting that these upregulated DEGs might be responsible for the increase in seed size. Similarly, a small genotype across the time point (S5_S10) contributed to 14,544 DEGs (Figure3A; Table 2). In contrast to B5_B10 expression results, S5_S10 had a high number of downregulated genes (7,862) in comparison with the upregulated genes (6,682), indicating that these downregulated genes might be repressing any transcriptional activity or downstream pathways resulting in the small size of ricebean seeds (Li et al., 2019b). To gain a better understanding of molecular processes/regulatory networks associated with the seed size in ricebeans, the pattern of differentially expressed genes was analyzed between genotypes in each time point and across the time point using a Venn diagram ( Figure 3B).
We also identified common genes between the individual genotype across time points (i.e., B5_B10 and S5_S10) as well as between the genotypes at each time point (B5_S5 and B10_S10). In case of B5_B10 and S5_S10, in total, 2091 DEGs were common. On the other hand, 850 DEGs were common between B5_S5 and B10_S10 (Supplementary Table S2). The comparative gene expression analysis indicated that a relatively large amount of the transcriptional program operating during seed development or maturation is shared between both the genotypes. The same results have been observed in the case of common bean, where 2,487 DEGs were shared by two contrasting genotypes (González et al., 2021).

Gene Ontology Analysis of the Transcriptome
To infer the biological processes and the functions related to seed development stages, gene ontology analysis was conducted for differentially expressed genes in terms of their biological involvement, target cellular component, and molecular Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 12 | Article 791355 5 function using Blast2GO. Out of total 33,880 DEGs, 16,002 DEGs contributed to GO terms. In core GO annotation, 7,002 (25.37%) genes annotated for biological process (BP), 12,069 (43.72%) for molecular function (MF), and 8,533 (30.91%) for cellular components (CC). The highest number of GO terms were observed in the case of S5_S10 (44.10%), followed by B5_B10 (23.89%), B10_S10 (16.91%), and B5_S5 (15.09%).
In case of the bold genotype across the time point (B5_B10), out of 6,928 DEGs, only 3,764 were annotated, constituting 1,696, 2,046, and 2,864 GO terms for BP, CC, and MF, respectively. However, in S5_S10, we observed 7,158 annotated DEGs from 14,544 DEGs and 2,954, 3,909, and 5,312 GO terms for BP, CC, and MF, respectively. On the other hand, in the case of between the genotype at the first time point (B5_S5), 2,537 DEGs were found to be annotated as compared B10_S10, in which 7,158 DEGs were annotated. In case of BP, 1,075 and 2,954 GO terms were identified in B5_S5 and S5_S10. Similarly, 2,046 and 3,909 GO terms were found for the cellular component function in B5_S5 and S5_S10, respectively, whereas in the case of molecular function, B5_S5 and S5_S10 consisted of 1,875 and 5,312 GO terms, respectively (Supplementary Table S3).
We have also illustrated the top or enriched functions in terms of BP, MF, and CC for both the genotypes. For example, the top biological activities include "cellular process," "nitrogen compound metabolic process," "small molecule metabolic process," "cellular component organization," "regulation of metabolic process," "response to stress," "cell wall organization," cellular response to stimulus," and developmental process. All these results indicated the biological process of DEGs vary over a broad range of terms. These enriched GO terms for BP indicate that hormone and environment stimuli played a vital role in ricebean seed/pod development. A similar type of results was also found in peanut pod development (Zhu et al., 2014).
Similarly, in the case of MF, bold and small genotypes were identified to be involved in "binding," "metabolic processes," "organic cyclic compound binding," "heterocyclic compound binding," "ion binding," "transferase activity," and "biosynthetic processes." However, on the other hand, cellular component activities include "catalytic activity," "membrane," "membrane part," "intrinsic component of the brain," and "intracellular" and cellular activities" (Figure 4). Similar results for MF and CC were observed in the pod development of peanuts (Zhu et al., 2014).

MapMan Analysis
For comprehensive assessment of gene expression network dynamics in a developing seed of bold and small genotypes, identified DEGs were mapped onto metabolic maps using the MapMan tool and categorized into BINS on the basis of their functional groups. We could observe various functional groups of genes activated at different stages of seed development.  Interestingly, we noticed a major variation between the bold and small genotypes with respect to genes related to important functions like those involved in different aspects of metabolism and signaling or regulation. A detailed analysis of genes expressed in these categories that actually distinguish the two genotypes was considered relevant, and a major emphasis was therefore given to the BINS in which the genotypes were found to be involved. This analysis allowed exploration of the global activation of specific metabolic pathways and gene regulatory networks activated during ricebean seed development.
For the whole ricebean transcriptome, we annotated 13,759 transcripts with MapMan BINS of known function after running the Mercator web tool. In total, these transcripts were classified into 29 BINS. The transcripts were expressed mainly in the following categories: carbohydrate metabolism (major and minor CHO metabolism), amino acid turnover, photosynthesis, secondary metabolism, and cell wall organization (Supplementary Table  S4). In the former categories, most of the transcripts were highly expressed in B5_B10, while downregulated in the case of S5_S10 ( Figure 5A). B5_B10 and S5_S10 shared 17 pathways, but only two pathways were found in B5_B10 such as RNA processing and polyamine metabolism, indicating that these two pathways triggered after 5 DPA. Similarly, while comparing expressed transcripts between the genotype at the same time points (i.e., B5_S5 and B10_S10), 18 categories were the same, except the polyamine metabolism which was detected only at the second time point, that is, 10 DPA, which also confirms our previous result that polyamines activate only in the case of bold genotype after 5 DPA of seed development ( Figure 6A).
In our study, photosynthesis-related genes were highly enriched in bold genotypes in comparison with the small genotype which is in similarity with the previously published reports (Zhu et al., 2014;Clevenger et al., 2016;Sinha et al., 2020). The main role of photosynthesis in seed development is reported to increase the internal oxygen content and to control biosynthetic fluxes by improving the energy supply (Borisjuk et al., 2004), and it can also affect the metabolism in a number of distinct ways (Ruuska et al., 2004). Our results indicate that many metabolic genes are most active during ricebean seed filling, which aligns with previous studies on M. truncatula and P. sativa where approximately half of the seed-regulated genes were assigned to metabolic pathways (Benedito et al., 2008;Liu et al., 2015).
Furthermore, some DEGs are also mapped to hormone metabolic pathways. Majority of the genes associated with  Table S5; Figure 5B), whereas in case of B5_S5 and B10_S10, mixed expression of phytohormones was observed ( Figure 6B); a complex regulatory network triggers the initiation of seed development, maturation, and accumulation of storage products. Several studies suggested the vital role of phytohormones in pod and seed development (Zhu et al., 2014;Huang et al., 2017;Wan et al., 2017;Kumar et al., 2019;Sinha et al., 2020). In 2017, a study demonstrated the role of phytohormones in various aspects of plant hormone homeostasis including biosynthesis, metabolism, receptor, and signal transduction (Xu and Huang, 2017).

Kyoto Encyclopedia of Genes and Genomes Pathway Analysis
The KEGG pathway enrichment analysis was conducted for two contrasting genotypes at both time points (i.e., 5 DPA and 10 DPA) at a p-value <0.05 using the KEGG database server. The KEGG pathway enrichment analysis indicated that 7,178 transcripts obtained hits in the KEGG database, and those transcripts were associated with 106 unique pathways. The 7,178 transcripts included 3,112, 434, 2,103, and 1,529 transcripts with respect to B5_B10, B5_S5, B10_S10, and S5_S10, respectively. The pathway enrichment analysis of DEG conducted between different combinations, B5_B10, B5_S5, B10_S10, and S5_S10, revealed involvement the of 7, 52, 458, and 35 pathways, respectively. In case of B5_B10 and S5_S10, from the top 10 pathways, four pathways, namely, biosynthesis of secondary metabolites, protein processing in the endoplasmic reticulum, plant-pathogen interaction, and starch and sucrose metabolism were common. On the other hand, between the genotypes at both the time points (i.e., B5_S5 and B10_S10), only one pathway i.e., metabolic pathway-was common. The top 10 pathways among the time points for both genotypes as well as between the genotypes at both the time points are represented in Figure 7.
In KEGG pathway-based analysis, we observed a clear difference in the expression of some phytohormones which regulates seed development, including auxin, cytokinin, gibberellin, and ethylene. The differential expression of these Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 12 | Article 791355 8 phytohormones was also observed in our MapMan analysis. This was not surprising since phytohormones control or influence all aspects of plant growth and reproduction, including seed germination, growth of roots, stems and leaves, plant flowering, seed development, seed fill, and seed dormancy. The expression pattern of key genes involved in biosynthesis and signaling of important phytohormones was compared between small and bold seeded genotypes for their possible role in determining seed size.

Auxin Pathway
Auxin regulates many aspects of plant growth and development, including embryogenesis (Möller and Weijers, 2009), the architecture of the root system (Benková et al., 2003), gravitropism (Rashotte et al., 2003), phototropism (Blakeslee et al., 2004), initiation and radial positioning of plant lateral organs, and cell elongation (Reinhardt et al., 2000;Christian et al., 2006). Auxin is sensed by its receptor protein such as TRANSPORT INHIBITOR RESPONSE 1/AUXIN-SIGNALING F-BOX proteins (TIR1/AFBs) which mediate the auxin signaling pathway and centered on a ubiquitin-dependent Skp1-Cullin-F-box (SCF) TIR1/AFBs protein complex to regulate the Aux/IAAs-ARFs flow (Leyser, 2003; Figure 8A). The TIR receptor protein confers substrate specificity and target-specific Aux/IAA proteins for degradation via the SCF TIR1/AFBs protein complex, in the presence of auxin. The degradation of Aux/IAA leads to switching on transcriptional expression of a range of genes including auxin responsive factors (ARFs) which in turn regulate the expression of several other genes that have a role in auxin-mediated plant growth and development.
The KEGG pathway expression-based analysis revealed a clear difference in the auxin signaling pathway in two contrasting ricebean genotypes, which is also in accordance with our MapMan results where auxin signaling related genes showed higher expression in the bold genotype than the small genotype. We found approximately 51 genes encoding SCF TIR1/AFB , Aux/IAA, ARFs, E3 ubiquitin transferase enzyme, and 26S proteasome, showing distinct expression dynamics in bold (B5_B10) and small (S5_S10) genotypes (Supplementary Table S6). The three key signaling elements TIR1/AFBs, Aux/IAAs, and ARFs have also been identified in different species including Arabidopsis (Chapman and Estelle, 2009), populus (Kalluri et al., 2007), and rice (Jain et al., 2005;Parry et al., 2009;Shen et al., 2010). Similarly, several studies focused on the role of AUX/IAA in determining the seed size with the influence of the expression of a gene in AUX biosynthesis (ZmTar3, ZmTar1, and ZmYuc1) and signaling (auxin efflux carriers, PIN, and ARF2) (Schruff et al., 2006;Bernardi et al., 2016). Homologs of ZmYuc1, PIN, and ARF2 were significantly differentially expressed during tartary buckwheat seed development (Huang et al., 2017). The high expression of DEGs in bold genotypes corresponds to cell division, and expansion is faster to form larger size seeds at these stages. The upregulation of SCF TIR1 , E3 ubiquitin transferase enzyme, and 26S proteasome was found in the bold genotypes, indicating the degradation of Aux/IAA and release of ARFs to modulate the expression of their target genes including SMALL AUXIN UP RNA (SAUR), Gretchen Hagen 3 (GH3), and indole-3-acetic acid-inducible gene (Aux/IAA), while in case of small genotypes, SCF TIR1 was not expressed, but TOPLESS (TPL) gene was upregulated, suggesting that Aux/IAA might have formed the complex with ARFs to block the transcriptional activity ( Figure 8A; Hayashi, 2012).The induction of auxininducible acyl amidosynthetases, GH3, by the ARF family is the early event of auxin signaling cascade (Zhang et al., 2016). The expression of GH3 gene was upregulated in the case of bold genotypes, while it was downregulated in the small genotypes. SAUR expression was upregulated in small genotypes, while downregulated in bold genotypes. The aforementioned results clearly inferred that the differential regulation of the auxin signaling pathway in bold and small genotypes might be the main factor contributing to the variation in ricebean seed size (Bai et al., 2019).
Cytokinins have been reported to function in seed development, such as seed size, seed yield, embryonic growth, with the involvement of genes encoding isopentenyl transferase (IPT), cytokinin oxidase/dehydrogenase (CKX), and histidine kinase (HK) (Bartrina et al., 2011). In our study, we have also found the expression of genes such as IPT, CKX, and HK. IPT upregulation was observed only in the case of small genotypes, whereas CKX was upregulated in bold genotypes, and mixed expression of HK was noticed in both the genotypes ( Figure 8B; Supplementary Table S7). The upregulation of CKX in bold seed genotypes hints at its possible role in determining the seed size. The CKX proteins are widely distributed in plants and implicated in various plant growth and developmental processes by maintaining the endogenous cytokinin level via irreversible degradation. In plant tissues, the expression of the CKX genes is primarily regulated by the endogenous cytokinin level. Various past studies have shown the role of CKX genes in the regulation of the seed size and grain yield in different plant species. In Arabidopsis, a CKX family gene-encoded enzyme CYTOKININ OXIDASE 2 (CKX 2) has been demonstrated to be associated with large seed size via catalyzing irreversible degradation of cytokinin. Similarly, in rice, a Gn1a locus encoding for cytokinin oxidase/dehydrogenase (OsCKX2) is shown to be responsible for high grain yield (Ashikari et al., 2005). On the other hand, the expression of type-A Arabidopsis response regulator (type-A ARRs) genes that negatively regulates the cytokinin signaling was majorly detected in small genotypes. This suggested that type-A ARR genes may be repressing the cytokinin signaling pathway (Heyl and Schmülling, 2003;Lohar et al., 2004;Desbrosses and Stougaard, 2011). The inhibition of the cytokinin signaling pathway may contribute to plant and bacterial cell differentiation (Bromley et al., 2014). Mixed expression of AHP and type-B ARRs was found in both the genotypes. Phosphate transfer to type-B ARR proteins modulates the transcriptional changes in the nucleus and causes the expression of primary cytokinin response genes including the type-B ARRs.

Ethylene Pathway
Ethylene, an "aging" hormone, has been reported to control the development of plant seeds and grains in various species (Zhong et al., 2002;Hentrich et al., 2013;Huang et al., 2013;Guo et al., 2016).Molecular evidence demonstrated ethylene's role in the regulation of seed size and seed shape, in which genes in ethylene biosynthesis (EIN2, ERS1, and ETR1), signaling (CTR1, ETO1, ETR1, and EIN2), and catabolism (ACC deaminase) were involved (Robert et al., 2008;Walton et al., 2012). According to our results, the expression of ethylene receptors (ERS1/2) was higher in bold genotypes than small genotypes, whereas CTR1, a negative regulator of ethylene hormone showed contrasting expression with upregulation in small and downregulation in bold genotypes ( Figure 8C; Supplementary Table S8). In buckwheat, the differential expression of ERS1, ETO1, ETR1, etc. was observed (Huang et al., 2017). In case of bold genotypes, we have noticed the high expression of SIMKK, MPK6, EIN3-like transcription factors, and EIN2, indicating positive regulation of transcriptional response in the bold genotype. In case of small genotypes, the upregulation of ERFs depicted that ERF might have shown activity after the phosphorylation via the MPK3/6-cascade, which regulates the ethylene biosynthesis, and the expression of EIN3/EIL1 was not found which possibly indicates its degradation by ubiquitination. In our samples, we found a full cascade of gene expressions in bold genotypes, while in small genotypes, the expression of genes detour from the normal expression and opted a new route for the ethylene-inducible gene expression. Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 12 | Article 791355

Gibberellin Pathway
Gibberellins (GAs) are well-known plant hormones that are widely involved in the growth and development processes. GAs, auxin, ABA, and ethylene have been involved in the regulation of seed development and pod maturation (Ziv and Kahana, 1988;Shlamovitz et al., 1995;Ozga et al., 2003). In case of bold genotypes, the expression of GA, DELLAs, and SCF-complex protein is upregulated, which indicates DELLA proteolysis; simultaneously, the upregulation of protein indeterminate domains (IDDs) and scarecrow-like proteins (SCLs) were also observed, which supports the feedback loop mechanism which regulate the GA signaling ( Figure 8D). According to the feedback loop mechanism, DELLA initiates the expression of downstream genes, including SCLs by IDD-mediated interaction with their promoters. The subsequent increased concentration of SCLs enhances the SCL3/IDD complex synthesis while decreasing the formation of the DELLA/IDD complex and consequent suppression of the expression of SCLs, which mediates the homeostatic regulation of the downstream genes, including positive regulation of SCLs and GA signaling. In case of small genotypes, the expression of SCF complex protein was absent, while the expression of DELLAs was unregulated. Consecutively, we observed the SCL protein script, while IDD protein was completely absent. The expression of the phytochrome-interacting factor (PIF) protein was found, which indicates the DELLA-mediated inhibition of hypocotyls elongation (Supplementary Table S9).
Previous studies have revealed that genes encoding GA2 oxidase and GA3 oxidase in the GA biosynthesis pathway can affect seed development, starch biosynthesis, embryo, and seed coat development (Nakayama et al., 2002;Singh et al., 2002). The downregulation of GA2-oxidase was observed in our results similar to a study of the tartary buckwheat in which the downregulation of GA2-oxidase was also depicted during seed development (Huang et al., 2017).
The KEGG pathway and the MapMan analysis suggested the differential expression of phytohormone biosynthesis or response genes. According to the MapMan analysis, auxin, cytokinin, ethylene, and gibberellin showed contrasting expressions in both the genotypes ( Figure 5B). Similarly, in terms of the KEGG pathway, we have observed how the signaling pathways of these phytohormones were different. The present work confirms that auxin, cytokinin, ethylene, and gibberellin are the important regulators of the seed size in ricebean. Our results are also in accordance with those of previous studies in other species (Riefler et al., 2006;Heyl et al., 2012).

Candidate Gene Identification for Seed Development-Related Traits
The expression of a number of genes starting from the anthesis to early stages of maturity may have a crucial role in determining grain size and various other pod-related traits in pulses (Pazhamala et al., 2016). In this study, candidate genes for various traits such as days of flowering, pod shattering, seed per pod, seed size, 100-seed weight, and pod length were identified from the assembled transcriptome on the basis of sequence similarity search. In total, we identified 142 genes in ricebean belonging to development-related traits on the basis of similarity search (BLASTX) and e-value. Furthermore, the candidate genes were also characterized in silico on the basis of their domain analysis using Pfam software. Out of 142 genes, only 120 genes showed domain similarity with their hits. Therefore, we discarded 22 genes whose domain was not matched. Hence, according to our study, we found 120 candidate genes of ricebean belonging to different development-related traits (days of flowering, pod shattering, seed per pod, seed size, 100-seed weight, and seed length) (Figure 9; Supplementary Table S10).
In terms of pod development, seed size is a key determinant for the seed or grain yield in legume crops (Amkul et al., 2020). In ricebean, we found four candidate genes for seed size encoding: histidine kinase 2, delta sterol reductase, phosphate transporter (PHO1), and WRKY domain-containing protein (WRKY 40). These genes have already been reported to be involved in seed size. For example, Vigun05g039600 (PHO1) has been reported to be a positive regulator of seed development that affects both the cell size and cell number . Similarly, Vigun08g217000 which codes for histidine kinase 2 has been identified as a potential candidate gene for improved organ size during cowpea domestication (Lonardi et al., 2019), and its Arabidopsis ortholog AHK2 has been shown to regulate the seed size (Riefler et al., 2006;Bartrina et al., 2017). Vigun11g191300 encoding a delta (24)-sterol reductase is an ortholog of the Arabidopsis DIMINUTO gene which has been shown to regulate cell elongation (Takahashi et al., 1995). In foxtail millet, Loose Panicle1-encoded WRKY transcription factor regulates the seed size by increasing the length and width of the seed (Xiang et al., 2017). Hence, these genes are the strong candidates as seed size is affected by multiple pathways.
On the other hand, for 100-seed weight, 29 candidate genes were identified corresponding to expansin, cytokinin dehydrogenase, cytochrome P450, and response regulatory domain containing protein. The significance of these genes as candidate loci related with the 100-seed weight is supported by the work done on Arabidopsis, where orthologs of the candidate genes in the cytokinin pathway have been shown, in transgenic studies, to regulate seed size and/or weight (Daele et al., 2012). Our findings are also in accordance with the common bean in which type-B regulators were found to be involved in the activation of downstream genes in the cytokinin pathway, and the genes encoding cytokinin dehydrogenase regulates the pathway by degrading active cytokinin (Hwang et al., 2012;Schmutz et al., 2014). Likewise, in Arabidopsis, expansins increased grain size and also improved grain production (Bae et al., 2014). Recent studies have also associated expansins with grain size and weight in wheat and tomato (Muñoz and Calderini, 2015;Brinton et al., 2017). TaCYP78A3 in wheat and CYP78A5 in Arabidopsis encodes the cytochrome P450, which positively correlates with seed size and seed weight (Ma et al., 2015;Tian et al., 2016b).
Like other development-related traits, flowering time is also an important trait because several agronomical traits such as quality of the grain and grain yield depend on flowering time. For days to flowering trait, we identified 21 candidate genes in our dataset encoding protein Flowering Locus T-like, GIGANTEA-like, cryptochrome, and transcription factors such as bHLH, ERF, and PIF-3. Most of the candidate genes of days to flowering had high expression in case of 10 DPA, instead of 5 DPA. In rice, florigen is encoded by RICE FLOWERING LOCUS T 1 (RFT1) and the orthologs of Arabidopsis FT and plays important role in heading date, influencing yield traits in rice (Tamaki et al., 2007;Komiya et al., 2009), whereas GIGNANTEA-like genes observed in the regulation of many genes which influence the circadian clock, blue light photoreceptor, and flowering time have also been reported in Arabidopsis (Hayama et al., 2003;Fornara et al., 2009). Similar to rice results, the Flowering Locust T-like in ricebean might help in the improvement of yield.
On the other hand, the number of seeds per pod might be useful for increasing the seed yield of ricebean. We identified 15 candidate genes for seeds per pod trait having annotations like MAPK, NAC, MALE STERILE 5, and ABSCISIC ACID-INSENSITIVE 5-like protein. Vigun03g187300 (ABA-insensitive 5-like protein 6) is an ABA-responsive element (ABRE)-binding factor that regulates ABRE-dependent gene expression (Nakashima and Yamaguchi-Shinozaki, 2013). In Arabidopsis, ABA deficiency reportedly decreases the number of seeds per siliqua (Cheng et al., 2014).
Hence, the higher expression of this gene in bold genotypes implies an increase in the number of seeds per pod that could result in the improvement of the ricebean yield. The Vigun05g126900 gene, encoding MALE STERILE 5, was selected as a candidate gene in zombie pea (Amkul et al., 2020). In a previous study on Arabidopsis, mutations to MALE STERILE 5 resulted in the development of "polyads" (i.e., tetrads with more than four pools of chromosomes following male meiosis) (Glover et al., 1998). Plants that are homozygous for the MS5 recessive allele apparently revealed arrested growth and harvested empty siliques, whereas in plants that are heterozygous for MS5, siliqua elongation and seed set are less repressed (Glover et al., 1998). In case of the pod length, five candidate genes in ricebeans have been identified, mostly corresponding to the auxin response factor. Glyma.07G134800, an ortholog of Arabidopsis, was also associated with the auxin pathway (Jiang et al., 2018).
Furthermore, we have also identified a few candidate genes associated with pod shattering which is considered to be an undesirable agronomical trait. We identified maximum candidate genes (i.e., 57) for this trait in our ricebean study. Out of the 57 candidate genes, 18 genes encode transcription factors like AP2/ERF, FIGURE 9 | Heat map representing the differential gene expression of the identified candidate genes for six traits including seed size, 100-seed weight, seed/pod, days to flowering, pod shattering, and pod length in bold and small genotypes at 5 DPA and 10 DPA.
WRKY, and NAC, whereas the rest of the genes were involved in cellulose synthase and serine/threonine protein kinase. The candidate genes for pod shattering have also been identified in other legumes including Vigun02g095200 (cellulose synthase), Vigun03g306000 (NAC domain transcription factor), and zombi pea (Suanum et al., 2016;Lo et al., 2018;Takahashi et al., 2019;Amkul et al., 2020;Watcharatpong et al., 2020). In Sorghum propinquum, WRKY modulates the flower and seed development and lignin deposition, and it is also found to be involved in pod shattering . Recently, in rice, AP2 transcription factor-coding gene SHATTERING ABORTION1 (SHAT1) was observed having a crucial role in pod shattering. Two genes encoding NAC in Vigna unguiculata were found to be involved in cell wall biosynthesis and hence influencing the pod shattering (Zhou et al., 2012;Lo et al., 2018). The identification of pod shattering genes may reduce preharvest yield damages in ricebean, resulting in a more efficient yield. Thus, pod indehiscence may be a valuable trait during seed harvesting, making it a main concern during crop domestication (Amkul et al., 2020).
To support our findings related to candidate genes, we performed a comparative analysis of the identified candidate genes with our MapMan and KEGG pathway results. Out of 120 candidate genes, 23 genes matched with the MapMan results ( Table 3). For example, the expression of eight candidate genes of 100-seed weight and seed size were only shown in the small genotype encoding PHO1, cytokinin dehydrogenase, A-type cytokinin ARR response negative regulator, etc. Similarly, for bold genotypes, only one gene, aB10dtrinity_dn14585_c0_g1_i2, for a seed was upregulated, revealing a high number of pods in bold genotypes as compared to the small genotype. On the other hand, in terms of time point, three genes (cS5dtrinity_dn10996_c2_g4_i3: seed size; cS5dtrinity_dn11557_c0_g1_i4: seeds/pod; and aB10dtrinity_dn30303_c0_g10_i1: days to flowering) were detected only at the first time point, that is, 5 DPA. Two genes (aB10dtrinity_dn33078_c0_g1_i1: 100-seed weight and bS10d1trinity_dn10624_c1_g9_i1: pod shattering) were found to be highly expressed only in bold genotypes, whereas nine genes encoding alpha class expansins were found to be downregulated, specifically in the small genotype.
Similarly, 16 candidate genes (auxin: 2; cytokinin: 8; ethylene: 5; GA: 1) were matched with the KEGG pathway results ( Table 4). The matched genes were found to be associated with several seed development-related traits like pod length, days to flowering, 100-seed weight, seeds/ pod, and pod shattering. All the genes were expressed in the small genotype, except two (ab10dtrinity_dn29885_c1_g2_i2 and bs10dtrinity_dn12088_c3_g6_i6) which were expressed in bold genotypes corresponding to MAPK.

Simple Sequence Repeat Identification
In this study, we used the MISA Perl script (http://pgrc.ipkgatersleben.de/misa) to detect the microsatellites. Of the 288,393 transcripts generated in this study, 14,663 contained an SSR totaling 201,517,181 bp. Out of these 14,663 sequences, 2,317 sequences had more than a single SSR, and 1,487 had SSRs of Alpha-class expansin Dihydrolipoamide acetyltransferase component E2  Figure 10A). Similar results have been reported in the previous transcriptome published for ricebean varieties (Chen et al., 2016) as well as for other legume species including mungbean (Tian et al., 2016b), adzuki bean (Chankaew et al., 2014;Chen et al., 2015b), cowpea (Gupta et al., 2010), and chickpea (Choudhary et al., 2008).  The number of the given repeat unit of SSRs ranged from 5 to >10, and as the number of repeat units increased, the frequency of the given SSR structure progressively decreased (Supplementary Table S11). As for the two most abundant repeat motif types (diand trinucleotides), the frequency of the AG/CT motif type accounted for 17.41% in dinucleotide repeat motifs, and the frequency of GAA/TTC was the most abundant motif type in the trinucleotide, accounting for 6.3%. A previous study on adzuki bean also showed a high frequency of AG motifs in dinucleotides (Chankaew et al., 2014).

Simple Sequence Repeat Validation
To determine the polymorphism level of the identified EST-SSRs, the randomly selected 50 SSRs were evaluated in eight accessions of Vigna species including V. umbellata (6), V. mungo (1), and V. radiata (1) (Supplementary Table S12). From 50 pairs, 43 were successfully amplified, while seven pairs were not able to generate a PCR product (Supplementary Table S13). More than 85% of the SSR markers were successfully amplified, suggesting that the quality of our assembled transcripts was very high. The annealing temperatures of the primers ranged between 54 and 56°C. Out of these 43 SSR primer pairs, 26 pairs showed polymorphism (dinucleotide: 12, trinucleotide: 14) and the rest were monomorphic ( Figure 10B; Table 5). A high polymorphism level (60.46%) of ricebean EST-SSRs was observed in the selected set of eight accessions which was higher than that from previous reports in other legume species including the chickpea (47.3%) (Nayak et al., 2010), mungbean (33%) (Chen et al., 2015b), black gram (58.2%) (Souframanien and Reddy, 2015), and adzuki bean (7.6%) (Chen et al., 2015a) while lower than common bean (71.3%) (Hanai et al., 2007), whereas when we considered only ricebean genotypes (six accessions), only 34.88% SSR markers were found polymorphic. We have also checked the cross-species transferability pattern and found that the transferability of ricebean-derived SSR markers was higher in V. radiata (73.08%) than in V. mungo (50%). Various studies depicted the importance of SSR cross-transferability in Vigna species including ricebean, mungbean, and cowpea (Pattanayak et al., 2019). Furthermore, the genetic distance among the accessions was determined, and we found two clusters, with six (V. umbellata) in the first cluster and two (V. mungo and V. radiata) in the second cluster, respectively ( Figure 10C). SSR molecular markers on the basis of transcriptomes have become more promising and useful because of their high crossspecies transferability, their high rate of amplification, and being reasonably cheap as compared with the SSR markers of nontranscribed regions (Hansen et al., 2008;Rai et al., 2013). Moreover, since they can easily expose variance in the expressed portion of the genome, it is possible to evaluate marker-trait association (MTA) and specific genomic regions stating important physio-agronomic traits (Kalia et al., 2011).

CONCLUSION
The transcriptomic analysis in our study provided detailed insights into molecular processes and candidate genes controlling seed size and other seed development-related traits in ricebean. The MapMan and KEGG analyses confirmed that the phytohormone signaling pathways varied in both the contrasting expressions taken in this study and can therefore be the regulators of seed size as well as other seed development-related traits in ricebean. We hypothesize that the auxin, cytokinin, ethylene, and gibberellin signaling pathways interact cooperatively with one another, thereby modulating the expression of genes of seed development-related traits. Further research is required to identify key regulators/genes in determining seed size.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available in NCBI under accession number PRJNA765494.

AUTHOR CONTRIBUTIONS
SV contributed to investigation, formal analysis, and writing-original draft. SM helped with data curation, investigation, formal analysis, writing-original draft, and writing-review, and editing. GC contributed to resources and writing-review and editing. DW assisted with analysis and writing-review and editing. SP, DC, GP, DM, DJ, MS, and KS assisted with writing-review and editing. AS involved in conceptualization, supervision, resources, and writing-review and editing.

FUNDING
The financial support was received from the Department of Biotechnology under project: BT/Ag/Network/Pulses-1/2017-18.