Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

Non-model plants, i.e., species with one or more characteristics such as a long life cycle, difficulty of growth in the laboratory, or poor fecundity, were previously left out of sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the recent advent of fast and cost-effective next generation sequencing (NGS) platforms has enabled the discovery of characteristic gene structures unique to these species. NGS has also provided insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis, and marker-assisted breeding, even without prior genomic sequence information. In this review we explore how different NGS platforms, together with recent advances in NGS-based high-throughput genotyping technologies, are rewarding efforts in de novo whole genome and transcriptome sequencing and in the development of genome-wide, sequence-based marker resources, which are less costly to use than phenotyping, for the improvement of non-model crops.


GENOMICS IN THE VIEWPOINT OF NON-MODEL PLANT SYSTEMS
Plant genomics entails the application of recombinant DNA technologies, sequencing methods, and bioinformatics tools to assemble plant genomes and assign structure and function to them. By determining the order of DNA sequences, it is key to understanding plant genomes, which in turn enables exploration of the evolution of plant genome structure and inference of molecular phylogeny. It also helps in understanding how genes interact to control an organism's growth, development, and adaptation to its environment. Most of what we know about the mechanisms underlying plant biological processes comes from investigations of "model plants," i.e., species that have been studied extensively at the whole-genome level to elucidate various complex biological phenomena. High-throughput sequencing technologies are, however, changing the approach to genome sequencing projects, giving us a deeper understanding of plant biology through the generation of biologically important datasets from plant species other than the model plants.
Prior to the development of next generation sequencing (NGS) in 2005 (Morozova and Marra, 2008; Schuster, 2008), nucleic acid sequencing for genomic studies was based on the Sanger method. This technique was used successfully to complete the human genome and the first sequenced plant genome, that of Arabidopsis thaliana (The Arabidopsis Genome Initiative), published in 2000. After outlining the plant's many advantages for genome analysis, which include its small size, homozygous nature, large number of offspring, short generation time, and relatively small nuclear genome, the authors described it as an important model system for identifying the genes of biological pathways and determining their functions. Other plants sequenced using this first-generation method, and reported by Schatz et al. (2012) as models, include Oryza sativa (rice) in 2002, Carica papaya (papaya) in 2008, and Zea mays (maize) in 2009. Subsequent genome sequencing of other plants was therefore based on the idea that a single species within a group of related species sharing some similarities could be chosen as a model and studied as a representative, with the information gathered applied to related organisms as required. However, Tagu et al. (2014) noted that model organisms are often not archetypal and do not replicate the biology of their close relatives, let alone the wide diversity of living mechanisms. Hirsch and Buell (2013) stated that the characteristics of the ideal plant genome are dictated by the technological limitations of genome sequencing and assembly methods and of computation, and by the desire for a whole genome sequence for downstream biological interpretation. Despite the successful use of Sanger technology in sequencing the model crops, its limited throughput and high cost constrained the sequencing of the vast number of plant species, especially those with large and complex genomes, and this created high demand for new and improved sequencing technologies.
In addition, several non-model plants are indispensable sources of food, feed, or energy, with characteristics unique to them that make them difficult to study through a model plant (Carpentier et al., 2008). Hence, their genomics remained largely unknown and challenging until the recent emergence of alternative sequencing platforms with higher throughput and lower sequencing cost, collectively termed NGS technologies.
This article therefore appraises the impact of NGS technologies on the "genomics of non-model plants" and considers the future prospects of using these technologies in this group of plants.

GLIMPSE OF NEXT GENERATION SEQUENCING TECHNOLOGIES (NGS)
NGS comprises technologies that produce millions of short DNA sequence reads, mostly between 25 and 700 bp long, at low cost and in a short time. According to Metzker (2010), these technologies involve a number of methods that can be grouped broadly as template preparation, sequencing and imaging, and data analysis; the particular combination of protocols distinguishes one technology from another and determines the amount of data each platform produces. NGS has become a practical way to sequence large numbers of non-model plants while reducing time and cost compared with the traditional Sanger method. The Sanger method uses the 2′,3′-dideoxy and arabinonucleoside analogs of the normal deoxynucleoside triphosphates, which act as specific chain-terminating inhibitors of DNA polymerase (Sanger et al., 1977). In NGS techniques, by contrast, DNA sequencing libraries are first clonally amplified in vitro, circumventing the time-consuming and laborious cloning of the DNA library into bacteria that the Sanger method requires (Anderson and Schrijver, 2010). In addition, the genome is split into small fragments to which adapters are ligated, and the resulting DNA templates, randomly distributed along the entire genome, are read in a massively parallel fashion (Zhang et al., 2011). Several technologies make up NGS; although they share some common features, they differ in key characteristics (Supplementary Table 1).
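The library-preparation logic described above (random fragmentation, adapter ligation, then reading many molecules in parallel) can be illustrated with a toy Python simulation. This is a minimal sketch under stated assumptions, not a model of any particular platform: the adapter sequences, fragment sizes, and read length are hypothetical placeholders, and clonal amplification, sequencing errors, and quality scores are ignored.

```python
import random

# Hypothetical adapter sequences used only for illustration; real platforms
# specify their own adapters.
ADAPTER_5 = "AATGATACGG"
ADAPTER_3 = "ATCTCGTATG"


def shear(genome: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut one copy of the genome into consecutive random-length fragments."""
    fragments = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        fragments.append(genome[pos:pos + size])
        pos += size
    return fragments


def shotgun(genome: str, copies: int, min_len: int, max_len: int,
            rng: random.Random) -> list[str]:
    """Shear several independent copies so fragments from different copies overlap."""
    return [frag for _ in range(copies)
            for frag in shear(genome, min_len, max_len, rng)]


def ligate_adapters(fragments: list[str]) -> list[str]:
    """Attach the same adapter pair to both ends of every fragment (the 'library')."""
    return [ADAPTER_5 + frag + ADAPTER_3 for frag in fragments]


def read_library(library: list[str], read_len: int) -> list[str]:
    """Read up to read_len bases of each insert, starting just after the 5' adapter.

    A real instrument reads all amplified molecules in parallel; here we simply
    loop over them.
    """
    reads = []
    for molecule in library:
        insert = molecule[len(ADAPTER_5):len(molecule) - len(ADAPTER_3)]
        reads.append(insert[:read_len])
    return reads


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(5_000))
    library = ligate_adapters(shotgun(genome, copies=10, min_len=200, max_len=400, rng=rng))
    reads = read_library(library, read_len=100)
    print(f"{len(library)} library molecules -> {len(reads)} reads")
```

Because each copy is cut at different random positions, reads from different copies overlap, which is what later allows an assembler to stitch them back together.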

Whole Genome Sequencing
High-coverage, high-quality reference genome sequences, which provide relatively complete information on genes, the regulatory elements that control their function, and genome composition, as well as a framework for understanding genomic variation (Feuillet et al., 2011), are the basis of "omics" investigations in a target species. The low cost of NGS is making such sequences achievable for non-model plants, but as highlighted by Hirsch and Buell (2013), four major factors hinder obtaining a high-quality genome assembly from non-model species: the extent of genome duplication (segmental, tandem, and whole-genome), heterozygosity, ploidy level, and repetitive sequence composition. These factors have until now thwarted full genome sequencing and assembly of these plants. Because most sequencing projects on non-model plants are de novo, sequencing and assembly require high-coverage, high-quality data, and different methods are being applied to obtain it.
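The coverage requirement can be made concrete with the classic Lander-Waterman approximation, which assumes reads start at uniformly random, independent positions: mean coverage is C = NL/G for N reads of length L on a genome of size G, and the expected fraction of bases left uncovered is about e^(-C). The sketch below uses the 323 Mb Myrica rubra genome size cited later in this review purely as an example input; the read length and target depths are arbitrary assumptions. Repeats, heterozygosity, and sequencing bias, the very factors listed above, are not modeled, so real non-model projects often need more data than this idealized estimate suggests.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Reads required for a target mean coverage: N = C * G / L, rounded up."""
    return math.ceil(target_coverage * genome_size / read_len)


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation of the fraction of bases hit by no read: e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 323_000_000  # example: ~323 Mb (Myrica rubra estimate)
    read_len = 100             # assumed read length (bp)
    for depth in (5, 20, 50):
        n = reads_needed(depth, read_len, genome_size)
        print(f"{depth:>3}x needs {n:,} reads; "
              f"expected uncovered fraction ~ {fraction_uncovered(depth):.2e}")
```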
Various strategies are being employed to overcome the high levels of heterozygosity and repetitive sequence that hinder the sequencing and assembly of plant genomes with NGS technologies. One strategy is to sequence several independent libraries with different insert sizes on different platforms and combine the data for assembly (Peng et al., 2014); the pooled data achieved high coverage of the genome and consequently improved the quality of the de novo assembly. Combining sequence data from paired-end and mate-pair libraries likewise produces assemblies with longer contigs and fewer, larger scaffolds, maximizing coverage across the genome so that many biological questions in these non-model plants can be answered. The large genome size of many of these plants is due largely to highly repetitive sequences, which are similar or identical to other sequences in the genome and so abundant that even sequencing to greater depth with short-read technologies does not guarantee assembly quality. According to Hirsch and Buell (2013), the overrepresentation of these repeats in the pool of short reads, combined with the inherent error rate of current NGS technologies, confounds genome assembly. However, Kane et al. (2011), Yang et al. (2013), and Chen et al. (2013) used a hybrid approach that combines whole-genome shotgun (WGS) data from different short-read platforms with high-density genetic and physical maps, which serve as scaffolds for the linear assembly of WGS sequences. Heterozygosity hampers contig assembly when a whole-genome shotgun strategy is used for sequencing. The negative effect of ploidy level and heterozygosity on the assembly of short-read sequences can be reduced by using homozygous genotypes derived from successive generations of self-fertilization (Shulaev et al., 2011; Wang et al., 2012a; Polashock et al., 2014). To overcome the major issues of high heterozygosity and high repeat content, Wu et al. (2013) employed a novel combination of bacterial artificial chromosome (BAC)-by-BAC sequencing with Illumina sequencing technology, and Liu et al. (2014) likewise used BAC libraries successfully. These studies showed that a complex plant genome sequence can be assembled and characterized using NGS without a physical reference.
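The overrepresentation of repeats in the short-read pool can be seen by counting k-mers (substrings of length k) across reads: k-mers from single-copy regions appear at roughly the sequencing depth, whereas k-mers from a repeat present in many copies appear many times more often. The sketch below is a toy illustration of that idea, not any assembler's actual algorithm; it assumes most distinct k-mers come from single-copy sequence (so the median count approximates depth), the k value and fold threshold are arbitrary choices, and sequencing errors and heterozygosity are ignored.

```python
import random
from collections import Counter
from statistics import median


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every overlapping k-mer in every read."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def flag_repetitive_kmers(counts: Counter, fold: float = 5.0) -> set[str]:
    """Flag k-mers seen more than `fold` times the median k-mer count.

    The median approximates the count of a typical single-copy k-mer, so
    k-mers far above it probably come from multi-copy (repetitive) sequence.
    """
    typical = median(counts.values())
    return {kmer for kmer, c in counts.items() if c > fold * typical}


def random_seq(n: int, rng: random.Random) -> str:
    """Random DNA string of length n (toy data only)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(7)
    repeat = random_seq(40, rng)  # hypothetical 40-bp repeat
    genome = "".join(random_seq(200, rng) + repeat for _ in range(10))  # 10 copies
    reads = [genome[i:i + 60] for i in range(0, len(genome) - 59, 5)]
    flagged = flag_repetitive_kmers(count_kmers(reads, k=21))
    print(f"{len(flagged)} k-mers flagged; all inside the repeat: "
          f"{all(kmer in repeat for kmer in flagged)}")
```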
Genome duplication is thought to be a factor in the evolution and diversification of plants. Whole genome duplication (WGD) creates duplicate genes in plants; some of these copies may be dispensable for cell function and are lost (nonfunctionalization), while others are retained and may acquire new functions (neofunctionalization) or partition the ancestral functions between them (subfunctionalization). WGD thus contributes to evolution by enabling new gene functions, promoting genome rearrangement, and perhaps driving speciation. Whole genome sequencing (WGS), combined with analyses that compare the sequences of individual members of a gene family, is helping to map the individual gene duplications through which a family evolved from a single progenitor gene in an ancestral genome. For example, Albert et al. (2013) identified genomic changes that accompanied the origin of angiosperms. They showed an ancient genome duplication that predated angiosperm diversification, indicating that the ancestral angiosperm was a polyploid with a large assemblage of both novel and ancient genes that survived to play key roles in angiosperm biology.
Having the complete genome sequence of a species does not, however, imply that all accessions of the species have the same nucleotide sequence; rather, they contain almost the same set of genes, with differences in nucleotide sequence arising from substitutions, insertions, deletions, and structural variations. The low cost of NGS has made it possible to sequence related genomes to estimate genetic diversity within and between germplasm pools, and identification and tracking of genetic variation are now so efficient and precise that thousands of variants can be tracked within large populations (Varshney et al., 2009). In sequencing the genomic DNA and RNA of Cannabis sativa (Purple Kush) with a hybrid approach combining Illumina and 454 pyrosequencing, Van Bakel et al. (2011) reported a draft haploid genome sequence of the cultivar. Compared with another cultivar, C. sativa (Finola), Purple Kush showed higher expression of cannabinoid pathway genes and the exclusive presence of the functional THCA synthase (THCAS) gene in its genome and transcriptome. Deciphering plant domestication requires identifying the important traits that were altered during domestication, and NGS has made it feasible to discover the genes that were selected during this process. Investigation of the primary gene pool and of more distantly related wild relatives has the potential to identify genes and alleles that can be used to improve the performance of major crop species (Tang et al., 2010). Mace et al. (2013) used WGS of 44 sorghum accessions to reveal a strong racial structure and complex domestication events, and showed that modern cultivated sorghum is derived from a limited sample of racial variation. This result illustrates the usefulness of NGS for understanding genetic diversity at the genomic sequence level.
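Estimating diversity within and between germplasm pools from resequencing data often comes down to simple per-site statistics. As a minimal sketch, assuming the sequences are already aligned, of equal length, and free of gaps or missing data (real data rarely are), the code below computes the standard nucleotide diversity π (mean pairwise differences per site within a pool) and the analogous between-pool divergence d_xy. The pool contents in the example are invented toy strings.

```python
from itertools import combinations, product


def differences(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(pool: list[str]) -> float:
    """Within-pool pi: mean pairwise differences per site over all pairs."""
    if len(pool) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(pool, 2))
    total = sum(differences(a, b) for a, b in pairs)
    return total / (len(pairs) * len(pool[0]))


def between_pool_divergence(pool_x: list[str], pool_y: list[str]) -> float:
    """d_xy: mean differences per site between sequences from two pools."""
    pairs = list(product(pool_x, pool_y))
    total = sum(differences(a, b) for a, b in pairs)
    return total / (len(pairs) * len(pool_x[0]))


if __name__ == "__main__":
    cultivated = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC"]  # toy aligned haplotypes
    wild = ["ACCTACGTTC", "ACGTTCGTTC"]
    print(f"pi(cultivated) = {nucleotide_diversity(cultivated):.3f}")
    print(f"pi(wild)       = {nucleotide_diversity(wild):.3f}")
    print(f"d_xy           = {between_pool_divergence(cultivated, wild):.3f}")
```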
To date, a number of non-model crops have been successfully sequenced using NGS technologies (Table 1), charting a new course for future genomic and genetic research and crop improvement in these plants, and even turning some so-called non-model plants into genetic models for studying certain biological processes.

GENE IDENTIFICATION AND EXPRESSION ANALYSIS
The fields of molecular and evolutionary biology are being revolutionized by access to genome-scale information, which has helped answer biological questions that until recently were intractable, such as how cells with an identical genetic makeup can give rise to different cell types, each playing a different role in the working of a multicellular organism. An earlier technique for detecting and quantifying specific RNAs is Northern blotting (Northern hybridization), developed by James Alwine and George Stark. In this technique, electrophoretically separated RNA bands are transferred from an agarose gel to a paper strip, and specific RNA bands are detected by hybridization with ³²P-labeled DNA probes followed by autoradiography. This procedure allows the detection of specific RNA bands with high sensitivity and low background (Alwine et al., 1977). However, as noted by Streit et al. (2008), Northern blotting has several disadvantages: the risk of mRNA degradation during electrophoresis, which compromises the quality and quantification of expression; the health and environmental implications of high doses of radioactivity and formaldehyde; lower sensitivity than RT-PCR; difficulty of detection with multiple probes; and the need for special training and care when using ethidium bromide, DEPC, and UV light. The RNase protection assay, an alternative, is a highly sensitive technique developed to detect and measure the abundance of specific mRNAs in samples of total cellular RNA (Ma et al., 1996). In this assay, hybridization of an antisense RNA probe to its known complementary target sequence protects the target from digestion by single-strand-specific RNase, while all remaining single-stranded RNAs (i.e., those not hybridized to the probe) are degraded, enabling accurate quantitation of specific target sequences (VanGuilder et al., 2008).
However, the complex procedures and relatively large amounts of RNA involved restrict the use of these methods. The development of real-time quantitative PCR (qPCR) has increased the throughput of gene expression analysis while reducing the quantity of RNA required. It has become a routine approach for measuring the expression of genes of interest, validating microarray experiments, and monitoring biomarkers (VanGuilder et al., 2008). Real-time PCR amplifies a specific target sequence in a sample and monitors the progress of amplification using fluorescence (Valasek and Repa, 2005). Although real-time PCR is an invaluable tool for gene expression analysis, its one major shortcoming is that it requires prior sequence information for the target gene; hence, qPCR can only be used to target known genes. The transcriptome is the set of all RNA molecules (mRNA, rRNA, tRNA, and other non-coding RNA) transcribed by an organism. Wang et al. (2009) argued that insight into the transcriptome is fundamental for interpreting the functional elements of the genome, revealing the molecular constituents of cells and tissues, and understanding development and disease. Microarrays are widely used to analyze the transcriptome for patterns of gene expression. They can measure the expression levels of thousands of genes in a single experiment, but cannot detect novel transcripts and have limited sensitivity across the range of gene expression levels. NGS has rapidly advanced RNA sequencing (RNA-seq), enabling the rapid generation of large expression datasets for gene discovery and expression analysis in non-model species (Marioni et al., 2008; Li et al., 2012). As stated by De Wit et al. (2012), RNA-seq sequences only the mRNA of genes expressed in the sampled tissue, i.e., the transcriptome, where a considerable proportion of adaptively interesting variation is located. It records how many mRNA reads derive from a particular exon in the sample and captures sequence variations that reveal functional polymorphisms. Unlike microarray data, RNA-seq reads can be assembled de novo without mapping to a reference genome sequence, a feature that makes RNA-seq an invaluable asset for identifying novel genes in non-model plants. Zhou et al. (2012) demonstrated the use of de novo assembly in Ammopiptanthus, an evergreen broadleaf genus of the deserts and arid regions of Central Asia that plays a critical role in conserving desert ecosystems and controlling desertification. To understand the genetic mechanisms underlying the deep, extensive root system for water absorption that adapts these plants to harsh conditions, de novo transcriptome sequencing of A. mongolicus was carried out using 454 pyrosequencing to discover putative genes associated with drought tolerance. The potential drought stress-related transcripts identified in the study provided a foundation for further investigation of drought adaptation in Ammopiptanthus. Transcriptome sequencing has also greatly expanded expressed sequence tag (EST) collections, including those of non-model plant species (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html).
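Because RNA-seq records how many reads derive from each transcript, raw counts must be normalized before expression can be compared across genes: longer transcripts and more deeply sequenced libraries both yield more reads. One common normalization is transcripts per million (TPM). The sketch below assumes reads have already been assigned to transcripts (for example, the contigs of a de novo assembly) and uses full transcript lengths rather than the effective lengths that quantification tools typically compute; the contig names and counts are hypothetical.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million.

    1. Divide each read count by transcript length in kilobases (reads per kb).
    2. Rescale so the per-kb rates sum to one million within the sample.
    """
    rates = {tx: counts[tx] / (lengths_bp[tx] / 1_000) for tx in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {tx: rate / total * 1_000_000 for tx, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical de novo contigs from a drought-stressed root sample.
    counts = {"contig_1": 500, "contig_2": 500, "contig_3": 50}
    lengths = {"contig_1": 1_000, "contig_2": 2_000, "contig_3": 500}
    for tx, value in tpm(counts, lengths).items():
        print(f"{tx}\t{value:,.0f} TPM")
```

Here contig_1 and contig_2 receive the same number of reads, but contig_2 is twice as long, so its TPM is half that of contig_1.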
MicroRNAs (miRNAs; 21-24 nucleotides) are a class of endogenous non-coding small RNAs that are transcribed from genes but never translated into proteins (Phelps-Durr, 2010); instead, they regulate gene expression in different organisms, including non-model plants.
Since the discovery of the first miRNA, lin-4 (Lee et al., 1993), there has been increasing interest in understanding the post-transcriptional regulation of gene expression during development. According to Axtell and Bartel (2005), miRNAs affect the morphology of flowering plants through post-transcriptional regulation of genes involved in critical developmental events; they further proposed that understanding the spatial and temporal dynamics of miRNA activity is fundamental to elucidating miRNA functions. Achard et al. (2004) described the role of the microRNA miR159 in regulating flowering time under short-day photoperiods and in anther development. Other plant developmental processes involving miRNAs include leaf morphogenesis and polarity (Floyd and Bowman, 2004) and floral development and flowering time (Aukerman and Sakai, 2003), among others. Zhang et al. (2006) identified four existing approaches to miRNA identification, namely genetic screening, direct cloning after isolation of small RNAs, computational strategies, and EST analysis, but observed that each has different advantages and shortcomings and suggested that combining them would allow more miRNAs to be discovered quickly. As reported by Lakhotia et al. (2014), many miRNAs are evolutionarily conserved among diverse species, whereas several recently evolved miRNAs are species-specific and often expressed at lower levels than conserved miRNAs; because of their low expression, most species-specific miRNAs have remained unidentified in many plant species. With improved NGS methods for investigating the transcriptome, enormous progress has been made in identifying and understanding non-coding RNAs such as miRNAs, especially with regard to regulatory pathways.
RNA sequencing on high-throughput NGS platforms can distinguish with high accuracy between miRNAs that are very similar in sequence, and it can detect novel miRNAs. Using a Solexa sequencer, Gao et al. (2015) identified 50 novel miRNAs representing 19 families from three small RNA (sRNA) libraries of tobacco, in addition to 165 miRNAs representing 55 conserved families. Similarly, using high-throughput sequencing of small RNAs and analysis of transcriptome data, Zhu et al. (2013) identified 132 putative conserved miRNAs belonging to 31 known miRNA families and 10 novel miRNAs in Caragana intermedia. They also predicted 38 potential targets for the conserved and novel miRNAs and validated four of them by 5′ RACE. These studies, together with the miRNA identifications in various non-model crops reviewed by Lakhotia et al. (2014), show the value of high-throughput sequencing for miRNA discovery, especially of novel miRNAs, in non-model crops without a reference genome.
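The first computational step behind small RNA sequencing studies like these is usually generic: trim the 3′ adapter from each read, keep inserts in the miRNA size range (21-24 nt, as given above), and collapse identical sequences into counts before matching them against known miRNA families or predicting novel ones. The sketch below shows only that preprocessing under simplifying assumptions: the adapter sequence is a hypothetical placeholder (the actual adapter depends on the library kit), only exact adapter matches are trimmed, and quality filtering and mismatch-tolerant trimming are omitted.

```python
from collections import Counter

# Hypothetical 3' adapter; substitute the adapter of the library kit actually used.
ADAPTER_3 = "TGGAATTCTC"


def trim_adapter(read: str, adapter: str = ADAPTER_3, min_overlap: int = 6) -> str:
    """Remove the 3' adapter from a read (exact matches only).

    A full adapter copy anywhere in the read is cut at its start; otherwise, if
    the read ends with at least `min_overlap` leading bases of the adapter
    (the read ended partway through the adapter), those bases are removed.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def collapse_small_rnas(reads: list[str], min_len: int = 21,
                        max_len: int = 24) -> Counter:
    """Trim adapters, keep inserts in the miRNA size window, count unique sequences."""
    counts: Counter = Counter()
    for read in reads:
        insert = trim_adapter(read)
        if min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


if __name__ == "__main__":
    example = "TTTGGATTGAAGGGAGCTCTA"  # an arbitrary 21-nt insert for illustration
    reads = [example + ADAPTER_3 + "AACC", example + ADAPTER_3[:7], "ACGT" + ADAPTER_3]
    for seq, n in collapse_small_rnas(reads).most_common():
        print(seq, len(seq), n)
```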

NGS IN AID OF MOLECULAR MARKER DEVELOPMENT AND BREEDING
Molecular markers are identifiable DNA sequences, found at specific locations in the genome, that are transmitted from one generation to the next according to the standard laws of inheritance (Semagn et al., 2006). With the need to increase agricultural output to meet the challenge of producing enough food for a rising world population, advances in genomic technologies have provided new tools for discovering and tagging novel alleles and genes. These tools can enhance the efficiency of breeding programs through their use in marker-assisted selection (MAS), linkage or quantitative trait locus (QTL) mapping, phylogenetics, positional cloning, genetic diversity assessment, and genotypic profiling. According to Kumpatla et al. (2012), the ability to deduce the molecular mechanisms underlying a trait, understand gene regulatory mechanisms, determine differences in gene expression and variation in expressed gene sequences, and detect other structural variations such as copy number variations (CNV) and presence-absence variations (PAV) depends to a large extent on the availability of a reference genome or transcriptome sequence.
Identification of polymorphic sequences underlying a trait of interest enables the development of functional markers. The advent of NGS has enabled the exploration of thousands of markers across the entire genome using several approaches, making comprehensive genome-wide association studies possible even in populations with little or no previous genetic information, as in non-model plants (Sakiyama et al., 2014). SNPs are the most abundant markers in a genome and are appropriate for analysis on a wide range of genomic scales. SNP markers reveal polymorphism between individuals or populations that results from a change in a single nucleotide. Illumina transcriptome sequencing data were used to discover 2987 high-quality putative SNPs in Turkish olive genotypes (Kaya et al., 2013), and these were successfully used to assess genetic diversity among 96 olive genotypes. Whole-genome resequencing of two cabbage inbred lines using Illumina (Lee et al., 2015) identified 674,521 SNPs, from which 167 dCAPS markers were developed for genetic map construction, leading to the identification of novel QTLs for black rot resistance. Similarly, Wei et al. (2014a) used a high-throughput specific-locus amplified fragment sequencing (SLAF-seq) approach to construct a high-density SNP map for cucumber containing 1800 high-quality SNPs, spanning 890.79 cM with an average marker interval of 0.50 cM, and further detected fruit-related QTLs. Likewise, a genotyping-by-sequencing (GBS) approach via NGS identified 21,471 SNPs in oil palm (Pootakham et al., 2015), enabling the construction of a linkage map containing 1085 markers distributed over 17 linkage groups and the identification of QTLs affecting trunk height and bunch weight.
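At its simplest, SNP discovery from aligned NGS reads means inspecting, at each reference position, the bases contributed by overlapping reads (a "pileup") and reporting sites where a non-reference base is common enough not to be a sequencing error. The sketch below is a deliberately naive caller with assumed thresholds (minimum depth and minimum alternative-allele fraction); it ignores base and mapping qualities and the probabilistic genotype models that production variant callers use, and the pileup in the example is invented.

```python
from collections import Counter
from typing import Optional


def call_snp(ref_base: str, observed: str, min_depth: int = 10,
             min_alt_fraction: float = 0.2) -> Optional[tuple[str, float]]:
    """Return (alt_base, alt_fraction) if the site looks polymorphic, else None.

    `observed` holds the base from each read covering the site.
    """
    depth = len(observed)
    if depth < min_depth:
        return None  # too few reads to distinguish a SNP from errors
    alt_counts = Counter(b for b in observed if b != ref_base)
    if not alt_counts:
        return None
    alt, n_alt = alt_counts.most_common(1)[0]
    fraction = n_alt / depth
    return (alt, fraction) if fraction >= min_alt_fraction else None


def scan_pileup(reference: str, pileup: dict[int, str],
                **thresholds) -> list[tuple[int, str, str, float]]:
    """Apply call_snp to every covered position; return (pos, ref, alt, fraction)."""
    calls = []
    for pos in sorted(pileup):
        hit = call_snp(reference[pos], pileup[pos], **thresholds)
        if hit is not None:
            calls.append((pos, reference[pos], hit[0], hit[1]))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGT"
    pileup = {1: "CCCCCCCCCCCC", 3: "TTTTTTCCCCCC", 6: "GGGGGGGGGA"}
    for pos, ref, alt, frac in scan_pileup(reference, pileup):
        print(f"pos {pos}: {ref}>{alt} (alt fraction {frac:.2f})")
```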
Simple sequence repeat (SSR) markers, which have the advantages of high abundance, random distribution within the genome, high polymorphism information content, and codominant inheritance, have been developed at large scale and lower cost via NGS. In Myrica rubra, which has an estimated genome size of 323 Mb and is highly heterozygous but has little duplication, Jiao et al. (2012) identified 28,602 SSRs from WGS sequencing using Illumina, and polymorphic markers among these were also successfully transferred to other Myrica species. Likewise, 23,438 putative SSRs were identified in the sesame genome by whole-genome de novo sequencing and successfully used to screen accessions from 12 countries (Wei et al., 2014b). De novo genic SSRs have been developed at large scale and used in a number of non-model species, including but not limited to Caragana korshinskii Kom. (Long et al., 2015), Hevea brasiliensis (Salgado et al., 2014), and Prosopis alba (Torales et al., 2013).
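Genome-wide SSR mining from assembled NGS sequence is essentially a search for tandem repeats of short motifs. The sketch below finds perfect mono- to hexanucleotide repeats with Python regular expressions; the minimum repeat numbers per motif length are illustrative assumptions (SSR-mining tools let users set their own), and compound or imperfect repeats, as well as primer design around each SSR, are out of scope.

```python
import re

# Assumed minimum number of tandem units per motif length (1-6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' or 'AA'."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by position."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # e.g. (AA)n is left to the 1-bp pass, which reports (A)n
            hits.append((m.start(), motif, (m.end() - m.start()) // unit))
    return sorted(hits)


if __name__ == "__main__":
    contig = "GC" + "AG" * 8 + "TTC" + "A" * 12 + "G" + "CAT" * 5
    for start, motif, n in find_ssrs(contig):
        print(f"({motif}){n} at position {start}")
```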
These markers are also used for association mapping studies in non-model plants. Association mapping (linkage disequilibrium mapping) identifies QTLs that account for phenotypic variation among individuals or species. It helps dissect complex genetic traits and enhances crop breeding for traits such as disease resistance and salinity and drought tolerance. In an association mapping study that accounted for population structure, Gupta et al. (2014) used 50 SSR markers representing the nine chromosomes of foxtail millet to analyze 184 accessions, and eight of these markers showed significant association with nine agronomic traits. Similarly, an association analysis using 20 SSR markers to detect marker loci linked to morphological and physiological traits in a wild Populus simonii population (Wei et al., 2014c) identified three SSR markers associated with seven traits: one marker was associated with five morphological traits, while the other two were associated with one morphological trait and one physiological trait, respectively. These studies suggest that the identified markers are suitable for MAS breeding, target gene detection, and QTL mapping.
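The core of a marker-trait association test can be sketched as a regression of phenotype on genotype, one marker at a time. The code below is a naive single-marker linear regression on allele dosage (0, 1, or 2 copies of an allele, a coding that suits biallelic SNPs; multi-allelic SSRs would need a different encoding). It deliberately omits the population-structure correction that studies such as Gupta et al. (2014) applied, so on real germplasm it would be prone to false positives; it only shows what "association" is being measured. The genotype and phenotype values are invented.

```python
import math


def single_marker_test(dosages: list[float],
                       phenotypes: list[float]) -> tuple[float, float, float]:
    """Least-squares regression of phenotype on allele dosage at one marker.

    Returns (slope, r_squared, t_statistic); t has n - 2 degrees of freedom.
    No correction for population structure is applied.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic marker or constant phenotype")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    if r_squared >= 1.0:
        t_stat = math.copysign(math.inf, slope)  # perfect fit
    else:
        t_stat = math.copysign(math.sqrt(r_squared * (n - 2) / (1 - r_squared)), slope)
    return slope, r_squared, t_stat


if __name__ == "__main__":
    dosage = [0, 0, 1, 1, 1, 2, 2, 2]                                  # invented genotypes
    height = [98.0, 101.0, 104.0, 106.0, 103.0, 110.0, 112.0, 109.0]  # invented trait values
    slope, r2, t = single_marker_test(dosage, height)
    print(f"effect per allele = {slope:.2f}, R^2 = {r2:.2f}, t = {t:.2f} (df = {len(dosage) - 2})")
```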
Genome sequencing has aided in deciphering the influence of transposable elements on the function and evolution of genes and genomes. These repetitive sequences are found in different regions across the genome and have been implicated in genome diversity and phenotypic variation. Accordingly, molecular markers are being developed from these elements and used for diversity characterization and construction of genetic linkage maps. In a genome-wide analysis of foxtail millet by Yadav et al. (2014), 30 of 134 Repeat Junction Markers screened in 96 accessions of Setaria italica and three wild Setaria accessions showed polymorphism, demonstrating that transposable elements can serve as genomic resources for genotyping. Insertions and deletions (InDels) are other genomic resources distributed across the genome that can also be used as molecular markers for phylogenetics. Moghaddam et al. (2014) developed 2687 InDel-based markers from Illumina sequence data of three genotypes of Phaseolus vulgaris L. and successfully used them to construct a phylogenetic tree and a genetic map, concluding that InDel markers are reliable, simple, and accurate. Introns are non-coding sequences within genes that are spliced out of the RNA transcript before translation into protein. Markers developed from introns have a high evolutionary rate, and because introns are flanked by conserved exons, primers designed in those exons may function across a wide range of species. Intron length polymorphism (ILP) markers are thus designed via exon-primed intron-crossing PCR (EPIC-PCR), with primers placed in the exons flanking the target intron. Ahmadvand et al. (2014) used NGS sequence data from a potato cultivar to design ILP markers, which were used to test diversity in other potato genotypes, and their cross-transferability was investigated in other Solanum species.
The results demonstrated the value of ILPs as genomic resources in diverse molecular analyses, including cross-species studies. Similarly, Muthamilarasan et al. (2014) developed 5123 ILP markers, of which 4049 were physically mapped onto the nine chromosomes of foxtail millet, and further showed the applicability of the markers in germplasm characterization, transferability, phylogenetics, and comparative mapping studies in millets and bioenergy grass species.
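The EPIC-PCR logic behind ILP markers can be checked in silico: place a forward primer in the upstream exon and a reverse primer in the downstream exon, then compare product sizes between alleles. Because the primer sites sit in conserved exons, any size difference comes from the intron between them. The sketch below assumes exact primer matches on one strand (real primers tolerate some mismatches, which matters for cross-species transfer) and uses invented exon, intron, and primer sequences.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd_primer: str, rev_primer: str) -> Optional[int]:
    """Exact-match in silico PCR on one strand of `template`.

    The forward primer must occur as written; the reverse primer must occur as its
    reverse complement downstream of it. Returns the product length or None.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    exon1, exon2 = "ATGGCTTCAGGAACC", "GGTACCTTGGAAGCA"      # invented exons
    allele_a = exon1 + "GTAAGT" + "T" * 20 + "TTTCAG" + exon2  # shorter intron
    allele_b = exon1 + "GTAAGT" + "T" * 35 + "TTTCAG" + exon2  # longer intron
    fwd = "ATGGCTTCAGG"                                        # anneals in exon 1
    rev = reverse_complement("TTGGAAGCA")                      # anneals in exon 2
    for name, allele in (("A", allele_a), ("B", allele_b)):
        print(f"allele {name}: {amplicon_length(allele, fwd, rev)} bp")
```

The two products differ by exactly the 15-bp difference in intron length, which is the polymorphism an ILP marker scores.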

UNDERSTANDING BIOSYNTHETIC PATHWAYS OF SPECIALIZED PLANT METABOLITES IN NON-MODEL PLANTS
Plants manufacture a huge and diverse group of organic compounds called secondary metabolites. These compounds appear to have no direct role in growth and other physiological processes, but are implicated in plants' adaptation to their environment, for example in the control of seed germination, regulation of symbiosis, defense against herbivores and pathogens, and chemical inhibition of competing plant species. In contrast to primary metabolites (sugars, amino acids, acyl lipids, and nucleotides), which are found in all plants, secondary metabolites are specific to a plant species or a group of related plant species. They were initially thought to be waste products of metabolism until research showed that they are useful as pharmaceuticals, flavors, industrial materials, and chemicals, which has increased interest in their use. Most of these compounds occur in non-model plants for which genomic sequence information is not yet available. The genus Panax, for instance, consists of at least nine species (Leung and Wong, 2010), commonly referred to as ginsengs, which research has shown to have anticancer, antidiabetic, immunomodulatory, anti-inflammatory, and antiallergic effects, among other medicinal uses. The mode of action of ginseng was, however, not known until ginsenosides were isolated in 1963 (Shibata et al., 1963, 1965). Christensen (2008) reported that ginsenosides are found nearly exclusively in Panax species (ginseng), with more than 150 naturally occurring ginsenosides isolated from the roots, leaves/stems, fruits, and/or flower heads of ginseng. Research effort to evaluate the function and elucidate the molecular mechanism of each ginsenoside has since been increasing.
Researchers have used different NGS platforms to generate genomic information about ginsengs, identifying several candidate genes encoding enzymes responsible for the biosynthesis of the secondary metabolites known as ginsenosides (Sun et al., 2010; Luo et al., 2011; Li et al., 2013; Jayakodi et al., 2014).
Access to some of these secondary metabolites has often been limited by a poor understanding of how they are synthesized (Oksman-Caldentey and Inzé, 2004), partly because the enzymes and biochemical pathways involved were either unknown or so complex that identifying the enzymes catalyzing the numerous metabolic steps was difficult. In some of these plants, a number of regulatory enzymes are involved in the biosynthesis process. Many genes in plant genomes encode enzymes of secondary metabolism, and mining of transcriptome data has proven to be an efficient way to discover genes or gene families encoding enzymes involved in various metabolic pathways. Podophyllum species, for example, are sources of podophyllotoxin, an aryltetralin lignan used for the semi-synthesis of various powerful and widely used anticancer drugs, yet its biosynthetic pathway remains largely unknown. NGS/bioinformatics and metabolomics analyses of Podophyllum hexandrum and P. peltatum tissues yielded two putative genes in podophyllotoxin biosynthesis (Marques et al., 2013). Further studies of the two Podophyllum species using integrated omics technologies, including advanced mass spectrometry/metabolomics, transcriptome sequencing/gene assemblies, and bioinformatics (Marques et al., 2014), led to the discovery of the aporphine alkaloid pathway in Podophyllum, a result that suggests evolutionary linkages between the lignan and alkaloid biosynthetic pathways. The authors reported that RNA-seq transcriptome sequencing and bioinformatics-guided in silico gene assemblies/analyses specifically suggested the presence of transcripts homologous to genes encoding all known steps in aporphine alkaloid biosynthesis.
Miettinen et al. (2014) stated that progress in the biotechnological production of monoterpenoid indole alkaloids (MIAs) from alternative sources is hampered by a lack of knowledge of the enzymes responsible for their biosynthesis; these MIAs, used as anticancer drugs, are produced by Catharanthus roseus at extremely low levels. The authors nevertheless characterized the last missing steps of the C. roseus secoiridoid pathway using an integrated transcriptomics and proteomics approach for gene discovery, followed by biochemical characterization of the isolated candidates. They further reported the reconstitution of the entire MIA pathway up to strictosidine in the plant host Nicotiana benthamiana, by heterologous expression of the newly identified genes together with the previously known biosynthesis genes. NGS has thus helped to elucidate the sequence of events leading to the production of secondary compounds of interest in non-model plants, accelerating gene discovery for secondary metabolite pathways without preexisting sequence knowledge of the genes studied.
Many secondary metabolites have complex and unique structures, and their production is often enhanced by both biotic and abiotic stress conditions (Dixon, 2001). Ryan et al. (2002) provided valuable insights into the biochemical response of plants to UV stress, which results in the production of a more protective flavonoid profile. Rezaeieh et al. (2012) further noted that biotic and abiotic stresses exert a marked influence on the biosynthesis of several secondary metabolites in medicinal plants. It is often difficult to predict which of the complex signaling pathways are activated or deactivated in response to different abiotic stresses, but the molecular regulatory system involved in stress tolerance and adaptation in plants can be deciphered more readily with the help of different omics studies (Chawla et al., 2011). Because plants continuously need to adjust their transcriptome profile in response to various abiotic stresses (Gupta et al., 2013), NGS-based transcriptome shotgun sequencing (RNA-seq), which captures the genes expressed in a tissue at a particular time, is invaluable. For example, a comprehensive transcriptome analysis of a salinity-tolerant Phaseolus vulgaris L. variety by Illumina sequencing identified genes related to salt tolerance (Hiz et al., 2014). This and other transcriptomic studies in non-model plants, such as that of drought stress in chrysanthemum (Xu et al., 2013), continue to generate functional genomics resources and a deeper understanding of the molecular mechanisms underlying plant responses to stress conditions.
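The basic RNA-seq comparison behind such stress studies can be illustrated with a minimal Python sketch. It normalizes raw read counts to counts per million (CPM) and computes log2 fold changes between a control and a stress library. The gene IDs, counts, pseudocount and the |log2FC| >= 1 cut-off are all hypothetical. The sketch uses a single library per condition only for illustration. Real differential expression analyses normally rely on biological replicates and a statistical test, for example with the Bioconductor packages edgeR or DESeq2, and this sketch does not attempt either.

```python
import math


def cpm(counts):
    """Counts per million: scale raw read counts by total library size."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_changes(control, stress, pseudocount=1.0):
    """log2((stress CPM + pseudocount) / (control CPM + pseudocount)) per gene.

    Only genes present in both libraries are compared; the pseudocount
    avoids division by zero and log2(0) for unexpressed genes.
    """
    c_cpm, s_cpm = cpm(control), cpm(stress)
    return {
        gene: math.log2((s_cpm[gene] + pseudocount) / (c_cpm[gene] + pseudocount))
        for gene in control.keys() & stress.keys()
    }


# Hypothetical read counts for one control and one salt-stressed library.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
stress = {"geneA": 400, "geneB": 400, "geneC": 200}

lfc = log2_fold_changes(control, stress)
# Hypothetical cut-off: at least a two-fold change in either direction.
responsive = sorted(g for g, v in lfc.items() if abs(v) >= 1.0)
print({g: round(v, 2) for g, v in sorted(lfc.items())})
# {'geneA': 2.0, 'geneB': 0.0, 'geneC': -1.32}
print(responsive)  # ['geneA', 'geneC']
```

Normalizing to CPM before comparing matters because libraries are usually sequenced to different depths. Without normalization, raw counts would partly reflect sequencing effort rather than expression.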

CONCLUSION AND FUTURE PROSPECTS
From the foregoing, it is clear that the cost-effective and timely sequencing provided by different NGS platforms has advanced research on non-model plants, which earlier had no place in genomics. The technology has enabled scientists to exploit these plants for human benefit and to understand the mechanisms underlying gene expression and secondary metabolism, in addition to creating genomic resources for diversity analysis and marker-assisted breeding (Figure 1) through de novo analysis, which was previously impossible owing to the lack of reference genomes.
The decreasing cost of this technology also opens the door to sequencing the genomes of multiple individuals of a particular species. If used properly, this will greatly assist comparative genomics in acquiring vital information about the evolutionary history of non-model plant species from the order of their DNA sequences, where such studies previously relied on chromosome numbers and ploidy levels. Moreover, protein sequencing (proteomics) combined with the increasing number of whole-genome sequences (WGS) will aid functional genomics in protein identification and in the functional prediction of hypothetical proteins/genes, which usually form the largest category in functional (BLASTX) annotations of non-model plants. It will likewise aid metabolomics, which involves large-scale measurement of metabolite levels, since non-model plants are large repositories of secondary metabolites of economic interest. It will also enable phenomics, the generation of large-scale phenotypic data for understanding how interactions of genotypes with the environment translate into phenotypic variation in non-model plants. In addition, improvements in these technologies will drive advances in bioinformatics data handling.

ACKNOWLEDGMENTS
CU acknowledges The World Academy of Sciences for the advancement of science in developing countries (TWAS) and The Council of Scientific and Industrial Research (CSIR) for the award of a postgraduate fellowship. RS acknowledges CSIR and DBT for the award of PLOMICS and Tea Network projects. This is CSIR-IHBT publication 3895.