De novo Sequencing and Comparative Transcriptomics of Floral Development of the Distylous Species Lithospermum multiflorum

Genes controlling the morphological, micromorphological, and physiological components of the breeding system distyly have been hypothesized, but many of the genes have not been investigated throughout development of the two floral morphs. To this end, the present study is an examination of comparative transcriptomes from three stages of development for the floral organs of the morphs of Lithospermum multiflorum. Transcriptomes of flowers of the two morphs, from various stages of development, were sequenced using an Illumina HiSeq 2000. The floral transcriptome of L. multiflorum was assembled, and differential gene expression (DE) was identified between morphs, throughout development. Additionally, Gene Ontology (GO) terms for DE genes were determined. Fewer genes were DE early in development compared to later in development, with more genes highly expressed in the gynoecium of the SS morph and the corolla and androecium of the LS morph. A reciprocal pattern was observed later in development, and many more genes were DE during this latter stage. During early development, DE genes appear to be involved in growth and floral development, and during later development, DE genes seem to affect physiological functions. Interestingly, many genes involved in response to stress were identified as DE between morphs.


INTRODUCTION
Heterostyly is a complex and elegant breeding system characterized by the presence, in a species, of two or three floral morphs with sexual organs at distinct, yet fixed, heights (Barrett et al., 2000;Barrett and Shore, 2008). In heterostyly's simplest form, distyly, two floral morphs exist. The long-style (LS) morph has flowers with the stigma elevated above the anthers, and the short-style (SS) morph produces flowers in which the anthers are positioned above the stigma. The stigma in one morph is at the same height as the anthers in the other morph, a condition known as reciprocal herkogamy (Figure 1). In heterostyly, reciprocal herkogamy is usually accompanied by a self-and intramorph incompatibility (SI) mechanism that results in only sexual organs at the same height being compatible and able to produce fertile offspring. Researchers have demonstrated that together reciprocal herkogamy and SI allow for efficient pollen transfer among individuals of different morphs throughout populations, and this results in increased outbreeding and reduced inbreeding (Barrett, 1990;Ferrero et al., 2011;Zhou et al., 2015) [although see Björkman (1995) for an example of lack of efficient pollen transfer]. These functional and ecological aspects of heterostyly have been the subject of much research, dating back to Darwin's pivotal work on these components of the breeding system (1877), but the genetics of heterostyly, while investigated using numerous crossing studies (e.g., Ernst, 1928Ernst, , 1933, still remain elusive (Fornoni and Domínguez, 2015;Gilmartin, 2015). One reason this is the case is that heterostyly is known from members of at least 28 families from across the angiosperms (Barrett and Shore, 2008;Cohen, 2010), but detailed data on the genetics of the breeding system is primarily limited to a small number of species (Gilmartin, 2015;Nowak et al., 2015). Indeed, much of the information on the genetics of heterostyly has been informed by studies on Primula L., and in this taxon, evidence suggests that the heterostylous condition is controlled by a supergene (S-locus), multiple genes that are tightly linked together, involved in the control of the breeding system (Barrett and Shore, 2008;Charlesworth, 2016). Outside of Primula, evidence for the presence of a supergene is weaker, primarily due to lack of data (Barrett and Shore, 2008;Cohen, 2010;Thompson and Jiggins, 2014). The present study employs de novo sequencing and comparative transcriptomics of three developmental stages of the flowers of the two morphs of the distylous species Lithospermum multiflorum Torr. ex A. Gray (Boraginaceae) in order to identify genes involved in the control of the heterostylous condition.
The genetics of heterostyly was initially characterized through crossing studies conducted during the late 1800s and early 1900s (Darwin, 1877;Ernst, 1928Ernst, , 1933, and the outcomes of these studies resulted in the hypothesis of the three-gene Slocus (Lewis, 1954;Dowrick, 1956;Richards, 1997;Nowak et al., 2015). Recent studies have focused on uncovering the underlying genetics of a small number of heterostylous species, including Linum L., Turnera L., and Primula (McCubbin et al., 2006;Somers et al., 2008;Labonne et al., 2009;Ushijima et al., 2012Ushijima et al., , 2015. Many of the studies have recognized genes or proteins either linked to the S-locus or with morph-specific patterns of expression. In Linum, Ushijima et al. (2012Ushijima et al. ( , 2015 identified a gene, TSS1, that was tightly linked to the S-locus and only segregated with SS individuals, while other genes unlinked to the S-locus, such as LgSKS1, which are also highly expressed in the SS morph, have post-transcriptional regulation. This latter finding suggests that the development of the two floral morphs is controlled by more than just the components of the S-locus. In Turnera, Somers et al. (2008) and Labonne et al. (2009) constructed a genetic map of the S-locus and identified three genes, TkNACA, TkST1, and TsRETRO, linked to it. Further studies on Turnera, using X-raygenerated mutants, allowed gene order to be postulated for the S-locus (Labonne et al., 2010).
Research on Primula has been more extensive (Gilmartin, 2015;Nowak et al., 2015;Huu et al., 2016), and recent work has resulted in the identification of multiple genes associated with the S-locus, such as PVeGLO2, a GLOBOSA homolog, and CYP734A50, both present in Primula veris L. PVeGLO2, which was identified via a draft genome of the species, is not expressed in the LS morph (Nowak et al., 2015). Similarly, CYP734A50 is only expressed in the styles of the SS morph, and the loss of the expression of this gene results in the elongated styles of the LS morph (Huu et al., 2016). Using BAC contigs, Li et al. (2015) identified a one megabase (Mb) region that contains 82 genes linked to the S-locus, and this region appears to be located close to the centromere. That location may be one of the reasons that recombination among the genes in the S-locus occurs infrequently (Talbert and Henikoff, 2010). In a related species of Primula, P. vulgaris Huds., McCubbin et al. (2006) utilized subtractive cDNA libraries of morphspecific floral tissue to identify differentially expressed (DE) genes between the flowers of the two morphs. These researchers identified 11 DE genes involved in a variety of different processes, including stress response, cell wall development, and RNAbinding; however, none of these DE genes were linked to the S-locus. This provides evidence that multiple genes are involved in downstream processes of morph-specific floral development in heterostylous species. Additionally, genes and proteins that differ between morphs have been studied in other heterostylous taxa, such as Fagopyrum Mill. (Fesenko and Fesenko, 2006) and Averrhoa L. (Wong et al., 1994;Fesenko and Fesenko, 2006), but to a lesser extent.
While identification of the S-locus is important in understanding heterostyly, at least in some species, the breeding system provides an excellent opportunity to investigate genes involved in floral organ height and length. In most species, identification of these genes would require large quantitative trait loci (QTL) studies involving examination of the range of morphological variation and associated molecular markers [e.g., over 6500 plants observed in a QTL study on flower size of Erythranthe Spach (formerly Mimulus L.) (Kelly and Monica, 2011)]. Given the distinct differences in floral organ heights and lengths in heterostylous species, these species do not succumb to these issues. Consequently, examining floral transcriptomes of heterostylous species, at various stages of development, allows for the identification of genes involved in controlling the sizes of flowers and flower organs. It is possible to compare patterns of gene expression of flowers of the morphs of heterostylous species because the flowers of the morphs develop following the same genetic program, with the only exceptions hypothesized to be those resulting in the morphspecific differences (Cohen, 2010). Genes that are differentially expressed between the morphs should be those that are involved in heterostyly. While the S-locus, at least in Primula, may be the genomic region ultimately controlling the heterostylous condition, other genes from throughout the genome may be involved in development of the breeding system (McCubbin et al., 2006), with identification of the S-locus only providing limited data.
L. multiflorum is a distylous species (Figure 1) common in the intermountain west in North America (Cronquist et al., 1984). The species is a member of Boraginaceae, the family that has the largest number of independent origins of heterostyly, and this type of breeding system has evolved multiple times in Lithospermum L. (Cohen, 2011). Cohen et al. (2012) demonstrated that there are differences in the development of the flowers within the genus, and data concerning SI and floral morphology and development based on studies by Philipp and Schou (1981), Casper (1985), Li and Johnston (2010), Ferrero et al. (2012), andCohen (2014) suggest that heterostyly evolved following distinct patterns in different tribes of the family. Heterostyly arose independently in L. mulitflorum, and the species is sister to Lithospermum macromeria. Cohen (2011), a species with floral morphology quite different from L. multiflorum (i.e., long, green to white corollas and exserted anthers and stigmas). Consequently, L. multiflorum is located at an intriguing phylogenetic position for an investigation of patterns of morph-specific flower gene expression. To this end, the present study is an examination of the comparative transcriptomics of (1) the corolla and androecium and (2) the gynoecium of the two morphs of L. multiflorum at three stages of development in order to recognize genes that are differentially expressed between the morphs, and, therefore, regulating floral morphology in this species.

Plant Material, RNA Extraction, and Transcriptome Sequencing
Flowers of the two morphs of L. multiflorum were collected from three populations in and around Flagstaff, Arizona, USA during June 2011 and 2012. In order to capture appropriate developmental stages, multiple flowers of different sizes were taken from plants, and after quickly removing some sepals, flowers were immediately placed in RNAlater (Applied Biosystems, Foster City, CA, USA), and subsequently stored either at 4 • C overnight or placed on dry ice. All specimens were then stored indefinitely at −20 • C. Voucher specimens for the populations were made (Cohen 353,354,356,and 398), and specimens were deposited at MICH.
Flowers were divided into three stages-early, middle, and late-based on the developmental patterns identified by Cohen et al. (2012) (Figure 2). Flowers in the early stage are up to 4 mm in length, which is before morph-specific differences are apparent in the flowers. Flowers in the middle stage are 4-9 mm in length, which is after morph-specific floral differences are observed in heights of anthers and stigmas. Flowers in the late stage are greater than 9 mm in length, and these flowers range from near anthesis to anthesis, with morph-specific differences well-defined. For each morph at each stage, three extractions were made for the organ group of the corolla, androecium, and gynoecium, and two extractions were conducted for the organ group of the corolla and androecium, resulting in a total of 30 extractions (Table S1). Flowers for RNA extraction were arbitrarily selected from either 2011 or 2012, and given the similar environmental conditions around Flagstaff, AZ during June of the 2 years (https://www.wunderground.com/), it would appear that different collection times would have minimal impact on variation in patterns of gene expression.
RNA was extracted from one or two RNAlater-preserved flowers using a modification of the Li-Cl method of Jaakola et al. (2001). Right before RNA isolation, the remaining sepals and pedicel were removed. Before the extraction buffer was added, flowers were homogenized using the method outlined in Alexander et al. (2007). The primary modification from Jaakola et al. (2001) involved the extraction buffer, which in the present study included 2% CTAB, 2% PVP (Mol WT 40,000), 0.5% SDS, 100 mM Tris-HCl (pH 8.0), 25 mM EDTA, 2.0 M NaCl, and 0.5 g/L spermidine, with 2% β-mercaptoethanol and 1.5 mg/mL Proteinase K added just before use. After RNA was isolated, the quantity, 260/280, and 260/230 ratios were determined using a Nanodrop ND-1000 spectrophotometer (Thermofisher Scientific, Waltham, MA, USA). RNA that was at least a total of 1 µg and with 260/280 and 260/230 > 1.8 was treated with the TURBO DNA-free TM Kit (Applied Biosystems) following the manufacturer's protocol. Subsequently, isolations were sent to the Duke University Genome Sequencing & Analysis Core Resource, where the quality of the RNA was determined via an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA), and then prepared for transcriptome sequencing, on an Illumina HiSeq 2000, using the TruSeq RNA Kit. One extraction from each stage from each group of organs (total of 12) was sequenced using 100 base pair (bp) paired-end (PE) reads across three lanes, and the remaining extractions (18) were sequenced using 100 bp single-reads (SR) across five lanes. Afterwards, adapters were trimmed and low quality and short reads were removed at the Duke University Genome Sequencing & Analysis Core Resource. Sequence data are deposited at the Sequence Read Archive (SRP093214, BioProject PRJNA353131).

Transcriptome Assembly and Mapping
The transcriptome of L. multiflorum was assembled, at Duke University, based on the 12 PE transcriptomes from the three developmental stages of the two morphs. Using the ∼389 million high-quality pairs of reads sequenced from the 12 RNA isolations, the floral transcriptome of the species was assembled using Trinity v 1 (Grabherr et al., 2011). The transcriptome was annotated and gene ontology terms identified (GO terms) using the Integrative Services for Genomic Analysis (http://isga.cgb. indiana.edu). In order to test for DE genes between the morphs and organs at various stages of development, the protocol of Van Verk et al. (2013) was followed. FASTQ files were uploaded to the Galaxy server (usegalaxy.org), and five base pairs were trimmed from the ends of each read, followed by filtering the reads to remove those with low quality scores (reads <90% of the bp with a Phred quality score of ≥20). After filtering, reads from each RNA isolation were mapped to the isoforms of the Trinity-assembled transcriptome. Using Bowtie (Galaxy tool version 1.1.2) (Langmead et al., 2009), reads were mapped, with the appropriate settings for SR and PE employed, using the default (lenient) settings as well as with a stringent criterion (i.e., -a,-best, -strata, -m 1, -n 3, -l 30, -e 288, -y off, -k 1, -maxbts 800). The SAM files from the Bowtie runs were analyzed using eXpress (Roberts and Pachter, 2013), with the Galaxy server, to determine the number of reads that mapped to each isoform. This was undertaken using three different methods: Default settings and two more thorough analyses, involving two additional rounds of each of the batch and online expectation-maximization (EM) algorithm, a mean fragment length either of 90 bp for all reads or 90 bp for SR and 190 bp for PE, and a standard deviation of one. Total counts and effective counts, which are counts adjusted for fragment and target length biases, were recorded (Roberts and Pachter, 2013).

Differential Gene Expression and Gene Ontology
The total number of reads mapped to each isoform for each putative gene were compiled in order to determine the number of reads mapped to each putative gene. To test for similarity between transcriptomes from the same developmental stage and morph, Pearson correlations were calculated in Microsoft Excel. For each putative gene and isoform, both the total and effective counts from eXpress from the two Bowtie mapping strategies were analyzed in edgeR v.3.0.0 (Robinson et al., 2010) to test for differential expression (DE) of transcripts among floral organs at early, middle, and late stages of floral development for the corolla and androecium and for the gynoecium. Putative DE genes were identified based on a P-value ≤ 0.05 and a false discovery rate (FDR), using the Benjamini-Hochberg adjustment, of ≤0.1 (Anders et al., 2013). To investigate overlap of DE genes among morphs, organs, and stages of development, venn diagrams were constructed with VennDiagram (Chen and Boutros, 2011). Hierarchical clustering of DE genes was conducted across the three developmental stages, for the two organ groups, using NMF (Gaujoux and Seoighe, 2010).
After the DE putative genes and isoforms were identified for each organ or group of organs at the three developmental stages, genes were annotated using Blast2GO (Conesa et al., 2005) running blastx on the nr database at NCBI, and GO terms were determined, using Blast2GO, with InterProScan (Zdobnov and Apweiler, 2001) and by mapping genes with the Gene Ontology Database ( GO Consortium, 2004). With GOstats 2.34.0 (Falcon and Gentleman, 2007), GO terms for the DE genes for each organ or group of organs for each developmental stage were compared to those for all genes in the floral transcriptome in which GO terms were identified. GO terms for Biological Processes, Cellular Components, and Molecular Function were identified that were over-and underrepresented, based on a P-value ≤ 0.5. The hyperGTest in GOstats was run with and without the conditional algorithm. For these analyses, because of the small number of DE genes and the similarity to those during early development, the mid-stage DE genes were included with those from the early stage of development.

Transcriptome Sequencing, Assembly, and Mapping
A total of 30 transcriptomes were sequenced for flowers at the three stages of development for the two morphs of L. multiflorum. For the paired-end (PE) reads, the total number of reads for a transcriptome ranged from 58,474,582 to 134,651,292 100bp reads. For the single reads (SR), the total number of reads for a transcriptome ranged from 38,244,776 to 78,986,321 100bp reads. The assembled floral transcriptome of L. multiflorum included 97,603 components (putative genes) and 265,144 isoforms (putative gene variants). The minimum length of a putative gene was 201 bp, and the maximum assembled length was 16,847 bp. For the genes, the average length was 690 bp, with an N50 of 1201 bp, and for the isoforms, the average length was 1144 bp, with an N50 of 2029 bp. Statistics for the assembled transciptome are provided in Table 1. Approximately 2-8% of the total reads were removed via the filtering and trimming steps. Using the default (lenient) and more stringent Bowtie settings, ca. 70-80 and 40% of the reads mapped to the assembled transcriptome, respectively. Data on the number of reads are presented in Table S1.

Differential Gene Expression and Gene Ontology
The Pearson correlations for the components and isoforms were greater than 0.85 for the corolla and androecium and for corolla, androecium, and gynoecium at early and late  Table S2).
The two different mapping criteria in Bowtie (lenient and stringent), in conjunction with the six different methods for determining the number of reads mapped, in eXpress [total and effective counts for each of (1) no specified fragment length, (2) a mean fragment length of 90 bp for all reads, and (3) a mean fragment length of 90 bp for SR and 190 bp for PE], resulted in 12 manners in which the number of differentially expressed (DE) genes were calculated ( Table 2). Of the 12 different manners in which DE genes were calculated, the lenient Bowtie analyses tended to result in more DE genes identified, with edgeR, compared to the more stringent Bowtie analyses. Additionally, the use of effective counts, rather than total counts, generally resulted in fewer DE genes. Specifying the length of the mean fragments at either 90 bp for each read or 190 bp for PE reads and 90 bp for SR, did not seem to influence the number of DE genes as much as the mapping criteria and the use of total or effective counts, with the former resulting in fewer DE genes compared to the latter ( Table 2).
In the corolla and androecium, more DE genes early in development were more highly expressed in the short-style (SS) morph (although this was a small number), while a greater number of DE genes were more highly expressed in the longstyle (LS) morph during late development (Figures 3, 4). The opposite pattern was observed in the gynoecium, with more highly expressed DE genes in the LS morph during early development (again, a small number), and many more DE genes more highly expressed in the SS morph during late development of the gynoecium (Figures 3, 4). Indeed, the greatest number of DE genes between morphs was observed in the gynoecium during the late stage of growth. Only a small number of DE genes was identified during the mid-stage of development both for the corolla and androecium and for the gynoecium, with the smallest number of DE genes across stages and organs identified in the corolla and androecium of the mid-stage of development ( Table 2). The DE genes that were identified in the greatest number of analyses, for each stage and organ, are presented in Table 3. This number ranged from 4 to 12 depending on the stage and organ combination. The majority of these genes were successfully annotated using Blast2GO, and most exhibit greater than six log-fold changes, in expression, between morphs.
No genes expressed in the corolla and androecium overlap in patterns of expression among the developmental stages. In contrast, 12 genes are DE in the gynoecium across all three developmental stages, with smaller numbers overlapping between the early and mid-stage of development (two) and the mid-and late stage (four) (Figure 5). Interestingly, 11 genes are DE in both the early and late stage of the gynoecium, but not in the mid-stage of development ( Figure 5). The vast majority of genes DE during one stage are not DE at other stages (Figures 4, 5). However, 892 DE genes are common at the late stage of development for the corolla and androecium and the gynoecium, although the patterns of expression may differ among these genes (Figures 4, 5). Analyses of hierarchical clustering also illustrate that a large number of genes that are DE during the late stage are not DE during the earlier stages (Figure 4).
The pluralities of the Gene Ontology (GO) terms were identified from Glycine max (L.) Merr., Vitis vinifera L., Populus trichocarpa Torr. & A. Gray ex Hook., and Citrus sinensis Osbeck, although GO terms identified from species throughout the angiosperms were represented in the annotation. The top-hit species were primarily members of Lamiidae, including Coffea canephora Pierre ex A. Froehner, Erythranthe guttata (DC.) G. L. Nesom, Solanum tuberosum L., and Solanum lycopersicum L., as well as V. vinifera. A total of 65,535 GO terms were identified for the floral transcriptome of L. multiflorum. GO terms were overrepresented during all stages of development, and only in the early and mid-stage of the development of the corolla and androecium were no GO terms underrepresented ( Table 4). During early and mid-development, most of the GO terms that differed in frequency between DE genes and the transcriptome as a whole were overrepresented in the DE gene set, whereas during late development, a greater number were underrepresented in the DE gene set (Table 4).
A summary of the most over-and underrepresented GO terms is presented in Table 5. Many more genes are DE during late floral development compared to during early and mid-floral development (Figures 4, 5), regardless of the type of analysis ( Table 2). Early in development, gene ontology (GO) terms that are overrepresented, for Biological Processes (BP), Cellular Components (CC), and Molecular Function (MF), appear to be involved in growth. For example, GO terms associated with membrane fusion, actin cytoskeleton organization, cell tip, cell pole, site of polarized growth, and actin binding are overrepresented early in development in the corolla and androecium of the SS morph (Tables 5, 6). During late development, more genes are DE compared to early development, and the genes that are DE differ between the two stages. Later in development, many DE genes appear to be involved in physiological functions, which can include selfand intramorph-incompatiblity and response to stress. Indeed,  Positive values represent genes more highly expressed in long-style morph, and negative values signify genes more highly expressed in short-style morph.
it is worth noting that response to stress is the only GO term overrepresented throughout the developmental stages in all investigated organs when the conditional algorithm of GOstats was not employed, and this GO term was one of only three BP GO terms overrepresented when the condition algorithm was used ( Table 6). As with the patterns of DE genes in general, more of those involved in response to stress are expressed later in development than earlier in development. While specific GO terms related to distinct responses to stress, such as those related to the regulation of particular hormones, were not found to be over-or underrepresented, some genes that respond to hormones known to be involved in response to stress (e.g., ethylene and abscisic acid) were DE during various stages of development (Table 7). During late development, the BP and MF GO terms overrepresented, in both the corolla and androecium and the gynoecium, may be involved in SI, including cell   wall organization, cytoskeletal protein, and others (Tables 5, 6). Additionally, overrepresented CC GO terms tend to be either extracellular or toward the periphery of the cell, which is not the case for overrepresented CC GO terms early in development, which are more internal to the cell or involved in growth, such as an overrepresentation of plant-type vacuole membrane and intracellular organelle during early development but not latter development (Tables 5, 6). Genes putatively involved in these functions and locations are overexpressed in the corolla and androecium of the LS morph and in the gynoecium of the SS morph.

Floral Transcriptome of Lithospermum Multiforum
The present study demonstrates the utility of transcriptomics to identify differentially expressed genes in flowers, throughout development, of a heterostylous species. The transcriptome of L. multiforum includes 97,603 putative genes and 265,144 isoforms of these genes. In comparison to other floral transcriptomes of heterostylous species, this is a much larger number of genes. For example, Fagopyrum, Primula, and each of two species of Eichhornia Kunth express ca. 25,400, 55,000, and 20,000-24,000 genes, respectively (Logacheva et al., 2011;Ness et al., 2011;Zhang et al., 2013). Additionally, the transcriptome of L. multiflorum is greater than that of the only other species of Boraginaceae, Echium wildpretii H. Pearson ex. Hook.f., with an assembled transcriptome, which includes ca. 58,500 genes and 69,500 transcripts (White et al., 2016). The number of putative genes and transcripts of E. wildpretii is greater than that of the other aforementioned species as well, suggesting the possibility that the species of Boraginaceae may have large transcriptomes. The large number of putative genes of the L. multiflorum transcriptome could be due to a variety of different factors, including the complexity of the transcriptome of the species, the  inclusion of multiple stages of development in the construction of the transcriptome, paralogs due to an ancient polyploidy event in the genus (N = 14, Ward and Spellenberg, 1988), or technical issues, such as the assembly of incomplete contigs due to solely using transcriptome, not genome, data. Additionally, the flowers used in the present study were collected at different stages of development from wild populations, which could result in the expression of genes that may not have been expressed under more controlled conditions. Therefore, particular genes, such as those involved in stress, may not be expressed if conditions are more stable for the plants. However, it should be noted that all of the plants used in the present study were growing under similar conditions (including between years), even if these were in the native environment of the species, which should result in a negligible impact on DE solely due to environmental factors.

Bowtie and eXpress Criteria
The present study employed different mapping criteria and parameters for quantifying reads mapped. Unsurprisingly, the more stringent mapping criteria employed in Bowtie (Langmead et al., 2009) resulted in fewer reads mapped, and subsequently fewer DE genes identified using edgeR (Robinson et al., 2010). In general, the use of effective counts, in eXpress (Roberts and Pachter, 2013), resulted in the identification of fewer DE genes, as did the addition of length information. It is notable that using multiple criteria can result in different numbers of DE genes identified. Indeed, the most conservative methods employed found ca. 10% of the DE genes identified with the more lenient approaches. By utilizing different criteria, it was possible to identify a suite of genes DE across multiple analyses, which may be a useful approach for determining genes to target for future research, such as real-time PCR and in-situ hybridizations, to better identify patterns of expression among related homostylous and hetersotylous species of Lithospermum and Boraginaceae.

Differential Expression of Genes Throughout Development
During early development, genes overexpressed in the corolla and androecium of the SS morph have been demonstrated to be involved in cell division and actin dynamism, and this includes particular genes that are DE, such as myosin family protein with dil domain and pleckstrin homology domain-containing family A member 8 (Lehman et al., 1996;Berg et al., 2000). In contrast, genes overexpressed in the developing gynoecium of the LS morph, such as n-acyltransferase hls-1 like, have been shown to control cell growth via auxin regulation (Table 3) (Lehman et al., 1996;Berg et al., 2000;Peremyslov et al., 2010). Together, this pattern of expression suggests that different genes are involved in the elongation and growth of various organs in the developing flower bud. Later in development, DE genes and GO terms have been demonstrated to be involved in pollen-specific  responses, reproduction, and cell signaling, all of which may play a role in SI. During this stage of development, DE genes include pollen allergen ole e 6-like protein, ole e 1-like protein, pollenspecific protein sf3-like, and protein-tyrosine kinase 2-betalike (Table 3), while overrepresented CC GO terms comprise extracellular region, cell wall, plasma membrane ( Table 6) (Baltz et al., 1992a,b;Hubbard and Till, 2000;Staiger and Franklin-Tong, 2003;Jiménez-López et al., 2011;Lopez-Casado et al., 2012). This pattern of expression demonstrates a shift in the function of DE genes between early and late development. The small number of DE genes involved in growth expressed early in development, along with the larger number involved in physiological processes expressed later in development, provide evidence that the morphological components of heterostyly (i.e., herkogamy) are controlled by a smaller number of genes than those involved in the physiological aspects of the breeding system (i.e., SI). This suggests that fewer changes in patterns of expression would need to develop in order to result in shifts in herkogamy compared to changes in SI, which is consistent with the model for the evolutionary development proposed by Lloyd and Webb (1992) and supported by phylogenetic and developmental data in Lithospermum (Cohen, 2011;Cohen et al., 2012), hypothesizing that reciprocal herkogamy developed prior to SI in the genus. Furthermore, some heterostylous species in Boraginaceae (e.g., Oreocarya flava A. Nelson) have SI that is not fully established (Casper, 1985), so the morphological components would have arisen without the origin of SI.
Along with the disparity in the number of DE genes at different stages of development, there is an asymmetry present during late development in the pattern of expression of DE genes in floral organs. In the corolla and androecium, the majority of DE genes are overexpressed in the long-style (LS) morph, while in the gynoecium, the situation is reversed, with the majority of the DE genes overexpressed in the short-style (SS) morph (Figures 3-5). While this pattern is reversed during early development, it is certainly not as pronounced owing to the much smaller number of DE genes during earlier stages of development (Figures 3-5). This late-development asymmetry is notable because it has been demonstrated that SI does not necessarily act in the same manner in each morph in heterostylous species (Schou and Philipp, 1984). This has not been studied in L. multiflorum, but given the different patterns of expression in the floral organs of the two morphs, this may be the case, with the pollen from the LS morph interacting with the gynoecium of SS morph in a different way than that of the pollen of SS morph with the gynoecium of LS morph.
The mid-stage of development differs from the early and late stages of development in two manners-a much smaller number of DE genes identified and low intramorph Pearson correlations. Fewer than 25 DE genes were identified for the midstage of development, with fewer than 20 for any of the individual analyses (Figure 4). This number is less than one third of the DE genes during early development and far smaller than the DE genes during late development (Table 3). This could be due to the mid-stage not being reflective of a distinct biological stage of development. There may be an early stage that establishes morphological differences, followed by a later stage in which SI develops. Therefore, the use of a mid-stage of development, in the present study, may be artificial and include flowers at both the early and late stages of development, which could result in the lower Pearson correlations, as opposed to the high Pearson correlations for floral transcriptomes within the respective early  Lack of numbers in a cell represents lack of significant over-or underrepresentation. **Denotes P < 0.0001. and late stages of development. This variation may also result in fewer DE genes due to differences among the biological replicates of the flowers at the mid-stage. A limited number of studies have used comparative transcriptomics to identify genes involved in herkogamy and heterostyly, and it is notable that some of the genes identified as DE in other taxa are also DE in L. multiflorum. Ness et al. (2011) compared the floral transcriptomes of two species of Eichhornia that include selfing and outcrossing representatives. One hundred and forty-seven genes were upregulated in the flowers of the selfing individuals studied, with 10 genes identified as being involved in pollination or flower development. Ness et al. (2011) extracted RNA from flowers at various stages of development, but the RNA was pooled for sequencing. Of these 10 DE genes, two auxin response factors were identified, and this was also the case for early development in the flowers of L. multiflorum. During late development, six genes similar to those recognized as DE in Eichhornia were also DE in L. multiflorum, including a gibberellin-regulated protein, oligopeptide transporter 7-like, exocyst complex component, cellulose synthase-like protein d4-like, auxin-responsive protein, and a myb family transcription factor family protein. This suggests that there may be similar genes involved in floral organ height and length across the angiosperms because the taxa are quite distantly related, with Eichhornia being a monocot and L. multiflorum being a dicot. Landis et al. (2015) studied two species of Saltugilia (V. E. Grant) L. A. Johnson (Polemoniaceae), one with small corollas and one with large corollas. These authors recognized over 700 DE genes comparing mature flowers of the two species, with many DE genes involved in cellular functions.
In a recent study of the transcriptomes of flowers of heterostylous morphs, Nowak et al. (2015) compared two species of Primula, P. veris and P. vulgaris. These authors identified ca. 650 genes as DE in each of the two species throughout development, although their FDR-adjusted P-value threshold was 0.05, not 0.1, which was employed in the present study. Additionally, these authors utilized floral tissue from only one stage of development, immature floral buds 3 to 5 days prior to anthesis. Comparing the two species, Nowak et al. (2015) found 113 genes that were DE in both, with more overexpressed in the SS morph than in the LS morph. This pattern of DE is similar to the situation in the gynoecium of L. multiflorum. The study of Primula focused, in part, on the identification of the S-locus, the supergene controlling the heterostylous condition. The genes identified as being linked to the S-locus in P. veris, including a GLOBOSA homolog expressed only in the SS morph, were not DE in the present study, and it should be noted that while the S-locus appears to strongly influence heterostyly in Primula (Fornoni and Domínguez, 2015;Nowak et al., 2015) its presence has yet to be identified in Lithospermum and other species of Boraginaceae. Therefore, the S-locus may control the development of the breeding system in Primula, but other genes and molecular interactions may govern the development of heterostyly in Lithospermum and its relatives. As further genomic information becomes available, it will be possible to more carefully and critically examine differences among heterostylous species.
Some of the genes that have previously been identified as involved in flower size [reviewed in Krizek and Anderson (2013)] were also DE between morphs of L. multiflorum. During early development of the gynoecium, cytochrome p450 93a1-like, which encourages increase in flower size (Anastasiou et al., 2007), was upregulated in the LS morph, and this was also the case for two auxin-response proteins and one auxin-induced protein. These types of genes have been demonstrated to increase floral organ size in other plants (Hu et al., 2003;Varaud et al., 2011). During late development, growth-regulating factors were upregulated in the corolla and androecium of the LS morph, and mutations in these genes have resulted in smaller petals (Kim et al., 2003). This is notable because the flowers of the LS morph of most heterostylous species, including L. multiflorum (Cohen et al., 2012), are smaller than those of the SS morph (Cohen, 2010).
In the corolla and androecium of the SS morph and in the gynoecium of the LS morph, the transcription factor tcp14-like is upregulated (Table 3). TCP transcription factors are known to regulate growth either through promotion or repression of growth (Martín-Trillo and Cubas, 2010). Given the pattern of expression of this TCP transcription factor, it is hypothesized that in L. multiflorum the putative transcription factor tcp14like is involved in promoting growth, which would result in the larger corollas of the SS morph and the longer styles of the LS morph. Additionally, various genes involved in responding to auxin are DE, with most of these genes upregulated in the larger organs of the corolla and androecium of the SS morph and of the gynoecium of the LS morph. In P. veris, Huu et al. (2016) identified a gene, CYP734A50, that is a putative brassinosteroid-degrading enzyme that regulates style length. In L. multflorum, six genes, including brassinosteroidregulated protein bru1, are involved in the brassinosteroid biosynthetic pathway. These genes are more highly expressed in the corolla and androecium of the LS morph and the gynoecium of the SS morph. This pattern is the opposite of that for auxin suggesting that this hormone may be involved in the development of smaller flowers and floral organs. Given the distinct differences in herkogamy between floral morphs of heterostylous species and the importance of herkogamy in influencing pollination biology, genes involved in controlling floral organ height and length should be considered speciation genes (Rieseberg and Blackman, 2010). Mutations in these genes, such as auxin response factors (Hu et al., 2003;Varaud et al., 2011), may result in an allogamous species becoming autogamous or requiring a different pollinator to effectively transfer pollen between the anthers and stigmas of flowers that bear organs at different heights. As the function of the DE genes in the present study is further elucidated, particularly for genes also DE in flowers of other species, the importance of these genes on speciation will become evident, which will be aided by additional floral transcriptomes of species exhibiting variation in herkogamy.

Identification of DE Genes Involved in Response to Stress
While the number of DE genes, between morphs, involved in response to stress could be due to the collection of plants from wild populations, it seems likely that given the biological replicates and the consistency among them as well as similar climate and weather patterns among collections and between years, the variation in stress response between morphs has a biological basis. The two morphs may have different genes expressed in response to stress or one morph may have a stronger response to stress than the other. Some of the DE genes involved in stress response are influenced by hormones known to be involved in this type of process, including ethylene and abscisic acid, and these genes include various ethylene-responsive transcription factors, abscisic acid receptor pyr1-like, and others (Thirugnanasambantham et al., 2015;Wang et al., 2015) (Table 7). Future research can investigate variation in hormones or hormone response between morphs, which may influence the observed morph-specific responses to stress. Ralston (1993) provides evidence that the two morphs of L. multiflorum can respond differently to herbivory. She found that following herbivory individuals with longer styles increased seed production. Additionally, Ralston determined that, compared to the LS morph, the SS morph had significantly greater herbivory and seed production following herbivory. In a study on the heterostylous species Gelsemium sempervirens (L.) J. St.-Hil. (Gelsemiaceae), Leege and Wolfe (2002) provide evidence that floral herbivory is morph-specific. In G. sempervirens, the floral organs toward the apex of the floral tube receive a greater amount of herbivory, resulting in the styles of the LS morph and the stamens of the SS morph being more damaged than these same organs in the flowers of the other morph. These differences in herbivory may result in distinct responses to stress, which were observed in the present study. The different stress responses may be similar to contrasting SI responses between morphs.

FUTURE DIRECTIONS
Comparative transcriptomics provides a powerful tool for the identification of genes involved in heterostyly and the heights and lengths of floral organs, and this methodology allows for a variety of additional studies to be undertaken in L. multiflorum and other taxa. For example, in L. multiflorum it is now possible to examine patterns of variants of genes, including alternative splicing, not only genes that are DE between morphs. Additionally, investigating the patterns of DE genes throughout development of the flowers of the same morph can provide further insight into the molecular underpinnings of heterostyly within this species. An exciting area of research involves comparisons with other heterostylous species within the genus and family and across angiosperms as data collected can help identify whether a suite of genes involved in heterostyly is present in all heterostylous species [or all heterostylous species from a particular clade (e.g., Lamiidae)] or if new genes are recruited during each origin of heterostyly.

AUTHOR CONTRIBUTIONS
JC planned, designed, and implemented the experiment. He was involved with collecting plants, laboratory research, data analysis and interpretation, and writing and editing the manuscript.

ACKNOWLEDGMENTS
The author would like to thank Jeremy Coate and Caroline Kellogg for their invaluable help throughout the project. Olivier Fedrigo and Jean Qin-Qin at the Genome Sequencing & Analysis Core at Duke University were instrumental in the sequencing and assembly efforts. Veronica Moorman, Shannon Straub, two reviewers, and Verónica S. Di Stilio provided helpful comments on the manuscript. The project was supported by start-up funds from Texas A&M International University and Kettering University.