Overlapping Patterns of Gene Expression Between Gametophyte and Sporophyte Phases in the Fern Polypodium amorphum (Polypodiales)

Ferns are unique among land plants in having sporophyte and gametophyte phases that are both free living and fully independent. Here, we examine patterns of sporophytic and gametophytic gene expression in the fern Polypodium amorphum, a member of the homosporous polypod lineage that comprises 80% of extant fern diversity, to assess how expression of a common genome is partitioned between two morphologically, ecologically, and nutritionally independent phases. Using RNA-sequencing, we generated transcriptome profiles for three replicates of paired samples of sporophyte leaf tissue and whole gametophytes to identify genes with significant differences in expression between the two phases. We found a nearly 90% overlap in the identity and expression levels of the genes expressed in both sporophytes and gametophytes, with less than 3% of genes uniquely expressed in either phase. We compare our results to those from similar studies to establish how phase-specific gene expression varies among major land plant lineages. Notably, despite having greater similarity in the identity of gene families shared between P. amorphum and angiosperms, P. amorphum has phase-specific gene expression profiles that are more like bryophytes and lycophytes than seed plants. Our findings suggest that shared patterns of phase-specific gene expression among seed-free plants likely reflect having relatively large, photosynthetic gametophytes (compared to the gametophytes of seed plants that are highly reduced). Phylogenetic analyses were used to further investigate the evolution of phase-specific expression for the phototropin, terpene synthase, and MADS-box gene families.


INTRODUCTION
All land plants share a life cycle that alternates between a multicellular spore-producing phase (sporophyte) and a multicellular gamete-producing phase (gametophyte). However, it is in ferns that the manifestation of these two phases (also referred to as generations) as independent entities is most extreme (Niklas and Kutschera, 2010;Haufler et al., 2016). Fern sporophytes are typically long-lived, complex plants composed of roots, stems, and leaves (Figure 1). Fern gametophytes are considerably smaller (usually less than one centimeter long), possess fewer tissue types, and are often ephemeral, but are usually photosynthetic, and can sometimes persist independently for multiple growing seasons (Foster and Gifford, 1974;Sato, 1982;Raghavan, 2005;Watkins et al., 2007;Haufler et al., 2016;Pinson et al., 2017). Thus, fern sporophytes and gametophytes are distinct and separate organisms, varying in morphology, physiology, persistence, ecology, and usually ploidy, while sharing a common genome (Qui et al., 2012). The fern life cycle, with its two free-living phases, occupies a pivotal phylogenetic position in the spectrum of green plant life cycles, which ranges from the charophyte algae (i.e., the closest relative to land plants) life cycle that lacks a multicellular sporophyte phase to the angiosperm life cycle in which gametophytes have been reduced to a few cells (Figure 2; Kenrick and Crane, 1997a,b;Graham et al., 2000). Ferns, with their two freeliving phases, represent an ideal but under-utilized resource for addressing questions pertaining to the morphological and functional differences between generations in life cycle evolution (Whittier, 1971; Barker and Wolf, 2010;Der et al., 2011).
The phase-specific morphologies and functions of fern gametophytes and sporophytes, like those of all land plants, result largely from different gene expression patterns (Graham et al., 2000), with some genes uniquely expressed FIGURE 1 | The homosporous fern life cycle. Sporophyte tissues, which are usually but not necessarily diploid, are shown in green. Gametophyte tissues and spores, which are often but not necessarily haploid, are shown in brown. Spores are generated by meiosis in sporangia. Gametes, both eggs and sperm, are generated by mitosis in archegonia and antheridia, respectively. For simplicity, fertilization is depicted between an egg and sperm from the same gametophyte, but fertilization is also likely to occur between gametes from different gametophytes that are derived from the same or different sporophytes. an, antheridium; ar, archegonium; e, egg; gam, gametophyte; spa, sporangium; spe, sperm; spo, spore; sporo, sporophyte; z, zygote. Images are not to scale. in each phase. Previous isozyme, microarray, and RNA-Seq transcriptome profiling studies comparing gene expression between sporophytes and gametophytes in angiosperms and bryophytes uncovered a negative relationship between the morphological and functional divergence of the two phases and the similarity of their gene expression profiles (e.g., Tanksley et al., 1981;Gorla et al., 1986;Pedersen et al., 1987;Becker et al., 2003;Honys and Twell, 2003;Pina et al., 2005;Szövényi et al., 2011;Rutley and Twell, 2015). In general, the highly reduced male and female gametophytes of seed plants (the sister lineage to ferns) have reduced transcriptome profiles, expressing fewer and different genes than sporophyte tissues (Chettoor et al., 2014;Rutley and Twell, 2015). For example, in Arabidopsis thaliana, approximately one third of the genes expressed in sporophyte tissue are expressed in pollen, with approximately 10% of pollen-expressed genes being unique to that phase (Becker et al., 2003;Honys and Twell, 2003;Pina et al., 2005). In contrast, in the bryophyte Physcomitrella patens approximately 85% of genes are expressed by both the sporophyte and gametophyte (Ortiz-Ramírez et al., 2016). Similarly, a transcriptomic survey of Lygodium japonicum suggested that ferns may have more genes expressed in both gametophytes and sporophytes than do angiosperms (Aya et al., 2015).
Here, we evaluate the partitioning of gene expression between gametophyte and sporophyte phases in the wellstudied, homosporous fern species Polypodium amorphum Sukds. This diploid member of the Polypodium vulgare species complex (Sigel et al., 2014) belongs to the leptosporangiate order Polypodiales (henceforth, polypod ferns) that encompasses approximately 80% of extant fern species (PPG I, 2016). Polypodium amorphum embodies the cytological, morphological, and life history traits typical of the clear majority of ferns. It has a high base chromosome number (x = 37; Haufler et al., 1993), a large genome size (C-value = 11.5 pg; Murray, 1985), and perennial sporophytes whose spores produce cordate, photosynthetic gametophytes. We generated sporophyte-leaf and whole-gametophyte transcriptomes from three individuals of P. amorphum (each collected from independent populations) to characterize expression profiles for this iconic polypod fern and to investigate its phasespecific gene expression in the broader context land plant life-cycle diversity. Despite having greater overlap in genefamily identity with seed plants, we found P. amorphum to have phase-specific expression profiles like those of its more distant relatives-bryophytes and lycophytes. Our study supports the hypothesis that plants with relatively large, photosynthetic gametophytes will exhibit substantially more overlap in the identity and expression levels of genes in their gametophyte and sporophyte phases regardless of their phylogenetic relatedness to each other or whether they have gametophyte-dominant or sporophyte-dominant life cycles. In addition, we investigate the phototropin, terpene synthase, and Type II MADS-box gene families to determine whether phase-specific gene expression can inform our understanding of gene family function and evolution.
FIGURE 2 | Simplified phylogeny of the major clades of streptophyte plants, illustrating the gametophyte (colored brown) and sporophyte (colored green) phases for exemplar lineages. Charophyte algae have a multicellular gametophyte and a single celled sporophyte. All embryophytes, or land plants, have multicellular gametophytes and multicellular sporophytes. Synapomorphies are shown for the major clades.

Plant Material
We sampled sporophyte leaf material and whole gametophytes from three individuals of P. amorphum Suksd. (Supplementary  Table S1). Species determination followed Haufler et al. (1993) and Sigel et al. (2014). Living sporophytes were initially collected from wild populations and transported to the glasshouses at Duke University, Durham, NC, United States, where rhizomes were cleaned to remove soil and repotted in Farfard Mix 2 (Sun Gro Horticulture Canada Ltd., United States). Plants were maintained under common glasshouse conditions (photoperiod 18 h: 6 h, light: dark, with luminosity of 200-400 Umol sec −1 cm 2 ; 27-67% humidity; daytime temperature: 18.3-21.1 • C; nighttime temperature: 17.8-20.6 • C) for a minimum of 18 months prior to sampling for RNA extraction. Material from a single leaf was taken from each individual and flash frozen in liquid nitrogen 21 days after it had fully unfurled but before sporangia had developed.
Spores from each sporophyte individual of P. amorphum were collected and cleaned with bleach solution, as described in Hoshizaki (1975). For each individual, four replicated cultures of gametophytes were grown from spores in a modified Knop's liquid medium to which 0.1% glucose was added (Miller and Miller, 1961;Smith and Robinson, 1969) under controlled conditions (photoperiod 12 h: 12 h, light: dark; light source: Philips 392282 40W Plant and Aquarium linear fluorescent bulb). At 55 days, a subset of each gametophyte culture was visually inspected under 100 × magnification to confirm the absence of gametangia. The gametophyte replicates for each individual, representing a range of pre-gametangial developmental stages, were then combined and flash frozen in liquid nitrogen. Leaf material and gametophyte samples were maintained at −80 • C until RNA extraction.

RNA Extraction, cDNA Library Construction, and Sequencing
For each of the three collections of P. amorphum, two separate RNA extractions were performed-one from 70 to 100 mg of sporophyte leaf material and one from 50 mg of whole gametophytes-using the Spectrum Plant Total RNA kit (Sigma-Aldrich, United States). A total of six mRNA libraries were constructed using the TruSeq RNA Sample Prep Kit (Illumina, United States), with the protocol modified to produce indexed, strand-specific, 100-bp paired-end libraries following Borodina et al. (2011). Equimolar amounts of the libraries were pooled prior to sequencing on two lanes of Illumina HiSeq 2000 (Illumina, United States) at the Duke Center for Genomic and Computation Biology. After sequencing, reads from each library were sorted by index, and the number and quality of reads from each sample were evaluated with FastQC v. 3-5-2012 (Andrews, 2010). Adapter sequences and low-quality reads were removed using Trimmomatic v. 0.30 (parameters: PE -ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:true HEADCROP:10 TRAILING:20 SLIDINGWINDOW:10:30 MINLEN:30; Bolger et al., 2014). All reads that passed a second round of quality filtering using FastQC v. 3-5-2012 (Andrews, 2010) were used in de novo transcriptome assembly and gene expression analyses.

De novo Reference Transcriptome Assembly and Annotation
Because there are no published genomes of polypod ferns, we constructed a de novo reference transcriptome for P. amorphum from the filtered, paired-end reads generated from all six cDNA libraries using Trinity v. r20140413 (Grabherr et al., 2011) on the Duke Shared Cluster Resource. A strand-specific protocol (RF) was used with a fixed k-mer size of 25 and a maximum insert size of 800 bp. All reads were then mapped back to the initial reference transcriptome using Bowtie v. 1.0.1 (Langmead et al., 2009) and quantified with RSEM v. 1.2.14 (Li and Dewey, 2011) using the wrapper script align_and_estimate_abundance.pl (Grabherr et al., 2011). Transcripts (i.e., gene isoforms) representing less than 1% of the per gene (i.e., subcomponent in Trinity) expression level were discarded as possible artifacts of the transcriptome assembly algorithm (Grabherr et al., 2011;Haas et al., 2013).
To restrict differential gene expression analyses to protein coding genes, we translated transcripts using ESTScan, a program that detects coding regions and corrects sequencing errors that cause frameshifts (Iseli et al., 1999). To identify and remove nonplant contaminant sequences, we queried the resulting protein sequences against the RefSeq non-redundant peptide database (nr; release 09/2015; O' Leary et al., 2016) using BLASTP v. 2.2.30 (e-value:1e-5, max_target_seqs: 5;Camacho et al., 2009). General taxonomic information (bacteria, fungi, Homo sapiens, Viridiplantae, other Metazoan, virus, and other) was retrieved for the five best subject sequences for each reference transcript from the NCBI Taxonomy database (Federhen, 2012). A transcript was identified as a non-contaminant (i.e., plant) if the best subject sequence and ≥ 50% of the best five subject sequences belonged to Viridiplantae. Transcripts with hits to nr but not meeting these criteria were identified as contaminants. Transcripts without hits to nr were designated as "no hit, " representing potential fern-specific genes. Non-contaminant and "no hit" sequences were included in the final reference transcriptome that was used for subsequent differential-expression (DE) analyses. The completeness of the final P. amorphum transcriptome assembly was assessed by quantifying the presence of universal single copy orthologs using BUSCO software and the Embryophyta odb09 dataset under the trans setting (Simão et al., 2015). For comparative purposes, a previous published transcriptome of L. japonicum gametophyte and sporophyte tissue (Aya et al., 2015) was also subjected to BUSCO analysis. The final P. amorphum reference transcriptome was annotated to orthogroup (i.e., narrowly defined gene lineage) using the global PlantTribes gene family classification (Wall et al., 2008). Briefly, a Markov Cluster algorithm as implemented in OrthoMCL (Li et al., 2003) was used to cluster transcripts with all predicted protein coding sequences from 22 land plant genomes, representing bryophytes (P. patens), lycophytes (Selaginella moellendorffii), and a diversity of angiosperms (including Amborella trichopoda, Oryza sativa, and A. thaliana). Functional annotations for each transcript were obtained by cross referencing its assigned orthogroup number with a previously compiled set of TAIR (Berardini et al., 2015), PFAM (Finn et al., 2016), and Gene Ontology (GO) annotation terms (The Gene Ontology Consortium, 2015).

Quantification and Functional Classification of Phase-Specific Gene Expression Profiles
For each of the six P. amorphum libraries (three sporophyte, three gametophyte), all reads that mapped to transcripts identified as contaminant sequences were removed from the read files. All remaining reads were remapped to the final reference transcriptome using Bowtie v. 1.0.1 (Langmead et al., 2009) as implemented through the Trinity package v. r20140413 (Grabherr et al., 2011) using parameters specifically for pairedend, strand-specific reads (Haas et al., 2013). Maximum likelihood (ML) read abundances were calculated for each gene using RSEM v. 1.2.13 (Li and Dewey, 2011). The Trinity package script abundance_estimates_to_matrix.pl (Haas et al., 2013) was used to generate a matrix of read abundances of all genes for the six libraries. This matrix was filtered to exclude genes that were very weakly expressed (i.e., not represented by at least ten reads in at least three libraries).
Differential expression (DE) analyses among samples were performed using the DESeq2 package in R v. 3.1.0 (Love et al., 2014;R Core Team, 2015), and incorporated a multifactor block design to account for the paired sporophyte and gametophyte samples (∼ phase + individual). The BH method was used to adjust p-values to account for false discovery of DE genes (Benjamini and Hochberg, 1995;Robinson and Oshlack, 2010). The number of DE genes between the phases was assessed at adjusted p-values (padj) of 0.1, 0.02, 0.01, 0.002 and 0.001, and at log2-fold change (FC) in expression of 2, 4, and 6. DE genes were assigned to one of four categories: gametophytespecific; gametophyte up-regulated (i.e., more highly expressed in the gametophyte than in the sporophyte); sporophyte-specific; or sporophyte up-regulated (i.e., more highly expressed in the sporophyte than in the gametophyte). A gene was identified as gametophyte-specific if it met the padj and FC threshold criteria and had a mean normalized count of < 1.0 for the three sporophyte samples. Similarly, a gene was determined to be sporophyte-specific if it met the padj and FC threshold criteria and had a mean normalized count of < 1.0 for the three gametophyte samples.
DE genes having padj ≤ 0.002, log2FC ≥ 2, and associated GO terms were subjected to GO enrichment analyses using a Fisher's exact test as implemented in goatools 1 . GO terms associated with genes belonging to the four aforementioned categories of DE were compared to GO terms for all genes. A GO term was determined to be under-or over-represented using a false discovery rate p-value threshold (FDR) of 0.05. Fold enrichment was calculated for each gene by dividing the proportion of DE genes assigned a particular GO term by the proportion of all genes assigned to that GO term. Fold enrichment values > 1 indicate GO terms that are enriched or over-represented. Fold enrichment values < 1 indicate GO terms that are purified or under-represented.

Phylogenetic Analyses of Phase-Specific Expression for Exemplar Gene Families
To assess phase-specific gene expression in the context of enriched GO terms and orthogroup evolution across land plants, three gene families-phototropin, terpene synthases (TPS), and Type II MADS-box-were selected for further phylogenetic analyses. The annotated P. amorphum reference transcriptome was mined for transcripts assigned to orthogroups for each gene family (see Supplementary Table S2; phototropin: orthogroup 571; terpene synthase: orthogroups 94, 716, 3728, 14080, 27466;Type II MADS-box: orthogroups 278, 1109, 1536, 2959. Prior to phylogenetic analyses, P. amorphum transcripts for each gene family were aligned with transcripts and genomic sequences from preexisting publications and public databases (phototropin: all sequences from Li et al., 2015; terpene synthase: 184 plant, microbial, and fungal TPS sequences downloaded from the Pfam database, Finn et al., 2016; Type II MADS-box: sequences from Kofuji and Yamaguchi, 1997;Münster et al., 1997Münster et al., , 2002Hasebe et al., 1998;Kwantes et al., 2011;Gramzow and Theissen, 2013;Huang et al.;, as well as with transcripts from the Lygodium japonicum Transcriptome Database (Aya et al., 2015). A nucleotide alignment for the phototropin dataset and amino acid alignments for the TPS and Type II MADS-box datasets were compiled in Mesquite v. 3.10 ( Maddison and Maddison, 2016). For the phototropin alignment, indels resulting from the addition of P. amorphum and L. japonicum transcripts, presumably corresponding to unalignable regions excluded from the published alignment of Li et al. (2015), were removed. For the TPS alignment, only the conserved C-terminal domain was included in the final alignment to facilitate the analysis of TPS sequences across disparate taxonomic groups. Similarly, alignment of the Type II MADS-box sequences was restricted to the relatively conserved SRF-like MADS-box and K-box domains.
PartitionFinder v. 1.0.1 (Lanfear et al., 2012) was used to determine the appropriate model of nucleotide evolution for the phototropin alignment (partitioned by codon position with the GTR + I + G model independently applied to each position). PartitionFinderProtein v. 1.0.1 (Lanfear et al., 2012) was used to identify the optimal model of peptide evolution for the TPS alignment (VTF) and the Type II MADS-box alignment (LG + G models for the SRF-like MADs-box domain and the JTT + G model for the K-box domain). 1 https://github.com/tanghaibao/goatools For each gene family alignment, a maximum likelihood (ML) tree search was conducted using RAxML (Stamatakis, 2006) with eight independent runs from different randomaddition starting trees. Node support was assessed with 1000 bootstrap replicates. MrBayes v. 3.2.5 (Ronquist et al., 2012) was used for Bayesian inference (BI), with four MCMC runs of four chains each and trees sampled every 1000 generations for a total of 10 million generations. The same partitions and evolutionary models as for the ML analysis were employed, with unlinked substitution parameters and the rate priors varying among subsets. Tracer v. 1.6 (Rambaut et al., 2014)

The Polypodium amorphum Transcriptome
After sequencing and quality filtering, 41-61 million read pairs and 4-6 million single-end reads were obtained for each of the six transcriptome libraries (Supplementary  Figure S1). Raw Illumina reads are deposited in NCBI's BioProject database 2 (BioProject: PRJNA341394, BioSamples: SAMN05721156, SAMN05721260-2, SAMN05721283-4). BUSCO assessment indicates that 71.4% of universal single-copy Embryophyta orthologs are represented as complete sequences, with another 3.1% represented as partial sequences in the P. amorphum transcriptome (Supplementary Figure S2). By comparison, 53.6% of universal single-copy orthologs are represented as complete sequences and 7.2% are represented as partial sequences in the L. japonicum transcriptome. To the best of our knowledge, the P. amorphum transcriptome is the most complete transcriptome assembly of combined fern gametophyte and sporophyte tissue to date.
Annotation resulted in 62,305 (41.5%) transcripts, representing 31,831 (33.2%) genes, with significant similarity (e-value < 1e −8 ) to at least one of 10,292 orthogroup HMM profiles present in the PlantTribes classification (Figure 3 and Supplementary Figure S3; Wall et al., 2008). A summary of orthogroup numbers and annotation terms associated with P. amorphum transcripts is provided in Supplementary Table S2. FIGURE 3 | Taxonomic summary of the annotated Polypodium amorphum reference transcriptome. Of the 98000 total genes, 63925 were not assigned to an orthogroup in the Plant Tribes classification (Wall et al., 2008). All remaining genes were assigned to an orthogroup with a predicted protein in the genome of at least one angiosperm species, the lycophyte Selaginella moellendorffii, and/or the bryophyte Physcomitrella patens.

Identification and Functional Classification of Phase-Specific Gene Expression Profiles
For each sample, 20-31 million reads were mapped back to the final reference transcriptome (Supplementary Table S6). After filtering to exclude very weakly expressed genes, 35,169 genes (including both those with and without annotation to the Plant Tribes orthogroup classification) were retained for the DE analyses (Supplementary Table S7). Using thresholds of FC ≥ 2 and a padj < 0.002, 3447 genes (9.8%) were supported as differentially expressed between sporophyte leaf tissue and whole gametophytes. These thresholds were selected to encompass the mean FC value (2.86) and minimize false detection, thereby providing a conservative estimate of differentially expressed genes (Supplementary Figures S4-S6), which were divided into four groups: 607 genes (17.6%) are gametophyte-specific; 1518 (44.0%) are up-regulated in the gametophyte; 194 (5.6%) are sporophyte-specific; and 1128 genes (32.7%) are upregulated in the sporophyte (Figure 4 and Supplementary Figure S6).
Despite a nearly 90% overlap in gene identity expressed in both the sporophyte and gametophyte phases of P. amorphum, each phase does have a distinctive transcription profile. This pattern is strikingly illustrated in Figure 5, which depicts the hierarchical clustering of samples by total expression values for all 35,169 genes (log2-normalized reads) and Euclidean distance of total gene expression between samples. Gametophyte samples and sporophyte samples were divided into two mutually exclusive groups due to gametophyte samples having more genes exhibiting extremely high and less variable gene expression than the sporophyte samples (Figure 4). Hence, global expression patterns-incorporating information from both gene identity and expression level-are more strongly dictated by phase rather than by the unique identity of the P. amorphum individual.
Of the genes included in the DE analysis, 18,886 genes had associated GO Gene Ontology (GO) terms and were used as the population gene set for GO enrichment analyses. Four sample gene sets, comprised of DE genes (FC ≥ 2, padj ≤ 0.002) with associated GO terms were used for enrichment analyses: 230 gametophyte-specific genes, 773 gametophyte up-regulated genes, 86 sporophyte-specific genes, and 669 sporophyte upregulated genes. Using a FDR p-value threshold of 0.05, seven GO terms were identified as enriched or over-represented in gametophyte-specific genes and 105 GO terms were purified or under-represented (Supplementary Table S8). For the gametophyte up-regulated genes, 16 GO terms were enriched and 59 GO terms were under-represented. For the sporophytespecific genes, no enriched GO terms were recovered, but 16 GO terms were under-represented. For the sporophyte-specific genes, 18 GO terms were enriched and 173 GO terms were under-represented. Table 1 summarizes the most highly enriched GO terms in the four categories of differentially expressed genes.

Phylogenetic Analyses of Phase-Specific Expression for Exemplar Gene Families
Our phylogenetic reconstructions of the phototropin, terpene synthase, and Type II MADs-box gene families largely confirm previously published phylogenies. For each gene family, the identities of P. amorphum transcripts were putatively determined based on their inclusion in well-supported, previously characterized clades (e.g., seed-plant type terpene synthases; Li et al., 2012).

Phototropin
The topology and node support for our best ML tree (likelihood score = −47805.189182) was largely congruent with that of Li et al. (2015), with well-supported sister clades comprising fern sequences of phototropin 1 (fern PHOT1) and phototropin 2 (fern PHOT2; Figure 6). Two P. amorphum transcripts, putatively representing isoforms of a single gene, and two L. japonicum transcripts are well-supported as nested within the fern PHOT1 clade and are up-regulated in the gametophyte-phase. Similarly, two P. amorphum transcripts and one L. japonicum transcript are well-supported as nested within the fern PHOT2 clade.

Terpene Synthase
Consistent with the results of Li et al. (2012), the topology for the best ML tree (likelihood score = −42727.713859) divides sequences into two large, well-supported clades-one comprising primarily of seed plant TPS sequences, and a second comprising primarily of microbial and fungal TPS (MTPSL) sequences (Figure 7). Polypodium amorphum and L. japonicum transcripts were nested within both clades, with most fern TPS sequences shown to be more closely related to sequences from the lycophyte S. moellendorffii than to sequences from seed plants, fungi, or microbes (Figure 8). These results suggest that both P. amorphum and L. japonicum have and express both plant-type and microbial/fungal-type TPS, many of which are specific to or up-regulated in the gametophyte phase.

MADS-Box Genes
The topology of our best ML tree (likelihood score = −46202.290775; Figure 8) is largely congruent with previously published Type II MADS-box gene phylogenies (Kwantes et al., 2011;Gramzow and Theissen, 2013), with non-fern sequences united into clades corresponding to 16 MIKC * and MIKC C Type II MADS-box genes. Five gametophyte up-regulated transcripts of P. amorphum fern sequences were nested within the MIKC * clade, either as sister to sequences derived from the fern Ceratopteris or as sister to the moss P. patens. Most remaining P. amorphum and L. japonicum sequences were united in three clades dispersed among, but not closely related to, 15 clades of MIKC C genes. Both P. amorphum and L. japonicum expressed numerous phase-specific or phase-upregulated MIKC C transcripts.

Polypodium amorphum Transcriptome Suggests That Polypod Ferns Show Greatest Overlap in Gene Identify With Angiosperms
Genomic resources for ferns have lagged far behind other plant lineages, but are currently experiencing a burst of expansion with the publication of numerous transcriptomes and the first genome assemblies (reviewed in Sessa and Der, 2016, as well as Matasci et al., 2014;Frank et al., 2015;Grusz et al., 2016;Tomei and Wolniak, 2016;Wang et al., 2016;Zhang et al., 2016;Grossman et al., 2017;You et al., 2017;Li et al., 2018). The P. amorphum reference transcriptome is a crucial addition because it includes biological replicates and marks the first survey of gene expression in both sporophyte and gametophyte tissue of a polypod fern, a group accounting for ca. 80% of extant fern diversity (The Pteridophyte Phylogeny Group, 2016). Such resources are critical to a full understanding of plant life cycle evolution, as well as to the study of gene family evolution and phylogenomics among ferns and across land plants.
Using the Plant Tribes classification (Wall et al., 2008), approximately 33% of the genes represented in the P. amorphum reference transcriptome can be assigned to gene families also present in the genomes of at least one angiosperm taxon, the lycophyte S. moellendorffii, and/or the bryophyte P. patens. Approximately 22% of the genes belong to families present in all three plant lineages (Figure 3). While these results suggest that most genes expressed in P. amorphum are poorly characterized and may be fern specific, they also broadly support the notion that gene family identity across all land plants is substantially conserved (Nishiyama et al., 2003;Bowman et al., 2007;Floyd and Bowman, 2007;Rensing et al., 2008;Banks et al., 2011;Szövényi et al., 2011;Pires et al., 2013). Despite the broad conservation of gene families across land plants, P. amorphum shares more orthogroups with angiosperms than it does with lycophytes or bryophytes (Figure 3). Nearly 5% of genes expressed in P. amorphum belong to orthogroups that are putatively unique to euphyllophytes (i.e., those present in ferns and seed plants, but absent in lycophytes and bryophytes). Der et al. (2011) recovered a similar pattern of shared orthogroups when considering just the gametophyte transcriptome of the fern Pteridium aquilinum. This greater overlap in gene family identity between ferns and seed plants likely reflects their sister relationship and the legacy of morphological innovations from their most recent common ancestor, including determinate lateral branching systems in regular, spiral phyllotaxis and increased complexity of shoot vasculature (Figure 2; Kenrick and Crane, 1997a;Galtier, 2010). Indeed, comparative proteomics suggests that the evolution of traits specific to euphyllophytes may have required the evolution of up to three times more new genes than the transition from gametophyte-dominant to sporophyte-dominant life cycles (Banks et al., 2011). Most notably, the greater overlap in gene family identity between ferns and angiosperms is an interesting counterpoint to our finding that P. amorphum has phase-specific gene expression profiles that are more like bryophytes and lycophytes rather than seed plants (see discussion below). FIGURE 6 | Phylogram of the best maximum likelihood tree of phototropin sequences. Shading indicates taxonomic specific phototropin clades as described in Li et al. (2015). Red and yellow circles represent single transcripts from the ferns Polypodium amorphum and Lygodium japonicum, respectively, that are up-regulated in the gametophyte phase. Node support values are as indicated in the inset legend. MLBS, maximum likelihood bootstrap value; BIPP, Bayesian inference posterior probability.

Phase Specific Gene Expression in Polypodium amorphum Mirrors That of Gametophyte-Dominant Lineages
Perhaps the most striking result of this study is the significant overlap in the identity and expression levels of the genes expressed in fern sporophyte and gametophyte tissues. Despite having distinct global transcriptomic profiles (Figure 5), 97.7% of genes were expressed in both phases, with 90.2% demonstrating no statistically supported differences in expression level between the two phases. Put differently, less than 10% of the genes surveyed are responsible for the differences between the transcriptomic profiles of the phases and, potentially, the morphological and functional differences between P. amorphum sporophyte leaf tissue and gametophytes. Furthermore, less than 3% of genes are uniquely expressed in either the gametophyte or the sporophyte phase.
To date, expectations as to the degree of gene expression overlap between phases have been based primarily on studies of seed plants, which, despite differences in methodologies, generally suggest that the transcription profiles of gametophytes have reduced complexity and unique compositions relative to those of sporophytes. Many of the earliest studies employed a limited number of isozyme markers to compare expression between sporophyte tissues and pollen (microgametophytes) across a wide range of species, including Lycopersicon (Tanksley et al., 1981), Populus (Rajora and Zsuffa, 1986), and Zea (e.g., Gorla et al., 1986;Stinson et al., 1987;Willing et al., 1988). Depending on the species, between 65 and 70% of isozyme markers were expressed in both phases, but pollen always expressed fewer isozymes than sporophyte tissue. Subsequently, microarray and RNA-Seq studies investigating the embryo sac (megagametophyte) have further demonstrated that the transcription profiles of angiosperm gametophytes have a distinct and smaller set of genes relative to those of sporophytes (e.g., Arabidopsis: Becker et al., 2003;Honys and Twell, 2003;Pagnussat et al., 2005;Pina et al., 2005;Schmid et al., 2005;Steffen et al., 2007;Wang et al., 2010;Wuest et al., 2010;Glycine: Haerizadeh et al., 2009;Oryza: Wei et al., 2010;Ohnishi et al., 2011;Vitis: Fasoli et al., 2012;Zea: Yang et al., 2006;Davidson et al., 2011). For example, Chettoor et al. (2014) estimated that mature pollen and embryo sacs of Zea expressed only 53 and 88%, respectively, of the number of genes expressed in the sporophyte seedling. However, of the genes expressed in the mature pollen and embryo sacs approximately 10 and 18%, respectively, were unique to that tissue. In contrast, only 8.4% of genes expressed were specific to the Zea sporophyte seedling.
Several high-throughput transcriptome sequencing and microarray studies of the mosses Funaria hygrometrica and P. patens reported substantially less phase-biased gene expression than recovered in angiosperms (Szövényi et al., 2011;O'Donoghue et al., 2013;Ortiz-Ramírez et al., 2016). At one extreme, Szövényi et al. (2011) found ca. 96% overlap in the identity and expression levels of genes expressed by the two phases of F. hygrometrica, with less than 1% of genes uniquely expressed in the sporophyte or the gametophyte. At the other extreme, Ortiz-Ramírez et al. (2016) reported that 85% of genes are expressed by both phases of P. patens with as many as 10% of genes unique to the gametophyte.
A transcriptome profile study of the schizaeoid fern L. japonicum recovered an approximately 85% overlap in genes expressed in sporophyte and gametophyte tissues (Aya et al., 2015). While the percentage of genes expressed in both phases of L. japonicum is similar to our findings in P. amorphum, the direction of bias of phase-specific expression was the opposite: L. japonicum exhibited more sporophyte-specific genes, whereas P. amorphum exhibited more gametophyte-specific genes. Our findings in P. amorphum are surprising given the perceived structural and functional complexity of fern sporophytes relative to gametophytes. This may, in part, reflect our experimental design. Some genes presenting as gametophyte-specific could also be expressed in sporophyte tissues not sampled here (i.e., rhizomes or roots). Sampling additional sporophyte tissues, as well as varying growth conditions, could potentially decrease the number of gametophyte-specific genes, as well as increase the number of sporophyte-specific genes.
In general, we find that phase-biased gene expression in the polypod fern P. amorphum is intermediate between that reported for Funaria and Physcomitrella/Lygodium, while also being quite different from what has been recovered for angiosperms (e.g., Chettoor et al., 2014). This suggests that common life history traits may dictate the overlap in phase-specific expression, regardless of phylogenetic relatedness or life phase dominance. Despite life cycles being gametophyte-dominant in bryophytes and sporophyte-dominant in ferns (Figure 2), both lineages have relatively large, photosynthetic gametophytes when compared to the diminutive, non-photosynthetic gametophytes of seed plants. In addition to being photosynthetic, the gametophytes of bryophytes and most homosporous ferns are responsible for a diverse array of functions including indeterminate cell growth, cell division, rhizoids, and maintenance of existing tissue (Brandes, 1973). In contrast, the gametophytes of seed plants, especially angiosperms, are often reduced to a few cells, are embedded in and nutritionally dependent on the parent sporophyte, and have increased specificity of function primarily restricted to gamete production and/or dispersal (Friedman, 1993;Grossniklaus and Schneitz, 1998). Studies of gametophyte expression patterns in other independently evolved heterosporous lineages, such as the heterosporous water ferns (Salviniales) and heterosporous lycophytes, are necessary to discern whether endospory, the reduction in gametophyte size, and increased functional specificity result in greater distinction between phase-specific expression profiles (Bateman and DiMichele, 1994).
The extensive overlap in gene expression between sporophyte and gametophyte phases of bryophytes and polypod ferns (relative to seed plants) is in line with the antithetic hypothesis that the sporophyte evolved from a haplontic ancestor by mitotic division of the zygote (Bower, 1908;Bennici, 2008). Substantial overlap in the number and identity of genes expressed in the sporophyte and gametophyte phases of both mosses and ferns supports the hypothesis that the basic orthologous genes and gene networks necessary for sporophyte function were present in the gametophyte of the common ancestor of land plants (Pires and Dolan, 2012). Hence, the substantial overlap in phase-specific gene expression in mosses and ferns FIGURE 8 | Cladogram of the best maximum likelihood tree of land plant Type II MADS-box sequences. Specific MADS-box genes, as described by Kwantes et al. (2011) and Gramzow and Theissen (2013), are indicated by black lines or along branches. Colored lines indicate the taxonomic identity of particular sequences (see inset legend). Colored dots and text describes the number of transcripts that are phase-specific or phase up-regulated in the ferns Polypodium amorphum (red) and Lygodium japonicum (yellow). Node support values are provided at Figshare (doi: 10.6084/m9.figshare.c.4229141).
potentially reflects the ancestral condition in land plants, with the enhanced differences in phase-specific gene expression seen in angiosperms reflecting a derived condition. Differences in the phase-specific transcriptome profiles of angiosperms likely reflect the combined effects of the elaboration of the sporophyte and the extreme reduction and protection of the gametophyte.

Phase-Specific Expression for Three Gene Families: Phototropins, Terpene Synthases, and Type II MADS-Box
Transcriptomic studies and associated GO enrichment analyses are useful for investigating the intersection of phase-specific and tissue-specific gene expression patterns with an understanding of gene function and the evolution of gene families in the broader context of land plant evolution. Studies adopting such an approach in ferns are rare. By identifying a suite of overrepresented GO terms and associated genes for gametophyte and sporophyte tissues in P. amorphum (Table 1 and Supplementary  Table S8), we provide a tool for investigators interested in the regulation of genes involved in specific functional differences between fern gametophytes and sporophytes. We integrate phylogenetics and expression data from the polypod fern P. amorphum, as well as the schizaeoid fern L. japonicum, to assess phase-specific expression of three exemplar gene families associated with many highly enriched GO terms (Supplementary Table S9). All three have recently received significant attention for their occurrence in seed-free plants or for their importance in plant development: phototropins, terpene synthases, and Type II MADS-box genes (e.g., Li et al., 2015;Jia et al., 2016;Wilhelmsson et al., 2017).

Phototropins
Phototropins are photoreceptors that regulate plant physiological responses to blue light, including the accumulation of chloroplasts and phototropism (Christie, 2007). They are found in all green plants, but repeatedly underwent independent duplication events in the moss, lycophyte, fern, and seed plant lineages . Functional analyses and knock-out studies of phototropin genes in Arabidopsis and Physcomitrella suggest that the divergent copies of phototropin have undergone sub-functionalization, with one paralog mediating chloroplast accumulation under low-light conditions (e.g., Atphot1 in A. thaliana; Luesse et al., 2010) and the other mediating chloroplast avoidance under high-light conditions (e.g., Atphot2; Kagawa et al., 2001). In the fern Adiantum capillus-veneris, Acphot2 has been implicated in regulating chloroplast avoidance under intense light and in cold temperatures (Kagawa et al., 2004;Kodama et al., 2008). The role of phot1 in ferns is unknown.
Both the sporophyte and gametophyte phases of P. amorphum and L. japonicum express transcripts of fern phot1 and fern phot2, but fern phot1 is significantly up-regulated in the gametophytes of both ferns (Figure 6). While GO enrichment analysis suggests that fern phot1 is broadly involved in the regulation of cellular biosynthetic and macromolecular processes (Supplementary  Tables S8, S9), similar patterns of expression in both fern species suggest a conserved function for phot1 among all ferns and hint at its possible role. Light availability is a limiting factor for fern gametophyte success, with an inverse relationship observed between light intensity and gametophyte mortality (Watkins et al., 2007). Furthermore, Donaher and Partanen (1971) demonstrated that relatively high intensity blue light is required for proper gametophyte development and growth in Pteridium aquilinum, and they hypothesized that "the ultimate form of the gametophyte is under the control of a photoreceptor that is sensitive to blue light and activated at a relatively low energy level." These ecological and developmental studies of fern gametophytes, combined with evidence for the repeated subfunctionalization of phototropin paralogs in other plant lineages, suggest that fern phot1 may mediate low-light physiological responses and may have an important role in facilitating proper gametophyte development under low-light conditions. Additional transcriptomic, physiological, and knock-out studies on gametophytes across the fern tree of life, will be important for determining the function of fern phot1 in both gametophytes and sporophytes.

Terpene Synthases
Terpene synthases (TPS) are enzymes involved in the formation of terpenoid secondary metabolites, specialized compounds that have a myriad of roles in plant-environment interactions, such as protection against photooxidative stress and defense against herbivory (Tholl, 2006;Chen et al., 2011). In addition to plants many bacteria and fungi produce microbial terpene synthaselike (MTPSL) genes, although they differ in their primary protein structures (Li et al., 2012;Yamada et al., 2015). Jia et al. (2016) demonstrated the production of both plant-type and MTPSL genes in non-seed plants, suggesting that MTPSL genes may have been present in the common ancestor of all land plants, potentially as the result of horizontal gene transfer from a microbe or fungus, but then lost in seed-plants.
Our analysis provides strong support for the expression of both types of terpene synthases in the gametophytes and sporophytes of P. amorphum and L. japonicum, with both ferns expressing a greater diversity of MTPSL than plant-type terpene synthases (Figure 7). Additionally, the majority of MTPSL genes have expression that is specific-to or up-regulated in fern gametophytes, as reflected in the substantial enrichment (11-fold) of the "terpene synthase activity" GO term for genes up-regulated in the gametophyte phase (Table 1 and Supplementary Tables  S8, S9). This suggests they may be important for gametophyte function or defense in non-seed plants more broadly, most of which have relatively large, photosynthetic gametophyte phases. Our results are consistent with a hypothesis that MTPSL were the primary enzymes to make mono-and sesquiterpenes in early land plants (lineages with gametophyte-dominant life cycles) and that the evolution of plant-type TPS led to the eventual loss of MTPSL genes in seed plants (a lineage with highly reduced and enclosed gametophytes; Jia et al., 2016). While a few studies have demonstrated the anti-herbivory and allelopathic effects of terpenes in several fern genera such as Pteridium and Adiantum (Melos et al., 2007;Santos et al., 2010), we are the first to report phase-specific or up-regulated expression of either plant-type TPS and MTPSL genes in ferns, as well as to suggest a possible connection between the evolution of terpene synthases and the function of the gametophyte phase.

MADS-Box
MADS-box genes comprise an ancient family of transcription factors regulating development in animals, fungi, and plants (where they are most diverse). In plants, Type II MADSbox (MIKC) genes are strongly correlated with the origin and diversification of angiosperm reproductive structures, such as flowers and ovules (Ambrose, 2010). However, MIKC genes are also documented in gymnosperms, ferns, lycophytes, bryophytes, and streptophyte (charophyte) algae suggesting that the transcription factors regulating the development of flowers and other organs were recruited from homologous genes present in the common ancestor of nearly all lineages of green plants (Münster et al., 1997;Wilhelmsson et al., 2017). Almost all documented MIKC genes in ferns are reported from Ophioglossum (Münster et al., 2002), Ceratopteris (Kofuji and Yamaguchi, 1997;Münster et al., 1997;Hasebe et al., 1998), and Dryopteris fragrans (Huang et al., 2014), most of which are known to be expressed in both sporophytic and gametophytic tissues. Previous phylogenetic analyses suggest that MIKC genes in ferns and seed plants largely belong to distinct, deeply divergent clades, reflecting multiple duplications and losses of MIKC genes following the divergence of the fern and seed plant lineages ca. 466 MYA (Magallón and Hilu, 2009). One exception is the MIKC * clade, for which sequences from Ceratopteris are supported as sister to each of two monophyletic clades of seed plant sequences, known as the S-and P-subclades (Kwantes et al., 2011). Notably, MIKC * genes are preferentially expressed in the gametophytes of lycophytes, mosses, angiosperms, and the fern Ceratopteris, suggesting that they have a conserved function necessary for the development of plant gametophytes (Zobell et al., 2010;Kwantes et al., 2011).
Phylogenetic and phase-specific expression studies of MADSbox genes for the polypod fern P. amorphum and the schizaeoid fern L. japonicum greatly expand the known diversity and taxonomic representation of this gene family. Most of the P. amorphum and L. japonicum transcripts are united in four fern-specific monophyletic clades along with sequences from Ceratopteris, Ophioglossum, and Dryopteris (Figure 8). As with previous phylogenetic studies, these fern-specific MADSbox clades (with the exception of the MIKC * clade) are not supported as closely related to most of the 16 previously characterized angiosperm-specific MIKC genes (as summarized in Gramzow and Theissen, 2013). All but one included transcript of P. amorphum and L. japonicum are expressed in both sporophyte and gametophyte tissues (with varying degrees of preferential expression biases; Figure 8), which is in line with the idea that MADS-box genes have less specific roles in the control of cell differentiation and development in ferns than in seed plants (Theissen et al., 2000), and reflects their association with enriched GO terms for broadly defined cellular biosynthetic and metabolic processes ( Table 1 and Supplementary Tables S8, S9).
As in Ceratopteris, several P. amorphum transcripts belong to two clades that are sister to the S-and P-MIKC * subclades comprising angiosperm sequences (Figure 8). This sister relationship suggests an ancient gene duplication of MIKC * genes in the ancestor of ferns and seed plants. Some S-and P-type MIKC * transcripts of Polypodium exhibit preferential expression in gametophyte tissue (Figure 8). The retention of S-and P-MIKC * sequences in ferns and angiosperms, as well as their preferential expression in gametophyte tissues, lends further support for the ancient and conserved functions of both S-and P-MIKC * for regulation of gametophyte development in euphyllophytes (Kwantes et al., 2011;Liu et al., 2013). Notably, our phylogenetic analyses recovered two P. amorphum transcripts nested within a third MIKC * clade of sequences derived from the moss P. patens. These transcripts may represent a previously unreported "bryophyte-type" of MIKC * gene in ferns (Kwantes et al., 2011). Additional MADS-box sequences and phase-specific expression data from diverse ferns are necessary to test this hypothesis, and will be a boon for understanding the diversification of this important gene family across the land plant tree of life.

CONCLUSION
Our study reveals substantial overlap in the identity and expression levels of transcribed genes in the gametophyte and sporophyte phases of the fern P. amorphum, a member of the homosporous polypod lineage that includes 80% of extant fern diversity. Our results echo those from comparable studies of mosses and lycophytes, but are distinct from studies of seed plants, in which gametophytes have expression profiles that show reduced complexity but unique composition relative to sporophytes. By contextualizing phase-specific expression in P. amorphum within the diversity of land plant life cycles, we conclude that plants with relatively large, photosynthetic gametophytes exhibit substantially more overlap in the identity and expression levels of genes in their gametophyte and sporophyte phases regardless of whether they have gametophytedominant or sporophyte-dominant life cycles. Furthermore, substantial overlap in phase-specific gene expression potentially reflects an ancestral condition in land plants, with the enhanced differences seen in angiosperms reflecting a derived condition. By integrating phylogenetic analyses with phase-specific expression information, we complement previous work on the evolution and function of phototropins, terpene synthases, and Type II MADS-box genes, with an emphasis on their putative roles in fern gametophytes.

AUTHOR CONTRIBUTIONS
EMS and JD designed this study, completed the data analysis, and interpreted the results. All cultivation of plant material and laboratory work was performed by EMS, with the exception of construction of the Illumina libraries which were provided by C. dePamphilis and P. Ralph, as acknowledged. The manuscript was drafted by EMS, with significant edits and intellectual contributions by ES, KP, and JD.

FUNDING
The project was funded by a NSF Dissertation Improvement Grant to KP and EMS (DEB-1110775) and a Smithsonian Institution Peter Buck Postdoctoral Research Fellowship to EMS.