SMRT Sequencing Revealed Mitogenome Characteristics and Mitogenome-Wide DNA Modification Pattern in Ophiocordyceps sinensis

Single molecule, real-time (SMRT) sequencing was used to characterize mitochondrial (mt) genome of Ophiocordyceps sinensis and to analyze the mt genome-wide pattern of epigenetic DNA modification. The complete mt genome of O. sinensis, with a size of 157,539 bp, is the fourth largest Ascomycota mt genome sequenced to date. It contained 14 conserved protein-coding genes (PCGs), 1 intronic protein rps3, 27 tRNAs and 2 rRNA subunits, which are common characteristics of the known mt genomes in Hypocreales. A phylogenetic tree inferred from 14 PCGs in Pezizomycotina fungi supports O. sinensis as most closely related to Hirsutella rhossiliensis in Ophiocordycipitaceae. A total of 36 sequence sites in rps3 were under positive selection, with dN/dS >1 in the 20 compared fungi. Among them, 16 sites were statistically significant. In addition, the mt genome-wide base modification pattern of O. sinensis was determined in this study, especially DNA methylation. The methylations were located in coding and uncoding regions of mt PCGs in O. sinensis, and might be closely related to the expression of PCGs or the binding affinity of transcription factor A to mtDNA. Consequently, these methylations may affect the enzymatic activity of oxidative phosphorylation and then the mt respiratory rate; or they may influence mt biogenesis. Therefore, methylations in the mitogenome of O. sinensis might be a genetic feature to adapt to the cold and low PO2 environment at high altitude, where O. sinensis is endemic. This is the first report on epigenetic modifications in a fungal mt genome.

Single molecule, real-time (SMRT) sequencing was used to characterize mitochondrial (mt) genome of Ophiocordyceps sinensis and to analyze the mt genome-wide pattern of epigenetic DNA modification. The complete mt genome of O. sinensis, with a size of 157,539 bp, is the fourth largest Ascomycota mt genome sequenced to date. It contained 14 conserved protein-coding genes (PCGs), 1 intronic protein rps3, 27 tRNAs and 2 rRNA subunits, which are common characteristics of the known mt genomes in Hypocreales. A phylogenetic tree inferred from 14 PCGs in Pezizomycotina fungi supports O. sinensis as most closely related to Hirsutella rhossiliensis in Ophiocordycipitaceae. A total of 36 sequence sites in rps3 were under positive selection, with dN/dS >1 in the 20 compared fungi. Among them, 16 sites were statistically significant. In addition, the mt genome-wide base modification pattern of O. sinensis was determined in this study, especially DNA methylation. The methylations were located in coding and uncoding regions of mt PCGs in O. sinensis, and might be closely related to the expression of PCGs or the binding affinity of transcription factor A to mtDNA. Consequently, these methylations may affect the enzymatic activity of oxidative phosphorylation and then the mt respiratory rate; or they may influence mt biogenesis. Therefore, methylations in the mitogenome of O. sinensis might be a genetic feature to adapt to the cold and low PO 2 environment at high altitude, where O. sinensis is endemic. This is the first report on epigenetic modifications in a fungal mt genome.

INTRODUCTION
Ophiocordyceps sinensis (syn. Cordyceps sinensis) is an entomopathogenic fungus that infects larvae of Hepialidae ghost moths to form a parasitic complex called "DongChongXiaCao" in Chinese (Commission, 2015). "DongChongXiaCao" is a Traditional Chinese Medicine treatment that has been used for 1000s of years for respiratory, renal, liver and cardiovascular diseases, hyposexuality and hyperlipidemia (Zhu et al., 1998;Yue et al., 2013). O. sinensis is endemic to alpine regions on the Tibetan Plateau, with 3000 m as the lowest altitude for the distribution . This fungus is rare in natural resources because of its strict host-specificity, limited geographical distribution and overexploitation in recent decades .
The mitochondrion is a cellular organelle that is required for respiratory metabolism and ATP production. In the mitochondrion, ATP synthesis is catalyzed by five mitochondrial inner membrane-bound enzyme complexes (Complexes I-V) (Saraste, 1999;Chandel and Schumacker, 2000). Complexes I-V are NADH-ubiquinol oxidoreductase, succinate-ubiquinol oxidoreductase, ubiquinol-cytochrome C oxidoreductase, cytochrome C oxidase, and ATP synthase, respectively. The mt genome encodes for seven subunits of Complex I (NAD1-6, NAD4L), one subunit of Complex III (COB), three subunits of Complex IV (COX1-3), and three subunits of Complex V (ATP6, ATP8, and ATP9) (Saraste, 1999). In addition to the genes encoding OXPHOS proteins, two rRNA subunits and a set of tRNA genes are included in the fungal mt genome. Gene content is highly conserved in mitochondria of various fungi, but some characteristics, such as gene order, tRNA gene clusters, intergenic regions, mobile elements and introns, can be highly variable (Ghikas et al., 2006;Aguileta et al., 2014;Losada et al., 2014;Mardanov et al., 2014). As of December 2016, there were 179 Ascomycota mt genomes submitted to GenBank, and Sclerotinia borealis (NC_025200, 203,051 bp) had the largest mt genome (Mardanov et al., 2014) 1 . Due to its small size, fast evolution, high copy number, and relatively conserved gene content, the mt genome has been successfully used in evolutionary biology and phylogeny (Joardar et al., 2012).
The present understanding of the mt epigenome has gone through a series of evolutions, and conflicting data about epigenomes existed in the 1970s and early 1980s due to the uneven distribution of methylation patterns and low sensitivity of detection methods (Castegna et al., 2015). A few studies have shown mt methylation while others have reported the absence of any methylation in the mitochondrion (Cummings et al., 1974;Dawid, 1974;Groot and Kroon, 1979;Pollack et al., 1984). The advent of next generation sequencing technologies has provided opportunities for characterizing mt genome-wide cytosine methylation. Ghosh et al. (2014) reported the first genome-wide map of human mitochondrial methylation. Ghosh et al. (2016) revealed that it was dynamic in nature of the hydroxymethly cytosine marking in the human mitochondrial genome. However, little trace of DNA modification on mtDNA is detected in fungi. The current methodologies for epigenomics, based on bisulfite, enrichment, or sequencing and PCR, impede the processing of the epigenetic study due to the limits in the resolution, modification type, locus and cost (Mensaert et al., 2014).
Single molecule, real-time sequencing has created significant progress in read length and sequencing of base modification 1 https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?opt=organelle& taxid=451864 sites (Flusberg et al., 2010;Feng et al., 2013). SMRT sequencing can produce read lengths of up to 40 kb and direct sequencing of base modification. The most important characteristics of SMRT sequencing are that SMRT sequencing (1) could be free from amplification biases and (2) easily sequenced the complex regions containing secondary structure in low coverage (Koren et al., 2013). DNA modifications across the genome could be directly determined by the kinetics of DNA synthesis, because SMRT sequencing monitors the processing of single DNA molecules by DNA polymerase (Eid et al., 2009;Feng et al., 2013). The biggest concern in SMRT sequencing is the relatively high error rate, which can be effectively decreased through multiple sequencing passes. Long reads and a circular consensus sequencing strategy make SMRT convenient for: de novo assembly of mitochondria, chloroplasts, and microbial genomes, especially for complex genomes; characterizing genomic structural variations; and analyzing targeted sequencing regions or modification sites (Carneiro et al., 2012;Grad et al., 2012;Chin et al., 2013;Feng et al., 2013;Koren et al., 2013).
In the current study, we sequenced the mt genome of wild fungus O. sinensis using SMRT sequencing technology on a PacBio RS II sequencing platform. We described the gene content and genomic organization of this high-altitude fungus, and performed a comparative analysis with the sequenced Hypocreales mt genomes. The main focus in this study is on the mitogenome-wide epigenetic DNA modification pattern of O. sinensis. This is the first report on epigenetic modifications in a fungal mt genome, and it will provide the basis for further research on mt epigenetics of Hypocrealean fungi.

Sample Collection
Sample of the fungus O. sinensis (CCTCC AF 2017003) was from the fruiting body of the wild O. sinensis. Fresh specimens were purchased in a local market in Guoluo of Qinghai Province, China (Latitude 34.48 • N, Longitude 100.23 • E). Governmental permission is not required for O. sinensis purchases in local markets, and the collections of O. sinensis specimen sold by local farmers fall under the governmental regulations for traditional Chinese herbal products. All fresh O. sinensis specimens were washed thoroughly on site in running water with gentle brushing, soaked in 75% ethanol for 10 min for surface sterilization and washed again three times with sterile water. These samples were snap-frozen in liquid nitrogen immediately after sampling, transported to our laboratory by using the solid carbon dioxide, and stored at −80 • C until further use. The wild O. sinensis was preliminarily identified according to the morphological characteristics and several nuclear loci, including internal transcribed spacer region (ITS), small and large 18S nuclear ribosomal RNA subunits (nrSSU, nrLSU) and translation elongation factor 1-alpha (TEF-1α). These four genes (GenBank accession number: MF403011, MF403012, MF403013, MF425658) all showed the highest homology to O. sinensis ().

DNA Extraction and Genome Sequencing
Total DNA was extracted from the stroma of O. sinensis by a modified CTAB method as previously described (Kang et al., 2011). The integrity, quality and concentration of total DNA were analyzed by agarose gel electrophoresis, NanoDrop 1000 spectrophotometer and Qubit fluorometer. DNA was randomly sheared to fragments with an average size of 20 kb by using g-TUBE. Sheared DNA was then DNA damage repaired and end repaired. SMRTbell templates were obtained by ligating the blunt hairpin adapters to the ends of the repaired fragments, followed by the addition of exonuclease to remove failed ligation products. Before annealing the sequencing primer and binding polymerase to SMRTbell templates, the quality of library was assessed by an Agilent 2100 Bioanalyzer High Sensitivity Kit. Eight SMRT cells were sequenced using P6-C4 reagents on a PacBio RS II sequencing platform (Pacific Biosciences, Nextomics Biosciences, Co., Ltd, Wuhan).

Assembly of the Mitochondrial Genome
Clean data were obtained by filtering out the sequencing adapters and low-quality sequences (parameters: minimum sub-read length = 500 bp; minimum polymerase read quality = 0.80). The mt sequences were extracted from the filtered reads containing both nuclear and mt genomes, using BLASR which matches each read against 201 published fungal mitochondrial genomes (Chaisson and Tesler, 2012). About 2815 subreads in 25.4 Mb mt sequencing data were obtained, with an average read length of 8993 bp and a longest read length of 39,358 bp, reaching an average depth of 167 X. The mt genome was assembled through Hierarchical Genome Assembly Process (HGAP) workflow, including preassembly, error correction, Celera assembly and polishing with Quiver (Chin et al., 2013). Long reads were selected to be "seed" reads, which the other subreads were blast against to improve accuracy. The corrected reads were retained and fully assembled by using overlap-layout-consensus (OLC) algorithm in the Celera Assembler program (Myers et al., 2000), and then further refined with Quiver (Chin et al., 2013).

Mitochondrial Genome Annotation
Genes in O. sinensis mt genome were predicted by MFannot, RNAweasel 2 and BLASTn against NCBI Organelle Genome Resources 3 (Altschul et al., 1990;Lang et al., 2007;Valach et al., 2014). The PCGs and rRNA genes were identified by MFannot and checked by BLASTn against Hypocreales (Altschul et al., 1990), while the tRNA genes were checked using RNA weasel. Intron-exon boundaries of the PCGs were adjusted manually on the basis of BLASTn against multiple Hypocreales mt coding sequence (Altschul et al., 1990). The nucleotide sequences of PCGs were translated to protein sequences using the Mold, Protozoan and Coelenterate Mitochondrial code (transl_table = 4). Open reading frames (ORFs > 100 bp) in the intergenic and intronic regions were predicted by MFannot (Valach et al., 2014). Predicted ORFs were analyzed by InterProScan 4 . The mitochondrial genome map was generated with Circos software (Krzywinski et al., 2009)

Analysis of Repetitive Sequences
Local BLASTn search of mtDNA against itself was performed using a cut-off e-value of 10 −7 (Altschul et al., 1990). Repetitive sequences were analyzed by several programs, including REPuter, Tandem Repeats Finder and MIcroSAtellite (Benson, 1999;Kurtz et al., 2001;Thiel et al., 2003). REPuter was applied to identify the forward, reverse, complement and palindromic sequences. Tandem Repeats Finder was used to find tandem repeats while MIcroSAtellite detected the microsatellite DNA (1-6 bp).

Methylation Modification
Pacific Biosciences' SMRTPortal analysis platform v. 1.3.1 was used to identify modified positions. At each genomic position, modQVs were computed as the −10 log (P-value) for representing a modified base position, based on the distributions of the kinetics of interpulse durations (IPD ratios) from all reads covering this position and from in silico kinetic reference values (Feng et al., 2013). A value of 20 is the minimum default threshold and corresponds to a P-value of 0.01. DNA methylation on both mtDNA strands was assessed independently and represented by modQV, which comprises base incorporation rates differing from that of the unmodified reference sequences. The RS_Modification_and_Motif_Analysis.1 protocol of the SMRT analysis v2.0 was used to identify methylation and the corresponding motifs of the responsible DNA methylases.

Phylogenetic Analysis
A phylogenetic tree was inferred by the ML method using the nucleotides of 14 concatenated PCGs (atp6, atp8, atp9, cox1-3, nad1-6, nad4L, cob) in mt genome. The 20 fungal mt genomes downloaded from NCBI were shown in Supplementary Table S1. Multiple sequence alignment was performed using the MAFFT program (Katoh and Standley, 2013). Poorly aligned positions and gap positions were removed with Gblocks (Castresana, 2000). Three species belonging to Glomerellales (Colletotrichum acutatum, Colletotrichum lupini) and Eurotiales (Penicillium polonicum) were used as outgroup taxa in the phylogenetic analysis. The best model used in the ML phylogenetic tree was determined by using "find best DNA/protein models (ML)" in MEGA 7.0 (Kumar et al., 2016). The "GTR + G" model produced the lowest values for both the Bayesian Information Criterion and the corrected Akaike information criterion, therefore it was chosen for phylogenetic analysis. The phylogenetic tree was constructed using RAxML 8.0.19 with 500 bootstrap replicates (Stamatakis, 2006). BI analyses were processed with 1,000,000 generations and four chains (one cold and three hot chains), with sampling every 500 generations and a burn-in of 25% (Ronquist and Huelsenbeck, 2003). The confidence values of the BI tree were shown as Bayesian posterior probabilities in percentages. The NJ phylogenetic tree based on 14 PCGs or rps3 was constructed by MEGA 7.0 and the evolutionary distances were computed using the Tajima-Nei method.
For rps3 analysis, the outgroup P. polonicum was changed to Penicillium nordicum to avoid a large number of gaps and to get more genetic information. Because rps3 gene in P. polonicum contained a premature termination codon, resulting a shorter aa sequence (Kang et al., 2016). The dN, dS, and the ratio (dN/dS) of each rps3 sequence with the reference of P. nordicum were calculated using CODEML (RateAncestor = 1) in PAML v4.7a (Yang, 1997;Yang and Nielsen, 2000). Sequence alignment was performed for rps3 genes in Hypocreales (length from 398 to 544 amino acids) using ClustalW version 2.0.12 (Sievers et al., 2011). The positive selection pressure on rps3 was detected using CODEML implemented in PAML v4.8 (Yang and Nielsen, 2002;Yang et al., 2005;Yang, 2007;Lin et al., 2015). Excluding the ambiguous sequences, the rps3 gene sequences were represented by 322 consensus aa sequences. The models of M0 (one-ratio), M1a (neutral), M2a (selection), M7 (beta), and M8 (β and ω) were used to calculate the dN/dS values.

Characteristics of the O. sinensis mt Genome
The complete mt genome of O. sinensis was a circular doublestranded DNA molecule with 157,539 bp in length (Table 1 and Figure 1, GenBank accession number: KY622006), which is similar to the O. sinensis mt genome (157,510 bp) reported by Li et al. (2015) and much larger than the other fungi in the same order Hypocreales ( Table 1). The size variation of mt genome was caused by the length of intergenic regions, and the number, length of introns and accessory genes ( Table 1) (Deng et al., 2016). Although the mtDNA size is in large variation (25,615-157,539 bp), the gene content slightly differed in Hypocreales ( Table 1). There were 14 PCGs, 1 rps3, 27 tRNAs, and 2 rRNA subunits encoded in the mt genome. All protein and RNA coding genes were located on positive strand and orient clockwise (Figure 1 and Table 2). In addition, there were 73 ORFs coding putative proteins in the O. sinensis mt genome (Table 1 and  Supplementary Table S2).
In the O. sinensis mt genome, the 14 PCGs included 7 NADH dehydrogenases (nad1-nad6, nad4L), 3 cytochrome c oxidases (cox1-cox3), 3 ATP synthases (atp6, atp8, atp9), and 1 cytochrome b gene (cob), for encoding proteins involved in respiratory chain complexes. In addition, a rps3 gene which encodes 40S ribosomal protein S3 was found in the intron of rnl (rnl_I8). The majority of the PCGs were split by several introns into multiple short exons (Figure 1 and Supplementary  Table S3). The introns (n = 54) covered ca. 67.64% of the whole mt genome, with the highest number in cox1 gene (n = 14). The intergenic regions have a total length of 29,350 bp, accounting for 18.63% of the whole genome length ( Table 1). The whole A+T content of the mt genome was 69.8% (Table 1), consistent with the characteristic of AT-rich in the Hypocreales fungal mt genome, ranging from 69.0 to 73.5% ( Table 1).
All PCGs began with a canonical start codon ATG or GTG ( Table 2) and most terminated with the stop codon TAA, except cox1, cox3, and rps3. As revealed in many fungi, the ATG initiation codon of nad5 followed immediately after the termination codon of nad4L, with an overlap of a base "A"; the genes of nad2 and nad3 were uninterrupted (Kouvelis et al., 2004;Shen et al., 2015). The codon frequency analysis showed that a total of 63 codons were used for transcription, with the absence of CGC (Supplementary Table S4). The six most frequently used codons (TTA, ATA, TTT, AAT, TAT, and AAA) reflect the biased usage of A/T nucleotides (Supplementary Table S4). The fraction of codons encoding the hydrophobic amino acids (Met, Trp, Phe, Val, Leu, Ile, Pro, Ala, accounting for 42.59%) (Supplementary Table S4) could explain the hydrophobic nature of respiratory membrane complexes.
A total of 27 tRNA genes were identified in the mt genome of O. sinensis coding for all 20 amino acids (Table 3), alleviating the need for tRNA import into the mitochondrion from the cytoplasm (Kolesnikova et al., 2000). The presence of tRNA-W recognizing the UGA codon indicates that the O. sinensis mt genome is translated according to genetic code 4 (Fox, 1987). As a unique characteristic of Hypocreales mtDNAs, most of the tRNA genes in O. sinensis were organized into three clusters with minor differences, except for five tRNAs scattered as a single gene across the mt genome (Figure 1). The single tRNA genes were suggested to play a role in transcription or recombination events (Saccone et al., 2002).

Phylogenetic Analysis of O. sinensis
A ML phylogenetic tree of 20 taxa was inferred using 14 conserved PCGs associated with the OXPHOS system. O. sinensis was most closely related to Hirsutella rhossiliensis, with a bootstrap value of 100% (Figure 2), and these species then formed a sub-cluster with H. minnesotensis and Tolypocladium ophioglossoides. The cluster of these four fungi is consistent with the traditional classification that they are all in Ophiocordycipitaceae. As Lin et al. (2015) found, all entomopathogenic fungi formed a cluster in this ML phylogenetic tree, including the fungi in Clavicipitaceae (Metarhizium and Metacordyceps), Cordycipitaceae (Cordyceps, Beauveria, and Lecanicillium), and Ophiocordycipitaceae (Hirsutella, Ophiocordyceps, and Tolypocladium), while three Fusarium plant pathogens were clustered together. The topology of the ML tree agreed with that based on BI (Supplementary Figure S1), but differed slightly from the topology obtained using the NJ method (Supplementary Figure S2). In the NJ tree, T. ophioglossoides was clustered with Metacordyceps chlamydosporia and Metarhizium anisopliae, and the fungi in Cordycipitaceae branched earlier than that in the ML tree. The ML phylogenetic relationship inferred from the mt genome is principally consistent with the current Pezizomycotina taxonomic system (Taxonomy Database in NCBI).

Introns
Notable in the O. sinensis mtDNA, it is the high degree of invasion by mobile DNA-elements (group I and group II introns). The number of introns exhibits remarkable variation in fungal mt genomes (Burger et al., 2003). The largest number of mt introns was documented for O. sinensis (n = 54) in Hypocreales in our analysis, while M. chlamydosporia has only one intron ( Table 1) (Lin et al., 2015). In O. sinensis, 45 introns were found in 11 PCGs and 9 in 2 rRNA subunits, accounting for 84.17% (89,725 bp) and 15.83% (14,805 bp in rnl, 2025 bp in rns) of the whole intron length, respectively. As shown in other fungi, introns were abundant in cox1, rnl, and cob genes (Lang et al., 2007), containing 14, 8, and 6 introns, respectively, accounting for 94.82, 75.11, and 92.32% of each gene ( Table 2). To these, cox2, and nad5 should also be included since 93.18 and 81.15% of each gene were covered by introns, respectively. In contrast, several genes, such as nad3, nad4, and atp8, were intronless in O. sinensis.
BLASTx and RNAweasel results showed that 46 Group I introns and 6 Group II introns (Supplementary Table S3) were included in the O. sinensis mt genome, with two short introns failure to classify. Group I introns are reported to be dominant in fungal mt genomes, while group II introns are found more frequently in plant mt genome (Lang et al., 2007). Group I introns encode various HE genes with LAGLIDADG or GIY-YIG domain motifs, while group II introns mostly encode RTs (Lang et al., 2007). The motifs of HE genes and RTs in introns catalyze the transfer and site-specific integration of the introns into various genes (Lang et al., 2007).

Open Reading Frames (ORFs)
In total, 73 ORFs were identified by MFannot in addition to the conserved genes (Supplementary Table S2), which is a little higher than that predicted by Li et al. (2015). The difference in ORF number may be due to different method used. The fungi in Hypocreales exhibited a broad spectrum of predicted ORFs in this study, from the least (n = 1) in M. chlamydosporia (Lin et al., 2015) to the most (n = 73) in O. sinensis ( Table 1). The length of the ORFs in O. sinensis ranged from 306 to 2397 bp, with a total length of 59,421 bp accounting for 37.72% of the mt genome of O. sinensis (Supplementary Table S2). The variation in the number and length of predicted ORFs could partly explain the variation in genome size.
Among the 73 ORFs, 12 were free-standing, and the rest were located in introns (Supplementary Table S2). Among freestanding ORFs, only ORF_71 was predicted to be a DNAdirected RNA polymerase, while the functions of the others were unknown. In addition to the ORFs that exhibited similarities to GIY-YIG/LAGLIDADG endonuclease or RTs in introns, there were some ORFs encoding domains of intron encoded nuclease repeat and nuclease-associated module, which are possibly involved in DNA-binding (Supplementary Table S2).

Molecular Evolution of rps3
The rps3 gene was identified in the rnl intron of Hypocreales fungi, except Acremonium fuci. The rps3 in A. fuci is freestanding. The length of rps3 ranged from 1131 bp in A. implicatum to 1584 bp in O. sinensis. In O. sinensis, rps3 was located in the IA intron of rnl (rnl_I8). A BLASTx search against the NCBI database showed that the rps3 was fused to a novel LAGLIDADG HE gene. Half of rnl-I8 was homologous to rps3 in H. rhossiliensis and half was similar to the HE gene from Ophiostoma novo-ulmi subsp. americana (Gibb and Hausner, 2005).
The phylogenetic relationships among the Hypocreales inferred from rps3 genes were different from that based on 14 PCGs (Supplementary Figure S3). The sub-clusters (Hypocreaceae, Cordycipitaceae, Clavicipitaceae, and Ophiocordycipitaceae) in Hypocreales were similar between these two phylogenetic trees, except Nectriaceae. In the phylogenetic tree based on rps3, Fusarium oxysporum, belonging to Nectriaceae, was clustered with M. anisopliae rather than Fusarium spp. Moreover, the Cordycipitaceae cluster was separated from the Ophiocordycipitaceae cluster in the phylogenetic tree inferred from rps3. To measure the selective pressure of the rps3 genes in Hypocreales, dN/dS values were calculated. The values were 0.2400, 0.1075, 0.1075, 0.1023, and 0.0848, when the adopted models were M0 (one-ratio), M1a (neutral), M2a (selection), M7 (beta), and M8 (β and ω),  Figure S3). Moreover, we explored a total of 36 sites in rps3 with the values of dN/dS > 1 ( Table 4 and Figure 3), and 16 sites were statistically significant (P ≥ 0.95) ( Table 4 and Figure 3).

Repetitive Sequences in the mt Genome of O. sinensis
Repetitive genes are considered as putative elements for recombination or regulation (Ghikas et al., 2006). A local self BLASTn of the O. sinensis mt genome against itself revealed 710 repetitive sequences (e-value < 10 −7 ) with a total length of 77,583 bp, accounting for 49.25% of the whole mt genome (Figure 1). As shown in the mt genome of Phlebia radiata (Salavirta et al., 2014), the abundant repeat sequences were almost exclusively localized into intronic and intergenic region in O. sinensis, in particular between positions 110 to 30 kb (clockwise, Figure 1). Reputer identified a total of 3490 bp (2.22%) repeats in the O. sinensis mt genome, including 30 forward repeats (61-128 bp, in total 2446 bp), 13 palindromic repeats (61-90 bp, in total 968 bp), and 1 reverse repeat (76 bp) ( Table 5). Tandem Repeats Finder found 44 tandem repeats (2-123 bp in copy size), with an average length of 63 bp, and accounting for 1.76% (2772 bp) of the mt genome ( Table 5).
Simple sequence repeats (also known as microsatellites) comprise tandemly repeated genetic loci of 1-6 bp (Tautz and Renz, 1984). SSRs were found in PCGs (nad5, cox1), rRNA (rnl), and non-coding regions (intronic and intergenic regions, Supplementary Table S5), with being more abundant in noncoding regions than in exons, as previously found in nuclear genome (Karaoglu et al., 2005). It assumed that SSRs might play an active role in genome evolution by creating and maintaining genetic variation (Tautz et al., 1986), and serve a functional role in gene expression regulation by influencing transcriptional activity or protein-protein interactions (Gerber et al., 1994;Kashi et al., 1997). Among these SSRs, most (36/43) were consisted of mono-nucleotide repeats, while the di-, tetra-, penta-, and poly-nucleotide repeats were found with much lower frequency (Supplementary Table S5). The reason may be that longer repeats have higher mutation rates and less stability (Wierdl et al., 1997). In agreement with the previous studies in fungi (Karaoglu et al., 2005), a majority of sequences rich in A/T were observed. All mono-nucleotide repeats were consisted of A/T repeats and the di-nucleotide repeats were AT/TA, with G/C only found in the less identified tetra-, penta-, and poly-nucleotide repeats.  "+" means located on the positive strand.
Frontiers in Microbiology | www.frontiersin.org FIGURE 2 | ML phylogeny of O. sinensis based on 14 PCGs in mt genome. The phylogenetic tree was inferred from a concatenated alignment of 14 PCGs (atp6, atp8, atp9, nad1-nad6, nad4L, cob, and cox1-cox3) using ML analysis. Numbers above branches specify bootstrap percentages (500 bootstrap replicates). MEGA 7.0 was used to determine the best evolutionary model (GTR+R) and RAxML 8.0 was used to infer the phylogenetic tree.

DNA Modification Analysis
DNA modifications were determined in the mt genome of O. sinensis in parallel with the acquisition of primary sequence data by SMRT sequencing, based on analysis of the kinetics of DNA synthesis reactions. DNA modifications differ by various modification types, including 5-methylcytosine (5mC), 6-methyladenine (6mA), 4-methylcytosine (4mC) and 5-hydroxymethylcytosine (5-hmC), which can be identified by a series of methyltransferases at specific motifs. A genomic position is covered by several sequenced DNA fragments, and the modQV score shows the consistency by which a specific modification is observed. In the O. sinensis mt genome, a total of 1604 sites were determined with an average modQV score of 24.68 at an average coverage of approximately 96x (Figure 4 and Supplementary  Table S6). There were 783 modification sites located on the forward strand versus 821 located on the backward strand (Figure 4 and Supplementary Table S6). The plot of modification scores against sequencing coverage displayed a dominant signal for modified adenosines and thymine bases (pink and burgundy dots in Figure 4). Twenty-eight 4mC (0.13%) and 10 6mA (0.017%) modification sites were identified in the O. sinensis mt genome (Figure 1 and Table 6). The 6mA levels were lower than previously reported for nucleotide methylation (0.048-0.21%) in eukaryotic nuclear genomes (Mondo et al., 2017). Most 6mA and 4mC were distributed in intergenic regions (between tRNA and nad4L/nad6/cox2, or between tRNA and tRNA) or in intron regions of different genes (e.g., nad1-2, cox1-2, cob, rnl), with only three located in the genes. Two 4mC were methylated on the backward strand in the regions of nad2 and nad5 genes, and one 6mA was methylated in the nad4L region ( Table 6). The hypermethylation in promoters and introns, and hypomethylation in exons, are also found in the nuclear genome of eukaryotes (Volpe, 2005). It is reported that methylation is strand-specific within the mt genome (Bellizzi et al., 2013). However, methyl modifications in the O. sinensis mt genome were observed on both strands, including 4mC and 6mA modifications (Supplementary Table S6 and Table 6). Because DNA methylation is diverse, widespread, and often deposited by a diverse set of methyltransferases at specific target sequences (motif), a comprehensive understanding of the distribution and diversity of methylation motifs will benefit comprehension of methylation function and evolutionary history of the mt genome. Eight modification motifs "AANNN m4 CAGCANNANNNNA, " five of which were recognized by 4mC methyltransferases, were detected in the O. sinensis genome by SMRT sequencing with a mean QV of 44.4 at a mean coverage of 90.4x (Supplementary Table S6). Among the five methylation motifs, four were in the introns of PCGs (cob, cox1, and rnl), one was in the an intergenic region (tRNA_I_tRNA_G).

DISCUSSION
Single molecule, real-time sequencing, which was recently developed by Pacific Biosciences, can achieve extraordinarily long

Model
Estimates of parameters Positively selected sites p is the number of parameters in the ω distribution. Positive selection sites with P ≥ 95% were shown in bold.
reads (up to 40 kb) with no GC bias, so that it is suitable for sequencing the genomes with low GC and high repeat content. Moreover, SMRT sequencing enables direct genomewide detection of diverse base modifications by monitoring the kinetic variations of single bases. Thus, SMRT sequencing is convenient for sequencing the mt genome of O. sinensis and revealing the base modification pattern of the mt genome. It has long been debated whether modified bases exist in mt genomes. Previously, O. sinensis has been called Cordyceps sinensis or Cordyceps sp. due to its morphology. However, the colony characteristics of O. sinensis cultures are significant different from other Cordyceps spp. Most Cordyceps species produce bright color and fleshy stromata, while the stromata of O. sinensis are often darkly pigmented, tough, fibrous to pliant, and have aperithecial apices, which are the characteristics of Ophiocordyceps (Sung et al., 2007). Based on these diagnostic characters and several loci sequence analyses (nrSSU, nrLSU, tef1, rpb1, rpb2, tub, and atp6),  (Sung et al., 2007). Our phylogenetic tree (Figure 2) based on the 14 PCGs confirms the phylogenetic position of O. sinensis in Ophiocordycipitaceae. The mt genome intron number dynamics in the compared fungi (from 1 to 54) are most likely caused by the gain or loss of introns (Haugen et al., 2005;Cuenca et al., 2016). The introns are probably acquired from ancestors or gained through horizontal gene transfer (Haugen et al., 2005;Lang et al., 2007). HE domains in group I introns help to splice the transcribed intronic RNA, and are then removed from the transcribed pre-mRNA resulting in a contiguous RNA transcript in the process of gene transcription (Taylor and Stoddard, 2012). However, not all group I introns possess auto-catalytic splicing activity, especially for the mini-introns (e.g., rnl_I5 and nad6_I1), which lack the typical features (Schafer et al., 1991;Lang et al., 2007). It is proposed that the loss of intron splicing activity may be compensated by some other assistant proteins (Kreike et al., 1987;Lambowitz and Perlman, 1990;Lang et al., 2007). If the numerous HE motifs in the O. sinensis mtDNA are active, the HE domains might regulate the transcription of their target genes or modify their target genes, as seen in bacterial viruses and the animalpathogenic Cryptococcus spp. (Ma et al., 2009;Stoddard, 2011). In addition, the existence of HEs within fungal mtDNA genes could promote intron mobility, genetic diversity and adaptive responses for mt genomes, when the allelic recombination events may be impossible or rare in mtDNA due to maternal inheritance (Barr et al., 2005;Basse, 2010).
Within the fungi, rps3 is extremely diverse in location and organization: some are lost, some are free-standing, some are incorporated into group I intron, and others have been invaded by HEs (Sethuraman et al., 2009). Among the Ascomycete fungi, group I intron-encoded rps3 seems to have a rather complex evolutionary history (Sethuraman et al., 2009). The phylogeny of rps3 genes inferred in this study displays an evolutionary pattern that is different from that based on 14 conserved PCGs in Hypocrealean fungi, similar to the report of Lin et al. (2015). The rps3 in A. fuci was a free standing gene and was grouped with the intron encoded rps3, implying that the rps3 gene in A. fuci was somehow relocated and the rnl intron was lost. rps3 is prone to insertion and deletion, is a refuge for HE genes, and is subject to recombination, making it problematic to resolve taxonomic relationships using rps3 (Sethuraman et al., 2009).
To evaluate the balance between purifying selection, neutral evolution and positive selection acting on rps3, the dN/dS ratio, an indicator of evolutionary pressure on a gene, was examined in Hypocreales fungi. A dN/dS value < 1.0 in Hypocreales indicates that rps3 has evolved under functional constraints. However, the significance of LRT showed a clear signal of positive selection in rps3. In addition, the dN/dS values that were > 1 in Hypocreales in relation to the outgroup P. nordicum and the 36 explored positive selection sites suggest that rps3 is under positive selection in Hypocreales fungi, which is in accordance with the results of Lin et al. (2015). rps3 is involved in DNA repair and potentially has endonuclease activities in Schizosaccharomyces pombe and nuclear versions of rps3 in human and Drosophila melanogaster (Neu et al., 1998;Lyamouri et al., 2002;Jang et al., 2004). Except for these model organisms, no functional studies confirm that mtDNA-encoded rps3 genes are actually functional (Kim et al., 2009). Further studies are needed to determine if the rps3 genes in the mt genomic introns produce functional proteins and how these intron-encoded rps3 genes are expressed.
Base modifications are indispensable parts of comprehending biological processes such as host-pathogen interactions, DNA damage and DNA repair (Trygve, 2010). Among base modifications, DNA methylation has been one of the most studied modifications. It has emerged as a significant phenotypic determinant for disease susceptibility and pathogenesis in eukaryotes, and involved in the Restriction-Modification system in prokaryotes (Bernstein et al., 2010;Ershova et al., 2015). However, whether mtDNA can be the site of epigenetic modifications has long been mired in controversy because the (B) Scatter plot of the DNA modifications with different per-strand coverage. At each genomic position, modification QV were computed as the -10 log (P-value) for representing a modified base position, based on the distributions of the kinetics of interpulse durations (IPD ratios) from all reads covering this position and from in silico kinetic reference values. Each dot represents a position on either strand with a modQV larger than 20. The color specified the nucleotide base, on which the modification was detected. Adenosines were colored in red, guanosines were colored in green, cytimidines were colored in blue, and thymines were colored in purple. Among the DNA modifications in the mt genome of O. sinensis, several 6mA and 4mC methylations have been identified, which are always found in prokaryotes and have the function of protecting against restriction enzymes, regulating virulence and controlling DNA replication, repair, and expression (Ratel et al., 2006). Bisulfite sequencing has enabled genomewide surveys of 5mC methylation (Masser et al., 2016), a wellestablished epigenomic mark in eukaryotes, but the historic absence of tools for studying 6mA and 4mC modifications has precluded more comprehensive studies of 6mA and 4mC methylation. Recently, Mondo et al. (2017) firstly analyzed 6mA in fungi by SMRT, and revealed its role as a gene-expressionassociated epigenomic mark. In this study, SMRT sequencing revealed that not only 6mA but also 4mC were presented in the fungal mt genome. Nucleotide modifications are one of the most evolutionarily conserved properties of RNA, and the sites of modification are under strong selective pressure (Li and Mason, 2014). Mitochondria are largely thought to originate from endosymbiotic α-proteobacteria species (Gray, 2012). Thus, the 6mA methylations, which are often found in α-proteobacteria, in the fungal mitogenome might be remnants of the methylation system from α-proteobacteria. However, the function of these epigenetic modifications in the mt genome of O. sinensis remains unknown.
Mitochondria are crucial for responses to hypobaria, hypothermia, and hypoxia due to their central role in energy production and consumption (Chandel and Schumacker, 2000;Chitra and Boopathy, 2013). Cold temperature and low oxygen pressure are the two most remarkable characters of high-altitude environments, where O. sinensis is endemic Yu et al., 2011). Current associations between the mt genome and high-altitude adaption is focused on the mtDNA content and polymorphisms of mt genes, including NADH dehydrogenases, cytochrome c oxidases, ATP synthases and cytochrome b (Luo et al., 2013). However, no studies had been conducted on mt epigenetics. In this study, we found only three methylation affecting coding regions (in nad2, nad4L, and nad5), while the rest is spread between intergenic regions and introns (Supplementary Table S6). Complex I in the mitochondria, including NAD1-6, NAD4L, are involved in collecting electrons from various donors and passing them to coenzyme Q, which then passes the electrons to Complexes III (including COB) and IV (including COX1-3), and finally to the final electron acceptor O 2 to complete the electron transport chain. At the end of this chain, Complex V (including ATP6, ATP8, and ATP9) catalyzes the reaction between ADP and inorganic phosphorus to form ATP (Saraste, 1999). The DNA methylations in eukaryotes are closely related with DNA transcription and translation (Volpe, 2005). Therefore, we supposed that the methylations across the whole O. sinensis mt genome, covering protein-coding regions, introns, and intergenic regions, might be closely related to the expression of the electron transport subunits (nad1-2, nad4L, nad6 in Complex I, cob in Complex III, cox 1-2 Complex IV, and atp6 in Complex V), and thereby modulate the enzymatic activity of OXPHOS and the mt respiratory rate, and then change the ability to capture O 2 and produce energy. Moreover, methylation of mtDNA is supposed to play an important role in mitochondrial biogenesis by affecting the binding affinity of transcription factor A to mtDNA, impacting the relative activity of promoters (Van der Wijst and Rots, 2015). As the mentioned above, the methylation of mtDNA in O. sinensis might be a genetic feature for adaptation to the cold and low PO 2 environment at high altitudes (Volpe, 2005;Luo et al., 2013).

CONCLUSION
Single molecule, real-time sequencing was applied to characterize the O. sinensis mt genome in order to mitigate problems with assembly due to high AT and repeat content of the mt genome. The phylogenetic tree inferred from 14 PCGs supports the phylogenetic position of O. sinensis in Ophiocordycipitaceae. A total of 36 sequence sites were explored with the values of dN/dS > 1, suggesting that positive selection acts on rps3 in Hypocreales fungi. Furthermore, we have analyzed the DNA modification pattern of the mitogenome directly. This is the first report of methylation in a fungal mitochondrion, and we propose that methylations in the O. sinensis mt genome might closely relate with the environmental responses for adapting to the cold and low PO 2 environments at high altitude sites where O. sinensis is endemic.

AUTHOR CONTRIBUTIONS
DL conceived this study. XK analyzed the data and drafted the manuscript. LH participated in the data analysis and prepared figures. PS participated in the data analysis. RL sequenced and analyzed the mt genome. All authors have read and approved the final manuscript.

ACKNOWLEDGMENTS
We thank Runmao Lin at Beijing Normal University for his help in exploring the positively selected signals in rps3 genes, and a referee for thoughtful comments on the topic of mitochondrial DNA methylations.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2017.01422/full#supplementary-material FIGURE S1 | Bayesian phylogeny of O. sinensis based on 14 PCGs in mt genome. MrBayes 3.2.6 was applied to construct the phylogenetic tree. Numbers at the nodes are Bayesian posterior probabilities.
FIGURE S2 | Neighbor-joining phylogeny of O. sinensis based on 14 PCGs in mt genome. Numbers above branches specify bootstrap percentages (500 bootstraps). The evolutionary distances were computed using the Tajima-Nei method. MEGA 7.0 was used to construct the phylogenetic tree.
FIGURE S3 | Evolutionary characteristics of rps3 for 20 different taxa. Numbers above branches specify bootstrap percentages (500 bootstraps). The evolutionary history was inferred using the Neighbor-Joining method and the evolutionary distances were computed using the Tajima-Nei method. Evolutionary analyses were conducted in MEGA7.0. The right of the species names in the phylogenetic tree are the dN/dS values, which are calculated based on the comparative analysis with the rps3 sequence of P. nordicum, an outgroup for the phylogenetic analysis.