Full-Length Transcriptome of the Whale Shark (Rhincodon typus) Facilitates the Genome Information

Rhincodon typus is a keystone and indicator species in marine ecosystems. Meanwhile, R. typus has been listed on the IUCN red list of vulnerable species. Here we used ONT platform to determine the full-length (FL) transcriptome of R. typus and obtained 14,930 FL transcripts. Among all FL transcripts, 14,915 transcripts were covered 11,892 genetic loci and 1,642 novel genetic loci were further found. Meanwhile, we identified 714 novel transcripts by compared FL transcripts with the R. typus genome. Based on FL transcripts, we also predicted the distribution patterns of ASs, LncRNAs, polyAs, CDSs and methylation sites on FL transcriptome of R. typus. Furthermore, a total of 31,021 (97.86%) CDSs can obtained annotation information. Overall, our work firstly provided the FL transcriptome and these sequences complete the annotated R. typus genome information. Furthermore, these information are a potential resource to study biological processes of R. typus.


INTRODUCTION
The whale shark, Rhincodon typus is the exclusive species of the genus Rhincodon and it belonging to the family Rhincodontidae under the order Orectolobiformes (Smith, 1829). Rhincodon typus is widely distributed in the tropical and temperate seas from latitude 30 • N to 35 • S and has rarely been found in waters with surface temperature below 21 • C (Compagno, 2001;Colman, 2005;Rowat and Brooks, 2012;Sequeira et al., 2014). As the largest extant fish, the maximum length and weight of R. typus can reach 20 meters and 42 tons, respectively (Hsu et al., 2014). The head of R. typus is flat and wide, and there are many checkerboard shaped white spots and horizontal stripes on the dorsal part (Compagno, 2001). Previous study has demonstrated that the feeding habits of bait at the food chain bottom and the unique branchial cleft structure enables the R. typus to have a filter-feeding lifestyle (Stevens, 2007;Nozu et al., 2015). In conclusion, R. typus plays an important role in marine ecosystem due to its special biological characteristics and lifestyle, which can be used as an indicator of marine ecosystem stability.
The life-history strategy of R. typus has been researched extensively over the years (Weber et al., 2020). Earlier studies have found that the R. typus grow slowly and reach maturity at approximately 30 years old, and its maximum lifespan estimated at 70 to 80 years (Hsu et al., 2014). The R. typus is ovoviviparous and the pregnancy interval is also longer. Together, the life-history strategy of R. typus can be considered as K-selection (Cavanagh et al., 2003). Longer lifespan of R. typus have been suspected to be benefit from metabolic rate regulation caused by K-selection life-history strategy (including decreased growth rate, longer generational time, and increased body size) (Weber et al., 2020). However, K-selection life-history strategy may also has propelled the R. typus into the IUCN red list of vulnerable species (Cavanagh et al., 2003). This is the case because R. typus has difficulty responding quickly to habitat destruction caused by human activity and climate change, and ultimately leads to an inability to recover quickly from population declines (Norman, 2004). In conclusion, it would be interesting to accurately elucidate the effects of the K-selection life-history strategy on longevity and population size of R. typus.
The perfect genetic information for exploring the question mentioned above, thus it is essential to have access to highquality genome resource of R. typus. Recently developed nextgeneration sequencing technology has provided a convenient and highly effective solution for deciphering the whole-genome information of R. typus. Weber et al. (2020) first assembled the R. typus genome using a combination of Illumina shortinsert, mate-pair, and TruSeq Synthetic Long Read (TSLR) libraries and the estimated genome size was3.2-Gb. Meanwhile, the researcher also has uncovered several genetic traits associated with body size, metabolic rate, and lifespan of R. typus based on comparative genomics. Although the published genome has enriched the genetic information database of R. typus, the researchers predicted the incomplete protein-coding genes of R. typus only based on two published candidate gene sets (Stanke and Morgenstern, 2005;Haas et al., 2008). Meanwhile, the low completeness of R. typus genome annotation information also due to the lack of transcript data (Lou et al., 2020). Short transcripts were usually applied to the early genome annotation (Ozsolak and Milos, 2011;Duitama et al., 2012), but it is undeniable that the short transcripts have deficient in the recognition of alternative splicing, alternative transcription initiation or alternative transcription termination sites (Steijger et al., 2013;Tilgner et al., 2013). Meanwhile, the highly similar transcripts resulting from genome-wide replication events also limit the genome annotation ability of short transcripts (Postlethwait et al., 1998). The third generation sequencing technology has revolutionized the genome annotation due to fulllength (FL) transcripts can be obtained to remedy the deficiency of short transcripts (Li et al., 2018). To date, FL transcripts has been successfully used to perfect the genome annotation information of many animals, such as Tachypleus tridentatus (Lou et al., 2020), Sus scrofa (Li et al., 2018), Danio rerio (Nudelman et al., 2018). Therefore, we have every reason to believe that FL transcripts will contribute to high-resolution R. typus genome annotation and provide an impeccable information for further elaboration of R. typus biology. Since 2012, the third-generation sequencing technology been gradually applied to long-read sequencing due to this technology showing high reads yield and having clusters representing individual molecules rather than amplifications (Eid et al., 2009;Feng et al., 2015). Different third-generation sequencing platforms have superior specificity. As a representative sequencing platform, Oxford Nanopore (ONT) platform has 1-Mb of sequencing reads (Wyman et al., 2019).
Oxford Nanopore (ONT) platform was applied in the present study to achieve more efficient R. typus FL transcripts. These results not only constitute a rich dataset of FL transcripts that extends our knowledge of the R. typus transcriptome, but also help to discover novel genes/transcripts to improve the genome annotation efficiency of R. typus. The ultimate goal of the present study is provide perfect genome annotation information for the exploration of R. typus biology.

Ethics Approval and Participation Consent
We have read the policies relating to animal experiments and confirmed the present study complied. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. All procedures performed in the present study were approved by the Institutional Animal Care and Use Committee of Yantai University. All experimental methods were performed according to relevant guidelines and regulations established by the Institutional Animal Care and Use Committee of Yantai University. In accordance to the directive of Reporting of In vivo Experiments (ARRIVE) guidelines 2.0, we have designed this experiment. Meanwhile, our study did not require specific authorization.
Blood Collection, RNA Extraction, Library Construction and Oxford Nanopore Sequencing The blood was extracted from a male R. typus (approximately 15 ∼ 20 years old, seven meters) from the Haichang aquarium (Yantai, China). The blood sample was then snap-frozen in liquid nitrogen and stored at −80 • C for subsequent RNA extraction. Total RNA of R. typus blood was extracted using the standard TRIzol Reagent Kit (Invitrogen, CA, United States) and following the manufacturer's protocol. The RNA concentration was measured using the NanoDrop 2000 (Thermo Fisher Scientific, MA, United States) and the RNA integrity was assessed using the RNA Nano 6000 Assay Kit and Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, United States). Approximately 900 ng of poly(A)-tailed mRNAs were purified using the standard Dynabeads mRNA Purification kit (Invitrogen, CA, United States), and the sequencing adapters and motor proteins were then immediately added to construct the sequencing library. The constructed library was added to a flow cell and transferred to Oxford Nanopore PromethION sequencing platform for direct RNA sequencing.

Oxford Nanopore Sequencing Data Processing
All raw reads in "fast5" format generated by the Oxford Nanopore PromethION sequencing platform were converted to "fastq" format using GUPPY software (version 3.2.6 1 ). The NanoFilt software (version 2.6.0 2 ) was further applied to filter out adapter sequences and low-quality raw reads (raw reads with quality score less than 7 or length less than 50bp) to obtain the clean reads for subsequent analyses, and the parameters as follows: -q 7 -l 50. These highly similar clean reads derived from the similar isoform were clustered into consensus sequences using Genomic Mapping and Alignment Program (GMAP; Wu and Watanabe, 2005) with the following parameters: -crossspecies and -allow-close-indels 0. The consensus sequences were then mapped to the reference R. typus genome (Weber et al., 2020) using the GMAP (Wu and Watanabe, 2005), and those consensus sequences with different 5 -end exons were identified as redundant sequences. The stringTie software (version 2.1.2 3 ) was used to merge redundant consensus sequences to obtain the FL transcripts of R. typus, and the parameters as follows:conservative -L -R.

The Distribution Analyses of Full-Length Transcripts on the Reference Genome
The minimap2 software (version 2.17-r941 4 ) was used to compare the FL transcripts with the reference genome, and the parameters as follows: -ax splice -uf -k14. The comparing results of all FL transcript can be divided into "unmapped, " "mapped to '+' , " and "mapped to '−' , " which represent FL transcripts that are not mapped to the genome, mapped to positive and negative strands of the genome, respectively. Meanwhile, the distribution of FL transcripts in the reference genome was investigated based on the consensus sequence density successfully mapped to each chromosome, and the density distribution of the longest 15 consensus sequences were visualized.
Gffcompare software (version 0.11.2 5 ) was applied to annotate all FL consensus sequences and then to identify the known genes/transcripts and novel genes/transcripts. According to the mapping information of FL consensus sequences in the R. typus genome, those FL consensus sequences mapped to the unannotated genome region were defined as novel genes. Meanwhile, the FL consensus sequences with structures different from transcripts in genome "gtf " files were defined as novel transcripts.

Optimization of Transcript Structure
Optimize the original annotated transcript structure is necessary to improve the annotation accuracy of the R. typus genome. In the present study, FL transcripts were compared with known transcripts of the reference genome based on Gffcompare software [version 0.11.2 (see text footnote 5)], and the parameters as follows: -R -C -K -M. The aim was to discover the new transcripts and eventually supplement the annotated information of existing transcripts. When the regions outside the original gene boundary were supported by the FL transcripts, the untranslated regions (UTRs) of the transcript were extended upward and downstream to complete the modification of the transcript boundary. In general, the novel transcripts consists of five types: (1) other parts of the same chain that overlap with the reference exon (O); (2) at least one matching multiple exons (J); (3) the exons on the anti-chain overlap (X); (4) completely contained in introns of the reference genes (I); (5) unknown novel transcripts (U) (Figure 1).

Alternative Splicing Analyses
Alternative splicing refers to multiple splicing types of precursor mRNA (pre-mRNA) can produce different mature mRNAs, which can be further translated into different proteins, and ultimately leading to diversity of biological traits. In the present study, suppa2 6 software was applied to predict the potentially alternative splicing in R. typus FL transcriptome, and the parameters as follows: -f ioe -e SE SS MX RI FL.

Fusion Transcripts Analyses
Fusion transcripts are always formed by having the coding regions of two or more transcripts joined end to end. It is worth noting that the fusion transcripts are uniformly regulated. Tofu software (version 13.0.0; Wang et al., 2016) was used to find fusion transcripts and the identification criteria as follows: (1) a fusion transcript must be mapped to two or more gene locus in the R. typus genome; (2) each gene loci must be aligned to 10% region of a fusion transcript; (3) the fusion transcript must be more than 99% of coverage on the R. typus genome; (4) distance between two mapped locus must be at least 100 kb.

PolyA Length Analyses
The PloyA tail is located in the downstream of the 3 untranslated region of mRNA, and it contributes to mRNA stabilization and cytoplasmic transport. The length of PolyA can affects the translation efficiency of mRNA and therefore it is necessary to analyze the length of PolyA. In the present study, Nanopolish software (Version: 0.12.5 7 ) was used to calculate the length of PolyA and the parameter as follows: polya. Furthermore, we also analyzed the association between median length and expression level of PolyA.

Long Non-coding RNA Prediction
Long Non-coding RNA (LncRNA) is a generic term for RNAs that are more than 200nt long and cannot encode proteins. In the present study, LncRNAs were screened using CNCI software 8 , CPC2 software 9 , and Pfam database [version 33.1; (25)].

Coding Sequence Prediction
In the present study, TransDecoder software 10 was used to identify the potential Coding Sequences (CDSs) of the R. typus FL transcripts based on the open reading frame (ORF) length, loglikelihood score, and the comparing information between amino  acid sequence and protein domain sequence of Pfam database. Annotate the potential functions of CDSs and the metabolic pathways that may be involved based on existing protein databases. In the present study, we annotated the CDSs using the BLAST (Altschul et al., 1990) and based on the reference genome (Weber et al., 2020), Pfam (Protein Families; Finn et al., 2013), Nr (Non-Redundant Protein Sequences; Deng et al., 2006), KEGG (Kyoto Encyclopedia of Genes and Genomes; Kanehisa et al., 2004), GO (Gene Ontology; The Gene Ontology Consortium et al., 2000), and COG (Cluster of Orthologous Groups of Proteins; Tatusov et al., 2000) databases, and the parameter as follows: E-value < 0.00001.

Methylation Analyses
Direct RNA sequencing not only can sequenced the transcriptome information, but also can obtain the methylation modification information. In the present study, the m5C methylation site was predicted based on the alternative model in Tombo software (version 1.5 11 ), and the top 1000 sites with the highest alternate value were regarded as the credible sites. Meanwhile, the m6A methylation site was predicted based on the MINES pipeline, 12 and those sites with alternate values greater than 0.7 were identified as credible sites. Meanwhile, we also mapped the predicted m5C and m6A methylation sites to the reference genome and then visualized the distribution of these sites on the genome. Additionally, two bases were extended upstream and downstream of the m5C and m6A methylated sites to obtain motifs that consisting of five bases. The meme software 13 was applied to calculate the sequence characteristics of the motifs.

Rhincodon typus Full-Length Transcriptome Sequencing
High-quality RNA was extracted from R. typus blood and then sequenced on the Nanopore PromethION platform. A total of 10,778,080 raw reads were generated from the platform, corresponding with 7,946,301,774 bp. The max length, average length, N50, N90 and mean quality value of all raw reads was 78,083 bp, 737.26 bp, 808 bp, 443 bp and 9.9, respectively (Figure 2). All raw reads were released in the NCBI Sequence Read Archive under BioProject number PRJNA765716, with accession number of SRR16036159. After filtering out the lowquality raw reads, a total of 9,579,943 clean reads were obtained, corresponding with 7,537,272,963 bp. The max length, average length, N50, N90 and mean quality value of all clean reads was 14,238 bp, 786.77 bp, 822 bp, 466 bp and 10.5, respectively. The length distribution of sequencing reads was showed in Figure 2. Highly similar clean reads were then clustered into 22,868 consensus sequences, with a max length, mean length and N50 of 14,089 bp, 2252.4 bp and 3,076 bp, respectively (Figure 3). All consensus sequences were further compared to the reference R. typus genome for remove redundant, and 14,930 FL transcripts were ultimately obtained. 13 http://meme-suite.org/index.html

Comparison of Transcriptome With Reference Genome
The newly constructed 14,930 FL transcripts were mapped to the reference genome and the aligned results showed that 14,915 sequences were covered 11,892 genetic loci of R. typus genome, and the successfully comparison number of "unmapped, " "mapped to ' + ' , " and "mapped to '−"' was 15, 7,566 and 6,970, respectively. Meanwhile, 1,642 novel genetic loci were found based on the FL transcripts. Finally, we have counted the FL transcripts distribution on the reference genome and drew the density distribution of known and novel transcripts on the 15 longest FL transcripts (Figure 4).

Novel Transcripts
We have provided a preliminary assessment of the novel transcripts based on the aligned results of FL transcripts on the reference genome and 714 novel transcripts were identified. Among the five transcript types, the novel transcript number of I, J, O, U, X type was 3, 630, 12, 55 and 14, respectively.

Fusion Transcripts
Based on the identification criteria, we identified 175 fusion transcripts in the R. typus FL transcriptome, all of which consisted of two transcripts.

PolyAs
By comparing the FL transcripts with the R. typus reference genome, a total of 100 ployAs were obtained in the present study, with the mean length and N50 of 89.16 bp and 77.93 bp. The length distribution of polyAs was showed in the Figure 6A. Meanwhile, we conducted a correlation analysis between polyA N50 and expression level of FL transcripts, and the correlation was showed in the Figure 6B.

LncRNAs
Three prediction approaches were applied to identify the LncRNA and a total of 308 were predicted in the FL transcripts. Specifically, 162, 210, and 212 LncRNAs with length greater than 200 bp were predicted in the CNCI software, CPC2 software and Pfam database, respectively (Figure 7). Furthermore, veen chart indicated that only 89 LncRNAs were shared among the three prediction approaches.

Coding Sequences
We have predicted 31,698 potential CDSs from the R. typus FL transcripts based on the TransDecoder software and Pfam Frontiers in Marine Science | www.frontiersin.org  database, corresponding with a total length, mean length, and N50 of 41,659,911 bp, 1314.28 bp and 6,459 bp. Among all CDSs, a large proportion (26,138, 82.46%) of CDSs are less than 2,000 bp in length (Figure 8).

Coding Sequence Annotation
The lack of previous transcripts may have limited the completeness of the R. typus genome annotation information, it is necessary to annotate the CDSs. All the 31,698 potential CDSs were further compared to the afore-mentioned databases and a total of 31,021 (97.86%) CDSs can obtained annotation information. Specifically, we have evaluated the genetic sequence similarity of R. typus and other species by comparing all FL transcripts against the Nr database. Result showed that 30,189 CDSs were found in Nr database and 94.98% CDSs have a strong similarity with the existing sequences from R. typus ( Figure 9A). GO classification can exactly define the gene characteristic and can help us understand which genes might be associated with biological processes. GO classification result showed that the terms of "positive regulation of transcription DNA-templated, " "nucleus, " and "metal ion binding" were dominant in "biological process, " "cellular component, " and "molecular function, " respectively ( Figure 9B). The COG database have divided the homologous genes of different species into different ortholog transcripts based on evolutionary relationship. In the present study, 6,561 CDSs were categorized into 23 COG categories ( Figure 9C). Among these categories, the first three largest groups were R category, T category and J category, representing "General function prediction only, " "Signal transduction mechanisms, " and "Translation, ribosomal structure and biogenesis, " respectively. KEGG analysis was used to analyze the functions and their metabolic pathways of gene products in cells. In this study, a total of 27,233 CDSs were assigned to 5 terms according to metabolic pathways: cellular processes, environmental information processing, genetic information processing, metabolism, and organismal systems ( Figure 9D). Meanwhile, the most significant pathways in 5 terms were "cellular community-eukaryotes, " "signal transduction, " "folding, sorting and degradation, " "carbohydrate metabolism" and "endocrine system, " respectively.

m5C and m6A Methylation Sites
In the present study, these m5C methylation sites with the top 1000 alternative value and 5,088 m6A methylation sites with alternative value greater than 0.7 were used to analyze the distribution of methylation sites. The methylation site distribution of 15 longest FL transcripts was showed in the Figure 10. Distribution results showed that m5C methylation sites were distributed in both the sense strand and antisense strand of the 15 longest FL transcripts, while m6A methylation  sites were only predicted in the antisense strand of the 15 longest FL transcripts.
Additionally, the motif of m5C and m6A methylation sites were identified by extending the upstream and downstream bases of the methylation sites. Results showed that the motifs of m5C methylated sites showed no regularity, while all motifs of m6A methylated sites were R(representing A or G)GACH(representing A or T or C) type (Figure 11).

DISCUSSION
The acquisition of FL transcripts is essential to accurately investigate the gene function. Although the next-generation sequencing technology has improved the capture efficiency of transcripts, assembled transcripts were generally short and non-full-length, which may ultimately limit the accuracy and completeness of genome annotation information (Postlethwait et al., 1998;Steijger et al., 2013;Tilgner et al., 2013). The thirdgeneration sequencing technology seems to offer an opportunity to solve above-mentioned difficulties, this is the case because it can obtain some transcripts large enough to cover the FL genes (Eid et al., 2009;Feng et al., 2015). Given the advantage of ONT platform in long-read sequencing (Wyman et al., 2019), thus it can be applied to sequence the FL transcriptome of R. typus. In the present study, a total of 14,930 FL transcripts were generated and these sequences helped us identify 714 novel transcripts and 1,642 novel genetic loci in the R. typus genome. Meanwhile, the longest transcript obtained in the present study is 14,089 bp, which is beyond the reach of next-generation sequencing technology.
Full-length (FL) transcripts provide the basis for the research of AS events. Gene can produce different mRNAs by different splicing and then increase the functional proteome variability in cells and tissues. This implies that AS events can alter the composition of transcribed genes without increasing the number of genes (Wang et al., 2008). Alternative splicing events have been proved to be involved in the regulation of gene expression in various biological processes such as growing development, sex differentiation and immune resistance (Modrek and Lee, 2002). However, short transcripts are difficult to identify the combinations of splice-site (Wang et al., 2008;Chacko and Ranganathan, 2009). Therefore, FL transcripts are necessary for the more complete and accurate identification of R. typus AS events. A total of 1,941 AS events were identified in this study and these results will greatly improve reference annotation. There is no denying that the proportion of AS events obtained by FL transcripts is low, which may be because the R. typus have slightly more single-exon isoform with lower coding ability, thus representing lncRNAs (Sharon et al., 2013). Meanwhile, considering that exons are more prone to methylation than introns, thus we suspected that alternatively splicing of a large proportion of multi-exon genes may increase the transcriptional diversity of R. typus, which has been demonstrated in many organisms (Pan et al., 2008;Gelfman and Ast, 2013).
Fusion transcripts are usually identified by RNA sequencing and play an important role in some physiological processes, whether protein-coding or non-coding RNAs (Wu et al., 2014). In the present study, 175 fusion transcripts were detected, which is relatively low. We suspected that the poor accuracy of ONT platform sequencing led to this finding (Wyman et al., 2019). Due to the lack of genomic data at the chromosomal level, we have not been able to determine whether these fusion transcripts are from intergenic splicing or chromosomal rearrangement . In fact, fusion transcripts have been identified in a variety of animals including humans and mouse . Although the physiological function of fusion transcripts is still unknown, fusion transcripts with specific functions may be positively selected during the R. typus evolution. In fact, fusion transcripts can increase the diversity of the transcriptome (Ou et al., 2021). Future studies still need to verify the physiological function of the fusion transcripts to explore its regulatory mechanism for the life-history strategy of the R. typus.
The ability to estimate the relationship between polyA length and expression level is another advantage of the third-generation sequencing technology. Our study have confirmed that the N50 of polyA was negatively correlated with the expression level, meaning that highly expressed genes generally have shorter polyA length (Lima et al., 2017;Legnini et al., 2019). In fact, polyA are the recognized regulators of translation and transcriptional stability (Roach et al., 2020). A similar negative correlation was found in the larval stage of Caenorhabditis elegans, but not in the adult stage (Roach et al., 2020). The R. typus used in this study were 15-20 years old and belonged to the early development. Whether the correlation between polyA length and expression level of FL transcripts of R. typus would show age-specific is still worth studying. Additionally, the polyA length is species-specific and it is also of concern because mRNA with short polyA are easily enzymatic or translational dormancy (Elkon et al., 2013;Abdel-Ghany et al., 2016). In the present study, the mean polyA FIGURE 11 | The consensus motifs of m6A methylation sites. The height of the base signal represents the relative base frequency of each site. length of R. typus was only 89.16 bp and significantly lower than human (Mayr and Bartel, 2009). Previous study have confirmed that polyA length are closely related to the intron retention events involved in cell senescence regulation (Roach et al., 2020;Yao et al., 2020). Although the effect of intron retention on cell senescence is unclear, it does not prevent us to speculate that the short polyA may be related to the longevity of R. typus.
LncRNAs are these non-coding RNAs which structure is similar to mRNA and length is longer than 200 nucleotides (Wan et al., 2019). Although LncRNAs have no protein-coding ability, they still regulate gene expression at epigenetic, transcriptional and post-transcriptional levels, and therefore participate in many biological processes (Kapranov et al., 2007). To date, the LncRNAs of R. typus has never been reported. In our study, 89 LncRNAs were predicted by three methods. Fewer lncRNAs were identified may be due to fewer lncRNAs were annotated in the database. At present, lncRNAs are mostly found in humans and mice, but their regulatory mechanisms in other species have not been thoroughly studied . A critical reason for this is that non-coding RNAs are not highly conserved among species (Liu et al., 2018). Therefore, LncRNAs obtained in this study will facilitate the function study of R. typus. However, it is undeniable that the real-time dynamic changes of lncRNAs in cells are difficult to be analyzed, which affects the annotation of the function and mechanism of R. typus lncRNAs.
RNA methylation refers to methylated modifications that occur at different sites on RNA, and it plays an important role in the regulation of RNA metabolism, such as precursor RNA splicing, RNA editing, RNA translation, and RNA stability (Liang et al., 2020). The two most common post-transcriptional modification of RNA, including m5C and m6A, are believed to play important roles in the regulation of adaptive functions of organisms (Cui et al., 2017;David et al., 2017). This study for the first time clarified the distribution patterns of m5C and m6A in R. typus FL transcripts, which can provide basic data for exploring the adaptive functions of R. typus. Meanwhile, the motifs of m5C methylated sites showed no regularity, although all motifs of m6A methylated site were R(representing A or G)GACH(representing A or T or C) type. The regular existence of m5C modification may imply that it is a specific modification and regulation mode of R. typus. Previous study has suggested that the distribution of m5C in mRNA is related to cis-acting elements and microRNA binding sites, and thus m5C may be involved in the metabolism process after mRNA transcription (Squires et al., 2012). As mentioned earlier, a low basal metabolic rate may contribute to the longevity of R. typus (Weber et al., 2020). Meanwhile, m6A has been confirmed not only to regulate transcriptome, but also to participate in DNA damage repair (Lee et al., 2015). Although the excavation of longevity-related m5C and m6A sites is necessary to unlock the longevity code of R. typus, the mechanisms involved are not yet clear. Future research needs to further discuss these uncertainties.
In conclusion, the present FL transcript information would be of great value to improve R. typus genome annotation and relevant biological research.

CONCLUSION
In the present study, the ONT platform was first applied to sequence the FL transcriptome of R. typus and these FL transcripts can compensate for short-read used in genomic annotation processes. Based on FL transcripts, we have identified the novel genes and novel transcripts of R. typus and further generated an updated reference genome annotation information. Meanwhile, our research revealed the R. typusspecificity distribution patterns of ASs, fusion transcripts, LncRNAs, and methylation sites, and these information provides a more comprehensive foundation to explain the transcriptome diversity of R. typus. In conclusion, our results not only significantly improve existing genome annotation information of R. typus, but also have revealed important transcriptome characteristics and generated novel resource and information with positive implications for biological research of R. typus.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI (accession: PRJNA765716).

ETHICS STATEMENT
The animal study was reviewed and approved by the Institutional Animal Care and Use Committee of Yantai University.

AUTHOR CONTRIBUTIONS
FL: conceptualization, methodology, formal analysis, data curation, writing-original draft preparation, writing-review and editing, visualization, and project administration. LiW, ZW, LeiW, and LZ: software. ZL and YT: validation and supervision. QZ: resources. All authors have read and agreed to the published version of the manuscript.