A de novo Transcriptome Assembly of the European Flounder (Platichthys flesus): The Preselection of Transcripts Encoding Active Forms of Enzymes

The RNA sequencing data sets available for different fish species show a potentially high variety of forms of enzymes just in teleosts. This is primarily considered an effect of the first round of whole-genome duplication with mutations in duplicated genes (isozymes) and alternative splicing of mRNA (isoforms). However, the abundance of the mRNA transcript variants is not necessarily reflected in the abundance of active forms of proteins. We have investigated the transcriptional profiles of two enzymes, aralkylamine N-acetyltransferase (AANAT: EC 2.3.1.87) and N-acetylserotonin O-methyltransferase (ASMT: EC 2.1.1.4), in the eyeball, brain, intestines, spleen, heart, liver, head kidney, gonads, and skin of the European flounder (Platichthys flesus). High-throughput next-generation sequencing technology NovaSeq6000 was used to generate 500M sequencing reads. These were then assembled and filtered producing 75k reliable contigs. Gene ontology (GO) terms were assigned to the majority of annotated contigs/unigenes based on the results of PFAM, PANTHER, UniProt, and InterPro protein database searches. BUSCOs statistics for metazoa, vertebrata, and actinopterygii databases showed that the reported transcriptome represents a high level of completeness. In this article, we show how to preselect transcripts encoding the active enzymes (isozymes or isoforms), using AANAT and ASMT in the European flounder as the examples. The data can be used as a tool to design the experiments as well as a basis for discussion of diversity of enzyme forms and their physiological relevance in teleosts.


THE BASICS
The presence of multiple forms of enzymes in teleost fish is primarily considered a result of the first round of whole-genome duplication and mutations in duplicated genes that occurred in teleostean evolutionary history (isozymes) as well as it is generated by the process of alternative splicing of mRNA (isoforms). A common occurrence of gene duplication and the scale and importance of this phenomenon are shown by Zhang (2003). Just for the record, isozymes are different forms of the same enzyme coming from different genes but catalyzing the same chemical reaction; isoforms originate from the same gene, but they can have the same or unique functions, depending on how they are spliced. Such abundance of the mRNA transcript variants as is shown in teleost fish is not necessarily reflected in the abundance of active forms of enzymes. Unfortunately, the analysis of transcriptome sequencing by de novo assembly without any mechanisms of data preselection may lead to an overestimation of the number of transcripts, and only some of them correspond to active enzymes. Although genome-based transcriptome analyses should be immune to the problem, they do require a reasonably complete genome project, which is often not available for the target species. Reasonably deep RNA-Seq is relatively easy to perform, and de novo assembly remains a method of choice for these species. Thus, a question arises as to how to effectively preselect cDNA transcripts corresponding to proteins that are active in the cells. In this article, we show it on the example of two enzymes: aralkylamine N-acetyltransferase (AANAT: EC 2.3.1.87) and N-acetylserotonin O-methyltransferase (ASMT: EC 2.1.1.4) in the European flounder (Platichthys flesus). We have investigated the transcriptional profiles of the enzymes in the eyeball, brain, intestines, spleen, heart, liver, head kidney, gonads, and skin. A link between aanat and asmt expression and the physiological role of AANAT and ASMT in various organs in fish is discussed in our previous papers (Kulczykowska et al., 2017;Pomianowski et al., 2020).
In teleost fish, in contrast to tetrapods, there are several AANAT and ASMT isozymes encoded by distinct genes as a result of genome duplication (Falcón et al., 2009(Falcón et al., , 2011. The first report of AANAT encoding genes shows two (aanat1 and aanat2) genes expressed in pike (Esox lucius) (Coon et al., 1999). Later papers report expression of three genes: aanat1a, aanat1b (also known as snat), and aanat2 in the pufferfish (Takifugu rubripes and Tetraodon nigroviridis), medaka (Oryzias latipes) (Coon and Klein, 2006), and sea bass (Dicentrarchus labrax) (Paulin et al., 2015). Additionally, ASMT encoding genes asmt and asmt2 (also known as hiomt and hiomt2) and their transcripts or only one asmt transcript have been detected in various fish species (Velarde et al., 2010;Khan et al., 2016;Muñoz-Pérez et al., 2016;Zhang et al., 2017). Our research group has examined the expression of the aanat and asmt genes in various organs of the three-spined stickleback (Gasterosteus aculeatus) as a part of the project on the cutaneous stress response system in fish (Kulczykowska, 2019). We have found two transcripts of genes encoding AANAT and two of ASMT in the eyeball and one of each in the skin (Pomianowski et al., 2020). However, in an earlier study (Kulczykowska et al., 2017), also in the three-spined stickleback, we even found three aanats mRNAs in the brain, eye, skin, stomach, gut, heart, and kidney, but their levels differed significantly within and among organs.
A variety of AANAT and ASMT isozymes and isoforms in teleost fish detected by the analysis of transcriptome sequencing data (for example, Li et al., 2015;Zhang et al., 2017;Lv et al., 2020) together with the results of our previous studies (Kulczykowska et al., 2017;Pomianowski et al., 2020) show some discrepancies. Thus, a question arises if all transcript variants correspond to active forms of enzymes (isozymes and/or isoforms) in fish organs/tissues. The diverse properties of the encoded proteins with respect to their enzymatic activity, including substrate preferences, kinetic characteristics, and mechanism of regulation as well as their organ distribution can indicate multiple biological functions. Furthermore, different AANAT and ASMT variants (isozymes and isoforms) and their combinations, which are organ specific, can be engaged in regulation of homeostasis of the organism under different conditions and in different phases of organism development having a marked impact on fish physiology. Therefore, a preselection of transcripts corresponding to the enzymes that are active in studied organs is required. Biological importance of AANAT and ASMT resulting from their role in many metabolic pathways in the cells explains a continuous need for research on them. This paper follows this trend.

WHY THE EUROPEAN FLOUNDER? IS IT JUST ANOTHER SPECIES EXAMINED?
The European flounder inhabits the European coastal waters from the White Sea in the north to the Mediterranean and the Black Sea in the south. The exceptional adaptability of this species to live, breed, and prosper in waters of different salinity and temperature makes it an excellent model organism in which to study various physiological processes, including osmoregulation and adaptation to variable oxygen conditions (for example, Balment, 1995, 1997;Kulczykowska et al., 2001;Lundgreen et al., 2008). Furthermore, the flounder is widely used in many studies as a bio-indicator (Hylland et al., 1996;Grinwis et al., 2000;Napierska et al., 2009;Laroche et al., 2013). This flatfish is generally readily chosen as an experimental subject as a species easily adaptable to laboratory conditions. Despite this, so far, there are no comprehensive transcriptome data from different organs of this species except for a mixed-tissue RNAseq data set (SRX893920) released in 2015 but not described in any formal publication. The only transcriptomic and genomic data for closely related species are limited to Japanese flounder Paralichthys olivaceus (Pleuronectiformes) inhabiting the Western Pacific, which hampers any comparative molecular approaches between flatfish species so that we want to fill the gap.
It is becoming increasingly apparent that transcriptome data of the European flounder can provide new tools for study of the molecular mechanisms underlying the disruption of homeostasis, which can affect the reproductive success of fish. Therefore, such data are important not only for basic science, but they can be applied in fisheries and resource management. We found the European flounder to be a good model organism for studying the diversity of enzyme forms and potential physiological consequences of this phenomenon. We address our study to marine and freshwater biologists and ecologists, aquaculturists, toxicologists, and climatologists.

RNA Extraction and Sequencing
One European flounder (Platichthys flesus) female was collected in the Gulf of Gdańsk (Poland), transported to the Institute of Oceanology PAS, and kept in a 200-L aerated aquarium (at 7 ppt salinity, 8 ± 0.2 • C water temperature and 8L:16D natural photoperiod) 2 weeks before sampling. The fish was sacrificed at 10 pm by cutting a spinal cord. Whole organs: eyeball, brain, and approximately 5 × 5 mm samples of intestine, spleen, heart, liver, head kidney, and gonad as well as skin from the upper and bottom parts of the fish were dissected immediately after sacrificing. All tissue samples were transferred to Eppendorf tubes and snap frozen in a dry ice-95% EtOH cooling bath. Frozen samples were stored at −70 • C until RNA extraction. Total RNA was purified with a GenElute TM Mammalian Total RNA Miniprep Kit (RTN70, Sigma-Aldrich, St. Louis, MO, United States) with minor modifications according to Pomianowski et al. (2020). RNA integrity number (RIN) was determined with a 2100 Bioanalyzer (Agilent), and samples with a total RNA amount ranging from 1.85 to 6.36 µg, RIN 6.5 to 8.6, and rRNA ratio 1.0 to 1.5 were used to construct sequencing libraries. Equal amounts of RNA isolates were sequenced on an Illumina NovaSeq6000 platform (TruSeq NGS library) with 150 bp paired-end run mode and 40M reads per sample throughput (Macrogen Inc., Korea). Initially, a total of more than 5.1 × 10 8 raw PE reads were obtained from all libraries. Then, after filtering by removal of adaptor sequences, contaminated and poor-quality reads we obtained approximately 75 Gbp of clean data (Q20 bases > 99%).

De novo Assembly of Fish Transcriptome
The Trinity version 2.9.1 (Grabherr et al., 2011) assembler with default parameters was used to obtain de novo assembly of the combined reads from all samples. There were 350,609 contigs in this initial, highly redundant assembly. The redundancies were reduced by applying CD-HIT-EST program release v4.8.1 (Li and Godzik, 2006) with parameter −c 0.98 and by filtering off very poorly represented contigs (TPM <1.0) after mapping the raw data back at the assembly as recommended by the Trinity manual (Supplementary Table 1). Additional filtering was performed after functional annotation. Contigs matching likely contaminants (similarity >85% to inconsistent taxa) were removed by our tritoconstrictor python script available on github 1 . Moreover, contigs without consistently annotated open reading frames and TPM <10 in at least one sample were also removed. The final filtered assembly consisted of 75,017 contigs (or Trinity isoforms) in 37,956 unigenes (defined as Trinity "groups"). The technical quality of this assembly was assessed by transrate v1.0.3 (Smith-Unna et al., 2016) (Supplementary Table 2). The completeness of the assembly was evaluated using BUSCO pipeline version 3.0.2 (Simão et al., 2015; Supplementary  Figure 1). More than 93% of the 2,586 representative vertebrate BUSCOs were present in the reported transcriptome. For the metazoa and actinopterygii BUSCOs, the statistics were also very good, suggesting that the transcriptome represents a rather high level of completeness. The relatively large fraction of duplicated BUSCO for all databases suggests that the number of alternatively assembled isoforms or assembly artifacts is still high in the final assembly.

Functional Annotation of Transcriptomes
To annotate the assembled unigenes, we searched for the homologous sequences of all isoforms in three protein databases: UniRef90 (2020/02 release) (Suzek et al., 2015), PFAM release 32.0 (Finn et al., 2010), and PANTHER release 15.0 (Thomas et al., 2003). All databases were searched on a local highperformance computer cluster. The two databases containing protein profiles (PANTHER and PFAM) were searched with hmmer 2 (version 3.3), UniRef90 was searched with Mmseqs2 release 11-e1a1c (Mirdita et al., 2019), and the results were integrated according to the pipeline outlined in Supplementary  Figure 2. Only database hits with bitscores higher than 20 were used to produce the final annotation. Gene ontology (GO) terms were assigned to those annotated unigenes based on the current (dated 1/1/2017) official release (Ashburner et al., 2000) using mapping files provided by UniRef and PANTHER. Additionally, based on PFAM and PANTHER signatures, some unigenes were classified according to InterPro system (Mitchell et al., 2019), and GO terms for these unigenes were also integrated. The majority (20,077) of unigenes from the reference assembly were assigned some GO terms (Supplementary Figure 3). The tritoconstrictor python script performing annotation and filtering is available on github.

Analysis of aanat and asmt Transcripts
The final annotated transcriptome contained all six expected groups of transcripts of aanat and asmt, some with several potential isoforms ( Table 1). The integrity of these transcripts was verified as follows. First, homologous genomic sequences of Japanese flounder (Paralichthys olivaceus) (GenBank GCA_001904815 and GCF_001970005) and stickleback (Gasterosteus aculeatus) (BROADs1, Ensemble release 97) were identified using the CLC Genomics workbench ver. 9.5.5 (Qiagen) "Large Gap Read Mapping" procedure. Then, all isoforms were aligned with Japanese flounder reference sequences. The alignments of problematic isoforms are presented in Figure 1. Based on these alignments, only a single complete transcript from each gene was identified.
The remaining isoforms can be regarded as artifacts due to illegitimate assembly of disjointed sequences. The apparent presence of unspliced introns in three (out of 13) isoforms is perhaps surprising. Apparently, given the substantial sequencing depth, our RNA isolation procedure left enough genomic DNA in the sample to generate this type of assembly artifact. Despite that, the unambiguous identification of correctly spliced transcripts was possible in each case. There was no evidence for the presence of differentially expressed isoforms in any of the six genes.
In all studied organs of the European flounder, asmtl transcripts were present at low levels ( Table 1). In teleosts, asmtl gene is a product of fusion between maf and asmt genes (Zhang et al., 2017). Transcripts of the asmtl gene that were identified in Platichthys flessus had the same intronexon structure as in the three-spined stickleback and Japanese flounder. Moreover, their sequence similarity to stickleback and Japanese flounder genomic sequences was convincing enough to conclude that these genes are truly homologous. The asmtl gene was found in the genomes of several fish species by Zhang et al. (2017). The transcriptomic survey made by the same author showed that asmtl is transcribed in the eye, skin, liver, and gonads of Sinocyclochelius fish and in the brain, gill, liver, muscle, and skin of mudskippers (Boleophthalmus pectinirosris and Periophthalmus magnuspinnatus) (Zhang et al., 2017). Taking into consideration that asmt transcripts were found mostly in the eye and brain and asmt2 and asmtl transcripts in many peripheral organs, their function seems to be different.

Data Records
The sequencing and assembly data of the transcriptome for all samples were deposited into public repositories. The raw sequencing data generated in this work were deposited in NCBI Sequence Read Archive (Leinonen et al., 2011). The assembly was deposited at DDBJ/EMBL/GenBank Transcriptome Shotgun Assembly (TSA) database. The version described in this paper is linked to NCBI BioProject number PRJNA637628.
Additional data, including expression profiling across samples, are available as a Supplementary Material (Supplementary Table 3).

CONCLUDING REMARKS
A variety of enzyme variants, isozymes and isoforms, in teleost fish, which are shown by the analysis of transcriptome FIGURE 1 | Alignment of aanat/snat, asmt, asmt2 transcript sequences (referenced by contig/isoform IDs used in Table 1 and Supplementary Data) to relevant genomic sequences of Japanese flounder (referenced by accession numbers). The figure was prepared in CLC Genomics Workbench and follows conventions used in this software. Annotations are presented as arrows above the sequences represented by black lines. A good match is observed only within annotated exons (green arrows), confirming the structural integrity of the annotated open reading frames (yellow arrows). Note that ORF annotations span over the alignment gaps. Only a single isoform from each locus has a complete ORF and no indication of misassembly (red boxes).
sequencing data and presented in many papers, including ours, prompted us to investigate the factual diversity of the enzymes. Hence, we proposed how to effectively preselect those cDNA transcripts while analyzing transcriptome sequences to distinguish those corresponding to the active proteins. In this article, we show it on the example of two enzymes, aralkylamine N-acetyltransferase and N-acetylserotonin O-methyltransferase, in the European flounder. It is important to sequence RNA from as diverse a set of organs as possible to assure that the final transcriptome is complete enough for identification of true organs-specific alternatively spliced transcripts. Therefore, the analyses were performed in nine different organs. Discrimination of true alternatively spliced isoforms from assembly artifacts proved to be difficult based on the de novo data alone. However, raw genomic data of closely related Japanese flounder coming from two sequencing projects (GenBank GCA_001904815 and GCF_001970005) were used for final verification of identity and completeness of prefiltered isoforms. The expression of five aanat and asmt genes differed markedly between organs. Moreover, a low expression of asmtl, a gene previously described in several teleost species, was also found in nine organs of the European flounder. No compelling evidence for alternatively spliced isoforms was identified for any of the six target genes.

DATA AVAILABILITY STATEMENT
This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GIPN00000000. Sample description and raw sequencing reads are deposited at NCBI databases. Accession numbers are provided in the Data column of Supplementary

ETHICS STATEMENT
The animal study was reviewed and approved by the Ethics Committee for Animal Experimentation (University of Science and Technology, Bydgoszcz) Mazowiecka 28 Str. 85-084 Bydgoszcz Poland.

AUTHOR CONTRIBUTIONS
KP and AB were the authors of the research idea. KP assisted in sample collection, extracted the RNA and estimated its quality and integrity. AB assembled and annotated the transcriptome. EK, KP, and AB contributed to writing the final manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to thank Mr. Ireneusz Stepczyński for catching fish investigated in the presented research and Mrs. Hanna Kalamarz-Kubiak, Ph.D., DSc., for tissue sample collection from the investigated animal. Computer analyses in this study were run on supercomputers of the Academic Computer Centre (TASK) in Gdańsk, as well as using the PLGRID (www.plgrid.pl) infrastructure.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.

2021.618779/full#supplementary-material
Supplementary Figure 1 | BUSCO assessment of assembly completeness. The final filtered assembly (75k contigs) was compared with the three reference sets of BUSCOs as defined in www.orthodb.org database version 9. For the smallest metazoan data set, only 5 genes were not found in the assembly out of the 978 searched for, indicating good completeness of the assembly.
Supplementary Figure 2 | Simplified overview of assembly annotation and filtering workflow.
Supplementary Figure 3 | Functional annotation of the assembly. There were 37,956 unigenes in the final assembly consisting of 75,017 contigs. Out of these, 20,090 unigenes were assigned at least one Gene Ontology (GO) term. The Venn diagram shows the distribution of unigenes among the three GO categories: cellular component, biological process, and molecular function. The colored areas are proportional to the number of unigenes with respective assignments.