Genome Analysis of Lagocephalus sceleratus: Unraveling the Genomic Landscape of a Successful Invader

The Tetraodontidae family encompasses several species which attract scientific interest in terms of their ecology and evolution. The silver-cheeked toadfish (Lagocephalus sceleratus) is a well-known “invasive sprinter” that has invaded and spread, in less than a decade, throughout the Eastern and part of the Western Mediterranean Sea from the Red Sea through the Suez Canal. In this study, we built and analysed the first near-chromosome level genome assembly of L. sceleratus and explored its evolutionary landscape. Through a phylogenomic analysis, we positioned L. sceleratus closer to T. nigroviridis, compared to other members of the family, while gene family evolution analysis revealed that genes associated with the immune response have experienced rapid expansion, providing a genetic basis for studying how L. sceleratus is able to achieve highly successful colonisation. Moreover, we found that voltage-gated sodium channel (NaV 1.4) mutations previously connected to tetrodotoxin resistance in other pufferfishes are not found in L. sceleratus, highlighting the complex evolution of this trait. The high-quality genome assembly built here is expected to set the ground for future studies on the species biology.


INTRODUCTION
The opening of the Suez Canal in 1869 initiated a process of biological invasions from the Red Sea into the Mediterranean, an event commonly known as Lessepsian migration (Por, 1971;Golani and Appelbaum-Golani, 1977). This influx of marine organisms has greatly impacted the local communities in ecological, evolutionary (Sax et al., 2007), and economical terms (Arim et al., 2006). Lessepsian fishes comprise a significant percentage of all recorded invasive species in the Mediterranean Sea (Zenetos et al., 2012) and may be causing several indigenous species displacements (Golani and Appelbaum-Golani, 2010). Lessepsian migration, having both direct and indirect human-driven origins, is a phenomenon that offers a unique opportunity for studying fast evolutionary change (Palumbi, 2001).
One of the most successful Lessepsian migrant species is the silver-cheeked toadfish, Lagocephalus sceleratus (Gmelin 1789), a member of the Tetraodontidae family (called puffers), widely distributed throughout the Indian and Pacific Oceans (Akyol et al., 2005). The first record of L. sceleratus invasion in the Mediterranean Sea was reported in 2003 in the Gökova Bay, in the south-eastern Aegean Sea coast of Turkey (Filiz and Er, 2004) and 2 years later in the North Cretan Sea (Kasapidis et al., 2007). Since then, is has spread rapidly throughout the entire Levant, Aegean and Ionian Seas (Kalogirou, 2013;Akyol and Ünal, 2017).
The survival and dispersal rate of alien species into novel habitats are dependent on multiple factors including encountering previously unknown pathogens. Thus, the invasion success may be affected by the effectiveness of the immune response (Kolar and Lodge, 2001;Lee and Klasing, 2004) rendering the gene families involved in the adaptive immune response potentially under positive selection.
Lagocephalus sceleratus is a highly toxic species, which along with other pufferfishes, contains in different tissues, a toxic organic compound, tetrodotoxin (TTX) (Chua and Chew, 2009;Kosker et al., 2016). TTX is accumulated in high concentration in ovary, liver and intestine of L. sceleratus individuals, while lower amounts have been detected in skin and muscle tissues (Akbora et al., 2020). TTX is the metabolic product of TTX-producing bacteria and is retained in the fish tissues. However, the mechanisms of absorption and the differential concentration in the tissues and the organs of pupperfishes remain unclear (Nagashima and Arakawa, 2016). TTX toxicity is due to its binding to the outer pore of the NaV1.4 (SCN4) channel, blocking the transport of sodium ions across the pore (Hanifin, 2010). Multiple pufferfishes have been shown to deploy TTX resistance through mutations in specific domains of NaV1.4, (Venkatesh et al., 2005;Soong and Venkatesh 2006;Jost et al., 2008).
In this study, we provide and analyse the first high-quality genome assembly of L. sceleratus. To that end, we combine Illumina and Oxford Nanopore Technology (ONT) reads to construct a highly contiguous assembly, which allows us to explore the genomic landscape of this successful invader and study the genetic background of TTX-resistance in the species. This valuable and robust genome resource will facilitate future studies on the ecology, evolution and potential exploitation of this invasive species.

Sample Collection and Sequencing
Animal care and handling were carried out following well established guidelines [Guidelines for the treatment of animals in behavioral research and teaching. Anim. Behav. 53, 229-234 (1997)].
One female fish (58 cm in length) was caught alive in Agios Georgios (Hersonissos), Crete, Greece (35°20′07.50″N 25°23′11.30″E) at the pre-spawning/spawning stage (as determined by stereoscopic investigation of the oocytes) and was anesthetized using clove oil. In total, 10 ml of blood was collected using a sterilized syringe and stored in tubes that contained ∼1/10 of volume heparin for subsequent DNA extraction.
DNA extraction for the purpose of ONT sequencing was conducted on the day of sampling, from 2 μl of the freshly collected blood, using Qiagen Genomic tip (20 G) and following ONTs protocol for DNA extraction from chicken blood. The final elution was made with 50 μl AE buffer providing 90.4 ng/μl (Qubit measurement) of high purity, high molecular weight DNA (Nanodrop ratios: 260/280 1.87 and 260/230 2.12). DNA integrity was assessed by electrophoresis in 0.4% w/v Bio-Rad Megabase agarose gel. We constructed four ligation libraries (SQK-LSK109) following the manufacturer's instructions. Approximately 1.2 μg of unsheared DNA was used for each library. Two of the prepared libraries were divided into two aliquots. Each library was run for approximately 24 h on a MinION sequencer at HCMR, after which the ONT nuclease flush protocol was performed, and a fresh library or library aliquot was loaded onto the same R9.4.1 flow cell. The total run time was ∼130 h. Basecalling was done with Guppy v3.2.4 (https://community.nanoporetech.com/posts/ guppy-3-2-4-release) in High Accuracy Mode with minimum quality score 7.
For Illumina sequencing, we proceeded with 2-days old, refrigerated blood samples using the same DNA extraction protocol as above. We used 4 μl of blood eluted in 100 μl AE buffer which resulted in 79.2 ng/μl (Qubit measurement) of pure DNA (260/280 1.85 and 260/230 2.21).
Template DNA for Illumina sequencing was sheared by ultrasonication in a Covaris S220 instrument. A PCR-free library was prepared with the Kapa Hyper Prep DNA kit with TruSeq Unique Dual Indexing. Paired end 2 × 150 bp sequencing was performed on an Illumina Hiseq4000 platform.
Tissue samples from brain, gonad, skin, liver, spleen and muscle of the same female individual, were grounded and powdered using pestle and mortar under liquid nitrogen, and except the spleen tissue were all homogenized in TRIzol ® reagent (Invitrogen) assisting the homogenization procedure using needle and syringe. Total RNA was extracted from the TRIzol ® homogenate according to the manufacturer's instructions. In the case of spleen, due its nature and the inability to clean properly the RNA from "dirt" even after five washes with ethanol, the spleen RNA was finally extracted using the commercial kit NucleoSpin ® RNA. The quantity of the isolated RNA was measured spectrophotometrically with NanoDrop ® ND-1000 (Thermo Scientific), while its quality was tested on an agarose gel (electrophoresis in 1.5% w/v). Finally, all samples were used for mRNA paired-end library construction with the Illumina TruSeqTM RNA Sample Preparation Kits v2 following the manufacturer's protocol (Poly-A mRNA isolation with oligo-dT beads, mRNA fragmentation, followed by transcription into first-strand cDNA using reverse transcriptase and random hexamer primers) and sequenced as 150 bp paired reads in half a lane of a HiSeq4000 ® following the protocols of Illumina Inc. (San Diego, CA).

Raw Data Pre-Processing and Genome Size Estimation
Quality assessment of the raw DNA Illumina sequence data was performed with FastQC v0.11.8 (Andrews, 2010). Low quality reads and adapters were removed using Trimmomatic v0.39 (Bolger et al., 2014). The reads were scanned by a 4-based sliding window with an average cutting threshold lower than 15 Phred score. Leading and trailing bases with quality scores less than 10 were also filtered out. Reads with total length shorter than 75 bp and average score below 30 were omitted. The same process was applied to the RNASeq reads.
The genome size was estimated using the k-mer histogram method with Kmergenie v1.7051 (Chikhi and Medvedev, 2014) from the Illumina genomic sequencing data.

De Novo Genome Assembly
To build the genome assembly the long ONT reads were used for the construction of an initial de novo assembly, and then the Illumina reads were used for the polishing stages. (Figure 1). To construct the initial assembly, we used the v. Flye v2.6 (Kolmogorov et al., 2019) algorithm, a repeat graph assembler. The assembly was evaluated by assessing: 1) the N50 sizes of contigs, using QUAST v5.0.2 (Gurevich et al., 2013), and 2) a gene completeness score using BUSCO v3.1.0 (Simão et al., 2015) against the Actinopterygii ortholog dataset v10, with default parameters. In addition Genomescope2 was used to for kmer frequency profiling (Ranallo-Benavidez et al., 2020).

Gene Prediction and Functional Annotation
After repeat masking, gene prediction was conducted using MAKER2 pipeline v2.31.10 (Holt and Yandell, 2011) with two iterative rounds. We used a combined strategy of ab initio, homology-based and transcriptome-based methods. In the first round, for homology annotation, MAKER2 was initially run in protein2genome mode, while SWISS-PROT (www.uniprot.org) was used for protein sequences extraction of three closely related species, Mola mola, Tetraodon nigroviridis, and Takifugu rubripes. For annotation using the RNA-Seq data, est2genome mode was enabled, which is based on transcriptome evidence. Τranscriptomic reads from all sequenced tissues were mapped and assembled through the genome-guide approach, using HISAT2 v2.2.0 (Kim et al., 2015) and StringTie v2.1.1 (Pertea et al., 2015). Ab initio prediction was performed with SNAP (Korf, 2004) (http://korflab.ucdavis.edu), which was independently trained on L. sceleratus genome with default parameters and AUGUSTUS v3.3.3 (Stanke et al., 2006) previously trained through BUSCO v3.1.0 (Simão et al., 2015) with the extra parameter "-long." The second round of MAKER2 was run using the previously trained models with the same settings as round one, except est2genome and protein2genome modes. The previous custom repeat library and MAKER2 repeat library that used for genome masking, remained for both rounds. The completeness of putative genes was assessed using BUSCO v4.0.5 (Simão et al., 2015) against the Actinopterygii odb10 database. The functional annotation of the predicted genes of L. sceleratus was performed by similarity search against the UniprotKB/Swissprot database (release-2020_03) with BLASTP v.2.9.0+ (e-value 1e-6, -max_target_seqs 10) (Altschul et al., 1990). InterProScan v5 (Jones et al., 2014) was used to search motifs and domains against all default databases and the extra of SignalP_EUK and TMHMM. Functional annotation results were also retrieved using eggNOG-mapper (Huerta-Cepas et al., 2017) based on fast orthology assignments using precomputed eggNOG v5.0 (Huerta-Cepas et al., 2019) clusters and phylogenies.

Gene Ontology Mapping
Gene ontology analysis was carried out using a custom python script (gene_ontology_mapping.py). Gene ontology terms were retrieved through the Uniprot API service (https://www.uniprot.org/help/ programmatic_access) and as queries we chose the best blast hits that we extracted after the functional annotation step against UniProtKB/Swiss-Prot.

Orthology Assignment
To identify paralogous and orthologous genes, we compared 30 whole-genome protein-coding gene sets from teleost fish (Supplementary Table S3) (27 species from our previous dataset of Natsidis et al. (2019) in addition with two extra Tetraodontidae species i.e., Takifugu bimaculatus. T. flavidus, and L. sceleratus) with Orthofinder v2.3.12 (Emms and Kelly, 2015) using default parameters. Firstly, the longest isoform of each gene was kept using the primary_transcript.py script provided by Orthofinder suite. For L. sceleratus, only the longest isoforms over 30 amino acids were extracted with a custom script (longestIsoforms.py) and used in the analysis.

Species Tree Reconstruction
For the phylogenomic analysis the orthogroups produced by orthofinder were filtered, keeping those containing a single gene per species to avoid inclusion of paralogs. Then, we kept those with representation from at least 26 out of the 30 taxa analysed in total, using a custom python script (filtered_orthogroups.py) The aminoacid sequences encoded by selected genes of each orthogroup were aligned using MAFFT v7.453 (Katoh and Standley, 2013), with the -auto mode. The aligned orthogroups were then concatenated using a python script by P. Natsidis (https://github.com/pnatsi/Sparidae_ 2019/blob/master/concatenate.py). The resulted alignments were filtered with Gblocks v0.91b (Castresana, 2000) to exclude poorly aligned regions with the following parameters: "Allowed Gap Positions" was set to half, "Minimum Length of a Block" was set to 8, "Minimum Number of Sequences for a Flanking Position" was set to 20, and "Minimum Number of Sequences for a Conserved Position" was set to 18.
Then, we ran RAxML-NG v0.9.0 (Kozlov et al., 2019) for phylogenetic tree reconstruction, and in order to select the best model, we used ModelTest-NG v0.1.6 (Darriba et al., 2020) specifying the --topology type parameter to maximum likelihood (ml) mode. The phylogenomic inference was run using the selected model, JTT + I + G4 + F. To assess the branch confidence, we ran 100 bootstrap replicates. The final tree was visualized using R/RStudio RStudio Team (2021) with a custom script using Lepisosteus oculatus as outgroup (phylo_tree_plot.r).

Gene Family Expansion and Contraction
The expansion and contraction of gene families were analysed using CAFE v4.2.1 (De Bie et al., 2006). The sequences were first clustered using MCL (Enright et al., 2002) following the CAFE developers' instructions (https://iu.app.box.com/v/cafetutorial-files) and filtering the gene families using a custom python script (cafe_filterin_blast_ dump.py) to exclude gene families that contain at least one or more species with ≥100 genes. An ultrametric tree was produced with r8s v1.81 (Sanderson, 2003) using the phylogenetic tree produced in our phylogenomic analysis and the divergence time for Thunnus thynnus and Oreochromis niloticus taken from TIMETREE (http://www.timetree.org/). Finally, CAFE was run with conditional p-values, for each gene family below 0.01.

Synteny Analysis
Synteny analyses were performed on two tiers, on whole-genome sequence comparison and at the gene level. For the whole-genome comparison, the L. sceleratus genome assembly was compared with all available genomes of Tetraodontiformes to date (T. nigroviridis, T. rubripes, T. flavidus and T. bimaculatus). Furthermore, we performed similar comparisons aligning the genomes of T. nigroviridis and T. rubripes against the other species. For the alignment, we used LAST v1145 (Kielbasa et al., 2011), implementing the sensitive alignment protocol as described for Human-mouse wholegenome project comparison (https://github.com/mcfrith/lastgenome-alignments) with e-value cut-off 0.001. On the gene level, we used the one-to-one orthologues outputs of Orthofinder v2.3.12 analysis for comparing the same set of species.
We selected for visualization the 41 largest contigs of the L. sceleratus assembly representing ∼91% of the genome (Supplementary Figure S1), for both whole-genome and genebased synteny results. In each set of fish, the whole-genome pairwise alignments were plotted by custom python scripts (synteny_plot.py) (Supplementary Figures S3-S8), while the one-to-one orthologs relationships of all the above-mentioned comparisons, were visualized through Circos (circos_plot.py) (Krzywinski et al., 2009) (Supplementary Figures S9-S13).

Genome Size and Assembly Completeness
Sequencing yielded 57.30 Gb of raw Illumina reads and 9.68 Gb of ONT reads above Q7, with N50 of 48.85 Kb. The estimated genome size was calculated via kmergenie (Supplementary Figure S1) and found to be ∼360 Mb and the best predicted k 81. After quality trimming and filtering, we retained 44.45 Gb of Ιllumina data for genome polishing and 9.67 Gb of ONT data (Table 1) for the genome assembly. The final assembled and polished genome contained 235 contigs with total length of ∼373 Mb, with 41 contigs representing ∼91% of the genome and the largest contig sizing 17 Mb and N50 of 11 Mb ( Table 2). In addition, kmer frequency profiling showed low levels of heterozygocity (0.521983%), as calculated using Genomescope2 (Supplementary Figure S2). Regarding genome completeness, we found 98% (4,513 out of 4,584) of the genes included in the BUSCO Actinopterygian gene set. Of those, 96.20% (4,410) were complete ( Table 2), suggesting a high level of completeness and contiguity in the built assembly.

Repeat Annotation, Gene Prediction and Functional Annotation
A total of 61.9 Mb of repeat sequences that accounted for 16.55% of the genome assembly were masked in L. sceleratus. The class of Retroelements makes up 7.81% of the total assembly and LINEs are the most abundant of this class, with 5.54%. LTR elements sequences (2.07%) is the second most abundant group in the Retroelements class, and the results also indicated that 2.30% of the genome assembly consists of sequences of the class DNA transposons (Table 3).

Gene Prediction and Functional Annotation
The combination of ab initio-based, homologue-based and RNASeq-based methods resulted in 32,451 putative proteincoding genes. After removing putative genes with Annotation Edit Distance (AED) (Eilbeck et al., 2009) score below one (AED<1), with custom script (longestIsoforms.py), we ended up with 21,251 genes of average gene length and exon size 587,43 and 249,62 bp, respectively. A total of 20,578 genes were successfully annotated, accounting for 97% of the predicted gene set ( Table 4). Using BLAST to search the longest proteins encoded by each locus, a total of 17,706 L. sceleratus genes had a match in T. rubripes, and 16,706 in T. nigroviridis. Of those matches, we found 14,271 reciprocal best hits in T. rubripes and 13,425 reciprocal best hits in T. nigroviridis. The completeness of the gene set was assessed using BUSCO v4.0.5 (Simão et al., 2015). From a core set of 3,640 single-copy ortholog genes from the Actinopterygii (odb 10) lineage, 92.2% were complete (70.5% as single-copy, 21.7% as duplicates), 1.7% were fragmented and 6.1% were not found.

Orthology Assignment and Phylogenomic Analysis
The total number of genes from all 30 fish proteomes (Supplementary Table S3) analysed by Orthofinder was 731,383 while 21,897 orthogroups were identified. After filtering, we chose 731 one-to-one orthogroups to construct the super-alignment. The initial matrix consisted of 494,732 amino acid positions. The Gblocksfiltered matrix contained 252,477 positions (51% of the initial), which were used for the phylogenomic analysis.
We identified JTT + I + G4 + F as the best model which was used for the estimation of the phylogenetic tree (Figure 2). At the resulted phylogeny, almost all branches were supported with 100 bootstraps. The recovered phylogenetic position of L. sceleratus is within Tetraodontidae clade and is placed closer to T. nigroviridis.

Gene Family Evolution
Gene family evolutionary analysis revealed multiple predicted rapidly expanded and contracted gene families of L. sceleratus (Figure 3; Supplementary Tables S4, S5), respectively. Gene families found only in one or more pufferfish species and not in any other taxon were considered as puffer-specific. Among rapidly evolving families, four gene families were identified as L. sceleratus specific, two as rapidly expanding (Supplementary Table S1) and two gene families as rapidly contracting. Finally, 43 T. bimaculatus and 45 T. rubripes families were found to be rapidly contracted (Supplementary Figures  S1-S2). The gene family analysis did not reveal a core set of rapidly evolving families at the base of all five puffer species ( Supplementary  Figures S2-S3).

Synteny Analysis
Whole Genome-Based Synteny The results that we obtained from whole-genome pairwise alignment of L. sceleratus against the other four puffers (T. nigroviridis, T. rubripes, T. flavidus and T. bimaculatus) are summarised in Supplementary Table S2. Comparisons of L. sceleratus against T. nigroviridis (Figure 4), T. rubripes, T. flavidus and T. bimaculatus ( Supplementary Figures S9-S11) indicated high collinearity of the studied species' genome against the rest. In particular, we aligned the 41 largest contigs of L. sceleratus against the chromosomes of T. flavidus (∼231.3 aligned Mb), T. bimaculatus (∼224 aligned Mb), T. rubripes (∼227 aligned Mb) and T. nigroviridis (∼154.7 aligned Mb). Whole genome alignments were also carried out between T. nigroviridis and T. flavidus, as well as T. bimaculatus and T. Figures S4-S6) revealing highly contiguous matches across all species, with higher collinearity observed between T. nigroviridis and T. bimaculatus compared to T. nigroviridis and T. rubripes.

Gene-Based Synteny Analysis
Synteny information obtained from a gene-based analysis revealed higher conserved synteny between L. sceleratus and T. nigroviridis ( Figure 5) than between L. sceleratus and T. rubripes or T. bimaculatus (Supplementary Figures  S14-S15). L. sceleratus' contigs that are unaligned to T. nigroviridis chromosomes in Figure 6, were aligned to unplaced regions of the T. nigrorviridis assembly (Supplementary Figure S6).

L. sceleratus Lacks SCN4A Mutations Previously Associated With Pufferfish TTX Resistance
The SCN4A protein has been previously associated with pufferfish TTX resistance, showing that mutations in the core of its ion transport domain led to changes in TTX binding affinity. We performed multiple sequence alignment of the ion transport domain sequences from different Tetraodontiformes species and selected outgroups, including teleosts (SCN4AA and SCN4AB paralogs) and other vertebrates. We found that L. sceleratus and M. mola lack the previously characterized mutations in domains 1 and 2 in residues outlined in red. In contrast, all Tetraodontiformes show changes in the second to last residue of the domain 4 sequence outlined in red in Figure 6. Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 790850 9

DISCUSSION Genome Size and Assembly Completeness
In this study, a high-quality pufferfish genome assembly of high contiguity was reconstructed, from data obtained from a single MinION flow cell and half a lane of Illumina HiSeq. To our knowledge, only one other highly contiguous reference Tetraodontiformes genome assembly has previously been constructed using the same strategy, for Thamnaconus septentrionalis (Bian et al., 2019). The assembly of L. sceleratus (360 Mb) is consistent with the predicted genome size from kmergenie and is comparable in size with that of other puffers, such as  Kang et al., 2020) and T. nigroviridis (340 Mb, Jaillon et al., 2004). The contig N50 value (∼11 Mb) of the L. sceleratus assembly is considerably greater than that reported for the genomes of T. bimaculatus (1.31 Mb; Zhou Z. et al., 2019) and T. flavidus (4.4 Mb; Zhou Y. et al., 2019). Similarly, our assembly appears of equivalent levels of completeness to other Tetraodontidae genomes, based on BUSCO scores (e.g., T. obscurus (Kang et al., 2020) and T. flavidus (Zhou et al., 2019a)].

Repeat Content, Gene Prediction and Functional Annotation
The percentage of transposable elements (TEs) found in the L. sceleratus genome (16.55% of the assembled genome) is marginally higher than the one found in T. septentrionalis (14.2%) (Bian et al., 2019), T. obscurus (11.05%) (Kang et al., 2020), and M. mola (11%) (Pan et al., 2016). Moreover, it is almost twofold higher that in T. rubripes (7.53%) and threefold higher than T. nigroviridis (5.60%) and T. flavidus (6.87%) (Gao et al., 2014). T. rubripes contain more copies of transposable elements than T. nigroviridis, which have been proposed to contribute to its marginally larger genome size (365-370 Mb) (Jatllon et al., 2004). Although the L. sceleratus genome has a comparable size to reported Takifugu genomes, it harbors much higher repeat content. Moreover, D. holocanthus genome of the Diodontidae family contains 36.35% repetitive sequences, almost double the repeat content of L. sceleratus. These findings imply that TEs might follow an independent pathway of accumulation and diversification across Tetraodontiformes species. In the case of L. sceleratus, such differential repeat expansion may have taken place after the divergence of the Takifugu and Tetraodon genera.
Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 790850 been documented across a larger evolutionary scale in teleosts (Shao et al., 2019). For example, the relatively smaller genome of T. nigroviridis (∼360 Mb) contains 5.6% TEs, in contrast to the zebrafish genome (∼1.4 Gb) which is composed of 55% repetitive sequences (Shao et al., 2019). This positive correlation is also reflected in the small size and relatively low repeat content of the L. sceleratus genome, regardless of differences with other pufferfish. However, it would be interesting to further explore these differences, as they may be informative for genome evolution. As an interesting example, LINE elements are the most abundant in the L. sceleratus genome, with ∼170,000 copies, as compared to the ∼12,300 copies of the T. rubripes genome. This finding indicates dynamic genome evolution in the two species.
Previous studies have shown a correlation between genome TEs and species adaptations to new environments, suggesting they may be associated to invasiveness (Stapley et al., 2015;Yuan et al., 2018). Thus, the repeat content of L. sceleratus may play a role in its fast adaptation to novel environments and should be investigated further.

Species Tree Reconstruction
Although the order Tetraodontiformes is a cosmopolitan taxonomic group that includes multiple families, large parts of their phylogenetic relationships remain unexplored. In this study, we presented the first phylogenetic tree based on whole genome data including the invasive "sprinter" L. sceleratus. The recovered phylogenetic position of L. sceleratus is within Tetraodontidae and is placed closer to T. nigroviridis, while the long branch length of the Tetraodontidae clade possibly suggests a faster evolutionary rate. Regarding relationships within the pufferfish group (T. nigroviridis, T. rubripes, T. flavidus, T. bimaculatus and L. sceleratus), the resulting topology agrees with previous studies (Yamanoue et al., 2011;Meynard et al., 2012;Hughes et al., 2018;Hughes et al., 2021). Moreover, the Tetraodontidae group was recovered confidently as monophyletic in accordance with Yamanoue et al. (2011). Our results suggest that Tetraodontiformes are the closest group to Sparidae and corroborates the results of Natsidis et al. (2019) and of others (Kawahara et al., 2008;Meynard et al., 2012), based both on six mitochondrial and two nuclear genes.

Synteny Analysis
All pairwise comparisons of the whole-genome alignment analysis of L. sceleratus against the four other Tetraodontidae species ( Figure 5) (Supplementary Figures S9-S11), showed highly conserved synteny. The genome that exhibited the highest synteny conservation with the L. sceleratus genome was that of T. nigroviridis, in accordance with our reconstructed phylogeny which places the two species as more closely related to each other compared to the rest. The synteny between L. sceleratus and the three species of the genus Takifugu (T. rubripes, T. bimaculatus and T. flavidus) was less conserved, especially between L. sceleratus and T. bimaculatus.
To sum up, the higher synteny between L. sceleratus and T. nigroviridis corroborates their closer phylogenetic position compared to the three Takifugu species.

Gene Family Evolution and Adaptation
Adapting to a new habitat is a challenging task for a species, requiring a certain degree of physiological plasticity. To achieve establishment in a new niche, an invader must face environmental challenges that involve both biotic and abiotic factors (Crowl et al., 2008). Invasive species are facing novel pathogens during the colonisation of new environments and the ability to deal with these new immune challenges is key to their invasive success (Lee and Klasing, 2004). Interestingly, we found several expanded immune related families, including immunoglobulins (C-Type and V-Type), Ig heavy chain Mem5like, B-cell receptors and the Fish-specific NACHT associated domain, which are related to the innate immunity (Stein et al., 2007).
In addition, we also detected major histocompatibility complex (MHC) class I genes in the expanded gene families. MHC genes are crucial for the immune response, involved in pathogen recognition by T cells (Germain, 1994), thus initiating the adaptive immune response. The expanded repertoire of L. sceleratus immune response associated genes might be related to its survival in novel habitats, through the detection and inhibition of a wide range pathogens. Therefore, in this context, we suggest further research to explore the role of the expanded genes related to immune response.
Another interesting finding was the expansion of the fucosyltransferase (FUT) gene family. In particular, we detected 24 FUT9 [alpha (1.3) fucosyltransferase 9] genes. Glycosylation is one of the most frequent post-translational modifications of a protein. Many proteins involved in the immune response are glycosylated, extending their diversity and functionality (Bednarska et al., 2017). Fucosylation, a type of glycosylation, plays an essential role in cell proliferation, metastasis and immune escape (Jia et al., 2018). In mice, FucTC has been shown to regulate leukocyte trafficking between blood and the lymphatic system, after its engagement in selectin ligand biosynthesis (Maly et al., 1996).
Overall, based on our results, we believe that the expanded innate immune system gene families identified herein deserve further study as to whether they have contributed to the ability of L. sceleratus to spread rapidly (Kalogirou, 2013).

TTX Resistance of SCNA4 Sodium Channel
Sodium channels (NaV) are formed by an α subunit consisting of four domains (I-IV) and an optional β subunit that may alter its activity (Yu et al., 2005). Furthermore, the third whole-genome duplication of teleosts has expanded the α subunit family to eight members (Zakon et al., 2011). NaV channels are the target of several neurotoxins (Daly, 1995;Fry et al., 2009) and TTX is known to bind to the outer pore of the NaV1.4 (SCN4) channel, blocking the transport of sodium ions across the pore (Hanifin, 2010). As different pufferfishes have been shown to acquire TTX resistance through mutations in specific domains of NaV1.4 (Venkatesh et al., 2005;Soong and Venkatesh, 2006;Jost et al., 2008), we checked if this is also the case for L. sceleratus. For this purpose, we investigated whether the sequences of voltage-gated sodium channels (SCN4AA, SCN4AB) in L. sceleratus carry the previously reported mutations associated with TTX resistance in Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 790850 other pufferfish. According to these studies (Venkatesh et al., 2005;Soong and Venkatesh, 2006), the residues that are associated with TTX resistance are located in the same position in NaV1.4a Domain I and were mutated to Cys and Asn, in T. nigroviridis and T. rubripes, respectively ( Figure 6). Surprisingly, despite the fact that the Domain I mutations lead to extensive decrease in TTX binding (Satin et al., 1992;Kaneko et al., 1997;Yotsu-Yamashita et al., 2000;Venkatesh et al., 2005) and have been associated with TTX resistance in pufferfishes and in one C. pyrrhogaster newt (Kaneko et al., 1997), we did not observe any of these reported mutations in the SCN4A_A gene of L. sceleratus (Figure 6). Similarly, no changes from the ancestral sequence of the same paralog were found in M. mola, which is also a toxin resistant species (Halstead, 1988). The absence of mutations in L. sceleratus and M. mola, in conjunction with the updated Tetraodontiformes relationships presented in Figure 3, suggest that the changes seen in T. rubripes and T. nigroviridis have arisen via convergent evolution, a hypothesis further supported by the different amino acid replacement observed in these two species. This could also suggest that the position of the replacement is more important than the amino acid change per se.
Our results are also congruent with previous findings for Hapalochlaena lunulata (Geffeney et al., 2019) and Hapalochlaena maculosa (Whitelaw et al., 2020), as NaV1.4a Domain I is highly conserved in both octopod species. Furthermore, unlike the pufferfishes A. nigropunctatus and T. nigroviridis (Jost et al., 2008), the garter snake Thamnophis couchii (Feldman et al., 2012) and the octopuses Hapalochlaena lunulata (Geffeney et al., 2019) and Hapalochlaena maculosa (Whitelaw et al., 2020), replacements were also not observed in the NaV1.4a Domain III of L. sceleratus. Similar to L. sceleratus, mutated residues are not found in T. rubripes either. However, in NaV1.4a Domain IV all the Tetraodontiformes species have a replacement in the same position (outlined in red in Figure 7). Interestingly, a different mutation is observed in each species, except for T. rubripes and T. pardalis, which both share a change to Thr. Similar to what is discussed earlier for the changes observed in domain I, the position of the replacement may be more crucial than the specific changes observed. This could hint to a decrease in TTX binding by disturbing the conserved ancestral binding interface, as previously shown in domain I mutations in T. rubripes and T. nigrovidis (Venkatesh et al., 2005;Soong and Venkatesh, 2006). Previously, these mutations for T. rubripes and T. nigroviridis had not been associated with TTX resistance (Venkatesh et al., 2005;Soong and Venkatesh, 2006). At the last positions of Domain IV, there are also two replacements to His and Ser in H. lunulata (Geffeney et al., 2019) and H. maculosa (Whitelaw et al., 2020), which may inhibit TTX binding. The same studies (Venkatesh et al., 2005;Soong and Venkatesh, 2006;Jost et al., 2008) also showed that an Asp mutation in Domain II of T. nigroviridis NaV1.4b is highly associated with TTX resistance. A similar substitution has been reported in the soft-shelled toxin-resistant clam M. arenaria (Bricelj et al., 2005). Strikingly, the L. sceleratus and M. mola NaV1.4b sequences also lack mutations in Domain II, comparable to NaV1.4a domain I (Figure 6).
While the origin of TTX resistance remains elusive, several studies have provided insight into resistance mechanisms. Toxin tolerance may appear through mutations in sodium channels or toxin binding proteins (Ho et al., 1994;Zou, 2020). A wide range of different organisms have taken advantage of mutations which confer TTX tolerance either independently or complementary. The lack of previously characterised pufferfish mutations associated with TTX resistance in L. sceleratus raises several questions about how TTX resistance has evolved. Nevertheless, the results of our analysis imply that combined effects of complex polygenic adaptations working redundantly have played a role in the evolution of this complex trait, while similar genetic changes have arisen convergently multiple times.

CONCLUSION
Invader fishes, such as L. sceleratus, often thrive in novel environments. Our analysis provides the first high-quality genome assembly built from sequencing a single female individual and a comprehensive evolutionary genomic analysis of the species. We uncovered a close phylogenetic position of L. sceleratus with T. nigroviridis, untangling relationships within the pufferfish group, that were not clearly resolved in previous studies. The study gives insights into a variety of genomic signatures that may be associated with L. sceleratus invasion and colonisation effectiveness. Surprisingly, examination of voltage-gated sodium channels (NaV1.4) revealed a lack of TTX resistance associated mutations found in other pufferfishes, highlighting the complex evolution of the trait. Overall, the L. sceleratus genome will be an invaluable resource for additional studies on immune response in novel environments, osmoregulation, reconstruction of ancient chromosome rearrangements, investigation of complex TTX resistance mechanisms and population genomics and adaptation. Such studies are expected to elucidate the mechanisms behind the high invasiveness of L. sceleratus and assist the management of this invasive sprinter in the Mediterranean.

DATA AVAILABILITY STATEMENT
Genomic and transcriptomic Illumina and Oxford Nanopore raw data can be accessed in ENA with the IDs ERX6020380, ERS6677531 and ERR6391424 respectively. The genome assembly has been uploaded with the ENA ID CAJVRN010000000.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because this study does not include experimentation on laboratory animals for which a protocol approval is required by the law. Further, as stated within the manuscript we followed guidelines for animal care and handling [Guidelines for the Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 790850