Gene Arrangement and Adaptive Evolution in the Mitochondrial Genomes of Terrestrial Sesarmid Crabs Geosesarma faustum and Geosesarma penangensis

Citation: Lau N-S, Sam K-K, Ahmad AB, Siti K-A, Ahmad Zafir AW and Shu-Chien AC (2021) Gene Arrangement and Adaptive Evolution in the Mitochondrial Genomes of Terrestrial Sesarmid Crabs Geosesarma faustum and Geosesarma penangensis. Front. Ecol. Evol. 9:778570. doi: 10.3389/fevo.2021.778570 Gene Arrangement and Adaptive Evolution in the Mitochondrial Genomes of Terrestrial Sesarmid Crabs Geosesarma faustum and Geosesarma penangensis


INTRODUCTION
Decapoda order (Crustacea and Malacostraca) is the most diverse and species-rich taxonomy group comprising many of the well-known invertebrates such as shrimps, prawns, crayfishes, lobsters, and true crabs (Shen et al., 2013). The true crabs belong to Brachyura, the largest infraorder within Decapoda with approximately 7,200 described species and some of which are economically important (De Grave et al., 2009;Ahyong et al., 2011). Brachyura crabs have a diverse range of forms and adaptations, making them one of the most important groups to study in terms of evolution (Castro et al., 2015). Brachyura encompasses Podotremata, Heterotremata, and Thoracotremata, the latter two of which are collectively referred to as Eubrachyura (de Saint Laurent, 1980;Ahyong et al., 2007). Within Eubrachyura, the family Sesarmidae includes terrestrial, semiterrestrial, or treeclimbing species occurring mainly in the mangroves (Cumberlidge et al., 2005;Serrano-Sánchez et al., 2016). Sesarmid crabs, due to their role in nutrient cycling, have been considered keystone species in the mangrove ecosystem (Smith et al., 1991). Multiple transition routes have been proposed for this family, which comprises members that have colonized land via both marine and freshwater environments (Watson-Zink, 2021).
The genus Geosesarma De Man, 1892, belonging to the Sesarmidae family, is represented by 67 species at present. The genus is distributed in Malaysia, Indonesia, Philippines, Thailand, Papua New Guinea, and the Andamans (Ng et al., 2008;Ng and Wowor, 2019;Shy and Ng, 2019;Naruse and Ng, 2020). Geosesarma species are commonly known as vampire crabs due to the bright yellow eyes of some species (Ng et al., 2015). Geosesarma species are amphibious or terrestrial, and they sometimes live far from water (Hartnoll, 1988). As many of the species display abbreviated or no larval development and low fecundity, Geosesarma species often have restricted distributions and a high degree of endemism (Schubart and Ng, 2014). Geosesarma faustum and Geosesarma penangensis are found on Penang Hill, Malaysia, with G. faustum associated with phytotelms and living at altitudes greater than 700 m, whereas G. penangensis at lower altitudes (Ng, 2017). Geosesarma species are small crabs with carapace widths that rarely exceed 10 mm (Hartnoll, 1988). Their small size, high level of endemism, and cryptic nature account for the scarcity of information on the genus.
The animal mitogenomes are circular double-stranded molecules of 15-18 kb in size and generally contain 37 genes such as 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (12S rRNA and 16S rRNA) genes, and an A + T-rich control region (Boore, 1999). Crustacean mitogenomes exhibit exceptions to this organization with instances of duplications of tRNA gene and rearrangements of the gene order (Segawa and Aotsuka, 2005;Basso et al., 2017). Due to its smaller effective population size and lack of recombination, mitochondrial DNA (mtDNA) is more strongly influenced by evolutionary processes than nuclear DNA and has thus been used to analyze genetic diversity and adaptive evolution (Shen et al., 2019;Peng et al., 2021). The high level of endemism exhibited by Geosesarma crabs makes them an index for the study of adaptation to the local environment. Nevertheless, genomic and genetic resources are lacking for the genus Geosesarma. Previous studies have assigned Geosesarma to Sesarmidae based on morphology (Ng et al., 2008;Ng, 2017;Ng and Wowor, 2019;Shy and Ng, 2019); nevertheless, no genetic analysis has been performed to show the position of Geosesarma within Sesarmidae. In this study, the complete mitogenomes of G. faustum and G. penangensis were analyzed and compared with other Brachyura mitogenomes. The results illustrate the taxonomic placement of Geosesarma within Brachyura, gene arrangement of Geosesarma, and adaptive evolution in mitochondrial genes related to adaptation to hill habitat.

Sample Collection and Mitogenome Sequencing
Adult specimens of G. faustum and G. penangensis were collected from Penang Hill, Malaysia (5.4085 • N, 100.2773 • E) in April 2021. Voucher specimens of G. faustum and G. penangensis were deposited at the Zoological Reference Museum Collection, Universiti Sains Malaysia. Total DNA was extracted from muscle tissues of three individuals in total for each species using the DNeasy Blood and Tissue Kit (QIAGEN, Germany) according to the instructions of the manufacturer and pooled together. DNA quality was assessed using gel electrophoresis, and DNA concentration was quantified using the dsDNA HS Assay Kit on the Qubit Fluorometer (Thermo Fisher, United States). Sequencing libraries were constructed using the TruSeq PCR-Free DNA Library Kit (Illumina, United States) and sequenced on an Illumina NovaSeq 6000 platform. Paired-end 150-bp sequencing yielded approximately 5.8 Gb data for each species.

Mitogenome Assembly, Annotation, and Comparative Analyses
Raw reads were assembled by baiting and iterative mapping to the reference mitogenome sequence of Sesarmops sinensis (NC_030196) using MITObim (v1.9.1) (Hahn et al., 2013) and MIRA (v4.0.2). Annotations were performed with the MITOS pipeline (Bernt et al., 2013) using the invertebrate mitochondrial genetic code. The tRNA genes were annotated, and their secondary structures were inferred using the MiTFi module in the MITOS. Secondary structures of the predicted tRNAs were visualized on the Forna web server. Tandem Repeats Finder was used to detect repeat sequences in the control region, and their secondary structures were predicted with Mfold software. The base composition and the relative synonymous codon usage were determined using MEGA X. The skew in the nucleotide composition was calculated according to the formulas as follows: AT-skew = (A-T)/(A + T) and GCskew = (G-C)/(G + C). Circular mitogenomes representations were drawn using CGView online server. The ratio of nonsynonymous to synonymous substitutions (ω = dN/dS) was estimated using the codon-based maximum likelihood (ML) method in PAML (v4.9) (Yang and Nielsen, 2000). The branchsite test was applied with G. faustum or G. penangensis as the foreground branch and the rest of the Sesarmidae as the background branch. A likelihood ratio test was used to compare model A against the null model A. The posterior probability of the positively selected sites was computed by the Bayes Empirical Bayes method.

Phylogenetic Analyses
The phylogenetic analyses were performed using 13 PCG sequences from 76 species covering representative Brachyura families, and Clibanarius infraspinatus was used as the outgroup (Supplementary Table 1). The nucleotide sequences of the PCGs were aligned using MAFFT (v7.313) in auto mode and then concatenated. The ambiguous sequences or poorly aligned regions in the alignment were removed using Gblocks (v0.91b) with the default setting. Potential saturation in the nucleic acid sequences was assessed using DAMBE (v7.3.5). None of the tests yielded the index of substitution saturation (Iss) greater than the critical one (Iss < Iss.cSym or Iss.cAsym) except for the third codon position in asymmetrical topology (Iss > Iss.cAsym) (Supplementary Table 2). The best-fitting evolutionary model was determined using ModelFinder (v2.1.1) based on the Akaike Information Criterion. The ML tree was built in IQ-Tree (v1.6.8) (Nguyen et al., 2015) with 1,000 bootstrap replicates and the GTR + F + I + G4 model. The Bayesian inference (BI) was performed with MrBayes (v3.2.7) (Ronquist et al., 2012) using the GTR + I + G model. Markov Chain Monte Carlo was run for 10 million generations with four independent chains. Sampling was performed every 1,000 generations, and the initial 25% of the generations were discarded as burn-in. The convergence of the runs was assessed using Tracer (v1.7.2). The phylogenetic trees were visualized using FigTree (v1.4.4) software 1 .

Genome Structure, Organization, and Composition
In this study, the complete mitochondrial genome sequences of the vampire crabs G. faustum and G. penangensis were determined. The mitogenomes of G. faustum and G. penangensis are 15,880-and 15,955-bp long (Figure 1A), respectively, which fall within the range of other Sesarmidae mitogenomes from 15,611 bp (Parasesarma pictum) to 15,920 bp (Chiromantes neglectum) (Supplementary Table 3). The mitogenomes follow a typical metazoan mtDNA structure comprising 37 genes (13 PCGs, 22 tRNAs, and 2 RNAs) and a major non-coding region known as the control region (Supplementary Table 4). As a result of the compactness of mitogenomes, there are a total of 45-to 71bp overlap between genes, and the longest overlap occurs between trnL1 and large rRNA (rrnL) (25-55 bp). The mitogenomes also contain 331-to 372-bp intergenic sequences distributed in 20 regions ranging from 1 to 187 bp in size. The whole mitogenomes of Geosesarma are biased toward AT nucleotides (78.44-78.49%) (Supplementary Table 5), showing significant strand asymmetry in a similar manner to other Sesarmidae mitogenomes. The overall AT-and GC-skews for the mitogenomes were also negative, indicating higher occurrences of T than A and C than G. The AT-and GC-skews for all sequenced Sesarmidae mitogenomes were calculated and compared (Supplementary Table 3). The GC-skews for all Sesarmidae are negative, ranging from -0.194 (P. pictum) to -0.232 (Metopaulias depressus), and the AT-skews are weakly negative, ranging from -0.010 (Chiromantes dehaani and C. neglectum) to -0.032 (P. pictum).
The relative synonymous codon usage for 13 PCGs was analyzed (Supplementary Figure 1). Both the mitogenomes of G. faustum and G. penangensis consist of 3,716 codons (Supplementary Figure 2). In the 13 PCGs, the most frequently utilized amino acids were Leu (12.94%), Ile (10.09-10.15%), and Phe (9.58-9.85%). The usages of both the twofold and fourfold degenerate codons are biased toward codons abundant in A or T. This amino acid bias trend is consistent with the higher representation of nucleotides A and T in the whole mitogenome.
Similar to other Brachyura mtDNAs, G. faustum and G. penangensis mitogenomes contain a complete set of 22 tRNA genes ranging in size from 63 bp (trnF) to 73 bp (trnV). The majority of tRNAs fold into a typical cloverleaf secondary structure, although trnD, trnH, and trnR (G. faustum) have truncated dihydrouridine (DHU) loop compared with normal tRNA (Supplementary Figure 3). The trnS also lacks the thymidine-pseudouridine-cytidine (T C) stem and loop, whereas trnC (G. penangensis) has reduced the DHU stem. Aberrant tRNA structures in mitogenomes appear to be a common phenomenon in many species (Li et al., 2020;; previous research on nematodes and metazoans suggests that such features have no effect on tRNA function (Watanabe et al., 2014;Fourdrilis et al., 2018).
The rrnL and small rRNA (rrnS) genes of G. faustum and G. penangensis are 1,364-1,394 bp and 825-844 bp in length, respectively, which both are located between trnL1 and control region, and separated by trnV. The control region located between rrnS and trnQ spans 712-785 bp and has functional importance related to the initiation of replication and transcription of the mitogenome (Clayton, 1992). This region exhibits high A + T content (82.56-83.69%) and contains features such as tandem repeat, microsatellite sequence (TA) n , and stem-loop structure (Supplementary Figure 4).

Selection Analyses
To analyze the selective pressure on G. faustum and G. penangensis compared with other Sesarmidae species, the non-synonymous/synonymous substitution ratios of the mitochondrial PCGs were estimated under the branch-site test ( Figure 1B). The analysis detected signals of positive selection in cob of G. faustum, as well as nad5 and atp6 of G. penangensis at seven codon sites. The different positively selected sites detected in the two Geosesarma species could be attributed to differences in the mechanisms used by crabs endemic to different locations. Previous studies have found evidence for positive selection in mitochondrial genes that may play a role in adaptive evolution in organisms inhabiting extreme environments or with requirements for extreme metabolic demands (Li et al., 2018;Sun et al., 2018;Shen et al., 2019). In Chinese snubnosed monkeys and galliform birds living in high-altitude environments, the selections on mitochondrial genes are driven by the requirements for higher energy demand and oxygen metabolism (Yu et al., 2011;Zhou et al., 2014). In brachyurans, a positive selection of mitochondrial genes has only been reported in a deep-sea representative . We showed here, for the first time, the occurrence of positive selection detected in a land crab, which suggests a relationship between altered metabolic requirements and high-altitude colonization. Terrestrial brachyurans display morphological, behavioral, and physiological adaptations to sustain prolonged periods from the water source (Lim and Goh, 2021). Among these, osmotic and ionic regulation, waste excretion, aerial respiration, and thermoregulation are pivotal changes (Watson-Zink, 2021). The positive selection of PGCs in the two Geosesarma species showed the importance of adaptation in respiratory physiology for survival at higher altitudes. The differences in positive selection sites and genes between G. faustum and G. penangensis, which occupy different altitude niches, warrant further investigation.

Gene Rearrangement
Compared to the hypothetical ancestral pancrustacean gene order (Boore et al., 1998), it is evident that the Geosesarma experienced mitochondrial gene rearrangement (Figure 1C). Frontiers in Ecology and Evolution | www.frontiersin.org Typically, trnH located between nad4 and nad5 in the pancrustacean ground pattern had been shifted into the trnE and trnF junction in the Geosesarma mitogenomes. This translocation is a relatively common event in the brachyuran mitogenomes Wang Q. et al., 2020). In the most common brachyuran gene order, the tRNA gene arrangement between the control region and nad2 is trnI-trnQ-trnM. Nevertheless, G. faustum and all Sesarmidae mitogenomes (except G. penangensis) shared a transposition of trnQ that resulted in the trnQ-trnI-trnM order. The tandem duplicationrandom loss model is one of the most widely used mechanisms to explain mitochondrial gene arrangement (Moritz and Brown, 1987;Macey et al., 1997). In this case, tandem duplication in the trnF-nad5-trnH region, followed by loss of supernumerary genes, is the most likely process for forming trnH-trnF-nad5 gene order. Meanwhile, the trnQ-trnI-trnM arrangement could also be caused by slipped-strand mispairing followed by gene deletion. Despite belonging to the Geosesarma genus, G. penangensis differs from G. faustum and the rest of the Sesarmidae by showing trnI-trnQ-trnM. This feature raises the possibility of a secondary independent reversal to the plesiomorphic brachyuran gene arrangement. Such reversion is extremely rare; endemism in Geosesarma species may have promoted the evolutionary transition. Brachyura families such as Varunidae, Macropthalmidae, Xenograpsidae, Majidae, Potamidae, and Parathelphusidae also showed signatures of gene arrangement with respect to the ancestral pancrustacean pattern.
Sesarmidae species are most closely related to Gecacinidae, constituting part of the Grapsoidea group, which is in accordance with the results of the studies by Chen et al. (2019), Wang et al. (2019), Lu et al. (2020), . The mitochondrial phylogenomic analyses including our dataset have shown the divergent phylogenetic status of Grapsidae and Dotilidae (Wang et al., 2019;Li et al., 2020;. In the ML tree, Grapsidae and Dotilidae form a sister group and constitute the base of the (((((Sesarmidae) + Gecarcinidae) + Xenograpsidae) + Ocypodidae) + (Varunidae + (Macropthalmidae + Mictyridae))) clade. Nevertheless, the BI tree placed Grapsidae and Dotilidae group at the basal to the ((((Sesarmidae) + Gecarcinidae) + Xenograpsidae) + Ocypodidae) clade. While ML tree recovered Dynomenidae + [Homolidae + (Raninidae)] relationship that was similar to those reported in the studies by Lu et al. (2020) and , BI tree exhibited Homolidae + [Dynomenidae + (Raninidae)] topology. The presence of only one or a small number of representatives for certain families could have resulted in unstable phylogenetic relationships.
In this study, the complete mitogenomes of G. faustum and G. penangensis were characterized and compared with that of other Brachyura. The phylogenetic analyses confirmed that Geosesarma belongs to Grapsoidea, Sesarmidae. Evidence of positive selection was detected in Geosesarma mitochondrial genes, highlighting adaptive evolution in crabs for the colonization of terrestrial hill habitat. Unlike G. faustum and all other Sesarmidae mitogenomes, G. penangensis has a trnI-trnQ-trnM gene arrangement, suggesting a reversal to the plesiomorphic brachyuran pattern. Importantly, our study provides genomic resources that are pivotal to understand the biology of endemic Geosesarma crabs.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI (accession: MZ725940, MZ725941, and SRR15514280-1).

AUTHOR CONTRIBUTIONS
ACS-C and N-SL conceived and designed the experiments. N-SL analyzed the data and wrote the manuscript. K-KS performed the experiments. AA, K-AS, and AWAZ collected and identified the samples. All authors reviewed the manuscript.