Comparative Genomics Studies on the dmrt Gene Family in Fish

Doublesex and mab-3-related transcription factor (dmrt) genes are widely distributed across various biological groups and play critical roles in sex determination and neural development. Here, we applied bioinformatics methods to examine cross-species changes in the dmrt family members and the evolutionary relationships of the dmrt genes based on the genomes of 17 fish species. All the examined fish species have dmrt1–5, whereas only five species contain dmrt6. Most fish harbor two dmrt2 paralogs (dmrt2a and dmrt2b), with dmrt2b being unique to fish. In the phylogenetic tree, 147 DMRT sequences are categorized into eight groups (DMRT1–DMRT8), which cluster into three main groups. Selective pressure analysis indicated purifying selection on the dmrt1–3 genes and the dmrt1–3–2(2a) gene cluster. Similar genomic conservation patterns of the dmrt1–dmrt3–dmrt2(2a) gene cluster with 20-kb upstream/downstream regions were observed in fish with various sex-determination systems, except for three regions with remarkable diversity. Synteny analysis revealed that dmrt1, dmrt2a, dmrt2b, and dmrt3–5 were relatively conserved in fish during evolution, whereas dmrt6 was lost in most species. The high conservation of the dmrt1–dmrt3–dmrt2(2a) gene cluster in various fish genomes suggests crucial biological functions, while the variation in dmrt family members and sequences across fish species suggests different biological roles during evolution. This study provides a molecular basis for fish dmrt functional analysis and may serve as a reference for in-depth phylogenomics.


INTRODUCTION
Doublesex and Mab-3-related transcription factor (dmrt) genes are homologs of Doublesex (Dsx) in Drosophila melanogaster and Male abnormal 3 (Mab-3) in Caenorhabditis elegans, both of which play important roles in sex determination (Burtis and Baker, 1989; Zhu et al., 2000; Zarkower, 2001). In recent years, a large number of dmrt family genes have been identified in organisms ranging from lower invertebrates to higher vertebrates, including corals, nematodes, fruit flies, frogs, fish, birds, and mammals, some of which have been confirmed to be related to sex differentiation (Hodgkin, 2002). Currently, in addition to Dsx and Mab-3, the dmrt family in vertebrates includes nine dmrt genes (dmrt1-8 and dmrt2b) that share common characteristics with Dsx and Mab-3. Almost all of the encoded polypeptide chains contain a conserved DNA-binding motif, known as the Doublesex and Mab-3 (DM) domain, which is composed of six conserved cysteines and two histidines (locus 1 of CCHC and locus 2 of HCCC). The two loci form two highly intertwined zinc-finger-like DNA-binding regions that can bind to the minor groove of DNA. Notably, this domain is highly conserved among organisms of different evolutionary types (Erdman and Burtis, 1993; Zhu et al., 2000).
Fish dmrt genes were first discovered in Nile tilapia (Oreochromis niloticus) and rainbow trout (Oncorhynchus mykiss) (Guan et al., 2000; Marchand et al., 2000). Genes of the dmrt family have now been identified in more than 30 fish species. Seven dmrt genes have been found in fish, including dmrt1-6 and dmrt2b. DMRT1 plays an important role in sex differentiation and testicular development (Matson and Zarkower, 2012); an exception is DM-W, a W-linked DMRT1 paralog in Xenopus laevis, which plays the opposite role in primary ovary development (Yoshimoto et al., 2010). In humans, DMRT1 is expressed specifically in the embryonic genital ridge and adult testes of males, and is related to the expression of sex-determining genes and the differentiation of primordial germ cells (Raymond et al., 1998; Moniot et al., 2000; Matson et al., 2011). Similarly, studies on more than 20 fish species have determined that fish dmrt1 expression is related to male development regardless of the sex determination mechanism (Kobayashi et al., 2004, 2008; Johnsen et al., 2010), indicating that dmrt1 plays a key role in male germ cell self-renewal and differentiation, testicular development, and spermatogenesis in fish (Herpin and Schartl, 2011; Lin et al., 2017). Furthermore, in the medaka Oryzias latipes, the Y-specific gene dmy, a copy of autosomal dmrt1, is the master sex-determining gene that induces male development.
The genomes of amphibians, reptiles, birds, and mammals contain only a single dmrt2 gene, whereas fish harbor two dmrt2 genes (dmrt2a and dmrt2b) (Liu et al., 2009; Su et al., 2015; Lyu et al., 2019). DMRT2 is widely distributed in the tissues of mammals and fish, and is expressed in both testes and ovaries (Kim et al., 2003; Winkler et al., 2004; El-Mogharbel et al., 2007). However, the function of DMRT2 has not been conserved during the evolution of species (Meng et al., 1999; Seo et al., 2006). For example, mouse DMRT2 is mainly involved in somite differentiation, in particular the patterning of the axial skeleton (Lourenco et al., 2010). In zebrafish, both dmrt2a and dmrt2b are involved in somite development: dmrt2a is necessary for symmetric somite formation and fast muscle differentiation (Saude et al., 2005; Lu et al., 2017), whereas dmrt2b regulates asymmetric organ positioning via the Hedgehog signaling pathway and is therefore related to branchial arch and slow muscle development (Li et al., 2018). This indicates that dmrt2a and dmrt2b differ in expression and function in fish.
Mammalian DMRT3 is highly expressed in the testis but not in the ovary; hence, it may be related to testicular differentiation and development (Hong et al., 2007). In mice, DMRT3 is also expressed in numerous non-gonadal tissues such as the embryonic forebrain and olfactory placode, in addition to spinal cord neurons, and thus it may be involved in neuronal specification (Smith et al., 2002; Kim et al., 2003; Andersson et al., 2012). Fish dmrt3 is highly expressed in the testis and nervous system, and has accordingly been speculated to play a role in the development of nerves and germ cells (Yamaguchi et al., 2006; Li et al., 2008; Dong et al., 2010).
The mouse dmrt4 gene is expressed in the testis and ovary, in addition to various other tissues (Kim et al., 2003). It can regulate the formation and development of ovarian follicles (Balciuniene et al., 2006). In Xenopus, DMRT4 is involved in the regulation of neurogenesis in the olfactory system (Huang et al., 2005b). In some fish species, the expression of dmrt4 in the ovary is significantly higher than that in the testis (Guan et al., 2000; Su et al., 2013; Wang, 2013); in other species, its expression is significantly higher in the testis than in the ovary (Kondo et al., 2002; Dong and Chen, 2013; Sheng et al., 2014), whereas yet other species show high expression in both organs (Yamaguchi et al., 2006). In addition, dmrt4 is also expressed in the spleen (Yamaguchi et al., 2006; Sheng et al., 2014), kidney (Kondo et al., 2002; Sheng et al., 2014), gills (Kondo et al., 2002; Wang, 2013), and brain (Dong and Chen, 2013) in fish. Hence, it has been speculated to be related to immune and nervous system development.
Mouse DMRT5 is mainly expressed in brain tissue and is necessary for the early embryonic development of the cerebral cortex (Veith et al., 2006a; Konno et al., 2012). As a novel neurogenic factor, DMRT5, together with DMRT3, controls hippocampal development and neocortical area map formation (Muralidharan et al., 2017; De Clercq et al., 2018). Fish dmrt5 is highly expressed primarily in the brain but can also be found in the gonads, eyes, and pituitary gland (Guo et al., 2004; Veith et al., 2006a; Yamaguchi et al., 2006; Gu et al., 2019). Furthermore, dmrt5 plays a key role in zebrafish neurogenesis in the telencephalon (Yoshizawa et al., 2011) and can regulate corticotrope and gonadotrope differentiation in the pituitary (Graf et al., 2015), in addition to spermatogenesis (Xu et al., 2013).
Mammalian DMRT6 is mainly expressed in gonadal intermediate cells and differentiating spermatogonia. It plays a crucial role in coordinating the transition of primordial germ cells from the mitotic to the meiotic developmental program during spermatogenesis (Zhang X. et al., 2014) and is also expressed in the embryonic brain of mice (Kim et al., 2003). Early studies suggested that the dmrt6 gene is missing in fish (Veith et al., 2006b). However, recent studies have found that certain fish, such as coelacanth, tilapia, and southern catfish, also carry the dmrt6 gene, and that tilapia dmrt6 is involved in spermatogenesis (Forconi et al., 2013; Zhang X. et al., 2014). In contrast, DMRT7 and DMRT8 are present only in mammals. The two genes are very similar, although DMRT8 does not have a complete DM domain. DMRT7 is specifically expressed in the male and female gonads and is related to mouse gonadal development and spermatogenesis (Kawamata and Nishimori, 2006; Hong et al., 2007). In comparison, DMRT8 is highly expressed in the male gonads and may have evolved from DMRT7 (Ottolenghi et al., 2002; Veith et al., 2006a).
Currently, phylogenetic analyses are available only for pan-arthropod and pan-metazoan DMRT family members (Volff et al., 2003; Wexler et al., 2014; Panara et al., 2019); to our knowledge, no studies have yet been published on the phylogeny of the fish dmrt family. However, fish comprise a wide variety of species, and previous reports have shown that members of the fish dmrt family have unique features, such as two paralogs of dmrt2 (dmrt2a and dmrt2b) and diverse tissue expression of the same gene family member in various fish (e.g., dmrt4), thus suggesting a remarkable difference in function. As the sequences of DMRT family members are highly variable, with only the DM domain [∼49 amino acids (aa)] exhibiting high sequence homology (Volff et al., 2003), it is difficult to accurately determine the evolutionary relationships among the family members based on such short sequences, which in turn has limited our understanding of the history of DMRT functional development.
Nevertheless, in recent years the whole-genome sequencing of many fish species has significantly facilitated the in-depth and systematic analysis on the evolutionary relationships among gene family members. In this study, we therefore employed the fine genomic map of largemouth bass recently obtained using third-generation sequencing by our team and collected the dmrt sequences of 16 fish species with different taxonomic positions from published whole-genome sequences, in order to analyze the sequence structure, phylogenetic relationship, sequence conservation, and synteny of members of the fish dmrt family. These findings will lay a solid foundation for a more systematic understanding of the structural characteristics of these members in fish dmrt family, and for further investigations into the different functions of fish dmrt family members in sex determination or differentiation along with their underlying mechanisms.

Sequence Collection
In the present study, we employed two strategies to collect nucleotide or deduced amino acid sequences for dmrt family members in various vertebrates (Supplementary Table S1). For those with publicly available sequences, such as in human (Homo sapiens) and mouse (Mus musculus), we downloaded the sequences from NCBI or Ensembl (Supplementary Table S2). Other dmrt sequences were extracted from corresponding genome databases through BLAST (Altschul et al., 1990) and Genewise (Birney et al., 2004).
In brief, we used zebrafish (Danio rerio), Japanese medaka (Oryzias latipes), and mouse DMRT protein sequences from NCBI as the references, and mapped them onto the examined genomes using tBLASTn with an E-value < 1e-5 and an alignment rate > 0.6. Solar v0.9.6 was applied to connect high-identity segment pairs. Subsequently, we discarded low-quality results with an alignment rate < 0.6 and a mapping identity < 0.5. Finally, each gene sequence was predicted on the target genomic region using Exonerate v2.2.0 (Slater and Birney, 2005), and extended 5 kb in the upstream and downstream directions to obtain the integrated gene model. A total of 147 dmrt sequences were derived from 23 representative vertebrate species, including 2 mammals (human and mouse), 2 birds (chicken Gallus gallus and zebra finch Taeniopygia guttata), 1 reptile (green anole Anolis carolinensis), 1 amphibian (Western clawed frog Xenopus tropicalis), and 17 fish species belonging to two classes (Actinopterygii and Sarcopterygii), and ten superorders (
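The hit-filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Hit` record, its field names, and the definition of alignment rate as aligned query residues divided by query length are assumptions made for illustration, and a hit is assumed to be discarded if it fails either the alignment-rate or the identity cutoff.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one tBLASTn hit (field names are assumptions)."""
    query: str
    target: str
    evalue: float
    aligned_len: int   # number of query residues covered by the alignment
    query_len: int     # full length of the query protein
    identity: float    # fraction of identical positions, 0..1


def alignment_rate(hit):
    """Assumed definition: aligned query residues / query length."""
    return hit.aligned_len / hit.query_len


def keep_hit(hit, max_evalue=1e-5, min_rate=0.6, min_identity=0.5):
    """Apply the E-value, alignment-rate, and identity cutoffs quoted in the text."""
    return (
        hit.evalue < max_evalue
        and alignment_rate(hit) >= min_rate
        and hit.identity >= min_identity
    )


def filter_hits(hits):
    """Return only the hits that pass all three cutoffs, preserving order."""
    return [h for h in hits if keep_hit(h)]
```

Calling `filter_hits` on a list of parsed hits returns the subset that would be passed on to gene-model prediction; the cutoffs are exposed as keyword arguments so that they can be varied.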

Sequence Alignment and Phylogenetic Analysis
We performed phylogenetic analysis on the collected dmrt sequences. MAFFT v7.273 (Katoh et al., 2002) was employed to align these sequences. Gblocks was used to find conserved fragments with the following parameter settings: minimum number of sequences for a conserved/flank position (75/75), maximum number of contiguous non-conserved positions (50), minimum length of a block (50), allowed gap positions (all). ProtTest v3.42 was used to determine the best-fit model of amino acid replacement (Darriba et al., 2011). Based on the Akaike Information Criterion (AIC), we selected "JTT+I+G+F" as the best-fit model. Finally, we utilized PhyML 3.0, MrBayes v3.24, and MEGA v7.0 to analyze these sequences; MrBayes was run with Ngen set to 1,000,000 generations and Samplefreq set to 100 (Ronquist et al., 2012). Branch support values were calculated using Bayesian posterior probabilities. Evolview (He et al., 2016) was applied to edit the constructed phylogenetic trees.
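For readers unfamiliar with AIC-based model selection, the criterion can be sketched in a few lines of Python. This is a generic illustration of the standard formula AIC = 2k − 2 ln L, not a reimplementation of ProtTest; the function names and the input format (a dictionary mapping a model name to its log-likelihood and number of free parameters) are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Pick the model with the lowest AIC.

    candidates: dict mapping model name -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

A model with a higher likelihood is preferred only if the gain outweighs the penalty of two units per additional free parameter.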
Identification of Conserved Synteny for the dmrt1-dmrt3-dmrt2(2a) Gene Cluster (Synteny Analysis)
To evaluate the conservation of the dmrt1-dmrt3-dmrt2(2a) gene cluster, we explored conserved genes in the upstream and downstream regions (20 kb) within the genomes of 19 examined species, using the zebrafish genomic sequence as the reference, since the zebrafish genome is currently the highest-quality fish genome assembly with the most complete annotation. The examined genome assemblies were searched using tBLASTn (Altschul et al., 1990), the best-fit results were selected using a Perl script, and the results were drawn with Adobe Illustrator.

Analysis of Regulatory Regions and Cross-Species Comparisons of the
Complete genomic sequences with 20 kb-upstream/downstream regions of the dmrt1-dmrt3-dmrt2(2a) gene cluster were extracted from various species. We applied mVISTA (Frazer et al., 2004) to align these relevant genomic sequences. This tool can align and compare long sequences based on the window-based comparisons of sequence conservation.
Repetitive elements were annotated using RepeatMasker v4.06 software (Chen, 2004), and the zebrafish genomic sequence was used as the reference. Pair-wise sequence comparisons were determined with a threshold of 70% identity in each 50-bp window. In addition, five typical regulatory elements, including BRE, CAAT box, E box, GC box, and TATA box, were predicted in each sequence using a Perl script (the motif function in Primer 5.0 and Genomatix MatInspector). Finally, Adobe Illustrator and R were applied to produce graphs for the information obtained.
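The window-based conservation criterion (at least 70% identity in a 50-bp window) can be illustrated with the following Python sketch. It is not the mVISTA implementation: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, counts a gap as a mismatch, and uses hypothetical function names.

```python
def window_identity(ref, other, window=50, step=1):
    """Return (start, identity) for each window over two pre-aligned sequences.

    Both sequences must have the same aligned length; '-' marks a gap and
    never counts as a match.
    """
    ref, other = ref.upper(), other.upper()
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window], other[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        scores.append((start, matches / window))
    return scores


def conserved_windows(ref, other, window=50, step=1, threshold=0.70):
    """Start positions of windows whose identity reaches the threshold."""
    return [
        start
        for start, ident in window_identity(ref, other, window, step)
        if ident >= threshold
    ]
```

With the default arguments, `conserved_windows` reports every 50-bp window that would be called conserved under the 70% cutoff; adjacent windows could then be merged into conserved regions.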

Cross-Species Changes in dmrt Family Members and Copy Numbers
A total of 147 dmrt sequences were derived from 23 representative vertebrate species (Table 1 and Supplementary Tables S1, S2). Among them, 128 dmrt sequences for 17 species were downloaded from the NCBI/Ensembl databases (asterisks in Table 1 and accession numbers in Supplementary Table S2). The remaining 19 dmrt sequences for three species were extracted from genomes through the method described in section "Sequence Collection." These nucleotide sequences and the corresponding deduced protein sequences were used for further data analysis. In mammals, eight dmrt genes (dmrt1-dmrt8) were identified. In the other species, however, dmrt7 and dmrt8 were absent. In addition, dmrt4 was lost in birds. In the fish dmrt gene family, dmrt1-dmrt5 showed relatively high conservation. Among these, dmrt2 consisted of two paralogs (dmrt2a and dmrt2b) in most fish species, with only three species (Atlantic cod, Japanese eel, and coelacanth) carrying a single paralog. dmrt6 was found in only five fish species, i.e., largemouth bass, Asian sea bass, channel catfish, spotted gar, and African coelacanth. In addition, some of the dmrt genes were duplicated in Atlantic salmon (dmrt2, 3, 5) and Japanese eel (dmrt3; see Table 1).

Table 1 (table body not recovered; column headings include Class and Superorder). Asterisks indicate the dmrt genes were downloaded from the NCBI/Ensembl databases.
Frontiers in Genetics | www.frontiersin.org

Structural Characterization and Evolutionary Analysis of the dmrt Family Genes
The dmrt1 gene is composed of five exons in all examined species except for Atlantic salmon (Table 2 and Figure 1A), and a highly conserved DM domain (49 aa in total) is located in the DMRT1 protein. In comparison, dmrt2 contains three exons and dmrt3-dmrt4 contain two exons in most examined species. Dmrt5 consists of two to four exons in higher vertebrates but only two in all examined fish species except for stickleback (Table 2). Dmrt6 contains four exons in higher vertebrates, whereas the number of exons in fish varies greatly (from two to four). dmrt7 and dmrt8 can only be identified in mammals, and both contain a large number of exons (8 for dmrt7, 6-7 for dmrt8). Except for DMRT8, all DMRT proteins (DMRT1-7) contain a conserved DM domain, which is usually encoded in the first exon of each gene (Figure 1A).
Using the DMRTA protein sequence of the sea anemone (Nematostella vectensis) as the out-group, we constructed a protein-based phylogenetic tree (Figure 1B), in which the DMRT family is distinctly categorized into eight groups (DMRT1-DMRT8). All DMRT proteins are distributed in the following three main groups. Group 1 includes five subfamilies, i.e., DMRT1, 2, 6, 7, and 8. DMRT2 was placed as the sister of DMRT7/8 and DMRT1 as the sister of DMRT6, suggesting a closer evolutionary relationship among these subfamilies. The subfamilies DMRT2, 7, and 8 were together placed as a sister group to the DMRT1 and 6 subfamilies. Group 2 includes DMRT4 and DMRT5. Group 3 contains only one subfamily, DMRT3 (see more details in Figure 1B).
Mammals have lost a larger number of genes next to this cluster, as have two fish species (Atlantic salmon and coelacanth). Furthermore, in some fish species, such as largemouth bass, the dapk1 gene experienced a polyploidization event that generated four tandem duplicated copies. A similar phenomenon was observed in the southern platyfish, which harbors two copies of the ctsla gene in its genome. Moreover, in both largemouth bass and southern platyfish, the gas1a gene also experienced a translocation and inversion event (see more details in Figure 2B).

Substitution Rates (Ka/Ks) of the dmrt1-dmrt3-dmrt2(2a) Cluster in Fish Genomes
Ka/Ks represents the ratio of non-synonymous substitutions (Ka) to synonymous substitutions (Ks). This ratio can be used to determine whether there is selective pressure on a given protein-coding gene. It is generally believed that synonymous mutations are not subjected to natural selection, whereas non-synonymous mutations are. Ka/Ks > 1 implies positive selection; Ka/Ks = 1 suggests neutral evolution; and Ka/Ks < 1 indicates purifying selection.
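The interpretation rule above can be written as a short Python sketch. The function names and the optional tolerance around 1 are assumptions added for illustration; estimating Ka and Ks themselves requires a codon-substitution model and is outside the scope of this sketch.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


def classify_selection(ratio, tol=0.0):
    """Map a Ka/Ks ratio to the selection regime described in the text.

    tol is an assumed tolerance: ratios within 1 +/- tol are called neutral.
    """
    if ratio > 1 + tol:
        return "positive selection"
    if ratio < 1 - tol:
        return "purifying selection"
    return "neutral evolution"
```

For example, any gene with a ratio below 1 is classified as being under purifying selection, and smaller ratios indicate stronger constraint.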
Overall, a similar conservation pattern was observed in both coding and non-coding sequences. Comparisons of these six fish species with zebrafish showed considerable homology within and between these dmrt genes. We also identified three regions with remarkable diversity among these fish (lower panels in Figure 4A). Region 1, which exists only in Chinese tongue sole, covers 207 bp located about 11 kb upstream of the dmrt1 gene and contains nine TATA boxes (63 bp). Region 2 is located in the third exon of dmrt1, where 18 bp are missing in Chinese tongue sole. Comparing the protein sequences of tongue sole and other fish species, we determined that six amino acids (-P/S-A/S/T/P-YY-S/G/N-N-) were missing (Figure 4B). Region 3, located in the second exon of dmrt3, shows a 21-nucleotide (nt) deletion in Japanese medaka and a 15-nt deletion in southern platyfish (Figure 4B). In the six examined species, the TATA box is the main regulatory element. TATA boxes are much more frequent in Chinese tongue sole than in other fish, whereas markedly fewer E boxes, GC boxes, and B recognition elements (BREs) are present in Japanese medaka than in other species (Figure 4C).
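Counting regulatory elements of this kind can be sketched with regular expressions in Python. The consensus patterns below are commonly cited textbook forms and are assumptions; they are not necessarily the matrices used by the Perl script, Primer 5.0, or MatInspector in this study. The sketch scans the forward strand only and counts non-overlapping matches.

```python
import re

# Assumed consensus patterns (IUPAC codes expanded to character classes).
MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",   # TATAWAW, 7 bp
    "CAAT box": "CCAAT",
    "E box": "CA[ACGT]{2}TG",      # CANNTG
    "GC box": "GGGCGG",
    "BRE": "[CG][CG][AG]CGCC",     # SSRCGCC
}


def count_motifs(seq, motifs=MOTIFS):
    """Count non-overlapping forward-strand matches of each motif in seq."""
    seq = seq.upper()
    return {name: len(re.findall(pattern, seq)) for name, pattern in motifs.items()}
```

Under the assumed 7-bp TATA consensus, nine contiguous matches would span 63 bp, which agrees with the length quoted for the TATA boxes in Region 1.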

Synteny of Other dmrt Genes [Excluding dmrt1, dmrt2(2a), and dmrt3] in Fish Genomes
Based on the whole-genome sequences of largemouth bass and eight other representative vertebrate species (including O. niloticus, T. rubripes, O. latipes, I. punctatus, L. oculatus, A. carolinensis, and H. sapiens) obtained from NCBI, we performed a synteny analysis of four dmrt genes: dmrt2b, dmrt4, dmrt5, and dmrt6. The results (Figure 5) indicated that among these fish species, the KN motif and ankyrin repeat domain-containing protein 4 (kank4) and low-density lipoprotein receptor-related protein 8 (lrp8) genes upstream of dmrt2b were conserved. The ELAV-like protein 2 (elavl2) and caspase activity and apoptosis inhibitor 1 (caap1) genes downstream of dmrt4, and the elavl4 and FAS-associated factor 1 (faf1) genes downstream of dmrt5, were also conserved, which is consistent with the findings in reptiles and humans. This suggests that dmrt2b, dmrt4, and dmrt5 were relatively conserved during the evolutionary process.
Although dmrt6 was lost in most fish species, including T. rubripes and O. latipes, lrp8 was present upstream of dmrt6 in L. oculatus and I. punctatus, which is consistent with higher vertebrates, whereas in M. salmoides and O. niloticus, dmrt6 was located between the conserved plectin (plec) and epiplakin-F-box/LRR-repeat protein 6 (eppk1-fbxl6) genes (see more details in Figure 5).

DISCUSSION
Fish are the oldest and most diverse group among vertebrates, containing about 32,000 species and accounting for more than half of all vertebrate species. Fish have undergone a long history of emergence, development, and evolution. The increasing amount of fish genomic information provides an important resource for studying the evolution, structure, and function of key genes through comparative genomics analysis. Seven dmrt genes have been identified in fish to date, including dmrt1-6 and dmrt2b. dmrt genes have also been reported in more than 30 fish species, and a number of functional studies have revealed that, regardless of the sex determination mechanism, the majority of fish dmrt genes (Table 3) are related to sexual development (Li et al., 2018; Liu et al., 2009; Herpin and Schartl, 2011; Yoshizawa et al., 2011; Xu et al., 2013). However, the phylogenetics of the dmrt gene family in fish have not yet been reported.
To obtain a better understanding of the functional diversification of this gene family, we examined dmrt gene complements from the whole genome sequences of 17 representative fish species representing 10 superorders and several non-fish outgroups. The evolutionary relationships of the dmrt genes in fish were subsequently examined using both phylogenetic and synteny analyses. Zhou et al. (2008) showed that, unlike mammals and other groups that harbor only one dmrt2, zebrafish carries a second paralog of dmrt2(2a), dmrt2b, which was subsequently identified in many other fish species (Liu et al., 2009; Su et al., 2015; Lyu et al., 2019). The 17 representative fish species analyzed in the present study belong to Actinopterygii, with the exception of the coelacanth L. chalumnae, which belongs to Sarcopterygii. Among the 16 actinopterygians, 14 harbored the two paralogs of dmrt2 (dmrt2a and dmrt2b); however, dmrt6, which is commonly found
Among the 17 fish species, only A. japonica and G. morhua carried dmrt2a alone and lacked dmrt6. L. chalumnae had only one dmrt2 (2a) and one dmrt6, similar to higher vertebrates. A search through the database revealed that two other sarcopterygians (Protopterus annectens and Latimeria menadoensis) also carried only one dmrt2a and dmrt6 (see more details in Supplementary Table S2; Forconi et al., 2013; Biscotti et al., 2018). Actinopterygii and Sarcopterygii are two relatively independent evolutionary branches of fish. Sarcopterygii is a side branch in the evolution of fish, from which tetrapods evolved (Nelson et al., 2016). Therefore, the characteristics of the dmrt family genes in Sarcopterygii are more similar to those of higher vertebrates.
Based on the cross-species comparisons of dmrt family genes and copy numbers, we found that some of the dmrt genes were duplicated in S. salar and A. japonica (S. salar: dmrt2, 3, 5; A. japonica: dmrt1-3; see Table 1). Lien et al. (2016) suggested that S. salar is a typical tetraploid teleost that experienced a salmonid-specific genome duplication. Copies of dmrt genes were duplicated in its genome, whereas one copy each of dmrt1 and dmrt4 was lost (Lien et al., 2016). Loss of the duplicated genes possibly resulted from the salmonid-specific genome duplication event, which may lead to rearrangements of genome sequences, as S. salar has lost numerous syntenic genes in comparison with other teleosts. Similar dmrt duplication and loss were also found in four other fish species (e.g., brown trout Salmo trutta and sockeye salmon Oncorhynchus nerka) that belong to the same superorder as S. salar (i.e., Protacanthopterygii; Supplementary Table S5). In addition, the copy number of dmrt1 and dmrt3 is doubled in A. japonica, which is thought to be related to the uncommon ploidy (2n = 38) of this teleost (Nomura et al., 2004).
The conservation of fish dmrt1 and dmrt3-5 sequences is relatively high; all of them contain the highly conserved DM domain and have a stable number of exons (the majority of dmrt1 genes contain five exons and most dmrt3-5 genes have two exons). Phylogenetic analysis showed that dmrt4 and dmrt5 clustered into a major branch, indicating that these genes appear to have originated from a common ancestral dmrt gene.
To date, the dmrt7 and dmrt8 genes have been found only in mammals and not in fish. In fact, they exist in all mammals, from the lower Monotremata in Prototheria (platypus) (Tsend-Ayush et al., 2009) to Marsupialia in Metatheria (wombat), and to the higher Euarchontoglires in Eutheria (mouse) (Veith et al., 2006a), thus indicating that both genes formed only after the evolutionary divergence of mammals from other vertebrates, including fish (Veith et al., 2006a).

Similarities and Variances of the dmrt1-dmrt3-dmrt2(2a) Gene Cluster in Various Fish Genomes
In vertebrate genomes, the dmrt1, dmrt2(2a), and dmrt3 genes are arranged in tandem in the order dmrt1-dmrt3-dmrt2(2a) (Johnsen and Andersen, 2012). Our phylogenetic analysis based on this dmrt1-dmrt3-dmrt2(2a) cluster confirmed the clustering of fish within the same superorder, thus indicating that the dmrt1-dmrt3-dmrt2(2a) gene cluster is highly conserved in various fish species (Figure 2). Further analysis of the conserved genes flanking this cluster revealed that D. rerio carried 11 neighboring genes, as did I. punctatus and C. harengus. However, other fish species showed partial loss (such as the gc gene), duplication (dapk1 in M. salmoides and ctsla in X. maculatus), and translocation and inversion (gas1a in M. salmoides and X. maculatus). This may have been caused by genomic polyploidization events during the evolutionary process of fish (Braasch and Postlethwait, 2012). Despite the large variations in the flanking genes among different fish species, the number and location of the dmrt1-dmrt3-dmrt2(2a) genes have been stable. Thus, the high conservation of the dmrt1-dmrt3-dmrt2(2a) gene cluster in various fish genomes suggests that it has crucial biological functions in fish.
Among the fish genomes analyzed in this study, the Ka/Ks ratios of the dmrt1-dmrt3-dmrt2(2a) gene cluster and the three dmrt genes were less than 0.2, implying that after the examined actinopterygians diverged from L. oculatus, the dmrt1-dmrt3-dmrt2(2a) gene cluster was subjected to relatively strong purifying selection, whereas positive selection may have occurred prior to the divergence from L. oculatus. These low Ka/Ks ratios across various fish species indicate that the dmrt1-dmrt3-dmrt2(2a) genes have been highly conserved during evolution. A non-synonymous substitution would alter the conformation and function of the corresponding protein, thereby affecting an individual's sex differentiation, which in turn would affect the inheritance of the mutation by its offspring (Wang D. et al., 2009). Therefore, the high conservation of the dmrt1/2/3 genes across fish suggests a key role in sex differentiation.
Analysis of the conserved sequences and regulatory elements was performed on the dmrt1-dmrt3-dmrt2(2a) gene cluster of three representative fish genomes with different sex determination systems [i.e., C. semilaevis (ZW), O. latipes (XY) (Otake et al., 2006), and X. maculatus (WXY) (Schultheis et al., 2009)]. Three distinct regions (Regions 1-3) were identified (Figure 4A). In C. semilaevis, a 207-bp sequence containing nine TATA boxes was present only in Region 1, and an 18-bp deletion was found in Region 2. O. latipes and X. maculatus showed 21- and 15-bp deletions, respectively, in Region 3. Analysis of the regulatory elements for this gene cluster indicated that the number of TATA boxes in C. semilaevis was higher than that in other fish species (twice that of O. latipes), whereas O. latipes had significantly more E box, GC box, and BRE elements than other fish species. Fish with various sex-determination systems showed significant differences in their conserved sequences and regulatory elements, suggesting that the dmrt1-dmrt3-dmrt2(2a) gene cluster may be related to the sex-determination systems in fish. Our recent study revealed that M. salmoides has an XY/XX sex-determination system (Sun et al., 2020). In the conserved sequence analysis of the fish dmrt1-dmrt3-dmrt2(2a) gene clusters, Regions 1-3 differed substantially between M. salmoides and C. semilaevis (ZW/ZZ). Region 2 of M. salmoides was more similar to that of O. latipes (XX/XY), and Region 3 of M. salmoides was similar to that of D. labrax (PSD) and L. calcarifer (hermaphrodite). These observations are consistent with an XY/XX sex-determination system in M. salmoides.

Conserved Synteny of the dmrt Genes in Fish Genomes
The synteny analysis performed in this study showed that, apart from dmrt6, the other six genes in the fish dmrt gene family (dmrt1, dmrt2a, dmrt2b, and dmrt3-5) were relatively conserved. The fish dmrt1-dmrt3-dmrt2(2a) cluster is arranged in tandem in the genome, which is consistent with higher vertebrates. Fish dmrt4 is usually located on a different chromosome from the cluster, and the downstream elavl2 gene is conserved. In contrast, in higher vertebrates dmrt4 is located on the same chromosome as the dmrt1-dmrt3-dmrt2(2a) cluster, although the downstream elavl2 gene is likewise conserved. The synteny of fish dmrt5 is the same as that in higher vertebrates, with the upstream and downstream elavl4 and faf1 genes also conserved. dmrt2b is found in most fish species, and the upstream kank4 and lrp8 genes are conserved. The dmrt6 gene is lost in most fish genomes. In L. oculatus and I. punctatus, dmrt6 is conserved together with downstream lrp8, which is consistent with higher vertebrates, whereas in M. salmoides and O. niloticus dmrt6 is located between the conserved plec and eppk1-fbxl6 genes (see Figure 5). Kondo et al. (2002) reported the first conserved synteny analysis of the dmrt1-4 genes between fish and human. That study demonstrated that dmrt1, dmrt2, and dmrt3 form a cluster in fish and belong to a larger group of genes that exhibit conserved synteny between human and fish. Johnsen and Andersen (2012) performed chromosomal synteny analysis on dmrt2a and dmrt2b and proposed that these genes originated from the second round (2R) of whole-genome duplication of the ancestral dmrt2. In turn, Mawaribuchi et al. (2019) performed phylogenetic cluster analysis of dmrt genes from lower bilaterians and higher animals, based on which they speculated that dmrt3 emerged by the first round (1R) of genome duplication and that dmrt1 and dmrt6 emerged after the 2R genome duplication; they also proposed an evolutionary history for the dmrt family genes in Bilateria. Therefore, based on our data together with this literature, we propose a hypothetical evolutionary history of the dmrt genes in fish (Figure 6).
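As a simple illustration of what a conserved-synteny check involves, the sketch below tests whether a block of neighbouring genes occurs contiguously, in either orientation, in an ordered gene list for each species. The function name `has_conserved_block`, the species labels, and the gene orders are hypothetical and serve only to demonstrate the idea; they are not the annotations or the method used in this study.

```python
def has_conserved_block(gene_order, block):
    """Return True if `block` occurs as a contiguous run in `gene_order`,
    in either orientation (the block may lie on the reverse strand)."""
    n = len(block)
    reversed_block = block[::-1]
    for i in range(len(gene_order) - n + 1):
        window = gene_order[i:i + n]
        if window == block or window == reversed_block:
            return True
    return False


# HYPOTHETICAL gene orders for illustration only; not real annotations.
toy_genomes = {
    "species_A": ["geneX", "kank1", "dmrt1", "dmrt3", "dmrt2a", "geneY"],
    "species_B": ["geneY", "dmrt2a", "dmrt3", "dmrt1", "kank1"],
    "species_C": ["kank1", "dmrt1", "geneZ", "dmrt3", "dmrt2a"],
}
block = ["kank1", "dmrt1", "dmrt3", "dmrt2a"]

conserved = {
    species: has_conserved_block(order, block)
    for species, order in toy_genomes.items()
}
print(conserved)
```

This exact-match test is deliberately strict: a single intervening gene, as in the third toy species, breaks the block. Practical synteny tools usually tolerate small insertions, losses, and rearrangements.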
We should note that our present study has several limitations. First, although fish are the most numerous vertebrates on earth, whole-genome sequences are currently available for only a small fraction of fish species. In this study, 17 representative fish species from 10 superorders were selected for analysis; however, the number and coverage of species were still insufficient, which may have limited the generalizability of our results. Second, this study was based on fish species with known genome sequences, which may have affected the accuracy of the data analysis because genome assembly techniques and quality vary considerably among species. For example, because coelacanth genomes are not assembled to the chromosome level, our synteny analysis of their genes is affected. Third, it is expected that as sequencing coverage and quality increase for fish genomes, future studies will be able to confirm and extend the findings of the present study and improve their generalizability.
FIGURE 6 | Hypothetical evolutionary history of the dmrt genes in fish through fish-specific (3R) genome duplications. This figure was constructed based on figures in this study. The common ancestor of chordates might have possessed four ancestral genes, including dmrt4/5, dmrt2a/2b, dmrt93B, and dmrt1/6. A common ancestor of Vertebrata may have possessed four dmrt family genes, dmrt1/6, dmrt2a/2b, dmrt3, and dmrt4/5. The syntenies of kank1-dmrt1-dmrt3-dmrt2a, dmrt4-elavl2-caap1, kank4-lrp8-dmrt2b, and dmrt5-elavl4 are conserved after three rounds of whole genome duplication in the ancestral vertebrates. dmrt6 is lost in most fish species.
In this study, we applied bioinformatics methods to perform phylogenetic and synteny analyses on dmrt genes in 17 fish species. (1) All the examined fish species have dmrt1-5, and most harbor two dmrt2 paralogs (dmrt2a and dmrt2b). The phylogeny and structure of the dmrt1-5 and dmrt2b genes were relatively conserved in most fish, whereas dmrt6 is lost in most fish genomes and is less conserved. (2) Purifying selection on dmrt1, dmrt2(2a), dmrt3, and the dmrt1-dmrt3-dmrt2(2a) gene cluster was observed. (3) Fish with various sex-determination systems have similar genomic conservation patterns in the dmrt1-dmrt3-dmrt2(2a) gene cluster. dmrt2b, dmrt4, and dmrt5 were also relatively conserved during evolution. The high conservation of the dmrt1-dmrt3-dmrt2(2a) gene cluster in various fish genomes suggests crucial biological functions, while the variation in dmrt family members and sequences across fish species suggests different biological roles during evolution.
Furthermore, we hypothesized an evolutionary history of the dmrt genes in fish after fish-specific genome duplication(s). Moreover, a series of new questions arose during our data analysis. For example, in terms of evolutionary analysis, is dmrt2b homologous and functionally similar to a specific dmrt gene in higher animals, and does fish dmrt6 have functions similar to those of its mammalian counterpart? We anticipate that these gene trees will help to place current dmrt research in a proper phylogenomic context. Our present study provides a molecular basis for functional research on the fish dmrt family and may in particular serve as a genetic reference for in-depth phylogenomic studies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
XY and QS conceived and designed the project and revised the manuscript. JD and JL performed the genomic investigations and wrote the manuscript. JH, CFS, YT, NY, CXS, XS, and SY participated in discussion and figure preparation. All authors read and approved the final manuscript.