Improved Understanding of the Role of Gene and Genome Duplications in Chordate Evolution With New Genome and Transcriptome Sequences

Comparative approaches to understanding chordate genomes have uncovered a significant role for gene duplications, including whole genome duplications (WGDs), giving rise to and expanding gene families. In developmental biology, gene families created and expanded by both tandem and WGDs are paramount. These genes, often involved in transcription and signalling, are candidates for underpinning major evolutionary transitions because they are particularly prone to retention and subfunctionalisation, neofunctionalisation, or specialisation following duplication. Under the subfunctionalisation model, duplication lays the foundation for the diversification of paralogues, especially in the context of gene regulation. Tandemly duplicated paralogues reside in the same regulatory environment, which may constrain them and result in a gene cluster with closely linked but subtly different expression patterns and functions. Ohnologues (WGD paralogues) often diversify by partitioning their expression domains between retained paralogues, amidst the many changes in the genome during rediploidisation, including chromosomal rearrangements and extensive gene losses. The patterns of these retentions and losses are still not fully understood, nor is the full extent of the impact of gene duplication on chordate evolution. The growing number of sequencing projects, genomic resources, transcriptomics, and improvements to genome assemblies for diverse chordates from non-model and under-sampled lineages like the coelacanth, as well as key lineages, such as amphioxus and lamprey, has allowed more informative comparisons within developmental gene families as well as revealing the extent of conserved synteny across whole genomes. This influx of data provides the tools necessary for phylogenetically informed comparative genomics, which will bring us closer to understanding the evolution of chordate body plan diversity and the changes underpinning the origin and diversification of vertebrates.


INTRODUCTION
The genetic basis of development is of fundamental interest to understanding animal evolution. In the course of the evolution of chordates, gene and genome duplications have played important roles in shaping gene families by providing the means with which genetic novelties can arise (Ohno, 1970). Whole genome duplications (WGDs) have occurred at the base of speciose and diverse lineages, including the vertebrate two rounds of whole genome duplication (2R WGD) (Meyer and Schartl, 1999;Dehal and Boore, 2005), or the teleost-specific 3R WGD (Hoegg et al., 2004;Meyer and Van de Peer, 2005), and have been implicated in the evolution of more complex, novel, or diverse clades. The 2R WGD duplicated the genome of the vertebrate ancestor twice, such that many genes were retained in as many as four copies, though the majority of paralogues were lost (Dehal and Boore, 2005). The paralogues that were retained in duplicate, ohnologues, are of particular importance to understanding chordate evolution because they may be candidate genes underpinning the evolutionary transition to the vertebrates, and many of these genes retained in duplicate have roles in development (Brunet et al., 2006). While the 2R WGD certainly affected the evolution and subsequent radiation of the vertebrates, duplications also occur frequently on much smaller scales than polyploidisation, and small-scale duplications of developmental genes have also played a role in chordate evolution. Our current understanding of how gene and genome duplications have shaped chordate lineages has progressed rapidly due to the advancements in comparative genomics and with the availability of new genome and transcriptome sequences from phylogenetically informative lineages like amphioxus, lampreys, coelacanth, and monotremes.

ADDRESSING THE ORIGIN OF VERTEBRATES
Within the chordates, there are three extant subphyla, the cephalochordates, the tunicates/urochordates, and vertebrates, the vertebrates being the most familiar and speciose of the three, which in turn includes cyclostomes (lampreys and hagfish), actinopterygians (ray-finned fish), amphibians, sauropsids (birds and reptiles), and mammals (Figure 1). The three chordate lineages share common traits including a hollow neural tube located dorsal to a notochord, pharyngeal slits, and a post-anal tail, but the vertebrate lineage is far more speciose and diverse than either invertebrate chordate subphylum. The field of developmental biology has spent decades determining the genetic basis for the construction of the morphologies of a select few model species, including a handful of vertebrates (like mouse, frog, chick, and zebrafish), but in our current age of genomics, our understanding is growing at an unprecedented pace, with much broader taxon-sampling now making possible a more "phylogenetically informed" and hence evolutionarily robust era of comparative biology and evolutionary developmental biology.

Invertebrate Chordate Models
One of the major questions facing evolutionary developmental biology is the origin of the vertebrates. For a comparison, sister to the vertebrates is the tunicate lineage, the two lineages together making up the clade Olfactores (Delsuc et al., 2006). Tunicates generally have two life stages, comprising the tadpole larval stage and the sessile filter-feeding adult in which chordate-like larval traits such as the notochord, tail, and head are absent or highly reorganised (Lemaire, 2011). Tunicate models such as Ciona spp. have been used for decades, representing a growing basis of comparative genomics and transcriptomics studies in chordate developmental biology. Tunicate embryos have been used as a model for the chordate nervous system, as well as for key chordate traits such as the notochord and the precursors of the vertebrate neural crest cells (Schubert et al., 2006;Kugler et al., 2011;Holland, 2016). Neural crest cells are characteristic of vertebrates and constitute a population which migrates and differentiates to contribute to the vertebrate peripheral nervous system, pigmented cells, and elements of the skeletal system, thus the origin of these cells is of particular importance to understanding the invertebrate-vertebrate transition (Gans and Northcutt, 1983). While invertebrate chordates are considered to lack a true neural crest, they do have cells with some similar properties. These include migratory cell types emerging from the neural plate boundary and differentiating into neurons or pigment cells but, critically, the skeletogenic role is lacking from these tunicate cells (Jeffery et al., 2004;Jeffery, 2006;Stolfi et al., 2015). 
Tunicates have also been instrumental in understanding the control of gene expression because of their relatively simple body plan and a small number of transcription factors compared to vertebrate models, and they have greater extents of homology to vertebrates than other comparisons outside of the phylum Chordata (Lemaire, 2011). There is a growing database of genomic resources for ascidians (ANISEED), which in recent years has acquired datasets focussing on gene expression, including RNA-seq and epigenetic datasets, as well as many new genome annotations (Brozovic et al., 2018), reflecting the utility of tunicate models for molecular biology. The most recent update to this database includes the addition of a larvacean, Oikopleura dioica, and the expansion of tools to explore gene expression data in larvae (Dardaillon et al., 2019).
Tunicate genomes, while small, evolve very quickly and have undergone extensive rearrangements relative to the other chordate subphyla (Denoeud et al., 2010;Holland, 2016). At first, this made genomic comparisons between tunicates and vertebrates confusing, and more than a fifth of Ciona robusta (previously C. intestinalis type A) genes had no clear homology to known bilaterian genes when the genome was first sequenced (Cañestro et al., 2003). Gene families typically conserved in clusters in the genomes of other organisms are often disorganised in tunicate genomes, e.g., the Hox cluster (Sekigami et al., 2017). There is also an issue with long-branch attraction in molecular phylogenies with the highly divergent C. robusta sequences, but the sequencing of a second clade of tunicates, the larvacean O. dioica to complement the ascidian C. robusta, helped resolve this (Delsuc et al., 2006). Now genomes are available for several different tunicate lineages, including a chromosomal level assembly of O. dioica (Bliznina et al., 2021). Sequencing of two C. intestinalis Type B (i.e., C. intestinalis rather than C. robusta) genomes revealed that between and even within Ciona species, there have been many chromosomal inversions and there are high levels of polymorphism (Satou et al., 2021). Because of their fast rates of evolution, between tunicate and vertebrate genomes there is little conserved synteny remaining, making inferences about the genomic changes at the origin of the vertebrates difficult from these comparisons.
FIGURE 1 | Diversity of chordates and available genome assemblies. (A) Cladogram of chordate lineages and count of representative genome assemblies available on NCBI (https://www.ncbi.nlm.nih.gov/assembly/organism; Accessed 25 April 2021). Branches with single species mentioned in the text end with a line, but lineages with multiple species end with a triangle, e.g., there are ∼35 amphioxus species (Bertrand and Escriva, 2011). Stars represent WGDs. NCBI Taxonomic codes used to count genomes for each lineage: Therians (taxid:32525); Sauropsids (taxid:8457); Amphibians (taxid:8292); Salmonids (taxid:8015); Teleosts (taxid:32443); Cartilaginous fishes (taxid:7777); Tunicates (taxid:7712); Cephalochordata (taxid:7735). (B) Genome assemblies at scaffold (red) and chromosome (purple) level available on NCBI for animal subgroups (https://www.ncbi.nlm.nih.gov/genome/browse#!/eukaryotes/; Accessed 25th April 2021), right axis, relative to approximate number of species (grey), left axis. Table inset is approximate percentage of species in each group that has been sequenced at least to the draft level.

2R and Amphioxus
It is now widely understood that 2R WGD occurred in the ancestors leading to vertebrates, and these events have been implicated as playing a permissive role in the evolution of vertebrate novelties. Still, the specifics of how the 2R WGD provided the genetic basis for these traits are not yet fully understood. Much of the evidence comes from comparative studies with amphioxus, the basal chordate lineage. Amphioxus is considered the "archetypal" chordate: both its genome and morphology are thought to have changed little in the nearly 600 million years since it diverged, in contrast to the fast-evolving tunicates (Putnam et al., 2008;Zhang et al., 2018), making it an ideal model for inferring features of the chordate ancestor (Dehal and Boore, 2005;Schubert et al., 2006;Bertrand and Escriva, 2011). While amphioxus lacks a true neural crest, as do tunicates (Holland and Holland, 2001), the draft genome of Branchiostoma floridae contains most pro-orthologues of the genes that constitute the regulatory network required for vertebrate neural crest formation (Yu et al., 2008). This suggests that the ancestral chordate may have possessed the components of a neural crest rudiment, but it was the 2R WGD at the origin of the vertebrates that may have facilitated the evolution of the true neural crest.
With the sequencing of an amphioxus genome came the most solid evidence for the 2R WGD, revealing the prevalent one-to-four syntenic relationship between the scaffolds of B. floridae and the human genome (Putnam et al., 2008). This amphioxus genome was highly polymorphic, so each haplotype was assembled separately, which complicated these analyses. More recently, the application of newer techniques like long-read sequencing and Hi-C has allowed the genome to be assembled to the chromosome level, revealing the depth of synteny conserved amongst chordate lineages (Simakov et al., 2020). Advancements in long-read sequencing reduce the assembly errors that arise in highly polymorphic genomes, where alleles can be mistaken for paralogues and false gene gains inferred (Denton et al., 2014). With this chromosome-level assembly, the authors were able to precisely determine the sequence and timing of polyploidisations, and the chromosomal rearrangements that occurred after each one in the process of rediploidisation characterised by chromosomal fusions and rampant gene losses (Simakov et al., 2020).

Cyclostomes: the Agnathan Lampreys and Hagfish
This recent study also took a comparative approach to determine the timing of 2R WGD in vertebrate ancestors, with particular focus on the placement of the divergence of the cyclostome lineage. The cyclostomes (lampreys and hagfish) diverged from the gnathostomes 500 Mya, and provide a phylogenetically well-placed point of comparison to the jawed vertebrates (Janvier, 2007). They share a cartilaginous skeleton including a cranium, paired sensory organs, and true neural crest, but lack key gnathostome structures like a jaw and paired fins (Green and Bronner, 2014). An initial lamprey genome assembly from a sample of liver tissue was later updated with a germline assembly, since hundreds of genes are eliminated from somatic tissues, though both assemblies revealed a high number of rearrangements and repeats (Smith et al., 2013, 2018). These complicated genomic characteristics have hindered the genome assembly and contributed to uncertainty about the number of duplications in the cyclostome lineage, and how many, if any, are shared with the gnathostomes.

The Timing of 2R
One potential marker of genome duplications is the Hox cluster. This cluster of developmental transcription factor genes is highly conserved across animals, so that comparison of the single cluster in amphioxus to the four in mammals gave an early line of evidence for 2R (Garcia-Fernàndez and Holland, 1994) and was complemented by the discovery of the teleost-specific 3R and the seven (i.e., eight minus one) Hox clusters in zebrafish (Amores, 1998). Lampreys were reported to have at least six Hox clusters, and the updated germline assembly of Petromyzon marinus revealed the synteny of these regions shared with other vertebrates' Hox chromosomes (Mehta et al., 2013;Smith et al., 2018). Amongst these six, two pairs appear to be the product of more recent, chromosome-level duplications, suggesting there may be a high prevalence of large-scale duplications occurring more recently besides the WGDs in lampreys. Smith and Keinath (2015) had previously constructed the lamprey meiotic map using RAD-seq, and with these syntenic comparisons, concluded that fewer chromosomal rearrangements were required if lampreys diverged after 1R WGD, but before 2R.
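The arithmetic behind using Hox cluster number as a duplication marker can be sketched as follows. This is a minimal illustration only, assuming a single ancestral cluster and that each WGD doubles every cluster before any losses occur; the function names are hypothetical, and the cluster counts are those quoted in the paragraph above.

```python
def max_hox_clusters(wgd_rounds, ancestral_clusters=1):
    """Upper bound on cluster number if each WGD doubles every cluster and none are lost."""
    return ancestral_clusters * 2 ** wgd_rounds


def clusters_lost(observed, wgd_rounds, ancestral_clusters=1):
    """Clusters missing relative to the no-loss expectation.

    A negative value means more clusters are observed than the assumed
    number of WGD rounds can explain (i.e., additional duplications).
    """
    return max_hox_clusters(wgd_rounds, ancestral_clusters) - observed


# Counts quoted in the text: lineage -> (observed clusters, assumed WGD rounds)
lineages = {
    "amphioxus": (1, 0),
    "mammals": (4, 2),
    "zebrafish": (7, 3),
}

for name, (observed, rounds) in lineages.items():
    print(name, max_hox_clusters(rounds), clusters_lost(observed, rounds))
```

Under these assumptions, six lamprey clusters exceed both the 1R expectation of two and the 2R expectation of four, which is why cluster counts alone cannot settle the timing of cyclostome divergence.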
These data are intriguing, and indeed several different scenarios of duplications could result in the six Hox clusters, among other unexpected or unique patterns of synteny. Lampreys could have undergone the 2R WGD and several independent chromosomal duplications and losses or diverged after 1R then undergone their own WGD or several chromosome-level duplications (Figure 2). If lampreys and gnathostomes share 2R, the lack of direct orthology between lamprey Hox clusters and those of gnathostomes can be explained by cyclostome divergence during rediploidisation, before the ancestral gnathostome karyotype was established. This pattern, where orthology is obscured because lineages retained different paralogues when they diverged, termed tetralogy, has been described for early-branching 3R teleosts, and may apply here as well (Martin and Holland, 2014). Many of the paralogues in lamprey are younger than 2R ohnologues, suggesting some independent duplications have occurred more recently than either of the 2R WGD (Mehta et al., 2013). Some phylogenies suggest that cyclostomes and gnathostomes share both WGDs (Kuraku et al., 2009), while others support independent duplications (Force et al., 2002) or that cyclostomes diverged before either WGD (Fried et al., 2003), though these early conclusions may have been misled by long-branch attraction and hidden paralogy due to the divergent sequences and independent gene gains and losses in cyclostome genomes. Furthermore, with only one lamprey genome assembled as of 2018, the rigour of any of these conclusions was tenuous (Holland and Ocampo Daza, 2018). Because of this, the extent to which lamprey duplications may be shared with the other cyclostome lineage or if one or both of the 2R WGD occurred in the vertebrate ancestor has been contentious. Previous synteny evidence suggested that all vertebrates share at least 1R, and likely the 2R as well (Holland and Ocampo Daza, 2018;Sacerdot et al., 2018). 
However, with the new chromosome-level assembly of B. floridae, the distinct chromosomal fusion events that occurred between the first and second round of WGD could be discerned. Simakov et al. (2020) provide evidence that lampreys diverged after only 1R, as certain genomic rearrangements predating the 2R are unique to jawed vertebrates and lacking from lampreys, in particular the gnathostome-specific CLGE-CLGO fusion, and the observation that lamprey syntenic blocks are generally unmixed. This new assembly improves upon the extensive synteny work aimed at determining the origin of chordate karyotypes from previous studies (Nakatani et al., 2007;Putnam et al., 2008;Sacerdot et al., 2018) and provides the latest working hypothesis for the timing of the 2R WGD and their impact on the evolution of vertebrates.
FIGURE 2 | Scenarios of the timing of the 2R WGD relative to the divergence of cyclostomes and the Hox cluster as a marker of WGDs. Pre-duplicate chordate phyla with single Hox clusters are coloured to represent the ancestral state (purple lines). The reorganised Hox cluster in tunicates is represented by a dashed line. (A) The current consensus that cyclostomes diverged from the gnathostomes after 1R (black star) but before 2R (grey star) and underwent some independent chromosome-level duplications (black crosses) (Smith et al., 2018;Simakov et al., 2020). Remnants of 1R paralogy (red versus blue) may distinguish Hox clusters in the cyclostomes. (B) Alternate hypothesis that cyclostomes diverged before 1R and only underwent several chromosome-level duplications (Fried et al., 2003;Furlong et al., 2007) (or perhaps an independent WGD). (C) Alternate hypothesis that cyclostomes diverged after 2R and underwent few independent chromosome-level duplications (Caputo Barucchi et al., 2013;Sacerdot et al., 2018). Remnants of 1R paralogy and 2R paralogy (dark red versus pink, dark blue versus light blue) may distinguish Hox clusters in the cyclostomes. Cyclostome Hox cluster numbering and colouring does not represent actual homology [(Smith et al., 2018) found homology between lamprey Hox clusters i. and iv., and ii. and v.]; no conclusive affinities have been found between cyclostome Hox clusters and any particular gnathostome Hox cluster.

Cyclostome Peculiarities
Genomes of amphioxus and lamprey have been instrumental in understanding the origin of the vertebrates as these lineages are phylogenetic outgroups to larger chordate lineages. While amphioxus is remarkably conserved, making inferences about the chordate ancestor fairly straightforward, the lamprey genome is much more complicated. These complications, however, are interesting themselves, especially the programmed genomic rearrangements (PGRs) in the somatic tissues, presumably an adaptation reflecting the genetic conflict between germline and soma (Smith et al., 2012;Timoshevskiy et al., 2017). The hagfish genome also undergoes PGR (Nakai et al., 1991, 1995), and further resolution of genomes in this cyclostome clade should be instrumental in deciphering the mechanism of PGR. In addition, comparisons to determine homology of structures and their genetic underpinnings depend on reliable gene annotations. For instance, while amphioxus lacks myelin, lampreys were reported to have it (Smith et al., 2013), though lamprey neurons are not myelinated. In fact, the genes annotated as myelin in lamprey are likely to be in another gene family altogether (Werner, 2013). This highlights the pitfalls of automated gene annotations and the need for careful gene curation and annotation with a focus on phylogenetically informed and robust gene nomenclature.
More recently, the genome sequence of a Reissner's lamprey (Lethenteron reissneri) was assembled to higher completeness than any previous lamprey assemblies, aided by the combinatorial use of short- and long-read sequencing, and Hi-C. The protein-coding genes of this genome were annotated using a wider and more phylogenetically informed array of species, including several genomes only recently sequenced from both invertebrates and vertebrates, revealing a total of seven Hox clusters (Figure 2). This new assembly, as well as the improved sea lamprey genome (Smith et al., 2018) show that the independent duplications indicated by the seven Hox clusters are likely synapomorphic to at least the lampreys, but maybe to the cyclostomes as a whole. Comparisons of lamprey and hagfish Hox clusters revealed that cyclostomes share the loss of Hox genes from paralogy group 12 and likely independent duplications, as hagfish may have six or more Hox clusters as well (Pascual-Anaya et al., 2018). These data are taking us closer to finally determining the cyclostomes' placement in relation to the 2R WGD. The sequencing of multiple cyclostomes, particularly from hagfishes, would build on current data in order to enable further understanding of the role 2R WGD played in chordate evolution, and particularly, the evolution of vertebrate- or gnathostome-specific developmental novelties.

DUPLICATIONS IN THE HOX CLUSTER
As mentioned above, the Hox gene cluster has been a fruitful system for determining how gene duplication has affected animal development. Not only have multiple ohnologous clusters been maintained following WGDs, including 2R as well as the teleost-specific 3R and others (Hoegg and Meyer, 2005), but the cluster (and others) arose via extensive tandem duplications of a single ANTP-class transcription factor gene early in animal evolution (Holland, 2013;Ferrier, 2016). Furthermore, there tends to be a link between the expression of the Hox genes and their clustered arrangement, as genes at one end of the cluster are expressed anteriorly, and expression follows successively toward the other end of the cluster, where those genes are expressed in the posterior of the embryo (Duboule and Dollé, 1989;Graham et al., 1989;Duboule, 2007;Tschopp et al., 2009). This spatial collinearity is frequently observed alongside temporal collinearity, where the activation of the genes follows the same sequence of early to late as anterior to posterior in many animals (Duboule, 1994;Kmita and Duboule, 2003;Iimura and Pourquié, 2006;Monteiro and Ferrier, 2006;Gaunt, 2018;Krumlauf, 2018;Ferrier, 2019). Hox genes are transcription factors, and their expression in the developing embryo is strictly regulated so as to provide positional identity along the anterior-posterior axis of the animal. Changes to the expression of Hox genes therefore change rostro-caudal identity, most obviously manifested in vertebrates in the formation of the axial skeleton (Casaca et al., 2014). Between different vertebrate groups, differential expression of Hox genes drives the different identities of vertebral segments, illustrating how the Hox gene diversification directly underpins the diversity of vertebrate body-plans (Krumlauf, 1994;Burke et al., 1995). 
The Hox cluster perfectly illustrates the interplay between gene regulation, expression, duplication, and subfunctionalisation, and the effects of these processes on development and animal evolution.

Hox Clusters Through WGDs
Other studies have turned to the Hox cluster in order to understand the impact of the 3R WGD. Early teleost genome projects revealed excess Hox clusters in zebrafish, medaka, and two pufferfish (Hoegg and Meyer, 2005). In these, as above, the genome sequence and transcriptomics of an appropriate outgroup were essential. PCR screening of the bichir, a non-teleost ray-finned fish, revealed only four Hox clusters, which allowed the timing of the 3R WGD to be determined (Ledje et al., 2002). Another more basally branching fish lineage, the spotted gar, also has four Hox clusters (Braasch et al., 2016) while zebrafish has seven (Amores, 1998). As many as eight Hox clusters can be found in different teleost lineages, though some have been lost from different lineages (Sato and Nishida, 2010;Martin and Holland, 2017). Detection of the 3R WGD has led to speculation that WGDs may play a permissive role in the evolution of diverse lineages, as within the more than 30,000 actinopterygian species, only 50 are non-teleosts which did not undergo 3R (Near et al., 2012). The link of WGDs and subsequent species diversity has been posited for vertebrates as well (Cañestro et al., 2013), but evidence for this link is still lacking (Glasauer and Neuhauss, 2014).
A more recent WGD allows the study of rediploidisation as it is occurring. Salmonids have thirteen Hox clusters (Mungpakdee et al., 2008), indicative that this lineage has undergone a fourth whole genome duplication (4R WGD) more recently, around 88 Mya (Figure 1; Macqueen and Johnston, 2014;Lien et al., 2016). The Atlantic salmon (Salmo salar) genome was particularly difficult to assemble because of the repetitiveness created by the fourth more recent WGD, but its assembly enables our understanding of the processes of rediploidisation (Lien et al., 2016). Many of the paralogues created in the 4R WGD were able to be identified, even those that were recently pseudogenised (Lien et al., 2016). In another salmonid, the rainbow trout (Oncorhynchus mykiss), nearly half of the 4R ohnologues remain in duplicate (Berthelot et al., 2014). Many of the 4R ohnologue pairs in salmon consist of one gene that maintained the ancestral function, as conserved between the salmon and pike, while the second had a different expression pattern (Lien et al., 2016;Robertson et al., 2017). Genomes of 4R salmonids also enable detection of a category of genes retained following successive WGDs, which is enriched in genes involved in development and transcription factors (Berthelot et al., 2014). This was associated with changes to the ecology of salmonids, but did not directly cause rapid species diversification as was hypothesised for the 3R in teleosts, rather, it is correlated with the adaptation of anadromy (Macqueen and Johnston, 2014). These observations suggest that environmental adaptation may have had a larger impact on teleost diversity than the 3R WGD because there are significant discrepancies between the timing of 3R and the divergence of major teleost lineages (Donoghue and Purnell, 2005;Santini et al., 2009;Near et al., 2012), as well as 40-50 million years between the salmonid 4R WGD and their subsequent radiation (Macqueen and Johnston, 2014). 
The salmonid clade is still within the process of rediploidisation, and comparisons between lineages reveal that a quarter of the Atlantic salmon genome underwent rediploidisation independently of the trout, suggesting that rediploidisation occurred before and after speciation (Robertson et al., 2017). These differently resolved regions may underpin the differences in ecology amongst salmonid lineages and also help to explain the (perhaps surprising) length of time that is required post-WGD before species diversification might be observable (Robertson et al., 2017). The differences between salmon and trout karyotypes also provide support for tetralogy, the pattern of homology between genes in lineages that diverged during rediploidisation (Martin and Holland, 2014). These assemblies of salmon and trout revealed the processes of rediploidisation and provided a case study for examining the fate of recently duplicated genes in the context of a WGD.

THE FATE OF DUPLICATED GENES
Several mechanisms have been proposed for the maintenance of duplicated genes, despite the likelihood of non-functionalisation of redundant paralogues (Nakatani et al., 2007). For some gene types, increased copy number is advantageous, despite their redundancy, such as ribosomal RNAs (Zhang, 2003). Alternatively, once a dosage-sensitive network of genes is duplicated in entirety, perhaps along with the entire genome following a WGD, to preserve the balance of the network components, no loss can occur, therefore entire duplicate networks are retained (Lynch and Conery, 2000). This dosage compensation theory, however, does not result in as high an incidence of retained paralogues being related to dosage-sensitive complexes and pathways as one might expect; instead, this model might better explain a stop-gap mechanism that prevents initial loss and allows time for sub- or neo-functionalisation processes to occur (Hughes et al., 2007).
While duplicated genes are likely to be redundant initially, this redundancy presumably reduces selective constraint and allows one or both daughter genes to evolve, which explains how duplication can facilitate evolutionary novelties. Under the duplication-degeneration-complementation (DDC) model, genes undergo degenerative mutations following their duplication, resulting in the complementary partitioning of the ancestral gene's functions among the daughters (Force et al., 1999). This mechanism of subfunctionalisation is particularly relevant to developmental genes, which have complex cis-regulatory regions, since degenerative mutations between the regulatory regions of the daughter genes can quickly partition their expression profiles and therefore their functions. One step further is neofunctionalisation, which suggests that the reduced constraint on paralogues allows for novel functions to evolve in one or both of the paralogues, including new upstream transcription binding sites or even protein domains [sometimes referred to as the duplication-degeneration-innovation (DDI) model] (Jimenez-Delgado et al., 2009). The role of DDC or DDI in the evolution of developmental genes is supported by the overrepresentation of transcription factors among retained paralogues, the often complicated cis-regulatory regions these transcription factors have, and the fact that expression rather than sequence changes are the primary observed differences between different animals' developmental genes.
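The logic of complementary partitioning under the DDC model can be sketched in a few lines. This is a simplified illustration rather than a published method: it assumes each gene copy can be described by the set of ancestral regulatory elements it still retains, and the function name, category labels, and element names are all hypothetical.

```python
def classify_duplicate_fate(ancestral, copy_a, copy_b):
    """Classify a duplicate pair from the regulatory elements each copy retains.

    Simplified, hypothetical categories:
      - "nonfunctionalisation": one copy has lost every ancestral element
      - "redundant": both copies still carry the full ancestral set
      - "subfunctionalisation": each copy has lost something, but together
        they still cover the full ancestral set (the DDC outcome)
      - "other": any remaining pattern, e.g. an element lost from both
        copies, or degeneration in only one copy
    """
    ancestral = set(ancestral)
    a = set(copy_a) & ancestral  # ignore any non-ancestral elements
    b = set(copy_b) & ancestral
    if not a or not b:
        return "nonfunctionalisation"
    if a == ancestral and b == ancestral:
        return "redundant"
    if a != ancestral and b != ancestral and (a | b) == ancestral:
        return "subfunctionalisation"
    return "other"


# Hypothetical enhancer labels for an ancestral gene and its two daughters
ancestral_elements = {"eye", "brain", "pancreas"}
print(classify_duplicate_fate(ancestral_elements, {"eye", "brain"}, {"eye", "pancreas"}))
```

Neofunctionalisation is deliberately left out of this sketch, since it involves gaining elements that are absent from the ancestral set.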

Evidence of Subfunctionalisation of 3R Paralogues
Evidence supporting the role of DDC, or DDI, in chordate evolution is growing. Large RNA-seq datasets have been instrumental in detecting subfunctionalised ohnologues, for instance the reduced expression of genes retained in duplicate compared to single orthologues in mammals (Qian et al., 2010). One of the largest datasets of expression data of paralogues and their pro-orthologues in another species comes from the gar genome project. The spotted gar is a ray-finned fish whose lineage diverged from the teleosts before the 3R WGD that characterises this latter clade (Figure 1; Braasch et al., 2016). This makes it an ideal outgroup to understand the impact of the 3R WGD on teleost evolution that has been implicated in possibly permitting the huge diversity of fishes (Hoegg et al., 2004;Sato and Nishida, 2010). This model suggests that speciation occurs following WGD, but before the genomes undergo rediploidisation, so that changes to genome architecture like chromosomal fusions reinforce speciation (Taylor et al., 2001). Because it has not undergone the 3R, but still lies within Actinopterygii, the spotted gar possesses pro-orthologues of the duplicated 3R ohnologues. Thus, expression of gar genes is expected to be similar to that of the ancestral gene. Between the gar and teleost transcriptomes, each of the two teleost 3R ohnologues were expressed at lower levels than the gar pro-orthologue, but the pooled expression of the ohnologue pair was similar to the gar gene's expression (Braasch et al., 2016). This pattern suggests that following 3R, many teleost ohnologues underwent subfunctionalisation in accordance with the predictions of the DDC model, and for certain gene families, subfunctionalisation amongst paralogues has been traced to specific regulatory regions.
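The expression pattern described above, in which each ohnologue is expressed below the level of the outgroup pro-orthologue while the pair together approximates it, can be written as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, the 25% relative tolerance is an arbitrary illustrative threshold, and the expression values are invented rather than taken from the gar dataset.

```python
def pooled_expression_matches(outgroup, ohnologue_a, ohnologue_b, tolerance=0.25):
    """Return True if each ohnologue is expressed below the outgroup gene
    while their summed expression is within `tolerance` (relative) of it."""
    if outgroup <= 0:
        raise ValueError("outgroup expression must be positive")
    each_lower = ohnologue_a < outgroup and ohnologue_b < outgroup
    pooled = ohnologue_a + ohnologue_b
    pooled_similar = abs(pooled - outgroup) / outgroup <= tolerance
    return each_lower and pooled_similar


# Invented expression values in arbitrary units (not real data)
print(pooled_expression_matches(100.0, 55.0, 40.0))  # pair sums to 95, close to 100
print(pooled_expression_matches(100.0, 90.0, 90.0))  # pair sums to 180, well above 100
```

A real analysis would need normalised expression across matched tissues and a statistically justified threshold, which this sketch does not attempt.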
These inter-species comparisons are giving us concrete examples of the DDC process. For example, comparisons between mammals and zebrafish revealed that in the genepoor region around the Pax6 locus, there are cross-functional conserved non-coding elements (CNEs) that have been retained in a patchwork pattern between the zebrafish Pax6 paralogues relative to the single mouse Pax6 (Navratilova et al., 2009). The zebrafish, a 3R teleost, has two Pax6 paralogues: pax6a and pax6b, which are expressed in overlapping yet complementarily constricted patterns relative to tetrapod Pax6; pax6a is expressed widely in the eye and brain, while pax6b is found in the eye, a restricted section of the developing brain, and the pancreas (Kleinjan et al., 2008). The changes to expression of the two ohnologues was traced to the differential loss of certain cis-regulatory regions around the two genes; brain-specific elements from pax6b and the pancreas control element from pax6a (Kleinjan et al., 2008). This pattern of retention of the two Pax6 ohnologues in zebrafish is consistent with the DDC framework of subfunctionalisation, illustrating the role of cis-regulatory elements in this process.
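The expression pattern reported for the gar and teleost comparison (each ohnologue lower than the pro-orthologue, but the pair pooled being similar to it) can be expressed as a simple check. The sketch below is only an illustration under stated assumptions: the function name `ddc_consistent_tissues`, the 25% tolerance used to decide what counts as "similar", and all expression values are hypothetical and are not taken from Braasch et al. (2016).

```python
def ddc_consistent_tissues(pro_orthologue, ohnologue_a, ohnologue_b, rel_tol=0.25):
    """Return tissues where the ohnologue pair fits the pooled-expression pattern.

    Each argument maps tissue name -> expression level (arbitrary units).
    rel_tol is an assumed tolerance for calling pooled expression "similar".
    """
    consistent = []
    for tissue, ancestral_level in pro_orthologue.items():
        a = ohnologue_a.get(tissue, 0.0)
        b = ohnologue_b.get(tissue, 0.0)
        # Condition 1: each ohnologue is expressed below the pro-orthologue.
        each_lower = a < ancestral_level and b < ancestral_level
        # Condition 2: the summed expression is close to the pro-orthologue.
        pooled_similar = abs((a + b) - ancestral_level) <= rel_tol * ancestral_level
        if each_lower and pooled_similar:
            consistent.append(tissue)
    return consistent


# Hypothetical expression levels for one gene family.
gar = {"brain": 10.0, "eye": 8.0, "liver": 4.0}
teleost_a = {"brain": 6.0, "eye": 1.0, "liver": 4.5}
teleost_b = {"brain": 5.0, "eye": 6.5, "liver": 4.0}
print(ddc_consistent_tissues(gar, teleost_a, teleost_b))
# prints ['brain', 'eye']
```

Expression levels from different species are not directly comparable without normalisation, so a real analysis would need that step before any threshold of this kind could be applied.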

Specialisation of Paralogues
Among daughter paralogues, subfunctionalisation processes are not necessarily symmetrical. Specialisation results in one paralogue with a function more similar to that of the ancestral gene, while the other paralogue's function diverges, either to a small subset of the ancestral function, or even the adaptation of new functions (Farrè and Albà, 2010; Marlétaz et al., 2018). Specialisation of ohnologues has been detected often in the past but has been described by several different mechanisms, corresponding to the different mechanisms of paralogue retention following duplication. Previous studies focusing on protein sequence evolution and detecting neofunctionalisation have found examples of specialisation in protein function (Chain and Evans, 2006; Steinke et al., 2006; Sémon and Wolfe, 2008). This takes place in accordance with the process described as qualitative subfunctionalisation, driven by adaptive evolution of the paralogue's protein sequence (Espinosa-Cantú et al., 2015). For many developmental genes, however, the DDC hypothesis, which falls under quantitative subfunctionalisation and occurs by neutral processes affecting gene regulation, is more relevant (Espinosa-Cantú et al., 2015; Braasch et al., 2018). Following duplication, for some gene families, one paralogue retains widespread expression across several domains consistent with the inferred ancestral function, while the other's function is reduced to a specialised domain. This process is linked to the changes to regulatory domains occurring primarily on one paralogue instead of symmetrically across both (during the degeneration and complementation aspects of the DDC), though this was detected for the most part in genes with complex or widespread ancestral expression patterns (Marlétaz et al., 2018). Some studies of expressed sequence tag (EST) data between the polyploid frog Xenopus laevis and the non-duplicate species X. tropicalis found that many paralogues exhibited subfunctionalisation, but others had patterns where only one of the two paralogues was greatly reduced in expression (Hellsten et al., 2007). Also, for many pairs of 4R ohnologues in the salmon, one retained a broader, more ancestral expression pattern, while only the other ohnologue changed, often even neofunctionalising (Lien et al., 2016). These observations of specialisation are consistent with an asymmetric mechanism of the DDC, where one ohnologue's regulatory region degenerates much more than the other.
A more recent study detected this pattern more broadly, comparing the amphioxus transcriptome with those of multiple vertebrates (Marlétaz et al., 2018). For certain ohnologues, new regulatory elements were found near the specialised gene, indicating a role for neofunctionalisation as well as the asymmetrical loss of one ohnologue's "ancestral" regulatory elements (Marlétaz et al., 2018). These studies rely on broad overviews of gene expression from extensive transcriptome datasets, and so may be unable to detect more focused and specific changes to gene function and expression between particular ohnologues, such as the qualitative changes to protein sequences described above. Furthermore, specialisation may only be detectable for genes whose ancestral function or expression was widespread and complex (pleiotropic), while other studies detecting asymmetrical changes in expression amongst paralogues with more specialised or localised functions ascribe the process to sub- or neo-functionalisation. The mechanisms resulting in more symmetrical patterns of subfunctionalisation, namely the relaxed constraint provided by redundancy allowing degenerative mutations to accrue in regulatory regions, are not required to affect each paralogue equally. Patterns described as specialisation or neofunctionalisation can therefore occur by the same logic, if for some reason one paralogue is affected more than the other. This asymmetry amongst paralogues can also explain how novelties arose via co-option of redundant ohnologues to novel functions, through changes in gene regulation.

Regulatory Elements and Subfunctionalisation
Advancements in bioinformatic approaches to comparative genomics have facilitated the detection of CNEs, aiding in the identification of the regulatory changes between paralogues. The patterns of retention of these CNEs, many of which are situated within the introns of neighbouring genes, may have important consequences for the evolutionary maintenance of the synteny of gene clusters or highly conserved genomic neighbourhoods, in which many developmental genes can be found. For example, VISTA plots show that the conserved regulatory elements for Pax6 are situated in the introns of several gene neighbours across vertebrates (Kleinjan et al., 2008; Navratilova et al., 2009), providing the selective pressure to maintain these gene clusters. This can be seen in many other developmental gene clusters, including the MyoD-related genes that are key for muscle development. In most non-teleost 2R vertebrates, there are four myogenic regulatory factors (MRFs; MyoD, Myog, Myf5, and Myf6), two of which, Myf5 and Myf6, reside in a closely linked cluster and share many overlapping regulatory elements (Braun et al., 1990; Carvajal et al., 2008). In zebrafish, these genes act early in myoblast determination and later in myoblast differentiation, respectively, so that Myf5 expression is followed by that of its neighbour, Myf6. In mice, Myf6 also has an early role in determination and so has a biphasic expression pattern (Kassar-Duchossoy et al., 2004; Schnapp et al., 2009; Moncaut et al., 2013; Hernández-Hernández et al., 2017). This early versus late distinction is exemplified by MyoD binding the Myog promoter (Tapscott, 2005). Deletion of the minimal Myf5 promoter results in misexpression of Myf6 in Myf5 regions, making it clear that the regulatory domains of these two genes are intrinsically linked (Carvajal et al., 2008).
Mirroring this cluster, a cryptic fifth MRF ohnologue is located adjacent to MyoD in the genomes of the coelacanth, sterlet, and spotted gar (Aase-Remedios et al., 2020). Only with the recent availability of the genomes of these "non-model" organisms could this fifth ohnologue, Myf7, be found, since it has been lost from most other vertebrate lineages to which common model organisms belong, including tetrapods, cartilaginous fish, and teleosts. The finding of this fifth vertebrate MRF upended previous interpretations of the evolution of the four MRFs shared amongst all vertebrates. Instead of a single ancestral gene, duplicated in 2R into four ohnologues, and a highly unusual translocation event, it is much more likely that there was an ancestral two-gene state resulting from a tandem duplication that generated the two types of MRF, early and late (Aase-Remedios et al., 2020). This cluster nevertheless shares synteny with the MyoD locus across the vertebrates, potentially as a result of constraint on regulatory elements that overlap with the MRFs and their gene neighbours. Other developmental gene families also have so-called "cryptic" vertebrate paralogues analogous to Myf7, that have been lost from common model system lineages and found only with the availability of genomes from more basally branching species (Kuraku et al., 2016), such as Bmp16 (Feiner et al., 2019) and Foxl2B (Geraldo et al., 2013). The wider availability of sequencing has allowed more phylogenetically informative outgroups and lineages to be studied, changing our understanding of chordate evolution.

Small-Scale Independent Duplications in Chordate Lineages
Wider taxon-sampling has enabled the detection of independent duplications in certain lineages overlaid on the larger general trend of various WGDs. In many important developmental gene families, amphioxus has undergone its own expansions of gene clusters, formed by tandem duplications (Minguillón et al., 2002). This ranges from expansions within existing clusters, possibly like the Posterior Hox gene AmphiHox15 within the Hox cluster, to the advent of new clusters, like the expansion of the amphioxus MRFs (Schubert et al., 2003; Urano et al., 2003; Yuan et al., 2003; Somorjai et al., 2008; Tan et al., 2014; Aase-Remedios et al., 2020). The five genes in this cluster are expressed in different temporal and spatial patterns during amphioxus muscle development, indicating these genes have subfunctionalised as well, providing an independent case of MRF subfunctionalisation from that seen with the vertebrate MRFs (Aase-Remedios et al., 2020). In the Pax gene family of developmental transcription factors, we see a comparable case of independent subfunctionalisation between amphioxus and vertebrates. Amphioxus has, independently from vertebrates, duplicated the ancestral chordate pro-orthologue of Pax3/7 to produce two Pax3/7 trans-homologues of vertebrate Pax3 and Pax7, which have again evolved distinct expression patterns (Barton-Owen et al., 2018). Given that in vertebrates Pax3 and Pax7 are upstream regulators of the MRFs, perhaps this duplication is acting in concert with the MRF expansion, as an expanding network of muscle regulators. These observations complicate the overall relationship between the pre-duplicate amphioxus genome and vertebrate genomes shaped by 2R WGD and require careful "mining" of the latest genome sequence data and close consideration of gene phylogenies. Overall, these cases show that although amphioxus does indeed provide us with an excellent proxy for much of the chordate ancestor, this animal also has its own unique evolutionary history. This needs to be carefully considered in order to properly infer the features of the chordate ancestor.

Lineage-Specific WGDs
Wider taxon-sampling allows us to study not only phylogenetically informative lineages, but also species that serve as case studies for interesting phenomena like more recent lineage-specific WGDs. Recent sequencing efforts have provided a characterisation of the WGD in the sterlet lineage (Cheng et al., 2019; Du et al., 2020). Not only does this genome provide an outgroup comparison to 3R teleosts as a basal osteichthyan (Figure 1), but because sturgeons are slow-evolving, they have also retained ancestral characteristics that may be remnants of the bony fish ancestor. Evidence of several levels of polyploidy has been observed in different sturgeon groups (Vasil'yev et al., 1980; Fontana et al., 2008), but the WGD identified at the base of the sturgeon lineage provides a case study to examine the process of rediploidisation. In the sturgeon, while large chromosomes were retained in duplicate, many of the second copies of the smaller chromosomes were lost, meaning some chromosomes are tetraploid, while others are diploid (Romanenko et al., 2015; Du et al., 2020). In contrast, in 4R salmonids, ohnologues were lost gene by gene (Lien et al., 2016). The sterlet genome's mobilome shows identical transposable element content between each haplotype, suggesting an origin by autopolyploidy (Du et al., 2020). Comparisons to allopolyploid genomes could reveal the different mechanisms of genomic rediploidisation following the different modes of duplication; perhaps an autopolyploid genome allows the loss of entire chromosomes, while allopolyploid genomes are constrained by differences between haplotypes, and therefore losses occur on smaller scales.
An important allopolyploid is the African clawed frog (Xenopus laevis) (Kobel and Du Pasquier, 1986; Session et al., 2016). This frog has a diploid chromosome number of 36, almost twice the number of its congener X. tropicalis, with 20, and arose via hybridisation of two progenitor species, each with a diploid number of 18, that had undergone chromosomal fusion between homologues of X. tropicalis chromosomes 9 and 10 (Evans, 2008). There is extensive conserved homology between the two sub-genomes in X. laevis and the diploid X. tropicalis, reflecting how recent this duplication is, and only 17% of duplicated genes have been lost (Uno et al., 2013), in contrast to ∼80% of teleost 3R ohnologues, for example (Brunet et al., 2006). The mechanism for retention in the frog may be dosage-based, though some paralogues may also have subfunctionalised (Charbonnier et al., 2002; Hellsten et al., 2007; Sémon and Wolfe, 2008), and different types of genes are lost at different rates (Session et al., 2016). Notably, the karyotype is stable, unlike in the sterlet, probably due to the lack of recombination between the two X. laevis sub-genomes (S and L), though more genes are lost from sub-genome S, which has also undergone more intra-chromosomal rearrangements (Session et al., 2016). Altogether, this sheds light on the patterns of rediploidisation in an allopolyploid, where each sub-genome is more intact and independent, as opposed to the sterlet example of an autopolyploid, where the polyploid genome is truly redundant and entire chromosomes can be lost.
These differences in rediploidisation are relevant more widely, as the 2R WGD at the base of the vertebrates is thought to have taken place with an initial autopolyploidy, then subsequent allopolyploidy of two descendants of that first polyploid (Simakov et al., 2020). This is supported by the genomic rearrangements and symmetrical gene losses that follow 1R, also seen in the rediploidisation process in the sterlet. These rearrangements are shared with lampreys, but the asymmetric gene losses and independence of haplotypes characteristic of allopolyploids are not. Ancient duplications can now be understood in more detail with our better understanding of the different rediploidisation processes, only now possible with the wider taxon-sampling of case-study species with lineage-specific WGDs. Taxon sampling nevertheless remains uneven. For instance, only 0.2% of the diversity of amphibian genomes has been sequenced, while over 8% of mammal genomes have been sequenced, and roughly the same number of genomes are available for teleosts as sauropsids, though there are twice as many teleost species (Figure 1B). This has changed remarkably over the past few decades with the rapid improvements in genome sequencing, and is set to improve still further with initiatives like the Earth Biogenome Project (Lewin et al., 2018), which should release us from our anthropocentric concentration on only biomedically and economically relevant species and allow us instead to obtain an unbiased overview of genomic biodiversity and the processes that generated it.

NEW GENOMES AND TRANSITIONS WITHIN VERTEBRATES
The Origin of Tetrapods
Of these more recently sequenced species, the coelacanth genome has been particularly instrumental in addressing the origin of tetrapods and the transition to land, as the coelacanth is the most basal-branching member of the sarcopterygians, the group to which tetrapods also belong (Figure 1). Likewise, the lungfish genome has been instrumental in revealing the genomic changes underpinning this transition, as the lungfish is the closest living relative of tetrapods (Figure 1; Meyer et al., 2021). These groups make more suitable comparisons to tetrapods than commonly used teleost outgroups, since they have not undergone the teleost-specific 3R WGD and have the fourfold paralogy of a 2R vertebrate (Koh et al., 2003; Noonan et al., 2004; Amemiya et al., 2010). They also allow observed differences between tetrapods and teleosts to be timed specifically to the tetrapod or sarcopterygian lineages (Figure 1). Many cryptic vertebrate paralogues have been found in such lineages, and indeed more than fifty developmental genes were found in the coelacanth that had been lost in tetrapods (Amemiya et al., 2013), including HoxA14 (Amemiya et al., 2010). While these gene losses corresponded to the transition to land, several thousand CNEs near genes involved in morphogenesis and development were identified that evolved anew in the tetrapod lineage (Amemiya et al., 2013).
Looking more closely at developmental genes to address the morphological evolution of limbs in the transition to land, the genomes of coelacanth and lungfish reveal changes to the Hox clusters that may underpin these morphological changes. The coelacanth genome study identified a limb development enhancer in the gene desert upstream of the HoxD cluster that is shared between tetrapods and the coelacanth but absent in the teleost comparison, suggesting this sequence could have been co-opted into lower limb development in tetrapods from an element present in the aquatic ancestral sarcopterygian (Amemiya et al., 2013). In lungfish, HoxA14 is also present, indicating its loss occurred in tetrapods, and HoxC13 expression was detected in distal fins, providing early evidence of its co-option to tetrapod limb development (Meyer et al., 2021). Also implicated in the adaptation to land is an enhancer near HoxA14 in coelacanth that controls the expression of Posterior Hox genes, though HoxA14 itself is lost in tetrapods. This enhancer has evolved to be essential for the formation of the placenta (Amemiya et al., 2013). This analysis revealed the impact of new regulatory elements in major transitions in the vertebrates, which are key players in the differentiation among paralogous loci with the potential to evolve different or new functions because of genetic redundancy.
Gene duplications also characterise this transition, particularly in gene families involved in adaptation to land. The lungfish genome contains a gene encoding a respiratory surfactant, Sftpc, which originated via duplication earlier in the sarcopterygian lineage, and further duplicated in lungfish and tetrapods. Also expanded in the lungfish genome are several gene families for olfactory receptors to detect airborne chemicals as well as the vomeronasal receptor family, showing the timing of this adaptation coincides with the adaptation to land (Meyer et al., 2021). Likewise, in comparison with the coelacanth, many tetrapod-specific CNEs were identified near genes involved in olfaction (Amemiya et al., 2013), showing that for the adaptation to land, both gene duplications and changes to gene regulation likely contributed to some extent. These candidate genes and potential regulatory elements may represent some of the genomic changes underpinning the evolution of land vertebrates and were only found with new comparative genomics approaches and new genome sequences. In particular, the lungfish genome is the largest of any animal, so only with new sequencing techniques, in this case ultra-long-read Nanopore technology, could a genome with this quantity of repetitive elements be sequenced (Meyer et al., 2021), enabling a better understanding of the origin of tetrapods.

The Origin of Mammals
A major transition within the tetrapods was the evolution of mammals, most of which are warm-blooded, viviparous, milk-producing, and have fur, distinct from the reptilian ancestor. Of particular relevance to this transition are the monotremes, the basal lineage of mammals, which simultaneously exhibit both mammalian and reptilian traits, including milk production and fur, but also have oviparity and produce venom like reptiles, thus providing an excellent point of comparison to address the genetics supporting mammalian adaptations (Warren et al., 2008). There are two extant monotreme lineages, the platypuses and echidnas (Figure 1). When the draft genome of the platypus was sequenced, platypuses were found to have many microchromosomes, as in reptiles (Warren et al., 2008). Nearly three quarters of the first genome sequence was unmapped to chromosomes, but now, with advancements like long-read sequencing and Hi-C, the platypus genome is assembled to the chromosomal level and the echidna genome has been much improved as well (Zhou et al., 2021). These new assemblies allow us to address monotreme traits like their many sex chromosomes, electromagnetic sensory organs, and venom. Mammalian venom evolved convergently to reptilian venoms, originating in monotremes by tandem duplication of defensin family genes in glands that were originally sweat glands, similarly to the evolution of shrew venom and in contrast to snake venom, which originated by duplication of other defensins in salivary glands (Whittington et al., 2008). The many sex chromosomes in monotremes evolved from the same autosomes as bird sex chromosomes (ZW female and ZZ male) rather than the SRY (XX female, XY male) sex determination genes of therian mammals, which remain autosomal in monotreme genomes (Veyrunes et al., 2008). The new assembly recharacterised the 10 platypus sex chromosomes and found female-biassed expression of sexually differentiated regions of these chromosomes, consistent with a heterozygous female, and allowed the reconstruction of the series of recombinations and fusions that occurred in their formation (Zhou et al., 2021). This showed that the therian sex determination system evolved independently from the more reptile-like system in monotremes.
A major advancement within mammals was the evolution of viviparity. Monotremes lay small eggs that hatch early in development, as young rely primarily on milk for nutrition, as in therians. The first platypus genome enabled the identification of the genes involved in the transition from oviparity to viviparity, including the retention of reptile-like ZPAX genes and nothepsin, but paired with the loss of vitellogenin genes, leaving only one in the platypus and none in therians, compared to three in birds (Warren et al., 2008). Casein genes are integral to milk production in mammals and evolved via tandem gene duplications of genes in the secretory calcium-binding phosphoprotein family (Kawasaki et al., 2011). Monotremes not only have the full complement of therian caseins, but have a second CSN2 paralogue (Lefèvre et al., 2009). The new assemblies of platypus and echidna genomes have built on these previous findings, including the identification of a CSN3 paralogue and determining the origin of these genes from teeth-related genes based on their syntenic locations (Zhou et al., 2021). A comparative genomics study of the new platypus assembly also revealed that genes involved in the development of the therian placenta were co-opted from genes with other functions, and this transition was underpinned by changes to gene regulation and expression in old genes (Hao et al., 2020). Taken together, these studies show that only with new, chromosome-level assemblies of phylogenetic outgroups can the genetic basis of transitions in chordate evolution be studied.

CONCLUSION
The age of genomics has made sequencing easier and cheaper than ever before, and important advancements such as long-molecule technologies and Hi-C are enabling chromosome-level assemblies for not just traditional model organisms, but also those of particular evolutionary importance. These advances have allowed us to more accurately understand transitions in evolution, for instance using the chromosome-level assembly of B. floridae to determine the specific chromosomal rearrangements in and around the 2R WGD. Comparative genomics is now equipped with the material with which to determine the genetic changes that underpin these major evolutionary transitions. It is clear that both gene and genome duplications have been instrumental in the evolution of chordate lineages, from the revolutionary 2R WGD around the origin of the vertebrates, to smaller scale expansions of developmental gene clusters, and tandem duplications underpinning changes to particular lineages or in specific gene families. These duplications can be linked to functional changes as studies of gene expression have revealed the role of regulatory regions, non-coding RNAs, and epigenetics in the evolution of chordate genomes. The continuously improving sequence resources for key outgroup lineages, including amphioxus, cyclostomes, coelacanth and lungfish, and the monotremes, have enabled us to detect gene and genome duplications, pinpoint their timing in the tree of chordates, and to infer the evolutionary impact of these events.