Corticotropin-Releasing Factor: An Ancient Peptide Family Related to the Secretin Peptide Superfamily

Corticotropin-releasing factor (CRF) is the hypothalamic releasing peptide that regulates the hypothalamic-pituitary-adrenal/inter-renal (HPA/I) axis in vertebrates. Over the last 25 years, there has been considerable discussion of its paralogous genes, urotensin-I/urocortin-1 and urocortins-2 and -3, and of their roles in the vertebrate stress response. Phylogenetically, the CRF family of peptides also belongs to the diverse assemblage of Secretin- and Calcitonin-based peptides, as evidenced by comparative studies of the structures of both their ligands and their G-protein-coupled receptors (GPCRs). Despite this, the common origin of this large assemblage of peptides has not been ascertained. An unusual peptide, teneurin C-terminal associated peptide (TCAP), reported in 2004, comprises the distal extracellular tip of the teneurin transmembrane proteins. Further studies indicated that this teneurin region binds to the latrophilin family of GPCRs. Although the latrophilins were initially thought to be members of the Secretin GPCR family, evidence indicates that they belong to the Adhesion family of GPCRs and are related to the common ancestor of both the Adhesion and Secretin GPCR families. In this study, we posit that TCAP may be a distantly related ancestor of the CRF-Calcitonin-Secretin peptide family that evolved near the base of metazoan phylogeny.


INTRODUCTION
Corticotropin-releasing factor (CRF) is the critical hypothalamic releasing factor that regulates the hypothalamic-pituitary-adrenal/inter-renal (HPA/I) axis in vertebrates, yet some 40 years after its discovery, numerous questions remain regarding when, why, and how this peptide evolved. We hypothesize that, because of the high level of primary structure similarity among CRF paralogs and related peptide lineages (e.g., calcitonin, secretin), there was likely an ancestral peptide common to this cluster. We further suggest that the "teneurin C-terminal associated peptides" (TCAP) represent an extant candidate lineage related to this hypothetical common ancestor.
The discovery of CRF in the early 1980s (1) occurred at about the same time as the discovery of other peptides of similar structure [sauvagine (2); urotensin-I (3)]. Later, Vale and his laboratory characterized a mammalian version of sauvagine/urotensin-I in rat brain that they termed urocortin (4). Further phylogenetic studies suggested that mammalian urocortin, amphibian sauvagine, and fish urotensin-I were orthologs of the same gene (5). In 2001, the structures of two novel related peptides were reported by the Vale laboratory, who named them urocortin 2 and 3 (6,7), and by Hsu and Hsueh (8), who termed them "stresscopin" and "stresscopin-related peptide." These novel CRF family homologs were subsequently established to be a paralogous lineage separate from CRF and from the urotensin-I/sauvagine/urocortin grouping (5,9-13). In parallel with studies of vertebrate CRF isoforms, related peptides were reported in insects and other arthropods (12,14-16). The high degree of structural similarity among CRF-like peptides in both deuterostomes (e.g., chordates) and protostomes (e.g., arthropods) therefore indicated that an ancestral peptide with CRF family primary structure attributes was present before the bifurcation of these metazoan lineages. Importantly, this ancestral peptide appeared to exist in a physiologically mature form, indicative of a distant lineage that likely radiated into other ancestral peptides with distinct but overlapping functions. The identity of these hypothetical ancestral peptides has remained elusive; however, it is plausible that these lineages led to the evolution and expansion of the secretin and calcitonin families of peptides (11,12).
The Secretin superfamily of peptides is a diverse assemblage of peptide lineages with overlapping functions that utilize structurally related receptors. The nomenclature describing the phylogeny of the secretin grouping of peptides and receptors is confusing. To clarify this, we use the term "secretin family" to denote the peptides thought to form a direct monophyletic clade (e.g., secretin, PACAP, VIP, and the glucagon paralogs). For the wider group, which includes CRF and calcitonin, we use the term "Secretin superfamily." Due, in part, to the similarity and structural conservation of their cognate receptors, the Secretin family of G-protein-coupled receptors (GPCRs) was defined as a distinct clade (17). The GPCRs have most recently been classified into five main families using the GRAFS system: Glutamate (G), Rhodopsin (R), Adhesion (A), Frizzled/Taste2 (F), and Secretin (S) (17), and the Secretin superfamily peptides are the ligands of one of these five receptor families. Notably, both CRF and calcitonin receptors are included within the Secretin GPCR family. Among these families, the Adhesion and Secretin GPCRs are the most evolutionarily ancient (18). Adhesion GPCRs have a characteristically long N-terminus rich in serine and threonine residues, whereas Secretin GPCRs have a characteristic hormone-binding domain (HBD) in their N-terminal region (18). Secretin-related receptors form a single monophyletic clade derived from the Adhesion GPCRs (18,19). Adhesion GPCR genes, but not Secretin GPCR genes, have been identified in choanoflagellate and sea anemone genomes, suggesting that Adhesion GPCRs are more evolutionarily ancient than Secretin GPCRs (18). Interestingly, some phylogenetically younger, derived Adhesion GPCR members possess an HBD with highly conserved amino acid sequences and splice site motifs similar to those of Secretin GPCRs.
These observations led, in part, to the hypothesis that the Secretin GPCR clade was derived from an offshoot of the Adhesion GPCR lineage. Although the data linking the Adhesion and Secretin receptor families were compelling, evidence of a structurally related peptide ligand linking the two receptor clades was lacking.
One lineage of Adhesion GPCRs that does possess an HBD with structural motifs similar to those of Secretin GPCRs is the latrophilins (LPHN), or ADGRL (Adhesion G-protein-coupled receptors, subfamily L). Because of its characteristic HBD, ADGRL was originally considered a new type of Secretin GPCR, but it has since been reclassified as an Adhesion GPCR (17). The first identified ligand for ADGRL was α-latrotoxin, a component of black widow spider venom that specifically targets vertebrates (20) and shares major sequence similarity with Secretin superfamily ligands (21), suggesting that these peptides have a common origin. Although α-latrotoxin is an exogenous ligand, the high-affinity binding of this soluble ligand to ADGRL indicated that the receptor had the potential to bind, and be activated by, an endogenous peptide structurally similar to α-latrotoxin. The search for this theoretical ligand led to the identification of the teneurin transmembrane proteins as likely candidates.
Several recent studies established that the distal region of the extracellular domain of the teneurin transmembrane proteins binds ADGRL with high affinity and activates the receptor. Silva et al. (22) first discovered that teneurin-2, expressed on post-synaptic dendritic branches, binds LPHN-1 expressed on pre-synaptic nerve terminals to form a trans-synaptic complex. Similar trans-cellular interactions were observed between teneurin-2 and -4 and all three LPHNs (23) and between teneurin-1 and LPHN-3 (24). A C-terminal fragment of teneurin-2, named Lasso, triggered an increase in cytosolic Ca²⁺ in Nb2a cells overexpressing LPHN-1 and in pre-synaptic nerve terminals of hippocampal cells (22). This distal region of the teneurin extracellular domain contains a peptide-like sequence termed "teneurin C-terminal associated peptide" (TCAP). The TCAPs are a family of four bioactive peptides, 40-41 amino acids in length, located at the C-terminus of each of the teneurin transmembrane proteins (25,26). TCAPs possess a cleavage motif at the N-terminus and an amidation motif at the C-terminus (27) and may be autolytically cleaved from teneurins upon binding with LPHN (28,29). TCAP shares about 20% sequence similarity with CRF and calcitonin, members of the Secretin superfamily of ligands, suggesting a common evolutionary origin (27). Moreover, our laboratory has recently shown that TCAP-1 is likely an endogenous ligand that interacts with the HBD of LPHN (30).
Because TCAP binds to an Adhesion GPCR and shares sequence similarity with CRF and calcitonin, which are ligands of the Secretin GPCRs classified as most closely related to the ancestral Adhesion GPCRs, we investigated TCAP as a possible progenitor of the Secretin superfamily. The hypothesis that the teneurin-TCAP system is an ancient system that arose prior to the emergence of the Metazoa, as a result of a horizontal gene transfer (HGT) event from a prokaryote to a choanoflagellate ancestor, has previously been raised (31-33). However, the phylogenetic relationship of the TCAP family to the Secretin superfamily has not previously been examined. TCAP may be associated with an early-evolving lineage of peptides that is a sister lineage to the CRF, calcitonin, and secretin families of peptides (11,34). We therefore examined the phylogenetic relationships of these peptides using TCAP as an outgroup.

MATERIALS AND METHODS

Collection of Sequences
Peptide sequences of Secretin GPCR ligands, including the CRF, calcitonin, and secretin families, and of Adhesion GPCR ligands, including TCAP-1 to -4, as well as of reference groups including neuropeptide Y (NPY) and insulin, were collected from a range of extant protostomes and deuterostomes using the GenBank genome sequence analysis program on the NCBI website. The peptides were organized by organism, phylum, class, and order, and were tabulated together with their accession numbers (Table 1). Sequences were divided into pre-propeptides (or propeptides for TCAP) and mature peptides and then imported into MEGA 6.0 (38), downloaded from http://www.megasoftware.net/, for analysis.

Sequence Alignments
Peptide sequences were aligned using the MUSCLE algorithm (39). The alignment was examined and reviewed for duplicate sequences using pairwise distances (a distance of d = 0.0 indicated identical sequences), and excess sequence was trimmed at both the N- and C-terminal ends, as these fragments did not contribute to the alignment. Modifications to the alignment were made to ensure that the characteristic residue motifs were conserved. These included highly conserved cysteine (C), tryptophan (W), arginine (R), and lysine (K) residues throughout, as well as motifs characteristic of each family: for the CRF family, the N-terminal leucine (L) and serine (S) and the C-terminal asparagine (N) conserved throughout the entire family; for the calcitonin family, the "TCV" or "TCXV" motif; and for the TCAP family, the "PELAD" motif.
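To make the duplicate check concrete, the sketch below computes an uncorrected pairwise p-distance (the proportion of differing residues) between aligned sequences and flags pairs with d = 0.0. It is a minimal illustration under stated assumptions, not the MEGA 6.0 implementation: the pairwise-deletion treatment of gaps, the "-" gap symbol, the function names `p_distance` and `find_duplicates`, and the toy sequences are all hypothetical choices made here.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Uncorrected p-distance: the proportion of compared sites that differ.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch). Identical sequences give 0.0.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def find_duplicates(alignment: dict[str, str]) -> list[tuple[str, str]]:
    """List pairs of sequence names whose p-distance is exactly 0.0."""
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2)
        if p_distance(seq1, seq2) == 0.0
    ]


if __name__ == "__main__":
    # Toy aligned sequences invented for illustration (not real TCAPs).
    toy = {"x1": "PELAD-KLM", "x2": "PELAD-KLM", "x3": "PELSDGKLV"}
    print(find_duplicates(toy))              # [('x1', 'x2')]
    print(p_distance(toy["x1"], toy["x3"]))  # 0.25
```

Under pairwise deletion, two sequences that differ only where one of them carries a gap (for example, a full-length sequence and a truncated copy) also give d = 0.0, so pairs flagged this way are best treated as candidates for manual review rather than confirmed duplicates.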

Phylogenetic Analysis
Phylogenetic tree construction and statistical analyses were carried out in MEGA 6.0 (38). A multi-step approach was undertaken in order to understand the relationship of each family relative to TCAP prior to conducting a comprehensive analysis of all of the families.

Maximum Likelihood (ML) Method
The amino acid substitution model and the rate variation among sites were chosen as the model that gave the greatest log likelihood, the lowest Akaike Information Criterion (AIC), and the lowest Bayesian Information Criterion (BIC), as calculated by MEGA 6.0. To ensure the most accurate analysis, these parameters were calculated for each constructed tree. The model that maximized the log likelihood was used for analysis. Partial deletion was applied to sites with too many gaps or missing data, using a site coverage cutoff of 95%, so that positions not present in at least 95% of sequences were excluded from the analysis. The heuristic tree search used Nearest-Neighbor Interchange (NNI), with initial trees obtained by applying the Neighbor-Joining (NJ) method to a matrix of pairwise distances estimated using the JTT model. The reliability of the tree was tested using 1,000 bootstrap replicates.
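The three model-selection criteria named above can be written out explicitly. For a model with maximized log likelihood ln L and k free parameters fitted to n alignment sites, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; lower values are preferred, and the penalty terms offset the tendency of ln L to rise as parameters are added. The Python sketch below is a hedged illustration of these standard definitions, not MEGA's internal code (MEGA's model-selection output may, for instance, report a small-sample corrected AIC instead). The `ModelFit` class, the model labels, the choice of n as the number of sites, and all numerical values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    """Summary of one fitted substitution model (values here are hypothetical)."""
    name: str              # label only, e.g. "model_A"
    log_likelihood: float  # maximized ln L
    n_params: int          # number of free parameters, k


def aic(fit: ModelFit) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * fit.n_params - 2 * fit.log_likelihood


def bic(fit: ModelFit, n_sites: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L, with n taken as the
    number of alignment sites (an assumption of this sketch)."""
    return fit.n_params * math.log(n_sites) - 2 * fit.log_likelihood


def rank_by_bic(fits: list[ModelFit], n_sites: int) -> list[tuple[str, float, float, float]]:
    """Return (name, BIC, AIC, lnL) rows sorted from lowest to highest BIC."""
    rows = [(f.name, bic(f, n_sites), aic(f), f.log_likelihood) for f in fits]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical fits; these are not results from this study.
    fits = [ModelFit("model_A", -1500.0, 10), ModelFit("model_B", -1495.0, 30)]
    for name, b, a, lnl in rank_by_bic(fits, n_sites=100):
        print(f"{name}: BIC={b:.2f} AIC={a:.2f} lnL={lnl:.2f}")
```

In this hypothetical example, model_B has the higher log likelihood while model_A has the lower AIC and BIC, which shows that the criteria need not agree and that log likelihood alone always favors the more parameter-rich model.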

Pre-propeptide and Mature Peptide Analysis
Two sets of analyses were performed. The first involved Secretin superfamily pre-propeptides, which are composed of a signal peptide, a cryptic peptide, and a mature peptide, together with TCAP propeptides, since TCAP does not possess a signal peptide. Given the functional importance, bioactivity, and high evolutionary conservation of mature peptides, a second, separate analysis was performed on the mature peptides of both Secretin superfamily and TCAP family members.
For analysis involving Secretin superfamily pre-propeptides and TCAP family propeptides, a total of 181 amino acid sequences were used, with a total of 44 positions in the final dataset after all positions with <95% site coverage were eliminated.
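The 95% site-coverage rule can be illustrated with a short sketch that drops any alignment column in which fewer than 95% of sequences carry a residue. This is a hedged Python illustration of the criterion as described, not MEGA 6.0's own code; the function name `partial_deletion`, the treatment of "-", "?", and "X" as missing data, and the toy alignment are assumptions made here.

```python
def partial_deletion(alignment: list[str], cutoff: float = 0.95,
                     missing: str = "-?X") -> list[str]:
    """Keep only columns whose site coverage (the fraction of sequences with a
    residue rather than a gap/missing symbol) is at least `cutoff`.

    Treating '-', '?', and 'X' as missing data is an assumption of this sketch.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    n_seqs = len(alignment)
    kept_columns = [
        col for col in range(length)
        if sum(seq[col] not in missing for seq in alignment) >= cutoff * n_seqs
    ]
    return ["".join(seq[col] for col in kept_columns) for seq in alignment]


if __name__ == "__main__":
    # Toy alignment: columns 2 and 3 are each covered by only 3 of 4 sequences.
    toy = ["AC-D", "ACGD", "A-GD", "ACGD"]
    print(partial_deletion(toy))               # ['AD', 'AD', 'AD', 'AD']
    print(partial_deletion(toy, cutoff=0.75))  # ['AC-D', 'ACGD', 'A-GD', 'ACGD']
```

Because the rule is applied column by column, raising the cutoff can only remove columns, never add them.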

Mature Peptide Analysis
For the analysis involving Secretin superfamily and TCAP mature peptides, a multi-step approach was undertaken to elucidate the relationships of each family with respect to one another and to TCAP. Because insulin has a tertiary structure in which the two mature chains are connected by disulfide bonds between cysteine residues (40), the insulin mature peptide was divided into its A and B chains for the purpose of this analysis. The high sequence conservation of NPY may have resulted in the anomalous placement of the NPY reference group in the pre-propeptide analysis; because the NPY mature peptide is even more highly conserved, NPY was not included as a reference group in the analysis of mature peptides.

RESULTS

Evolutionary Analysis of Pre-propeptides and Mature Peptides of Secretin Superfamily and TCAP Family Members
Phylogenetic analysis of the CRF, calcitonin, and secretin pre-propeptide families and the TCAP family propeptides revealed that each family formed a distinct group. The TCAP, CRF, and secretin families form distinct clades, and insulin forms a sister group with the calcitonin family (Figure 3). CRF and calcitonin are closely related sister lineages that, in turn, form a sister lineage to the secretin family. TCAP, the putative progenitor, is more distantly related to these families than they are to one another.
A separate analysis was performed with the mature peptide sequences of the Secretin superfamily and TCAP because of their high conservation and functional importance throughout evolution. Phylogenetic analysis of calcitonin mature peptides, insulin A and B mature chains, and TCAP demonstrated that the calcitonin and insulin families are sister lineages, with insulin A chains more closely related to the calcitonin family than insulin B chains (Figure 4). Phylogenetic analysis of calcitonin, insulin A and B chains, CRF, and TCAP mature peptides confirmed that the calcitonin and insulin families were sister lineages and that CRF formed a group separate from these two families (Figure 5). Lastly, phylogenetic analysis of calcitonin, insulin A and B chains, CRF, secretin, and TCAP mature peptides revealed that the calcitonin and insulin families were sister lineages and that CRF and secretin each formed groups separate from these two families (Figure 6). The multi-step mature peptide analysis therefore confirmed that insulin and calcitonin are sister lineages that form groups distinct from the CRF and secretin families, and that the TCAP family is a clade distinct from the Secretin superfamily members.

DISCUSSION
In this study, the TCAP family is presented for the first time as a putative progenitor of the Secretin superfamily of ligands. The evolutionary relationships among the receptors of these peptides are well established (18,19). However, neither the relationships among members of the Secretin superfamily of ligands nor a progenitor for this family of peptides has been elucidated. We considered TCAP as a putative progenitor of the Secretin superfamily for two reasons. First, the evolutionary relationships among the receptors of these ligands demonstrate that Secretin GPCRs derived from Adhesion GPCRs (19), and TCAP-1 binds to LPHN, an Adhesion GPCR with an HBD characteristic of Secretin GPCRs (17); it is therefore possible that the ligands followed a similar course of evolution. Second, the sequence similarity that TCAP shares with CRF and calcitonin (27), both Secretin superfamily members whose receptors are the most closely related to Adhesion GPCRs, suggests that these peptides may have evolved from TCAP, a candidate progenitor peptide.
The teneurin-TCAP system is well established as being evolutionarily ancient. Evidence suggests that this system arose before the Metazoa evolved, about 1 billion years ago, and therefore prior to the emergence of the Secretin superfamily, which arose around the time of the protostome-deuterostome divergence, about 600 million years ago. Accordingly, although the TCAP sequence shows some amino acid similarity with the Secretin superfamily, a number of differences indicate that the two lineages are evolutionarily divergent. Indeed, we could not detect any significant binding of TCAP to, or activation by TCAP of, any member of the Secretin GPCRs [(11,34); Lovejoy, unpublished observations]. In contrast, TCAP binds to the latrophilin HBD and activates this receptor [(30); Reid et al., submitted]. As proposed by Zhang et al. (33), the teneurin-TCAP system likely evolved from a polymorphic proteinaceous toxin (PPT) gene acquired through an HGT event from a prokaryote to a choanoflagellate, a unicellular organism. Importantly, the teneurin gene has been identified in the choanoflagellate Monosiga brevicollis (32). Choanoflagellates are thought to be the closest living relatives of the Metazoa (42). This supports the hypothesis that a choanoflagellate ancestor may have engulfed a prokaryote containing the PPT gene, which became integrated into its genome and lost its toxic role over time (32,33). With respect to structural evidence, the teneurins share characteristics of PPTs: the same type II orientation, rearrangement hotspot (RHS) domains, and a C-terminal domain closely similar to the histidine-asparagine-histidine (HNH) bacterial toxins of the glycine-histidine-histidine (GHH) clade (33,43). The GHH domain may be an ancestor of TCAP that lost its toxic role and came to function as an intracellular signaling molecule (33). Additionally, the C-terminal region of the M. brevicollis teneurin protein contains tyrosine-aspartate (YD) repeats characteristic of proteobacteria, and most of its extracellular domain is encoded on a single large 6,829-base-pair exon, a feature characteristic of prokaryotic genomes and of HGT (32). Therefore, the evidence suggests that the teneurin-TCAP system is ancient, having evolved through an HGT event prior to the emergence of the Metazoa.
Moreover, with respect to the evolution of the receptors, evidence demonstrates that Adhesion GPCRs evolved prior to Secretin GPCRs and that Secretin GPCRs are derived from Adhesion GPCRs. Adhesion GPCR genes have been identified in the genomes of the amphioxus Branchiostoma floridae, the choanoflagellate M. brevicollis, and the sea anemone Nematostella vectensis (18), indicating that this lineage was present prior to the protostome-deuterostome divergence. In contrast, Secretin GPCRs have not been identified in these species; receptor lineages of the Secretin superfamily therefore likely expanded and radiated around the time of the bifurcation of protostomes and deuterostomes. Nordström et al. (18) also demonstrated, using phylogenetic analysis, that Secretin GPCRs evolved from Adhesion GPCRs. Taken together, the evidence that the teneurin-TCAP system arose prior to the emergence of the Metazoa, and the characterization of Adhesion GPCRs, but not Secretin GPCRs, prior to the protostome-deuterostome divergence, suggest that the teneurin-TCAP system predates members of the Secretin superfamily. We suggest that if the ligands for these receptors underwent a similar course of evolution, the TCAP family may be a putative progenitor of the Secretin superfamily.
In light of the evidence that the teneurin-TCAP system evolved prior to the emergence of the Metazoa, the previously established finding that Secretin GPCRs derived from Adhesion GPCRs [Nordström et al., 2009; (19)], the evidence that TCAP binds to LPHN, an Adhesion GPCR with an HBD characteristic of Secretin GPCRs (17), and the sequence similarity that TCAP shares with the Secretin superfamily members CRF and calcitonin (27), we undertook a phylogenetic investigation using TCAP as a putative progenitor of the Secretin superfamily. A putative progenitor of the Secretin superfamily of ligands has not previously been established. Sequence analysis of TCAP family members demonstrated a highly conserved peptide, and phylogenetic analysis of the Secretin superfamily in relation to TCAP as a putative progenitor revealed relationships among Secretin superfamily members. The calcitonin and insulin families are sister lineages and are much more closely related to one another than was previously thought. Together, calcitonin and insulin form lineages distinct from the CRF and secretin families. Therefore, using TCAP as a putative ancestor of the Secretin superfamily allowed a novel interpretation of the evolutionary relationships among Secretin superfamily members.

Sequence Analysis of TCAP Paralogs and Orthologs
Sequence analysis of both TCAP paralogs and orthologs revealed that this family of peptides is highly conserved. The presence of a conserved "PELAD" motif among TCAP orthologs and paralogs suggests that it may possess a functional attribute, such as a receptor-binding or activation site (27). Some characteristic amino acids are also retained throughout orthologs and paralogs. Arginine (R) and lysine (K) residues are retained in some parts of the mature peptide, and these residues often mark cleavage sites. Glycine (G) and proline (P) are also highly conserved; both residues can disrupt the α-helical secondary structure of peptides, which may explain their retention. Such a high degree of conservation indicates great functional importance that may have been selected for. Therefore, the high sequence conservation among TCAP orthologs and paralogs suggests that this peptide system is evolutionarily ancient and may have been strongly selected for throughout evolutionary time.
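Checking whether a motif such as "PELAD" is retained across orthologs and paralogs amounts to a simple pattern search. The Python sketch below is a hedged illustration, not the procedure used in this study. It treats "X" in a motif as a wildcard (so the calcitonin "TCXV" motif becomes the regular expression TC.V), removes alignment gaps before searching, and reports 0-based start positions per sequence; the function names and toy sequences are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif in which 'X' stands for any single residue."""
    return re.compile("".join("." if ch == "X" else re.escape(ch) for ch in motif))


def motif_positions(sequences: dict[str, str], motif: str) -> dict[str, list[int]]:
    """Return 0-based start positions of `motif` in each sequence after removing
    alignment gaps ('-'); an empty list means the motif is absent.
    Overlapping matches are not reported."""
    pattern = motif_to_regex(motif)
    return {
        name: [m.start() for m in pattern.finditer(seq.replace("-", ""))]
        for name, seq in sequences.items()
    }


def fraction_with_motif(sequences: dict[str, str], motif: str) -> float:
    """Fraction of sequences in which the motif occurs at least once."""
    hits = motif_positions(sequences, motif)
    if not hits:
        raise ValueError("no sequences supplied")
    return sum(bool(positions) for positions in hits.values()) / len(hits)


if __name__ == "__main__":
    # Toy sequences invented for illustration (not real TCAP or calcitonin).
    toy = {"t1": "GLPELADKR", "t2": "AA-PELADGG", "t3": "AAPELSDGG"}
    print(motif_positions(toy, "PELAD"))  # {'t1': [2], 't2': [2], 't3': []}
    print(motif_positions({"c": "AATCGVAA"}, "TCXV"))  # {'c': [2]}
```

Because the search runs on ungapped sequences, it finds the motif even where the alignment has inserted gaps inside it; a stricter check that the motif occupies the same aligned columns in every sequence would need to work on the gapped coordinates instead.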

Evolutionary Analysis of Pre-propeptides and Mature Peptides of Secretin Superfamily and TCAP Family Members
Phylogenetic analysis of Secretin superfamily pre-propeptides (composed of the signal, cryptic, and mature peptides) and TCAP family propeptides (composed of the cryptic and mature peptides) was undertaken to elucidate the relationships among these peptides. The analysis revealed that the calcitonin, CRF, secretin, and TCAP families formed distinct groups. Although insulin was chosen as a reference group because it binds to a tyrosine kinase receptor rather than a GPCR, it formed a group with calcitonin, suggesting that the two may be sister lineages (Figure 3). A close relationship between calcitonin and insulin has been explored previously: Wimalawansa (44) suggested that the insulin and calcitonin families are closely related. Our phylogenetic analysis of the pre-propeptides supports this and suggests that insulin and calcitonin are sister lineages. When the tree was rooted with TCAP (Figure 3), reflecting the assumption that TCAP is ancestral, the CRF, calcitonin, and secretin families formed distinct groups. This evolutionary analysis suggests that the secretin family forms a separate clade that is sister to the CRF and calcitonin families, which, in turn, are sisters to one another. This is consistent with what has been observed for Secretin GPCR evolution, in which the CRF and calcitonin receptors share the greatest sequence similarity among Secretin GPCRs (17). It is therefore possible that a similar evolutionary scheme occurred for the ligands. Thus, the analysis of Secretin superfamily pre-propeptides with TCAP propeptides suggests that insulin and calcitonin are closely related sister lineages, that the calcitonin-insulin and CRF lineages are closely related, and that calcitonin-insulin and CRF together form a lineage that is sister to the secretin family. Phylogenetic analysis was subsequently performed with the mature peptides of Secretin superfamily members and the TCAP family.
The analysis of TCAP family mature peptide sequences with calcitonin and insulin mature sequences (Figure 4) demonstrated that insulin A chains were closely related to mature calcitonin peptides. This suggests that the insulin A mature chain is more closely related to the calcitonin family than the insulin B mature chain is, which differs from what was previously suggested by Wimalawansa (44). Subsequent analyses involving CRF, calcitonin, insulin, and TCAP mature peptides (Figure 5), as well as secretin, CRF, calcitonin, insulin, and TCAP mature peptides (Figure 6), confirmed that the insulin A chain was more closely related to the calcitonin family than the insulin B chain. Taken together, insulin and calcitonin are closely related sister groups, as was also observed in the pre-propeptide analysis (Figure 3). Moreover, with respect to relationships among Secretin superfamily members, the calcitonin-insulin and CRF families are more closely related to one another than they are to secretin or TCAP, a pattern supported by the evolutionary scheme of their receptors, which also appear to be very closely related. Finally, secretin forms a sister lineage to a lineage that comprises both the calcitonin-insulin and CRF families, consistent with the analysis of the pre-propeptides (Figure 3).
Considering the evidence regarding the ancestral origin of the teneurin-TCAP system, and in light of the findings presented here, two hypotheses can be proposed for the evolutionary scheme of these peptides. The first is that an ancient TCAP-like peptide was the ancestor of the Secretin superfamily and evolved prior to the emergence of the CRF, calcitonin, and secretin families. This is supported by the identification of TCAP in lineages that branched off before the protostome-deuterostome divergence, whereas members of the Secretin superfamily have not been identified this early in evolution (31,32,34). The second hypothesis, which cannot be discounted, is that the Secretin superfamily forms a lineage parallel to extant TCAP and that both lineages evolved from a proto-CRF-calcitonin-secretin-TCAP ancestor related to all of these families. Because only extant Secretin superfamily and TCAP sequences were available for the phylogenetic analysis, both hypotheses remain plausible. Future analyses should be undertaken to further investigate whether TCAP is a progenitor of the Secretin superfamily of ligands.

CONCLUSIONS
Taken together, the phylogenetic analysis of members of the Secretin superfamily using TCAP as a putative progenitor demonstrated several relationships among Secretin superfamily members. First, calcitonin formed a closely related sister lineage to insulin in both the pre-propeptide and the mature peptide analyses; among the mature peptides, calcitonin was most closely related to the insulin A chain. Second, the calcitonin-insulin and CRF families are more closely related to one another than they are to secretin or TCAP, which is supported by the evolutionary scheme of their receptors. Finally, secretin forms a sister lineage to a group that comprises both calcitonin-insulin and CRF. Given the evidence that the teneurin-TCAP system arose through an HGT event prior to the emergence of the Metazoa, as well as the previously established structural similarity of TCAP to the Secretin superfamily members calcitonin and CRF, the present phylogenetic analysis allowed the relationships among members of the Secretin superfamily to be elucidated. To conclude, this is the first time that the relationships among this family of peptides have been resolved, and because a progenitor peptide for the Secretin superfamily has not been identified, we present TCAP as a candidate progenitor.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.