Posttranslational Modifications in Conserved Transcription Factors: A Survey of the TALE-Homeodomain Superclass in Human and Mouse

Transcription factors (TFs) guide effector proteins like chromatin-modifying or -remodeling enzymes to distinct sites in the genome and thereby fulfill important early steps in translating the genome’s sequence information into the production of proteins or functional RNAs. TFs of the same family are often highly conserved in evolution, raising the question of how proteins with seemingly similar structure and DNA-binding properties can exert physiologically distinct functions or respond to context-specific extracellular cues. A good example is the TALE superclass of homeodomain-containing proteins. All TALE-homeodomain proteins share a characteristic, 63-amino acid long homeodomain and bind to similar sequence motifs. Yet, they frequently fulfill non-redundant functions even in domains of co-expression and are subject to regulation by different signaling pathways. Here we provide an overview of posttranslational modifications that are associated with murine and human TALE-homeodomain proteins and discuss their possible importance for the biology of these TFs.


INTRODUCTION
Transcription factors (TFs) recognize specific DNA sequences, often depending on DNA shape or methylation status, to control the local assembly of larger protein complexes that induce the transcriptional activation or repression of nearby genes. TFs are thus vital to determining which gene product is produced when, where, in which quantities, and in response to what external signal(s). In humans, these multifaceted tasks are performed by an estimated ∼1,600 different TFs (Lambert et al., 2018). Although this seems like an impressive repertoire, TFs use a limited number of DNA binding domain (DBD) types, with most metazoan TFs belonging to the C2H2 zinc-finger-, homeodomain (HD)-, basic helix-loop-helix (bHLH)-, basic leucine zipper-, forkhead-, nuclear hormone receptor-, or high-mobility group (HMG)/SRY-related HMG-box (SOX)-superclasses. DBD types are highly variable across classes but very similar in TFs belonging to the same class. Evolutionarily related TFs often also share extensive sequence similarity outside of the DBD. This raises the conundrum of how physiologically distinct functions may be carried out by proteins that possess the same overall structure and, at least in vitro, nearly identical DNA-binding properties.
TFs almost always function as ensembles, consistent with the concept that the composition of the multiprotein complex dictates the affinity and specificity of DNA binding (Slattery et al., 2011; Bridoux et al., 2020). The ability of a TF to interact with DNA or with other proteins depends on the biochemical properties of the amino acids involved in binding, which in turn can be profoundly altered by the attachment of additional chemical moieties in a process known as posttranslational modification (PTM). Consequently, the type of binding partners a TF assembles with, the sequence motif recognized by the complex, and the strength of interaction with this motif are sensitive to PTMs (Filtz et al., 2014; Draime et al., 2018). These are important features for any TF, because the composition of transcriptional multiprotein complexes determines the cellular and physiological context in which the TF acts, while recognition of motif variations can lead to high- or low-affinity DNA binding, which in turn may result in dynamic gene expression levels (Crocker et al., 2016). In this minireview, we manually surveyed high-throughput proteomics studies, published in peer-reviewed journals or deposited to open-source platforms, to compile PTMs that were recorded in TALE-HD TFs isolated from various murine and human sources. Comparing these PTMs between paralogous and orthologous proteins revealed general principles by which PTMs may shape the activity of individual members of conserved TF protein families.

TALE-HD PROTEINS
Three amino acid loop extension-homeodomain (TALE-HD) TFs are evolutionarily highly conserved and found in single-cell eukaryotes (e.g., Mata1/Mata2 in yeast), plants (e.g., KNOX and BELL), and animals (see below; Mukherjee and Bürglin, 2007). The TALE-HD differs from the canonical, 60 amino acid-long HD by the insertion of three extra residues between helix 1 and helix 2 of the HD. This motif, known as the TALE-motif, forms a hydrophobic pocket to mediate protein-protein interactions (Figure 1A; Bürglin, 1997; Piper et al., 1999; LaRonde-LeBlanc and Wolberger, 2003; Mukherjee and Bürglin, 2007). For this feature, TALE-HD proteins have been classified as "atypical" HD proteins. In animals, they have been grouped into five classes, PBC, MEINOX, TGIF, IRO and MKX, based on the sequence of the HD itself and conserved, class-specific motifs flanking the HD (Figure 1B). The developmental functions of individual TALE-HD genes and the defects associated with their mutation in animal models or in human diseases have been covered by a series of excellent recent reviews and will therefore not be discussed in detail (Blasi et al., 2017; Schulte and Geerts, 2019; Selleri et al., 2019). Instead, we here provide an overview of the different PTMs detected in mouse and human TALE-HD TFs and explore how such PTMs may help to convey functional specificity among these structurally similar proteins.

More Distantly Related TALE-HD Proteins: TGIF-, IRO- and MKX-Classes
Tgif1 and Tgif2 (Transforming growth factor beta (TGF-β)-induced factor/TG-interacting factor) are phylogenetically most closely related to the MEINOX class (Mukherjee and Bürglin, 2007). They carry a distinct variation of the TALE-motif, AYP, instead of the PYP found in all other TALE-HD proteins (Figure 1A) as well as two short sequence motifs C-terminal to the HD (Figure 1B). TGIF proteins are transcriptional repressors that have been implicated in the regulation of various signaling pathways, most prominently TGF-β and retinoic acid signaling (Bertolino et al., 1995; Wotton et al., 1999; Shen and Walsh, 2005; Guca et al., 2018). Loss-of-function phenotypes for Tgif1 in mice are strain-dependent and range from no overt defect to holoprosencephaly, a brain malformation that has also been linked to TGIF1 mutations in humans (Kuang et al., 2006; Taniguchi et al., 2012). Constituting another TALE-HD class, the six mammalian Irx genes, which take their names from the Iroquois complex in D. melanogaster, are located in two paralogous clusters in the genome and are characterized by a bipartite IRO-box C-terminal to the HD (Figure 1B; Peters et al., 2000; Mukherjee and Bürglin, 2007). Loss-of-function models in mice were generated for all six Irx genes and established that Irx3, -4 and -5 are important transcriptional regulators in the developing and adult heart, that Irx1 controls lung and tooth development, and that Irx5 and -6 participate in retina development (Bruneau et al., 2001; Costantini et al., 2005; Zhang et al., 2011; Gaborit et al., 2012; Star et al., 2012; Yu et al., 2017). Finally, the single gene Mohawk (Mkx, also known as iroquois homeobox protein-like 1), most closely related to IRX but recognized as a separate class, plays a prominent role in tendon development (Mukherjee and Bürglin, 2007; Ito et al., 2010).
In short, members of the same class of TALE-HD proteins share a high degree of sequence similarity, are frequently co-expressed, and functionally cooperate in some physiological contexts but fulfill unique developmental functions in others.

PTMS IN TALE-HD PROTEINS
We manually surveyed 26 high-resolution and/or quantitative mass-spectrometry analyses, as well as data deposited in the open-source platform PhosphoSitePlus®, to compile PTMs that had been detected in mouse or human TALE-HD proteins (Table 1). Although this information is freely available in the supporting information of the respective publications, it had not been systematically assessed, nor had the data been compared among studies or between protein groups. We limited our search to the three PTMs that were most frequently detected in these studies: phosphorylation, lysine-ubiquitination and arginine-methylation. This search identified a total of 187 distinct phosphorylation sites, 11 ubiquitinated and 3 methylated residues. Many of these PTMs were detected in various physiological contexts and across species, suggesting that common regulatory mechanisms apply. Arginine-methylation and lysine-ubiquitination in particular occurred almost exclusively at amino acids that were highly conserved among paralogs, indicating that significant evolutionary pressure may act on these residues (Figures 1C,E,F). The amino acid arginine forms more hydrogen bonds with protein or DNA than any other amino acid, with particularly strong bonds formed with guanine bases and the DNA phosphate backbone (Luscombe et al., 2001). Arginine residues are therefore important to stabilize the intra- and intermolecular interaction of amino acids in proteins and multiprotein complexes as well as the contact of proteins to DNA (Luscombe et al., 2001; Bedford and Clarke, 2009; Lorton and Shechter, 2019). Consequently, methylation of arginine residues in TFs can profoundly alter their function. In fact, although the significance of arginine-methylation in hPBX2 and hMEIS1 is still unknown, methylation of R174 in mMEIS2 controls nucleo-cytoplasmic translocation (Kolb et al., 2018).
In ubiquitination, the 76-amino acid protein ubiquitin is covalently attached to lysine residues of protein substrates. Ubiquitination generates conjugates that widely differ in structure, size, composition, and function (Pickart, 2001). The many ways by which lysine-ubiquitination impacts gene expression include modification of histone tails and the subsequent change in chromatin structure, and the ubiquitin-guided partial processing or full degradation of TFs (Rape, 2018). The presence of several highly conserved ubiquitination sites in TALE-HD proteins argues for important regulatory roles, although it is presently unexplored what type(s) of ubiquitin modification TALE-HD proteins carry (e.g., monomeric, polymeric, linear, branched, carrying additional PTMs or not), whether ubiquitin-conjugation targets TALE-HD proteins for degradation, and what the cellular consequences of TALE-HD protein ubiquitination are.
Compared to arginine-methylation and lysine-ubiquitination, protein phosphorylation emerges as a more widespread and diverse type of PTM in TALE-HD proteins. Protein phosphorylation, the covalent attachment of phosphate groups to serine, threonine, or tyrosine residues, acts within milliseconds to seconds to control protein function primarily by two mechanisms: it locally changes the electrochemical properties of a protein and thereby its conformation, and it creates docking sites for intermolecular protein interactions, which in turn can propagate cellular signals or create recognition sites for other post-translationally modifying enzymes that catalyze the deposition of further PTMs nearby (Filtz et al., 2014). Phosphorylation of TFs can thereby increase or decrease protein stability, control nuclear import or export, alter the secondary structure of the TF to expose or hide its DBD, and modify the DBD's affinity to distinct sequences in the DNA, resulting in high-affinity or low-affinity binding (Filtz et al., 2014). In TALE-HD proteins, phosphosites often cluster together, frequently in regions N-terminal or C-terminal to the HD (Figures 1C-F). For instance, several studies identified phosphorylated serine, threonine, and tyrosine residues in PBX family proteins just C-terminal to the TALE-HD (Figure 1C).

PTMS, A WAY TO GENERATE FUNCTIONAL DIVERSITY?
Although the physiological relevance of these phosphorylation events and the signaling pathways that induce them remain to be elucidated, it is worth pointing out that none of these phosphosites are conserved in MEIS3, PREP1, or PREP2 (Figure 1E). Similarly, most of the phosphorylated amino acids that were detected in TGIF1 are not conserved in TGIF2, and vice versa (Figure 1F). Whether or not paralogous TALE-HD proteins are subject to regulation by shared kinase pathways thus appears to be dictated by the substitution of a few key residues. It should be noted, however, that phosphorylation is a dynamic process in which phosphorylation and dephosphorylation may alternate in rather rapid cycles (Gelens and Saurin, 2018). Phosphoproteomic data hence only reflect a snapshot of a transient phosphorylation state. A lack of evidence in the literature for a specific phosphorylation event may thus simply reflect a failure to detect it at a specific moment and in that specific cellular context.
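The paralog comparison described above comes down to asking which residue each related protein carries at the alignment column of a modified site. The following is a minimal sketch of that check, not the procedure used for this survey. The function names, the paralog labels, and the aligned fragments are all hypothetical and serve only as illustration; they are not real TALE-HD sequences.

```python
def site_to_column(aligned_seq, site):
    """Map a 1-based residue position to a 0-based alignment column,
    skipping gap characters ('-')."""
    count = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            count += 1
            if count == site:
                return col
    raise ValueError("site lies beyond the end of the sequence")


def residues_at_site(alignment, reference, site):
    """Return, for each aligned protein, the residue (or gap) found in the
    column that holds the modified residue of the reference protein."""
    col = site_to_column(alignment[reference], site)
    return {name: seq[col] for name, seq in alignment.items()}


# Toy, made-up aligned fragments (assumption: NOT real protein sequences).
toy_alignment = {
    "paralog_A": "MK-SPTRA",
    "paralog_B": "MKQSPTRA",
    "paralog_C": "MK-APTRA",
}

# Hypothetical phosphoserine at residue 3 of paralog_A.
residues = residues_at_site(toy_alignment, "paralog_A", 3)

# A site can only be phosphorylated in a paralog that retains S, T, or Y there.
phosphorylatable = {name: aa in "STY" for name, aa in residues.items()}
```

In this toy case the serine is retained in paralog_B but replaced by alanine in paralog_C, the kind of single-residue substitution that would remove a potential kinase target site from one paralog only.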
Taken together, we here compiled a broad collection of PTMs in TALE-HD proteins that had been identified in unbiased, high-resolution mass-spectrometry analyses (Table 1). Few of these PTMs have been assigned a physiological function. Yet, by taking the evolutionary conservation of modification sites into account, we identified both class-specific and paralog-specific PTMs. Comparing these suggests how the combinatorial use of such PTMs may generate functional diversity from evolutionarily conserved protein structures. Specifically, we propose that the vast repertoire of PTMs, shared or not, in paralogous and orthologous TALE-HD proteins forms the structural backbone by which individual proteins can acquire the ability to respond to context-specific extracellular signals and exert physiologically diverse functions. Although explored here only by the example of the TALE-HD superclass, similar principles may very well also apply to other evolutionarily conserved TFs. Assays based on mutational approaches now need to be developed to test these PTMs alone and in combination for their functionality and physiological relevance. Ultimately, such information can pave the way for future studies, help unravel disease processes and facilitate rational drug design.