Fairy “tails”: flexibility and function of intrinsically disordered extensions in the photosynthetic world

Intrinsically Disordered Proteins (IDPs), or protein fragments also called Intrinsically Disordered Regions (IDRs), display high flexibility as the result of their amino acid composition. They can adopt multiple roles. In globular proteins, IDRs are usually found as loops and linkers between secondary structure elements. However, not all disordered fragments are loops: some proteins bear an intrinsically disordered extension at their C- or N-terminus, and this flexibility can affect the protein as a whole. In this review, we focus on the disordered N- and C-terminal extensions of globular proteins from photosynthetic organisms. Using the examples of the A2B2-GAPDH and the α Rubisco activase isoform, we show that intrinsically disordered extensions can help regulate their “host” protein in response to changes in light, thereby participating in photosynthesis regulation. As IDPs are famous for their large number of protein partners, we used the examples of the NAC, bZIP, TCP, and GRAS transcription factor families to illustrate the fact that intrinsically disordered extremities can allow a protein to have an increased number of partners, which directly affects its regulation. Finally, for proteins from the cryptochrome light receptor family, we describe how a new role for the photolyase proteins may emerge by the addition of an intrinsically disordered extension, while still allowing the protein to absorb blue light. This review has highlighted the diverse repercussions of the disordered extension on the regulation and function of their host protein and outlined possible future research avenues.


Introduction
Proteins occupy a central position in the architecture and functioning of living matter. A major objective of protein biochemistry is to explain the physiological functions of these molecules by means of structural studies, an approach known as the "structure-function" relationship. Among other techniques, X-ray crystallography is a powerful tool for solving macromolecular three-dimensional (3D) structures. However, some proteins cannot be crystallized because they are fully disordered or possess disordered parts that are missing in the electron density map of the crystals. In the 1990s, Sedzik and Kirschner (1992) attempted to crystallize the myelin basic protein (MBP), the predominant extrinsic protein in both central and peripheral nervous system myelins. After several attempts, the authors concluded that MBP adopts a random coil conformation and that as long as its flexibility was not suppressed, it was not possible to obtain crystals (Sedzik and Kirschner, 1992). MBP was one of the first examples of many other "un-crystallizable" proteins. These proteins, originally named Intrinsically Unstructured Proteins (IUPs), are nowadays termed Intrinsically Disordered Proteins (IDPs) (Wright and Dyson, 1999; Dunker et al., 2001, 2008a). In 1998, Romero et al. showed that 15,000 proteins from the Swiss-Prot database contain one or more Intrinsically Disordered Regions (IDRs) comprising more than 40 amino acid residues (Romero et al., 1998). It was shown later that despite their lack of a well-defined 3D structure, many partially or completely disordered proteins are functional (Wright and Dyson, 1999; Dunker et al., 2001, 2008a; Tompa, 2002). In the late 1990s, studies of disordered yet functional proteins emerged as a new research field, extending the traditional paradigm to include a more comprehensive view of protein structure-function (Wright and Dyson, 1999; Dunker et al., 2001; Tompa, 2002; Dunker et al., 2008a,b).
In the past, different models have been proposed to explain protein functioning, and protein flexibility has appeared as a key point (Fersht, 1998). Among these models, the "induced-fit" model (Koshland et al., 1966) introduced the idea that protein conformational changes could be triggered upon ligand binding. These notions were applied to IDPs, and many of them were shown to undergo an "induced-folding" upon binding to their partners (Dunker et al., 2002). Short motifs called MoREs (Molecular Recognition Elements) are often involved in the interaction and undergo disorder-to-order transitions (Fuxreiter et al., 2004, 2007; Oldfield et al., 2005; Mohan et al., 2006; Vacic et al., 2007; Hazy and Tompa, 2009). However, the idea that preformed binding elements exist before binding, even in the absence of a partner, led to the "conformational selection" model. In some cases, the IDP is not fully structured in the presence of its partner, and the term "fuzziness" was coined by Fuxreiter and Tompa to describe such complexes (Tompa and Fuxreiter, 2008; Hazy and Tompa, 2009; Fuxreiter and Tompa, 2012). The flexibility of IDPs increases the chance of their polypeptide chains adopting the right conformations in the presence of their partners. Furthermore, the high ratio of hydrophilic residues in IDPs facilitates initial contacts with their partners. The interactions are also stronger with IDPs, as their lack of structure or rigidity increases association constants (Dunker et al., 2001; Meszaros et al., 2007; Chouard, 2011). The ability of IDPs to adopt multiple conformations allows the same region to adapt to different binding sites in many "induced-fits" and thus to have multiple partners (Uversky et al., 2000; Tompa, 2002; Uversky, 2002, 2011; Meszaros et al., 2007; Carmo-Silva and Salvucci, 2013).
The discovery of IDPs and their singular lack of definite structure brought nuances to the "structure-function" dogma, showing that the same structure, or lack of one, could have multiple partners and thus multiple functions (Wright and Dyson, 1999; Dunker et al., 2001; Tompa, 2002; Uversky, 2002, 2011; Meszaros et al., 2007; Sun et al., 2013). In this regard, IDPs could be seen as the "master keys" of the protein-protein interaction network.
The ability of IDPs to bind to multiple partners makes them naturally good regulators, as they can modulate the activity of several proteins in a coordinated way (Dunker et al., 2001, 2002; Gavin et al., 2002, 2006; Haynes et al., 2006; Patil and Nakamura, 2006; Mittag et al., 2010; Uversky, 2010; Pancsa and Tompa, 2012). Therefore, multiple areas of the cellular response can be affected by a single signal, allowing IDPs to play a major role in regulatory pathways (Tompa, 2002; Dunker et al., 2005; Haynes et al., 2006; Uversky, 2010; Pancsa and Tompa, 2012). The flexibility of IDPs can also be modified by the cellular environment or by post-translational modifications. IDPs and IDRs are often targets of different post-translational modifications (the most common being phosphorylation, methylation and ubiquitination), which can radically affect their affinity for their partners and their stability, thus multiplying the possibilities for fine-tuned regulation (Tompa, 2002; Dunker et al., 2005; Haynes et al., 2006; Uversky, 2010; Pancsa and Tompa, 2012). These particularities make IDPs the hubs in a vast net of protein-protein interactions (Gavin et al., 2002, 2006). They carry out basic functions such as regulation of metabolic pathways, transcription, translation or cellular signal transduction; they can act as scavengers of toxic molecules; and they play a key role in the assembly of multiprotein complexes (Uversky, 2011). Moreover, their roles in several diseases of major medical interest, such as cancer (Castillo et al., 2014; Saha et al., 2014; Xue et al., 2014), Alzheimer's disease (Uversky, 2009; Kovacech and Novak, 2010; Salminen et al., 2011; Karagoz and Rudiger, 2015), prion disease (Tompa, 2009; Uversky, 2009; Breydo and Uversky, 2011) or Parkinson's disease (Uversky, 2009; Hazy et al., 2011; Breydo et al., 2012; Alderson and Markley, 2013), have been extensively studied (Babu et al., 2011).
While the discovery and characterization of IDPs and IDRs is a rapidly growing and increasingly recognized area of protein science (Tompa, 2002; Uversky, 2010; Uversky and Dunker, 2010; Chouard, 2011), little information is available on photosynthetic organisms, where IDPs have been described as central players in many responses such as biotic and abiotic stress, development, metabolism regulation, or adaptation to an oxic atmosphere (Kragelund et al., 2012; Pancsa and Tompa, 2012; Yruela and Contreras-Moreira, 2012, 2013; Pietrosemoli et al., 2013; Sun et al., 2013; Panda and Ghosh, 2014). Published data mainly concern Arabidopsis thaliana, a higher plant model with one of the best-annotated sequenced genomes (Arabidopsis Genome Initiative, 2000). Yet, the recent analysis of 12 plant genomes revealed that the occurrence of disorder in plants is similar to that found in other eukaryotes (Bracken et al., 1999; Yruela and Contreras-Moreira, 2012, 2013; Sun et al., 2013). An in silico analysis of plant nuclear proteomes suggested a higher disorder in the internal part of nuclear-encoded plant proteins rather than at their extremities, in contrast to the chloroplast- and mitochondrion-encoded proteomes (Yruela and Contreras-Moreira, 2012). This is also supported by studies on prokaryotes showing that IDRs may be more frequent at the extremities of proteins that act as "molecular shields", such as chaperones (Krisko et al., 2010; Chakrabortee et al., 2012).
In this review, we describe several globular proteins with N- or C-terminal IDR extensions in photosynthetic organisms, as opposed to entirely disordered proteins or globular proteins containing one or more IDRs in the middle of their sequences. The aim of this work is not to give an exhaustive list of the roles undertaken by such disordered extensions, as this has recently been reviewed (Uversky, 2013). Instead, we focus on globular proteins or domains that acquired their disordered tails during evolution, using examples from photosynthetic organisms. The addition of a disordered extension to a globular protein created new regulation opportunities, making these proteins responsive to environmental factors through self-regulation, post-translational modifications or new protein-protein interactions. We illustrate the impact of disordered extensions by describing proteins involved in photosynthetic metabolism and regulation of gene expression (Table 1).
Frontiers in Molecular Biosciences | www.frontiersin.org

... one function in a single polypeptide chain and were classified as multifunctional proteins (Kirschner and Bisswanger, 1976). In many cases, the dual function resulted from the fusion of two genes that initially encoded different proteins. Later on, the term "moonlighting" (Jeffery, 1999) categorized proteins that have different functions. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a well-known moonlighting enzyme and has at least ten distinct, confirmed non-enzymatic activities apart from its enzymatic function (Sirover, 1999, 2011; Hildebrandt et al., 2015). The GAPDHs constitute a large and diverse family of dehydrogenases universally represented in living organisms. They catalyze the reductive dephosphorylation of 1,3-bisphosphoglyceric acid (BPGA), producing glyceraldehyde-3-phosphate (GAP) and inorganic phosphate using NAD(P)H as a cofactor (Trost et al., 2006). Glycolytic GAPDHs (also named GapC) are NAD-specific and mainly found in the cytosol, but in land plants a second type of glycolytic GAPDH named GapCp is targeted to the chloroplast (Petersen et al., 2003; Marri et al., 2005; Munoz-Bertomeu et al., 2010). Both GapC and GapCp are NAD-specific and form homotetramers in vivo that are not subject to complex regulatory mechanisms. However, in photosynthetic organisms, another GAPDH, which catalyzes the unique reductive step of the Benson-Calvin cycle, is present and uses both NADH and NADPH with a marked preference for NADPH (Falini et al., 2003). Like all GAPDHs, the NADPH-dependent GAPDH is made up of two functional domains, one corresponding to the catalytic domain (residues 148-313 in spinach GAPDH) and the other one being the cofactor-binding domain, or Rossmann fold (residues 1-147 and 313-334). The latter contains a flexible, arginine-rich region called the S-loop (Fermani et al., 2001).
In higher plants, this GAPDH exists in different forms such as a heterotetramer made up of two GapA and two GapB subunits (A2B2), a homotetramer made up of four GapA subunits, and a hexadecamer (A8B8) (Baalmann et al., 1994, 1995; Scheibe et al., 2002; Howard et al., 2011a,b). The GapB subunit is similar to the GapA subunit but bears a C-terminal extension which has a regulatory function (Cerff, 1979; Brinkmann et al., 1989; Baalmann et al., 1996; Li and Anderson, 1997; Scagliarini et al., 1998; Sparla et al., 2002) (Figure 1A). This subunit is thought to derive from a gene duplication ...

Figure 1. (A) ... (Robbens et al., 2007) subunit compared to the GapA and CP12 proteins. The C-terminal extension of GapB is homologous to the C-terminal part of CP12 and presents two regulatory cysteine residues. (B) Schematic representation of the autonomous redox regulation of the A2B2-GAPDH. When oxidized, the C-terminal extension of the GapB subunit presents a disulfide bridge, which places the C-terminal amino acids inside the active site of the enzyme, resulting in its inhibition. The disulfide bridge can be reduced by thioredoxin f (TRX), and the enzyme becomes active. (C) Schematic representation of the redox regulation of the A4-GAPDH by CP12. When oxidized, the C-terminal part of the CP12 protein presents a disulfide bridge, which places its C-terminal amino acids inside the active site of GAPDH, resulting in its inhibition. The disulfide bridge can be reduced by thioredoxin f (TRX) or DTT, and the enzyme becomes active.
CP12 is a protein of about 80 amino acid residues that was originally described by Pohlmeyer et al. (1996) and has been found in most photosynthetic organisms (Groben et al., 2010; Gontero and Maberly, 2012; Gontero and Salvucci, 2014; Lopez-Calcagno et al., 2014). The CP12 proteins show high primary sequence variability, in particular at the N-terminus. However, they share some remarkable common features. CP12 proteins have an amino acid composition poor in order-promoting residues, although they contain cysteine residues (Groben et al., 2010), and behave abnormally in gel electrophoresis and size exclusion chromatography (Gontero and Maberly, 2012), suggesting that they are IDPs. Moreover, recent data from fluorescence correlation spectroscopy (FCS) show that the hydrodynamic radius of CP12 from the green alga Chlamydomonas reinhardtii is large compared to that expected for globular proteins of this molecular mass (Moparthi et al., 2014). The cysteine residues are involved in the formation of disulfide bridges and peptide loops and are found as pairs at the C-terminus and/or at the N-terminus. When oxidized, CP12 proteins may present α helices maintained by the N-terminal disulfide bridge (Graciet et al., 2003a; Gardebien et al., 2006). The algal CP12 is a key component of a supramolecular complex controlling the activity of the Benson-Calvin cycle by bringing together several key enzymes of the cycle, including GAPDH, phosphoribulokinase (PRK) and fructose 1,6-bisphosphate aldolase (FBP aldolase). Within the ternary GAPDH/CP12/PRK complex, both enzymes are strongly inhibited (Lebreton et al., 1997; Graciet et al., 2003a; Erales et al., 2008; Marri et al., 2008). CP12 forms a fuzzy complex with the green alga C. reinhardtii A4-GAPDH, as revealed by EPR studies (Mileo et al., 2013) (see the Minireview by Lebreton et al. in this Research Topic). The ternary complex has also been found in the cyanobacterium Synechococcus elongatus and in the higher plant A. thaliana.
The A4-GAPDH-CP12 sub-complexes from these organisms have been crystallized, but in both complexes the first 50 amino acid residues were not visible in the density map, consistent with a high flexibility of this region in the crystal (Avilan et al., 2000; Matsumura et al., 2011; Fermani et al., 2012). Recently, it was also observed using FCS and FRET (Förster Resonance Energy Transfer) that the flexibility of the algal CP12 is not abolished by its interactions with either GAPDH or PRK (Moparthi et al., 2014, 2015). In the case of the GapB subunit of GAPDH from higher plants, the C-terminal end of the protein (ca 30 residues) is strongly homologous to the C-terminal part of CP12 (Pohlmeyer et al., 1996; Trost et al., 2006; Groben et al., 2010) (Figure 1A). The regulation of the A. thaliana A2B2-GAPDH activity by the C-terminal extension of the GapB subunit is now very well understood: upon oxidation (which happens during the day-night transition), the two cysteine residues of the CP12-like tail form a disulfide bridge that places the C-terminal ultimate glutamate residue (E362) inside the active site (stabilized by electrostatic interactions with an arginine residue, R77, involved in NADP cofactor binding). Consequently, the NADPH cofactor is not able to enter the catalytic site and thus the NADPH-dependent A2B2-GAPDH activity is inhibited (Fermani et al., 2007) (Figure 1B). In contrast, during the night-day transition, the disulfide bridge maintaining the C-terminal extension in the active site is reduced by thioredoxin f, thereby releasing the CP12-like tail and restoring A2B2-GAPDH activity (Figure 1B) (Sparla et al., 2002; Trost et al., 2006; Fermani et al., 2007). This mechanism is very similar to the one observed in C. reinhardtii between the homotetrameric A4-GAPDH and free CP12, where the penultimate glutamate (E79) of CP12 interacts with the arginine residue R82 of A4-GAPDH (Figure 1C) (Trost et al., 2006; Erales et al., 2011; Avilan et al., 2012). The reduction of the GAPDH-CP12 complex by dithiothreitol (DTT) in the alga results in a more active NADPH-GAPDH as a consequence of the rupture of disulfide bridges on CP12. Of interest, DTT in vitro mimics thioredoxins in vivo, and it has been shown that CP12 can be reduced by thioredoxin f in the light (Marri et al., 2009).
In the higher plant A. thaliana and in the green alga C. reinhardtii, the stoichiometry of the oxidized A4-GAPDH-CP12 sub-complex is two CP12 molecules for one A4-GAPDH (Marri et al., 2008; Kaaki et al., 2013), while four CP12 molecules interact with each GAPDH tetramer in the cyanobacterium S. elongatus (Matsumura et al., 2011). When interacting with CP12, A4-GAPDH activity decreased about two-fold (in the case of the C. reinhardtii proteins, the catalytic constant kcat of the free enzyme was 430 ± 17 s⁻¹, and became 251 ± 9 s⁻¹ in the presence of CP12), suggesting that only two of the four active sites were blocked (Graciet et al., 2003b) (Figure 1C). The same observation was made for the A. thaliana A2B2-GAPDH: upon oxidation, the A2B2-GAPDH activity decreased about two-fold (its kcat changed from 59 ± 19 s⁻¹ to 27 ± 10 s⁻¹), although its catalytic constant in the reduced state (kcat = 59 ± 19 s⁻¹) was comparable to that of the free A4-GAPDH (kcat = 61 ± 4 s⁻¹) (Sparla et al., 2004). The regulation of the plant A2B2-GAPDH and of the algal A4-GAPDH-CP12 complex is thus very similar. With the addition of the C-terminal extension within the GapB subunit, the A2B2-GAPDH has become autonomously redox-regulated, a property that was previously provided through interaction with CP12.
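The "two of four active sites" reasoning above can be checked with simple arithmetic. The following is a minimal Python sketch, not the authors' analysis: the function names are hypothetical, only the mean kcat values quoted in the text are used (the reported uncertainties are ignored), and it assumes four equivalent, independent active sites per tetramer, with a blocked site contributing no activity.

```python
# Mean kcat values (s^-1) quoted in the text: (free or reduced enzyme, inhibited enzyme).
KCAT_VALUES = {
    "C. reinhardtii A4-GAPDH +/- CP12": (430.0, 251.0),
    "A. thaliana A2B2-GAPDH reduced/oxidized": (59.0, 27.0),
}


def residual_fraction(kcat_free: float, kcat_inhibited: float) -> float:
    """Fraction of the catalytic constant that remains after inhibition."""
    if kcat_free <= 0:
        raise ValueError("kcat_free must be positive")
    return kcat_inhibited / kcat_free


def equivalent_blocked_sites(kcat_free: float, kcat_inhibited: float,
                             n_sites: int = 4) -> float:
    """Number of fully blocked sites that would explain the loss of activity.

    Assumption (not from the source): all n_sites are equivalent and
    independent, and a blocked site has zero activity.
    """
    return n_sites * (1.0 - residual_fraction(kcat_free, kcat_inhibited))


if __name__ == "__main__":
    for label, (free, inhibited) in KCAT_VALUES.items():
        frac = residual_fraction(free, inhibited)
        blocked = equivalent_blocked_sites(free, inhibited)
        print(f"{label}: residual activity {frac:.2f}, "
              f"about {blocked:.1f} of 4 sites blocked")
```

Under these simplifying assumptions, both data sets round to two blocked sites out of four, which is in line with the interpretation given in the text.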
Although the appearance of the GapB subunit represents an important step in the evolution of the redox control of the Calvin-Benson cycle enzymes, this new autonomous regulation co-exists with the CP12-based one in higher plants (Scheibe et al., 2002), and an A2B2-GAPDH-PRK complex entirely devoid of CP12 has yet to be identified. The presence of CP12 is likely to be required for the assembly of larger supramolecular complexes, and in C. reinhardtii, A4-GAPDH-CP12-PRK was shown to interact with the aldolase (Erales et al., 2008). In this regard, one may wonder how this system will continue to evolve, and whether more enzymes of the Benson-Calvin cycle will also acquire similar CP12-like disordered extensions, possibly meaning that the CP12 protein will become redundant. However, CP12 seems to be a part of numerous other processes in photosynthetic organisms (Singh et al., 2008; Howard et al., 2011a,c; Stanley et al., 2013), so it is unlikely to disappear completely from higher plant genomes in the future.

Rubisco Activase
Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is the enzyme that catalyzes the formation of two molecules of phosphoglyceric acid using one molecule of ribulose 1,5-bisphosphate (RuBP) and one of carbon dioxide (CO2). As the primary CO2 acceptor of most photoautotrophic organisms, Rubisco can represent up to half of soluble proteins in higher plants, and is believed to be the most abundant protein on Earth (Ellis, 1979; Losh et al., 2013; Raven, 2013). Rubisco from most photosynthetic organisms, including plants and cyanobacteria, is a very large protein (550 kDa), composed of large (L, 52 kDa) and small (S, 12 kDa) subunits arranged as an L8S8 hexadecamer. For this enzyme to be active, a lysine residue inside the Rubisco active site (K201 in Nicotiana tabacum) must be carbamylated and bind a Mg2+ ion (Lorimer et al., 1976; Andersson and Backlund, 2008). The addition of the non-catalytic CO2 molecule to the active site is a spontaneous process, but the presence of RuBP or another sugar-phosphate at the active site decreases the carbamylation efficiency of Rubisco and thus its activity (Lorimer et al., 1976; Cleland et al., 1998). The Rubisco activases (RCAs) exhibit ATPase activity and were first characterized for their ability to promote the carbamylation of such RuBP-inhibited Rubisco (Portis et al., 1986). With time, it became clear that the RCAs allowed CO2 to enter the active site of Rubisco by removing the hindering RuBP or its analog, carboxyarabinitol bisphosphate (CABP) (Figure 2A) (Portis et al., 2008). The presence of RCAs allows Rubisco to function at its maximal capacity at sub-optimal CO2 concentrations that would normally not permit carbamylation in vivo (Portis et al., 1986).
In higher plants, RCAs, as expected, are mostly present in the parts of plants involved in photosynthesis (Watillon et al., 1993;Liu et al., 1996), and their expression follows a daily cycle that is regulated by external factors like light and temperature (Martino-Catt and Ort, 1992;Watillon et al., 1993;Liu et al., 1996;To et al., 1999).
In most organisms from the green lineage, two isoforms of RCA are found: an α isoform of 45-46 kDa and a β isoform of 41-43 kDa (Werneke et al., 1989; Rundle and Zielinski, 1991; To et al., 1999; Gontero and Salvucci, 2014). The only difference between the two RCA isoforms is the presence of a short C-terminal extension (ca 30 amino acid residues depending on the species) on the α isoform (Figure 2B). Both the α and β isoforms were found in Arabidopsis thaliana, spinach, and rice, although only one RCA gene is present (Werneke et al., 1988; To et al., 1999); in these species, the presence of the two RCA isoforms is the result of alternative splicing (Werneke et al., 1989; Rundle and Zielinski, 1991; To et al., 1999). On the other hand, other species like barley, cotton, maize and tobacco have multiple RCA genes (Rundle and Zielinski, 1991; Qian and Rodermel, 1993; Salvucci et al., 2003; Yin et al., 2010). In most cases, these organisms have separate genes coding for α and β RCAs without alternative splicing (Rundle and Zielinski, 1991), although all the genes identified in tobacco and cucumber appear to encode only the β isoform (Portis, 2003). To the best of our knowledge, the C-termini of the α and β RCA isoforms have never been tested for intrinsic disorder. Using several disorder predictors, including MeDor (Lieutaud et al., 2008) and MFDp2 (Mizianty et al., 2014), we found that the end of the C-terminal part of both the α and β RCAs (ca 50 residues for the α isoform and 20 residues for the β isoform) is predicted to be intrinsically disordered, including the entire C-terminal extension of the α RCA (Figure 3). The most remarkable features of this disordered tail are the two cysteine residues (C392 and C411 in the A. thaliana protein), which are highly conserved among the α RCA isoforms (Zhang and Portis, 1999).
The crystal structure of the tobacco β RCA was recently solved at 3 Å, showing that RCA proteins are functional doughnut-shaped hexamers displaying an AAA+ fold, as was predicted using other AAA+ proteins (ATPases involved in a multitude of processes; Neuwald et al., 1999) as templates (Portis, 2003). Interestingly, the last 23 residues of the protein were absent from the structure, indicating that this part of the molecule is very flexible and can adopt different conformations. Moreover, the substrate recognition site of the RCA from the creosote bush, Larrea tridentata, was solved at the atomic level (Henderson et al., 2011). Unfortunately, the structural studies were performed only on the β RCA isoform. As the α RCA core is identical, its structure should not be different from that of the β RCA, but the α RCA was shown to form functional αnβn heteromers rather than αn homomers (Crafts-Brandner et al., 1997; Zhang et al., 2001). In the light of these new structural data, we can suppose that α RCA can form α3β3 heterohexamers (Figure 2C). These structural data also show the presence of three loops containing hydrophilic amino acid residues lining the surface of a central pore. Site-directed mutations introduced in this part of the proteins severely diminished Rubisco activation and ATP hydrolysis by the RCA proteins, confirming that this region is implicated in the binding of ATP (Salvucci et al., 1993; Li et al., 2006). Based on this information, the model that has been proposed for RCA interaction with Rubisco includes one face of the flat hexamer interacting with the surface of Rubisco, while an exposed loop of the Rubisco protein could fit into the central hole. Minor conformational changes of this Rubisco loop, allowed by ATP hydrolysis, would then be transmitted to Rubisco, allowing the inhibiting RuBP to be released and Rubisco to be carbamylated.

Figure 3. ... secondary structure elements and the HCA plot are shown above and below the amino acid sequence, respectively. Arrows below the HCA plot correspond to regions of predicted disorder (Lieutaud et al., 2008).
The activity of the α and β RCAs is classically described as dependent on the ATP:ADP ratio (Zhang and Portis, 1999; Carmo-Silva and Salvucci, 2013), and is extremely sensitive to high temperatures (Portis, 2003; Salvucci, 2004). Moreover, the activity of the α RCA is regulated by light (Mächler and Nösberger, 1980; Perchorowicz et al., 1981). This observation is linked to the action of thioredoxin f on the two cysteine residues present on its C-terminal extension (Shen and Ogren, 1992; Zhang and Portis, 1999; Zhang et al., 2001, 2002; Portis, 2003; Wang and Portis, 2006). A site-directed mutagenesis study (Shen and Ogren, 1992) showed that the substitution of only one of the two cysteine residues was enough to abolish the light regulation of α RCA, implicating a disulfide bridge. Several studies showed that the mechanism of inhibition involves the blocking of the ATP-binding region by the C-terminal extension upon oxidation. This self-inhibition would be stabilized by strong electrostatic forces between the negatively charged tail and the positively charged nucleotide site (Shen and Ogren, 1992; Zhang and Portis, 1999; Zhang et al., 2001, 2002; Wang and Portis, 2006; Carmo-Silva and Salvucci, 2013) (Figure 2C). It was also observed that the β RCA, although devoid of regulatory cysteine residues, could be light-regulated in the presence of the α isoform (Zhang and Portis, 1999; Zhang et al., 2001). Under the hypothesis that RCAs form α3β3 heterohexamers, we can assume that the combined bulk of the C-terminal extensions efficiently inhibits the whole complex in dark conditions (Figure 2C). The RCA activity can subsequently be restored by the reduction of the C-terminal disulfide bridge by thioredoxin f, which occurs upon dark-light transitions (Carmo-Silva and Salvucci, 2013).
In this case, the acquisition of a C-terminal tail, originally by alternative splicing, has allowed the RCA protein to fine-tune the activity of Rubisco as a function of light availability in addition to the energetic state of the cell.
In other photosynthetic organisms, the Rubisco activase system is different or works in a different way. In β-cyanobacteria, the RCA protein has the same main domains as plant RCAs, but lacks the N-terminal domain necessary for Rubisco activation found in plants and green algae (Van De Loo and Salvucci, 1996; Li et al., 1999; Stotz et al., 2011; Gontero and Salvucci, 2014; Mueller-Cajar et al., 2014). This could explain why no Rubisco activation has been observed using cyanobacterial RCAs (Li et al., 1999; Pearce, 2006). The latter also possess a very long (180 residues) intrinsically disordered C-terminal extension that seems to target the protein to the carboxysomes (Zarzycki et al., 2013) (Figure 2B). Organisms from the red lineage (α-proteobacteria, rhodophyta, heterokontophyta, etc.) do not have exactly the same Rubisco as the green lineage, and the so-called "Red Rubisco" has a slightly longer large subunit. These organisms do not have RCA genes, but the same Rubisco activase function is carried out by another protein, CbbX (Pearce, 2006; Gontero and Salvucci, 2014) (Figure 2B). The crystal structure of CbbX has recently been solved, showing that this protein is organized in hexamers arranged in a manner very comparable to green RCAs. It was also suggested that the CbbX mechanism is based on the same principles as that of RCA, with the C-terminus of the large Rubisco subunit inserted into the central hole of CbbX. It should be noted that CbbX seems to have an IDR at its C-terminus, but its implications for CbbX activity are yet to be studied.
Rubisco activase is not the only "friendly" protein involved in the regulation of Rubisco, since other proteins are needed during its assembly and folding, including the cpn60 chaperone, which also has a disordered C-terminal tail (Goloubinoff et al., 1989;Cloney et al., 1992;Libich et al., 2013).

Three Transcription Factor Families (NAC, bZIP and TCP)
Disordered regions are ideal for proteins coordinating regulatory events and as such, transcription factors participating in regulation and signaling functions are enriched in IDRs.
The NAC family (named after No Apical Meristem, ATAF, Cup-Shaped Cotyledon) is one of the largest families of plant-specific transcription factors (Ooka et al., 2003; Olsen et al., 2005; Rushton et al., 2008; Sun et al., 2013). These family members are involved in a very large variety of processes, including plant development (Olsen et al., 2005), biotic and abiotic stress responses (Jensen et al., 2010b) and leaf senescence (Kjaersgaard et al., 2011). The NAC transcription factors usually contain two domains: the N-terminal NAC domain and the C-terminal extremity domain (Figure 4A). The NAC domain is mainly conserved and well-ordered, displaying a typical structure comprising α helices flanking one β strand (Ernst et al., 2004). This domain binds the consensus DNA sequence CGT(GA) (Olsen et al., 2005). The C-terminal domain of the NAC proteins is highly variable within the family; however, some motifs in the C-terminus may display a sub-family-specific conservation (Jensen et al., 2010a). The composition of the C-terminal domains reveals a very high percentage of hydrophilic (Asp, Glu, Ser, Thr) and proline (Pro) residues, whereas the proportion of hydrophobic and aromatic residues is very low (Olsen et al., 2005; Jensen et al., 2010a). These specificities are typical of IDRs, and the C-terminal domain of some NAC proteins was experimentally characterized as an IDR (Jensen et al., 2010a,b). Despite this IDR feature, some hydrophobic and/or aromatic residues are present in this domain; interestingly, these amino acid residues are often conserved within a sub-family (Jensen et al., 2010a,b; Kjaersgaard et al., 2011). The IDR C-terminal domains of the NAC proteins are predicted to contain MoREs that are conserved in sub-families (Jensen et al., 2010a).
It has been experimentally confirmed that these particular residues are very important to the specific function of each sub-group in the NAC family, and are essential to activation mechanisms often involving many different partners (Ooka et al., 2003; Ernst et al., 2004; Taoka et al., 2004; Olsen et al., 2005; Ko et al., 2007; Jensen et al., 2010a) (Figure 4B).
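The compositional bias described above for the NAC C-terminal domains (many Asp, Glu, Ser, Thr and Pro residues, few hydrophobic and aromatic ones) can be illustrated with a minimal sketch. The function name, the grouping of hydrophobic and aromatic residues, and the toy sequence are assumptions made for illustration only; they do not come from the cited studies, and this is not a disorder predictor.

```python
from collections import Counter

# Residue classes named in the text: Asp, Glu, Ser, Thr and Pro.
HYDROPHILIC_AND_PRO = set("DESTP")
# Assumed grouping of hydrophobic and aromatic residues (illustrative choice).
HYDROPHOBIC_AROMATIC = set("AVLIMFWY")


def composition_fractions(sequence: str) -> dict:
    """Return the fraction of residues falling in each of the two classes."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    total = len(seq)
    hydrophilic = sum(counts[aa] for aa in HYDROPHILIC_AND_PRO)
    hydrophobic = sum(counts[aa] for aa in HYDROPHOBIC_AROMATIC)
    return {
        "hydrophilic_pro": hydrophilic / total,
        "hydrophobic_aromatic": hydrophobic / total,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real NAC C-terminal domain.
    toy_tail = "SPDETSSPEDTPSSEDLPSTDE"
    print(composition_fractions(toy_tail))
```

A high first fraction together with a low second one is only a crude indication of the bias discussed here; experimental characterization, as reported for some NAC proteins, remains necessary to establish disorder.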
The bZIP (basic Leucine Zipper) transcription factor family is ubiquitous and is one of the largest families of transcription factors in eukaryotes. bZIP transcription factors take part in a multitude of regulatory pathways such as development, metabolism, circadian rhythm and response to stress (Sun et al., 2013). The bZIP proteins are composed of two domains: a C-terminal bZIP domain and an N-terminal activation domain (Figure 4A). The C-terminal bZIP domain gives its name to the family and displays large patches of basic residues and leucine zipper motifs (Ellenberger et al., 1992; Vinson et al., 1993). The leucine zipper regions are organized in α helices and are responsible for the dimerization of the proteins through the formation of a coiled-coil structure (Vinson et al., 1993; Yoon et al., 2006), while the basic regions bind to the DNA molecule (Ellenberger et al., 1992). Interestingly, the basic regions have been described either as fully ordered, very flexible or intrinsically disordered depending on the protein (Bracken et al., 1999; Podust et al., 2001; Moreau et al., 2004; Yoon et al., 2006). When bound to DNA, however, the basic regions have been observed as α helices, suggesting that the interaction triggers folding in response to a specific DNA motif (Hollenbeck et al., 2002), illustrating once more the disorder-to-order transition (induced fit). The N-terminal regions of bZIP proteins act as regulators (Ang et al., 1998; Sun et al., 2013), and are mostly intrinsically disordered (Campbell et al., 2000; Moreau et al., 2004; Yoon et al., 2006; Sun et al., 2013). These regions typically contain different MoREs, and their flexibility allows the interaction with multiple partners, again by adopting different secondary structures (Ang et al., 1998; Campbell et al., 2000; Oldfield et al., 2005; Yoon et al., 2006) (Figure 4B).
Through these activating or inhibiting interactions, transcription of the genes targeted by bZIP proteins is effectively modulated in response to several signals. The N-terminal disordered domain also modulates the activity of bZIP transcription factors through post-translational modifications, phosphorylation in particular. In plants, bZIP transcription factors can be phosphorylated in response to illumination, which disrupts the interactions between the bZIP proteins and their activating partners (Ciceri et al., 1997; Hardtke et al., 2000). The phosphorylated proteins also have lower affinity for their DNA targets, resulting in a decrease of gene activation (Ciceri et al., 1997; Hardtke et al., 2000). Interestingly, some bZIP proteins also display IDRs in their C-terminal domain. In the case of bZIP28 (initially a transmembrane protein), these IDRs are exposed to the lumen of the endoplasmic reticulum and allow the interaction, through MoREs, with BiP, the major endoplasmic reticulum chaperone. In response to stress, bZIP28 is relocated to the Golgi and the cytoplasmic domain is detached, allowing it to enter the nucleus and to control gene expression (Srivastava et al., 2013, 2014).
A recent study on TCP8, a transcription factor belonging to the TCP [Teosinte branched 1 (tb1), Cycloidea (cyc) and Proliferating Cell Factor (PCF)] family, showed the presence of three IDRs, two of them at the N- and C-terminal extremities (Valsecchi et al., 2013). While the N-terminus binds DNA in an induced-fit mechanism, the C-terminal region is involved in the self-association of the TCP protein in a coiled-coil structure (Valsecchi et al., 2013). Furthermore, it seems that different transcription factors from the TCP family can interact, modulating the response of different pathways to multiple stimuli (Baier and Latzko, 1975; Viola et al., 2011, 2012; Steiner et al., 2012; Valsecchi et al., 2013).
As illustrated in these examples, the disordered tails of transcription factors have an essential role in modulating their activities through protein-protein interactions with a wide range of activators and inhibitors. Moreover, these extensions are often prone to phosphorylation and constitute another level of regulation. Together, these IDRs form a complex signaling web, turning the transcription factors into hubs and allowing the genes involved in adaptive responses to be finely regulated.

GRAS Family
The GRAS family comprises proteins involved in numerous aspects of plant development and growth. This large family is named after its first members, Gibberellic Acid Insensitive (GAI), Repressor of Gai (RGA) and Scarecrow (SCR), and its members are mostly related to signaling in response to phytohormones [gibberellic acid (GA), auxin, brassinosteroids] and biotic and abiotic stress (Bolle, 2004; Sun et al., 2011). The GRAS family proteins are composed of one variable N-terminal region and a commonly conserved C-terminal GRAS domain (Figure 4A), and are divided into ten subfamilies based on phylogeny (Bolle, 2004; Tian et al., 2004; Lim et al., 2005; Sanchez et al., 2007; Sun et al., 2011). The conserved GRAS domain (ca. 380 residues, depending on the subfamily) acts as a transcriptional coactivator (Heery et al., 1997) through leucine-rich motifs. GRAS domains typically contain two leucine-rich motifs, which are needed for specific protein-protein interactions (Cui et al., 2007; Vacic et al., 2007; Fode et al., 2008; Hou et al., 2010). The GRAS proteins interact with a large number of nuclear proteins, most of which are transcription factors, thereby modulating the activity of their targets (Hou et al., 2010; Sun et al., 2012).
In contrast to the highly conserved GRAS domain, the N-terminal domains of the GRAS family proteins display a rich diversity at the sequence level, although the N-terminus is conserved within subfamilies (Sun et al., 2010, 2011, 2012, 2013). Moreover, these N-terminal domains have recently been identified as intrinsically disordered (Sun et al., 2010, 2011, 2012, 2013). Interestingly, patches of repeated hydrophobic and/or aromatic residues are found in the N-terminal region (Triezenberg, 1995; Sun et al., 2011). These patches are arranged in conserved motifs within subfamilies (Triezenberg, 1995; Sun et al., 2011), and are involved in multiple specific protein-protein interactions (Sun et al., 2010, 2012). In the case of the intensively studied DELLA subfamily, the N-terminal domain can interact with the gibberellic acid receptor GID1, but only when GID1 has bound its ligand (Murase et al., 2008; Hirano et al., 2010; Sun et al., 2010, 2012). Moreover, each DELLA protein domain (N-terminal and C-terminal) can interact with several partners, making these proteins a hub at the center of the gibberellic acid response pathway. Other GRAS proteins are important in other regulatory pathways, although each subfamily is always specialized in a precise type of stimulus (phytohormones, biotic and abiotic stress, etc.) (Sun et al., 2010, 2011, 2012, 2013). A common feature of the GRAS proteins is their ability to acquire a structure when bound to a partner, unlike the fuzzy GAPDH/CP12 complex (Mileo et al., 2013).
As mentioned above, MoREs are present in GRAS proteins; each one was predicted to occur within the N-terminal domains, and more specifically in the elements conserved within subfamilies, strengthening the idea that these motifs are the key to the specificity of GRAS proteins (Sun et al., 2011, 2012). In the case of the DELLA subfamily, the presence of the MoREs has been verified experimentally (Sun et al., 2010, 2011, 2012). Interestingly, the N-terminal domain of the GRAS proteins is also the target of phosphorylation, which again introduces another way to fine-tune the regulation of these proteins (Fu et al., 2002; Iakoucheva et al., 2004; Hussain et al., 2007; Mittag et al., 2010). Phosphorylation of the N-terminal domain is directly linked to the activity of the GRAS proteins, modulating the affinity of the N-terminus for its partners, and having a direct effect on GRAS protein stability through the control of their degradation (Day et al., 2004; Hussain et al., 2005; Itoh et al., 2005; Czikkel and Maxwell, 2007).
When considering the GRAS family as a whole, it is remarkable how conserved the GRAS domains and patterns are, while the N-terminal domains are highly variable. It seems that the addition of a disordered protein segment to the GRAS domain has increased its number of partners, and thus turned it into a signal-integration hub involved in many different pathways. On the other hand, one could consider that the addition of GRAS domains to pre-existing IDPs involved in the phytohormonal and/or stress responses has allowed these IDPs to control, even more directly, the cellular responses by acting on gene expression.

Cryptochrome
Cryptochromes are a group of proteins in which most members have an intrinsically disordered C-terminal tail that can have a profound impact on their overall function. Together with the photolyases, these proteins belong to the photolyase/cryptochrome family (Lin and Shalitin, 2003; Sancar, 2004; Chaves et al., 2006; Ozturk et al., 2007; Fortunato et al., 2015).
Photolyases are ancient enzymes that use blue light to catalyze the repair of DNA lesions caused by ultraviolet light. Lesions such as cyclobutane pyrimidine dimers (CPD) and pyrimidine-pyrimidone (6-4) photoproducts are repaired by CPD photolyases and by (6-4) photolyases, respectively. The capacity of photolyases to use blue light is due to the presence of two chromophores: a photoantenna (the pterin 5,10-methenyltetrahydrofolate, or 8-hydroxy-5-deazaflavin) and flavin adenine dinucleotide (FAD). During DNA repair, the two chromophore cofactors absorb blue photons and initiate splitting of the cyclobutane ring by a mechanism involving reactive radicals (Liu et al., 2011b).
Cryptochromes, the other group of proteins in the photolyase/cryptochrome family, have a photolyase homologous region (called PHR) and a C-terminal tail (Figure 5A). Cryptochromes are able to absorb blue light in a very similar way to the photolyases. Another group within this family comprises the DASH-type cryptochromes, named after Drosophila, Arabidopsis, Synechocystis and Human. Members of this group are closer to photolyases than to cryptochromes, are able to repair single-stranded DNA (Chaves et al., 2011), and may also have N-terminal and C-terminal disordered extensions.
In contrast to photolyases, cryptochromes do not have the ability to repair DNA. However, in many organisms, the absorption of photons by the chromophores in the photolyase homologous region of these proteins induces a conformational change (through electron transfer and subsequent phosphorylation), which in turn triggers specialized signaling events through protein-protein interactions (Liu et al., 2011b). It has been shown that the function of cryptochromes resides mainly within their C-terminal tails (Yang et al., 2000; Green, 2004; Chaves et al., 2006, 2011; Yu et al., 2010). Interestingly, this tail is poorly conserved among groups of organisms. In Arabidopsis, two cryptochromes are present, CRY1 and CRY2, that have different C-terminal extensions although a DAS motif is found in both (Lin and Shalitin, 2003). The length of the C-terminal tail in cryptochromes of animals, plants and some unicellular organisms varies from 30 to 250 residues and, as mentioned above, the tail is intrinsically disordered. This characteristic has been established by sequence analysis, biochemical methods such as analysis of the sensitivity to protease cleavage, and physical methods such as circular dichroism and nuclear magnetic resonance (NMR) on recombinant C-terminal extensions of both Arabidopsis and human cryptochromes (Partch et al., 2005). Comparison of the proteolysis susceptibility between full-length cryptochromes and their C-terminal tail showed that this tail interacts with the photolyase domain, causing it to adopt a tertiary structure. The susceptibility to proteolysis of the C-terminal tail of CRY1 from A. thaliana increases after illumination, which is consistent with a conformational change (Partch et al., 2005). Indeed, the crystal structure of the complete cryptochrome from Drosophila confirmed that the C-terminal tail stays in a groove of the photolyase domain and mimics the recognition of photolyases with DNA (Zoltowski et al., 2011; Czarna et al., 2013).
Figure 5. (A) Cryptochromes have a photolyase-homologous region (PHR) and a C-terminal tail. The chromophore molecules of the PHR are shown. (B) Model of the action mechanism of cryptochromes from Arabidopsis. After absorption of light, the C-terminal tail is phosphorylated and a change in conformation is triggered in the entire molecule. The C-terminal tail is exposed at the surface of the protein and as a consequence interactions with partner proteins such as COP1 and SPA are induced (Liu et al., 2011a,b). (C) In darkness, the C-terminal tail of the cryptochrome from Drosophila inhibits the binding of the proteins involved in the circadian rhythm. After illumination, the inhibition by the tail is released and the PHR domain interacts through electrostatic interaction with the protein partners TIM and JET (Green, 2004; Czarna et al., 2013). (D) In mammals, cryptochrome is necessary for the translocation of the protein into the nucleus in which it is part of the core of the transcription/translation feedback that controls the circadian clock together with the proteins PER, BMAL, and CLOCK.
In plants, cryptochromes play a role, together with other photoreceptors, in a variety of functions. In general, the cryptochromes of plants are involved in mechanisms that respond to blue light, and their action has been explored in the inhibition of hypocotyl elongation, in the photoperiodic induction of flowering, in the circadian clock as in animals, and in other functions (Chaves et al., 2011; Liu et al., 2011b). These studies have been mainly performed in the model plant A. thaliana. Studies using transgenic plants overexpressing the C-terminal tail of CRY1 or CRY2 fused with β-glucuronidase (GUS) showed a constitutive morphogenic phenotype similar to that produced by blue light (Yang et al., 2000), indicating that, in the cryptochrome molecule, the C-terminal tail is responsible for the light-induced function. Moreover, NC80, an 80-residue segment present in the Arabidopsis protein, is responsible for the function of the C-terminal tail of CRY2 (Yu et al., 2007). The C-terminal tail of these proteins interacts with other proteins such as COP1 (constitutive photomorphogenic 1) (Wang et al., 2001; Yang et al., 2001), a multifunctional E3 ubiquitin ligase, and SPA1 (suppressor of phytochrome A 1) (Zuo et al., 2011; Liu et al., 2011a). This interaction is part of the initial steps of light signaling and of the mechanisms that modulate developmental processes in the plant, either by (1) modulation of gene transcription or (2) suppression of proteolysis of regulators involved in development (e.g., flowering) (Liu et al., 2011a,b). Models have been proposed to explain the mode of action of plant cryptochromes (Lin and Shalitin, 2003; Partch et al., 2005; Yu et al., 2007; Liu et al., 2011a). In general, in these models, the photolyase domain and the C-terminal tail form a closed conformation in the dark.
Upon illumination, an open and active conformation is adopted and, in this new conformation, the C-terminal tail is exposed, allowing its interaction with other proteins to initiate signaling (Figure 5B). A model of action that includes dimerization and light-dependent phosphorylation, and that explains the exposure of the C-terminal tail as a result of charge repulsion, has also been proposed (Figure 5B) (Lin and Shalitin, 2003; Yu et al., 2007).
Although cryptochromes of plants are involved as photoreceptors in the circadian cycle, the molecular role of cryptochromes in relation to this cycle has been better elucidated in Drosophila. In this organism, the cryptochrome modulates the central oscillator, or clock, through the light-dependent interaction with the protein Timeless (TIM) (Busza et al., 2004), one of the components of the clock core. This interaction favors the degradation of both TIM and the cryptochrome itself, thus resetting the light/dark cycle each day by synchronizing the clock with the environment. The protein Jetlag (JET), an E3 ligase, also binds to the cryptochrome in a light-dependent manner and is responsible for the ubiquitination and subsequent proteolysis of both the cryptochrome and TIM (Peschel et al., 2009). In this case, and in contrast with the cryptochromes in Arabidopsis, the binding of the cryptochrome from Drosophila to its partners is performed by the photolyase domain of the protein (Figure 5C), whereas in the dark, the C-terminal tail inhibits this binding, thus determining the photosensitivity of the circadian clock (Busza et al., 2004; Green, 2004).
In contrast to their homologs from plants and Drosophila, where the disordered C-terminal tail is used for light signaling, mammalian cryptochromes are light-independent transcriptional repressors in the core of the circadian clock (Figure 5D). Mammalian cryptochromes repress transcription processes that are dependent on the protein complex BMAL/CLOCK (Sancar, 2004; Chaves et al., 2011). In the case of these cryptochromes, the function of the C-terminal tail is more complex: (i) it is involved in the nuclear localization of the protein and (ii) with the photolyase domain, it also has a role in the interaction with other components of the clock such as BMAL (Chaves et al., 2006). Interestingly, the C-terminal tail also contributes to the circadian period length, since its phosphorylation affects the level of the protein, either promoting its own degradation in the case of CRY2 (Harada et al., 2005) or stabilizing the protein as for CRY1 (Gao et al., 2013).
It has been proposed that cryptochromes have evolved several times independently as an example of convergent evolution (Green, 2004). Only small changes have occurred in the photolyase domain, this part of the protein being conserved among cryptochromes and photolyases. One possible mechanism to explain the acquisition of C-terminal extensions in existing proteins would be through gene fusion (Marsh and Teichmann, 2010). If this mechanism had taken place at the origin of cryptochromes, it would suggest that proteins related to the C-terminal tail of cryptochromes already existed independently and had a function of their own. These independent domains later became associated with a photolyase domain, providing them with the capacity to detect light. As mentioned above, the C-terminal domain of plant cryptochromes is active and has the information needed to achieve signaling (Yang et al., 2000). During evolution, this protein could have fused with a duplicate of photolyase. In this hypothesis, the addition of the light-dependent photolyase module might be a way to adjust the physiology of the organisms to their environment through light perception. This could therefore be seen as an IDP having acquired a globular extension. Since in plants a motif (DAS) within the C-terminal tail is conserved, it has been proposed that the ancestral plant cryptochrome emerged from a fusion of a photolyase with a protein containing the DAS motif (Lin and Shalitin, 2003). Another hypothesis that could explain the acquisition of the C-terminal tail in cryptochromes is gene extension into a non-coding region (Marsh and Teichmann, 2010). The photolyase gene could thus have been extended through junk DNA. Analysis of phylogenetic relationships of gene families in animals showed that extension of an existing gene by "exonization" of a previously non-coding region seems to be an important evolutionary strategy to add a C-terminal disordered extension to proteins (Buljan et al., 2010).
The high variability and different functions of the C-terminal tail of cryptochromes among plants and animals are in accordance with this hypothesis. Studies on the origin and evolution of the C-terminal tail of cryptochromes will give insights into the adaptation of organisms to light.

Conclusion
In the present review, we have sought to demonstrate the central and multiple roles of intrinsically disordered tails carried by certain globular proteins. Describing several examples of proteins displaying IDRs in photosynthetic organisms, we discussed how IDRs affect both the functions and mechanisms of action of their "host" proteins. The examples of the A2B2-GAPDH and the α-Rubisco activase isoform show that their C-terminal disordered extensions participate in the light-dependent redox regulation of the photosynthetic metabolism. The cases of the multiple transcription factors with a disordered tail share common principles while differing in their details. In the few examples listed here, the disordered region plays a major role in the regulation of the DNA-binding domain through protein-protein interactions or post-translational modifications. Their sensitivity to a large number of signals allows the activity of the transcription factors to be modulated according to many factors (one to many), turning these proteins into hubs in a large signaling web. Lastly, the cryptochrome family is a prime example of a disordered extension changing the fundamental function of the initial photolyase into that of a light-dependent signaling protein, conserving the ability to absorb blue light and repurposing it.
The examples presented here are but a few of the multitude of proteins that have acquired a disordered extension (Uversky, 2013), although most examples do not usually come from the photosynthetic world. We can expect that in the years to come, an increasing number of these proteins will be identified. A major question that remains is how these proteins originated. While in some cases the addition of an IDR seems to be quite recent, as for the GapB subunit, in other cases this addition might be very ancient, as in the NAC, bZIP, and GRAS families, in which there are multiple families of disordered extensions that may derive from multiple fusion events, or from a long succession of duplications followed by diverging evolution of the subfamilies. We hope that the expansion of the IDP field in general, and specifically of the part involved in "green" biochemistry, will one day answer these questions.