HYPOTHESIS AND THEORY article
Aggregation is a Context-Dependent Constraint on Protein Evolution
- 1Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- 2RNA System Biology Lab, Centre for Human Technologies, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
- 3Department of Chemistry, Columbia University, New York, NY, United States
- 43UK-DRI Centre at the Maurice Wohl Institute, Department of Clinical and Basic Neuroscience, King’s College London, London, United Kingdom
- 5Centre for Genomic Regulation (CRG) and ICREA, The Barcelona Institute for Science and Technology, Barcelona, Spain
- 6Dipartimento di Biologia e Biotecnologie, Sapienza University, Rome, Italy
Solubility is a requirement for many cellular processes. Loss of solubility and aggregation can lead to the partial or complete abrogation of protein function. Thus, understanding the relationship between protein evolution and aggregation is an important goal. Here, we analysed two deep mutational scanning experiments to investigate the role of protein aggregation in molecular evolution. In one data set, mutants of a protein involved in RNA biogenesis and processing, human TAR DNA binding protein 43 (TDP-43), were expressed in S. cerevisiae. In the other data set, mutants of a bacterial enzyme that controls resistance to penicillins and cephalosporins, TEM-1 beta-lactamase, were expressed in E. coli under the selective pressure of an antibiotic treatment. We found that aggregation differentiates the effects of mutations in the two different cellular contexts. Specifically, aggregation was found to be associated with increased cell fitness in the case of TDP-43 mutations, as it protects the host from aberrant interactions. By contrast, in the case of TEM-1 beta-lactamase mutations, aggregation is linked to a decreased cell fitness due to inactivation of protein function. Our study shows that aggregation is an important context-dependent constraint of molecular evolution and opens up new avenues to investigate the role of aggregation in the cell.
The majority of proteins function as monodispersed ordered species dissolved in intra- or extra-cellular aqueous fluids. There are, however, abnormal or pathologic states in which proteins aggregate, often becoming insoluble and dropping out of solution (Tartaglia et al., 2008; Dobson, 2017). In other cases, protein assembly leads to a liquid-liquid phase separation producing a reversible soluble coacervate (Bolognesi et al., 2016; Gotor et al., 2020). A liquid-to-solid phase transition is often detrimental to the cell as it is associated with a loss of the function that the protein has evolved to attain in the specific context (De Baets et al., 2015). In some cases, aggregates could acquire aberrant functionalities, which can have consequences for the onset and development of disorders (Murakami et al., 2011). Although these principles are valid for a large part of the proteome (Tartaglia et al., 2007; Vecchi et al., 2020), aggregation can be required in specific cases to increase the functionality of a biological system. This occurs, for instance, for Pmel17, a protein found in mammalian melanosomes, whose aggregation is necessary for the production and storage of melanin (Fowler et al., 2007). Given the double valence of protein aggregation that can result either in the formation of toxic assemblies or physiologically required organelles, it has long been debated what could determine whether aggregation is beneficial or detrimental from an evolutionary perspective. Here, we set to clarify this complex matter and question of whether aggregation is an important constraint in protein evolution that can be the discriminant between beneficial and detrimental situations by considering two cases: we studied molecular evolution of proteins in an endogenous vs. exogenous host. One of the important factors that reduce the danger of aggregation is in fact the presence of interaction partners that protect the regions that would otherwise be involved in self-assembly (Zacco et al., 2019; Louka et al., 2020). Indeed, the interfaces through which proteins interact are often highly aggregation prone, so that their engagement in interactions reduces propensity to self-assembly (Pechmann et al., 2009). This means that regions under the evolutionary pressure of protein function may be the same that lead to aberrant aggregation (Lee et al., 2010; Masino et al., 2011; Temussi et al., 2021). Any situation, such as mutations or anomalous expression, could result in disruption of these beneficial interactions with consequent aggregation and depletion of normal function. Expression of a protein in a host different from the natural one is of course an artificial case but represents a prime example in which natural interactions cannot occur. This situation could induce a completely unnatural environment in which proteostasis (or maintainance of protein homeostasis) is impeded (Tartaglia et al., 2010): if not promptly degraded, an exogenous protein could establish aberrant interactions and aggregate (Beerten et al., 2012; Gotor et al., 2020). To evaluate the effect of exogenous vs. endogenous expression, we analysed two large-scale data sets in which variants of an exogenous and an endogenous protein were expressed in different host systems. We focused on molecular evolution data of human TAR DNA-binding protein 43 (TDP-43) exogenously expressed in yeast and bacterial TEM-1 beta-lactamase endogenously expressed in E. coli. TDP-43 is involved in various steps of RNA biogenesis and processing, including alternative splicing (Bhardwaj et al., 2013). TEM-1 beta-lactamase is an enzyme that hydrolyzes the beta-lactam bond in susceptible beta-lactam antibiotics, conferring resistance to penicillins and cephalosporins in bacteria (Bonomo, 2017). The mutational library of TDP-43 was designed to introduce mutations specifically in the intrinsically unstructured region near the C-terminus of the protein (290–331 or 332–373) (Bolognesi et al., 2019). The library of beta-lactamase mutants targeted instead the whole stably structured enzyme (Fantini et al., 2020). With these two systems, we investigated whether and how aggregation acts as a constraint of the evolutionary process by finely tuning the selection of proteins to fit in a specific environment (Tartaglia et al., 2007). While aggregation leads always to protein depletion, we aimed to determine if the detrimental or beneficial effects sensitively depend on the context and nature of the selection sieve.
Materials and Methods
Random Mutations and Algorithms
In our analyses, we used mutation of TDP-43 and TEM-1 beta-lactamase sequences that have the same length as their respective wild types. Each amino acid was selected for a mutation with the same frequency. Changing the amino acid frequencies from uniform to weighted according to the natural occurrences does not change the results (Supplementary Material). As for the negative sets used in TDP-43 and TEM-1 beta-lactamase analyses, the amount of random mutations was matched with the experimental occurrences of the positive sets. Predictions were carried out with Zyggregator (Tartaglia et al., 2008). This algorithm estimates whether a protein aggregates into solid-like aggregates (fibrillar or protofibrillar structures) on the basis of combinations of physico-chemical properties such as hydrophobicity, helical and beta-propensities (Tartaglia and Vendruscolo, 2008). Since the physico-chemical properties of the amino acids change with the environment in which the aggregation processes take place, the coefficients in the Zyggregator algorithm were fitted on a database of experimental rates collected for homogeneous conditions and the predictions were only performed in the vicinity of such conditions. Here, we used a linearized version of Zyggregator (Klus et al., 2014) to speed up the analysis of the large-scale sets of mutants. By linearization we mean a procedure in which the aggregation propensity was computed considering the amino acid composition of aggregation-prone regions predicted by Zyggregator. This calculation transforms each amino acid and close neighbors (Seven amino acids as defined in previous works (Tartaglia et al., 2008; Klus et al., 2014)) in a numerical value to estimate the aggregation propensity. The camFOLD method (Tartaglia and Vendruscolo, 2010) was used to calculate the folding propensity profiles of proteins according to a model that exploits the same physicochemical properties exploited by Zyggregator but combined in a different way.
Data Generation Description
The experimental datasets were retrieved from two publications in which error-prone oligonucleotide synthesis was adopted to comprehensively generate the mutational data sets of human TDP-43 (Bolognesi et al., 2019) and TEM-1 beta-lactamase (Fantini et al., 2020). Briefly, human TDP-43 was expressed in S. cerevisiae (Bolognesi et al., 2019). Deep sequencing before and after TDP-43 mutant library induction was used to quantify the relative effects caused by each variant on growth (the input culture was split into two cultures and fitness was calculated from changes of output to input variant frequencies relative to wild type). After quality control and filtering (three replicates of deep mutational scanning experiments, minimum mean input read count threshold of ten) (Bolognesi et al., 2019), the approach quantified the toxicity of 1,266 single and 56,730 double amino acid changes with high reproducibility. TEM-1 beta-lactamase was expressed in E. coli and twelve libraries of variants were generated (Fantini et al., 2020). These libraries underwent cell phenotypic selection in a growth functional assay to retain only the variants coding for a functional protein. Differently from TDP-43, only positive fitness variants were sequenced. The number of transformants were in the same order of magnitude of the sequencing capacity of the next-generation sequencing platform (between 100 thousand and 1 million) to guarantee good library complexity. The different generations (or evolutionary cycles) were produced serially, which indicates that the number of mutations increase with respect to the wild-type. The sequencing was performed at the 1st, 5th and 12th generation. While in the 1st generation the dose of antibiotics for phenotypic selection was at 25 ug/ml, in the 5th and 12th generation the level was raised to 100 ug/ml to allow stricter selection. In the last generation library a total of 177,253 protein variants with the same length of the wild-type protein (i.e., 286 amino acids without inclusion of premature stop codons) were retrieved (Fantini et al., 2020).
For the analysis of the TDP-43 data set, we used both the negative and positive fitness scores measured relatively to the toxicity of wild-type protein in S. cerevisae. By contrast, in the TEM-1 beta-lactamase case, we only had variants selected through the evolutionary process and comparison with an antagonistic set was not directly possible. To carry out the analysis, we generated a negative set that was as close as possible to the pool of mutations in the experiments of deep mutational scanning. We tested the randomized negative set using the TDP-43 case: we designed a set of sequences with one or two random mutations depending on the original composition of the pool and performed a comparative analysis. Our results indicated similar performances using a negative or random set, which suggests that they are interchangeable. Once the procedure to generate the random set was tested, we proceeded to build a negative set for the TEM-1 beta-lactamase case. We designed a number of mutations following the rule of keeping the number of mutations present in the positive set. To avoid statistical biases, we analyzed only the sequences from the positive set that have the same length of the wild-type. We analyzed the 1st, 5th and 12th generation sets for which sequencing data were available. To quantify how aggregation changes along the data sets, we computed the Receiver Operating Characteristics (ROC) curves and their relative Area Under the Curve (AUC) for each sub-data set using the corresponding random set as a negative control. We measured the AUC vs the number of mutations keeping track of the evolutionary cycles. Some sub-data sets with specific mutations were poorly populated in the later cycles (as the number of mutations increase in each evolutionary cycle) and therefore could not be used for our statistical analysis. To avoid the issue and maintain consistency, we analyzed only sub-data sets that had a size at least 5% that of the largest sub-data set at each evolutionary cycle. We then tested how aggregation and folding propensities changed while a protein accumulated random mutations. We generated a control data set selecting mutations randomly from the 20 natural aminoacids. At each step of evolution, we ran the linearised Zyggregator algorithm to get a score to quantify the aggregation and folding propensities. We did this for all the variants and checked if the sequences increased or decreased their propensities at each random mutation.
Aggregation and Folding Predictions
We first tested if a linearized version of the Zyggregator approach could be used to perform proteome-wide analyses, which comes with the advantage of having faster calculations (see Materials and Methods). As shown in Figure 1, the linearized version of the algorithm shows a Pearson correlation of 0.78 with the original Zyggregator algorithm. This result indicates that the linearization procedure is able to capture the fundamental features of the Zyggregator algorithm and can be used in our analysis. In agreement with previous work (Tartaglia and Vendruscolo, 2010), we found that the Zyggregator and camFOLD scores (see Materials and Methods) are functionally related for a large set of proteins taken from the human interactome (Figure 1B; correlation of –0.73). In other words, proteins with a high propensity to fold show poor aggregation propensity.
FIGURE 1. Aggregation and folding propensity of the human proteome. (A) Scatter plot of Zyggregator algorithm and “linearized” version computed on 5,000 random proteins from the human proteome. The correlation among the two methods is 0.79, which justifies the use of the linear approximation of Zyggregator. (B) Scatter plot of Zyggregator and camFOLD scores. Folding and aggregation propensities are anti-correlated with a Pearson correlation coefficient of –0.73.
Random Mutations Increase the Aggregation Propensity
Can evolution drive sequences to aggregate ? Clearly much depends on the context and functions of the proteins of interest, but it is possible to make some general considerations. When we mutated sequences in the human proteome, the aggregation propensity rose and the folding propensities decreased (Figures 2A,B). This result indicates that in many cases, including enzymes acting in aqueous solutions (Ho and Gasch, 2015; Saarikangas and Barral, 2016), protein function requires a stable native fold. This is in agreement with the hypothesis that proteins “live on the edge” due to opposing effects on their sequences caused by the evolutionary pressure to prevent aggregation at the concentrations needed by the cell and the random mutational processes that inherently increase the aggregation propensities (Tartaglia et al., 2007).
FIGURE 2. Aggregation and folding propensities changes upon amino acid mutations in human proteome. (A) The aggregation propensity increase while the protein accumulate random mutations. By contrast, (B) the folding propensities decreases when the protein is subjected to mutation.
Aggregation as a Context-Dependent Constraint on Protein Evolution
To address the question on how much the environment influences the role that aggregation has in the evolution of proteins, we analyzed data from two different molecular evolution experiments. In the first experiment, TDP-43 was heterologously expressed in S. cerevisiae and the toxicity of variants collected. In the second experiment, variants of TEM-1 beta-lactamase were selected in E. coli while antibiotics were given to the cell culture. In this analysis we used the ROC analysis to measure the strength of the link between the toxicity and the aggregation propensity (Tartaglia et al., 2008). The ROC is highly appropriate as it allows to simultaneously analyse the true and false positive rates.
Cell Fitness and Aggregation Propensity of TDP-43 Mutants
In the case of TDP-43 expression in S. cerevisiae, we divided the experimental sets in two partitions of equal size corresponding to the highest (positive) and lowest (negative) fitness values associated with the mutants. As shown in Figure 3, the Area Under the ROC Curve (AUC) rises progressively with the high vs low fitness mutations score. Our result indicates that the aggregation is an excellent predictor of fitness, especially for those mutants that have the strongest experimental signal (AUC of 0.78 or more). In the insets of Figure 3 we show statistics for the aggregation propensities of single and double mutations associated with the highest and lowest fitness scores (corresponding to AUCs of 0.78 and 0.77, respectively). We then swapped the negative set with a list of mutants generated randomly (Figure 4, Materials and Methods). We observed that the AUC increases monotonically with the signal strength encoded in the aggregation propensity of the high fitness score mutants (statistics of the mutations associated with the 100 highest vs 100 random mutations are shown in Figure 4). This result indicates that the negative and random sets are interchangeable for our analyses.
FIGURE 3. Deep mutational scanning experiments of TDP–43 mutants expressed in S. cervisiae. AUC scores computed on the top and bottom sub-data sets. (A, B) show the rise of the AUC values as function of the high vs low fitness data sets for single and double mutations, respectively.In the insets, we show the aggregation propensities of mutations associated with the highest (top 100) and lowest (bottom 100) fitness mutations (**, p-value < 0.01, Kolmogorov-Smirnoff test).
FIGURE 4. Deep mutational scanning experiments of TDP-43 mutants expressed in S. cervisiae using a random negative set. The AUC values increase progressively with the experimental signal strength. Double mutants are analysed. In the insets, we show the aggregation propensities of mutations associated with the highest (top 100) and random (100 cases) fitness scores (**, p-value < 0.01, Kolmogorov-Smirnoff test).
Cell Fitness and Aggregation Propensity of TEM-1 Beta-Lactamase Mutants
In the case of TEM-1 beta-lactamase in E. coli, only the positive fitness set was available from the experiments. To generate a negative set following the approach used for TDP-43, we first analysed the number of mutations and sequence lengths of variants. We observed that the number of mutations increases on average with the generations (Figures 5A,B). Focusing on the variants that have the same length as the reference TEM-1 beta-lactamase, we generated the negative set respecting the frequencies of mutations occurring at each cycle. In Figure 5C we show the values corresponding to the AUC computed using the aggregation propensities for the high (positive) vs low (negative) fitness score mutants. There is a clear trend toward a decrease of the AUC score (i.e., increase of 1-AUC) with the number of mutations and evolutionary cycles (generation). Thus, differently from TDP-43, we observed that the aggregation propensity decreases. In Figure 5D we show that the aggregation propensity of mutants associated with high fitness score decreases progressively through evolutionary cycles. In this analysis, we selected variants by ranking the number of mutations occurring at each evolutionary cycle and compared their aggregation propensities with their random counterpart (100 sequences for each set; the AUCs are 0.6, 0.65 and 0.70, respectively).
FIGURE 5. Deep scanning experiments of TEM-1 beta-lactamase mutants expressed in E. coli. Distribution of (A) mutation numbers and (B) sequence lengths for different generation of evolution. (C) AUC value as a function of high vs low fitness data sets (data computed only for sub data set with at least 1000 data points). (D) Aggregation propensities of mutations associated with the highest (top 100) and random (100 cases) fitness scores for generation 1,5, 12 (**, p-value < 0.01, Kolmogorov-Smirnoff test). The colors match the points in the (D).
This study aimed to investigate the influence of protein aggregation on molecular evolution. Mutational scanning approaches have been previously used in the literature and demonstrated to provide a powerful tool to investigate the ability of a protein to aggregate and cause toxicity (Luheshi et al., 2007). We also have previously observed that, under physiological conditions, expression levels are fine-tuned to the specific protein aggregation rates (Tartaglia et al., 2007). This finding suggests that mutations altering the aggregation propensity can impact cell fitness and thus influence evolution (Tartaglia et al., 2007; Vecchi et al., 2020). The computational analysis presented in this paper revealed that aggregation and folding propensities anti-correlate at the proteomic level. These results agree with and expand studies that had suggested similar conclusions from completely different premises (Jahn and Radford, 2008; Tartaglia and Vendruscolo, 2010). Our data indicate that, for the large majority of cases, naturally-occurring protein sequences have been optimized by evolution to have lower aggregation and higher folding propensities (Lee et al., 2010). Our calculations predict that random mutations render proteins more prone to aggregate at the expenses of their folding ability, even though this variability could help exploring new functionalities depending on the cellular context. To estimate the impact of aggregation in different cellular contexts, we analysed molecular evolution experiments in which human TDP-43 and TEM-1 beta-lactamase mutants were expressed in S. cerevisiae (Bolognesi et al., 2019) and in E. coli (Fantini et al., 2020) respectively. Notably, TDP-43 contains > 30% of structural disorder in the C-terminus (Gotor et al., 2020), whereas TEM-1 beta-lactamase has a well-defined structure (Fantini et al., 2020). When prion-like regions, as in the case of TDP-43 (Bolognesi et al., 2019), are not protected by the native fold, aggregation takes place because the exposure to the solvent favours self-assembly (Linding et al., 2004; Tartaglia et al., 2008; Gotor et al., 2020). For TDP-43 mutants, both negative and positive fitness scores were directly compared with the aggregation propensities. By contrast, for TEM-1 beta-lactamase, we had only positive fitness variants selected through the evolutionary process, which made the comparison with a negative set not directly possible. To carry out the analysis, we artificially generated a negative set as close as possible to the experimental one. Our analysis revealed two quite different behaviors. With TEM-1 beta-lactamase, the system was under antibiotic selection pressure. Thus, the protein had to remain active to protect against penicillins. The variants were in fact selected by evolution to maintain enzyme activity and thus antibiotics resistance while minimizing aggregation (Fantini et al., 2020). Our results clearly indicate that while the negative set maintains a relative high aggregation propensity through the evolutionary cycles, the positive set significantly reduces it, thus showing that a purifying pressure exists. Interestingly, it has been previously observed that M180T, E195D, L196I, and S281T were the most frequent mutations in the system. These residues are all exposed and their mutations should, with the exception of the well documented M180, only have minor effects on protein fold as indicated by visual inspection and semi-quantitative analysis with the DUET software, a web server for an integrated computational approach to study missense mutations in proteins (Pires et al., 2014). Thus, these mutations would represent a “neutral” passage that do not impinge on fitness. With TDP-43, we observed the opposite trend. We have recently stressed the importance of protein interactions with chaperones to prevent aggregation (Masino et al., 2011; Louka et al., 2020; Temussi et al., 2021). Expression of a protein in a heterologous system will lead by definition to a situation in which natural interactions cannot occur (Gotor et al., 2020). Accordingly, we observed that TDP-43 has a high toxicity in yeast, as expected for a protein that has a tendency to aggregate also in its natural host (Bolognesi et al., 2019). In this case, we observed that mutations that increased hydrophobicity and aggregation also decreased toxicity by eliminating from solution proteins that could not have functional partners. This hypothesis would explain why solid aggregates formed by TDP-43 A238V, Q360Y and G335I mutants are non-toxic, while the soluble W334K, M332K and A328P have a higher toxicity, as described in Bolognesi ed al. (Bolognesi et al., 2019). This would also be consistent with the generally accepted idea that soluble aggregates are more toxic than the insoluble end-points (Lashuel and Lansbury, 2006), although in our work we did not analyse the formation of soluble oligomers. It is also worth mentioning that the TDP-43 mutational library was designed to target a specific region of the protein, the intrinsically unstructured C-terminus. This meant no evolution pressure to preserve structure. The two scenarios indicate a clear ambivalence that protein aggregation can acquire depending on the system in which it occurs and suggest aggregation as an important constraint for evolution. Our approach demonstrates how different selective pressures may completely change the effect of aggregation on a specific system: an exogenous protein with no functional role, such as TDP-43 in S. cerevisiae is less toxic if compartimentalised in an aggregate, while an endogenous proteins, such as TEM-1 beta lactamase in E. coli, is functional when aggregation is avoided. Further analyses will be important to shed light onto the selective pressure against aggregation in specific cellular environments. Indeed, it has been previously suggested that the volume of the organelles imposes stringent limitations on protein evolution to avoid high aggregation propensities in confined compartments (Tartaglia et al., 2005; Tartaglia and Vendruscolo, 2009). This has implications for protein variability in different organisms (Tartaglia et al., 2005). In line with this observation, it should be mentioned that poly-anionic molecules such as RNA can reduce aggregation propensity (Chen et al., 2019; Sanchez de Groot et al., 2019). On the contrary, mutations that abolish RNA-binding in these proteins can force them to change their cellular localization, accumulate and aggregate. This is the case of FUS and TDP-43 (Chen et al., 2019; Marrone et al., 2019). Making large-scale predictions of protein aggregation can reveal important properties of biological systems, as demonstrated by the elegant study by Khodaparast et al. (Khodaparast et al., 2018) who showed that identifying aggregation-promoting regions of a protein can be instrumental to block the process of bacterial infection. In the future, it will be interesting to investigate the role of specific molecular partners, such as RNA, in regulating aggregation. Computational approaches such as the one presented in this study (Agostini et al., 2014; Louros et al., 2020) will be key to achieve a more complete understanding of molecular evolution, and to increase our ability to manipulate proteins for specific purposes.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.
MM, AA and GT carried out the analysis. AA and MF analysed the data. AP and GT wrote the manuscript.
The research leading to these results was supported by the Dementia Research Institute (REI 3556 and AlzUK) (ARUK-PG2019B-020), European Research Council (RIBOMYLOME 309545 and ASTRA 855923), the H2020 projects (IASIS 727658 and INFORE 25080), the Spanish Ministry of Economy and Competitiveness BFU 2017-86970-P as well as the collaboration with Peter St. George-Hyslop financed by the Wellcome Trust.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2021.678115/full#supplementary-material
Agostini, F., Cirillo, D., Livi, C. M., Delli Ponti, R., and Tartaglia, G. G. (2014). ccSOL Omics: a Webserver for Solubility Prediction of Endogenous and Heterologous Expression in Escherichia coli. Bioinformatics 30, 2975–2977. doi:10.1093/bioinformatics/btu420
Bolognesi, B., Faure, A. J., Seuma, M., Schmiedel, J. M., Tartaglia, G. G., and Lehner, B. (2019). The Mutational Landscape of a Prion-like Domain. Nat. Commun. 10, 4162. doi:10.1038/s41467-019-12101-z
Bolognesi, B., Lorenzo Gotor, N., Dhar, R., Cirillo, D., Baldrighi, M., Tartaglia, G. G., et al. (2016). A Concentration-Dependent Liquid Phase Separation Can Cause Toxicity upon Increased Protein Expression. Cel Rep. 16, 222–231. doi:10.1016/j.celrep.2016.05.076
Chen, H.-J., Topp, S. D., Hui, H. S., Zacco, E., Katarya, M., McLoughlin, C., et al. (2019). RRM Adjacent TARDBP Mutations Disrupt RNA Binding and Enhance TDP-43 Proteinopathy. Brain 142, 3753–3770. doi:10.1093/brain/awz313
De Baets, G., Van Doorn, L., Rousseau, F., and Schymkowitz, J. (2015). Increased Aggregation Is More Frequently Associated to Human Disease-Associated Mutations Than to Neutral Polymorphisms. Plos Comput. Biol. 11, e1004374. doi:10.1371/journal.pcbi.1004374
Fantini, M., Lisi, S., De Los Rios, P., Cattaneo, A., and Pastore, A. (2020). Protein Structural Information and Evolutionary Landscape by In Vitro Evolution. Mol. Biol. Evol. 37, 1179–1192. doi:10.1093/molbev/msz256
Gotor, N. L., Armaos, A., Calloni, G., Torrent Burgas, M., Vabulas, R. M., De Groot, N. S., et al. (2020). RNA-binding and Prion Domains: the Yin and Yang of Phase Separation. Nucleic Acids Res. 48, 9491–9504. doi:10.1093/nar/gkaa681
Khodaparast, L., Khodaparast, L., Gallardo, R., Louros, N. N., Michiels, E., Ramakrishnan, R., et al. (2018). Aggregating Sequences that Occur in many Proteins Constitute Weak Spots of Bacterial Proteostasis. Nat. Commun. 9, 866. doi:10.1038/s41467-018-03131-0
Klus, P., Bolognesi, B., Agostini, F., Marchese, D., Zanzoni, A., and Tartaglia, G. G. (2014). The cleverSuite Approach for Protein Characterization: Predictions of Structural Properties, Solubility, Chaperone Requirements and RNA-Binding Abilities. Bioinformatics 30, 1601–1608. doi:10.1093/bioinformatics/btu074
Lashuel, H. A., and Lansbury, P. T. (2006). Are Amyloid Diseases Caused by Protein Aggregates that Mimic Bacterial Pore-Forming Toxins?. Quart. Rev. Biophys. 39, 167–201. doi:10.1017/s0033583506004422
Lee, Y., Zhou, T., Tartaglia, G. G., Vendruscolo, M., and Wilke, C. O. (2010). Translationally Optimal Codons Associate with Aggregation-Prone Sites in Proteins. PROTEOMICS 10, 4163–4171. doi:10.1002/pmic.201000229
Linding, R., Schymkowitz, J., Rousseau, F., Diella, F., and Serrano, L. (2004). A Comparative Study of the Relationship between Protein Structure and β-Aggregation in Globular and Intrinsically Disordered Proteins. J. Mol. Biol. 342, 345–353. doi:10.1016/j.jmb.2004.06.088
Louros, N., Orlando, G., De Vleeschouwer, M., Rousseau, F., and Schymkowitz, J. (2020). Structure-based Machine-Guided Mapping of Amyloid Sequence Space Reveals Uncharted Sequence Clusters with Higher Solubilities. Nat. Commun. 11, 3314. doi:10.1038/s41467-020-17207-3
Luheshi, L. M., Tartaglia, G. G., Brorsson, A.-C., Pawar, A. P., Watson, I. E., Chiti, F., et al. (2007). Systematic In Vivo Analysis of the Intrinsic Determinants of Amyloid β Pathogenicity. Plos Biol. 5, e290. doi:10.1371/journal.pbio.0050290
Marrone, L., Drexler, H. C. A., Wang, J., Tripathi, P., Distler, T., Heisterkamp, P., et al. (2019). FUS Pathology in ALS Is Linked to Alterations in Multiple ALS-Associated Proteins and Rescued by Drugs Stimulating Autophagy. Acta Neuropathol. 138, 67–84. doi:10.1007/s00401-019-01998-x
Masino, L., Nicastro, G., Calder, L., Vendruscolo, M., and Pastore, A. (2011). Functional Interactions as a Survival Strategy against Abnormal Aggregation. FASEB j. 25, 45–54. doi:10.1096/fj.10-161208
Murakami, T., Yang, S.-P., Xie, L., Kawano, T., Fu, D., Mukai, A., et al. (2011). ALS Mutations in FUS Cause Neuronal Dysfunction and Death in Caenorhabditis elegans by a Dominant Gain-Of-Function Mechanism. Hum. Mol. Genet. 21, 1–9. doi:10.1093/hmg/ddr417
Pechmann, S., Levy, E. D., Tartaglia, G. G., and Vendruscolo, M. (2009). Physicochemical Principles that Regulate the Competition between Functional and Dysfunctional Association of Proteins. Proc. Natl. Acad. Sci. 106, 10159–10164. doi:10.1073/pnas.0812414106
Pires, D. E. V., Ascher, D. B., and Blundell, T. L. (2014). DUET: a Server for Predicting Effects of Mutations on Protein Stability Using an Integrated Computational Approach. Nucleic Acids Res. 42, W314–W319. doi:10.1093/nar/gku411
Sanchez de Groot, N., Armaos, A., Graña-Montes, R., Alriquet, M., Calloni, G., Vabulas, R. M., et al. (2019). RNA Structure Drives Interaction with Proteins. Nat. Commun. 10, 3246. doi:10.1038/s41467-019-10923-5
Tartaglia, G. G., Pawar, A. P., Campioni, S., Dobson, C. M., Chiti, F., and Vendruscolo, M. (2008). Prediction of Aggregation-Prone Regions in Structured Proteins. J. Mol. Biol. 380, 425–436. doi:10.1016/j.jmb.2008.05.013
Tartaglia, G. G., Pechmann, S., Dobson, C. M., and Vendruscolo, M. (2007). Life on the Edge: a Link between Gene Expression Levels and Aggregation Rates of Human Proteins. Trends Biochemical Sciences 32, 204–206. doi:10.1016/j.tibs.2007.03.005
Temussi, P. A., Tartaglia, G. G., and Pastore, A. (2021). The Seesaw Between Normal Function and Protein Aggregation: How Functional Interactions May Increase Protein Solubility. Bioessays 43, 100031. doi:10.1002/bies.202100031
Vecchi, G., Sormanni, P., Mannini, B., Vandelli, A., Tartaglia, G. G., Dobson, C. M., et al. (2020). Proteome-wide Observation of the Phenomenon of Life on the Edge of Solubility. Proc. Natl. Acad. Sci. USA 117, 1015–1020. doi:10.1073/pnas.1910444117
Zacco, E., Graña-Montes, R., Martin, S. R., de Groot, N. S., Alfano, C., Tartaglia, G. G., et al. (2019). RNA as a Key Factor in Driving or Preventing Self-Assembly of the TAR DNA-Binding Protein 43. J. Mol. Biol. 431, 1671–1688. doi:10.1016/j.jmb.2019.01.028
Keywords: protein aggregation, computational model, deep scanning, cellular fitness, evolution
Citation: Monti M, Armaos A, Fantini M, Pastore A and Tartaglia GG (2021) Aggregation is a Context-Dependent Constraint on Protein Evolution. Front. Mol. Biosci. 8:678115. doi: 10.3389/fmolb.2021.678115
Received: 08 March 2021; Accepted: 13 May 2021;
Published: 18 June 2021.
Edited by:Alfonso De Simone, Imperial College London, United Kingdom
Reviewed by:Stefano Ricagno, University of Milan, Italy
Irene Julca, Nanyang Technological University, Singapore
Copyright © 2021 Monti, Armaos, Fantini, Pastore and Tartaglia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.