A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

Tufféry, Pierre; Derreumaux, Philippe

doi:10.3389/fbinf.2023.1113928

ORIGINAL RESEARCH article

Front. Bioinform., 16 January 2023

Sec. Protein Bioinformatics

Volume 3 - 2023 | https://doi.org/10.3389/fbinf.2023.1113928

A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

1. Université Paris Cité, CNRS UMR 8251, INSERM U1133, Paris, France
2. Université Paris Cité, CNRSUPR9080, Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, Paris, France
3. Institut Universitaire de France (IUF), Paris, France

Article metrics

View details

Citations

3,5k

Views

878

Downloads

Abstract

Introduction: Peptides carry out diverse biological functions and the knowledge of the conformational ensemble of polypeptides in various experimental conditions is important for biological applications. All fast dedicated softwares perform well in aqueous solution at neutral pH.

Methods: In this study, we go one step beyond by combining the Debye-Hückel formalism for charged-charged amino acid interactions and a coarse-grained potential of the amino acids to treat pH and salt variations.

Results: Using the PEP-FOLD framework, we show that our approach performs as well as the machine-leaning AlphaFold2 and TrRosetta methods for 15 well-structured sequences, but shows significant improvement in structure prediction of six poly-charged amino acids and two sequences that have no homologous in the Protein Data Bank, expanding the range of possibilities for the understanding of peptide biological roles and the design of candidate therapeutic peptides.

1 Introduction

Peptides of less than 40 amino acids have diverse biological functions, acting as signaling entities in all domains of life, and targeting receptors or interfering with molecular interactions. Hormones and their bacterial mimetics (Fetissov et al., 2019), neuropeptides and their roles in neurodegenerative diseases (Ben-Shushan and Miller, 2021), antimicrobial peptides contribution to host defence (Mookherjee et al., 2020), and immunomodulatory peptides in the perspective of vaccine design (Pavlicevic et al., 2022) are some current directions motivating their study at a fundamental level. Due to their specific features, peptides have also gained interest as therapeutical agents (Muttenthaler et al., 2021), particularly to target protein-protein interactions (Cabri et al., 2021). They are also considered as having interest in the development of new functional biomimetic materials (Levin et al., 2020). Peptides have limitations though, as they can be highly flexible (Apostolopoulos et al., 2021), which motivate efforts to understand and predict their conformational energy landscapes.

Structure prediction of polypeptides with amino acid lengths up to 40 amino acids in aqueous solution can be performed by a series of methods including machine-learning approaches such as AlphaFold2 (Jumper et al., 2021), TrRosetta (Du et al., 2021), and APPTEST (Timmons and Hewage, 2021). Looking at AlphaFold2, which revolutionized structure prediction of single folded domain to a root-mean-square deviation (RMSD) accuracy of 0.2 nm, its capability lies on machine learning based on protein data bank (PDB) (Rose et al., 2012) templates, multiple sequence alignments, co-evolution rules and sophisticated algorithms to predict local backbone and side conformations, and side chain - side chain contact probability within distances bins. AlphaFold2 builds the protein by energy minimization using a protein-specific energy potential.

TrRosetta is basically similar to AlphaFold2. It builds the protein structure based on direct energy minimizations with a restrained Rosetta. The restraints include inter-residue distance and orientation distributions predicted by a deep neural network. Homologous templates are included in the network prediction to improve the accuracy further.

APPTEST also uses machine learning on the PDB structures with a chain length varying between 5 and 40 amino acids. APPTEST derives Cα-Cα and Cβ-Cβ distance restraints, and backbone dihedral restraints that are input of simulated annealing and energy minimization.

Other methods which are accessible by WEB-servers or can be downloaded include Rosetta (Bonneau et al., 2001), I-TASSER (Zhang, 2008), PepStrMod (Singh et al., 2015) and PEPFOLD (Shen et al., 2014; Lamiable et al., 2016). Rosetta is a fragment-assembly approach based on Monte Carlo simulation, a library of predicted nine and then three residues, and a coarse-grained model, followed by all-atom refinement. I-TASSER is a hierarchical approach that identifies structural templates from the PDB by multiple threading approaches, with full-length atomic models constructed by iterative template-based fragment assembly simulations.

The PEPstrMOD server predicts the tertiary structure of small peptides with sequence length varying between 7 and 25 residues. The prediction strategy is based on the realization that β-turn is an important feature of small peptides. Thus, the method uses both the regular secondary structure information predicted from PSIPRED and the β-turns information predicted from BetaTurns. The structure is further refined with energy minimization and molecular dynamic simulations.

PEP-FOLD2 is a fast accurate structure peptide approach based on the prediction of a profile of the structural alphabet of four amino acid lengths along the sequence, and a chain growth method based on the coarse-grained sOPEP2 model followed by Monte Carlo steps. It should be noted that PEP-FOLD2 is not free of learning as it uses an Support Vector Machine predictor relying on multiple sequence alignment. Of practical interest, during the time of this study, we could not access the APPTEST and PepStrMod servers. Also, TrRosetta cannot be applied to sequences with 10 amino acids.

Overall, all these methods generate good models for well-structured peptides at pH 7 in aqueous solution because most structures deposited in the PDB from nuclear magnetic resonance (NMR) and X-ray diffraction experiments were determined at neutral pH, and the PDB contains close to 200,000 structures as of 30 October 2022.

These methods face, however, two current limitations: correct conformational ensemble sampling of intrinsically disordered peptides or proteins (IDPs) which lack stable secondary and tertiary structures, and accurate conformational ensemble prediction of peptides as a function of pH and salt conditions. The first issue has motivated the development of new force fields, such as CHARMM36m-TIP3P modified (Huang et al., 2017), AMBER99SB-DISP (Robustelli et al., 2018) and many others (Nguyen and Derreumaux, 2020). The current approach to address the impact of pH variations is to perform your own extensive molecular dynamics and replica exchange molecular dynamics simulations at your desired pH. Alternatively one can use pH-replica exchange molecular dynamics using a discrete protonation method (Sabri Dashti et al., 2012) or all-atom and coarse-grained continuous constant pH molecular dynamics (CpHMD) methods (Barroso da Silva et al., 2016; Huang et al., 2016; Aho et al., 2022). Accurate and fast peptide structure predictions at different pH and salt conditions are the objectives of the present study.

The organization of this paper is as follows. In section 2, we present an extension of the coarse-grained, sOPEP2, force field to integrate Debye-Hückel charge interactions as a function of pH and salt concentrations. Next, we present the TrRosetta, AlphaFold2 and PEP-FOLD with and without Debye-Hückel protocols and the analysis methods. In section 3, we present the results of structure predictions of six poly-charged peptides as a function of pH and compare them to experimental circular dichroism (CD) data, and the predicted models obtained by TrRosetta and AlphaFold2. The charged polypeptides are particularly interesting to assemble the sOPEP2 interactions and the Debye-Hückel charge interactions. This is followed by the prediction of 15 ordered peptides, which have NMR structures resolved at a pH varying from 2 to 8. We finish this section on the prediction of four peptides for which low-resolution experimental data and topological descriptions are available. Finally section 4 summarizes our findings.

2 Methods

2.1 The sOPEP2 force field

The sOPEP2 potential, to be used in a discrete space, originates from the OPEP potential which uses an explicit representation of the backbone (N, H, Cα, N and H atoms) and one bead for each side chain, whose position to Cα and Van der Walls radius depend on the amino acid type (Maupetit et al., 2007; Sterpone et al., 2014). The sOPEP2 is expressed as a sum of local, non-bonded and hydrogen-bond (H-bond) terms, with all parameters described in (Binette et al., 2022).

Since the geometry in PEP-FOLD is mainly imposed by the superimposition of the discrete structural alphabet (SA) letters, the local contributions are restricted to a simple flat-bottomed quadratic potential to described the energy associated with dihedral angles ϕ and ψ, described by:where ϕ_{0_sc_i} = ϕ within the interval and ϕ_{0_sc_i} = min(ϕ − ϕ_{low_sc_i}, ϕ − ϕ_{high_sc_i}) outside of the interval ϕ_{low_sc_i} and ϕ_{high_sc_i} are specific to each amino acid type (Binette et al., 2022).

Non-bonded interactions corresponding to repulsion/attraction effects are described using the Mie potential (Mie, 1903) given by:where ϵ_ij is the potential depth and is the position of the potential minimum function of atomic types for i and j. The combination of exponents, n and m, gives the relationship between the position of the potential minimum (r⁰) and the position where it is zero (gR0):

Hydrogen bonds are considered explicitly, using a combination of two types of contributions:

where corresponds to the two-body contributions of all hydrogen bonds between residue (i) and residue (j), characterized by the hydrogen/acceptor distance r_ij and the donor/hydrogen/acceptor angle α_ij:where σ = 0.18 nm is the position of the potential minimum and ϵ is the potential depth. We distinguish between α-helix-like hydrogen bonds defined by O(i)-H(i+4) and other hydrogen bonds. Hydrogen bonds between a pair of residues separated by less than four amino acids are not considered.

involves four-body interactions involving pairs of hydrogen bonds (between residues (i) and (j) and residues (k) and (l)), so as to stabilize secondary structure motifs. The cooperation energy is given by the following equations:

2.2 Debye-Hückel charge interactions

The new sOPEP version introduces the possibility to consider pH-dependent charge interactions, using the Debye-Hückel formulation (Debye and Hückel, 1923).where q_i and q_j correspond to the charge of particles i and j, j > i + 1, respectively, r_ij is the distance between the particles, l_DH is the Debye length that depends of the ionic strength of the solvent, and ϵ(r_ij) is the dielectric constant that depends on the distance between the charges:

where D_w is the dielectric constant of water, D_p is the dielectric constant inside a protein, and s is the slope of the sigmoidal function. In practice, we used values of 78, 2 and 0.6 for D_w, D_p and s, respectively, as stated in (Iwaoka et al., 2020).

Since the sOPEP representation does not include all-atom side chains, but charges associated with particles of heterogeneous sizes, it is necessary to shift the energy curve to have energy values compatible with those of the Mie formulation. For each pair of particle, we shifted the distance using:

and we evaluated using except for ϵ(r), where the unshifted distance is used.

Shift values were adjusted for r such as E_{vdw_r} = k, , as illustrated in Figure 1. In practice, we found that values of k on the order of 4 kcal/mol are convenient, and the Debye-Hückel energy was truncated to to avoid redundancy with the Mie potential. Also, as sOPEP2 side-chain side-chain potential already includes some of the interactions between the charged residues, the Mie potential is restricted to only the repulsive part for charged side chains.

FIGURE 1

Charges were assigned to particles depending on the pH using pKa values of 3.9, 4.2, 6.0, 10.5 and 12.5 for ASP, GLU, HIS, LYS and ARG side chains, respectively, and 9.0 and 2.0 for the N-terminal α-ammonium and C-terminal α-carboxyl groups, respectively. Note that it is possible to consider blocking the extremities using acetyl and N-methyl on the N-terminus and C-terminus groups, respectively, in which case no charge is assigned to the extremity.

Finally, we have considered weighting differently the electrostatic contributions depending on the separation of the amino acids in the sequence. In our experience, best results were obtained using a weight of 10 for residue separation of less than 7 amino acids and a weight of 2, otherwise.

2.3 PEP-FOLD, TrRosetta and AlphaFold2 protocols and analysis

Our validation test set includes a total of 25 peptides as described in Section 3. For each peptide, we performed two PEP-FOLD simulations, one TrRosetta simulation which uses PDB templates and homologous sequences, and one AlphaFold2 simulation in its standard version using three recycles, template information, and AMBER refinement. Both TrRosetta and AlphaFold2 simulations return five models that we considered equiprobable. The PEP-FOLD simulations without Debye-Hückel (referred to as PF-noDH), and with Debye-Hückel (PF-DH) generated 200 models each. We selected the representative models of the five best clusters identified among the 200 generated models based on their rankings using sOPEP2 energies - i.e. the standard PEP-FOLD model selection - for PF-noDH and the sum of sOPEP2 and Debye-Hückel energies for PF-DH.

We have considered 15 peptides for which a PDB structure is available. These correspond to peptides previously studied during PEP-FOLD development and new peptides with their structures released after 1 September 2019, and determined in pure aqueous environment. The predicted models of the 15 peptides were evaluated by computing the CAD-score (Olechnovič et al., 2013). The reported CAD-score corresponds to the largest value of the cross CAD-scores between the five predicted models and all NMR structures. Following our previous work, if the CAD-score calculated on the backbone atoms is 0.60, the model is associated with largely correct secondary structure prediction, otherwise if it is 0.65 the model is correct in terms of secondary and tertiary structures. For the poly-charged peptides, we also computed their secondary structure contents using STRIDE program (Frishman and Argos, 1995). For the four sequences free of any NMR structure, we compared their predicted and experimental topologies.

3 Results and discussion

3.1 Predicted models of poly-charged peptides

For the simulations of the six poly-charged peptides, namely (EK)15, (EK)5, (H)30, (E)15, (K)15 and (R)25, we calculated the alpha-helix, coil and turn contents averaged over the five models of each method and compared with circular dichroism (CD) experiments. It is to be noted that by default TrRosetta, AlphaFold2 and PF-noDH only perform simulations at neutral pH. CD values are not available at all pH varying from 3 to 13. We report, however, on the pH-dependent conformations using PF-DH. Results are summarized in Table 1.

TABLE 1

	Experiment			PF-noDH			PF-DH			AlphaFold2			TrRosetta
	Exp. Blck	pH	CD	α	Coil	β-turn	α	Coil	β-turn	α	Coil	β-turn	α	Coil	β-turn
		3					28	69	3
(EK)15	-	7.4	β-turn	94	6	0	0	85	15	97	3	0	97	3	0
		11					24	68	8
		3					0	100	0
(EK)5	-	7.4	coil	90	10	0	0	68	32	54	10	36	68	26	6
		11					0	78	22
(H)30	-	5	coil				0	100	0
		7.4	aggregation	75	5	20		95	5	0	81	19	97	3	0
(K)15	Ace, Nmt.	3.6	0% α				0	100	0
		7.4	0% α	93	7	0	0	100	0	73	8	19	93	7	0
		11.2-13	83.7% α				93	7	0
		3–3.6	ND-42% α				93	7	0
(E)15	Ace, Nmt.	4.2	75% α				93	7	0
		7.4	0% α	93	7	0	0	100	0	93	7	0	93	7	0
(R)25	-	5.7	50% coil				0	100	0
		7.4	ND	96	4	0	0	100	0	77	4	19	96	4	0
		11.3	50% coil 15%α				0	100	0
		13					96	0	0

Structural impact of pH variation on poly-charged peptides.

Results are presented for PEP-FOLD, without and with Debye-Hueckel (PF-noDH, PF-DH), AlphaFold2 and TrRosetta. Experimental information about the blocking of extremities (Exp. Blck.) using acetyl (Ace) or N-methyl (Nmt), pH and Circular Dichroism (CD) data, is reported. ND, stands for not determined.

TrRosetta, AlphaFold2 and PF-noDH have a very high propensity to report alpha-helical conformations for the six polypeptides at pH 7.4, the exception being (H)30, with alpha-content varying from 54% to 97%, while CD displays only coil or beta-turn signals. For instance, for (EK)5, TrRosetta reports 68% helix and 26% coil, AlphaFold2 reports 54% helix and 10% coil and PF-noDH reports 90% helix and 10% coil. Only PF-DH is able to predict the CD coil character of (EK)5 with 68% coil and 32% turn.

PF-DH is the single method to predict 85% coil and 15% turn consistent at pH 7.4 with the beta-turn CD signal of (EK)15 (Smith et al., 2020), and PF-DH predicts 100% coil at pH 5 consistent with the coil CD signal of (H)30 (Jesus, 2020). There is strong experimental evidence that (H)30 polymerizes at pH 7.4 forming beta-sheets. At this pH, PF-noDH and TrRosetta predict strong helical conformations, while PF-DH and AlphaFold2 predict a random coil, with contents of 95% and 81%, respectively.

The polypeptides (K)15 and (E)15 are particularly interesting because the alpha-helix content changes inversely with the pH. As observed by CD, the helical content of (K)15 increases with pH, while the helical content of (E)15 decreases with pH (Batys et al., 2020). (K)15 have 0% helix at pH 3.6 and 83.7% helix at pH 11–13 by CD. PF-DH finds 0% helix at pH 3.6 and 93% helix at pH 11–13 (Figure 2). In contrast to (K)15, (E)15 (Figure 3) have 42% helix at pH 3.6 and 0% helix at pH 11–13 by CD. PF-DH finds 93% helix at pH 3.6 and 100% coil at pH 11–13.

FIGURE 2

FIGURE 3

The conformation ensemble of (R)25 is predicted to have 50% coil and 31% beta-sheet at pH 5.7 and have 51% coil and 21% beta-sheet at pH 11.3 by CD (Morga et al., 2022). PF-DH predicts 100% coil, independently of the pH values. Its performance is however much better than those of PF-noDH, AlphaFold2 and TrRosetta which predict a high helical signal varying from 77% to 96%.

Overall, the structure predictions of the six polypeptides at neutral pH (7.4) give quite different contents of the secondary structure using PF-noDH and PF-DH, with PF-noDH behaving and failing like AlphaFold2 and TrRosetta predictions. This result emphasizes the role of the Debye-Hückel charged-charged interactions when treating poly-charged peptides. The results also demonstrate that the learning stage of the local conformations in PEP-FOLD performed from structures at neutral pH can be counterbalanced by the force field, making possible to explore new conformations depending on the pH. In contrast, AlphaFold2 and TrRosetta rely on homologous structures and multiple sequence alignements. Since neither is available for poly-charged peptides, it is normal for both predictors to fail. But surprisingly, the LDDT (local distance difference test) metric predicted by both methods is, on average, very high (80%) for all amino acids of the six poly-charged peptides.

It is important to emphasize that in this study, we assume the standard pka values of charged amino acids irrespective of the amino acid composition of the peptides and the conformations of the peptides. This is a strong limitation of our current approach. Determining the pka values of charged amino acids in protein structures has motivated the development of many theoretical methods (Sabri Dashti et al., 2012; Barroso da Silva et al., 2016; Huang et al., 2016; Aho et al., 2022). To illustrate the variation of the pka values, we used the H++ server which is based on classical continuum electrostatics and basic statistical mechanisms (Anandakrishnan et al., 2012). Using (K)15, we found pka values ranging from 10.1 to 9.4 (versus 10.5 in our model); using (R)25, we found pka ranging from 9.6 to 11.6 in one conformation, and from 10.9 to 11.7 in another conformation (versus 12.5 in our model), and using (H)30, we found pka variations from 4.7 to 6.3 (versus 6.0 in our model). Clearly this change of pka of the amino acids will impact the equilibrium conformations of PF-DH.

3.2 Predicted models of polypeptides with NMR structures

The experimental information of each well-ordered peptide, given in Table 2, includes the amino acid length varying from 8 to 35 amino acids, the number of NMR models, the WDC (well-defined rigid core) according to the PDB, the topology, the pH varying from 4.3 to 7, the ionic strength varying from 0 to 150 mM NaCl, the blocking of the extremities and the amino acid sequence.

TABLE 2

Target	L	# mod.	WDC	Topo.	pH	ion.Str	Exp. Blck.	Sequence
6mi9	19	10	3–19	α distort.	4.3	0	Nmet.	PMARNKILGKILRKIAAFK
6j9p	12	10	2–12	α	5	0	-	RRLIRLILRLLR
1fsd	28	41	3–25	β₂α	5	0	-	QQYTAKIKGRTFRNEKELRDFIEKFKGR
1j4m	14	1	ND	β₂	5	0	-	RGKWTYNGITYEGR
1le1	12	20	2–11	β₂	5.5	0	Nmet.	SWTWENGKWTWK
6nm3	8	5	ND	α-like	5.8	0	Nmet.	RKIWWWWL
6svc	35	20	2–35	β₃	6	150	-	SKLPPGWEKRMSRNSGRVYYFNHITNASQFERPSG
2evq	12	43	2–10	β₂	6	20	-	KTWNPATGKWTE
1egs	9	20	1–9	β₂-like	6.5	0	Ace, Nmet.	TKSAGGIVL
6r2x	25	15	9–20	Cα	6.5	148	-	FETLRGDERILSILRHQNLLKELQD
7b2f	31	20	6–21	αCα	6.5	100	-	MNNNELTSLPLAERKRLLELAKAAKLSRQHY
6s0n	9	20	ND	α-like	6.8	1	-	QDVNTAVAW
7li2	22	20	1–22	β₂-like	7	0	-	AGTMRVTYPDGQKPGQSDVEKD
1wbr	17	32	1–16	αC	7	0	Ace, Nmet.	QAERMSQIKRLLSEKKT
1pgbF	16	1	ND	β₂	7	0	-	GEWTYDDATKTFTVTE
pep17	17	-		α	2	0	-	ETGTKAELLAKYEATHK
pep38	38	-		αTα	3.6	20	-	DWLKARVEQELQALEARGTDSNAELRAMEAKLKAEIQK
pep10	11	-		β₂	4.3	0	-	IYSNSDGWTWT
tau fragment	17	-		β₂	7	0	-	DNIKHVPGGGSVQIVYK

Peptide set.

For each peptide with an experimental structure available we specify its PDB, identifier (PDB), size (L), the number of NMR, models available (# mod.), its well defined core according to the PDB (WDC), its topology (topo.); and the experimental conditions including the pH, the ionic strenght (ion. Str.) and the presence of extra groups to block the N terminus (acetyl) and C terminus (N methyl) (Exp. Blck.), and the amino acid sequence. Four additional peptides without deposited structures but for which information exists in the literature are reported at the bottom of the table.

Table 3 reports on the CAD-scores using the full sequences and the rigid cores of each of the 15 peptides using the four methods. Note we give the results of PF-noDH, AlphaFold2 and TrRosetta, because these methods which are pH independent are used irrespective of the experimental pH conditions.

TABLE 3

	Exp.		PF-noDH			PF-DH			AlphaFold2			TrRosetta
Target	Topo	pH	Topo	Full	WDC	Topo	Full	WDC	Topo	Full	WDC	Topo	Full	WDC
6mi9	α	4.3	α	0.85	0.86	α, α-like	0.84	0.86	α	0.83	0.86	α	0.84	0.84
6j9p	α	5	α	0.75	0.77	Cα	0.70	0.71	α	0.74	0.76	α	0.74	0.75
1fsd	β₂α	5	αCα	0.62	0.63	β2α	0.66	0.69	β₂α	0.73	0.76	β₂α	0.71	0.73
1j4m	β₂	5	β₂	0.67	0.67	β₂	0.76	0.76	β₂	0.74	0.74	β₂	0.73	0.73
1le1	β₂	5.5	β₂	0.76	0.8	β₂	0.75	0.75	β₂	0.83	0.87	β₂	0.8	0.79
6nm3	α-like	5.8	α	0.72	0.72	α,unstr	0.73	0.73	coil	0.59	0.59	-	-	-
6svc	β₃	6	β₃	0.66	0.66	β₃	0.70	0.70	β₃	0.79	0.8	β₃	0.78	0.79
2evq	β₂	6	β₂	0.84	0.86	β₂	0.82	0.86	β₂	0.89	0.93	β₂	0.8	0.79
1egs	β₂-like	6.5	β₂, β₂-like	0.71	0.71	β₂-like	0.70	0.70	coil	0.55	0.55	-	-	-
6r2x	Cα	6.5	Cα	0.75	0.9	Cα	0.74	0.91	α	0.8	0.92	Cα	0.8	0.9
7b2f	αCα	6.5	αC, α	0.77	0.82	αCα	0.79	0.84	αCα	0.79	0.84	αCα	0.77	0.83
6s0n	α-like	6.8	α	0.76	0.76	α	0.79	0.78	coil	0.67	0.67	-	-	-
7li2	β₂-like	7	β₂, β₂-like	0.62	0.62	β₂, β₂-like	0.64	0.64	coil	0.6	0.6	coil	0.58	0.58
1wbr	αC	7	α	0.69	0.7	α	0.70	0.70	α	0.69	0.7	α	0.68	0.69
1pgbF	β₂	7	β₂	0.77	0.77	β₂	0.79	0.78	β₂, ext − untr	0.83	0.83	β₂	0.91	0.91
MEAN				0.73	0.75		0.74	0.76		0.74	0.76		0.76	0.78
STDEV				0.07	0.09		0.06	0.08		0.10	0.12		0.08	0.09
MEDIAN				0.75	0.76		0.74	0.75		0.74	0.76		0.77	0.78

Performance prediction for structured peptides.

For each structure, we report a short description of the topology of the 5 best models, and the CAD-score values (see methods) obtained for PEP-FOLD without and with Debye-Huckel (PF-noDH, PF-DH), AlphaFold2 and TrRosetta. Note that TrRosetta is not functional for amino acid lengths 10.

The first striking result is that (the mean, standard deviation and median) values of the CAD-scores averaged over the 15 peptides are nearly identical for the four methods using both the full sequences or the rigid cores. They reach (0.73, 0.07, 0.75) for PF-noDH, (0.74, 0.10, 0.74) for AlphaFold2, (0.74, 0.06, 0.74) for PF-DH and (0.76, 0.08, 0.77) for TrRosetta using the full sequences. Similar trends are observed considering the well-defined rigid cores, the average CAD-scores being of 0.75, 0.76, 0.76 and 0.78 for PF-noDH, PF-DH, AlphaFold2 and TrRosetta, respectively.

The second result is that PF-noDH and PF-DH do not predict any low quality models (CAD-score 0.6), while AlphaFold2 produces CAD-scores of 0.59, 0.55 and 0.6 for the three targets 6nm3,1egs and 7li2 (Figures 4–6, respectively). It has to be noted that the structures of these three peptides were solved at pH 5.8, 6.5 and 7. This low score results in differences between the experimental and predicted topologies. Experimentally, 6nm3 adopts a helical-like conformation, 1egs adopts a beta2-like conformation and 7li2 adopts a beta-2 like conformation. Of note, a beta-2 like conformation has the topology of a beta-hairpin but lacks the H-bond network.

FIGURE 4

FIGURE 5

FIGURE 6

For these three systems, AlphaFold2 predicts an extended-unstructured conformation. The 7li2 target is also problematic for TrRosetta, as it is the single system with a CAD-score 0.6, namely 0.58 leading to an extended-unstructured conformation. Inversely, TrRosetta is the best method to predict the beta-hairpin of 1pgbF (Barducci et al., 2011) with a CAD-score of 0.91 versus 0.83 with AlphaFold2 and 0.79 with PF-DH.

The third result is related to the performance of PF-DH with respect to PF-noDH, which provides evidence that the weights of the Debye-Huckel salt bridge interactions are consistent with the weights of sOPEP2 interactions. It was far from being evident that the addition of charges at extremities and charged amino acids in the core of the sequences would not change the quality of the models for pH varying between 2.9 and 7. The number of titratable amino acids varies from 1–2 (1le1, 1egs - 6nm3, 6evq), 4 for 1j4m, 5 for 6j9p and 1pgbF, 6 for 6mi9, 7 for 1wbr and 7li2, 9 for 6r2x, 10 for 6svc and 7b2f, to 12 for 1fsd. The results also show that the pH-independent PEP-FOLD version and the pH-dependent PEP-FOLD version perform similarly for peptides containing charged, hydrophilic and hydrophobic amino acids.

Finally, using NMR structures as a reference, a very recent study benchmarked the accuracy of AlphaFold2 in predicting 588 peptide structures between 10 and 40 amino acids, including soluble peptides, membrane-associated peptides, and disulfide-rich peptides (McDonald et al., 2023). Although the study ignores pH conditions and the presence of the membrane, AlphaFold2 can be used for the modeling of peptide structures anticipated to have a well-defined secondary structure. AlphaFold2 is particularly successful in the prediction of alpha-helical membrane-associated peptides and disulfide-rich peptides, but also shows some shortcomings in predicting phi and psi angles. It was found that AlphaFold2 performs at least as well if not better than TrRosetta and PEP-FOLD using our 2016 set of parameters.

3.3 Predicted models of polypeptides without any NMR structures

The last four peptides have been discussed in literature in terms of topological features without delivering any NMR structure. Their sequences are given at the bottom of Table 2.

Two peptides are rather well described by all four methods. Pep17 has been shown as a stable monomeric helix at pH two using CD and NMR experiments (Bradley et al., 1990). PF-noDH, PH-DH at pH 2 and AlphaFold2 predict a helical conformation with a frayed N-terminus, while TrRosetta predicts a full helical conformation (Figure 7). Pep38 determined experimentally as a helix-turn-helix at pH 3.6 (Fezoui et al., 1997) is also well reproduced by the four methods.

FIGURE 7

There are two cases, where AlphaFold2 and TrRosetta fail to produce the experimental data. The first peptide is pep10 which is described experimentally by an ensemble of distinct transient beta-hairpins at pH 4.3(Alba et al., 1997). It is described as an unstructured turn-like conformation by TrRosetta (Figure 8D), and an ensemble of extended and beta2-like conformations by AlphaFold2 (Figure 8C). In contrast, PF-noDH and PF-DH predict well a beta-hairpin conformation (Figures 8A,B).

FIGURE 8

The second peptide is the tau fragment encompassing residues 295–306 containing the aggregation-prone PHF6 motif (306–311). Using cross-linking mass-spectrometry, ab initio Rosetta (Ovchinnikov et al., 2018), and CS-Rosetta which leveraged available chemical shifts (Lange et al., 2012) for the tau repeat spanning residues 243–365, the tau fragment 295–306 was predicted as a beta-hairpin at pH 7 (Chen et al., 2019). PF-noDH and PF-DH predict the same conformation (Figures 9A, B). In contrast, AlphaFold2 predicts extended conformations (Figure 9C), and surprisingly TrRosetta finds a random coil conformation (Figure 9D).

FIGURE 9

Overall, this small set of peptides provides evidence of some limitations of AlphaFold2 and TrRosetta when the target does not have an homologous sequence in the PDB.

4 Conclusion

Integrating pH variation effects to a coarse-grained model, where the side chains are represented by one single bead, is an important step toward accurate polypeptide structure prediction in aqueous solution, as coarse-graining with various granularities (de Vries and Zacharias, 2013; Sieradzan et al., 2022), enhance sampling. This task has been performed by combining a Debye-Hückel formalism for charged - charged side chain interactions and the sOPEP2 potential. By using a total of 25 peptides of amino acid lengths varying between 7 and 38 amino acids, this study provides evidence that PF-noDH, PF-DH, AlphaFold2 and TrRosetta perform similarly on peptides deposited in the Protein data Bank, but PF-DH outperforms the two recent machine-learning methods for poly-charged peptides, and peptides for which homologous sequences are not deposited in the PDB. Of note, our new formulation takes into account the impact of salt concentration variations, but we could not identify from the literature any case reporting a conformation change upon ionic strength variation.

Overall this work is one step towards peptide structure prediction in mimicking in vivo conditions. We are currently working on IDP’s in aqueous solution and de novo structure prediction of peptides at the surface of two-dimensional cell membranes.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

PT, and PD contributed to conception and design of the study. PT performed the PEP-FOLD implementation and ran the PEP-FOLD and Colabfold (AlphaFold2) simulations. PD ran the TrRosetta simulations. PT and PD performed the analyses. PT, and PD wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding

This work was supported by the “Initiative d’Excellence” program from the French State, Grant “DYNAMO,” ANR-11-LABX-0011, and by INSERM U1133 recurrent funding.

Acknowledgments

The authors thank Andrew Doig for helpful discussions about pKa.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1
AhoN.BuslaevP.JansenA.BauerP.GroenhofG.HessB. (2022). Scalable constant pH molecular dynamics in GROMACS. J. Chem. Theory Comput.18, 6148–6160. 10.1021/acs.jctc.2c00516
- CrossRef
- Google Scholar
2
AlbaE. D.RicoM.JiménezM. A. (1997). Cross-strand side-chain interactions versus turn conformation in β-hairpins. Protein Sci.6, 2548–2560. 10.1002/pro.5560061207
- CrossRef
- Google Scholar
3
AnandakrishnanR.AguilarB.OnufrievA. V. (2012). H++ 3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Res.40, W537–W541. 10.1093/nar/gks375
- CrossRef
- Google Scholar
4
ApostolopoulosV.BojarskaJ.ChaiT.-T.ElnagdyS.KaczmarekK.MatsoukasJ.et al (2021). A global review on short peptides: Frontiers and perspectives. Molecules26, 430. 10.3390/molecules26020430
- CrossRef
- Google Scholar
5
BarducciA.BonomiM.DerreumauxP. (2011). Assessing the quality of the OPEP coarse-grained force field. J. Chem. Theory Comput.7, 1928–1934. 10.1021/ct100646f
- CrossRef
- Google Scholar
6
Barroso da SilvaF. L.PasqualiS.DerreumauxP.DiasL. G. (2016). Electrostatics analysis of the mutational and pH effects of the n-terminal domain self-association of the major ampullate spidroin. Soft Matter12, 5600–5612. 10.1039/C6SM00860G
- CrossRef
- Google Scholar
7
BatysP.MorgaM.BonarekP.SammalkorpiM. (2020). pH-induced changes in polypeptide conformation: Force-field comparison with experimental validation. J. Phys. Chem. B124, 2961–2972. 10.1021/acs.jpcb.0c01475
- CrossRef
- Google Scholar
8
Ben-ShushanS.MillerY. (2021). Neuropeptides: Roles and activities as metal chelators in neurodegenerative diseases. J. Phys. Chem. B125, 2796–2811. 10.1021/acs.jpcb.0c11151
- CrossRef
- Google Scholar
9
BinetteV.MousseauN.TufferyP. (2022). A generalized attraction–repulsion potential and revisited fragment library improves PEP-FOLD peptide structure prediction. J. Chem. Theory Comput.18, 2720–2736. 10.1021/acs.jctc.1c01293
- CrossRef
- Google Scholar
10
BonneauR.StraussC. E.BakerD. (2001). Improving the performance of Rosetta using multiple sequence alignment information and global measures of hydrophobic core formation. Proteins Struct. Funct. Bioinforma.43, 1–11. 10.1002/1097-0134(20010401)43:1<1::aid-prot1012>3.0.co;2-a
- CrossRef
- Google Scholar
11
BradleyE. K.ThomasonJ. F.CohenF. E.KosenP. A.KuntzI. D. (1990). Studies of synthetic helical peptides using circular dichroism and nuclear magnetic resonance. J. Mol. Biol.215, 607–622. 10.1016/S0022-2836(05)80172-X
- CrossRef
- Google Scholar
12
CabriW.CantelmiP.CorbisieroD.FantoniT.FerrazzanoL.MartelliG.et al (2021). Therapeutic peptides targeting PPI in clinical development: Overview, mechanism of action and perspectives. Front. Mol. Biosci.8, 697586. 10.3389/fmolb.2021.697586
- CrossRef
- Google Scholar
13
ChenD.DromboskyK. W.HouZ.SariL.KashmerO. M.RyderB. D.et al (2019). Tau local structure shields an amyloid-forming motif and controls aggregation propensity. Nat. Commun.10, 2493. 10.1038/s41467-019-10355-1
- CrossRef
- Google Scholar
14
de VriesS.ZachariasM. (2013). Flexible docking and refinement with a coarse-grained protein model using ATTRACT: Flexible Protein-Protein Docking and Refinement. Proteins Struct. Funct. Bioinforma.81, 2167–2174. 10.1002/prot.24400
- CrossRef
- Google Scholar
15
DebyeP.HückelE. (1923). Zur Theorie der Elektrolyte. I. Gefrierpunktserniedrigung und verwandte Erscheinungen. Phys. Z.24, 305.
- Google Scholar
16
DuZ.SuH.WangW.YeL.WeiH.PengZ.et al (2021). The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc.16, 5634–5651. 10.1038/s41596-021-00628-9
- CrossRef
- Google Scholar
17
FetissovS. O.LegrandR.LucasN. (2019). Bacterial protein mimetic of peptide hormone as a new class of protein-based drugs. Curr. Med. Chem.26, 546–553. 10.2174/0929867324666171005110620
- CrossRef
- Google Scholar
18
FezouiY.ConnollyP. J.OsterhoutJ. J. (1997). Solution structure of αtα, a helical hairpin peptide of de novo design. Protein Sci.6, 1869–1877. 10.1002/pro.5560060907
- CrossRef
- Google Scholar
19
FrishmanD.ArgosP. (1995). Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Genet.23, 566–579. 10.1002/prot.340230412
- CrossRef
- Google Scholar
20
HuangY.ChenW.WallaceJ. A.ShenJ. (2016). All-atom continuous constant pH molecular dynamics with particle mesh ewald and titratable water. J. Chem. Theory Comput.12, 5411–5421. 10.1021/acs.jctc.6b00552
- CrossRef
- Google Scholar
21
HuangJ.RauscherS.NawrockiG.RanT.FeigM.De GrootB. L.et al (2017). CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. methods14, 71–73. 10.1038/nmeth.4067
- CrossRef
- Google Scholar
22
IwaokaM.YoshidaK.ShimosatoT. (2020). Application of a distance-dependent sigmoidal dielectric constant to the REMC/SAAP3D simulations of chignolin, trp-cage, and the G10q mutant. Protein J.39, 402–410. 10.1007/s10930-020-09936-7
- CrossRef
- Google Scholar
23
JesusC. N. (2020). On the self-assembly of pH-sensitive histidine-based copolypeptides. London: University College London.
- Google Scholar
24
JumperJ.EvansR.PritzelA.GreenT.FigurnovM.RonnebergerO.et al (2021). Highly accurate protein structure prediction with AlphaFold. Nature596, 583–589. 10.1038/s41586-021-03819-2
- CrossRef
- Google Scholar
25
LamiableA.ThévenetP.ReyJ.VavrusaM.DerreumauxP.TufféryP. (2016). PEP-FOLD3: Faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res.44, W449–W454. 10.1093/nar/gkw329
- CrossRef
- Google Scholar
26
LangeO. F.RossiP.SgourakisN. G.SongY.LeeH.-W.AraminiJ. M.et al (2012). Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proc. Natl. Acad. Sci.109, 10873–10878. 10.1073/pnas.1203013109
- CrossRef
- Google Scholar
27
LevinA.HakalaT. A.SchnaiderL.BernardesG. J. L.GazitE.KnowlesT. P. J. (2020). Biomimetic peptide self-assembly for functional materials. Nat. Rev. Chem.4, 615–634. 10.1038/s41570-020-0215-y
- CrossRef
- Google Scholar
28
MaupetitJ.TufferyP.DerreumauxP. (2007). A coarse-grained protein force field for folding and structure prediction. Proteins Struct. Funct. Bioinforma.69, 394–408. 10.1002/prot.21505
- CrossRef
- Google Scholar
29
McDonaldE. F.JonesT.PlateL.MeilerJ.GulsevinA. (2023). Benchmarking alphafold2 on peptide structure prediction. Structure31, 111–119.e2. 10.1016/j.str.2022.11.012
- CrossRef
- Google Scholar
30
MieG. (1903). Zur kinetischen Theorie der einatomigen Körper. Ann. Phys.316, 657–697. 10.1002/andp.19033160802
- CrossRef
- Google Scholar
31
MookherjeeN.AndersonM. A.HaagsmanH. P.DavidsonD. J. (2020). Antimicrobial host defence peptides: Functions and clinical potential. Nat. Rev. Drug Discov.19, 311–332. 10.1038/s41573-019-0058-8
- CrossRef
- Google Scholar
32
MorgaM.BatysP.KosiorD.BonarekP.AdamczykZ. (2022). Poly-L-arginine molecule properties in simple electrolytes: Molecular dynamic modeling and experiments. Int. J. Environ. Res. Public Health19, 3588. 10.3390/ijerph19063588
- CrossRef
- Google Scholar
33
MuttenthalerM.KingG. F.AdamsD. J.AlewoodP. F. (2021). Trends in peptide drug discovery. Nat. Rev. Drug Discov.20, 309–325. 10.1038/s41573-020-00135-8
- CrossRef
- Google Scholar
34
NguyenP. H.DerreumauxP. (2020). Structures of the intrinsically disordered Aβ, tau and α-synuclein proteins in aqueous solution from computer simulations. Biophys. Chem.264, 106421. 10.1016/j.bpc.2020.106421
- CrossRef
- Google Scholar
35
OlechnovičK.KulberkytėE.VenclovasČ. (2013). CAD-score: A new contact area difference-based function for evaluation of protein structural models. Proteins Struct. Funct. Bioinforma.81, 149–162. 10.1002/prot.24172
- CrossRef
- Google Scholar
36
OvchinnikovS.ParkH.KimD. E.DiMaioF.BakerD. (2018). Protein structure prediction using Rosetta in CASP12. Proteins Struct. Funct. Bioinforma.86, 113–121. 10.1002/prot.25390
- CrossRef
- Google Scholar
37
PavlicevicM.MarmiroliN.MaestriE. (2022). Immunomodulatory peptides—a promising source for novel functional food production and drug discovery. Peptides148, 170696. 10.1016/j.peptides.2021.170696
- CrossRef
- Google Scholar
38
RobustelliP.PianaS.ShawD. E. (2018). Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci.115, E4758–E4766. 10.1073/pnas.1800690115
- CrossRef
- Google Scholar
39
RoseP. W.BiC.BluhmW. F.ChristieC. H.DimitropoulosD.DuttaS.et al (2012). The RCSB protein Data Bank: New resources for research and education. Nucleic acids Res.41, D475–D482. 10.1093/nar/gks1200
- CrossRef
- Google Scholar
40
Sabri DashtiD.MengY.RoitbergA. E. (2012). pH-replica exchange molecular dynamics in proteins using a discrete protonation method. J. Phys. Chem. B116, 8805–8811. 10.1021/jp303385x
- CrossRef
- Google Scholar
41
ShenY.MaupetitJ.DerreumauxP.TufféryP. (2014). Improved PEP-FOLD approach for peptide and miniprotein structure prediction. J. Chem. theory Comput.10, 4745–4758. 10.1021/ct500592m
- CrossRef
- Google Scholar
42
SieradzanA. K.CzaplewskiC.KrupaP.MozolewskaM. A.KarczyńskaA. S.LipskaA. G.et al (2022). “Modeling the structure, dynamics, and transformations of proteins with the UNRES force field,” in Protein folding. Methods in molecular biology. Editor MuñozV. (New York, NY: Springer US), 399–416. 10.1007/978-1-0716-1716-8_23
- CrossRef
- Google Scholar
43
SinghS.SinghH.TuknaitA.ChaudharyK.SinghB.KumaranS.et al (2015). PEPstrMOD: structure prediction of peptides containing natural, non-natural and modified residues. Biol. Direct10, 1–19. 10.1186/s13062-015-0103-4
- CrossRef
- Google Scholar
44
SmithJ.McMullenP.YuanZ.PfaendtnerJ.JiangS. (2020). Elucidating molecular design principles for charge-alternating peptides. Biomacromolecules21, 435–443. 10.1021/acs.biomac.9b01191
- CrossRef
- Google Scholar
45
SterponeF.MelchionnaS.TufferyP.PasqualiS.MousseauN.CragnoliniT.et al (2014). The OPEP protein model: From single molecules, amyloid formation, crowding and hydrodynamics to DNA/RNA systems. Chem. Soc. Rev.43, 4871–4893. 10.1039/C4CS00048J
- CrossRef
- Google Scholar
46
TimmonsP. B.HewageC. M. (2021). APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Briefings Bioinforma.22, bbab308. 10.1093/bib/bbab308
- CrossRef
- Google Scholar
47
ZhangY. (2008). I-TASSER server for protein 3D structure prediction. BMC Bioinform.9, 40. 10.1186/1471-2105-9-40
- CrossRef
- Google Scholar

Summary

Keywords

peptide, structure, pH dependence, coarse grained models, prediction

Citation

Tufféry P and Derreumaux P (2023) A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution. Front. Bioinform. 3:1113928. doi: 10.3389/fbinf.2023.1113928

Received

01 December 2022

Accepted

06 January 2023

Published

16 January 2023

Volume

3 - 2023

Edited by

Igor N. Berezovsky, Bioinformatics Institute (A∗STAR), Singapore

Reviewed by

Marianne Rooman, Université libre de Bruxelles, Belgium

Michał Koliński, Mossakowski Medical Research Institute, Polish Academy of Sciences, Poland

Updates

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Pierre Tufféry, pierre.tuffery@u-paris.fr

This article was submitted to Protein Bioinformatics, a section of the journal Frontiers in Bioinformatics

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Protein Bioinformatics

ORIGINAL RESEARCH article

A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

Abstract

1 Introduction