- 1Faculty of Science, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- 2Faculty of Science, National Centre for Biomolecular Research (NCBR), Masaryk University, Brno, Czechia
The rapid development of protein structure prediction tools has created a need for systematic performance comparisons to guide method selection, particularly given the trade-offs between computational speed and prediction accuracy. We benchmarked AlphaFold2, ESMFold, and OmegaFold on 1,337 protein chains deposited in the Protein Data Bank between July 2022 and July 2024, ensuring no overlap with training data, and evaluated predictions using Root Mean Square Deviation (RMSD), Template Modeling score (TM-score), Global Distance Test–Total Score (GDT-TS), and predicted Local Distance Difference Test (pLDDT) metrics. AlphaFold2 achieved the highest median TM-score (0.96), highest median GDT-TS (94%), and lowest median RMSD (1.30 Å), outperforming ESMFold (TM-score 0.95, GDT-TS 90%, RMSD 1.74 Å) and OmegaFold (TM-score 0.93, GDT-TS 89%, RMSD 1.98 Å). All tools showed reduced accuracy for proteins lacking family annotations, proteins containing leucine-rich repeats, and NMR-determined structures, while the alignment-free methods unexpectedly excelled at de novo designed proteins. The performance differences between methods were negligible for many proteins, suggesting that the 10–30 times faster alignment-free predictors can be sufficient for numerous applications. We therefore developed LightGBM classifiers using ProtBert embeddings and confidence scores that accurately predict when AlphaFold2’s computational investment is warranted, providing practitioners with actionable guidance for balancing speed and precision in structural pipelines.
1 Introduction
All living organisms—from simple bacteria and algae to plants, fungi, animals, and humans—contain a multitude of proteins that participate in virtually every cellular process (Alberts, 2017; Cooper, 2000). These molecular machines must fold into specific three-dimensional structures, organized hierarchically at four distinct levels: from the linear sequence of amino acids (primary structure), through local folding patterns such as α-helices and β-sheets (secondary structure), to the overall three-dimensional arrangement of a single chain (tertiary structure) and the assembly of multiple chains (quaternary structure).
The field of protein structure prediction has been transformed by artificial intelligence approaches. The introduction of AlphaFold2 in 2020 marked a watershed moment, achieving near-experimental accuracy (Jumper et al., 2021). This success has spurred the development of alternative approaches, particularly language model-based predictors like ESMFold and OmegaFold that can generate predictions without requiring multiple sequence alignments (Lin et al., 2023; Wu et al., 2022). These newer methods promise faster predictions and potentially better performance on challenging targets like designed or rapidly evolving proteins.
Despite these advances, the field lacks a comprehensive comparison of these tools’ performance on truly novel proteins—structures solved after the tools’ training cutoff dates (Kovalevskiy et al., 2024). Such evaluation is crucial for understanding each method’s strengths and limitations, particularly as these tools become increasingly integrated into structural biology workflows. While the Critical Assessment of Structure Prediction (CASP) (Moult et al., 1995) and Continuous Automated Model EvaluatiOn (CAMEO) (Robin et al., 2021) provide valuable benchmarks, they are limited to participating methods and may not reflect real-world usage patterns.
Here, we present a systematic comparison of AlphaFold2, ESMFold, and OmegaFold using a dataset of over 1,300 protein structures deposited in the PDB between 2022 and 2024. Using multiple evaluation metrics including RMSD (Kufareva and Abagyan, 2012), TM-score (Zhang and Skolnick, 2004), GDT-TS (Zemla, 2003), and pLDDT (Tunyasuvunakool et al., 2021), we assess both overall performance and specific challenging cases. Our analysis reveals that while AlphaFold2 achieves the highest average accuracy, ESMFold and OmegaFold excel in particular niches, especially for proteins with limited homology information. Given the 10–30-fold speed difference between the alignment-free methods and AlphaFold2, our findings help researchers assess when the faster tools may provide sufficient accuracy for large-scale structural analyses.
2 Materials and methods
2.1 Dataset
We compiled a benchmark dataset of 1,337 protein structures deposited in the Protein Data Bank (PDB) between July 2022 and July 2024. This temporal restriction ensures no overlap with training data used by AlphaFold2 (cutoff April 2020), ESMFold (June 2020), or OmegaFold (2021). The dataset contains three distinct groups: (1) single-chain monomers (980 structures), (2) small multi-chain complexes (245 structures with 2–6 chains), and (3) de novo designed proteins whose sequence does not naturally occur in any living organism (102 structures). De novo proteins were identified through PDB annotations marking them as “designed” or “synthetic construct” in the source organism field.
Structures were selected using the RCSB PDB Search API (Rose et al., 2021; Bittrich et al., 2023) with the following criteria: (i) deposition date between July 2022 and July 2024, (ii) protein-only structures without nucleic acids or oligosaccharides, (iii) chain lengths between 20 and 400 amino acids to ensure compatibility with all prediction tools, and (iv) availability of structural information in PDB format. To ensure diversity, structures within monomer and de novo protein groups were filtered to have at most 70% pairwise sequence identity.
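The 70% redundancy cut-off can be applied greedily, keeping a chain only if it is sufficiently dissimilar to every chain already retained. The paper does not describe its exact filtering procedure, so the sketch below is purely illustrative and uses difflib’s similarity ratio as a crude stand-in for alignment-based percent identity:

```python
from difflib import SequenceMatcher

def redundancy_filter(sequences, max_identity=0.70):
    """Greedily keep a sequence only if its similarity to every
    already-kept sequence is at or below max_identity.
    SequenceMatcher.ratio() approximates percent identity here;
    a real pipeline would use a proper pairwise alignment."""
    kept = []
    for seq in sequences:
        if all(SequenceMatcher(None, seq, ref).ratio() <= max_identity
               for ref in kept):
            kept.append(seq)
    return kept

# Duplicate sequences collapse to a single representative.
print(redundancy_filter(["MKTAYIAKQR", "MKTAYIAKQR", "GAVLIPFMWC"]))
```

In practice the same greedy scheme works with identities from any aligner; only the similarity function changes.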
We developed a custom PDB file parsing pipeline to extract complete amino acid sequences and experimental coordinates for each protein chain.
Each structure was annotated with protein family classifications using the UniProt and PDBe APIs to map PDB identifiers to Pfam and InterPro database entries. These annotations enable analysis of the prediction tools’ performance across different protein families and structural motifs. The number of protein structures in the dataset at each stage of the experiment is stated in Supplementary Table S1. The final curated dataset, including all protein sequences, is available in a HuggingFace Hub repository.
2.2 Structure prediction tools
Three tools were selected for protein structure prediction: AlphaFold2, ESMFold, and OmegaFold. While alignment-based AlphaFold2 is an obvious choice, considering how widely used it is (Kovalevskiy et al., 2024), language model-based ESMFold and OmegaFold were chosen because they provide promising results with much lower requirements on time and computational power, making them more suitable for large-scale applications (Lin et al., 2023; Wu et al., 2022).
2.2.1 AlphaFold2
We used AlphaFold v2.1.1 running on the institute’s infrastructure with its monomer model and reduced database settings to optimize computational resources. The model architecture consists of two main components: (i) an Evoformer module, which processes multiple sequence alignments (MSAs) and pairwise representations through 48 transformer blocks, and (ii) a structure module that converts the refined representations into 3D coordinates through 8 equivariant transformer blocks with Invariant Point Attention. MSAs were generated using Uniref90, BFD, and MGnify databases. For each sequence, five model predictions were generated and ranked by predicted confidence, with the highest-confidence model (ranked_0.pdb) selected for evaluation.
2.2.2 ESMFold
Predictions were obtained via REST API calls to the ESM Metagenomic Atlas. ESMFold combines two components: (i) the ESM-2 protein language model with 15B parameters, pre-trained on masked sequence prediction, and (ii) a folding head consisting of 48 folding blocks that process sequence and pairwise representations. Unlike AlphaFold2, ESMFold predicts structures directly from single sequences without requiring MSA generation.
2.2.3 OmegaFold
Predictions were performed using OmegaFold v1.0 running on a university computational cluster with an NVIDIA A40 GPU. OmegaFold employs: (i) OmegaPLM, a 670M parameter language model trained on masked protein sequences, and (ii) a Geoformer architecture that refines the language model representations to be geometrically consistent before structure prediction. Like ESMFold, OmegaFold operates on single sequences without MSA requirements.
All predictions were made for individual protein chains, as neither ESMFold nor OmegaFold supports prediction of protein complexes. While AlphaFold2 offers a multimer model, we used its monomer model to ensure a fair comparison. The original dataset together with the prediction outputs is available in a HuggingFace Hub repository.
2.3 Evaluation metrics
We employed four complementary metrics to assess prediction quality: RMSD, measuring atomic distance deviation; TM-score, evaluating topological similarity; GDT-TS, quantifying the fraction of residues within distance thresholds after superposition; and pLDDT, reflecting model confidence.
2.3.1 Root mean square deviation (RMSD)
RMSD (1) quantifies the average distance between corresponding atoms of two superimposed structures:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert v_i - w_i\rVert^2} \qquad (1)$$

where $N$ is the number of aligned atom pairs and $v_i$ and $w_i$ are the coordinates of the $i$-th atom in the predicted and experimental structures, respectively.
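Given matched Cα coordinates, the metric is a few lines of numpy once the structures are optimally superimposed. The sketch below pairs the RMSD formula with a Kabsch superposition; the paper does not state which superposition routine it used, so this implementation is illustrative only:

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal
    superposition of P onto Q via the Kabsch algorithm."""
    P = P - P.mean(axis=0)                    # centre both point clouds
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A structure that is merely rotated and translated relative to itself yields an RMSD of essentially zero, which is a useful sanity check for any superposition code.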
2.3.2 Template modeling score (TM-score)
TM-score (2) (Zhang and Skolnick, 2004) evaluates the topological similarity of protein structures while accounting for protein length:

$$\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\mathrm{target}}}\sum_{i=1}^{L_{\mathrm{ali}}}\frac{1}{1 + \left(d_i/d_0(L_{\mathrm{target}})\right)^2}\right] \qquad (2)$$

where $L_{\mathrm{target}}$ is the length of the target protein, $L_{\mathrm{ali}}$ is the number of aligned residues, $d_i$ is the distance between the $i$-th pair of aligned residues, $d_0(L_{\mathrm{target}}) = 1.24\sqrt[3]{L_{\mathrm{target}} - 15} - 1.8$ is a length-dependent normalization scale, and the maximum is taken over all possible superpositions.
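For a fixed superposition and residue pairing, the inner sum of the TM-score definition is straightforward to evaluate; the full TM-score additionally maximizes over superpositions, which our pipeline delegated to tmtools. A numpy sketch of the per-superposition term:

```python
import numpy as np

def tm_score_term(pred, exp, l_target=None):
    """Evaluate the TM-score sum for ONE superposition of aligned
    Calpha coordinates (both (N, 3) arrays). The full TM-score is
    the maximum of this value over all superpositions."""
    if l_target is None:
        l_target = len(exp)
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8   # length-dependent scale
    d = np.linalg.norm(pred - exp, axis=1)             # per-residue distances
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)
```

An identical pair of structures scores exactly 1.0, while grossly displaced coordinates drive the score toward 0.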
2.3.3 Global distance test–total score (GDT-TS)
The Global Distance Test–Total Score (GDT-TS) is a widely used metric in CASP for assessing the similarity between predicted and experimental protein structures (Zemla, 2003). Unlike RMSD, which is sensitive to outliers, GDT-TS (3) focuses on the fraction of residues that fall within a set of distance thresholds, providing a more robust measure of overall structural agreement:

$$\mathrm{GDT\text{-}TS} = \frac{P_1 + P_2 + P_4 + P_8}{4} \qquad (3)$$

Here, $P_d$ denotes the percentage of Cα atoms in the predicted structure that lie within $d$ Å of their counterparts in the experimental structure after superposition.
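With the structures already superimposed, the four threshold fractions reduce to boolean means. Note that CASP’s full protocol maximizes each fraction over superpositions, which this sketch omits:

```python
import numpy as np

def gdt_ts(pred, exp):
    """GDT-TS (as a percentage) for superimposed Calpha coordinate
    arrays: the mean fraction of residues within 1, 2, 4 and 8 Angstrom
    of their experimental positions."""
    d = np.linalg.norm(pred - exp, axis=1)
    fractions = [(d <= t).mean() for t in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / 4.0
```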
2.3.4 Predicted LDDT (pLDDT)
The predicted local distance difference test (pLDDT) is a confidence metric provided by each prediction tool. For each residue, it estimates the expected agreement between predicted and experimental structures on a 0–100 scale. Scores above 90 indicate high prediction confidence, while scores above 70 suggest at least a reliable backbone prediction.
For our analysis, we used the mean pLDDT across all residues in each protein chain. While pLDDT correlates with prediction accuracy, high confidence scores do not guarantee correct structure prediction, particularly for challenging targets like intrinsically disordered regions or proteins with limited homology information.
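All three tools write per-residue pLDDT into the B-factor column of their output PDB files, so the chain-level average can be read with plain string slicing. A minimal stdlib sketch, assuming standard fixed-column PDB formatting:

```python
def mean_plddt(pdb_text):
    """Mean per-residue pLDDT of a predicted structure, read from the
    B-factor column (characters 61-66) of Calpha ATOM records, where
    AlphaFold2, ESMFold and OmegaFold store their confidence values."""
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no Calpha ATOM records found")
    return sum(values) / len(values)
```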
2.4 Statistical analysis and annotation
We compared these metrics across our dataset using Kruskal–Wallis tests followed by Dunn’s method with Bonferroni correction for multiple comparisons. The correlation between metrics was assessed using Spearman’s rank correlation coefficient.
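Our statistical analysis was performed in R; for readers following along in Python, Spearman’s coefficient is simply the Pearson correlation of average-ranked data, as in this numpy sketch:

```python
import numpy as np

def _average_ranks(a):
    """Rank data from 1..N, assigning tied values their mean rank."""
    order = np.argsort(a)
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1, dtype=float)
    for v in np.unique(a):
        tied = a == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx = _average_ranks(np.asarray(x, dtype=float))
    ry = _average_ranks(np.asarray(y, dtype=float))
    return float(np.corrcoef(rx, ry)[0, 1])
```

Perfectly monotone increasing data gives +1 and monotone decreasing data gives −1, regardless of the raw values.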
Protein chains were mapped to functional annotations using UniProt and PDBe APIs. For family-specific analysis, we focused on Pfam and InterPro families with at least 10 member proteins in our dataset. The experimental method of structure determination (X-ray crystallography, cryo-EM, or NMR) was recorded for each chain to assess potential biases in prediction accuracy.
Predictions were classified as “poor” if they met any of the following criteria: average pLDDT
2.5 Implementation and availability
All preprocessing was implemented in Python using BioPython (Cock et al., 2009) for structure manipulation and tmtools for TM-score calculation (Xu and Zhang, 2010). Statistical analysis and visualization were performed in R (R Core Team, 2020). The complete dataset, including protein sequences, experimental structures, predictions, and evaluation results is available at HuggingFace Hub, https://huggingface.co/datasets/hyskova-anna/proteins. Source code and documentation are provided at GitHub, https://github.com/ML-Bioinfo-CEITEC/CAoPSPT.
3 Results
Structure predictions were attempted for 1,337 protein chains using AlphaFold2, ESMFold, and OmegaFold. During the initial run, our AlphaFold2 pipeline failed to generate a prediction for one chain (8B2M:A). A subsequent rerun of the pipeline successfully produced a prediction for this chain, indicating that the original failure was due to a transient issue in our university computing service rather than a problem with the structure itself. All chains were successfully predicted by all three tools and form the basis of our evaluation. Selected examples of predictions aligned with their experimental structures are shown in Figure 1.
Figure 1. Examples of structure predictions from AlphaFold2 (red), ESMFold (blue) and OmegaFold (yellow) aligned with corresponding experimentally determined structures (green). (a) An example of a poorly predicted structure (8P4Y:A) by AlphaFold2. (b) Structure of protein 8PTF:A showing varying prediction quality across tools.
3.1 Comparative performance analysis
All three tools demonstrated generally satisfactory performance, with AlphaFold2 achieving the highest accuracy across all metrics. AlphaFold2 predictions showed the highest median TM-score (0.96), lowest median RMSD (1.30 Å), and highest median GDT-TS (94%), followed by ESMFold (TM-score: 0.95, RMSD: 1.74 Å, GDT-TS: 90%) and OmegaFold (TM-score: 0.93, RMSD: 1.98 Å, GDT-TS: 89%). Consistently, AlphaFold2 displayed the highest confidence in its predictions with median pLDDT of 92.65, compared to 87.40 for ESMFold and 89.00 for OmegaFold (see Supplementary Figure S1). These differences were statistically significant across tools for all four metrics (Kruskal–Wallis test,
3.2 Metric correlations and their dependencies on sequence length and other factors
We observed significant correlations between prediction confidence (pLDDT) and accuracy metrics. Most notably, there was a negative correlation between average pLDDT and RMSD (Spearman’s
While low-confidence predictions rarely achieved good accuracy metrics, we found numerous cases of incorrect structures with high pLDDT scores across all tools (see Supplementary Figure S3).
Analysis of sequence length dependency also revealed interesting patterns. While RMSD showed weak correlation with sequence length, TM-score and GDT-TS displayed stronger positive associations, particularly for AlphaFold2 (TM-score:
The experimental method used for structure determination significantly influenced prediction accuracy (Supplementary Figure S4). All tools performed best on X-ray crystallography structures (median RMSD: 1.24 Å, 1.65 Å, and 1.89 Å for AlphaFold2, ESMFold, and OmegaFold, respectively) but struggled with NMR-determined structures (median RMSD: 2.31 Å, 2.89 Å, and 3.12 Å). This pattern likely reflects both the inherent flexibility of proteins amenable to NMR analysis and the predominance of X-ray structures in training data.
When comparing performance across different protein types (monomers, complexes, and de novo proteins), we observed an interesting pattern. While all tools generally performed similarly across these categories, there were two notable exceptions. First, ESMFold and OmegaFold achieved significantly lower RMSD values for de novo proteins compared to natural proteins. Statistical significance was assessed using Kruskal–Wallis tests with Dunn post hoc comparisons and Bonferroni correction (Figure 2). Second, AlphaFold2 showed a unique weakness with de novo proteins, achieving significantly lower TM-scores for these proteins compared to monomers and complexes. This suggests that language model-based tools may have an advantage in predicting structures of artificial proteins where evolutionary information is limited.
Figure 2. Dependency of average pLDDT, TM-score, RMSD, and GDT-TS on the type of protein chain being predicted. The differences between groups were tested by Kruskal–Wallis test, post hoc comparisons were done using Dunn’s method with a Bonferroni correction for multiple tests. Different letters above boxplots indicate statistically significant differences among label groups within each prediction tool (compact letter display; groups sharing a letter are not significantly different). Sample points with RMSD greater than 30 Å are omitted from the visualization for better clarity.
3.3 Analysis of prediction failures
We classified predictions as incorrect if they met any of the following criteria: average pLDDT
Figure 3. Comparison of the overlap of poorly predicted protein chains. (a) Venn diagrams show the overlap of poorly predicted chains among the three structure prediction tools (AlphaFold2, ESMFold, and OmegaFold) for each evaluation metric: average pLDDT
Analysis of protein families revealed that proteins lacking Pfam annotations were particularly challenging for AlphaFold2 but not for ESMFold or OmegaFold, highlighting the importance of evolutionary information in AlphaFold2’s predictions. Conversely, viral proteins, especially from coronavirus, were better predicted by AlphaFold2 than by the language model-based tools. All tools showed reduced accuracy for proteins containing leucine-rich repeats or von Willebrand factor A-like domains, suggesting these structural motifs pose particular challenges for current prediction methods.
The analysis of protein family associations revealed distinctive patterns in prediction accuracy. Notably, AlphaFold2 showed significantly reduced performance for proteins lacking Pfam family annotations (odds ratio = 0.67,
Certain protein families were consistently well-predicted across all tools. These included protein kinase domains (PF00069, IPR000719), the SH2 domain (IPR000980), and the NAD(P)-binding domain superfamily (IPR036291). Conversely, all tools struggled with leucine-rich repeats (IPR001611, IPR003591) and von Willebrand factor A-like domains (IPR036465), suggesting these structural motifs remain challenging for current prediction methods.
Interestingly, several protein families showed tool-specific prediction patterns. AlphaFold2 excelled at predicting viral protein families, particularly the viral RNA-dependent RNA polymerase (PF00680, IPR001205) and coronavirus-specific proteins (PF05409, IPR043503), achieving significantly better accuracy than ESMFold or OmegaFold.
3.4 Prediction of structure determination success using machine learning
To assess and anticipate potential failures in structure prediction, we trained gradient boosting LightGBM models (Ke et al., 2017) separately for AlphaFold2, ESMFold, and OmegaFold. For each method, models were trained both with and without inclusion of the model-specific confidence estimate (pLDDT), resulting in six model configurations in total. Input features included ProtBert BFD sequence embeddings (Brandes et al., 2022), sequence length, experimental acquisition method, and pLDDT where applicable.
Model performance was evaluated using mean squared error (MSE) and the coefficient of determination (R²).
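Both evaluation metrics have standard closed forms; a small numpy sketch matching the usual definitions (the exact evaluation code lives in the linked repository):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted TM-scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

By this definition, a model that always predicts the mean of the observations scores R² = 0, and a perfect model scores 1.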
Feature contributions were interpreted using SHAP analysis (Figure 4). Across all methods, higher pLDDT values contributed positively to predicted TM-scores, whereas shorter sequence length and experimental acquisition methods other than X-ray crystallography showed negative contributions. Selected embedding dimensions also showed consistent contributions, reflecting sequence-level patterns associated with prediction difficulty.
Figure 4. SHAP analysis of LightGBM models predicting TM-score. SHAP summary plots for LightGBM regressors trained separately for AlphaFold2, ESMFold, and OmegaFold using ProtBert sequence embeddings, sequence length, experimental acquisition method, and model-specific confidence estimates (pLDDT) as input features. Features are ordered by mean absolute SHAP value, indicating their overall influence on predicted TM-score. Each point represents an individual protein chain, colored by feature contribution.
4 Discussion
Since the beginning of this decade, structural biology and the protein structure prediction field have undergone a significant transition. Two large community projects currently track prediction quality: CASP (Moult et al., 1995) and CAMEO (Robin et al., 2021). While AlphaFold2 has participated in both CASP14 and CAMEO, ESMFold entered only CASP15, and OmegaFold has not been included in either; both ESMFold and OmegaFold have, however, been subsequently evaluated on CAMEO and CASP15 datasets by independent research groups (Moussad et al., 2023; Huang et al., 2023). A few publications also compare protein structure prediction tools, but they usually focus mainly on AlphaFold2 and similar tools (e.g., ColabFold) (Kalogeropoulos et al., 2024) or perform the evaluation on a particular set of proteins, namely human proteins (Manfredi et al., 2024; Manfredi et al., 2025), snake venom toxins (Kalogeropoulos et al., 2024), and nanobodies (Valdés-Tresanco et al., 2023). This work broadens that understanding by assembling an inclusive dataset of protein structures recently added to the PDB.
The key finding of this work is that AlphaFold2 outperforms ESMFold and OmegaFold on the majority of proteins in the dataset, as measured by RMSD, TM-score, and GDT-TS. When comparing the two protein language model-based tools, ESMFold appears to be the slightly better choice, as it produced fewer incorrect structures than OmegaFold and achieved significantly better median RMSD and TM-score. Still, the difference in performance between ESMFold and OmegaFold is much smaller than the gap between either of these tools and AlphaFold2.
While all three tools rarely produce a good prediction with low confidence, wrong structures with a high average pLDDT occur quite frequently. Our analysis revealed that prediction accuracy is influenced by various factors. All three tools performed best when predicting proteins whose experimental structure was determined by X-ray crystallography, while structures determined by NMR proved the most challenging. Because NMR is typically used to determine the structures of small proteins, a corresponding decrease in prediction accuracy is observed for shorter sequences. Additionally, for NMR-determined structures, evaluation against a single representative conformer from an ensemble may further contribute to the observed reduction in apparent prediction accuracy.
Interestingly, proteins without family annotations proved particularly difficult for AlphaFold2 but did not change the performance of ESMFold and OmegaFold. A possible explanation is that proteins belonging to no family lack homologs with a known structure, which AlphaFold2 could use as a template during the prediction. In contrast, ESMFold and OmegaFold do not rely on MSAs and modeling templates, so their performance remained largely unaffected.
Our analysis yields several key insights, yet certain constraints of the study must be noted. First, the dataset contains not only proteins whose experimental structure was previously unknown, but also proteins that were recently re-analyzed, usually under different conditions. This might advantage AlphaFold2, which uses a reduced PDB database for template searching during prediction. Moreover, the whole analysis focuses only on single protein chains without the context of their interacting partners, which may be crucial for structure formation, especially in protein complexes. Additionally, speed comparisons should be interpreted with caution, as the OmegaFold and AlphaFold2 prediction pipelines were run on different hardware configurations, potentially affecting relative performance metrics. Last but not least, all protein chains in the dataset have a maximum length of 400 amino acids, a limit imposed by the ESM Metagenomic Atlas API.
The performance patterns we observed reflect fundamental architectural differences between these approaches. AlphaFold2’s superior accuracy stems from leveraging evolutionary information through MSAs, but this becomes a limitation for de novo proteins where we observed reduced TM-scores. In contrast, language models learn protein grammar from sequence patterns alone, potentially capturing more general folding principles. The limited overlap in prediction failures between tools suggests complementary error modes that could be exploited through ensemble approaches, though computational costs may be prohibitive for large-scale applications.
The recent proliferation of AlphaFold3 (Abramson et al., 2024; Callaway, 2024) and its alternatives, including Chai-1 (Chai Discovery, 2024), Boltz-1 (Wohlwend et al., 2024), and HelixFold3 (Liu et al., 2024), demonstrates the community’s continued commitment to structure prediction. Independent benchmarks have begun evaluating these tools: FoldBench (Xu et al., 2025), evaluating 1,522 biological assemblies across nine tasks, found AlphaFold3 consistently outperforming alternatives across most categories, though all methods showed concerning failure rates exceeding 50% for antibody-antigen predictions. For protein-peptide interactions, newer models achieve dramatic improvements, with success rates of 70%–80% under stringent criteria compared to 53% for AlphaFold2-multimer, and Protenix reaching 80.8% accuracy (Zhou et al., 2025). However, as shown by Škrinjar et al. (2025), protein-ligand predictions reveal a critical limitation: current methods largely memorize poses from training data rather than genuinely predicting novel interactions, particularly struggling with ligands not seen in their training sets. Practical deployment is being facilitated by tools like ABCFold (Elliott et al., 2025), which standardizes inputs and outputs across different methods. This proliferation of capable yet specialized tools, each with distinct strengths and limitations, reinforces our findings: optimal structure prediction requires matching tools to specific tasks based on target type, available computational resources, and accuracy requirements rather than relying on any single universal solution.
Data availability statement
The datasets generated/analyzed for this study can be found in the HuggingFace Hub repository at https://huggingface.co/datasets/hyskova-anna/proteins. The source code and documentation are provided on GitHub: https://github.com/ML-Bioinfo-CEITEC/CAoPSPT.
Author contributions
AH: Data curation, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing. EM: Conceptualization, Supervision, Validation, Writing – original draft, Writing – review and editing. PŠ: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. The project was supported by the OPUS LAP program of the Czech Science Foundation, project no. 23-04260L (“Biological code of knots–identification of knotted patterns in biomolecules via AI approach”). Computational resources were provided by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic. This work was motivated by our research on knotted proteins with Joanna Sulkowska’s Lab.
Acknowledgements
We would like to thank the lab members for their insights and collaboration. The authors also thank the staff at the Institute of Computer Science, Masaryk University for computational resources and technical support.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was used in the creation of this manuscript. We used large language models for editorial assistance limited to reviewing, consistency checks, and language polishing. Specifically, ChatGPT-4o and ChatGPT-5 (ChatGPT; OpenAI) and Claude 4.1 (Anthropic) were used. The authors verified the factual accuracy of all AI-assisted text, checked for plagiarism, and accept full responsibility for the manuscript. No generative AI system is listed as an author and none performed original data analysis or drew scientific conclusions.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2025.1715037/full#supplementary-material
References
Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 1–3. doi:10.1038/s41586-024-07487-w
Bittrich, S., Bhikadiya, C., Bi, C., Chao, H., Duarte, J., Dutta, S., et al. (2023). RCSB protein data bank: efficient searching and simultaneous access to one million computed structure models alongside the PDB structures enabled by architectural advances. J. Mol. Biol. 435, 167994. doi:10.1016/j.jmb.2023.167994
Brandes, N., Ofer, D., Peleg, Y., Rappoport, N., and Linial, M. (2022). ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 38, 2102–2110. doi:10.1093/bioinformatics/btac020
Callaway, E. (2024). AI protein-prediction tool AlphaFold3 is now more open. Nature 635, 531–532. doi:10.1038/d41586-024-03708-4
Chai Discovery (2024). Chai-1: decoding the molecular interactions of life. bioRxiv. doi:10.1101/2024.10.10.615955
Cock, P., Antao, T., Chang, J., Chapman, B., Cox, C., Dalke, A., et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423. doi:10.1093/bioinformatics/btp163
Elliott, L. G., Simpkin, A. J., and Rigden, D. J. (2025). ABCFold: easier running and comparison of AlphaFold 3, Boltz-1 and Chai-1. Bioinforma. Adv. 5, vbaf153. doi:10.1093/bioadv/vbaf153
Hu, Y., Cheng, K., He, L., Zhang, X., Jiang, B., Jiang, L., et al. (2021). NMR-based methods for protein analysis. Anal. Chem. 93, 1866–1879. doi:10.1021/acs.analchem.0c03830
Huang, B., Kong, L., Wang, C., Ju, F., Zhang, Q., Zhu, J., et al. (2023). Protein structure prediction: challenges, advances, and the shift of research paradigms. Genomics Proteomics Bioinform. 21, 913–925. doi:10.1016/j.gpb.2022.11.014
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. doi:10.1038/s41586-021-03819-2
Kalogeropoulos, K., Bohn, M., Jenkins, D., Ledergerber, J., Sørensen, C., Hofmann, N., et al. (2024). A comparative study of protein structure prediction tools for challenging targets: snake venom toxins. Toxicon 238, 107559. doi:10.1016/j.toxicon.2023.107559
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inform. Process. Syst. 30. doi:10.5555/3294996.3295074
Kovalevskiy, O., Mateos-Garcia, J., and Tunyasuvunakool, K. (2024). AlphaFold two years on: validation and impact. Proc. Natl. Acad. Sci. 121, e2315002121. doi:10.1073/pnas.2315002121
Kufareva, I., and Abagyan, R. (2012). Methods of protein structure comparison. Methods Mol. Biol. 857, 231–257. doi:10.1007/978-1-61779-588-6_10
Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130. doi:10.1126/science.ade2574
Liu, L., Zhang, S., Xue, Y., Ye, X., Zhu, K., Li, Y., et al. (2024). Technical report of HelixFold3 for biomolecular structure prediction.
Manfredi, M., Savojardo, C., Iardukhin, G., Salomoni, D., Costantini, A., Martelli, P., et al. (2024). Alpha&ESMhFolds: a web server for comparing AlphaFold2 and ESMFold models of the human reference proteome. J. Mol. Biol. 436, 168593. doi:10.1016/j.jmb.2024.168593
Manfredi, M., Savojardo, C., Martelli, P. L., and Casadio, R. (2025). Evaluation of the structural models of the human reference proteome: AlphaFold2 versus ESMFold. Curr. Res. Struct. Biol. 9, 100167. doi:10.1016/j.crstbi.2025.100167
Milne, J., Borgnia, M., Bartesaghi, A., Tran, E., Earl, L., Schauder, D., et al. (2013). Cryo-electron microscopy: a primer for the non-microscopist. FEBS J. 280, 28–45. doi:10.1111/febs.12078
Moult, J., Pedersen, J., Judson, R., and Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins Struct. Funct. Bioinforma. 23, ii–iv. doi:10.1002/prot.340230303
Moussad, B., Roche, R., and Bhattacharya, D. (2023). The transformative power of transformers in protein structure prediction. Proc. Natl. Acad. Sci. 120, e2303499120. doi:10.1073/pnas.2303499120
Robin, X., Haas, J., Gumienny, R., Smolinski, A., Tauriello, G., and Schwede, T. (2021). Continuous automated model EvaluatiOn (CAMEO)—Perspectives on the future of fully automated evaluation of structure prediction methods. Proteins Struct. Funct. Bioinforma. 89, 1977–1986. doi:10.1002/prot.26213
Rose, Y., Duarte, J., Lowe, R., Segura, J., Bi, C., Bhikadiya, C., et al. (2021). RCSB protein data bank: architectural advances towards integrated searching and efficient access to macromolecular structure data from the PDB archive. J. Mol. Biol. 433, 166704. doi:10.1016/j.jmb.2020.11.003
Škrinjar, P., Eberhardt, J., Durairaj, J., and Schwede, T. (2025). Have protein-ligand co-folding methods moved beyond memorisation? BioRxiv. doi:10.1101/2025.02.03.636309
Smyth, M., and Martin, J. (2000). X-ray crystallography. Mol. Pathol. 53, 8–14. doi:10.1136/mp.53.1.8
Tunyasuvunakool, K., Adler, J., Wu, Z., Green, T., Zielinski, M., Žídek, A., et al. (2021). Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596. doi:10.1038/s41586-021-03828-1
Valdés-Tresanco, M., Valdés-Tresanco, M., Jiménez-Gutiérrez, D., and Moreno, E. (2023). Structural modeling of nanobodies: a benchmark of state-of-the-art artificial intelligence programs. Molecules 28, 3991. doi:10.3390/molecules28103991
Wohlwend, J., Corso, G., Passaro, S., Reveiz, M., Leidal, K., Swiderski, W., et al. (2024). Boltz-1: democratizing biomolecular interaction modeling. bioRxiv. doi:10.1101/2024.11.19.624167
Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., et al. (2022). High-resolution de novo structure prediction from primary sequence. bioRxiv. doi:10.1101/2022.07.21.500999
Xu, J., and Zhang, Y. (2010). How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895. doi:10.1093/bioinformatics/btq066
Xu, S., Feng, Q., Qiao, L., Wu, H., Shen, T., Cheng, Y., et al. (2025). Foldbench: an all-atom benchmark for biomolecular structure prediction. bioRxiv. doi:10.1101/2025.05.22.655600
Zemla, A. (2003). LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 31, 3370–3374. doi:10.1093/nar/gkg571
Zhang, Y., and Skolnick, J. (2004). Scoring function for automated assessment of protein structure template quality. Proteins Struct. Funct. Bioinforma. 57, 702–710. doi:10.1002/prot.20264
Keywords: AlphaFold2, ESMFold, foundation models, LightGBM, OmegaFold, protein structure prediction, protein folding, structural bioinformatics
Citation: Hýsková A, Maršálková E and Šimeček P (2026) Balancing speed and precision in protein folding: a comparison of AlphaFold2, ESMFold, and OmegaFold. Front. Genet. 16:1715037. doi: 10.3389/fgene.2025.1715037
Received: 30 September 2025; Accepted: 22 December 2025;
Published: 14 January 2026.
Edited by:
Gajendra P. S. Raghava, Indraprastha Institute of Information Technology Delhi, India
Reviewed by:
Anna Marabotti, University of Salerno, Italy
Arjun Ray, Indraprastha Institute of Information Technology Delhi, India
Copyright © 2026 Hýsková, Maršálková and Šimeček. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Petr Šimeček, simecek@mail.muni.cz
Anna Hýsková1,2