ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Genomic Analysis
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1622931
CoMPHI: A Novel Composite Machine Learning Approach Utilizing Multiple Feature Representation to Predict Hosts of Bacteriophages
Provisionally accepted
- 1 Strawberry Crest High School, Dover, United States
- 2 Internal Medicine/Allergy and Immunology, University of South Florida, Tampa, FL, United States
Phage therapy has reemerged as a compelling alternative to antibiotics for treating bacterial infections, especially those caused by superbugs that have developed antibiotic resistance. The main obstacle to its broader application is identifying host targets for the vast array of uncharacterized phages obtained through next-generation sequencing. To address this problem, this paper introduces a Composite Model for Phage Host Interaction (CoMPHI), which predicts phage-host interactions by combining the accuracy of alignment-based methods with the efficiency and flexibility of machine learning techniques. The model first generates multiple feature encodings from the nucleotide and protein sequences of both phages and hosts to improve prediction accuracy. These features are then enriched with phage-phage, phage-host, and host-host alignment scores, yielding a composite model. In 5-fold cross-validation, the composite model achieved an Area Under the ROC Curve (AUC-ROC) of 94%, 96.4%, 96.5%, 96.6%, 96.6%, and 96.7% and an accuracy of 92.3%, 93.3%, 93.6%, 94%, 94.9%, and 95.1% at the Species, Genus, Family, Order, Class, and Phylum levels, respectively. A comparative analysis revealed a 6-8% increase in model performance due to the inclusion of alignment scores. An ablation study showed that including both nucleotide and protein sequences from both phages and hosts increased the prediction accuracy of the model. A second ablation study provided evidence that phage-host and host-host alignment scores, combined with phage-phage scores, contributed equally to the composite model's performance. In conclusion, this paper presents a robust and comprehensive composite model that advances the use of phage therapy in modern medicine.
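The composite idea described in the abstract, sequence-derived feature encodings concatenated with alignment scores, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which encodings CoMPHI uses, so normalized k-mer frequencies stand in as one common choice, and the function names `kmer_frequencies` and `composite_features`, the parameter `k`, and the example alignment scores are all hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Assumption: k-mer frequency is used here as a stand-in encoding;
    the source does not specify the encodings CoMPHI actually uses.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous symbols such as N
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def composite_features(phage_seq, host_seq, alignment_scores, k=3):
    """Concatenate phage encoding, host encoding, and alignment scores.

    `alignment_scores` is a hypothetical list of precomputed scores
    (for example phage-phage, phage-host, and host-host similarity).
    """
    return (
        kmer_frequencies(phage_seq, k)
        + kmer_frequencies(host_seq, k)
        + list(alignment_scores)
    )


# Toy sequences and made-up scores, for illustration only.
features = composite_features("ACGTACGT", "GGCCAATT", [0.8, 0.1], k=2)
print(len(features))
```

A vector built this way for each phage-host pair could then be passed to any standard classifier and evaluated with k-fold cross-validation; protein sequences could be encoded the same way by supplying a 20-letter amino acid alphabet.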
Keywords: sequence alignment, machine learning, bacteriophages, antibiotic resistance, phage-host prediction
Received: 05 May 2025; Accepted: 19 Sep 2025.
Copyright: © 2025 Bodaka and Kolliputi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Narasaiah Kolliputi, nkollipu@usf.edu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.