Identification of Distinct Characteristics of Antibiofilm Peptides and Prospection of Diverse Sources for Efficacious Sequences

A majority of microbial infections are associated with biofilms. Targeting biofilms is considered an effective strategy to limit microbial virulence while minimizing the development of antibiotic resistance. Toward this need, antibiofilm peptides are an attractive arsenal since they are bestowed with properties orthogonal to small molecule drugs. In this work, we developed machine learning models to identify the distinguishing characteristics of known antibiofilm peptides, and to mine peptide databases from diverse habitats to classify new peptides with potential antibiofilm activities. Additionally, we used the reported minimum biofilm inhibitory/eradication concentrations (MBIC/MBEC) of the antibiofilm peptides to create a regression model on top of the classification model to predict the effectiveness of new antibiofilm peptides. We used a positive dataset containing 242 antibiofilm peptides, and a negative dataset which, unlike previous datasets, contains peptides that are likely to promote biofilm formation. Our model achieved a classification accuracy greater than 98% and harmonic mean of precision and recall (F1) and Matthews correlation coefficient (MCC) scores greater than 0.90; the regression model achieved an MCC score greater than 0.81. We utilized our classification-regression pipeline to evaluate 135,015 peptides from diverse sources for potential antibiofilm activity, and we identified 185 candidates that are likely to be effective against preformed biofilms at micromolar concentrations. Structural analysis of the top 37 hits revealed a larger distribution of helices and coils than sheets, and common functional motifs. Sequence alignment of these hits with known antibiofilm peptides revealed that, while some of the hits showed relatively high sequence similarity with known peptides, some others did not, indicating the presence of antibiofilm activity in novel sources or sequences. Further, some of the hits had previously recognized therapeutic properties or host defense traits suggestive of drug repurposing applications. Taken together, this work demonstrates a new in silico approach to predicting antibiofilm efficacy, and identifies promising new candidates for biofilm eradication.


DATASET STATISTICS
Table S1 presents the number of peptides in the training, validation, and out-of-sample test sets for both the positive and negative datasets. The table also contains details of the dataset used for training and evaluating the regression models. Figure S1 presents the ten dipeptides with the highest composition percentage in the negative dataset. Interestingly, most of the dipeptides in this top-ten set contain leucine, a non-polar amino acid. Table S2 presents the results of our evaluation of different machine learning models based on individual features, while Table S3 displays the performance of different models when two features are combined. Finally, Table S4 shows the performance of our models when more than two features are combined. Our best-performing model combines the AAC, DPC, CTD, and Motif features.
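As an illustration of two of the features named above, the following minimal sketch computes amino acid composition (AAC) and dipeptide composition (DPC) as percentages, and ranks dipeptides by their average composition across a set of peptides, in the spirit of Figure S1. It assumes the standard definitions of AAC and DPC over the 20 canonical residues; the function names are hypothetical and the example sequences are placeholders, not entries from our datasets.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def amino_acid_composition(seq):
    """Percentage of each canonical residue in the peptide (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percentage of each overlapping residue pair (400 values).

    Pairs containing non-canonical residues count toward the total
    but are not reported.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {a + b: 100.0 * counts[a + b] / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)}


def top_dipeptides(sequences, k=10):
    """Rank dipeptides by their mean composition percentage over a set."""
    totals = Counter()
    for seq in sequences:
        for dipeptide, pct in dipeptide_composition(seq).items():
            totals[dipeptide] += pct
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(dp, total / len(sequences)) for dp, total in ranked[:k]]


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    example_set = ["ALAL", "LLLL", "KLAK"]
    for dipeptide, mean_pct in top_dipeptides(example_set, k=3):
        print(dipeptide, round(mean_pct, 2))
```

Concatenating the 20 AAC values and the 400 DPC values gives a fixed-length vector for each peptide regardless of its length, which is what allows such features to be combined with others in a single model.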

Visualization
We evaluated the 2D structures of the peptides using the PEP2D server (Singh et al., 2019).
We further evaluated the structures of the peptides with probable antibiofilm activity. We generated helical wheel projections (Figure S5) for the peptides that showed a higher percentage of helices in the secondary structure evaluation.
The 2D structures were evaluated using the PEP2D server. Pink cylinders represent helices, yellow arrows represent sheets, and black lines represent coils.
Figure S4. Predicted 2D structures of previously characterized peptides with potential antibiofilm activity.

Frontiers
In the helical wheel projections (Figure S5), hydrophilic amino acids are shown as circles and hydrophobic amino acids as diamonds. Negatively charged amino acids are shown as triangles, and positively charged amino acids as pentagons. The most hydrophobic amino acids are green, and the shade shifts from green toward yellow as hydrophobicity decreases. Hydrophilic amino acids are red, and the intensity of red decreases as hydrophilicity decreases. The highly charged amino acids are in light blue and non-polar amino acids are in dark red. The numbers indicate the magnitude and direction of the hydrophobic moment. The wheel structures were obtained using the software created by Don Armstrong and Raphael Zidovetzki, version 1.4, 2009-10-20 (Schiffer and Edmundson, 1967; Armstrong and Zidovetzki, 2009).
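The hydrophobic moment reported on each wheel can be illustrated with a short sketch. The version below uses the common Eisenberg-style vector sum, assuming an ideal α-helix with 100° between consecutive residues and the Kyte-Doolittle hydropathy scale. Both choices are assumptions made for illustration: the scale and conventions used internally by the helical wheel software are not stated here, so the values will not necessarily match those shown in the figures. The function name `hydrophobic_moment` is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; see the note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Return (magnitude, direction in degrees) of the hydrophobic moment.

    Each residue contributes a vector whose length is its hydropathy and
    whose angle is its position around the helical wheel; the moment is
    the sum of these vectors (not divided by the sequence length).
    """
    x = 0.0
    y = 0.0
    for i, residue in enumerate(seq.upper()):
        h = scale[residue]  # raises KeyError for non-canonical residues
        theta = math.radians(i * angle_deg)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    magnitude = math.hypot(x, y)
    direction = math.degrees(math.atan2(y, x)) % 360.0
    return magnitude, direction


if __name__ == "__main__":
    # Placeholder sequence for illustration only.
    magnitude, direction = hydrophobic_moment("KLAKLAKKLAKLAK")
    print(round(magnitude, 2), round(direction, 1))
```

A large magnitude indicates that hydrophobic residues cluster on one face of the helix, i.e., an amphipathic helix. A sequence whose residues are evenly distributed around the wheel gives a moment close to zero.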

Alignment
We also aligned several of the newly identified candidate peptides against well-known antibiofilm peptides that have a demonstrated eradication effect on preformed biofilms. For example, we aligned the human cathelicidin LL-37 against the set of Mastoparan-like peptides from our list. The alignment was performed using the Clustal web service with default settings (Madeira et al., 2019). The alignment is displayed in Figure S6 using Jalview version 2 (Waterhouse et al., 2009).

Peptide List
The probable antibiofilm peptides identified by our pipeline are listed in Tables S6-S13. The tables contain the peptide sequences and their predicted MBEC values. We grouped the peptides into several MBIC value ranges.
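The grouping of peptides into value ranges can be sketched as a simple binning step. The bin edges below (1, 10, and 100 µM) are hypothetical and chosen only for illustration; they are not the ranges used in Tables S6-S13. The function names and the example sequences and values are likewise placeholders.

```python
from bisect import bisect_right

# Hypothetical bin edges in micromolar; not the ranges used in the tables.
BIN_EDGES_UM = [1.0, 10.0, 100.0]


def concentration_range_label(value_um, edges=BIN_EDGES_UM):
    """Return a label for the half-open range [low, high) containing value_um."""
    idx = bisect_right(edges, value_um)
    if idx == 0:
        return f"< {edges[0]:g} uM"
    if idx == len(edges):
        return f">= {edges[-1]:g} uM"
    return f"{edges[idx - 1]:g}-{edges[idx]:g} uM"


def group_by_range(predictions, edges=BIN_EDGES_UM):
    """Group (sequence, predicted value) pairs by concentration range."""
    groups = {}
    for sequence, value_um in predictions:
        label = concentration_range_label(value_um, edges)
        groups.setdefault(label, []).append(sequence)
    return groups


if __name__ == "__main__":
    # Placeholder sequences and values for illustration only.
    example = [("PEPTIDEA", 0.5), ("PEPTIDEB", 5.0), ("PEPTIDEC", 250.0)]
    for label, sequences in group_by_range(example).items():
        print(label, sequences)
```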

Positive Dataset
The details of our positive dataset, including the peptide sequence and its length, are given in Tables S14-S18.

MBEC Dataset
Antibiofilm peptides with reported MBEC values are listed in Tables S19-S20. The pathogens against which the MBEC values were reported are listed in the 'pathogen' column. The MBEC values are given in µM.