Front. Chem., 02 November 2021
Sec. Medicinal and Pharmaceutical Chemistry
Volume 9 - 2021

Effective Search of Triterpenes with Anti-HSV-1 Activity Using a Classification Model by Logistic Regression

Keiko Ogawa1*, Seikou Nakamura2*, Haruka Oguri1, Kaori Ryu2, Taichi Yoneda2, Rumiko Hosoki1
  • 1Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University, Kusatsu, Japan
  • 2Department of Pharmacognosy, Kyoto Pharmaceutical University, Kyoto, Japan

Natural products are an excellent source of skeletons for medicinal seeds. Triterpenes and saponins are representative natural products that exhibit anti-herpes simplex virus type 1 (HSV-1) activity. However, comprehensive information on the anti-HSV-1 activity of triterpenes is lacking. Therefore, expanding this information and improving the efficiency of triterpene exploration are urgently required. To improve the efficiency of developing anti-HSV-1 active compounds, we used machine learning methods to construct a predictive model for the anti-HSV-1 activity of triterpenes from information obtained in previous studies. Specifically, we constructed a binary classification model (i.e., active or inactive) using a logistic regression algorithm. In the evaluation of the predictive model, the accuracy for the test data was 0.79, and the area under the curve (AUC) was 0.86. Additionally, to enrich the information on the anti-HSV-1 activity of triterpenes, a plaque reduction assay was performed on 20 triterpenes. As a result, chikusetsusaponin IVa (11: IC50 = 13.06 μM) was found to have potent anti-HSV-1 activity, along with three other potentially anti-HSV-1 active triterpenes. The assay results were further used for external validation of the predictive model. The predictions for the compounds tested in the assay showed a high accuracy (0.83) and AUC (0.81). We also found that the predictive model successfully narrowed down the active compounds. This study provides more information on the anti-HSV-1 activity of triterpenes. Moreover, the predictive model can improve the efficiency of the development of active triterpenes by integrating many previous studies to clarify potential relationships.


Introduction

HSV-1 is a common human pathogen (Arduino and Porter, 2008). According to a report from the World Health Organization, HSV-1 is widespread and is estimated to have infected 3.7 billion people globally (World Health Organization, 2020). The symptoms are usually benign; however, in some cases, severe conditions may occur with the development of herpetic encephalitis (Bradshaw and Venkatesan, 2016). Standard therapeutic drugs such as acyclovir, penciclovir, and vidarabine are all based on the nucleobase structure. Long-term prophylaxis and treatment with acyclovir or other nucleobase drugs have been reported to result in the development of resistance (Piret and Boivin, 2011). Therefore, searching for new anti-HSV-1 compounds with other structural characteristics is essential.

Our study group has been conducting research to discover new compounds (Kondo et al., 2020; Nakamura et al., 2021) and anti-HSV-1 active natural compounds (Ogawa et al., 2018; Yoneda et al., 2018). Triterpenes are known to include a number of anti-HSV-1 active compounds. In a previous study, Ikeda et al. evaluated the anti-HSV-1 activity of 15 oleanane triterpenes and reported several triterpenes such as glycyrrhizic acid methyl ester with an IC50 in the single-digit molar range (Ikeda et al., 2005). Baltina et al. evaluated the anti-HSV-1 activity of lupane triterpenes and proposed that simple structural modifications of lupane triterpenes could increase their activity (Baltina et al., 2003). Hassan et al. reported that cucurbitacin B had strong anti-HSV-1 activity with a high selectivity index (Hassan et al., 2017). The activity of triterpenes has been reported in many studies, especially in reports on the isolation of natural compounds (Lv et al., 2016; Isaka et al., 2017).

However, the structure-activity relationship (SAR) for the anti-HSV-1 activity of triterpenes appears difficult to establish. Wachsman et al. pointed out that the SAR of brassinosteroids did not follow any clear pattern (Wachsman et al., 2004). Another study showed that a slight structural change can greatly affect anti-HSV-1 activity (Kinjo et al., 2000). A further limitation is the inconsistency of experimental protocols and the bias in test compounds, as most activity data come from isolation reports. Although many studies have tested triterpenes in HSV-1 assays, few reports comprehensively examine the relationship between triterpene skeletons and their anti-HSV-1 activity. This background has made it difficult to elucidate the relationship between triterpenes and anti-HSV-1 activity. Therefore, we attempted to develop a strategy to facilitate the narrowing down of anti-HSV-1 active compounds by integrating previously reported data using machine learning methods.

In recent years, quantitative structure–activity relationship (QSAR) studies have been successful in investigating the relationship between the structure and activity of compounds (Saiz-Urra et al., 2007; Masand et al., 2020). In QSAR studies, the structural features of compounds are converted into numerical values as molecular descriptors to analyze their relationship to activity. For instance, Saiz-Urra et al. described the antitumor activity of naphthoquinone ester derivatives using regression with 2D-autocorrelation descriptors.

In this study, the problem to be solved in constructing a predictive anti-HSV-1 activity model is that the experimental conditions differ between references. The results of anti-HSV-1 activity assays are affected by experimental conditions, such as incubation time or cell type. Regression-based methods are commonly used and perform activity prediction successfully (Nagai et al., 2019; Kaushik et al., 2021). However, because these methods output concrete activity values, they may contain large errors when built on data with inconsistent experimental protocols.

Therefore, we decided to apply a binary classification model to predict the anti-HSV-1 activity. The classification model sets a threshold value and classifies the samples into two groups, “active” or “inactive”: “active” if the activity is stronger than the threshold value, and “inactive” if the activity is weaker than the threshold value. This method allowed us to demonstrate the basic trend of the anti-HSV-1 activity. Because it is only based on whether the activity is stronger or weaker than the threshold value, it minimizes the influence of differences in experimental conditions on the activity value.

In addition, we performed an anti-HSV-1 assay, both for external validation of the constructed binary classification model and to expand the information on the anti-HSV-1 activity of triterpenes. For this purpose, we selected 20 triterpenes randomly from a natural product library constructed by our study group and measured their anti-HSV-1 activity. The 20 triterpenes were originally isolated from three species: Pfaffia glomerata (Amaranthaceae), Inonotus obliquus (Hymenochaetaceae), and Isodon japonicus (Lamiaceae). P. glomerata and I. japonicus have been reported to contain various types of triterpenes (Shiobara et al., 1993; Mroczek, 2015). However, they have not been studied for their anti-HSV-1 activity. I. obliquus has been reported to contain various triterpenes (Nakata et al., 2007; Ying et al., 2020), and its extract exhibited efficacy against HSV-1 (Polkovnikova et al., 2014). However, the active compounds have not yet been determined. We performed a plaque reduction assay on Vero cells to evaluate the anti-HSV-1 activity of the triterpenes. The assay results were further used for external validation of the constructed predictive model.

In this study, a predictive model for the anti-HSV-1 activity of triterpenes by logistic regression was constructed using previously reported data from databases and original studies. In addition, 20 triterpenes were examined for their anti-HSV-1 activity by using a plaque reduction assay. The predictive model was evaluated through cross-validation and validated by comparison with the activity test results. This study describes an approach to improve the efficiency of searching for active compounds by using machine learning integrating information from previous reports and databases, as well as a new report on the anti-HSV-1 activity of 20 triterpenes.

Materials and Methods

Data Collection

Triterpenes and their anti-HSV-1 activity were collected from two databases, namely, ChEMBL (Gaulton, 2017) and the Dictionary of Natural Products (DNP) (Taylor and Francis Group, 2021), and from 53 references found through SciFinder. In the ChEMBL database, “plaque reduction assay,” “cytopathic effect,” and “CPE” were used as search words, and the activity data were downloaded as CSV files. The assay data related to HSV-1 were extracted by “Assay description” and “Target name” with “Herpes simplex virus (type 1/strain F)” (CHEMBL613200). A total of 76 triterpenes and their activities were obtained from the ChEMBL database. From the DNP, 14 triterpenes were collected using the search word “herpes” for biological use. We used SciFinder to look for references on anti-HSV-1 active triterpenes using combinations of the following words: “triterpene,” “saponin,” “steroid,” “herpes,” “HSV-1,” “isolated,” and “synthesis.” We carefully scrutinized the references and selected reliable results. The following information was retrieved from the reliable references: compound name, activity (IC50, EC50, or inhibition (%) with test concentration), virus strain, cell line, assay protocol, journal name, DOI, and year of publication. A total of 472 triterpenes and their anti-HSV-1 activity were obtained.

Data Cleaning and Threshold Setting

All collected triterpenes were converted to the canonical SMILES format to identify duplicates. The 76 triterpenes from the ChEMBL database already contained SMILES information. The SMILES for the remaining triterpenes were created by obtaining the CAS numbers or mol files from SciFinder and then using ChemDraw Professional. All SMILES were converted to canonical SMILES using OpenBabelGUI (ver. 3.1.1; O'Boyle et al., 2011). In total, 429 triterpenes with activity test results were obtained. The SMILES of the collected triterpenes and the references are shown in Supplementary Material S1.

The activity values were expressed as IC50, EC50, or inhibition rate. Activity expressed in μg/mL was converted to μM using the following equation:

\[ \text{Activity (μM)} = \frac{\text{Activity (μg/mL)}}{\text{Molecular weight (g/mol)}} \times 1000 \]

We set the threshold at 25 μM to define compounds as anti-HSV-1 active or inactive. Data with activity expressed as inhibition rates were judged to be active or inactive in light of the test concentration. If the test concentration was lower than 25 μM and the inhibition rate did not exceed 50%, the data were excluded because the exact activity value could not be determined. Several activity data from the ChEMBL database were given only as “active” or “inactive.” These labels were used for the classification as given. For compounds with multiple activity test results, the median value was derived and used to assign the class.
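The labeling rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and two details that the text does not specify are assumptions, namely that a value exactly at 25 μM counts as inactive and that an inhibition rate above 50% measured at more than 25 μM is left undetermined.

```python
from statistics import median

THRESHOLD_UM = 25.0  # activity threshold from the text (μM)


def ug_per_ml_to_um(conc_ug_per_ml, mol_weight):
    """Convert a concentration in μg/mL to μM (mol_weight in g/mol)."""
    return conc_ug_per_ml / mol_weight * 1000.0


def label_from_potency(values_um):
    """Label a compound from one or more IC50/EC50 values in μM.

    The median is used when several results exist, as in the text.
    Assumption: a median exactly at the threshold counts as inactive.
    """
    return "active" if median(values_um) < THRESHOLD_UM else "inactive"


def label_from_inhibition(inhibition_pct, test_conc_um):
    """Label a compound from a single-concentration inhibition rate.

    Returns None when the class cannot be decided from the data.
    """
    if inhibition_pct > 50.0 and test_conc_um <= THRESHOLD_UM:
        return "active"    # IC50 lies below the tested concentration
    if inhibition_pct <= 50.0 and test_conc_um >= THRESHOLD_UM:
        return "inactive"  # IC50 lies at or above the tested concentration
    # At most 50% below 25 μM is excluded in the text; more than 50%
    # above 25 μM is treated as undetermined here (assumption).
    return None


print(ug_per_ml_to_um(10.0, 500.0))            # 10 μg/mL, 500 g/mol
print(label_from_potency([12.0, 30.0, 18.0]))  # median of three results
print(label_from_inhibition(30.0, 10.0))       # excluded case
```

Returning None for undecidable records keeps the exclusion step explicit, so such records can be filtered out before training.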

Molecular Representation

Molecular descriptors were used to convert the chemical structures into numerical values. Molecular descriptors are features created based on the physical properties and partial structures of compounds (Brown, 2018). The mol files of each compound were downloaded from PubChem (Kim et al., 2021) or constructed from CAS numbers or canonical SMILES using ChemDraw Professional. The structures of all triterpenes were carefully checked for their conformations. Then, 3874 descriptors were calculated for each compound from its mol file using alvaDesc (ver. 1.0.16). Variable reduction was conducted to eliminate any descriptors inappropriate for model building in the following four steps: (1) eliminate descriptors with one or more missing values, (2) eliminate descriptors with constant or near-constant values, (3) eliminate descriptors with standard deviations less than 0.01, and (4) eliminate one descriptor of each pair whose correlation coefficient was greater than or equal to 0.75. After variable reduction, 267 descriptors were standardized and used to build a predictive model. The number of descriptors for each block is listed in Table 1.
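The four reduction steps and the final standardization can be sketched as below. This is an illustration under stated assumptions rather than the alvaDesc procedure itself: the function names are hypothetical, "near-constant" is taken to mean that one value covers at least 95% of the compounds, the correlation is compared by absolute value, the first descriptor of a correlated pair is the one kept, and the population standard deviation is used.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reduce_descriptors(table, near_const=0.95, min_sd=0.01, max_r=0.75):
    """Filter descriptors; table maps name -> list of values (None = missing)."""
    kept = {}
    for name, col in table.items():
        if any(v is None for v in col):
            continue  # step 1: missing values
        top_share = Counter(col).most_common(1)[0][1] / len(col)
        if top_share >= near_const:
            continue  # step 2: constant or near-constant (assumed cutoff)
        if pstdev(col) < min_sd:
            continue  # step 3: standard deviation below 0.01
        if any(abs(pearson(col, other)) >= max_r for other in kept.values()):
            continue  # step 4: correlated with an already kept descriptor
        kept[name] = col
    return kept


def standardize(col):
    """Scale a descriptor column to zero mean and unit variance."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.1],           # almost collinear with "a"
    "c": [1.0, 1.0, 1.0, 1.0, 1.0],            # constant
    "d": [1.0, None, 3.0, 4.0, 5.0],           # has a missing value
    "e": [0.001, 0.002, 0.001, 0.002, 0.003],  # tiny spread
    "f": [5.0, 1.0, 4.0, 2.0, 3.0],            # weakly correlated with "a"
}
print(list(reduce_descriptors(toy)))
```

Because step 4 is greedy, the surviving set depends on the order of the descriptors; a different tie-breaking rule would keep a different but equally uncorrelated subset.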


TABLE 1. Number of molecular descriptors used for model construction.

Principal Component Analysis

Principal component analysis (PCA) was performed to check the distribution of the compounds from ChEMBL, DNP, and original papers. The 267 structural descriptors, which are the same as those used for constructing the predictive model, were used as variables in the PCA. All compounds were visualized in three-dimensional space. PCA and visualization were performed using JMP® Pro (ver. 15.1.0).

Model Construction

Prior to the construction of the predictive model, we developed initial models using various prediction methods to determine which methods were suitable for predicting anti-HSV-1 activity. Initial models were constructed with logistic regression, random forest, decision tree, support vector machine, and k-nearest neighbor methods and compared by fivefold cross-validation (data not shown). Among the initial models, the model using logistic regression showed the best performance. Therefore, we used the logistic regression (LR) algorithm to construct the classification model. The parameters of the model were automatically optimized using the grid search algorithm. The data were divided into training and test sets at a ratio of 4:1. We tried 10 different divisions and employed the optimal split. All the steps of model construction were implemented in Python (ver. 3.7.3) using the scikit-learn (ver. 0.20.3) library, which is a machine learning package.
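As an illustration of the 4:1 division, the sketch below generates ten candidate splits of 416 compounds, one per random seed. The function name and the seeds are hypothetical, and rounding the test-set size up is an assumption; with that choice the test set has 84 compounds, which matches the number of test compounds reported in the Results. The criterion used to pick the optimal split is not given in the text, so the sketch stops at generating the candidates.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)  # assumption: round up
    return indices[n_test:], indices[:n_test]


# Ten candidate divisions of the 416 compounds (seeds are hypothetical).
candidate_splits = [split_indices(416, seed=s) for s in range(10)]
train, test = candidate_splits[0]
print(len(train), len(test))
```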

The LR algorithm is a widely used classification method that is mainly applied to linearly separable problems. In this method, the probability of the active or inactive class is obtained using a logistic function. Three parameters of the LR classification model were optimized by grid search with cross-validation to improve the model performance: solver = {liblinear, saga, newton-cg, sag, lbfgs}, penalty = {l1, l2}, and C = {0.1, 1, 10, 100}.

Evaluation of the Model Performance

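The performance of the constructed model for predicting anti-HSV-1 activity was evaluated in terms of accuracy, sensitivity, precision, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC) by fivefold cross-validation. The definitions of these metrics are as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Sensitivity} = \text{TPR} = \frac{TP}{TP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \]
\[ \text{FPR} = \frac{FP}{FP + TN} \]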

where TP denotes true positive, TN denotes true negative, FP denotes false positive, and FN denotes false negative. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold changes, and its AUC is used as an evaluation index of the model. A ROC-AUC value closer to 1 indicates higher model performance.
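For readers who want to compute these metrics by hand, the following sketch uses only the Python standard library. The function names and the 1/0 coding of active/inactive are assumptions; the AUC is computed through its rank interpretation (the probability that a randomly chosen active compound scores higher than a randomly chosen inactive one, with ties counted as one half), which is equivalent to the area under the ROC curve. The sketch assumes both classes are present and that at least one compound is predicted active.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels coded 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision, and F1-score from the counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparison of active and inactive scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predictions, not data from the study.
counts = confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
print(counts, classification_metrics(*counts))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```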

A QSAR model has difficulty making accurate predictions for compounds that are structurally dissimilar to the compounds used for training. Therefore, the range of predictable compounds should be defined as the applicability domain (AD) (OECD, 2014). The compounds were checked for similarity to the training compounds using a one-class support vector machine (OCSVM) (Kaneko et al., 2015), and the compounds in the AD were used for prediction and evaluation. To perform the OCSVM method, all compounds were represented by extended connectivity fingerprints (ECFP4).
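To make the idea of an applicability domain concrete, the sketch below uses a deliberately simpler stand-in than the OCSVM employed in this study: a query compound is accepted if its Tanimoto similarity to at least one training compound reaches a cutoff. This is a different technique, shown only to illustrate the concept. Fingerprints are represented as sets of on-bit indices, and the function names, the toy bit sets, and the 0.3 cutoff are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def in_domain(query_fp, training_fps, min_similarity=0.3):
    """True if the query resembles at least one training compound."""
    best = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    return best >= min_similarity


# Toy bit sets standing in for ECFP4 fingerprints (not real compounds).
training = [{1, 2, 3, 4}, {3, 4, 5, 6}]
print(in_domain({1, 2, 3, 9}, training))   # shares three bits with the first
print(in_domain({20, 21, 22}, training))   # shares no bits with any
```

A nearest-neighbour rule of this kind is easy to inspect, whereas a one-class SVM learns a boundary around the whole training set and can exclude compounds even when a single close neighbour exists.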

Compounds for the anti-HSV-1 Assay

The 20 test triterpenes for the HSV-1 assay were obtained from a natural product library assembled by the Department of Pharmacognosy at Kyoto Pharmaceutical University (Figure 2). The triterpenes were isolated from three species: I. japonicus, P. glomerata, and I. obliquus. Euscaphic acid (1), bayogenin (2), and 2α,3α,23-trihydroxyurs-12-en-28-oic acid (3) were isolated from the aerial parts of I. japonicus. Pfaffiaglycoside B (4), (20α)-3-oxoolean-12-ene-28,29-dioic acid (5), pfaffianol A (6), pterosterone (7), taxisterone (8), 2β,3β,14α,17β-tetrahydroxy-5β-androst-7-en-6-one (9), oleanolic acid 3-O-β-D-glucuronopyranoside (10), chikusetsusaponin IVa (11), boussingoside A2 (12), pfaffoside C (13), oleanolic acid 28-O-β-D-glucopyranoside (14), and 22-oxo-20-hydroxyecdysone (15) were isolated from the roots of P. glomerata. 5α,8α-epidioxyergosta-3β-ol (16), 3β,22R-Dihydroxylanosta-8,23E-diene-25-peroxide (17), 3β-hydroxylanosta-8,24-dien-21-al (18), lanosterol (19), and (3β,23E)-25-hydroperoxylanosta-8,23-dien-3-ol (20) were isolated from the sclerotia of I. obliquus. The isolation procedure and structural determination of these compounds have been described in previous reports (Nakamura et al., 2009; Nakamura et al., 2010).

Evaluation of Anti-HSV-1 Activity and Cytotoxicity

Antiviral effects against HSV-1 (HF strain: acyclovir-sensitive) were evaluated using a plaque reduction assay as previously reported (Ogawa et al., 2018), with some modifications. The activity of all test compounds was measured at 25 μM, corresponding to the threshold of the prediction model. Compounds with significantly strong activity at 25 μM were further measured at lower concentrations to calculate the IC50. The protocol was performed as follows.

Vero cells were maintained in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS). Vero cells (1.7–2.8 × 10^5 cells/well) were seeded onto a 12-well tissue culture plate and precultured for 1–2 days. The cells were inoculated with 100 plaque-forming units (PFU) of HSV-1 in 0.1 mL of FBS-free DMEM. After 30 min, the inoculum was removed. HSV-infected cells were maintained in DMEM containing 10% FBS, methylcellulose, and serially diluted test compounds. After 48 h of incubation at 37°C, the medium was removed. The cell sheets were stained with 1% crystal violet dissolved in 50% methanol, and the number of plaques larger than 0.4 mm was counted. HSV-1 replication causes cytopathic effects, resulting in plaque formation. The number and size of plaques reflect virus yields. To examine the effect on virus yield reduction, the total plaque number in cells not treated with the compound was defined as 100% and used as the control group. The significance of the difference between each test compound at 25 μM and the control group was assessed using Student’s t-test. For compounds 10–13 and 16, the significance of the difference between each concentration and the control group was assessed by Dunnett’s test. Probability (p) values less than 0.05 were considered significant (*p < 0.05, **p < 0.01).
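The normalization to the untreated control can be written as a short calculation. In the sketch below, the function names and the replicate plaque counts are hypothetical, and expressing the effect as 100 minus the percentage of control is an assumption about how the inhibition rate is derived from the plaque numbers.

```python
from statistics import mean


def percent_of_control(treated_counts, control_counts):
    """Plaque number in treated wells relative to untreated wells (= 100%)."""
    return mean(treated_counts) / mean(control_counts) * 100.0


def percent_inhibition(treated_counts, control_counts):
    """Reduction in plaque number relative to the untreated control."""
    return 100.0 - percent_of_control(treated_counts, control_counts)


# Hypothetical replicate plaque counts, not data from the study.
control = [100, 100, 100]
treated = [40, 50, 60]
print(percent_of_control(treated, control))
print(percent_inhibition(treated, control))
```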

The cytotoxicity of all test compounds was evaluated using the CellTiter-Glo® 3D cell viability assay (Matsumoto et al., 2020). Briefly, Vero cells (3.0 × 10^3 cells/well) were seeded in 96-well plates (96F Nunclon TM Delta Microwell SI; Thermo Fisher Scientific) and precultured for 24 h. The cells were treated with test compounds (10 μL/well). After 48 h, CellTiter-Glo® 3D Reagent (Promega, Madison, WI, United States) was added at 100 μL per well, mixed by shaking for 5 min at room temperature, and incubated for 25 min at 37°C. The cell-containing media were transferred to a 96-well white plate (96F Nunclon TM Delta White Microwell SI; Thermo Fisher Scientific). Luminescence was measured with a luminometer (FLUOstar OPTIMA®; BMG Labtech, Ortenberg, Germany).

To verify the performance of the constructed model, the 20 compounds were predicted to be active or inactive, and the predictions were compared with the assay results. Only compounds within the AD defined by OCSVM were predicted for their anti-HSV-1 activity.


Results

Overview of the Collected Data

For the data collection, 472 triterpenes were collected from ChEMBL, DNP, and original papers. After data cleaning and threshold setting, 416 triterpenes with activity data were used for training and evaluation. The anti-HSV-1 assay conditions for these data included incubation times of 12–48 h; HeLa and Vero cells as cell types; and, as virus strains, mainly KOS (ATCC® VR-1493), Strain F (ATCC® VR-733), 17 (GenBank acc. no. NC_001806), and HSV-1 MI (ATCC® VR-539), all of which are acyclovir-sensitive strains. With the threshold of activity set to 25 μM, 166 active and 250 inactive compounds were defined (Table 2). Figure 1 shows the results of the PCA of active and inactive compounds collected from ChEMBL, DNP, and the original papers using 267 descriptors. The first three principal components (PC), namely, PC-1, PC-2, and PC-3, explained 13.18, 8.11, and 7.23% of the variance, respectively. The total variance explained by the first three PCs was 28.52%. The distributions of the anti-HSV-1 active and inactive compounds appeared similar, although that of the inactive compounds was slightly wider.


TABLE 2. Number of compounds for each dataset.


FIGURE 1. Principal component analysis of the active or inactive compounds of the training dataset. PC = principal component. Blue dot: HSV-1 inactive, orange dot: HSV-1 active.


FIGURE 2. Chemical structures of 20 triterpenes.

Predictive Model for the Anti-HSV-1 Effect of Triterpenes

A predictive model for anti-HSV-1 activity was constructed using collected triterpene compounds. The structures of the triterpenes were initially converted to 3874 2D molecular descriptors. In this study, the targeted compounds were confined to triterpenes; hence, some of the structural descriptors had similar values. Therefore, in order to select structural descriptors that clearly identified compounds, descriptor reduction was performed with a correlation coefficient of 0.75 and other conditions. Consequently, 267 descriptors were used to construct a predictive model with classification using logistic regression. Optimization using 5-fold cross-validation showed that the best parameters involved setting the solver = “liblinear”, C = 1.0, and penalty = “l2”. At these optimal parameter settings, the accuracy was 0.77, AUC was 0.86, sensitivity was 0.71, precision was 0.77 and F1-score was 0.74 (Table 3).


TABLE 3. Performance of the predictive model for the anti-HSV-1 activity.

The range of compounds to which the prediction model can be applied depends on the structural diversity of the compounds used for training. Thus, we considered the AD of this predictive model from the perspective of chemical space. In this study, the AD was determined using the OCSVM method. Of the 84 test compounds, 13 were excluded from the prediction because they were judged to be outside the AD. For the remaining 71 test compounds within the AD, the optimized predictive model was used to predict whether each compound was anti-HSV-1 active or inactive, giving an accuracy of 0.78 and an AUC of 0.87. Without considering the AD, the accuracy was 0.79 and the AUC was 0.86.

Evaluation of the anti-HSV-1 Effect of Triterpenes and Prediction Results

A plaque reduction assay and a cytotoxicity assay were performed to evaluate the 20 triterpenes. The anti-HSV-1 activity of all triterpenes was evaluated at 25 μM, and compounds that showed significant anti-HSV-1 activity at 25 μM were further evaluated at 5 and 10 μM. The results of the cytotoxicity assay are described in Supplementary Material S1. The therapeutic agent acyclovir (IC50: 3.08 μM, cell viability: 102.92 ± 13.78%, at 10 μM) and the typical triterpene oleanolic acid (inhibition: 21.40 ± 11.56% at 25 μM) were used as positive controls. As a result, four triterpenes, 10 (83.33 ± 41.11%), 11 (51.83 ± 9.61%), 12 (24.72 ± 10.22%), and 13 (27.19 ± 5.52%), showed significant anti-HSV-1 activity at 25 μM (Table 4). In particular, 11 exhibited potent activity, with an IC50 of 13.06 μM and no cytotoxicity (viability: 111.53 ± 4.45%). Compound 10 showed an IC50 of 14.13 μM; however, cytotoxicity was observed at 25 μM (viability: 14.64 ± 2.88%). At 10 μM, 10 showed significant anti-HSV-1 activity (27.49 ± 8.92%) with no cytotoxicity (viability: 103.99 ± 9.28%). Compounds 10 and 11 are both oleanolic acid glycosides with a glucuronic acid at the 3-position, and 11 also has glucose at the 28-position.


TABLE 4. Anti-HSV-1 activity of triterpenes.

The anti-HSV-1 activity of these 20 compounds was also predicted using the optimized model. The OCSVM was used to determine whether these triterpenes were within the AD of the model, and 18 of the 20 compounds were acceptable for this model. Compounds 9 and 13 were judged to be outside the AD; thus, they were excluded from the compounds to be predicted. In the prediction, 6 triterpenes (2, 5, 10, 11, 12, and 18) were determined to be active and 12 triterpenes (1, 3, 4, 6–8, 14–17, and 19–20) were determined to be inactive.

Then, the results of the anti-HSV-1 assay and the prediction were compared. Compound 11, which showed anti-HSV-1 activity with an IC50 below 25 μM, was predicted to be active with high probability (11: 0.8005). Overall, the accuracy and AUC of our model for these external-validation compounds were 0.83 and 0.81, respectively (Table 5). The sensitivity was 1.00, and the F1-score was 0.40.


TABLE 5. Prediction performance for 20 triterpenes.


Discussion

The chemical structures of natural products are a rich source of medicinal seeds (Newman and Cragg, 2020). However, finding compounds with the desired activity has required considerable effort. A new approach to efficiently identify compounds with the desired activity was therefore needed.

There has been a need for anti-HSV-1 therapeutic agents with a different skeleton from that of the existing nucleobases. Natural products are a good source for developing efficient and safe treatments for HSV-1 infections (Treml et al., 2020). Various types of anti-HSV-1 active natural products have been reported, such as terpenoids, flavonoids, polysaccharides, and other miscellaneous compounds. Among them, terpenes accounted for the highest percentage (34.4%) of the reported natural products with anti-HSV-1 activity (Zhong et al., 2013). In particular, Zhong et al. mentioned that glycosides are useful as active compounds. However, most SARs remain unclear.

In recent years, QSAR research has been actively conducted. For instance, Masand et al. constructed a QSAR model for SARS-CoV inhibitors from a dataset of peptide compounds (Masand et al., 2020). Banerjee et al. reported on the development of a predictor that used structural descriptors to classify the taste of compounds (Banerjee et al., 2018). As these reports show, QSAR studies have successfully provided new insights into SAR and new possibilities for the discovery of active compounds. For anti-HSV-1 activity, Sabatino et al. reported a predictive model for essential oils that exhibit antiviral activity using the partial least square discriminant analysis algorithm (Sabatino et al., 2020). Another report described the protein‒protein interaction between human and HSV-1 using machine learning techniques (Lian et al., 2020). However, to the best of our knowledge, predictive models for compounds that exhibit anti-HSV-1 activity have not been investigated so far.

Against this background, we undertook this study to reach a comprehensive understanding by integrating the information from previous original papers using machine learning techniques. Briefly, we constructed a predictive model for anti-HSV-1 activity by summarizing previous activity reports on triterpenes in order to identify active triterpenes efficiently. Despite the diversity of the assay conditions in the training data, such as incubation time, cell types, and virus strains, our model was successfully developed with high predictive performance. At the same time, there is still a need for further information on the anti-HSV-1 activity of triterpenes. We therefore evaluated the anti-HSV-1 activity of 20 triterpenes using a plaque reduction assay and identified several active compounds.

In the data collection step, activity information was obtained for triterpenes, saponins, and steroids with different skeletons. We set the activity threshold to an IC50 of 25 μM. This threshold was chosen because, empirically, compounds that exhibit anti-HSV-1 activity without cytotoxicity can be obtained at this level. Additionally, sufficient active and inactive data were available to construct the model, and the active and inactive compounds were evenly divided in terms of their structures. PCA was performed using the same 267 molecular descriptors as in the model construction process. The result showed that the distributions of the active and inactive compounds were similar. The results of model construction indicated that our predictive model showed high discriminant ability, with an accuracy of 0.79 and an AUC of 0.86, even though the training data were based on non-unified assay protocols. The precision and sensitivity were both 0.76, suggesting that our predictive model finds active compounds with a good balance between the two. In addition, it should be emphasized that our model showed high discrimination ability not only for simple triterpenes but also for their glycosides and steroids.

In a prediction model using compound structures, it is generally necessary to determine, based on structural similarity, whether the compound to be predicted is within the AD of the model. In this study, the OCSVM method was used to define the AD. Consequently, 13 out of 84 test compounds were found to be outside the AD. Among the 13 excluded compounds, 7 were active and 6 were inactive. Comparing the prediction performance with and without the AD, we found that the accuracy was slightly higher when the AD was not used. Because we limited the targets to triterpene skeletons, the structural similarity among the compounds was presumably high from the outset.

In the anti-HSV-1 assay, four triterpenes (10, 11, 12, and 16) exhibited significant activity against HSV-1. Specifically, compound 11 showed potent anti-HSV-1 activity with an IC50 of 13.06 μM. Compound 11 was previously reported to exhibit anti-HSV-1 activity (Rattanathongkom et al., 2009). That report was not included in the training data because it did not match our search words. Our result on the activity of 11 thus supports the previous study. In our classification model, which was created by integrating various protocols, compound 11 was predicted to have a high probability (0.8005) of showing activity. This suggests that compound 11 may be broadly active under different assay conditions, such as different cell types and virus strains. Compound 10 showed an IC50 of 14.13 μM. Although significant cytotoxicity was observed at 25 μM, no cytotoxicity was observed at 10 μM. This indicates that 10 can be treated as an active compound with a narrow safety range. Previously, Rattanathongkom et al. reported that 10 showed cytotoxicity but no anti-HSV-1 activity. Our classification model classified compound 10 as active with a probability of 0.6441, consistent with our results. Because the probability was near the decision threshold (0.5), the classification of 10 as active or inactive was borderline. However, it can be regarded as a good reflection of the current and previous results. These results show that checking the probability output by the predictive model is a meaningful way to gauge the certainty of a predicted active compound.

The active compounds were limited to three oleanane-type (10, 11) or noroleanane-type (12) saponins bearing glucuronic acid on the hydroxy group at the 3-position and an ergostene-type triterpene with a peroxide group. Compounds 10 and 11 showed potent anti-HSV-1 activity with an IC50 of less than 25 μM. In addition, compound 12 exhibited significant activity whereas compound 13 showed no significant activity, suggesting that glucuronic acid at position 3 may be involved in the expression of anti-HSV-1 activity. Compound 10 showed cytotoxicity, while 11 and 12, which have glucose at position 28, showed no cytotoxicity, suggesting that glucose at position 28 may be involved in reducing cytotoxicity. All the compounds that showed activity had a hydroxy group at the 3-position and no substituent at the 2-position. On the other hand, compounds with hydroxy groups at both the 2- and 3-positions showed no activity. This suggests that the hydroxy group at the 2-position may be involved in the decrease in activity. Comparing the activity of compounds with and without glucose at position 28, the compounds with glucose seemed to be slightly more active. However, compound 14, which has glucose at position 28 but no glucuronic acid at position 3, did not show significant activity. Therefore, glucose at position 28 may have less of an effect on activity than glucuronic acid at position 3.

When the results of the constructed predictive model were compared with those of the anti-HSV-1 assay, the model correctly identified both 10 and 11, which showed IC50 values below 25 μM, as active compounds. The predictions for the 20 compounds showed high sensitivity (1.00) but lower precision (0.40). Because this predictive model is intended to improve the efficiency of finding active compounds, it is more important not to miss potentially active compounds (sensitivity) than to identify them precisely (precision). Meanwhile, compound 12 was predicted to be active with a high probability (0.9833), but the prediction was incorrect. Nevertheless, 12 showed moderate anti-HSV-1 activity at 25 μM, suggesting that the prediction model can help identify compounds with potential anti-HSV-1 activity. Taken together, these results show that the prediction model fulfilled our purpose.
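
The trade-off between sensitivity and precision described above can be sketched with the standard definitions. The counts used here (2 true positives, 0 false negatives, 3 false positives) are an assumption: they are back-calculated so that the formulas reproduce the reported values of 1.00 and 0.40, taking 10 and 11 as the true actives, and they are not stated in the article.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly active compounds that the model flags as active."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Fraction of compounds flagged as active that are truly active."""
    return true_positives / (true_positives + false_positives)


# Illustrative counts (assumption, not taken from the article):
# chosen so that the metrics match the reported 1.00 and 0.40.
tp, fn, fp = 2, 0, 3

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"precision   = {precision(tp, fp):.2f}")
```

Under these assumed counts, no active compound is missed, at the cost of assaying three compounds that turn out to be inactive, which is the behavior the text argues is preferable for a screening filter.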


Conclusion

This study was conducted to improve the efficiency of the search for active triterpenes. We developed a predictive model for anti-HSV-1 activity by integrating the results of previous reports. The predictive model showed satisfactory performance, with an accuracy of 0.79, an AUC of 0.86, and a high ability to identify active compounds. In addition, we measured the anti-HSV-1 activity of 20 triterpenes from a natural product library and found that several of them showed significant anti-HSV-1 activity at 25 μM. Among them, 11 showed particularly strong activity, with an IC50 of 13.06 μM and no cytotoxicity. Furthermore, our prediction model showed high performance, with an accuracy of 0.83 and an AUC of 0.81, even in external validation using the triterpenes tested in the assay. The proposed model succeeded in finding all compounds with an IC50 below 25 μM. This indicates that the model can predict the anti-HSV-1 activity of structurally diverse triterpenes, their glycosides, and sterols with good performance.

In this study, our binary classification model was able to integrate the results of assays with different experimental protocols and to pick out active compounds based on a comprehensive relationship pattern. Using this predictive model, it is possible to determine which triterpenes are likely to be active, allowing more rapid access to anti-HSV-1 active triterpenes. However, because the training of the prediction model depends on known data, this approach has some limitations. It is difficult to accurately determine the activity of a rare compound because the training data are unlikely to contain structurally similar compounds. Therefore, the predictive model is expected to improve further as more data on the anti-HSV-1 activity of various compounds accumulate.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Author Contributions

KO and SN designed the research plan. HO and KO collected the data and developed a prediction model. KR, TY, and SN performed the activity test measurements. SN, RH, and KO interpreted the results. KO wrote the manuscript. All authors have checked the final version of the manuscript and have agreed to the submission.


Funding

This work was supported by JSPS KAKENHI Grant Numbers 21K15325 (KO) and 20K07109 (SN).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Acknowledgments

We would like to thank Editage for English language editing.

Supplementary Material

The Supplementary Material for this article can be found online at:


References

Arduino, P. G., and Porter, S. R. (2008). Herpes Simplex Virus Type 1 Infection: Overview on Relevant Clinico-Pathological Features*. J. Oral Pathol. Med. 37, 107–121. doi:10.1111/j.1600-0714.2007.00586.x

Baltina, L. A., Flekhter, O. B., Nigmatullina, L. R., Boreko, E. I., Pavlova, N. I., Nikolaeva, S. N., et al. (2003). Lupane Triterpenes and Derivatives with Antiviral Activity. Bioorg. Med. Chem. Lett. 13, 3549–3552. doi:10.1016/S0960-894X(03)00714-5

Banerjee, P., and Preissner, R. (2018). BitterSweetForest: A Random Forest Based Binary Classifier to Predict Bitterness and Sweetness of Chemical Compounds. Front. Chem. 6, 93. doi:10.3389/fchem.2018.00093

Bradshaw, M. J., and Venkatesan, A. (2016). Herpes Simplex Virus-1 Encephalitis in Adults: Pathophysiology, Diagnosis, and Management. Neurotherapeutics 13, 493–508. doi:10.1007/s13311-016-0433-7

Brown, J. B. (Editor) (2018). Computational Chemogenomics. New York: Humana Press.

Gaulton, A., Hersey, A., Nowotka, M., Bento, A. P., Chambers, J., Mendez, D., et al. (2017). The ChEMBL Database in 2017. Nucleic Acids Res. 45 (D1), D945–D954. doi:10.1093/nar/gkw1074

Hassan, S. T. S., Berchová-Bímová, K., Petráš, J., and Hassan, K. T. S. (2017). Cucurbitacin B Interacts Synergistically with Antibiotics against Staphylococcus aureus Clinical Isolates and Exhibits Antiviral Activity against HSV-1. South Afr. J. Bot. 108, 90–94. doi:10.1016/j.sajb.2016.10.001

Ikeda, T., Yokomizo, K., Okawa, M., Tsuchihashi, R., Kinjo, J., Nohara, T., et al. (2005). Anti-herpes Virus Type 1 Activity of Oleanane-type Triterpenoids. Biol. Pharm. Bull. 28, 1779–1781. doi:10.1248/bpb.28.1779

Isaka, M., Chinthanom, P., Srichomthong, K., and Thummarukcharoen, T. (2017). Lanostane Triterpenoids from Fruiting Bodies of the Bracket Fungus Fomitopsis Feei. Tetrahedron Lett. 58, 1758–1761. doi:10.1016/j.tetlet.2017.03.066

Kaneko, H., and Funatsu, K. (2015). Strategy of Structure Generation within Applicability Domains with One-Class Support Vector Machine. Bull. Chem. Soc. Jpn. 88, 981–988. doi:10.1246/bcsj.20150054

Kaushik, P., Shakil, N. A., and Rana, V. S. (2021). Synthesis, Biological Evaluation, and QSAR Studies of 3-Iodochromone Derivatives as Potential Fungicides. Front. Chem. 9, 636882. doi:10.3389/fchem.2021.636882

Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., et al. (2021). PubChem in 2021: New Data Content and Improved Web Interfaces. Nucleic Acids Res. 49, D1388–D1395. doi:10.1093/nar/gkaa971

Kinjo, J., Yokomizo, K., Hirakawa, T., Shii, Y., Nohara, T., and Uyeda, M. (2000). Anti-herpes Virus Activity of Fabaceous Triterpenoidal Saponins. Biol. Pharm. Bull. 23, 887–889. doi:10.1248/bpb.23.887

Kondo, Y., Nakamura, S., Ino, S., Yamashita, H., Nakashima, S., Yamashita, M., et al. (2020). Asymmetric Nitrogen-Containing Dimer from Aerial Parts of Mercurialis leiocarpa and its Synthesis by Mimicking Generation Process through Radical Intermediates. Chem. Pharm. Bull. 68, 520–525. doi:10.1248/cpb.c20-00058

Lian, X., Yang, X., Shao, J., Hou, F., Yang, S., Pan, D., et al. (2020). Prediction and Analysis of Human-Herpes Simplex Virus Type 1 Protein-Protein Interactions by Integrating Multiple Methods. Quant. Biol. 8, 312–324. doi:10.1007/s40484-020-0222-5

Lv, X.-J., Li, Y., Ma, S.-G., Qu, J., Liu, Y.-B., Li, Y.-H., et al. (2016). Antiviral Triterpenes from the Twigs and Leaves of Lyonia Ovalifolia. J. Nat. Prod. 79, 2824–2837. doi:10.1021/acs.jnatprod.6b00585

Masand, V. H., Rastija, V., Patil, M. K., Gandhi, A., and Chapolikar, A. (2020). Extending the Identification of Structural Features Responsible for Anti-SARS-CoV Activity of Peptide-type Compounds Using QSAR Modelling. SAR QSAR Environ. Res. 31, 643–654. doi:10.1080/1062936x.2020.1784271

Matsumoto, T., Imahori, D., Saito, Y., Zhang, W., Ohta, T., Yoshida, T., et al. (2020). Cytotoxic Activities of Sesquiterpenoids from the Aerial Parts of Petasites Japonicus against Cancer Stem Cells. J. Nat. Med. 74, 689–701. doi:10.1007/s11418-020-01420-x

Mroczek, A. (2015). Phytochemistry and Bioactivity of Triterpene Saponins from Amaranthaceae Family. Phytochem. Rev. 14, 577–605. doi:10.1007/s11101-015-9394-4

Nagai, J., Imamura, M., Sakagami, H., and Uesawa, Y. (2019). QSAR Prediction Model to Search for Compounds with Selective Cytotoxicity against Oral Cell Cancer. Medicines 6, 45. doi:10.3390/medicines6020045

Nakamura, S., Nakashima, S., and Matsuda, H. (2021). Sulfur-containing Compounds from Leaves of Allium Plants, A. Fistulosum, A. Schoenoprasum Var. Foliosum, and A. Sativum. J. Asian Assoc. Schools Pharm. 10, 1–8.

Nakamura, S., Chen, G., Nakashima, S., Matsuda, H., Pei, Y., and Yoshikawa, M. (2010). Brazilian Natural Medicines. IV. New Noroleanane-type Triterpene and Ecdysterone-type Sterol Glycosides and Melanogenesis Inhibitors from the Roots of Pfaffia Glomerata. Chem. Pharm. Bull. 58, 690–695. doi:10.1248/cpb.58.690

Nakamura, S., Iwami, J., Matsuda, H., Mizuno, S., and Yoshikawa, M. (2009). Absolute Stereostructures of Inoterpenes A-F from Sclerotia of Inonotus Obliquus. Tetrahedron 65, 2443–2450. doi:10.1016/j.tet.2009.01.076

Nakata, T., Yamada, T., Taji, S., Ohishi, H., Wada, S.-i., Tokuda, H., et al. (2007). Structure Determination of Inonotsuoxides A and B and In Vivo Anti-tumor Promoting Activity of Inotodiol from the Sclerotia of Inonotus Obliquus. Bioorg. Med. Chem. 15, 257–264. doi:10.1016/j.bmc.2006.09.064

Newman, D. J., and Cragg, G. M. (2020). Natural Products as Sources of New Drugs over the Nearly Four Decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803. doi:10.1021/acs.jnatprod.9b01285

O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011). Open Babel: an Open Chemical Toolbox. J. Cheminform 3, 33. doi:10.1186/1758-2946-3-33

OECD (2014). Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models, OECD Series on Testing and Assessment, No. 69. Paris: OECD Publishing. doi:10.1787/9789264085442-en

Ogawa, K., Nakamura, S., Hosokawa, K., Ishimaru, H., Saito, N., Ryu, K., et al. (2018). New Diterpenes from Nigella Damascena Seeds and Their Antiviral Activities against Herpes Simplex Virus Type-1. J. Nat. Med. 72, 439–447. doi:10.1007/s11418-017-1166-6

Piret, J., and Boivin, G. (2011). Resistance of Herpes Simplex Viruses to Nucleoside Analogues: Mechanisms, Prevalence, and Management. Antimicrob. Agents Chemother. 55, 459–472. doi:10.1128/AAC.00615-10

Polkovnikova, M. V., Nosik, N. N., Garaev, T. M., Kondrashina, N. G., Finogenova, M. P., and Shibnev, V. A. (2014). A Study of the Antiherpetic Activity of the Chaga Mushroom (Inonotus Obliquus) Extracts in the Vero Cells Infected with the Herpes Simplex Virus. Vopr Virusol 59, 45–48.

Rattanathongkom, A., Lee, J.-B., Hayashi, K., Sripanidkulchai, B.-o., Kanchanapoom, T., and Hayashi, T. (2009). Evaluation of Chikusetsusaponin IVa Isolated from Alternanthera Philoxeroides for its Potency against Viral Replication. Planta Med. 75, 829–835. doi:10.1055/s-0029-1185436

Sabatino, M., Fabiani, M., Božović, M., Garzoli, S., Antonini, L., Marcocci, M. E., et al. (2020). Experimental Data Based Machine Learning Classification Models with Predictive Ability to Select In Vitro Active Antiviral and Non-toxic Essential Oils. Molecules 25, 2452. doi:10.3390/molecules25102452

Saíz-Urra, L., Pérez González, M., and Teijeira, M. (2007). 2D-autocorrelation Descriptors for Predicting Cytotoxicity of Naphthoquinone Ester Derivatives against Oral Human Epidermoid Carcinoma. Bioorg. Med. Chem. 15, 3565–3571. doi:10.1016/j.bmc.2007.02.032

Shiobara, Y., Inoue, S.-S., Kato, K., Nishiguchi, Y., Oishi, Y., Nishimoto, N., et al. (1993). A Nortriterpenoid, Triterpenoids and Ecdysteroids from Pfaffia Glomerata. Phytochemistry 32, 1527–1530. doi:10.1016/0031-9422(93)85172-N

Taylor and Francis Group (2021). Dictionary of Natural Products. Available at: (Accessed January 30, 2021).

Treml, J., Gazdová, M., Šmejkal, K., Šudomová, M., Kubatka, P., and Hassan, S. T. S. (2020). Natural Products-Derived Chemicals: Breaking Barriers to Novel Anti-HSV Drug Development. Viruses 12, 154. doi:10.3390/v12020154

Wachsman, M., Ramirez, J., Talarico, L., Galagovsky, L., and Coto, C. (2004). Antiviral Activity of Natural and Synthetic Brassinosteroids. Curr. Med. Chem. - Anti-Infective Agents 3, 163–179. doi:10.2174/1568012043354026

World Health Organization (2020). Fact Sheets: Herpes Simplex Virus. Available at: (Accessed May 25, 2021).

Ying, Y.-M., Yu, H.-F., Tong, C.-P., Shan, W.-G., and Zhan, Z.-J. (2020). Spiroinonotsuoxotriols A and B, Two Highly Rearranged Triterpenoids from Inonotus Obliquus. Org. Lett. 22, 3377–3380. doi:10.1021/acs.orglett.0c00866

Yoneda, T., Nakamura, S., Ogawa, K., Matsumoto, T., Nakashima, S., Matsumura, K., et al. (2018). Oleanane-type Triterpenes with Highly-Substituted Oxygen Functional Groups from the Flower Buds of Camellia Sinensis and Their Inhibitory Effects against NO Production and HSV-1. Nat. Product. Commun. 13, 131. doi:10.1177/1934578X1801300206

Zhong, M.-G., Xiang, Y.-F., Qiu, X.-X., Liu, Z., Kitazato, K., and Wang, Y.-F. (2013). Natural Products as a Source of Anti-herpes Simplex Virus Agents. RSC Adv. 3, 313–328. doi:10.1039/C2RA21464D

Keywords: natural products, triterpene, saponin, herpes simplex virus type 1 (HSV-1), machine learning, QSAR

Citation: Ogawa K, Nakamura S, Oguri H, Ryu K, Yoneda T and Hosoki R (2021) Effective Search of Triterpenes with Anti-HSV-1 Activity Using a Classification Model by Logistic Regression. Front. Chem. 9:763794. doi: 10.3389/fchem.2021.763794

Received: 24 August 2021; Accepted: 11 October 2021;
Published: 02 November 2021.

Edited by:

Tara Louise Pukala, University of Adelaide, Australia

Reviewed by:

Fujun Jin, Jinan University, China
Sherif T. S. Hassan, Czech University of Life Sciences Prague, Czechia

Copyright © 2021 Ogawa, Nakamura, Oguri, Ryu, Yoneda and Hosoki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Keiko Ogawa; Seikou Nakamura