ORIGINAL RESEARCH article
Sec. Experimental Pharmacology and Drug Discovery
Volume 13 - 2022 | https://doi.org/10.3389/fphar.2022.1040838
Development of QSAR models to predict blood-brain barrier permeability
- 1US Food and Drug Administration, Center for Drug Evaluation and Research, Silver Spring, MD, United States
- 2Instem Inc, Columbus, OH, United States
- 3Multicase Inc, Beachwood, OH, United States
Assessing drug permeability across the blood-brain barrier (BBB) is important when evaluating the abuse potential of new pharmaceuticals as well as developing novel therapeutics that target central nervous system disorders. One of the gold-standard in vivo methods for determining BBB permeability is rodent log BB; however, like most in vivo methods, it is time-consuming and expensive. In the present study, two statistical-based quantitative structure-activity relationship (QSAR) models were developed to predict BBB permeability of drugs based on their chemical structure. The in vivo BBB permeability data were harvested for 921 compounds from publicly available literature, non-proprietary drug approval packages, and University of Washington’s Drug Interaction Database. The cross-validation performance statistics for the BBB models ranged from 82 to 85% in sensitivity and 80–83% in negative predictivity. Additionally, the performance of newly developed models was assessed using an external validation set comprised of 83 chemicals. Overall, performance of individual models ranged from 70 to 75% in sensitivity, 70–72% in negative predictivity, and 78–86% in coverage. The predictive performance was further improved to 93% in coverage by combining predictions across the two software programs. These new models can be rapidly deployed to predict blood brain barrier permeability of pharmaceutical candidates and reduce the use of experimental animals.
The BBB is a primary defense system that protects the brain from exposure to potentially toxic substances and ensures an optimal nutrient supply to the brain. An essential part of the BBB is the brain capillary endothelium, a tight membrane junction that separates the blood from the brain tissue and restricts the paracellular transport of compounds across the junction, thereby providing selective permeability to the compounds (Abbott et al., 2010). Due to the restrictive nature of the BBB, most compounds enter the brain through either passive diffusion or transporter-mediated uptake. Most hydrophobic compounds pass the BBB through simple diffusion driven by the concentration gradient between the brain and the blood. This process is governed by physicochemical parameters including molecular size, lipophilicity, polar surface area and charge (Begley and Brightman, 2003; Di et al., 2008; Geldenhuys et al., 2015; Copur and Oner, 2017).
In addition to uptake transporters, BBB also hosts efflux transporters that actively transport molecules out of the brain. The most common efflux transporters at the BBB are P-glycoprotein (P-gp, ABCB1 or Multi-drug resistance 1 (MDR1) protein) and breast cancer resistance protein (BCRP, ABCG2) which belong to the family of adenosine triphosphate (ATP) binding cassette (ABC) transporters. Both transporters are often referred to as “gatekeeper” transporters as they provide a vital check on limiting the drugs from accessing the brain (Mahringer and Fricker, 2016). The active uptake transporters are responsible for the uptake of a variety of substrates such as amino acids, fatty acids, essential minerals, vitamins, and glucose. Examples of active transporters include the large neutral amino acid transporter (LAT1) for DOPA and gabapentin. Other uptake transporters relevant for drugs include OATP2A1 and ENT2 (Zamek-Gliszczynski et al., 2018). Active transporters are often targeted to improve the delivery of drugs to the central nervous system (CNS) (Begley and Brightman, 2003; Di et al., 2008; Sanchez-Covarrubias et al., 2014; Geldenhuys et al., 2015; Copur and Oner, 2017).
Investigation of BBB permeability is essential when evaluating the abuse potential of new pharmaceuticals and designing CNS drugs, as only 2% of small molecules cross the BBB (Kola and Landis, 2004; Pardridge, 2005). However, experimental determination of BBB permeability in rodents is often tedious and expensive. As a result, several quantitative structure-activity relationship (QSAR) models have been developed over the years to predict BBB permeation using a variety of methodologies and datasets (Table 1) and to reduce the use of laboratory animals. QSAR models describe the correlation between chemical moieties and their biological activities under the general assumption that similar chemical structures exhibit similar biological activities. QSAR models are particularly useful as they provide rapid, early screening of drugs based upon their chemical structure. Most BBB QSAR models are based on log BB data, which is defined as the logarithmic ratio of the steady-state concentration of a drug in the brain to the blood or plasma. BBB permeability has also been modeled using permeability-surface area (log PS) data and free drug concentration ratio between brain and plasma (Kp,uu,brain) in vivo rodent data (Gratton et al., 1997; Liu et al., 2004; Friden et al., 2009; Loryan et al., 2015; Varadharajan et al., 2015). Although log PS and unbound brain-to-plasma concentration (Kp,uu,brain) are widely accepted as critical parameters in drug distribution, the publicly available data are limited and therefore the applicability of these models may also be limited (Abraham, 2004; Liu et al., 2004; Friden et al., 2009).
Several molecular descriptors have been used to predict BBB permeability, including lipophilicity, polar surface area, and hydrogen bonding ability (Young et al., 1988; Van de Waterbeemd and Kansy, 1992a; Abraham et al., 1994; Clark, 1999). More recently, 2D structure-based Dragon descriptors (Zhang et al., 2008), 3D structure-based VolSurf descriptors (Crivori et al., 2000), solvation free energies (Lombardo et al., 1996), and 3D conformations (Keserü and Molnár, 2001) have been used in building BBB models. Additionally, in the earlier studies, multiple linear regression (MLR) analysis was utilized to relate molecular descriptors to log BB. One shortcoming of MLR analysis is the limited number of descriptors that can be employed. Other methods that have been employed include partial least squares analysis, genetic algorithms (GA), random forest (RF), support vector machine (SVM) and artificial neural networks (ANN).
A common limitation among many of the previously constructed models is their small training set size, which limits their applicability in a regulatory environment. Although numerous models have been developed in the last decade using much larger training sets (n = 1,000+), these datasets often contain a combination of data types including in silico predicted data, experimental data from in vitro and in vivo studies, and clinical side effects data (Martins et al., 2012; Gao et al., 2017; Fan et al., 2018; Wang et al., 2018; Yuan et al., 2018; Miao et al., 2019; Alsenan et al., 2020, 2021; Liu et al., 2021). Other limitations of the data sets used in previous QSAR models include (i) the use of indirect measurements, (ii) use of unverified or wrongly interpreted data, and (iii) lack of chemical diversity. Finally, challenges affecting implementation of previously developed models such as updating training set data limit the applicability of those models (Fan et al., 2010).
In the present study, two statistical-based models for predicting BBB permeability have been constructed using Leadscope Enterprise (LS) and CASE Ultra (CU). The new training sets contain in vivo rodent data from drugs, drug metabolites and non-drugs, and have the largest number of chemicals compared to previously published models trained on in vivo data. Moreover, the quality of the underlying training data has been enhanced through careful review of original experiments to resolve or remove discrepant studies. In addition, predictive performance of the newly constructed models has been assessed using both internal and external validation experiments and showed good predictive accuracy. Finally, these new models can be rapidly used to design CNS drugs and to assess abuse potential of drug candidates.
2.1 Data sources
All training set data used to construct the BBB permeability models consisted of non-proprietary data harvested from published literature (e.g., PubMed, Web of Science v.5.34, Scopus, Elsevier, and Google Scholar), US FDA approval packages (e.g., Drugs@FDA and PharmaPendium®), EMA approval packages (e.g., PharmaPendium®), and patents. All references for the BBB database are provided in Supplementary Table S1.
2.2 Data scoring
The BBB permeability database contains brain/plasma (B/P) or brain/blood (B/B) ratios obtained from rodents that were treated via intravenous, intraperitoneal, or oral routes. For the majority of data entries, the amount of the chemical present in the brain and blood or plasma was measured in the animals 30 min to several hours after administration. However, in some cases, the animals were sacrificed at certain intervals after treatment and different B/P ratios were reported. In such cases, the ratio of the areas under the curve (AUC) for the brain and plasma concentrations was used. In experiments where different amounts of a chemical were reported in different parts of the brain, the average value was used. All findings were transformed into a binary scoring system for modeling purposes, where “0” denotes a negative finding (no brain penetration) and “1” denotes a positive finding (brain penetration). Chemicals with a log BB ≥ -1 were considered positive while chemicals with a log BB < -1 were considered negative (Vilar et al., 2010). The final BBB database comprises 921 compounds with 52% positives. The dataset and references are provided in Supplementary Table S1.
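The binary scoring rule can be illustrated with a minimal Python sketch. The function names and the example concentrations are hypothetical; only the log BB cutoff of -1 is taken from the text.

```python
import math

LOG_BB_CUTOFF = -1.0  # cutoff stated in the text (Vilar et al., 2010)


def log_bb(brain_conc: float, blood_conc: float) -> float:
    """Log10 of the brain-to-blood (or brain-to-plasma) concentration ratio."""
    if brain_conc <= 0 or blood_conc <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(brain_conc / blood_conc)


def bbb_score(log_bb_value: float) -> int:
    """Return 1 (brain penetration) if log BB >= -1, otherwise 0."""
    return 1 if log_bb_value >= LOG_BB_CUTOFF else 0


# Hypothetical example: brain level is half the blood level.
example = bbb_score(log_bb(5.0, 10.0))  # log BB is about -0.30, scored 1
```

When AUC values are reported instead of single time-point concentrations, the same ratio can be formed from the brain and plasma AUCs.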
2.3 Chemical structure curation
The chemical structures were obtained from SciFinder® and published literature. Electronic representations of chemical structures were created using structure data file (SDF) format. Inorganic chemicals, noble gases, mixtures, single atoms, metals, and high molecular weight compounds (MW ≥ 1800; polysaccharides, proteins, polymers, etc.) were excluded from the training set due to processing limitations within the QSAR software. Furthermore, the neutralized free form of any simple salt was included. A final manual inspection was performed to ensure the chemicals, their associated data and references were accurately recorded.
2.4 QSAR software
Two commercial QSAR software platforms, Leadscope Enterprise (LS) version 3.9 (Instem Inc., United States), and CASE Ultra (CU) version 220.127.116.11 (MultiCASE Inc., United States) were used to construct two distinct binary QSAR models. All software programs were acquired and used under Research Collaboration Agreements between FDA/CDER and the software providers mentioned above.
2.4.1 Leadscope Enterprise (LS)
LS is a data mining, visualization, and advanced informatics application that includes the capability to build and apply QSAR models. To construct QSAR models for BBB, a training set of 921 chemicals was imported into LS and fingerprinted using a set of 27,142 pre-defined medicinal chemistry structural features as candidate descriptors for model building. A small predictive subset of these features was used to construct the model. Additionally, a set of unique scaffolds was automatically constructed from the pre-defined structural features that specifically defined structure-activity relationships in the training set. The unique set of scaffolds was generated for the BBB permeability model using the following settings: 1) a minimum of three compounds per scaffold; 2) a minimum of six atoms per scaffold; 3) no restriction on the maximum number of rotatable bonds; and 4) a minimum absolute Z-score of 1.0. The Z-score of a structural feature is the difference between the mean activity of the subset of compounds having that feature and the mean activity of the full set (Roberts et al., 2000). Molecular properties such as molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, Lipinski score, AlogP (logarithm of 1-octanol/water partition coefficient), polar surface area, and atom count were calculated using Leadscope. The squared Pearson correlation coefficients (R2) for the molecular properties were computed using Python, and the molecular properties were added to the models to improve predictive performance.
Highly predictive features and the corresponding helper features were identified in the feature editor for retention, while weakly predictive features were removed using Z-score, frequency, precision and mean activity as discriminating parameters (Roberts et al., 2000). Subsequently, some features were divided to better define their chemical environment (acyclic vs cyclic) or expanded to define their functional groups more specifically. Additional pruning was manually performed to reduce the number of features while maintaining optimal predictive performance. Specifically, redundant features, highly overlapping or similar features, and coincidental features that were highly correlated were removed. Lastly, the total number of model features was reduced using a partial least-squares regression algorithm, leaving only those that best fit the experimental activity scores in the training set (Roberts et al., 2000).
For the BBB model, cross-validation was performed 10 times using a 10 × 10% leave-many-out (LMO) method. This method randomly selects 10% of the training set for testing, reconstructs a reduced model using the remaining 90% of the compounds, and recalculates the descriptor weights. This process was repeated 10 times with 10 diverse training sets, ensuring that all the compounds present in the training set were predicted ten times. The average predicted values were used in calculating the Cooper statistics (Cooper et al., 1979).
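One way to read the 10 × 10% LMO procedure is as ten repetitions of a ten-fold split, so that each compound receives ten held-out predictions that are then averaged. The sketch below follows that reading, which is an assumption rather than a description of the Leadscope implementation; `fit` and `predict` are hypothetical placeholders for any learner.

```python
import random
from statistics import mean


def repeated_lmo_cv(X, y, fit, predict, n_repeats=10, n_folds=10, seed=0):
    """Return the mean held-out prediction for every compound.

    fit(X_train, y_train) -> model and predict(model, X_test) -> list of
    scores are placeholders (assumptions) for whatever learner is used.
    """
    rng = random.Random(seed)
    n = len(X)
    collected = [[] for _ in range(n)]
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for each repetition
        for k in range(n_folds):
            test_idx = order[k::n_folds]  # roughly 10% of compounds held out
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            scores = predict(model, [X[i] for i in test_idx])
            for i, score in zip(test_idx, scores):
                collected[i].append(score)
    # Each compound is held out once per repetition, i.e. n_repeats times.
    return [mean(scores) for scores in collected]
```

The averaged values returned here play the role of the average predicted values that feed the Cooper statistics.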
A classification threshold was determined by varying the positive cutoff probability thresholds for equivocal results and analyzing the resulting Cooper statistics. The optimal probability range for indeterminate predictions for the BBB model was identified to be 0.4 to 0.6. Predictions above the 0.6 probability cutoff were classified as positive, while predictions below 0.4 were classified as negative. A chemical was treated as out-of-domain (OOD) in instances where the test chemical did not contain any structural model features or showed a lack of similarity to the training set compounds (at least 30% similarity to a single training set compound is required).
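The LS classification rule can be written as a short Python sketch. The function name and the `in_domain` flag are hypothetical; the 0.4 and 0.6 cutoffs come from the text, and treating values exactly at a cutoff as equivocal is an assumption.

```python
def ls_call(probability: float, in_domain: bool = True,
            lower: float = 0.4, upper: float = 0.6) -> str:
    """Map an LS predicted probability to a categorical call."""
    if not in_domain:
        # No model features found, or below 30% similarity to every
        # training set compound (domain check done elsewhere; assumption).
        return "out-of-domain"
    if probability > upper:
        return "positive"
    if probability < lower:
        return "negative"
    return "equivocal"
```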
2.4.2 CASE Ultra (CU)
CU is a QSAR software platform that builds models using various machine learning algorithms applied to training sets of chemical structures and their activity labels. The algorithm automatically generates molecular fragments from the training structures and uses them as descriptors. A CU model contains a set of structural alerts and deactivating features identified from the training data. The structural alerts are substructures primarily associated with active training compounds, and the deactivating features decrease the potency of the alerts. These features are incorporated in a global logistic regression QSAR model and therefore carry positive and negative quantitative weights. During application of the model, the alerts and deactivating features are searched in the test chemical, and the regression model is used to generate a score between 0 and 1 to indicate the likelihood of the test chemical being positive. The model also verifies whether all three-atom linear fragments generated from the test compound are present in the training structures to establish that the test chemical is within the applicability domain of the model. No hyper-parameter optimization is performed.
The BBB model was constructed in CU using a training set of 921 chemicals. The model was cross-validated internally 10 times using a previously described 10 by 10% LMO method. The classification threshold was selected based on the optimal balance between sensitivity and specificity on the receiver operating characteristic (ROC) curve. During model application, predictions were classified as equivocal when the predicted confidence was within ±0.1 of the classification threshold. Predicted values above the upper bound of this range were treated as positive, and those below this range were treated as negative. An out-of-domain (OOD) response was given to any chemical that contained one or more unknown fragments not recognized by the model and did not contain a combination of alerts/features strong enough to give a positive prediction.
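The threshold selection and the ±0.1 equivocal band can be sketched as follows. The text does not state which criterion defines the "optimal balance" on the ROC curve; maximizing Youden's J (sensitivity + specificity - 1) is used here purely as an assumption, and both function names are hypothetical.

```python
def best_threshold(scores, labels, candidates):
    """Pick the candidate cutoff that maximizes Youden's J (an assumption)."""
    best, best_j = None, float("-inf")
    for t in candidates:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, l in pairs if l == 1 and s >= t)
        fn = sum(1 for s, l in pairs if l == 1 and s < t)
        tn = sum(1 for s, l in pairs if l == 0 and s < t)
        fp = sum(1 for s, l in pairs if l == 0 and s >= t)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best, best_j = t, j
    return best


def cu_call(score: float, threshold: float, band: float = 0.1) -> str:
    """Classify a score, treating threshold +/- band as equivocal."""
    if score > threshold + band:
        return "positive"
    if score < threshold - band:
        return "negative"
    return "equivocal"
```

The out-of-domain check is omitted here because it depends on fragment matching inside the software.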
2.5 External validation
The predictive performance of the BBB models was assessed using an external validation set comprised of 83 chemicals (42 positives and 41 negatives) obtained from published literature. All references and activity scores are provided in Supplementary Table S2 for the external validation set.
2.6 Combining model outputs in external validation
To examine the combined predictive performance of LS and CU, a positive prediction from any one software platform was used to justify an overall positive prediction. Similarly, an equivocal prediction from any one software platform was used to justify an overall equivocal prediction, in the absence of a positive prediction. In the case that one of the models was OOD and the other model generated a prediction, the OOD was disregarded and the prediction was used to generate an overall call. An overall negative prediction was reported when a statistical model generated a negative prediction in the absence of positive or equivocal predictions from the other model.
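These combination rules amount to a precedence order over the two calls, which the following Python sketch makes explicit. The function name and string labels are hypothetical, and returning out-of-domain when both models are OOD is an assumption, since the text does not cover that case.

```python
# Calls ordered by precedence: a positive from either model wins, then an
# equivocal, then a negative. OOD is reported only if both models are OOD
# (that last case is an assumption).
PRECEDENCE = ("positive", "equivocal", "negative")
VALID_CALLS = PRECEDENCE + ("out-of-domain",)


def combined_call(ls: str, cu: str) -> str:
    """Combine the LS and CU calls into one overall call."""
    for call in (ls, cu):
        if call not in VALID_CALLS:
            raise ValueError(f"unknown call: {call!r}")
    for outcome in PRECEDENCE:
        if outcome in (ls, cu):
            return outcome
    return "out-of-domain"
```

For example, `combined_call("out-of-domain", "negative")` returns `"negative"`, because the OOD result is disregarded when the other model makes a prediction.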
2.7 Performance statistics
To evaluate the performance of individual model outputs, Cooper statistics were employed. Briefly, predictive performance was evaluated using a classic 2x2 contingency table containing counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Chemicals classified as OOD and equivocal were excluded from Cooper statistic calculations. Statistics such as sensitivity [TP/(TP + FN)], specificity [TN/(TN + FP)], positive predictivity [TP/(TP + FP)], negative predictivity [TN/(TN + FN)], and accuracy [(TP + TN)/(TP + TN + FP + FN)] were calculated as described by Cooper et al. (1979). Coverage was calculated as the percentage of all chemicals screened for which a prediction could be made (OOD results do not constitute a prediction).
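A minimal Python sketch of these statistics is given below. Negative predictivity is computed as TN/(TN + FN), the standard definition. The function names and the example counts are hypothetical and are not taken from Table 2.

```python
def cooper_statistics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Cooper statistics from a 2x2 contingency table, as fractions."""
    def ratio(numerator: int, denominator: int) -> float:
        # NaN signals an undefined statistic (empty denominator).
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_predictivity": ratio(tp, tp + fp),
        "negative_predictivity": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def coverage(n_predicted: int, n_screened: int) -> float:
    """Percentage of screened chemicals for which a prediction was made."""
    return 100.0 * n_predicted / n_screened


# Hypothetical counts, for illustration only.
stats = cooper_statistics(tp=40, tn=30, fp=10, fn=20)
```

OOD and equivocal chemicals would be removed before the counts are tallied, so they affect coverage but not the other statistics.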
3.1 Database overview
In the present study, the rodent BBB permeability dataset was compiled from publicly available data sources and the original study data were used. The results from rats and mice were treated as equivalent since previous studies show no significant difference in brain permeability between rats and mice (Murakami et al., 2000; Abraham et al., 2006). The final BBB permeability database contains 921 unique chemicals, of which 621 compounds are from in vivo studies in rats and 300 in mice. The database is well-balanced with a total of 478 compounds scored as positive and 443 as negative (activity scores provided in Supplementary Table S1). Furthermore, the database comprises 263 drug substances approved between 1939 and 2022, 21 drug derivatives, 13 drug metabolites, 61 investigational drugs undergoing clinical trials, 14 prodrugs, and 549 other non-drug molecules. This database covers a broad range of chemical space, functional groups, and Anatomical Therapeutic Chemical (ATC) classes as presented in Figure 1. Most functional groups and ATC classes have an equal distribution between positives and negatives in the database. Chemicals that contain carboxylic acid, sulfone, sulfonyl and sulfonamide functional groups were mostly negative. As expected, the majority of central nervous system drugs in the database cross the BBB. However, two triptan analgesics (rizatriptan and almotriptan) were identified among negative drugs (Figure 1C). A possible reason is that triptans are usually substrates of human, but not rat, BBB uptake transporters (Zhang et al., 2016). Another interesting finding is that the majority of the cardiovascular drugs in the database cross the BBB. A review of the literature suggested that the lipophilicity of many cardiovascular drugs, specifically beta blocking agents, may be the reason for their BBB permeability (McAinsh and Cruickshank, 1990; Goldner, 2012; Shah et al., 2020).
When compared to several of the previously described models, the current training set showed improved coverage of almost all chemical functional groups (Supplementary Table S5).
FIGURE 1. Database analysis. (A) Assessment of the functional groups present in the entire BBB database. (B) Anatomical Therapeutic Chemical (ATC) level 1 classes present in the BBB database. (C) ATC level 2 classes of the nervous and cardiovascular systems.
FIGURE 2. Leadscope model analysis. (A) Selected chemical features with highest and lowest Z-scores. The arrow shows the order of Z-scores associated with features in the model. (B) Correlation (R2) between pairs of physicochemical features. (C) Histogram of the predictivity (blue bars) and frequency (grey bars) as a function of probability in LS model.
3.2 QSAR model development
Previous modeling efforts employed calculated physicochemical descriptors such as polar surface area (PSA), number of hydrogen donors/acceptors, and molecular weight to predict BBB permeability (Young et al., 1988; van de Waterbeemd and Kansy, 1992b; Abraham et al., 1994; Lombardo et al., 1996; Norinder et al., 1998; Clark, 1999; Luco, 1999; Feher et al., 2000; Keserü and Molnár, 2001; Platts et al., 2001; Abraham, 2004; Abraham et al., 2006). While these properties influence BBB permeability of molecules and can be applied to simple cases, they are limited in their ability to comprehensively predict BBB permeability of drugs that pass through more complex mechanisms. In the present study, machine-learning algorithms were used to examine all structural features present in the training set and global molecular properties that are useful to predict and interpret BBB permeability. The two modeling platforms that were used to construct BBB models are LS and CU.
The LS QSAR model was optimized by manual refinement of chemical structural features and physicochemical descriptors. Highly predictive features were identified for retention while 14 redundant and less discriminating chemical features were removed. The total number of chemical features present in the final BBB model is 386. Examples of chemical features with highest and lowest Z-scores, corresponding to highest and lowest BBB permeability are presented in Figure 2A. Chemical features with highest Z-scores are comprised of aliphatic and aromatic rings while carboxylic acids and carbonyls have the lowest Z-scores. Moreover, polycyclic secondary and tertiary amines are also positive features.
Additionally, six physicochemical descriptors including molecular weight, number of rotatable bonds, number of hydrogen bond donors, Lipinski score, AlogP, and PSA were assessed for their predictive ability. The overall results showed a very poor correlation between the individual physicochemical descriptors and log BB alone. Specifically, the squared Pearson correlation coefficient (R2) values between log BB and molecular weight, PSA, and number of hydrogen bond donors are 0.02, 0.08 and 0.02, respectively (see Figure 2B and Supplementary Figure S1). However, it should be noted that a 3% increase in statistical performance was observed upon inclusion of the six molecular descriptors. The predictivity of the model and frequency of the compounds as a function of probability are presented in Figure 2C. The U-shaped plots indicate the optimum regression, with the maximum predictivity and frequency located at the two ends of the probability axis. The lowest predictivity and frequency were found at a probability of approximately 0.5, and this region was selected as the equivocal range.
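The squared Pearson correlation between a single descriptor and log BB can be computed with a few lines of Python. This is a generic sketch, not the script used in the study; the function name is hypothetical.

```python
def pearson_r2(x, y) -> float:
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined for a constant sequence")
    # r = sxy / sqrt(sxx * syy), so r squared needs no square root.
    return (sxy * sxy) / (sxx * syy)
```

Values near zero, such as those reported above, mean that a linear fit on that descriptor alone explains almost none of the variance in log BB.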
The CU models were optimized using ROC plots that were generated by varying the classification threshold which defines a positive prediction (Figure 3A). The optimal classification threshold was identified to be 0.55 (Figure 3A; orange dot). The number of chemical fragments present in the CU model is 171. Selected examples of chemical fragments with the highest number of positive and negative compounds are presented in Figure 3B. Chemical fragments with the highest number of chemicals that permeate the BBB contain aromatic moieties and amines while chemical fragments with the highest number of negative chemicals contain carboxylic acids and cyclic ethers.
FIGURE 3. Case Ultra model analysis. (A) ROC plot of the BBB model. The orange dot corresponds to the optimal classification threshold (B) Selected examples of chemical fragments with highest number of positive and negative compounds.
3.3 Performance statistics of BBB permeability model using cross-validation and external validation
The predictive performance statistics for the BBB models based on 10% LMO cross-validation experiments as well as the external validation experiments are presented in Table 2. However, it should be noted that the coverage in the cross-validation statistics from LS and CU cannot be compared directly as they are calculated differently. The CU coverage is calculated using domain analysis while LS provides a prediction for all the chemicals in the cross-validation experiment. The LS model achieved a sensitivity of 82% and a negative predictivity of 80% in cross-validation, while the CU model achieved a sensitivity of 85% and a negative predictivity of 83%. Furthermore, when using an external validation set of 83 chemicals (51% positive; 28 drugs and 55 non-drug molecules), the LS model achieved a sensitivity of 70% and a negative predictivity of 72%, while the CU model achieved a sensitivity of 75% and a negative predictivity of 70%. The partitioned predictive performance for drugs and nondrugs is provided in Supplementary Table S3. Additionally, a prediction comparison analysis for LS and CU by functional groups and drug classes is provided in Supplementary Figures S2–4.
TABLE 2. Validation statistics for BBB permeability QSAR models. Columns 2 and 3 show cross-validation performance statistics and columns 4–6 show external validation performance statistics for single and combined models.
In a subsequent evaluation, the combined predictive performance of the LS and CU models was assessed (Table 2). Here, the models achieved a sensitivity of 80% and negative predictivity of 70%. However, a decrease in specificity (51%) and positive predictivity (64%) was observed when predictions across the two software programs were combined. These results were anticipated given that combining predictions across different software platforms results in an increase of false positive predictions. A total of 11 chemicals were outside the applicability domain for LS while 18 were outside applicability domain for CU. However, when predictions from the LS and CU were combined, 93% of all chemicals were within the applicability domain.
4.1 Database development
Obtaining meaningful alerts and a robust QSAR model depends heavily on the quality of the training set data. In the present study, efforts were made to identify and extract high quality data for the BBB permeability model. One of the most commonly reported and trusted measures for BBB permeability is log BB; this parameter is generated by most pharmaceutical companies for drug candidates. Among the challenges of combining log BB data from multiple sources is the potential for introducing conflicting data into models, thereby affecting the quality of the data. To enhance the overall quality of the underlying data, chemicals with contradictory and/or equivocal study results were reviewed and resolved or removed from the databases.
Recently, several studies have suggested that the steady state unbound brain-to-plasma ratio, Kp,uu,brain is a relevant parameter to measure drug concentration as the key driving force for drug distribution is the free concentration in the brain. However, publicly available data for Kp,uu,brain are very limited and therefore a viable model to predict Kp,uu,brain could not be developed at this time.
4.2 Role of descriptors in BBB permeability
Previously published models employed calculated physicochemical descriptors such as lipophilicity, PSA and/or hydrogen bonding (Young et al., 1988; Van de Waterbeemd and Kansy, 1992a; Abraham et al., 1994; Clark, 1999). There is a general agreement that these specific descriptors can influence log BB (Clark, 2003). For instance, lipophilicity is positively correlated with log BB (Young et al., 1988; Calder and Ganellin, 1994; Kaliszan and Markuszewski, 1996; Salminen et al., 1997; Goodwin and Clark, 2005) while hydrogen bonding is negatively correlated with brain penetration (Van de Waterbeemd and Kansy, 1992a; Calder and Ganellin, 1994; Clark, 1999). In addition, several reports indicate log BB is negatively correlated with molecular weight (Calder and Ganellin, 1994; Kaliszan and Markuszewski, 1996; Salminen et al., 1997; Kaznessis et al., 2001; Platts et al., 2001). In this investigation, the use of physicochemical descriptors was found to improve the overall performance of the LS models (by 3%) when combined with chemical features, although physicochemical descriptors were poorly correlated with the experimental log BB parameter alone. This can be attributed to the larger log BB data set that covers a more diverse chemical space (Brito-Sanchez et al., 2015). It is anticipated that as more data become available, finding a single equation that describes log BB as a function of physicochemical descriptors will become more difficult. Therefore, models that use a combination of chemical and physicochemical features may be advantageous.
A review of alerts and deactivating chemical features in LS and CU models revealed that the top features with highest activity scores belong to polycyclic aromatic compounds. The training set structures representing these alerts are relatively nonpolar (lipophilic), which is favorable for crossing the BBB. In addition, unlike primary amines, polycyclic secondary and tertiary amines are among the top positive alerts. Besides the reduced polarity of those amines, their ability to form hydrogen bonds is also reduced compared to primary amines, which may explain their higher BBB permeability (Silverman et al., 2009). In contrast, features that contained carboxylic acids and alcohols had the lowest activity scores, presumably due to their ability to form hydrogen bonds (Abraham et al., 1994). At physiological pH of 7.4, carboxylic acids are dissociated to carboxylate ions, which improves their water solubility and the ability to form hydrogen bonds (Bredael et al., 2022). The low lipophilicity of carboxylic acids at physiological pH also limits their BBB penetration (Soloway et al., 1960). Additionally, ethers were found among negative features due to their ability to accept hydrogen bonds. However, one should be aware of exceptions to the hydrogen bond rule. As discussed earlier, there is a low correlation between log BB and number of hydrogen bond acceptors/donors. A detailed assessment of the functional groups that are present in the BBB database showed that the current training set contains 135 compounds that have carboxylic acid groups, with 30 being BBB permeable (Figure 1A). Compounds with carboxylic acids that cross the BBB are typically substrates of uptake transporters. An example of this is l-DOPA, a precursor for dopamine, which has a carboxylic acid and a diol group and is capable of crossing the BBB (Di et al., 2013).
4.3 External validation of BBB QSAR models
In this investigation, an external validation set was used to examine the BBB models individually and by combining predictions across LS and CU. In a regulatory setting, high sensitivity and negative predictivity are preferred to reduce the risk of false negatives and minimize the risk to public health. Toward this end, the current BBB models were tuned to achieve high sensitivity and negative predictivity while maintaining good overall predictive performance in other statistical parameters. Specifically, the new models showed a sensitivity ranging from 70 to 75% and a negative predictivity ranging from 70 to 72% in external validation. Furthermore, when predictions from the two methodologies were combined, a sensitivity of 80% and a coverage of 93% were achieved. While the increase in the false positive rate when predictions are combined is not ideal, it can be mitigated by evaluating the alerts behind a positive prediction and examining structurally similar analogs. Perhaps the most striking finding was that chemicals that were out of domain (OOD) in CU were successfully predicted by LS, suggesting that the two software platforms interpret chemicals differently and therefore differ in which chemicals they treat as OOD. Moreover, the overall increase in coverage is desirable for predicting a large diversity of chemicals.
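The statistics discussed above can be illustrated with a short sketch. It assumes the conventional definitions (sensitivity = TP/(TP + FN), negative predictivity = TN/(TN + FN), coverage = fraction of chemicals with an in-domain prediction) and an assumed combination rule in which a chemical is called positive if either model is positive. The rule, the function names, and the six toy calls are illustrative assumptions and are not taken from LS, CU, or the validation set.

```python
from typing import Optional, Sequence

# True = BBB permeable, False = non-permeable, None = out of domain (no call).
Call = Optional[bool]


def combine(ls: Call, cu: Call) -> Call:
    """Assumed rule: positive if either model is positive, negative if
    the only in-domain calls are negative, None if both are out of domain."""
    if ls is True or cu is True:
        return True
    if ls is False or cu is False:
        return False
    return None


def performance(observed: Sequence[bool], predicted: Sequence[Call]) -> dict:
    """Sensitivity, negative predictivity and coverage for one set of calls."""
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed, predicted):
        if pred is None:  # out-of-domain chemicals only reduce coverage
            continue
        if obs and pred:
            tp += 1
        elif obs and not pred:
            fn += 1
        elif not obs and not pred:
            tn += 1
        else:
            fp += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "negative_predictivity": tn / (tn + fn) if tn + fn else nan,
        "coverage": (tp + fn + tn + fp) / len(observed) if observed else nan,
    }


# Toy calls, made up purely to exercise the functions.
observed = [True, True, True, False, False, False]
ls_calls = [True, None, False, False, True, None]
cu_calls = [None, True, False, None, False, None]
combined = [combine(a, b) for a, b in zip(ls_calls, cu_calls)]

for name, calls in [("LS", ls_calls), ("CU", cu_calls), ("combined", combined)]:
    print(name, performance(observed, calls))
```

In this toy case the combined calls cover more chemicals than either model alone because each model supplies calls for chemicals that are out of domain in the other, which mirrors the coverage gain described above. The same rule also lets a single false positive carry through to the combined call, which is why the false positive rate can rise.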
BBB penetration of drugs is a complicated process involving passive diffusion and active transport (efflux or uptake). The current data set includes known substrates of active transporters, but the model is agnostic to such processes. Furthermore, the log BB data entries were collected from different experiments in which drugs were administered through different routes and brain samples were collected at various time points post administration. Despite all these complications, the model can estimate BBB permeation with relatively high precision. In the future, using a combination of models for different transport mechanisms may further improve log BB predictivity.
In the present study, a complementary computational model has been developed using two software platforms, LS and CU, to predict whether an unknown substance can penetrate the blood-brain barrier (BBB). The model has a large training set and includes up-to-date information for drugs, their metabolites, and non-drugs to provide an optimal domain of applicability. Advantages of the current data set over previous ones are (i) exclusive use of data from in vivo rodent experiments and (ii) use of a more balanced dataset, which allows for more accurate modeling. The current models demonstrate improved coverage of chemical functional groups over several of the previously described models and show good sensitivity and negative predictivity, which are critical parameters for safety assessment. Furthermore, the use of two software platforms was found to increase coverage to 93%. When predictions are in consensus, greater confidence can be inferred. However, when predictions are inconclusive or conflicting between the two software platforms, an expert review can provide supporting information.
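The consensus logic described above can be sketched as a simple triage step. This is a hypothetical illustration only: the function name, the category labels, and the handling of the case in which only one platform makes a call are assumptions and do not describe a documented workflow of either software platform.

```python
from typing import Optional


def triage(ls: Optional[bool], cu: Optional[bool]) -> str:
    """Sort a pair of calls (True = permeable, False = non-permeable,
    None = inconclusive or out of domain) into a follow-up category."""
    if ls is None and cu is None:
        return "expert review (no prediction)"
    if ls is None or cu is None:
        # Assumption: a lone call is kept but flagged as single-model.
        call = ls if ls is not None else cu
        return f"single-model {'positive' if call else 'negative'}"
    if ls == cu:
        return f"consensus {'positive' if ls else 'negative'}"
    return "expert review (conflicting)"


for pair in [(True, True), (False, False), (True, False), (None, True), (None, None)]:
    print(pair, "->", triage(*pair))
```

Separating consensus calls from conflicting or missing ones makes it explicit which predictions can be used with greater confidence and which should be routed to an expert.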
In conclusion, the newly constructed models can be rapidly deployed during drug development to predict the BBB permeability of drugs and their metabolites and to reduce the need for testing in laboratory animals. Identification of drug candidates that cross the BBB can inform strategies for derisking abuse liability and assist with the design of CNS drugs.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Author contributions
LS, MK, and DV conceptualized the project. SF harvested data and performed calculations. SF and SC built and validated models. SF, KC, SC, and LS analyzed data. SF, LS, MK, DV, KC, and SC wrote and edited the manuscript.
Funding
This project was supported by intramural funding from the FDA/CDER Safety Research Interest Group and in part by an appointment to the Research Participation Programs at the Oak Ridge Institute for Science and Education through an interagency agreement between the Department of Energy and FDA.
Acknowledgments
CASE Ultra and Leadscope Enterprise software platforms were used by FDA/CDER under Research Collaboration Agreements with MultiCASE Inc., and Leadscope Inc., respectively. Dr. Rajamani Selvam carried out the initial data curation.
Conflict of interest
Author KC was employed by Leadscope Inc.; author SC was employed by MultiCASE Inc. LS reports that she is the co-Principal Investigator on two Research Collaboration Agreements (RCAs) between the US Food and Drug Administration’s Center for Drug Evaluation and Research and Leadscope Inc. and MultiCASE Inc., respectively.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2022.1040838/full#supplementary-material
References
Abraham, M. H., Chadha, H. S., and Mitchell, R. C. (1994). Hydrogen bonding. 33. Factors that influence the distribution of solutes between blood and brain. J. Pharm. Sci. 83, 1257–1268. doi:10.1002/jps.2600830915
Abraham, M. H., Ibrahim, A., Zhao, Y., and Acree, W. E. (2006). A data base for partition of volatile organic compounds and drugs from blood/plasma/serum to brain, and an LFER analysis of the data. J. Pharm. Sci. 95, 2091–2100. doi:10.1002/jps.20595
Begley, D. J., and Brightman, M. W. (2003). “Structural and functional aspects of the blood-brain barrier,” in Peptide transport and delivery into the central nervous system. Editors L. Prokai, and K. Prokai-Tatrai (Basel: Birkhäuser Basel), 39–78.
Bredael, K., Geurs, S., Clarisse, D., De Bosscher, K., and D’hooghe, M. (2022). Carboxylic acid bioisosteres in medicinal chemistry: Synthesis and properties. J. Chem. 2022, 1–21. doi:10.1155/2022/2164558
Brito-Sanchez, Y., Marrero-Ponce, Y., Barigye, S. J., Yaber-Goenaga, I., Morell Perez, C., Le-Thi-Thu, H., et al. (2015). Towards better BBB passage prediction using an extensive and curated data set. Mol. Inf. 34, 308–330. doi:10.1002/minf.201400118
Bujak, R., Struck-Lewicka, W., Kaliszan, M., Kaliszan, R., and Markuszewski, M. J. (2015). Blood–brain barrier permeability mechanisms in view of quantitative structure–activity relationships (QSAR). J. Pharm. Biomed. Anal. 108, 29–37. doi:10.1016/j.jpba.2015.01.046
Castillo-Garit, J. A., Casanola-Martin, G. M., Le-Thi-Thu, H., and Barigye, S. J. (2017). A simple method to predict blood-brain barrier permeability of drug-like compounds using classification trees. Med. Chem. 13, 664–669. doi:10.2174/1573406413666170209124302
Clark, D. E. (1999). Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J. Pharm. Sci. 88, 815–821. doi:10.1021/js980402t
Crivori, P., Cruciani, G., Carrupt, P.-A., and Testa, B. (2000). Predicting Blood−Brain barrier permeation from three-dimensional molecular structure. J. Med. Chem. 43, 2204–2216. doi:10.1021/jm990968+
Deconinck, E., Zhang, M. H., Petitet, F., Dubus, E., Ijjaali, I., Coomans, D., et al. (2008). Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: A case study. Anal. Chim. Acta 609, 13–23. doi:10.1016/j.aca.2007.12.033
Dixon, S. L., Duan, J., Smith, E., Von Bargen, C. D., Sherman, W., and Repasky, M. P. (2016). AutoQSAR: An automated machine learning tool for best-practice quantitative structure–activity relationship modeling. Future Med. Chem. 8, 1825–1839. doi:10.4155/fmc-2016-0093
Doniger, S., Hofmann, T., and Yeh, J. (2002). Predicting CNS permeability of drug molecules: Comparison of neural network and support vector machine algorithms. J. Comput. Biol. 9, 849–864. doi:10.1089/10665270260518317
Fan, J., Yang, J., and Jiang, Z. (2018). Prediction of central nervous system side effects through drug permeability to blood–brain barrier and recommendation algorithm. J. Comput. Biol. 25, 435–443. doi:10.1089/cmb.2017.0149
Fan, Y., Unwalla, R., Denny, R. A., Di, L., Kerns, E. H., Diller, D. J., et al. (2010). Insights for predicting blood-brain barrier penetration of CNS targeted molecules using QSPR approaches. J. Chem. Inf. Model. 50, 1123–1133. doi:10.1021/ci900384c
Friden, M., Winiwarter, S., Jerndal, G., Bengtsson, O., Wan, H., Bredberg, U., et al. (2009). Structure-brain exposure relationships in rat and human using a novel data set of unbound drug concentrations in brain interstitial and cerebrospinal fluids. J. Med. Chem. 52, 6233–6243. doi:10.1021/jm901036q
Fu, X. C., Wang, G. P., Shan, H. L., Liang, W. Q., and Gao, J. Q. (2008). Predicting blood-brain barrier penetration from molecular weight and number of polar atoms. Eur. J. Pharm. Biopharm. 70, 462–466. doi:10.1016/j.ejpb.2008.05.005
Gao, Z., Chen, Y., Cai, X., and Xu, R. (2017). Predict drug permeability to blood–brain-barrier from clinical phenotypes: Drug side effects and drug indications. Bioinformatics 33, 901–908. doi:10.1093/bioinformatics/btw713
Gratton, J. A., Abraham, M. H., Bradbury, M. W., and Chadha, H. S. (1997). Molecular factors influencing drug transfer across the blood-brain barrier. J. Pharm. Pharmacol. 49, 1211–1216. doi:10.1111/j.2042-7158.1997.tb06072.x
Hemmateenejad, B., Miri, R., Safarpour, M. A., and Mehdipour, A. R. (2006). Accurate prediction of the blood–brain partitioning of a large set of solutes using ab initio calculations and genetic neural network modeling. J. Comput. Chem. 27, 1125–1135. doi:10.1002/jcc.20437
Hou, T., and Xu, X. (2002). ADME evaluation in drug discovery. 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs. J. Mol. Model. 8, 337–349. doi:10.1007/s00894-002-0101-1
Jiang, L., Chen, J., He, Y., Zhang, Y., and Li, G. (2016). A method to predict different mechanisms for blood–brain barrier permeability of CNS activity compounds in Chinese herbs using support vector machine. J. Bioinform. Comput. Biol. 14, 1650005. doi:10.1142/S0219720016500050
Kaznessis, Y. N., Snow, M. E., and Blankley, C. J. (2001). Prediction of blood-brain partitioning using Monte Carlo simulations of molecules in water. J. Comput. Aided. Mol. Des. 15, 697–708. doi:10.1023/a:1012240703377
Kim, T., You, B. H., Han, S., Shin, H. C., Chung, K.-C., and Park, H. (2021). Quantum artificial neural network approach to derive a highly predictive 3D-QSAR model for blood–brain barrier passage. Int. J. Mol. Sci. 22, 10995. doi:10.3390/ijms222010995
Kortagere, S., Chekmarev, D., Welsh, W. J., and Ekins, S. (2008). New predictive models for blood-brain barrier permeability of drug-like molecules. Pharm. Res. 25, 1836–1845. doi:10.1007/s11095-008-9584-5
Kunwittaya, S., Nantasenamat, C., Treeratanapiboon, L., Srisarin, A., Isarankura-Na-Ayudhya, C., and Prachayasittikul, V. (2013). Influence of logBB cut-off on the prediction of blood-brain barrier permeability. Biomed. Appl. Technol. J. 1, 16–34.
Liu, L., Zhang, L., Feng, H., Li, S., Liu, M., Zhao, J., et al. (2021). Prediction of the blood–brain barrier (BBB) permeability of chemicals based on machine-learning and ensemble methods. Chem. Res. Toxicol. 34, 1456–1467. doi:10.1021/acs.chemrestox.0c00343
Liu, X., Tu, M., Kelly, R. S., Chen, C., and Smith, B. J. (2004). Development of a computational approach to predict blood-brain barrier permeability. Drug Metab. Dispos. 32, 132–139. doi:10.1124/dmd.32.1.132
Loryan, I., Sinha, V., Mackie, C., Van Peer, A., Drinkenburg, W. H., Vermeulen, A., et al. (2015). Molecular properties determining unbound intracellular and extracellular brain exposure of CNS drug candidates. Mol. Pharm. 12, 520–532. doi:10.1021/mp5005965
Luco, J. M. (1999). Prediction of the brain− blood distribution of a large set of drugs from structurally derived descriptors using partial least-squares (PLS) modeling. J. Chem. Inf. Comput. Sci. 39, 396–404. doi:10.1021/ci980411n
Martins, I. F., Teixeira, A. L., Pinheiro, L., and Falcao, A. O. (2012). A Bayesian approach to in silico blood-brain barrier penetration modeling. J. Chem. Inf. Model. 52, 1686–1697. doi:10.1021/ci300124c
Muehlbacher, M., Spitzer, G. M., Liedl, K. R., and Kornhuber, J. (2011). Qualitative prediction of blood-brain barrier permeability on a large and refined dataset. J. Comput. Aided. Mol. Des. 25, 1095–1106. doi:10.1007/s10822-011-9478-1
Murakami, H., Takanaga, H., Matsuo, H., Ohtani, H., and Sawada, Y. (2000). Comparison of blood-brain barrier permeability in mice and rats using in situ brain perfusion technique. Am. J. Physiol. Heart Circ. Physiol. 279, H1022–H1028. doi:10.1152/ajpheart.2000.279.3.H1022
Narayanan, R., and Gunturi, S. B. (2005). In silico ADME modelling: Prediction models for blood-brain barrier permeation using a systematic variable selection method. Bioorg. Med. Chem. 13, 3017–3028. doi:10.1016/j.bmc.2005.01.061
Norinder, U., Sjöberg, P., and Österberg, T. (1998). Theoretical calculation and prediction of brain–blood partitioning of organic solutes using MolSurf parametrization and PLS statistics. J. Pharm. Sci. 87, 952–959. doi:10.1021/js970439y
Obrezanova, O., Csányi, G., Gola, J. M. R., and Segall, M. D. (2007). Gaussian processes: A method for automatic QSAR modeling of ADME properties. J. Chem. Inf. Model. 47, 1847–1857. doi:10.1021/ci7000633
Ooms, F., Weber, P., Carrupt, P.-A., and Testa, B. (2002). A simple model to predict blood–brain barrier permeation from 3D molecular fields. Biochim. Biophys. Acta 1587, 118–125. doi:10.1016/s0925-4439(02)00074-1
Platts, J. A., Abraham, M. H., Zhao, Y. H., Hersey, A., Ijaz, L., and Butina, D. (2001). Correlation and prediction of a large blood-brain distribution data set--an LFER study. Eur. J. Med. Chem. 36, 719–730. doi:10.1016/s0223-5234(01)01269-7
Plisson, F., and Piggott, A. M. (2019). Predicting blood–brain barrier permeability of marine-derived kinase inhibitors using ensemble classifiers reveals potential hits for neurodegenerative disorders. Mar. Drugs 17, 81. doi:10.3390/md17020081
Radchenko, E. V., Dyabina, A. S., and Palyulin, V. A. (2020). Towards deep neural network models for the prediction of the blood–brain barrier permeability for diverse organic compounds. Molecules 25, 5901. doi:10.3390/molecules25245901
Roberts, G., Myatt, G. J., Johnson, W. P., Cross, K. P., and Blower, P. E. (2000). LeadScope: Software for exploring large sets of screening data. J. Chem. Inf. Comput. Sci. 40, 1302–1314. doi:10.1021/ci0000631
Salminen, T., Pulli, A., and Taskinen, J. (1997). Relationship between immobilised artificial membrane chromatographic retention and the brain penetration of structurally diverse drugs. J. Pharm. Biomed. Anal. 15, 469–477. doi:10.1016/s0731-7085(96)01883-3
Sanchez-Covarrubias, L., Slosky, L. M., Thompson, B. J., Davis, T. P., and Ronaldson, P. T. (2014). Transporters at CNS barrier sites: Obstacles or opportunities for drug delivery? Curr. Pharm. Des. 20, 1422–1449. doi:10.2174/13816128113199990463
Shen, J., Du, Y., Zhao, Y., Liu, G., and Tang, Y. (2008). In silico prediction of blood–brain partitioning using a chemometric method called genetic algorithm based variable selection. QSAR Comb. Sci. 27, 704–717. doi:10.1002/qsar.200710129
Shin, H. K., Lee, S., Oh, H.-N., Yoo, D., Park, S., Kim, W.-K., et al. (2021). Development of blood brain barrier permeation prediction models for organic and inorganic biocidal active substances. Chemosphere 277, 130330. doi:10.1016/j.chemosphere.2021.130330
Silverman, R. B., Lawton, G. R., Ranaivo, H. R., Chico, L. K., Seo, J., and Watterson, D. M. (2009). Effect of potential amine prodrugs of selective neuronal nitric oxide synthase inhibitors on blood–brain barrier penetration. Bioorg. Med. Chem. 17, 7593–7605. doi:10.1016/j.bmc.2009.08.065
Subramanian, G., and Kitchen, D. B. (2003). Computational models to predict blood–brain barrier permeation and CNS activity. J. Comput. Aided. Mol. Des. 17, 643–664. doi:10.1023/b:jcam.0000017372.32162.37
Varadharajan, S., Winiwarter, S., Carlsson, L., Engkvist, O., Anantha, A., Kogej, T., et al. (2015). Exploring in silico prediction of the unbound brain-to-plasma drug concentration ratio: Model validation, renewal, and interpretation. J. Pharm. Sci. 104, 1197–1206. doi:10.1002/jps.24301
Vilar, S., Chakrabarti, M., and Costanzi, S. (2010). Prediction of passive blood–brain partitioning: Straightforward and effective classification models based on in silico derived physicochemical descriptors. J. Mol. Graph. Model. 28, 899–903. doi:10.1016/j.jmgm.2010.03.010
Wang, W., Kim, M. T., Sedykh, A., and Zhu, H. (2015). Developing enhanced blood-brain barrier permeability models: Integrating external bio-assay data in QSAR modeling. Pharm. Res. 32, 3055–3065. doi:10.1007/s11095-015-1687-1
Wang, Z., Yang, H., Wu, Z., Wang, T., Li, W., Tang, Y., et al. (2018). In silico prediction of blood–brain barrier permeability of compounds by machine learning and resampling methods. ChemMedChem 13, 2189–2201. doi:10.1002/cmdc.201800533
Wu, Z., Xian, Z., Ma, W., Liu, Q., Huang, X., Xiong, B., et al. (2021). Artificial neural network approach for predicting blood brain barrier permeability based on a group contribution method. Comput. Methods Programs Biomed. 200, 105943. doi:10.1016/j.cmpb.2021.105943
Young, R. C., Mitchell, R. C., Brown, T. H., Ganellin, C. R., Griffiths, R., Jones, M., et al. (1988). Development of a new physicochemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. J. Med. Chem. 31, 656–671. doi:10.1021/jm00398a028
Yuan, Y., Zheng, F., and Zhan, C.-G. (2018). Improved prediction of blood–brain barrier permeability through machine learning with combined use of molecular property-based descriptors and fingerprints. AAPS J. 20, 54–10. doi:10.1208/s12248-018-0215-8
Zamek-Gliszczynski, M. J., Chu, X., Cook, J. A., Custodio, J. M., Galetin, A., Giacomini, K. M., et al. (2018). ITC commentary on metformin clinical drug-drug interaction study design that enables an efficacy-and safety-based dose adjustment decision. Clin. Pharmacol. Ther. 104, 781–784. doi:10.1002/cpt.1082
Zhang, D., Xiao, J., Zhou, N., Zheng, M., Luo, X., Jiang, H., et al. (2015). A genetic algorithm based support vector machine model for blood-brain barrier penetration prediction. Biomed. Res. Int. 2015, 292683. doi:10.1155/2015/292683
Zhang, L., Zhu, H., Oprea, T. I., Golbraikh, A., and Tropsha, A. (2008). QSAR modeling of the blood-brain barrier permeability for diverse organic compounds. Pharm. Res. 25, 1902–1914. doi:10.1007/s11095-008-9609-0
Zhang, Y.-Y., Liu, H., Summerfield, S. G., Luscombe, C. N., and Sahi, J. (2016). Integrating in silico and in vitro approaches to predict drug accessibility to the central nervous system. Mol. Pharm. 13, 1540–1550. doi:10.1021/acs.molpharmaceut.6b00031
Abbreviations
ABC ATP binding cassette transporters
ANN Artificial neural networks
ATP Adenosine triphosphate
B/P Brain concentration/plasma concentration
BB Brain concentration/blood concentration
BBB Blood-brain barrier
BCRP Breast cancer resistance protein
BRT Boosted regression trees
CDER Center for drug evaluation and research
CU CASE Ultra
DT Decision tree
FN False negatives
FP False positives
GA Genetic algorithm
GA-CG-SVM Genetic algorithm-conjugate gradient-SVM
GAVS Genetic algorithm based variable selection
kNN k-nearest neighbor
Kp,uu,brain Unbound brain-to-plasma concentration ratio
LDA Linear discriminant analysis
LLC-PK1 Lilly Laboratories cell-porcine kidney cells
MC Monte Carlo
MDR1 Multi-drug resistance protein 1 (same as P-gp)
ML Machine learning
MLP Multilayer perceptron
MLR Multiple linear regression
NA Not applicable
NN Neural network
PCR Principal component regression
PHASE Public health assessment via structural evaluation
PLS Partial least-squares
PS Permeability-surface area
PSA Polar surface area
QSAR Quantitative structure-activity relationship
RF Random forest
ROC Receiver operating characteristic
SMILES Simplified molecular-input line-entry system
SMO Sequential minimal optimization
SVM Support vector machine
SVR Support vector regression
TN True negatives
TP True positives
VSMP Variable selection and modeling method based on the prediction
Keywords: blood-brain barrier, permeability, QSAR, in silico, log BB
Citation: Faramarzi S, Kim MT, Volpe DA, Cross KP, Chakravarti S and Stavitskaya L (2022) Development of QSAR models to predict blood-brain barrier permeability. Front. Pharmacol. 13:1040838. doi: 10.3389/fphar.2022.1040838
Received: 09 September 2022; Accepted: 10 October 2022; Published: 20 October 2022.
Edited by: Terry R. Van Vleet, AbbVie, United States
Reviewed by: Andy Vo, AbbVie, United States
Amit Kumar Halder, Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, India
Copyright © 2022 Faramarzi, Kim, Volpe, Cross, Chakravarti and Stavitskaya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Lidiya Stavitskaya, Lidiya.Stavitskaya@fda.hhs.gov