Construction of a Quantitative Structure Activity Relationship (QSAR) Model to Predict the Absorption of Cephalosporins in Zebrafish for Toxicity Study

Cephalosporins are beta-lactam antibiotics that are widely used in China. Five generations of cephalosporins have been introduced in clinical practice to date; moreover, some new candidates are also undergoing clinical evaluations. To improve the success rates of new drug development, we need to have a comprehensive understanding about the relationship between the structure of cephalosporins and the toxicity that it induces at an early stage. In the cephalosporins toxicity study using zebrafish, the drug absorption is a key point. In this study, we determined the absorption of cephalosporins in zebrafish during toxicity test. The internal concentrations of 19 cephalosporins in zebrafish were determined using a developed liquid chromatography-tandem mass spectrometry (LC-MS/MS) method. Furthermore, a quantitative structure-activity relationship (QSAR) model was established by multilinear regression; moreover, it was used to predict the absorption of cephalosporins in zebrafish. During leave-one-out cross-validation, a satisfactory performance was obtained with a predictive ability (q2) of 0.839. The prediction ability of the model was further confirmed when the predictive ability (q2) was 0.859 in external prediction. The best QSAR model, which was based on five molecular descriptors, exhibited a promising predictive performance and robustness. In experiments involving drug toxicity, the developed QSAR model was used to estimate internal concentrations of cephalosporins. Thus, the toxicity results were correlated with the internal concentration of the drug within the larvae. The developed model served as a new powerful tool in zebrafish toxicity tests.


INTRODUCTION
Cephalosporins are one kind of potentβ-lactam antibiotics. A cephalosporin was synthesized for the first time in 1964. Thereafter, five generations of cephalosporins have been developed for clinical use (Bryskier, 2000;Hwu et al., 2003;Singh, 2004;Butler and Cooper, 2011). Cephalosporins were derived from the core structure of 7-aminocephalosporanic acid (7-ACA), which attracted a lot of attention from chemists. The molecule 7-ACA had a unique structure, which could be modified to develop new antibiotic candidates; however, these new antibiotic candidates have to be clinically evaluated before being introduced into the market (Macheboeuf et al., 2007). To minimize the attrition rates of drug development, the toxicity of candidate drugs must be determined accurately. This is a crucial part of the drug discovery process. Drug toxicity is a chief reason for withdrawing drugs from the market (Li, 2001(Li, , 2004. Thus, toxicity assessment in the early stage of drug discovery can increase the chances that a drug will reach consumers. Currently, the success rate of new drug development is very low. To improve this parameter, we need to have a comprehensive understanding of the relationship between toxicity and drug structure. In previous studies, the toxic effects of cephalosporins were determined by zebrafish embryo toxicity testing (Zhang et al., 2010(Zhang et al., , 2013(Zhang et al., , 2015Chen et al., 2017;Qian et al., 2018). One of a preliminary conclusion was that the toxic effects of drugs were both influenced by toxic functional groups and drug absorption. The procedure for zebrafish embryo toxicity testing is as follows: zebrafish was typically administered with aqueous solutions of compounds. The diffusion of these compounds predominantly occurred through the skin (Ali et al., 2012;Padilla et al., 2012;Tal et al., 2017). Drug uptake is governed by the physicochemical properties of each compound. The aqueous concentration of the drug in the exposure medium is not necessarily equal to the concentration of the drug in the tissue of zebrafish embryo/larvae (Padilla et al., 2012). In fact, some investigators have pointed out that there is a discrepancy between aqueous concentration and the body burden of chemicals in zebrafish embryo/larva; however, this discrepancy has been observed for only select chemicals (Padilla et al., 2012). To evaluate drug toxicity, it is necessary to determine the internal concentration of drugs in zebrafish.
The quantitative structure-activity relationship (QSAR) model is one of the most popular computer-aided tools, which are employed in medicinal chemistry for drug discovery (Wang et al., 2015). Using QSAR model, a combination of theoretical or semi-experimental calculations and statistical analysis were performed to determine the relationship between molecular structure descriptors and physicochemical properties or biological activities of compounds. The methods were widely used for predicting various physicochemical properties and biological activities of organic compounds; these compounds have wide applications in pharmaceutical industry (Chen, 2015;Thareja, 2015;Raczakgutknecht et al., 2017;Lo et al., 2018). One of the primary application areas is helping researchers understand and exploit the relationship between chemical structures and their biological activities. In general, a suitable set of chemical descriptors is generated from a series of active and inactive molecules. A model is then constructed to determine the relationship between chemical descriptors and the bioactivity of interest. The developed model is then rigorously validated to select those that show best performance. Finally, the model is used to predict the activity of test compounds (Wang et al., 2015). The developed model can accurately predict in silico how chemical modifications might influence biological behavior. Many physiochemical properties of drugs, such as toxicity, metabolism, drug-drug interactions, and carcinogenesis, have been effectively determined by QSAR techniques (Cherkasov et al., 2014). In previous studies, QSAR models involved the use of different molecular descriptors and statistical methods. These models were used to determine the absorption of drugs in human intestine (Shen et al., 2014;Basant et al., 2016;Edwards et al., 2016;Liu, 2018).
To the best of our knowledge, there is no QSAR model for predicting the absorption of cephalosporins in zebrafish. Using the developed LC-MS/MS method , we determined the absorption of 19 cephalosporins in zebrafish. Then, a QSAR model was developed using the absorption data we obtained. The QSAR model was used to predict the absorption of other cephalosporins, which were used in the toxicity test in zebrafish.

Chemicals and Reagents
Cefalexin, cefradine, cefadroxil, cefaclor, ceftizoxime, cefuroxime, cefoxitin, ceftezole, cefazolin, cefotaxime, cefathiamidine, flomoxef, cefmenoxime, cefpirome, cefminox, cefotiam, ceftazidime, cefodizime, cefoperazone, and clenbuterol [internal standard (IS)] were national reference substances, which were obtained from the National Institutes for Food and Drug Control (Beijing, China) ( Table 1). Ammonium hydroxide and acetonitrile were purchased from Fisher Scientific International (Fair Lawn, NJ, United States). Formic acid was purchased from Sigma-Aldrich Corporation (St. Louis, MO, United States). A stock solution (1.0 mg/mL) of each of the cephalosporins was prepared by dissolving each reference standard in deionized water. This stock solution was stored at −70 • C. A stock solution of ammonium formate buffer was prepared by mixing ammonium formate (1.0 M) and formic acid (1.0 M). The resultant stock solution was stored at 4 • C. Different cephalosporins were dissolved in breeding water, which is artificial sea water prepared by dissolving instant ocean sea salt (CNSG, Tianjin, China) in deionized water, respectively.

Zebrafish Embryo Treatment by Cephalosporins
Zebrafish (Danio rerio) were fed at the Institute of Medicinal Biotechnology, Beijing Union Medical College, Beijing, China. In a study conducted by Zhang et al. (2013), the breeding water was prepared for zebrafish with instant ocean sea salt (CNSG, Tianjin, China). Zebrafish wild-type (WT) embryos (n = 30) at 6 h post-fertilization (6 hpf) were used. After immersing the embryos in 2 mL of each drug solution (Drugs were dissolved in breeding water), they were incubated in a 20 mm dish for 3 days. Three concentrations of each drug (0.28, 1.0, and 2.0 mM) were used for incubating 6 hpf zebrafish embryos, and each experiment was repeated at least thrice. The zebrafish bodies were collected at 3 days post-fertilization (dpf). The blank controls were zebrafish WT embryos (n = 30) that were incubated in breeding water.

Determination of Drug Absorption in Zebrafish by LC-MS/MS Method
The anesthetic MS-222 (3-aminobenzoic acid ethylester, methanesulfonate salt, Sigma) was added to the larval solution until its final concentration was 0.016%. Using a cell strainer (100 µm, BD-Falcon, United States), zebrafish bodies (30 in each dish) were washed five times with deionized water. Zebrafish bodies were triturated in 100 µL of deionized water, and then they were homogenized for 1 min. Zebrafish homogenate (50 µL) and internal standard (IS) working solution (10 µL) were individually transferred into a 1.5-mL microcentrifuge tube. Then, 200 µL of 1% formic acid-methanol solution was added into the tube. The samples were vortexed for 20 s. To precipitate proteins, the samples were centrifuged at 10,800 g for 10 min. The resultant supernatant (10 µL) was injected into LC-MS/MS for analysis.

LC-MS/MS Method
The LC-MS/MS analysis was carried out by using the validated method, which was developed by our group . The LC-MS/MS technique was performed by combining a 20A LC (Shimadzu, Kyoto, Japan) and a 6500 Q-Trap mass spectrometer (AB Sciex, Foster City, CA, United States), which was equipped with an electrospray ionization (ESI) source.
Moreover, LC separations were performed on a reversedphase ACE C 18 column (5 cm × 2.1 mm ID, 3 µm particle size, Advanced Chromatography Technologies, Aberdeen, United Kingdom), which was attached to a corresponding guard column (1 cm × 2.1 mm ID, 3 µm particle size; Advanced Chromatography Technologies, United Kingdom). The mobile phase comprised of 5 mM ammonium formate (pH = 3.5) (solvent A) and acetonitrile (solvent B). The composition of the gradient was as follows: 2% solvent B for the first 0.1 min; 3.5 min, 90% solvent B; 6.0 min, 90% solvent B; 6.1 min, 2% solvent B; and 10 min, 2% solvent B. Mass spectrometric analysis was performed in a positive mode. Ion-spray voltage was set at 5000 V, and source temperature was kept fixed at 500 • C. Curtain gas, gas 1, and gas 2 were 25, 40, and 40 psi, respectively. Multiple reaction monitoring (MRM) scan was used for the quantification of 19 analytes. In this scan, we determined the most intense product ion (PI) of each analyte. Table 1 presents the compound-dependent parameters of each analyte. The entrance potential (EP) of all analytes was 10 eV. For each ion transition, dwell time and pause time were set to 30 and 5 ms, respectively. The Analyst 1.6.2 software (AB Sciex, Foster City, CA, United States) was used to perform data acquisition and analysis.

Data Set and Descriptor Generation
Cephalosporins are currently grouped into four generations. The widely used cephalosporins in market are mainly chemicals with ammonia thioxime group at C-7, chemicals with tetrazole group at C-3 and oral cephalosporins with simple substituents such as methyl group. The detailed development history of cephalosporins can be found in the following paper (Bryskier, 2000). Although only 19 analogs were selected in the model, they can totally represent all the cephalosporins widely used in the clinical. The 3D chemical structures of all drugs were obtained from PubChem Compound database. Before minimizing energy, we defined the existing forms of all drugs in the zebrafish. The predicted pKa values of each drug were determined by using Discovery Studio 4.0 software (Accelrys Software Inc., San Diego, CA, United States). We had to determine the molecular conformation of lowest energy prior to computing relevant parameters. For this purpose, structures were energy-minimized by using DS 4.0 software. For energy minimization, the smart minimizer algorithm was implemented by using Chemistry at Harvard Macromolecular Mechanics (CHARMM) force field. The Smart Minimizer algorithm performed 5000 maximum steps with a residual mean square (RMS) gradient, which has a tolerance of 0.01 kcal/(mol × Å). Then, we performed conjugate gradient minimization. We used an implicit solvent model termed distance-dependent dielectrics, and the dielectric constant was set to 80. For calculating molecular descriptors, we considered all the minimized compounds in the next step. A set of 2D and 3D molecular descriptors were calculated by using DS 4.0 software. The software DS 4.0 provided the following molecular descriptors: AlogP series, molecular properties (logD and pKa), surface area and volume, topological descriptors, dipole, jurs descriptors, principal moments of Inertia, and shadow indices. The meaning of all descriptors can be determined with the help of DS 4.0 software. After eliminating small variance, under-represented, and non-numerical variables, we selected 148 molecular descriptors for further QSAR analysis.

QSAR Study
Dose-internal concentration curves were constructed by linear regression analysis. The slope of each linear regression was defined as k-value for each drug. The k-values of 19 drugs were considered as output variables in QSAR study. The entire data set was split into the training set and test set by a random index, which was operated by DS4.0 software. The training set percentage was set at 80. To build and train the model, we did not use data points of the test set; however, these data points were used after model building in order to judge the performance of the model by external validation.
Genetic Function Approximation (GFA) was used to further select the key variables and build QSAR models. In some ways, GFA is a model-selection procedure that is analogous to stepwise regression; however, the key difference is that the GFA produces a population of models (e.g., 100), instead of generating a  . To determine GFA, we combined multivariate adaptive regression splines (MARS) algorithm with genetic algorithm. Our main objective was to evolve the population of equations that fit best with the training set data. It provides an error measure called the lack-of-fit (LoF) score, which automatically penalizes models with too many features. Splines may also be used as a powerful tool for non-linear modeling. The range of variations in this population would give more information about the quality of fit and the importance of descriptors. Thus, GFA algorithm is a simple and powerful way through which we can obtain an accurate interpolative model, which is a superior model as compared to other similar techniques (Rogers and Hopfinger, 1994;Fan et al., 2001). The functional form of the model was linear. The obtained model would lose statistical significance with the inclusion of too many descriptors; therefore, the number of descriptors was restricted between 2 to 6 in these models. The key parameters of GFA were as follows: the population size was 100; maximum generations were 5000; maximum correlation threshold for any two variables was 0.95 in the input data; other parameters were considered as default values. Friedman LOF values were used to determine the quality of obtained equations. Finally, we obtained 100 GFA models.
The following statistical parameters of models were calculated by DS 4.0 software: r 2 is the coefficient of determination; r 2 adj is r 2 adjusted for the number of terms in the model; r 2 pred is the prediction (PRESS) r 2 , which is equivalent to q 2 from a leave-1out cross validation; LoF is the Friedman lack-of-fit score; p is the p-value for the significance of regression; and RMSE is defined as root mean square error. In addition, an external validation was used to determine the model's generalization and predictive accuracy. It was measured by squared correlation coefficient (q 2 ) and root mean square error of external validation (RMSE-EV). The model shows both better model statistics and external validation statistics, which were used for selecting a lower number of input variables.

Determination of the Absorption of Cephalosporins in Zebrafish Embryo
Using the aforementioned method, we analyzed the internal concentrations of cephalosporins in zebrafish. To obtain the calibration curve of each analyte, we prepared calibration samples (1,5,10,25,50,75,100,250, and 500 ng/ml) of each cephalosporin in blank zebrafish homogenate. The calibration samples and administration samples were processed in a timely manner. The internal concentration of each cephalosporin in the zebrafish was calculated by using the corresponding calibration curve ( Table 2). Table 2 shows the results of internal concentrations determined in zebrafish. Zebrafish embryos were exposed to three concentration solutions of each drug (0.28, 1.0, and 2.0 mM) at 6 hpf-3 dpf. There were significant differences between the internal concentrations of different cephalosporins, and indicated that the absorption of each drug was different.
Dose-internal concentration curves were constructed by linear regression analysis (Figure 1). For each drug, the slope of each linear regression was defined as k-value. This parameter was used to evaluate the absorption capacity of each drug. As shown in Table 3, the k-value of each drug was in the range of 0.3018-9.8111. The drug was more readily absorbed when the k-value was higher in magnitude. Cefotaxime, cefathiamidine, cefmenoxime, cefpirome, cefodizime, cefotiam, cefuroxime, cefazolin, and ceftazidime were the drugs whose k-value was lower (<5). This indicates that these drugs are not easily to be absorbed into the system of zebrafish. Cefalexin, cefradine, cefadroxil, cefaclor, ceftizoxime, ceftezole, cefoxitin, flomoxef, cefminox, and cefoperazone were the drugs whose k-value was higher in number (>5). This indicates that these drugs can be absorbed easily.
For approximately 72 h, developing zebrafish embryos are surrounded by the chorion. The zebrafish chorion is a 1.5-2.5 µm thick acellular envelope. The chorion membrane complex to consist of three layers: the middle and inner layers are pierced by cone-shaped pore canals which display a corkscrew-like ridged wall with lamellae ringing the inner surface (Bonsignorio et al., 1996). The chorion of zebrafish has been discussed as a potential barrier for the uptake of chemical substances (Pelka et al., 2017).  So in zebrafish embody toxicity testing, the uptake of chemicals is a complex procedure. The absorption of chemicals in zebrafish could not be estimated only based on simple property of the compound, such as polar surface area (PSA) and lipophilicity (logP or logD) (Kislyuk et al., 2017). However, based on the absorption data we determined in zebrafish, we could use QSAR model to predict the absorption of other cephalosporins or impurities in zebrafish. This would enable us to thoroughly compare the uptake and toxicity of the drug in zebrafish.
The statistical significance of the model was confirmed by p-value, which was smaller than the critical value at 95% confidence limit. The r 2 adj value of model indicates that the model can explain 88.7% of the variance, whereas its q 2 value shows the 85.9% predictive variance of the model. The small difference between r 2 and r 2 prep (less than 0.1) confirms that the model is not over-fitted. Figure 2 illustrates the plot of the observed k-value vs. the predicted k-value of the training set. It can be seen that the predicted data of this model is in accordance with the experimental results. The performance of the model is also validated by comparing statistical parameters of the training set with those of the test set. The determination coefficient and RMSE of the training and external test sets were very similar, which confirms that the prediction ability of the model was good for cephalosporins. Moreover, the built model was used to accurately predict and evaluate the absorption ability of cephalosporins in zebrafish without doing complex experiments. The molecular descriptors involved in model development are as follows: HBD_Count, Jurs_WPSA_1, Num_AliphaticSingleBonds, Num_AromaticBonds, and QED_MW. Table 4 presents the meanings of these molecular descriptors, which are related to hydrogen bond donation, molecular shapes, electronic configuration, and drug-likeness information of compounds. The p-values (Table 4) of each molecular descriptor were smaller than the critical value at 95% confidence limit. This indicates that the statistical significance of selected molecular descriptors. Smaller the p-value of molecular descriptor, more influential would be the k-value of the molecular descriptor. Num_AliphaticSingleBonds and QED_MW are the most important molecular descriptors in the five. Besides, it can be seen from Eq. (1) that Num_AliphaticSingleBonds, Num_AromaticBonds, and QED_MW have positive contribution to the k-value of drugs; however, HBD_Count, and Jurs_WPSA_1 have a negative effect on the k-value of drugs.
As we can see from the model, the descriptors selected are mostly 3D descriptors and they are determined by the 3D structures of the analogs. Though the analogs are very similar Frontiers in Pharmacology | www.frontiersin.org in terms of their 2D structures, their 3D structures are totally different (Figure 3). For example, the 2D structures of cefradine and cefaclor are very similar as is shown in Table 3, however, their 3D structures are different. In the zebrafish, the absorption of drugs is mainly determined by the 3D structures, so this maybe the main reason why cephalosporins have different k-values in zebrafish even though they all have the core structure of 7-ACA. Differences of substitutes at the periphery of their structures can have an important effect on their 3D conformations and further influence their absorption in zebrafish. This study enlightens the necessity of monitoring the absorption of cephalosporins analogs in zebrafish and also provides a creative way of predicting the absorption via QSAR.

CONCLUSION
In this experiment, the internal concentrations of 19 cephalosporins were determined by a developed LC-MS/MS method in zebrafish. An accurate and simple QSAR model was built with five molecular descriptors to predict the absorption of cephalosporins using the obtained internal concentration data of zebrafish. The prediction ability was confirmed by statistical significance and external validation. To the best of our knowledge, this is the first study that predicts the absorption of zebrafish by using the QSAR model. Moreover, this model is used to predict the absorption data of other cephalosporins without involve complex experiments. In addition, this model is also used to predict the absorption data of cephalosporins' impurities, which have organic structures similar to those of cephalosporins. Thus, the absorption data can be correlated with toxicity results, which illustrate that the actual concentration of the drug within the larvae of zebrafish. This is a powerful tool to determine the early stage of toxicity during new drug development for new antibiotics.