Modeling Agrobacterium-Mediated Gene Transformation of Tobacco (Nicotiana tabacum)—A Model Plant for Gene Transformation Studies

The multilayer perceptron (MLP) topology of an artificial neural network (ANN) was applied to create two predictor models in Agrobacterium-mediated gene transformation of tobacco. Agrobacterium-mediated transformation parameters, including Agrobacterium strain, Agrobacterium cell density, acetosyringone concentration, and inoculation duration, were assigned as inputs for ANN–MLP, and their effects on the percentage of putative and PCR-verified transgenic plants were investigated. The best ANN models for predicting the percentage of putative and PCR-verified transgenic plants were selected based on basic network quality statistics. Ex-post error calculations of the relative approximation error (RAE), the mean absolute error (MAE), the root mean square error (RMS), and the mean absolute percentage error (MAPE) demonstrated the prediction quality of the developed models when compared to stepwise multiple regression. Moreover, significant correlations between the ANN-predicted and the actual values of the percentage of putative transgenes (R² = 0.956) and the percentage of PCR-verified transgenic plants (R² = 0.671) indicate the superiority of the established ANN models over the classical stepwise multiple regression in predicting the percentage of putative (R² = 0.313) and PCR-verified (R² = 0.213) transgenic plants. The best combination of the multiple inputs analyzed in this investigation, to achieve maximum actual and predicted transgenic plants, was at OD600 = 0.8 for the LB4404 strain of Agrobacterium × 300 μmol/L acetosyringone × 20 min immersion time. According to the sensitivity analysis of ANN models, the Agrobacterium strain was the most important influential parameter in Agrobacterium-mediated transformation of tobacco. The prediction efficiency of the developed model was confirmed by the data series of Agrobacterium-mediated transformation of an important medicinal plant with low transformation efficiency.
The results of this study are pivotal to model and predict the transformation of other important Agrobacterium-recalcitrant plant genotypes and to increase the transformation efficiency by identifying critical parameters. This approach can substantially reduce the time and cost required to optimize multi-factorial Agrobacterium-mediated transformation strategies.


INTRODUCTION
A rapid improvement in the important economic traits of plants is needed due to climate change and the steady increase in global population. Nowadays, in vitro-based biotechnological methods are applied for breeding with the aim of improving plant genotypes through rapid multiplication, micropropagation of disease-free plants, production of plant-derived metabolites, and gene transformation (Hesami et al., 2020b). Genetic transformation (genetic engineering) is one of the key biotechnological tools to improve plant performance.
The genetic transformation of plants can be achieved by direct and indirect methods (Niazian et al., 2017). The most effective and well-known laboratory method for indirect gene transfer in plants is through Agrobacterium infection (Meyers et al., 2010). The Agrobacterium method is a simple, efficient, and practical protocol for the transfer of foreign DNA and is the first prerequisite to produce genetically modified plants (Abbasi et al., 2020). However, it remains challenging because its efficiency is low in most important plants, as many factors may affect the process. The Agrobacterium strain, Agrobacterium cell density, immersion time, type and concentration of antibiotics to kill Agrobacterium, type and concentration of the selective antibiotics, concentration of acetosyringone, duration of co-cultivation, pH and temperature of co-cultivation, and wounding treatments are the key factors that can affect Agrobacterium-mediated gene transformation and should be taken into account in all gene delivery studies. The first group of factors that affects Agrobacterium-mediated gene delivery comprises the Agrobacterium strain, Agrobacterium cell density, and the antibiotic used to eliminate Agrobacterium, while the second group of influential factors is explant type and age along with immersion time and wounding treatment. The third group of significant factors is the concentrations of other additives, such as the selective antibiotics for the plant along with the chemical stimulants. Finally, the fourth group of factors that can affect the efficiency of Agrobacterium-mediated gene transformation is the co-cultivation parameters, such as duration, pH, and temperature. Plasmid design for expression in plants (e.g., a sub-optimal promoter or enhancer, poor codon usage, the 5′UTR sequence, triggering of silencing, or integration of the gene into a silent region of chromatin) is another important influential factor for the genetic transformation of plants.
Numerous studies are underway in order to increase the efficiency of Agrobacterium-mediated gene transformation in different plant species and genotypes by optimizing the aforementioned parameters. In ajowan (Trachyspermum ammi L.), a medicinal plant, different levels of gene transformation parameters, including the Agrobacterium optical density (OD), Agrobacterium strain, Agrobacterium killing antibiotic, acetosyringone concentration, and inoculation duration were assessed during the introduction of the BADH gene, and the greatest gene transformation efficiency was obtained using the LB4404 strain of Agrobacterium at OD600 = 0.6-0.8 × 160 mg/L timentin × 250 µmol/L acetosyringone × 30 min inoculation duration (Niazian et al., 2019). Different levels of cell density of A. tumefaciens (OD600 = 0.2, 0.3, 0.5, 0.8, 1.0, 1.2, 1.4) and concentrations of acetosyringone (0-100 µM) were investigated in Agrobacterium-mediated gene transformation of Veratrum dahuricum, a medicinal plant. An optical cell density of 0.8 at 600 nm (OD600 = 0.8) along with 20 µM of acetosyringone were reported as the optimum levels for these parameters. Different suspension solutions (OD600 = 0.2, 0.4, 0.6, 0.8, and 1.0), along with the immersion durations (10, 20, 30, and 40 min) and acetosyringone concentrations (50, 100, 150, and 200 µM), were investigated in Agrobacterium-mediated gene transformation of Pinus tabuliformis, and an Agrobacterium optical density of 0.8 at 600 nm × 150 µM acetosyringone × 30 min immersion time was reported as the optimal combination of gene transformation factors. The effects of various optical densities of the Agrobacterium suspension (OD600 = 0.3, 0.35), duration of incubation (5, 10, 15, 20, and 25 min), and co-cultivation time (24, 48, 72, 96, and 120 h) were investigated in the Agrobacterium-mediated transformation efficiency of pigeon pea [Cajanus cajan (L.) Millsp] (Karmakar et al., 2019).
The authors reported a transformation efficiency of 83% using Agrobacterium cells at an optical density (OD600) of 0.25, an immersion time of 15 min, and co-cultivation with explants for 72 h as the optimized transformation parameters (Karmakar et al., 2019). Different levels of acetosyringone concentration (0, 50, 100, 150, and 200 mM) and A. tumefaciens cell density (OD600 = 0.2, 0.4, 0.6, 0.8, and 1.0) were assessed to improve the Agrobacterium-mediated transformation efficiency of cotton (Gossypium hirsutum L. 'KC3'), and an A. tumefaciens cell suspension of OD600 = 0.6 containing 100 mM acetosyringone led to the maximum transformation efficiency of 20.25% (Gurusaravanan et al., 2020).
The incorporation of different chemicals and additives into the medium, which trigger the transformation activity of Agrobacterium or increase the regeneration efficiency, is another way to increase the efficiency of Agrobacterium-mediated transformation. In lily (Lilium cv. 'Manissa'), a cut flower, a higher transformation efficiency (11.1%) was achieved when chloroxynil was used instead of acetosyringone (6.6%) as the phenolic compound (Abbasi et al., 2020). In soybean [Glycine max (L.) Merrill], the transformation efficiency was improved (34.6% vs. 23%) when sodium nitroprusside (a nitric oxide donor) was applied in both shoot inducing and rooting media (Karthik et al., 2020). Incorporation of spermidine, a polyamine, in the culture medium (15 mg/L) increased the Agrobacterium-mediated transformation efficiency (17.3%) in watermelon (Citrullus lanatus Thunb.) cv. Arka Manik (Vasudevan et al., 2020).
The interaction between the plant genotype and the aforementioned factors challenges the implementation of transformation strategies and leads to genotype-dependency in gene transformation studies, meaning that different plant genotypes respond differently to a specific protocol. As an in vitro procedure, Agrobacterium-mediated transformation is a multifactorial biological system that is highly variable and complex in nature, making it a non-deterministic and non-linear process. Highly non-linear and complex relationships of biological events are difficult to predict by regression-based models. Analysis and interpretation of non-linear biological systems by non-linear and non-parametric methods is much more efficient. One of the powerful non-linear and non-parametric computational methods to overcome the problem of regression models is the artificial neural network (ANN) (Emamgholizadeh et al., 2015; Pentoś, 2016). A good illustration of this approach is the work of Wawrzyniak (2020), which demonstrates the superiority of neural networks over a traditional regression-based model in describing the dynamics of fungal growth in the mass of stored rapeseeds. The author reported that neural networks not only yield models that better describe the phenomenon but are also, due to the lack of preliminary assumptions, able to use the wide spectrum of experimental data that could not be used in non-linear regression. Artificial neural networks have been successfully used to analyze non-linear relationships prevalent in a variety of in vitro studies in plants (Dutta Gupta and Pattanayak, 2017; Arab et al., 2018). Tobacco is a model plant for gene transformation studies (Mushtaq et al., 2020). Modeling and optimizing the gene transformation protocol in this plant can encourage researchers to establish efficient protocols in other desired recalcitrant plant species.
The aim of this study was to create two predictive models in Agrobacterium-mediated gene transfer of tobacco under the influence of different gene transformation parameters.

MATERIALS AND METHODS
The Agrobacterium-Mediated Gene Transformation Procedure and Calculation of the Percentage of Putative and PCR-Verified Transgenic Plants
A routine Agrobacterium-mediated gene delivery of tobacco was conducted using the pCAMBIA2301 binary vector (11634 bp). This vector carries the kanamycin resistance gene of neomycin phosphotransferase (nptII), which acts as a selectable marker for plant selection, and the gusA reporter gene. In the left border (LB), the nptII gene is driven by the cauliflower mosaic virus 35S (CaMV35S2) promoter and CaMV35S terminator, whereas the intron-gusA in the right border (RB) is driven by the CaMV35S promoter and NOS-terminator. The effects of the factors in Agrobacterium-mediated gene transformation, including Agrobacterium strains (AGL1, LB4404, and GV3101), Agrobacterium cell density (OD600 = 0.6, 0.7, and 0.8), acetosyringone concentration (200, 300, and 400 µmol/L), and inoculation duration (immersion time) (1, 10, and 20 min) were assessed. Pre-incubated (2 days) leaf disk explants (1 × 1 cm²) of 4-week-old in vitro-obtained plantlets of tobacco were inoculated in the Agrobacterium suspension containing the binary vector, blot dried on sterile filter paper, and co-cultivated in a phytotron at 25 ± 1 °C with 60 to 70% relative humidity for 48 h in the dark. The Murashige and Skoog medium (MS) (Murashige and Skoog, 1962) supplemented with 0.1 mg/L indole-3-acetic acid (IAA) + 1 mg/L 6-benzylaminopurine (6-BA) and 100 µM acetosyringone was used for co-cultivation (Pathi et al., 2013; Leng et al., 2020). Explants were then transferred to the selective regeneration medium. An agar-solidified MS medium, supplemented with 1 mg/L 2,4-dichlorophenoxyacetic acid (2,4-D) + 0.5 mg/L kinetin (Kin) + 200 mg/L kanamycin + 160 mg/L ticarcillin disodium/clavulanate potassium, was used for direct shoot regeneration. All cultures were maintained in a phytotron at 25 ± 1 °C with 60 to 70% relative humidity under a 16/8 h (light/dark) photoperiod.
Regular subcultures of explants were performed in the same medium. Then, the emerging shoots (15th day of the experiment) were transferred to a PGR-free MS medium, supplemented with 100 mg/L kanamycin + 300 mg/L cefotaxime, for rooting, maturation, and elongation. After 2 weeks, the roots of the plantlets were washed with distilled water to remove any traces of agar. Then, plantlets were transferred to plastic pots (200 ml) containing autoclaved perlite:peat moss (1:1) and were kept in a phytotron, under the aforementioned conditions, for 7 days. The surviving plantlets were regarded as putative transgenic plants and were used as the first dependent variable (output) of the experiment. Figures 1A-F show all the laboratory steps, from inoculation of the explants to the acclimatization of the putative transformed tobacco plants. Basal MS medium, plant agar, all PGRs, and antibiotics were supplied by Duchefa (Haarlem, The Netherlands). The genomic DNA of putative transformed (kanamycin-resistant) plants was extracted using the method described by Sika et al. (2015). Specific primers (forward: 5′-CCACCATGATATTCGGCAAC-3′ and reverse: 5′-GTGGAGAGGCTATTCGGCTA-3′) were used for the amplification of the nptII gene (0.54 kb fragment) by polymerase chain reaction (PCR). The amplification was performed using a thermal cycler (MyCycler, BIO-RAD, USA) under the conditions described by Saini and Sonia (2003). The PCR products were separated by electrophoresis on a 1% agarose gel and visualized by ethidium bromide staining. The percentage of PCR-positive plants was recorded and used as the output for the second predictor model.

Construction of Artificial Neural Networks and Statistical Analysis
A factorial experiment based on a completely randomized design with three replications (Petri dishes) was carried out to study the combined effect of independent variables (number of independent observations = 3 × 3 × 3 × 3 treatment combinations × 3 replications = 243). Five explants were cultured in each Petri dish and the percentage of putative transgenes and PCR-verified plants were considered as dependent variables (y1, y2). The ANOVA for the number of putative and PCR-verified transgenic plants was performed using SAS® (SAS Institute Inc., Cary, NC, USA) software. Two MLP topologies of the ANN (Kujawa et al., 2014), each with two hidden layers, were trained and tested to assess the effect of independent variables (Agrobacterium strains, Agrobacterium cell density, acetosyringone concentration, and inoculation duration) on dependent variables (percentage of putative and PCR-verified plants). The first of the independent variables (Agrobacterium strain) was qualitative and took three values. The other independent variables were quantitative. The extended ANN topologies, including input, output, and number of hidden layers, for percentage of putative transgenic plants and PCR-verified plants are presented in Figures 2A,B, respectively. In both models, the independent variables were directly fed into the input layer and their interaction effects on the output were evaluated. The Automatic Network Designer (AND) from Statistica v7.1 (StatSoft Inc., Tulsa, OK, USA) was applied to build the ANN models. The following assumptions were made when using the AND: MLP structure of the neural network; 1 or 2 hidden layers; 1-20 neurons in each hidden layer; linear or logistic activation function. The optimization process was conducted by trial and error to find the best MLP topology. In both models, the experimental data (243 data points) were divided into 70%:15%:15% parts for training, validation, and testing, respectively.
Basic network quality statistics, including SD, mean error, deviation error, mean absolute error, quotient of deviation, and correlation coefficient, were calculated and the model with the lowest mean absolute error and the largest correlation coefficient was considered the best model. The efficiency of the developed ANN models for predicting the dependent variables (the quality of the prediction) was assessed through ex-post measures of the prediction error, including the mean absolute error (MAE), the root mean square error (RMS), the relative approximation error (RAE), and the mean absolute percentage error (MAPE) statistics (Niedbała, 2019). The impact of independent variables (network inputs) on dependent variables was determined through a sensitivity analysis of the neural network. The error quotient and the rank of variables were applied in the sensitivity analysis (Niedbała et al., 2019a,b). A classical stepwise multiple regression model was generated with SAS® software using the same inputs and outputs as the ANN-MLP model.
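The design arithmetic and the 70%:15%:15% split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Statistica workflow used in the study: the one-hot encoding of the qualitative strain input, the random shuffling with a fixed seed, the rounding rule for the subset sizes, and all function names are hypothetical choices made here.

```python
import itertools
import random

# Factor levels taken from the Materials and Methods.
STRAINS = ["AGL1", "LB4404", "GV3101"]
OD600_LEVELS = [0.6, 0.7, 0.8]
ACETOSYRINGONE_UMOL_L = [200, 300, 400]
IMMERSION_MIN = [1, 10, 20]
REPLICATES = 3  # Petri dishes per treatment combination


def build_design():
    """Enumerate every factor combination for each replicate."""
    rows = []
    for strain, od, aceto, minutes in itertools.product(
        STRAINS, OD600_LEVELS, ACETOSYRINGONE_UMOL_L, IMMERSION_MIN
    ):
        # Assumption: the qualitative input is one-hot encoded.
        one_hot = [1.0 if strain == s else 0.0 for s in STRAINS]
        for replicate in range(1, REPLICATES + 1):
            rows.append(
                {
                    "strain": strain,
                    "replicate": replicate,
                    "x": one_hot + [od, float(aceto), float(minutes)],
                }
            )
    return rows


def split_70_15_15(rows, seed=0):
    """Shuffle and split into training, validation, and test subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    n_val = round(0.15 * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


design = build_design()
train_rows, val_rows, test_rows = split_70_15_15(design)
print(len(design), len(train_rows), len(val_rows), len(test_rows))
# prints: 243 170 36 37
```

Because 243 is not divisible into exact 70/15/15 parts, any implementation has to round; here the remainder is assigned to the test subset.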
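The four ex-post measures named above can be written out as follows. The text does not give the formulas, so the definitions below are the commonly used ones and should be read as assumptions, in particular RAE taken as the square root of the summed squared errors divided by the summed squared observations. The observed and predicted values are made-up numbers for illustration only, not data from this study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rms(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rae(observed, predicted):
    """Relative approximation error (assumed definition, see lead-in)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / sum(o ** 2 for o in observed))


def mape(observed, predicted):
    """Mean absolute percentage error; zero observations are skipped."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical percentages of transgenic plants (illustration only).
observed = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 60.0, 90.0]

print(mae(observed, predicted))   # 5.0
print(mape(observed, predicted))  # 12.5
print(round(rms(observed, predicted), 4))
print(round(rae(observed, predicted), 4))
```

MAPE is undefined for observations equal to zero, which can occur with percentage data; skipping those observations is one possible convention and changes the value, so it should be reported whenever it is used.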

RESULTS
The analysis of variance (ANOVA) showed a significant effect of the treatments and their interactions on the percentage of putative and PCR-verified transgenic plants (Table 1). The four-way interaction of Agrobacterium strain, Agrobacterium cell density, acetosyringone concentration, and inoculation duration was significant for the percentage of putative and PCR-verified transgenic tobacco plants at the 1% probability level (Table 1).
The best networks were selected, among 10,000 networks, according to the basic network quality statistics. The ANNs selected for further analysis were characterized by a linear aggregation function in all of the layers. A linear activation function was applied to the input layer. The neurons in the hidden layers used a hyperbolic activation function. Moreover, a logistic activation function was adopted in the output layer. Basic information on the quality of the selected neural networks for modeling Agrobacterium-mediated gene transformation of tobacco is presented in Table 2. The small values of errors (learning, validation, test, average, deviation, and mean absolute errors) and the high correlation obtained indicate the strength of the developed models (Table 2).
The response surface for the interaction effects of Agrobacterium cell densities and acetosyringone concentrations on putative transgenic tobacco plants showed that the highest percentage of putative transgenic tobacco plants can be obtained by the Agrobacterium cell density, at 600 nm, of 0.8 (OD600 = 0.8) × 280-300 µM of acetosyringone (Figure 3A). The chart for the response surface of the interaction effects of Agrobacterium cell densities and inoculation durations showed that Agrobacterium cell density of 0.8 (OD600 = 0.8) × 20 min inoculation is the best combination to reach the highest percentage of putative transgenic tobacco plants (Figure 3B). The chart for the response surface of the interaction effects of acetosyringone concentrations and inoculation durations showed that acetosyringone concentrations of 400 µM × 20 min inoculation is the best combination of these two variables in terms of percentage of the putative transgenic plants (Figure 3C).
The response surface for the interaction effects of Agrobacterium cell densities and acetosyringone concentrations on PCR-verified plants showed the highest percentage of transgenic tobacco plants can be obtained by the Agrobacterium cell density, at 600 nm, of 0.8 (OD600 = 0.8) × 400 µM of acetosyringone (Figure 4A). The chart for the response surface of the interaction effects of Agrobacterium cell densities and inoculation durations showed that Agrobacterium cell density of 0.8 (OD600 = 0.8) × 20 min inoculation is the best combination to reach the highest percentage of transgenic tobacco plants (Figure 4B). The chart for the response surface of the interaction effects of acetosyringone concentrations and inoculation durations showed that acetosyringone concentrations of 220-240 µM × 18-20 min inoculation is the best combination of these two variables in terms of percentage of the PCR-verified transgenic plants (Figure 4C).
The ex-post prediction errors of the developed neural models are presented in Table 3. The small prediction errors (MAPE, RAE, RMS, and MAE) obtained indicated the strength and accuracy of the developed models in forecasting the percentage of putative and PCR-verified transgenic tobacco plants (Table 3). The scatter plot of the observed vs. the predicted values of the percentage of putative transgenic tobacco plants showed there was no significant difference between the observed and the ANN-MLP predicted data (Figure 5A). The high value of the correlation coefficient of the model (R = 0.97) suggests high repeatability of the established model (Figure 5A). The scatter plot of the observed vs. the predicted values of the percentage of PCR-verified transgenic tobacco plants showed no significant difference between the observed and the ANN-MLP predicted data (Figure 5B). The determination coefficient of the model (R² = 0.671, corresponding to R = 0.819) indicates a good performance of the established model in predicting the percentage of PCR-verified transgenic tobacco plants (Figure 5B).
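Because both correlation coefficients (R) and determination coefficients (R²) are quoted for these scatter plots, it may help to show how the two relate. The sketch below computes the Pearson correlation between observed and predicted values and squares it; treating R² as the square of this correlation is an assumption made here for illustration, and the two lists are made-up numbers rather than data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical observed and predicted percentages (illustration only).
observed = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 60.0, 90.0]

r = pearson_r(observed, predicted)
r_squared = r ** 2

# The reported R = 0.819 for the PCR-verified model squares to the
# R² = 0.671 given in the abstract.
print(round(0.819 ** 2, 3))  # 0.671
```

The same check applied to the putative-plant model links the reported R of about 0.97 to the R² of 0.956 quoted in the abstract, allowing for rounding of R.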
Comparison of the actual values of outputs, achieved from the interaction of the different levels of the four inputs, with predicted values achieved from the ANN models, showed the greatest measured and predicted values of the percentage of putative and PCR-verified transgenic tobacco plants were related to the interaction of the LB4404 strain of Agrobacterium × cell density of 0.8 (OD600 = 0.8) × 300 µM of acetosyringone × 20 min inoculation duration (Supplementary Tables 1, 2). Therefore, this combination of inputs was selected as the optimal condition for Agrobacterium-mediated transformation of tobacco. The next best combination of inputs in terms of the measured and predicted values of the percentage of putative and PCR-verified transgenic tobacco plants was the GV3101 strain of Agrobacterium × cell density of 0.8 (OD600 = 0.8) × 300 µM of acetosyringone × 20 min inoculation duration (Supplementary Tables 1, 2). The lowest measured and predicted values of the putative transgenic plants were obtained from the AGL1 strain of Agrobacterium × cell density of 0.6 (OD600 = 0.6) × 400 µM of acetosyringone × 1 min inoculation duration and AGL1 strain of Agrobacterium × cell density of 0.7 (OD600 = 0.7) × 200 µM of acetosyringone × 1 min inoculation duration (Supplementary Table 1). Results from the stepwise multiple regression analysis revealed the lower performance of this classical statistical model when compared to the ANN-MLP models, in terms of R², RAE, MAE, RMS, and MAPE statistical values (Table 4). The coefficients of determination of the classical stepwise regression analysis for putative (R² = 0.313) and PCR-verified transgenic plants (R² = 0.213) were considerably lower than those obtained from the MLP-ANN models (R² = 0.956 and R² = 0.671, respectively).
Conversely, the prediction errors (MAPE, RAE, RMS, and MAE) of the stepwise multiple regression analysis for putative and PCR-verified transgenic plants (Table 4) were higher than those of the developed MLP-ANN models (Table 3).
The network sensitivity analysis of the produced neural models showed that the highest rank (1) among the investigated factors belonged to the Agrobacterium strain, which suggests that this factor is the most important input variable with the potential to affect the percentage of putative transgenic tobacco plants (Table 5). Based on their impact on the Agrobacterium-mediated gene transformation of tobacco, Agrobacterium cell density, acetosyringone concentration, and inoculation duration were in the second, third, and fourth places of importance, respectively (Table 5).
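The idea behind ranking inputs by an error quotient can be illustrated with a small sketch. It assumes one common formulation: each input in turn is replaced by its mean value, the model error is recomputed, and the quotient is that error divided by the error obtained with all inputs intact, so a larger quotient means a more influential input. Whether this matches the exact Statistica procedure used in the study is an assumption. The model below is a plain hypothetical function standing in for a trained network, and the inputs `input_a` and `input_b` and their data are invented.

```python
import math


def rms_error(model, rows, targets):
    """Root mean square error of a model over a data set."""
    squared = sum((model(row) - t) ** 2 for row, t in zip(rows, targets))
    return math.sqrt(squared / len(targets))


def error_quotients(model, rows, targets, names):
    """Error with one input replaced by its mean / error with all inputs."""
    baseline = rms_error(model, rows, targets)  # must be non-zero
    means = [sum(col) / len(col) for col in zip(*rows)]
    quotients = {}
    for j, name in enumerate(names):
        masked = [row[:j] + [means[j]] + row[j + 1:] for row in rows]
        quotients[name] = rms_error(model, masked, targets) / baseline
    return quotients


def toy_model(row):
    """Hypothetical stand-in for a trained network (not the study's ANN)."""
    return 3.0 * row[0] + 1.0 * row[1]


# Invented inputs and slightly noisy targets (illustration only).
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
targets = [toy_model(row) + e for row, e in zip(rows, noise)]

quotients = error_quotients(toy_model, rows, targets, ["input_a", "input_b"])
ranking = sorted(quotients, key=quotients.get, reverse=True)
print(ranking)  # ['input_a', 'input_b']
```

A quotient close to or below 1 would indicate that removing the input does not worsen the prediction, which is the usual criterion for considering an input dispensable.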

DISCUSSION
Earlier Agrobacterium-mediated transformation experiments were analyzed by classical statistical approaches, such as ANOVA and mean comparison analysis, to interpret the experimental results and establish an optimized Agrobacterium-mediated gene transformation protocol (Niazian et al., 2019; Gurusaravanan et al., 2020; Liu et al., 2020; Ma et al., 2020). In plant sciences, certain assumptions must hold in order to use parametric analysis methods (e.g., ANOVA, correlation, t-tests, and regression). Normal distribution of residuals and homogeneity of variance of the errors are the most important assumptions (Ghasemi and Zahediasl, 2012). Checking the data distribution with a normality test (especially frequentist tests) is a prerequisite for applying parametric tests to plant data. However, in studies of plant dynamics, data have a leptokurtic distribution and univariate normality tests are not efficient (Delmail et al., 2011). Therefore, non-parametric tests need to be applied when these basic assumptions are violated and there is evidence of non-normality and the presence of outliers. Assumptions about the distribution and homogeneity of the variance of errors are not required in non-parametric algorithms (Pour-Aboughadareh et al., 2019; Hocaoglu et al., 2020).
The results of this study showed that the highest levels of Agrobacterium optical density and acetosyringone concentration as well as inoculation duration led to an optimal response in the plant transformation. However, when all the experimental inputs are considered (four-way interactions), the highest level of acetosyringone did not lead to the highest predicted value of the outputs, and the negative effect of the higher levels of acetosyringone was evident in the measured and the predicted values of putative and PCR-verified transgenic plants. In fact, we selected the effective levels of the investigated parameters as the final levels, and higher levels of these three inputs may lead to a lower percentage of transgenic plants.
In biological experiments, such as Agrobacterium-mediated gene transformation, there is a threshold for the applied parameters. Levels and concentrations greater than the optimum will lead to an inhibitory effect. For example, higher Agrobacterium cell density and longer inoculation durations have destructive effects on host cells and significantly reduce regeneration. The situation is similar for acetosyringone, where concentrations higher than the optimum level have inhibitory effects on both Agrobacterium and the host cells (Niazian et al., 2019). The efficiency of Agrobacterium-mediated gene transformation depends on the main and interactive effects of several pivotal parameters. Testing the serial levels (concentrations) of each parameter independently, and/or in combination with the serial levels of other parameters, is the methodology that has been applied to find the optimum level(s) of input and increase the efficiency of the Agrobacterium-mediated gene transformation strategies. However, this is a costly and time-consuming process. Predicting the value of the response from the values of the independent variables can help to understand the effect of a factor on a process (Barone, 2019). Usually, complex non-linear relationships exist between the independent (input) and dependent (output) parameters of plant in vitro studies. In addition, the interaction of these parameters with the plant genotype and environment leads to a non-deterministic condition (Prasad and Gupta, 2008). Classical linear regression methods are unable to predict and interpret the non-linear and complex relationships between the variables investigated (Niazian et al., 2018). Non-parametric algorithms, such as ANNs, partial least square regression (PLSR), random forest (RF), and support vector machines (SVMs), have great efficiency for processing non-linear data (Zheng et al., 2018; Hesami and Jones, 2020; Niazian and Niedbała, 2020).
Artificial neural networks, especially MLP, have superiority over the classical statistical methods for analyzing and interpreting unpredictable data sets (Salehi et al., 2020). As a data-driven model, ANN can be used for predicting and optimizing non-linear plant in vitro studies (Hesami et al., 2017). Through the full use of all spectral data and by avoiding multicollinearity, ANNs are able to manage non-normal, non-linear, and non-deterministic data sets obtained from multi-factorial plant in vitro studies. In MLP-ANN, the interconnection of neurons [...] (Niazian et al., 2019). The results obtained showed a high similarity of the ANN-predicted and the actual values of the transformation efficiency in ajowan (Trachyspermum ammi L.), a medicinal plant (Supplementary Table 3). The similarity of the actual and predicted values of the percentage of putative transgenic ajowan plants showed the efficiency of our predictor model in other plant species with low transformation efficiency. By using different data sets of the percentage of putative and/or PCR-verified transformants, other researchers can use these models in Agrobacterium-mediated transformation studies and predict the gene transformation efficiency of their protocols in desired dicot and monocot species. The efficiency of an Agrobacterium-mediated gene transformation protocol can be determined through the percentage of regenerated plants in the selective medium, the molecular methods employed (percentage of PCR-verified plants and percentage of plants verified through advanced molecular techniques such as RT-PCR and Southern blot), and the expression methods used. In this study, we presented two different predictor models.
The first one is a cost-effective predictive model for Agrobacterium-mediated gene transformation of tobacco, as it uses regenerated plants in the selective medium (putative transgenic plants) as the output of the model and is independent of time-consuming and expensive advanced molecular techniques. The regeneration percentage of inoculated explants in the selective medium is the most important factor that determines the efficiency of the final gene transformation study. The greater the number of putative transgenic plants, the greater the number of transgenic events. Finding and predicting the best combination(s) of factors influencing the percentage of putative transgenic lines will help increase the number of transgenic events in an Agrobacterium-mediated transformation study. Hence, the first model of this study is a fast and cost-effective model for predicting putative transformants in an Agrobacterium-mediated transformation study. The second model, in contrast, is suitable for predicting the transformation efficiency, as the output of this model is the percentage of PCR-verified plants. There is only one recently published paper in this field that used gene transformation efficiency as the output of machine learning models. It is a data mining work that used an "ensemble model" to combine previously published data sets of Agrobacterium-mediated gene transformation of chrysanthemums employing different in vitro regeneration and Agrobacterium-mediated gene transformation protocols (Hesami et al., 2020a). The higher values of R² and lower prediction errors of the developed ANN-MLP models compared with the classical stepwise multiple regression indicate the close agreement between the experimental (observed) data and the values predicted by the established ANN models, and the superiority of these models over the classical model.
The high similarity between the observed (measured) and the ANN-predicted data has been reported in earlier plant in vitro studies (Dutta Gupta and Pattanayak, 2017). Despite these results, it should be noted that uncertainty is of major importance when machine learning algorithms are used for prediction. Uncertainty caused by randomness (aleatoric) and uncertainty caused by ignorance (epistemic) are two inherently different sources of uncertainty, which are usually not distinguishable in a learning algorithm. Additional data (information) can reduce the epistemic uncertainty in supervised learning algorithms (Hüllermeier and Waegeman, 2021).
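The distinction between the two sources of uncertainty can be illustrated with one common approximation, which was not used in this study: an ensemble of models trained on resampled data, where the disagreement between members is read as epistemic uncertainty and the average noise each member expects is read as aleatoric uncertainty. The sketch below assumes that each ensemble member supplies a mean prediction and a noise variance for a given input; the function name `decompose_uncertainty` is hypothetical.

```python
def decompose_uncertainty(member_means, member_variances):
    """Split the predictive variance of an ensemble at a single input point.

    member_means:     mean prediction of each ensemble member
    member_variances: noise variance each member predicts around its own mean

    Assumption: the law-of-total-variance decomposition commonly applied to
    ensembles; this is an illustration, not the authors' method.
    """
    if len(member_means) != len(member_variances) or not member_means:
        raise ValueError("need one mean and one variance per ensemble member")

    k = len(member_means)
    ensemble_mean = sum(member_means) / k

    # Aleatoric part: the average noise the members themselves expect.
    aleatoric = sum(member_variances) / k

    # Epistemic part: disagreement between members about the mean prediction.
    epistemic = sum((m - ensemble_mean) ** 2 for m in member_means) / k

    return {
        "mean": ensemble_mean,
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }
```

In this picture, additional training data would be expected to bring the ensemble members closer together and shrink the epistemic term, whereas the aleatoric term reflects variability that more data cannot remove.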
The sensitivity analysis of the established ANN models showed that the Agrobacterium strain is more important than the other parameters investigated in the gene transformation of tobacco. In the sensitivity analysis of their ensemble model for Agrobacterium-mediated gene transformation of chrysanthemums, Hesami and Jones (2020) also reported the highest variable sensitivity ratio (1.86) for the Agrobacterium strain. Gehl et al. (2020) reported a better regeneration rate and a potentially higher number of shoots in the genetic transformation of Campanula medium by the AGL1 Agrobacterium strain compared to the GV3101:pMP90 and ABI strains. The LB4404 Agrobacterium strain was more efficient than the AGL1 strain in the gene transformation of foxtail millet (Setaria italica L.) (Sood et al., 2020). Host-pathogen interaction is an important factor in Agrobacterium-mediated T-DNA delivery, as it can significantly affect the survival rate and regeneration activity of the transformed explants (Agarie et al., 2020). The transcription levels of the virulence (vir) genes in the induction medium can affect the transformation efficiency of Agrobacterium tumefaciens strains, as high transcriptional levels of vir genes were important for successful transformation (Wang et al., 2020).
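The idea behind a variable sensitivity ratio can be sketched as follows. This assumes one widely used definition, which may differ from the procedure applied in this study: the model error obtained when one input is replaced by its mean, divided by the error of the full model. It also assumes that all inputs, including the Agrobacterium strain, are numerically encoded. The function names and the toy data are hypothetical illustrations and are not results from this work.

```python
import math


def rms_error(model, rows, targets):
    """Root mean square error of model(row) against the target values."""
    squared = [(model(r) - t) ** 2 for r, t in zip(rows, targets)]
    return math.sqrt(sum(squared) / len(squared))


def sensitivity_ratios(model, rows, targets):
    """Return one ratio per input: error with that input masked / baseline error.

    Assumption: an input is "masked" by replacing it with its mean value.
    Ratios well above 1 suggest the model relies on that input.
    """
    baseline = rms_error(model, rows, targets)
    if baseline == 0:
        raise ValueError("baseline error is zero; the ratio is undefined")

    ratios = []
    for j in range(len(rows[0])):
        mean_j = sum(r[j] for r in rows) / len(rows)
        masked_rows = []
        for r in rows:
            masked = list(r)
            masked[j] = mean_j  # remove the information carried by input j
            masked_rows.append(masked)
        ratios.append(rms_error(model, masked_rows, targets) / baseline)
    return ratios


# Toy illustration (hypothetical data): input 0 has a larger effect than input 1.
toy_rows = [[0, 0], [1, 0], [0, 1], [1, 1]]
toy_targets = [0.1, 2.9, 1.1, 3.9]


def toy_model(row):
    return 3 * row[0] + row[1]


ratios = sensitivity_ratios(toy_model, toy_rows, toy_targets)
```

In the toy case, masking the first input degrades the fit far more than masking the second, so the first input receives the larger ratio, which is the sense in which the strain is ranked as the most influential parameter above.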
Other parameters also affect the efficiency of a gene delivery study. Parameters related to plasmids for optimizing expression (sub-optimal promoter, enhancer, poor codon usage, 5'UTR sequence, triggering of silencing, integration of the gene into a silent region of chromatin) along with the in vitro regeneration parameters (type of the basal culture medium, explant type, explant age, type and concentration of PGRs, additives, etc.) are also involved in the results of an Agrobacterium-mediated transformation. Therefore, two sets of optimizations are needed to develop an efficient gene transformation protocol: optimization of the tissue culture protocol parameters and optimization of the gene transformation protocol parameters. Considering these parameters together can help to achieve a comprehensive and reliable model. However, this requires a huge number of experiments. In addition, there are some in planta transformation methods that are independent of the in vitro regeneration parameters (Niazian et al., 2017). Therefore, the results of this study are valuable for standard and in planta Agrobacterium-mediated transformation of plants.

CONCLUSIONS
The optimization of plant in vitro studies, by taking into account all the influential factors, is laborious, time-consuming, and challenging because of its multi-factorial nature. A powerful data analysis can help researchers improve the efficiency, time, and cost-effectiveness of their techniques, and subsequently generate a better decision-making tool in complex biological processes. Using complex mathematical functions, ANNs are able to analyze non-deterministic and non-linear data sets of plant in vitro studies. Modeling and predicting complicated in vitro processes, such as Agrobacterium-mediated gene transformation, through ANNs can be useful in identifying the influencing factors. Establishing an optimized model in a specific plant genotype can be helpful in overcoming the barriers of genetic engineering in important Agrobacterium-recalcitrant plant genotypes.
This study demonstrates that a novel ANN model is an accurate approach for assessing the effect of Agrobacterium strains, Agrobacterium cell densities, acetosyringone concentrations, and inoculation durations on the percentage of putative and PCR-verified transgenic plants in tobacco. Based on our results, the greatest actual and predicted values of the percentage of putative and PCR-verified transgenic plants were obtained by 20-min inoculation of tobacco leaf explants in a suspension of the LB4404 strain of Agrobacterium at an optical density (OD600) of 0.8 with 300 µM of acetosyringone. The Agrobacterium strain was the most important factor among all parameters studied. The predicted values of the percentage of putative and PCR-verified transgenic plants were close to those observed, which indicates the efficiency of the established ANN models. The developed model was also efficient in predicting the gene transformation efficiency of an important medicinal plant with a low rate of transformation. Through precise and efficient data interpretation, ANN could be helpful for optimizing the conditions in Agrobacterium-mediated gene transformation studies with regard to all the influential parameters (expression plasmid optimization, Agrobacterium, and in vitro regeneration parameters), and could open the way for targeted genome editing methods, such as the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
GN and MN conceptualized this study, curated the data, performed the formal analysis, and validated the results. Investigation was carried out by MN. The methodology used in this study was framed by GN and MN, while MN handled the project administration and provided the resources. GN provided the software and the visualization of the results. GN and MN wrote, reviewed, and edited the original draft, while PS revised the manuscript for important intellectual content. All authors contributed to the article and approved the submitted version.