Process Characterization by Definitive Screening Design Approach on DNA Vaccine Production

Plasmid DNA is a vital biological tool for molecular cloning and transgene expression of recombinant proteins; over the past decades, it has also become exceptionally appealing as a potential biopharmaceutical product for genetic immunization in animals and humans. The demand for large-scale production of DNA vaccines has increased accordingly. Herein, we present a systematic approach for process characterization of fed-batch Escherichia coli DH5α fermentation producing a porcine DNA vaccine. Design of Experiments (DoE) was employed to determine the process parameters that affect a critical quality attribute of the product, namely the content of the active form of plasmid DNA, referred to as supercoiled plasmid DNA, as well as the performance attributes, which are the volumetric yield and specific yield from fermentation. The parameters of interest were temperature, pH, dissolved oxygen, cultivation time, and feed rate. The definitive-screening design comprised 16 runs, including 3 additional center points, and was used to build the predictive model, which was then used to simulate the operational ranges for capability analysis.


INTRODUCTION
Gene immunization, including DNA vaccination, has become an attractive approach because of its well-documented safety compared with live attenuated viral vaccines, the absence of specific immune responses to the plasmid, and the absence of genetic integration (Robinson, 2000; Wahren and Liu, 2014). A DNA vaccine is genetically engineered DNA containing a transgene that expresses a specific antigen in cells or tissues (Ingolotti et al., 2010). To date, six DNA vaccines have been approved for veterinary applications, which include preventive vaccines for West Nile virus infection in horses (Davidson et al., 2005) and hematopoietic necrosis virus infection in salmon (Garver et al., 2005), a therapeutic cancer vaccine for dogs (Bergman et al., 2006), a growth hormone gene therapy to increase litter survival in breeding sows (Person et al., 2008), a vaccine for pancreas disease in Atlantic salmon (EMA, 2017), and a conditionally licensed vaccine for H5N1 in chickens (Agrilabs, 2017). A number of DNA vaccines are also undergoing clinical studies for human use, including GX188E and VGX-3100 for human papillomavirus (Cheng et al., 2018), a prime/boost of DNA.Mel3 with MVA.Mel3 for the treatment of advanced metastatic melanoma (Dangoor et al., 2010), and INO-4800 for COVID-19 (Smith et al., 2020).
The production process for a DNA vaccine consists of several steps that aim to achieve the quantity and quality needed to meet product specifications. The US Food and Drug Administration recommends a supercoiled content of at least 80%, as this form has superior biological activity compared with other plasmid forms (Urthaler et al., 2005; U.S. FDA, 2007). Therefore, it is indispensable to demonstrate process robustness with identified critical process parameters (PPs) and critical quality attributes (CQAs) during the process development phase. This process characterization applies the Quality by Design principle to help establish a rational and cost-effective approach to process design and optimization (ICH, 2009; McCurdy, 2011). With Design of Experiments (DoE), all studied parameters are varied simultaneously, so that sufficient information can be obtained from a minimum number of experimental runs (Mandenius and Brundin, 2008; Montgomery, 2017). The conventional DoE scheme starts with screening designs to determine the significant main effects, followed by response-surface models to justify the design space. This requires many experimental runs to gain sufficient data for further analysis and therefore consumes more time and resources (Abu-Absi et al., 2010; Erler et al., 2013). An alternative one-step design method, named definitive-screening design (DSD), was introduced by Jones and Nachtsheim. This design method has several desirable properties (Jones and Nachtsheim, 2011; Xiao et al., 2012; Erler et al., 2013; Nguyen and Stylianou, 2013; Tai et al., 2015):
1. it allows the identification of main effects, quadratic effects, and two-factor interactions based on the sparsity-of-effects principle;
2. it provides an orthogonal design in which the main effects are uncorrelated with two-factor interactions and quadratic effects, and two-factor interactions are not fully aliased with one another;
3. it requires as few as 2m + 1 runs when the number of factors m is even, or 2m + 3 runs when m is odd.
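The run-count rule and the orthogonality property listed above can be illustrated with a short sketch. It assumes the common conference-matrix construction of a DSD (a conference matrix, its fold-over, and one center run) and builds the matrix with the Paley method; the function names are hypothetical, and the resulting coded design is not necessarily the one JMP Pro generated for this study.

```python
import numpy as np


def min_dsd_runs(m: int) -> int:
    """Minimum DSD run count: 2m + 1 for even m, 2m + 3 for odd m."""
    return 2 * m + 1 if m % 2 == 0 else 2 * m + 3


def paley_conference_matrix(q: int = 5) -> np.ndarray:
    """Symmetric conference matrix of order q + 1 for a prime q with q % 4 == 1."""
    residues = {(x * x) % q for x in range(1, q)}

    def chi(a: int) -> int:
        # Quadratic character mod q: 0 for 0, +1 for residues, -1 otherwise.
        a %= q
        return 0 if a == 0 else (1 if a in residues else -1)

    C = np.ones((q + 1, q + 1), dtype=int)
    C[0, 0] = 0
    for i in range(q):
        for j in range(q):
            C[i + 1, j + 1] = chi(i - j)
    return C


def dsd_five_factors() -> np.ndarray:
    """13-run, 5-factor DSD in coded units (-1, 0, +1); illustrative only."""
    C = paley_conference_matrix(5)                   # 6 x 6 conference matrix
    center = np.zeros((1, C.shape[1]), dtype=int)    # overall center run
    six_factor = np.vstack([C, -C, center])          # fold-over pairs plus center
    return six_factor[:, :5]                         # drop one column for m = 5


design = dsd_five_factors()
print(design.shape, min_dsd_runs(5))
```

For an odd number of factors such as m = 5, the design is taken from the next even case (m + 1 = 6) with one column dropped, which is why 2m + 3 = 13 runs are needed. Because every run has a mirror-image partner, the main-effect columns are orthogonal to each other and to all quadratic and two-factor interaction columns.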
DSD has become widely used in diverse applications including paint manufacturing, green energy, and biotechnology. In this study, we employed the DSD method to characterize the process of porcine DNA vaccine production. The method involved five PPs at the Escherichia coli DH5α fermentation step to investigate their impact on the CQA, which is the supercoiled content, and on the performance attributes (PAs), which are the volumetric yield and specific yield. Finally, a simulation accounting for the variability expected in larger-scale production was run using the prediction model. The relationships presented here are expected to demonstrate the robustness of the fed-batch fermentation process for subsequent process validation and future commercial manufacturing. The knowledge gained can then be used to improve process performance during process development.

Inoculum and Fermentation
The inoculum was cultured in 20 mL Luria broth overnight at 37 °C with agitation at 200 rpm (New Brunswick Innova 43R, United States) and transferred to the main cultivation in a 2 L fermentor (Sartorius BIOSTAT B Plus, Germany) with semidefined medium comprising 3 g/L KH2PO4, 6 g/L Na2HPO4, 2 g/L NH4Cl, 1 g/L MgSO4, 20 g/L yeast extract, and 5 g/L glycerol. The starting volume was 0.7 L, and the initial OD600 was set to approximately 0.02. Batch fermentation was performed with set points of 37 °C, pH 7, 30% DO, and 1 vvm air flow rate. When the glycerol in the culture was completely consumed, the fed-batch phase was started with a 200 g/L glycerol feed. The feed rate started at 3 mL/h and was increased linearly at a rate of 0 to 0.6 mL/h/h, depending on the designed experimental run. Set points for the PPs, including temperature, pH, %DO, cultivation time, and feed rate, were varied according to the DSD as described in section "Design of Experiment".

Design of Experiment
Using prior knowledge and risk assessment tools, the CQAs and PAs for DNA vaccine production were identified, which led to a list of PPs with their associated ranges, shown in Table 1. Five PPs for fed-batch fermentation, namely temperature, pH, dissolved oxygen, feed rate, and cultivation time, were investigated. The sixteen experimental runs designed by DSD using JMP Pro software (SAS Institute Inc., Cary, United States) are listed in Table 2. They include three replicate runs at the center point, added to better estimate the experimental error.

Predictive Model Building and Process Robustness Study
Predictive models were fitted with the all-possible-models approach using JMP Pro software, and the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) were used as the model selection criteria for determining active effects (Burnham and Anderson, 2004). The relationship between a response (y) and the variable factors (x) can be described by the following quadratic predictive model:

$$y = \beta_0 + \sum_{j=1}^{m} \beta_j x_j + \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} \beta_{jk} x_j x_k + \sum_{j=1}^{m} \beta_{jj} x_j^2 + \varepsilon$$

where $\beta_0$ is the constant; $\beta_j$, $\beta_{jk}$, and $\beta_{jj}$ are the regression coefficients for the linear, interaction, and quadratic terms, respectively; and $\varepsilon$ is the error. Model selection was evaluated using a combined AICc and BIC approach in which models with AICc less than or equal to 4 and BIC less than or equal to 2 were selected. Other statistical values, such as the coefficient of determination (R2), the predicted residual error sum of squares (PRESS), and the root mean squared error (RMSE), were then used as criteria to choose the best prediction model for each attribute. The prediction profiler function in JMP Pro software was then used for process optimization and simulation studies. The optimization was expected to show which factors most strongly influence the fermentation process of this vaccine production. For the simulation study, a Monte Carlo approach was used because it accommodates a variety of distribution types; 100,000 simulation runs were conducted. The results of this simulation provide the tolerance interval of the PAs, which can then be set as the action and alert limits, as well as the acceptable range of the CQA to establish the specification of the product. The algorithm based on the construction of all possible models with DSD is displayed in Figure 1.
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org

Sample and Sampling Preparation
A cell sample from each experimental run was taken at the end of the batch for OD600 measurement and for DNA quantification and qualification. OD600 was measured with a spectrophotometer (Eppendorf BioSpectrometer Kinetic, United States). Plasmid DNA was extracted from a 250 µL cell sample according to the manufacturer's protocol (QIAprep Spin Miniprep Kit, United States) and then eluted with 100 µL elution buffer (10 mM Tris, pH 8.5). Plasmid DNA was quantified using the A260 method (McGown, 2000; Stephenson, 2003). The volumetric yield (mg pDNA/L) was calculated taking into account the dilution arising from the sample volume taken and the DNA elution volume. The specific yield of DNA (mg pDNA/L/OD600) was obtained by dividing the volumetric yield by the cell density, OD600. The supercoiled DNA content was determined on an agarose gel (0.7%) stained with ethidium bromide (0.5 µg/mL). As different DNA isoforms have distinct migration patterns on agarose gels (Aaij and Borst, 1972), the supercoiled DNA band can be identified, and the proportion of supercoiled DNA was then calculated from the band intensity using ImageJ software (Schneider et al., 2012).
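The yield calculations described above can be sketched as follows. This is a minimal illustration, assuming the standard conversion of 50 µg/mL double-stranded DNA per A260 unit (1 cm path length) and complete recovery from the miniprep; the function name, the cuvette dilution argument, and the example readings are hypothetical and are not values from this study.

```python
def plasmid_yields(a260: float, od600: float, dilution: float = 1.0,
                   sample_ul: float = 250.0, elution_ul: float = 100.0,
                   ug_per_ml_per_a260: float = 50.0) -> tuple:
    """Return (volumetric yield in mg pDNA/L, specific yield in mg pDNA/L/OD600)."""
    # DNA concentration in the eluate; ug/mL is numerically equal to mg/L.
    eluate_conc = a260 * dilution * ug_per_ml_per_a260
    # Scale back to the culture: DNA from sample_ul of culture ends up in elution_ul.
    volumetric = eluate_conc * elution_ul / sample_ul
    # Normalize by cell density to obtain the specific yield.
    specific = volumetric / od600
    return volumetric, specific


# Hypothetical readings: A260 = 0.5 after a 10-fold cuvette dilution, OD600 = 25.
volumetric_yield, specific_yield = plasmid_yields(a260=0.5, od600=25.0, dilution=10.0)
print(volumetric_yield, specific_yield)
```

The defaults of 250 µL sample volume and 100 µL elution volume follow the protocol in the text; the factor 100/250 is what converts the eluate concentration into a per-liter-of-culture yield.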

RESULTS AND DISCUSSION
Prior scientific knowledge of DNA vaccines and risk assessment were used to select the attributes according to their impact on clinical safety or efficacy and on the manufacturing process. The form of plasmid DNA, expressed as % supercoiled DNA and referred to as %SC, directly affects the efficacy of DNA vaccines; it is therefore listed as one of the specifications given by the regulatory authorities (U.S. FDA, 2007) and was consequently considered the CQA in this study. PAs were introduced to evaluate how well the process performs. They also relate to the acceptable process range that is used to ensure effective process performance and achievement of the desired product specifications. For the purpose of our experiment, the volumetric yield, derived from the DNA extracted at the end of each experimental batch, and the specific yield of plasmid DNA, obtained by normalizing the volumetric yield to the corresponding OD600 value, were selected as our PAs. A DoE for E. coli DH5α fed-batch fermentation producing pTH.PRRSV_GP5 was created with five selected PPs to build the predictive models for the DNA production process. The five PPs evaluated were temperature, pH, dissolved oxygen (%DO), cultivation time, and feed rate. With a conventional approach, 16 runs would be required for a fractional factorial design and 46 runs for a response surface design, whereas with DSD, the number of experimental runs was reduced to 13. This is the main advantage of DSD, together with its orthogonality and the fact that main effects are free of aliasing with quadratic effects and two-way interactions. Therefore, DSD has been recommended for use in the earliest stage of an experimental study, where the number of potentially important variables is large, typically more than 4, to identify the highly influential factors.
As mentioned above, this work aims to characterize the process of porcine DNA vaccine production, particularly the fermentation step. Thus, we chose DSD as a tool to design the experiment and to better understand the potentially important factors that may affect our attributes of interest, the PAs and the CQA. With DSD, 13 runs were designed, and three additional replicate runs at the center point were included for a better estimate of the experimental error. The experimental results are shown in Table 2 and vary substantially: %SC ranged from 70.18 to 80.35, volumetric yield from 64.56 to 121.60 mg/L, and specific yield from 2.28 to 5.26 mg/L/OD600.
The data were then fitted using JMP Pro software. Generally, the model estimates only the main effects, two-factor interactions, and quadratic effects. Therefore, with five PPs, a total of 21 terms could be estimated: 5 main effects, 5 quadratic effects, 10 two-factor interactions, and 1 intercept. However, for DSD, the maximum number of terms that can be estimated is 11 because of the principle of effect sparsity. Subsets of all possible models were then compared using information criteria to investigate which model contains the active effects (Ward, 2008; Mangan et al., 2017). The AICc and the BIC are among the most widely used approaches for model selection. Both measure model performance, with smaller values indicating better model prediction (Burnham and Anderson, 2004). Herein, the candidate sets of models generated for each attribute are displayed in Figure 2. The models with 4-6 terms, showing the lowest AICc or BIC, are expected to provide sufficient prediction capability for %SC (Figure 2A), whereas 4-6 terms (Figure 2B) are expected to suffice for volumetric yield prediction and 3-5 terms (Figure 2C) for the specific yield models.
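The term count and the all-possible-models comparison can be sketched as below. This is an illustrative sketch, not the JMP Pro procedure: it assumes a Gaussian least-squares likelihood with the error variance counted as a parameter, interprets the AICc and BIC thresholds as differences from the lowest value found, limits models to three terms plus an intercept to keep the search small, and uses a synthetic response on a conference-matrix DSD. All function and variable names are hypothetical.

```python
import itertools
import numpy as np


def coded_dsd16() -> np.ndarray:
    """Illustrative 16-run design: 13-run, 5-factor DSD plus 3 extra center runs."""
    chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}           # quadratic character mod 5
    C = np.ones((6, 6), dtype=int)
    C[0, 0] = 0
    for i in range(5):
        for j in range(5):
            C[i + 1, j + 1] = chi[(i - j) % 5]
    return np.vstack([C, -C, np.zeros((4, 6), dtype=int)])[:, :5]


def candidate_terms(D: np.ndarray) -> dict:
    """Main effects, quadratic effects, and two-factor interactions as named columns."""
    m = D.shape[1]
    terms = {f"x{j + 1}": D[:, j].astype(float) for j in range(m)}
    terms.update({f"x{j + 1}^2": D[:, j].astype(float) ** 2 for j in range(m)})
    for j, k in itertools.combinations(range(m), 2):
        terms[f"x{j + 1}*x{k + 1}"] = (D[:, j] * D[:, k]).astype(float)
    return terms


def information_criteria(X: np.ndarray, y: np.ndarray) -> tuple:
    """AICc and BIC of a Gaussian least-squares fit (k includes the error variance)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1
    neg2loglik = n * (np.log(2 * np.pi * rss / n) + 1)
    aicc = neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2loglik + k * np.log(n)
    return aicc, bic


def all_subsets(terms: dict, y: np.ndarray, max_terms: int = 3) -> list:
    """Fit every model with 1..max_terms candidate terms plus an intercept."""
    n = len(y)
    results = []
    for size in range(1, max_terms + 1):
        for names in itertools.combinations(terms, size):
            X = np.column_stack([np.ones(n)] + [terms[t] for t in names])
            results.append((names, *information_criteria(X, y)))
    return results


D = coded_dsd16()
terms = candidate_terms(D)
rng = np.random.default_rng(0)
# Synthetic response (NOT data from the study): one main and one quadratic effect.
y = 75.0 + 2.0 * terms["x1"] - 3.0 * terms["x3^2"] + rng.normal(0.0, 0.1, len(D))

results = all_subsets(terms, y)
best_aicc = min(r[1] for r in results)
best_bic = min(r[2] for r in results)
# Assumed reading of the thresholds: within 4 AICc units and 2 BIC units of the best.
shortlist = [r for r in results if r[1] - best_aicc <= 4 and r[2] - best_bic <= 2]
best = min(results, key=lambda r: r[1])
print(len(terms) + 1, "terms in the full model; lowest-AICc model:", best[0])
```

The 20 candidate columns plus the intercept give the 21 terms counted in the text. Since only 16 runs are available, the full model cannot be fitted, which is why subsets are compared instead.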
Model selection was then refined using the combined AICc and BIC criteria, where AICc was less than or equal to 4 and BIC was less than or equal to 2, followed by statistical values including R2, PRESS, and RMSE. As a result, the model with six terms in Figure 2A was selected for the %SC attribute. This 6-term prediction model included the main effects of %DO, feed rate, and cultivation time; the quadratic effect of pH; and the two-factor interactions of pH with cultivation time and of temperature with cultivation time. The selected model provided a good description of the process, as shown in the actual-by-predicted plot in Figure 3A with an R2 of 0.90, and it was highly significant with no evidence of lack of fit (Table 3).
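The secondary statistics used to rank the shortlisted models (R2, RMSE, and PRESS) can be computed directly from a least-squares fit, as in the sketch below. The PRESS value uses the standard leverage shortcut instead of refitting the model once per left-out run. The one-factor data set is hypothetical and serves only to exercise the function; it is not taken from Table 2.

```python
import numpy as np


def fit_statistics(X: np.ndarray, y: np.ndarray) -> dict:
    """R2, RMSE, and PRESS for an ordinary least-squares fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Diagonal of the hat matrix H = X (X'X)^-1 X' gives the leverage of each run.
    leverage = np.diag(X @ np.linalg.pinv(X))
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    return {
        "R2": 1.0 - rss / tss,
        "RMSE": float(np.sqrt(rss / (n - p))),
        # Leave-one-out prediction error without refitting: e_i / (1 - h_ii).
        "PRESS": float(np.sum((resid / (1.0 - leverage)) ** 2)),
    }


# Hypothetical one-factor quadratic example in coded units.
x = np.linspace(-1.0, 1.0, 9)
X = np.column_stack([np.ones_like(x), x, x ** 2])
rng = np.random.default_rng(1)
y = 80.0 + 1.5 * x - 4.0 * x ** 2 + rng.normal(0.0, 0.5, x.size)

stats = fit_statistics(X, y)
print(stats)
```

A PRESS value much larger than the residual sum of squares suggests that the model depends heavily on individual runs, which is a useful warning with designs as small as a DSD.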
A similar approach was applied to the other attributes. The volumetric yield model with five terms contained the main effects of pH, temperature, and feed rate; the quadratic effect of temperature; and the two-factor interaction of pH and temperature. The specific yield model had three active terms: the main effect and quadratic effect of temperature, and the quadratic effect of pH. All predictive models, as illustrated in the actual-by-predicted plots (Figures 3B,C) and the regression analysis (Tables 3-5), showed a low F ratio for lack of fit, indicating that these models can be used for predicting the results. Hence, three quadratic models were obtained for %SC, volumetric yield, and specific yield, respectively, each consisting of the active terms described above.
Results from these predictive modeling studies were then used to define the ranges of the CQA and PAs. Using the JMP prediction profiler function, 100,000 runs were simulated with Monte Carlo simulation, whose main advantage is that several types of probability distributions can be used (Wang et al., 2007). The input PPs temperature, pH, and %DO were modeled as normal distributions, whereas feed rate was modeled as fixed and cultivation time as a uniform distribution. The simulations were performed with the specified ranges of PPs listed in Table 6. Three key sources of variation were included in this simulation: the mathematical expression or model from the characterized process, the variation of each factor at the targeted set point, and the residual variation not accounted for by the model. The residual variation is derived from the RMSE of the predictive model and reflects the analytical method and any other factor left uncontrolled when building the predictive model. The populations of the CQA and PAs from these simulations are shown in Figure 4.
The simulations provided the following ranges for each attribute: 80.5 ± 1.12 %SC plasmid DNA, 112.4 ± 10.39 mg pDNA/L volumetric yield, and 4.6 ± 0.53 mg pDNA/L/OD600 specific yield. From tolerance interval analysis based on the proportion of the population, the action range for the PAs was set to cover 99.7% of the results and the alert range to cover 95.5% of the results, whereas the acceptable range for the CQA was set to cover 99.7% of the results. The resulting action and alert limits are shown in Table 7. These values can then be applied to scale-up production of this vaccine.
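The simulation and range-setting steps described above can be sketched as follows. This is an illustration under stated assumptions, not the JMP profiler simulation: the model uses the term structure reported for %SC, but every coefficient, set point, standard deviation, and the RMSE are invented placeholders in coded units (the values of Tables 3 and 6 are not reproduced here), and simple percentile ranges stand in for a formal tolerance interval analysis.

```python
import numpy as np

N_RUNS = 100_000
rng = np.random.default_rng(42)

# Hypothetical input distributions in coded units (NOT the ranges of Table 6).
temperature = rng.normal(0.0, 0.10, N_RUNS)          # normal
ph = rng.normal(0.0, 0.10, N_RUNS)                   # normal
do = rng.normal(0.0, 0.15, N_RUNS)                   # normal
feed_rate = np.full(N_RUNS, 0.5)                     # fixed
cultivation_time = rng.uniform(-0.2, 0.2, N_RUNS)    # uniform


def predicted_sc(temp, ph, do, feed, time):
    """Placeholder quadratic model for %SC; coefficients are invented."""
    return (78.0 + 1.0 * do + 1.5 * feed - 1.0 * time
            - 2.0 * ph ** 2 + 0.8 * ph * time + 0.6 * temp * time)


RMSE = 1.0  # hypothetical residual standard deviation of the fitted model

# Three sources of variation: the model, the factor variation, and the residual.
simulated = (predicted_sc(temperature, ph, do, feed_rate, cultivation_time)
             + rng.normal(0.0, RMSE, N_RUNS))


def central_range(values: np.ndarray, coverage: float) -> tuple:
    """Lower and upper percentiles enclosing the central `coverage` fraction."""
    tail = (1.0 - coverage) / 2.0 * 100.0
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return float(low), float(high)


alert_range = central_range(simulated, 0.955)    # alert level used for the PAs
action_range = central_range(simulated, 0.997)   # action level, or CQA acceptable range
print(alert_range, action_range)
```

Percentile ranges are used here because the simulated population need not be normal once uniform inputs and quadratic terms are involved; with 100,000 runs, the extreme percentiles are still estimated from several hundred simulated points.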
DSD can significantly reduce the development time and cost in the early stage of process development. The Monte Carlo simulations with predictive models are useful tools for process optimization, robustness study and subsequent process validation, and future commercial manufacturing. The utilization of knowledge gained from this study can be used to improve the process performance during the process development.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.