An Optimal Control Framework for the Automated Design of Personalized Cancer Treatments

One of the key challenges in current cancer research is the development of computational strategies to support clinicians in the identification of successful personalized treatments. Control theory might be an effective approach to this end, as proven by the long-established application to therapy design and testing. In this respect, we here introduce the Control Theory for Therapy Design (CT4TD) framework, which employs optimal control theory on patient-specific pharmacokinetics (PK) and pharmacodynamics (PD) models, to deliver optimized therapeutic strategies. The definition of personalized PK/PD models allows to explicitly consider the physiological heterogeneity of individuals and to adapt the therapy accordingly, as opposed to standard clinical practices. CT4TD can be used in two distinct scenarios. At the time of the diagnosis, CT4TD allows to set optimized personalized administration strategies, aimed at reaching selected target drug concentrations, while minimizing the costs in terms of toxicity and adverse effects. Moreover, if longitudinal data on patients under treatment are available, our approach allows to adjust the ongoing therapy, by relying on simplified models of cancer population dynamics, with the goal of minimizing or controlling the tumor burden. CT4TD is highly scalable, as it employs the efficient dCRAB/RedCRAB optimization algorithm, and the results are robust, as proven by extensive tests on synthetic data. Furthermore, the theoretical framework is general, and it might be applied to any therapy for which a PK/PD model can be estimated, and for any kind of administration and cost. As a proof of principle, we present the application of CT4TD to Imatinib administration in Chronic Myeloid leukemia, in which we adopt a simplified model of cancer population dynamics. In particular, we show that the optimized therapeutic strategies are diversified among patients, and display improvements with respect to the current standard regime.


INTRODUCTION
The increasing availability of reliable experimental data on cancer patients and the concurrent decreasing costs of computational power are fueling the development of algorithmic strategies for the automated generation of experimental hypotheses in cancer research. This is particularly relevant in the sphere of precision and personalized medicine, as efficient methods are urgently needed to make sense of available data and support clinicians in delivering patient-specific therapeutic strategies (Salgado et al., 2018). To this end, methods borrowed from optimal control theory (e.g., Bertsekas, 1995;Bailey and Haddad, 2005;Lenhart and Workman, 2007;Aström and Murray, 2010) can be employed in combination with efficient techniques for data analysis (Michor et al., 2005;Tang et al., 2011;Olshen et al., 2014;Rainero et al., 2018), to produce accurate predictive model of the clinical outcome of a given therapy in single cancer patients.
Here, we introduce a theoretical framework named CT4TD (Control Theory for Therapy Design), which employs the RedCRAB optimal control algorithm (Heck et al., 2018b;Omran et al., 2019), on patient-specific pharmacokinetics and pharmacodynamics (PK/PD) models (Welling, 1997), with the goal of delivering an optimized drug administration schedule (see Figure 1 for a schematic representation of the framework).
In brief, PK models describe the temporal dynamics of the concentration of a given drug in a certain tissue or organ, whereas PD models depict the efficacy of the drug with respect to distinct concentration values. The CT4TD framework first defines patient-specific PK models based on demographic factors, such as, e.g., age, sex, and body weight. Such models are employed to automatically identify optimized therapy dosages and/or schedules to reach given target drug concentrations, as those commonly used in clinical practice, also by respecting any desired constraint such as, e.g., the maximum allowed number of doses per day or the maximum dosage. In this way, our framework can guide clinicians in the setting of optimized regimes at diagnosis, allowing for an either more or less aggressive tuning; this approach mimics the steady state optimization commonly proposed in pharmacological studies.
Furthermore, when longitudinal experimental data on tumor burden-e.g., the fraction of tumor cells on the total, in liquid tumors-are available for patients under standard treatment, CT4TD allows to determine optimized personalized strategies to be used in order to minimize or even eradicating the cancer cell subpopulation. In fact, with the CT4TD framework it is possible to fit experimental data with a hierarchical population dynamics model, which describes the temporal evolution of cancer subpopulations in a given tumor (Michor et al., 2005;Stiehl and Marciniak-Czochra, 2012;Werner et al., 2016;Stiehl et al., 2018). Such model allows to measure the impact of a given therapy over the tumor's ability to expand and develop and, accordingly, to estimate patient-specific PD models from clinical data, which are then employed to design optimized therapeutic regimes aimed at reducing the tumor burden.
Therefore, CT4TD can support clinicians in designing personalized therapies both at diagnosis and when longitudinal data on disease progression have become available. In all scenarios, with our approach it is possible to compare the actual therapeutic regime with the optimized one, showing improvements in terms of efficacy, toxicity, and overall costs.
The CT4TD theoretical framework is general and applicable to any kind of drugs, as long as PK/PD models can be retrieved or estimated. Yet, liquid tumors allow to safely adopt several simplifications and define simple and reliable models of population dynamics, avoiding possible complications due to the spatial and morphological properties of solid tumors (Graudenzi et al., 2014(Graudenzi et al., , 2018. For this reason, in this work we apply the CT4TD to the specific case of Imatinib administration in patients with Chronic Myeloid leukemia (CML), and we show the advantages of employing our automated and data-driven framework in terms of increased efficacy of the therapy and reduction of the overall costs and toxicity. In particular, we here present the results of the application of the CT4TD framework in two ideally subsequent scenarios.
In the first case, CT4TD is used to identify the best therapeutic regime to reach selected target drug concentrations, as those commonly used in the clinic (e.g., Gambacorti-Passerini et al., 1997;Peng et al., 2005;Baccarani et al., 2014). This scenario provides indications which can be employed by clinicians at the time of the diagnosis. Importantly, the inclusion of demographic factors within the PK models (Widmer et al., 2006) allows to define personalized drug schedules that are different from standard practice. A robustness analysis to assess the impact of intra-patient variability and of possible systematic errors proves the safety of the hypotheses generated with our approach, especially with respect to possible technical or measurement errors.
In the second case, we employ longitudinal data on tumor burden of a selected cohort of CML patients under standard treatment, in order to retrieve personalized PD models and, accordingly, to identify an adjusted therapy that is most effective in minimizing cancer subpopulation, once the major molecular response has been observed. In both cases the results allow to explicitly evaluate the advantages in costs and improved efficacy with respect to standard therapies.

Pharmacokinetic and Pharmacodynamic Models
Pharmacokinetic (PK) models (Welling, 1997) are mathematical models that describe the temporal evolution of the concentration of a substance in a certain tissue of the body. Commonly used techniques to study such processes are the so-called compartmental models (Welling, 1997), i.e., dynamical models based on the law of conservation of mass, and which assume that the body is composed by a certain number of macroscopic coupled subsystem, namely compartments. Such models assume an instantaneous mixing of the drug in a compartment and a perfect transport among them, and are usually defined via systems of differential equations (e.g., Schwilden, 1981).
The CT4TD framework employs demographic factors such as body weight, age, and sex to define patient-specific parameters of the pharmacokinetic models. We here focus on the case of Imatinib administration in Chronic Myeloid leukemia (CML). (B) CT4TD manages two working scenarios: (i) at Frontiers in Bioengineering and Biotechnology | www.frontiersin.org FIGURE 1 | time of diagnosis, CT4TD can be used to reach given optimal/lower-bound drug concentration targets, e.g., from clinical studies (steady state optimization); (ii) when longitudinal data on tumor burden variation under standard therapy are available, CT4TD fits the data points with a hierarchical population dynamics model of CML, and this allows to estimate patient-specific pharmacodynamics (PD) parameters, based on the observed cancer cell death rate. In both scenarios, optimization on pharmacokinetics/pharmacodynamics (PK/PD) models is performed via RedCRAB, on distinct cost functions, aimed at: either being close to given target concentrations (and strictly larger in the lower-bound case)-WS (i); minimizing the Area Under the Curve (AUC) and the tumor burden-Working Scenario (WS) (ii). (C) optimized personalized dosage and schedule are returned, allowing to measure in silico the differences with respect to standard administration, in terms of dosage, drug concentration, cost, and AUC. WS (ii) allows to predict the tumor burden evolution in case of an optimized therapy.
The solution of such systems provide predictions about the variation of drug concentration in time, in a certain tissue. A limitation of PK models is the employment of coarse-grained oversimplifications, which require ad-hoc assumptions and are valid only for sufficiently long timescales. Pharmacodynamic (PD) models (Welling, 1997;Rowland et al., 2011) study the relationship between the concentration of a drug and the resulting effect, in terms of efficacy and possible adverse effects (AEs). The effects of a certain substance are estimated by modeling relevant biochemical reactions, usually by exploiting the law of mass action (see, e.g., Goutelle et al., 2008). One of the major limitations of PD models is that it is usually impossible to have all the measurements necessary to determine the kinetic constants of the involved chemical reactions. For this reason, the efficacy of a drug is usually estimated with statistical methods and target concentrations are defined with respect to some arbitrary criteria (Peng et al., 2005;Larson et al., 2008a;Takahashi et al., 2010;von Mehren and Widmer, 2011;Baccarani et al., 2014).
PK/PD models are increasingly used to define new drug dosage guidelines and protocols (e.g., Peng et al., 2005). Nonetheless, standard approaches to this end are affected by several major issues. Usually the optimal dose is identified in phase I dose escalating clinical trials. Moreover, such trials may suffer from possible idiosyncrasies of the study, from the presence of unknown confounding factors and from the often limited sample size. Another problem is that the recommended dosage is often defined as optimal for an ideal-and non existingaverage patient, because the efficacy is only defined statistically. As a consequence, the same drug dosage/schedule might be either insufficient or exceeding for different patients. In the former case, this might lead to a non-optimal clinical outcome, in terms of lower efficacy of the treatment, whereas in the latter case an excess of drug may raise the probability of AEs, as well as the economic cost of the therapy, an aspect that is particular relevant for oncological therapies (Fojo and Grady, 2009;Himmelstein et al., 2009;Experts in Chronic Myeloid Leukemia, 2013;Gomez-de León et al., 2017;Jabbour et al., 2017).
Therefore, effective strategies for the identification of optimized personalized therapy schedules are needed, in order to possibly reduce the amount of drug and minimize the probability of related adverse effects, while providing the same or an even better efficacy-i.e., clinical outcome-, with respect to the standard administration schedule. As a side effect, an optimized personalized schedule would also deliver a minimal economic cost, i.e., more patients will be able to afford its costs.
In this respect, CT4TD allows to: (i) define patient-specific PK models that depend on a number of demographic factors and biological covariates, such as age, sex, ethnicity, and body weight, as proposed by Widmer et al. (2006); (ii) estimate personalized PD models from longitudinal experimental data on tumor burden (if available). This allows to identify personalized therapeutic strategies, which explicitly account for the expected differences in PK and PD, due to the physiological heterogeneity of the individuals. It is important to stress that population PK/PD models are employed in a wide range of distinct diseases such, e.g., cancer (Yoshitsuga and et al., 2012), HIV (Chan et al., 2011), diabetes (Landersdorfer and Jusko, 2008), as well as in anesthesia administration (Potts et al., 2008).

Applications of Optimal Control Theory in Medicine
Control theory is an interdisciplinary field bridging engineering and mathematics, whose main objective is to define an opportune control function that modifies the state of a given dynamical system in order to perform a specific task, while minimizing the cost and maximizing the performance (Bertsekas, 1995;Lenhart and Workman, 2007;Aström and Murray, 2010) (see section 3 for a technical description).
Two main classes of controls exist: (i) open-loop control, and (ii) closed-loop (feedback) control. In the former case, the set and sequence of control actions is chosen a priori, by exploiting theoretical study on the models. In this case, the input is independent with respect to the output (e.g., possible measurements on the system) (Lenhart and Workman, 2007). Closed-loop control, instead, introduces in the procedure one or more feedback loops, which are able to quantify the real response of the system to variations of the control functions, and adjust them according to the differences recorded between the theoretical and real behaviors of the system (Aström and Murray, 2010).
There are several examples of successful applications of control theory in pharmacology (see Bailey and Haddad, 2005;Shi et al., 2014). In this respect, the final goal is to determine the optimal set of therapeutic choices-e.g., dosages and schedulesto obtained a desired efficacy, while minimizing the overall costs. Closed-loop controls are extremely effective in achieving this goal and have been often implemented in real-world healthcare settings (Haddad et al., 2006;Steil, 2013;Jayachandran et al., 2014;Shi et al., 2014;Babaei and Salamci, 2015;Fuentes-Garí et al., 2015;Naşcu et al., 2015). Nonetheless, technological problems, such the absence of real-time measurements, as well as possible problems in titrating drugs to the right concentration, are still limiting real-life applications (Bailey and Haddad, 2005;Cunningham et al., 2018). For this reason, open-loop controls are still a viable option, mostly because of the applicability in a wide range of real-world scenarios for which, for instance, realtime measurements and/or therapy adjustments are unfeasible. Moreover, open-loop controls have proven to identify more effective drug concentration in therapeutic ranges than standard clinical practice (Barbolosi and Iliadis, 2001;Ledzewicz and Schättler, 2006;Zhu and Qian, 2014;Bara et al., 2017;Rocha et al., 2018;Yoon et al., 2018).
However, many approaches in both categories are based on limiting assumptions. Certain techniques, for instance, assume continuous-yet unrealistic-drug infusion procedures (e.g., Pefani et al., 2013). Some methods rely on often speculative mathematical models, which cannot be evaluated due to the lack of opportune experimental data (Yoon et al., 2018).
The CT4TD framework aims at improving the current state-of-the-art, by solving an open-loop control problem on PK/PD models via RedCRAB (Heck et al., 2018b;Omran et al., 2019), a remote suite based on dCRAB (Doria et al., 2011;Rach et al., 2015), an algorithm for optimization and control. The dCRAB algorithm is particularly suitable for complex optimization problems when it is neither possible or efficient to build the gradient from the set of differential equations, defined by the main dynamics. In the aforementioned case, the standard gradient-based methods could not be efficient or failed the gradient calculation. The dCRAB optimal control tool has the peculiarity to avoid local traps by changing the optimization basis and paves also the possibility to perform a closed-loop optimization, using the feedback provided by the patient's response. Extensions in this sense are underway.

Mathematical Modeling of Cell Population Dynamics
Many healthy and aberrant biological tissues are characterized by a hierarchical organization, constituted by an ordered sequence of discrete maturation states, as driven by differentiation processes. In this respect, a number of mathematical models have been proposed to study the cell population dynamics, both in healthy systems (Marciniak-Czochra and Stiehl, 2013) and in cancer (Michor et al., 2005;Tang et al., 2011;Stiehl and Marciniak-Czochra, 2012;Olshen et al., 2014;Altrock et al., 2015;Werner et al., 2016;Stiehl et al., 2018).
In such models, cells are divided in n non-intersecting compartments, with every ensemble representing a certain stage of cell differentiation. The time ordering of the differentiation stage defines an explicit hierarchy among such ensembles. Accordingly, a lineage is defined as a collection of compartments that fully describe all the stages of differentiation of cells within a certain tissue (see Figure S7).
Various approaches are employed to model the dynamics of such systems such as, e.g., ordinary differential equations (ODEs), discrete-time and continuous-time Markov chains, master equations, etc. (see Altrock et al., 2015 for a recent review). Each strategy displays a specific trade-off in terms of expressivity and computational complexity. For instance, ODEs are very convenient from the computational perspective, but they are not suitable in certain cases, e.g., when representing low numbers of cells. Conversely, probabilistic models allow for a richer representation of the system, yet at the cost of a higher computational burden and mathematical complexity.
For sake of simplicity, the CT4TD framework employs a ODEs hierarchical model of cell population dynamics to fit longitudinal data on tumor burden (Michor et al., 2005;Tang et al., 2011;Stiehl and Marciniak-Czochra, 2012;Olshen et al., 2014;Altrock et al., 2015;Werner et al., 2016;Stiehl et al., 2018). On the one hand, this model provides a description of cell population dynamics in time for any given patient. On the other hand, it allows to estimate the efficacy of the therapy in each patient, on the basis of the observed cancer subpopulation decay, which is then used to estimate patient-specific PD models (see section 3 for further details).

Estimation of Patient-Specific PK Models of Imatinib in CML
In order to describe the various steps of the CT4TD pipeline in detail, we here present its application to the specific case of Imatinib administration in CML. Yet, we stress that the theoretical approach is general and could be applied for any therapy for which PK/PD models can be estimated.
We here employ the PK model of oral administration of Imatinib introduced by Widmer et al. (2006) (see Figure S1): if χ g (t) is the amount of Imatinib in the gastrointestinal tract, k a is the first order absorption rate, f is the bioavailability, i.e., the fraction of an administered dose of unchanged drug that reaches the circulatory system, and D the ingested dose, then: so if v is the volume of the distribution, i.e., the theoretical volume needed to account for the overall amount of drug in the body in case the drug was evenly distributed throughout the body, CL the clearance, i.e., the volume of plasma cleared of the drug per unit time, C(t) the concentration in the blood, χ b (t) the amount of Imatinib in the blood ( An example of the solution of Equation (2) can be found in Figure S2. Both equations can be tuned to consider demographic factors as body weight, age and sex, thus providing patient-specific PK models. More in detail, such demographic factors can be incorporated in the clearance CL and in the volume of the distribution v, as initially proposed by Widmer et al. (2006): where θ i , for i = a, b, 1, 2, 3, 4 are constants, BW is the body weight of the patient and BW is its population-average, AGE is the age of the patient and AGE its population-average and q is a binary variable which takes value 1 for male and 0 for female. Estimation of such parameters is provided by Widmer et al. (2006) and shown in Tables S1, S2. As in the dataset used in the case study, only the information about age and sex was available, we estimated the corresponding BW in each patient on the basis of average measures provided by McDowell et al. (2005) 1 .

Therapy Simulation
In order to simulate a therapy, we need to model a multi-dose oral administration. Let t in and t fin be the initial and the final time of the therapy, we suppose to give n+1 doses at time t 0 , t 1 , t 2 , . . . , t n , with a dose amount D i (i = 0, 1, 2, 3, . . . n), respectively. Thus, we have n first order differential equations like Equation (1); by using the superposition principle the solution will be: where (2) it is possible to study the dynamics of blood concentration of a certain drug for a multi-dose oral administration (see Figure S2 for an example). Notice that

Estimation of Patient-Specific PD Models From Experimental Data
CT4TD includes a data analysis module, aimed at identifying patient-specific parameters of the PD model from experimental data, and which relies on a widely-used model of cancer population dynamics. The following subsection include details on each pipeline step.

Population Dynamics Model of Leukemia
In CT4TD we use the simplest compartmental model of population dynamics for which it is possible to estimate the parameters from available experimental data. In detail, the organization of leukemic systems is characterized by a hierarchy that is analogous to the healthy hematopoietic counterpart, and which can be modeled in the simplest case with two compartments: (i) cancer stem cells (CSC) and (ii) progressively differentiated cancer cells (Michor et al., 2005;Tang et al., 2011;Stiehl and Marciniak-Czochra, 2012;Olshen et al., 2014;Wodarz et al., 2014;Werner et al., 2016;Stiehl et al., 2018). 1 It is known that other factors can change the value of the clearance and the volume of the distribution. For instance, the concentration of the α 1 -acid glycoprotein affects the volume of the distribution, whereas the MDR1 genotype, the CYP3A4 activity and the creatinine clearance affect both CL and v (Widmer et al., 2006). However, we here limit to consider the aforementioned demographic measurements, because of their availability in our and in most datasets. Note that the model could be generalized to account for m lineages, in order to represent the possible presence of subpopulations of tumor cells with distinct properties (e.g., therapy resistant phenotypes), and to account for complex interaction phenomena (e.g., between lymph nodes/bonemarrow and blood-stream). However, in order to allow for an accurate and robust parameter estimation, more complex models of population dynamics would typically requireamong other things-a much higher number of data points than those usually available. In addition, limitations regarding parameter identification of ODE models with inadequate data (i.e., identifiability of a model) were described in Saccomani et al. (2010) and Hong et al. (2019), and justify our choice of adopting highly simplified models, at least until new suitable experimental data will become available.
In this case, first order differential equations are suitable to describe the population dynamics, because experimental evidences show that the proliferation of healthy cells display an exponential increase (Marciniak-Czochra and Stiehl, 2013), whereas cancer cells under therapy display an exponential decay (Michor et al., 2005;Tang et al., 2011;Olshen et al., 2014;Rainero et al., 2018). Thus, we analyse the fluxes between compartments, by defining the following constants: • p i,k is the division rate of the cells in the i ensemble of the k lineage. • d i,k is the death rate of the cells in the i ensemble of the k lineage-this rate will be estimated from experimental data. • a i,k ∈ [0, 1] is the probability that, when a cell undergoes mitosis, both of its daughters belong to the i ensemble in the k lineage; therefore, 1 − a i,k is the probability of belonging to the i + 1 ensemble. With respect to CSCs (or SC) this quantifies the self-renewal process.
Note that, we consider a symmetric differentiation scheme, according to which after mitosis both cells are of the same type, either stem or differentiated. In a single individual, the dynamics of the healthy systemwhich includes stem cells, progenitors and differentiated cellsand that of the leukemic system coexist. Yet, we here assume that CML cells are cytokine-independent (as formulated by Werner et al., 2016), so the equations describing the dynamics of leukemic subpopulation do not include terms related to the healthy counterpart, (i.e., the ODE system of healthy and leukemic subpopulations becomes uncoupled). As a consequence, the dynamics of the leukemic system can be defined as follows: where: This is a typical example of a linear autonomous system, and the solution could be obtained analytically in a recursive way:

Experimental Data Fitting
Once the leukemia 2-compartments model has been defined, it is possible to estimate its parameters from experimental data. In particular, we here focus on the specific case of CML. As every cancer cell in CML is characterized by the BCR-ABL mutation, it is possible to distinguish healthy from cancer cells with Q-PCR measurement, and this allows to have longitudinal experimental data returning the fraction of cancer cells over the total, i.e., the tumor burden (Michor et al., 2005;Tang et al., 2011;Olshen et al., 2014;Rainero et al., 2018). The CT4TD fits the longitudinal data on tumor burden in each patient with a biphasic exponential, which in log-scale describes two distinct and intersecting lines, as proposed in Michor et al. (2005), Tang et al. (2011), andOlshen et al. (2014). In particular, we selected the combination of straight lines minimizing the value of R 2 (i.e., the standard coefficient of the goodness of a linear regression, which quantifies the portion of the response that is explained by a linear model), by scanning all the points of intersection (with a step of 1 day) and fitting data with two lines with distinct slopes, via a standard non-linear fit (see the Table S3 for all parameter estimation). We also tried to fit data with either one or three distinct lines, yet in our case study the best fit was obtained in the two-lines case.
Once the two best fitting curves have been obtained for each patient, we adopt a simplifying assumption that allows us to estimate the parameters of the compartmental model from data. In Marciniak-Czochra et al. (2009), it is shown that the leftmost curve (with higher slope) is likely to represent the overall population dynamics involving cancer stem cells, cancer progenitors and cancer differentiated cells (decreasing in population size), together with that of healthy blood cells (increasing in population size). Considering that the Q-PCR measurements of the BCR-ABL fusion gene return the ratio between cancer cells and the total number of cells in the system, it would be impossible to disentangle the contribution of each subpopulation to the overall dynamics, and to reliably estimate the values of γ and τ in Equation (7), without ad-hoc assumptions and/or further opportune experiments.
Instead, it is possible to hypothesize that the rightmost curve (i.e., the second exponential decay, with lower slope) accounts for the dynamics involving a completely recovered healthy cell subpopulation-thus, healthy cells can be considered as constant in number-and a decaying cancer stem cell subpopulation, with no progenitors and differentiated cancer cells left in the system, as a consequence of the therapy (Michor et al., 2005;Tang et al., 2011;Olshen et al., 2014;Rainero et al., 2018). With this assumption, it is possible to estimate the parameters of the first compartment, and in particular, the cancer stem cell death rate d 1,l in Equation (9), from experimental data. This also allows us not to explicit consider a model for the healthy hematopoietic system.
In detail, we assume that the exponential decay of the rightmost curve (i.e. the exponential decay of CSC, given by the first equation in Equation 8) accounts for the dynamics of the CSC subpopulation only. In this way, it is possible to evaluate the effect of a standard Imatinib therapy-400 mg per day-directly on the CSC decay, as estimated from any patient's data.
In fact, β j , i.e., the measured slope accounting for the decay of CSCs, will be given by: (9) where j is the patient's index.

Identification of Patient-Specific PD Models
Various PD models can be employed to estimate the efficacy of Imatinib in CML. In our case, we use a PD model based on the maximum-inhibition effect (E max ) (Peng et al., 2005) (see Figure S3): where E(t) is the effect, E max is the maximum effect, C(t) is the concentration of the drug, EC 50 the concentration of the drug that produces half of maximal effect, and n is a shape factor. In order to identify patient-specific PD models from the parameters of the leukemia model estimated from data, we can safely suppose a linear relation between the population-average of the efficacy E and the population-average of d 1,l (Gambacorti-Passerini et al., 1997). Hence, the relation is the following: where K is a conversion constant. In this case, the efficacy of a certain concentration of a drug is directly proportional to the increase of the cancer cell death rate. It is possible to estimate K from the available dataset, by employing patient-specific PK models and an average benchmark PD model (n = 1, EC 50 = 0.123 [mg/L] and E max = 1). To do this, we first compute the time-average of the concentrationC j (t) for each patient, with respect to a 40-days standard therapy-400 mg Imatib per day-, which we then use to compute the timeaverage of the efficacyĒ j as per Equation (10), by considering unique average PD parameters for all patients. Finally, we consider the population-average the efficacy E , as computed on all patients. d 1,l,j is then obtained by using formula (Equation 9), setting a 1,l = 0.87 and p 1,l = 0.45 [days −1 ] as proposed (Stiehl et al., 2018), and by taking the mean over all patients. As a result, the conversion factor for this dataset is K ≈ 0.377 ± 0.0007 [days −1 ]. Since we suppose a linear relation between K, d 1,l and E (see Equation 11), the confidence interval of K is determined via standard (linear) error propagation procedure.
At this point, we are able to estimate the personalized parameters of the PD model, and in particular of EC 50 , by supposing that the maximum efficacy is E max = 1 and the shape factor is n = 1, for all patients (Peng et al., 2005;Weigel et al., 2010). Therefore, the relation is the following: With this procedure, we can estimate the value of EC 50,j for each patient from individual longitudinal data on tumor burden. This result leads to the definition of PK/PD personalized models, which integrate demographic factors and Q-PCR data that measure the response of each patient to the therapy, and represents one of the major novelties of our approach.
The results for all patients are presented in Table S4 and in Figure 4. Notice that, if longitudinal data on single patients are not available, the CT4TD allows to employ a unique (average) PD model for all patients, as estimated from experimental studies (see e.g., Gambacorti-Passerini et al., 1997;Peng et al., 2005;Picard et al., 2007;Baccarani et al., 2014).

Definition of the PK/PD Control Problem
We formally define the PK/PD control problem for the administration of discrete doses as follows. Let be t in and t fin the initial and the final time of the therapy. Here we aim at finding: (i) the optimal doses D * 0 , D * 1 , . . . D * n , and (ii) the optimal schedule of administration t * 0 , t * 1 , . . . , t * n , such that a functional that represents the cost L(C(t, {(D 0 , t 0 ), (D 1 , t 1 ), . . . , (D n , t n )})) is minimized. Notice that the set {(D * 0 , t * 0 ), (D * 1 , t * 1 ), . . . , (D * n , t * n )} are the control functions and C * is the optimal unknown drug concentration, described in a general setting with the ODE in Equation (18) (the solution of the particular case of multi-dose oral administration is shown in Equation 5). To simplify the notation, in the following we will refer to The definition of the cost functional L is the core of the PK/PD control problem and can include various weighed terms, which one should wisely select with respect the specific problem and goals. In particular, L may (or may not) include distinct terms accounting for: (i) the efficacy E of the therapy, as derived via PD models, such as the Hill equation (Goutelle et al., 2008) or the E max model (Peng et al., 2005) (if average or patientspecific parameters can be estimated); (ii) the toxicity of the therapy and/or the possible AEs as measured, e.g., via the Area Under the Curve (AUC); (iii) in case the PD model is unknown or indefinable-a distance between the optimized concentration C * (t) and a target concentration C tg as estimated, for instance, from clinical trials (Baccarani et al., 2014); (iv) the economic cost of the therapy; (v) the properties and the temporal evolution of the disease, as in the case of the tumor burden estimation from longitudinal experimental data (Michor et al., 2005;Stiehl and Marciniak-Czochra, 2012;Altrock et al., 2015;Werner et al., 2016;Stiehl et al., 2018) (in this case the goal will be the optimization of the performance of the therapy with respect to the minimization of cancer subpopulations; (vi) the probability of developing resistance to the therapy (Michor et al., 2005;Tang et al., 2011), etc. Obviously, some of these terms are highly correlated, as for example the AUC and the economic cost. Notice also that the choice of opportune weights is crucial in defining an effective control, if more than one term is used, and that it is necessary to fix n a priori when minimizing L. The latter choice is related to the applicability to current realworld scenarios, in which practical limitations usually prevent to exceed a certain amount of doses per day, as well as to administer continuous dosages, especially with respect to cancer therapies. In detail, we here define cost functions L with respect to two distinct scenarios: (i) optimization of therapy for fixed target concentrations, (ii) optimization of therapy for tumor burden reduction or stabilization (as proposed by West et al., 2018).

Working Scenario (i): Optimal Control With Fixed Target Concentration at Diagnosis Time (Patient-Specific PK Models-No PD Models)
In many real-world scenarios it is not possible to retrieve or estimate the parameters of the PD models, for instance at the time of diagnosis. In this case, CT4TD can be used to find the best personalized therapeutic strategy to either: (i) be as close as possible to a given optimal target drug concentration, or (ii) be close to, but strictly larger than a given lower-bound target (i.e., steady state optimization; Shargel et al., 1999). In the first case-i.e., optimal target concentration-CT4TD employs a simple Euclidean distance between two concentrations: C tg is the target drug concentration, necessary to have a major molecular response, estimated, e.g., via clinical trials. For example, in clinical studies, Imatinib concentration in blood is required to be above 0.57 [mg/L] and with a time average of 1 [mg/L] (Peng et al., 2005). C * (t) is the unknown concentration of drug in blood, which will be identified by solving the control problem. Then, in this case the cost is defined as follows: This cost favors the solutions which are close to the target, with no preference between above or under the target. In the second case-i.e., lower-bound target concentration-CT4TD uses a step distance in the space of concentrations. In this case the distance between two concentrations becomes: where G is a constant. It is then possible to define the cost as follows:

Working Scenario (ii): Optimal Control for Tumor Burden Reduction (Patient-Specific PK Models-Patient-Specific PD Models)
When it is possible to estimate patient-specific PD parameters from longitudinal data on tumor burden variation under standard treatment, CT4TD can be used to identify an adjusted optimized therapy to reduce such burden in each patient. In particular, our approach can use a cost function with the aim of: (i) minimizing the number of cancer stem cells (e.g., at the end of the treatment, (ii) minimizing the AUC of the therapy, which accounts for toxicity and possible AEs. To do this, we need to introduce two arbitrary weights W 1 and W 2 , which account for the relative relevance of the two distinct components. Notice that we cannot consider the exponent of Equation (8) only, as the dynamics of l 1 (t) is a monotone function and the minimization of cancer stem cells is reached for D i → ∞, which implicates that if the maximum amount of drug is bounded (i.e., D max ) the optimal solution is trivially reached for D i = D max , ∀i. Therefore, we have: Notice that cost functional in Equation (17) includes the net growth rate of the CSCs, i.e., λ in Equation (8). This choice allows us not to know or estimate the initial number of CSCs l 1 (0), since is not possible to infer this quantity from tumor burden data only. The ratio φ = W 1 W 2 determines the overall of the optimal solution and should be wisely chosen. In order to provide some indications on this modeling choice, we performed an extensive scan of φ (in the range φ ∈ [10, 100]), with respect to all patients and we analyzed the variation of the time-average concentration C. The results are shown in Figure 5. From this analysis, one can see that a sound choice for φ might be in the range [60-65] for males and in [70][71][72][73][74][75] for females (note this choice depends on the units of measurement), meaning that the weight corresponding to the time evolution of CSCs is relatively more relevant than that corresponding to the toxic effects.
We finally specify that working scenario (ii) was originally designed for liquid tumor therapies, as it requires the definition and the measurement of the tumor burden, which is used to estimate the CSCs net growth rate. Whether measurements on tumor burden and an appropriate model for cancer population dynamics would be available for distinct cancer types, our framework might be applied without any significant theoretical modification.

Resolution of Control Problem via RedCRAB
In CT4TD the control problem is heuristically solved by using RedCRAB, the remote version of the dressed Chopped RAndom Basis (dCRAB) optimal control via a cloud server (Caneva and et al., 2011;Doria et al., 2011;Rach et al., 2015;Heck et al., 2018a;Omran et al., 2019). Optimal control theory has been used for decades to optimize classical processes, and its quantum counterpart has been increasingly exploited in the last years (Khaneja and et al., 2005;Spörl et al., 2007;Caneva and et al., 2009;Brif et al., 2010;Lloyd and Montangero, 2014;Koch, 2016;Pichler et al., 2016;Sørensen and et al., 2016;van Frank and et al., 2016;Deffner and Campbell, 2017;Goerz et al., 2017). In its simplest version, optimal control drives the state of the system to a goal one, characterized by some desired properties, by using a set of time-dependent controls.
Here, the dynamics of the system is identified by the concentration of the drug in a certain compartment C(t, D(t)), which obeys the time evolution equation: where D(t) is the time-dependent control function, i.e., the doses function defined in section (see the section describing the patient-specific PK models of Imatinib in CML). The goal here is to optimize the drug administration schedule (see section 3.4.1) while minimize the cost functional as defined in subsections describing working scenario (i) and working scenario (ii) (see above). Starting from the standard administration schedule D 0 (t), the optimization proceeds by looking for the optimal correction g(t) such the optimal administration schedule will be D(t) = D 0 (t) + g(t). Following (Rach et al., 2015), the correction g(t) is expanded in a truncated function space, specifically in random Fourier components as: where ω k = 2π(k + r k )/T and r k ∈ [−0.5, 0.5], n c is the total number of frequency used, T is the final time, and Ŵ(t) is a fixed scaling function to keep the values at initial and final times unchanged. In conclusion, the control problem is reformulated as maximization of a multivariable function L(A k , B k ) with fixed ω k , and can be efficiently solved numerically by searching the best combination of {A k , B k } with the preferred method of choice, here a direct-search method (Nelder and Mead, 1965). Notice that, each frequency ω k is independently optimized: indeed, after a certain number of iterations, we move to the next ω k+1 , by introducing an external loop on the frequencies, i.e., superiterations. This allows the algorithm to include a high number of Fourier components and efficiently find the optimal solution, by avoiding local traps that can stick the optimization into not the global minimum (Rach et al., 2015).
In the RedCRAB optimization, the server generates and transmits a set of controls to the CT4TD, which evaluates the cost function, by interfacing with MATLAB and communicates it to the server completing one iteration. The optimization continues iteratively by providing the optimal set of controls as well as giving back the figure of merit, until the convergence is reached.
We specify that, as for any heuristic method, the solution provided by our approach might be sub-optimal. This depends on the complexity of the search space and by the computational resources available. Yet, as proven in several real-world applications (Doria et al., 2011;Rach et al., 2015;Hoeb and et al., 2017;Omran et al., 2019) RedCRAB was proven to be a computationally efficient and robust technique.

Optimal Dosage
To solve the optimization problem with respect to dosages, we optimize a control field D(t) defined between (t 0 ≤ t ≤ t f ) via RedCRAB. Then, we proceed by mapping the D(t) doses function into (n + 1)-integer values which correspond to (n + 1)doses D * j (j = 0,..., n), where n is the number of total doses given to the patient; t f is the final time of the therapy and t 0 = 0 the initial time. Accordingly, we can define the schedule of administration as: i.e., t i = i · t f n+1 for i = 0, 1, ..., n + 1 Indeed, the n-doses D * j are obtained by integrating the doses function D(t) between adjacent times in the schedule administration as follows: with j = 0, ..., n. The more general case where also the time schedule of the administration is optimized (i.e., optimal schedule case) is described in the SM.

Imatinib Administration in Chronic Myeloid Leukemia-CML
We here show the application of CT4TD to the specific case of Imatinib mesylate administration in patients with CML. The final goal is to determine the drug optimized dosage and schedule in two distinct scenarios.
1. In the first case the goal is to optimize personalized therapeutic strategies to reach given target concentrations, as those commonly used in clinical protocols, and by assuming to be at diagnosis time. 2. In the second case, we employ the population dynamics models, as retrieved by fitting longitudinal data on single patients under standard treatment, to deliver patient-specific therapies that are most effective in reducing/eradicating the tumor subpopulation after the major molecular response, on the basis of PK/PD personalized models.
Imatinib is an inhibitor of the BCR-ABL tyrosine kinase, which is known to bind to the inactive form of BCR-ABL at nanomolar concentration, competing with the ATP for its binding pocket and hindering the switch of the fusion kinase to the active form, therefore impairing the catalytic activity of the enzyme (Gambacorti-Passerini et al., 2003). The therapy is in most cases long-life (Michor et al., 2005;Tang et al., 2011;Branford et al., 2013;Olshen et al., 2014;Rainero et al., 2018) and the treatment is expensive (≈ 30,000 US$ per year; Cole and Dusetzina, 2018). Therefore, the impact of an optimized and personalized administration would be two-fold: on the one hand, it could be effective in optimizing the performance, while reducing the toxicity and minimizing the adverse effects for long-term therapies (Larson et al., 2008b;Mughal and Schrieber, 2010;Hu et al., 2012); on the other hand, it could help in reducing the overall economical costs, which currently limit the access to therapy, hence making long-term health care more sustainable (Fojo and Grady, 2009;Himmelstein et al., 2009)

Datasets
We applied the CT4TD framework to a longitudinal dataset from Michor et al. (2005), in which 29 CML patients have been monitored with a peripheral blood draw taken every 90 days, from the time of diagnosis up to a maximum time of about 3, 500 days (average time ≈ 2, 659 ± 938 days) For these patients, the administration schedule has been 400 mg Imatinib/day for the whole considered period.
In particular the fraction of cancer cells in blood-that will be referred to as tumor burden from now on-can be reliably estimated by analyzing the expression level of the fusion gene BCR-ABL, thus providing an easy way to monitor the disease progression, as well the response to therapy. As BCR-ABL transcript is solely expressed by the leukemic cells, its measurement by mean of quantitative PCR (Q-PCR) is considered one of the most sensitive and specific techniques to indirectly assess the tumor burden, and is the standard de facto for monitoring minimal residual disease in CML.
More in detail, we selected a subset of the dataset provided in Michor et al. (2005), by removing all the patients that displayed too few data points (i.e., < 3), or that were characterized by resistant mutations, i.e., specific DNA alterations that render the therapy via Imatinib ineffective, usually due to steric impediments (Shah et al., 2004). In such cases, it is common practice to employ an alternative therapy, based either on Dasatinib, Nilotinib, Ponatinib, or Bosutinib (Shah et al., 2004). We decided to leave resistant patients out of the analysis for two distinct technical reasons. First, this scenario would require a more complex population dynamics model-i.e., with more subpopulations-, characterized by many more parameters, often impossible to estimate. Second, in this case the identification of an optimized therapy should involve two distinct controls, and even if theoretically possible, this would require to obtain data concerning the effect of Dasatinib/Nilotinib/Ponatinib/Bosutinib on tumor burden, which are not present in the used dataset.
We eventually selected 22 (out of 29) patients, for which the therapy led to a successful major molecular response (MMR), i.e., the ratio of cells with BCR-ABL mutation is ≤ 0.1 on the international scale (Griffiths et al., 2014). FIGURE 2 | Patient-specific PK models. Personalized pharmacokinetic curves (blue solid lines), as estimated from demographic factors, such as age, sex, and body weight, as per Equations (3) and (4), in the range t ∈ [0, 48] h (x axis); y axis describes the drug concentration C(t) in [mg/L]. The blue curves correspond to the 22 distinct patients included in the dataset, whereas the dashed red curve represents the population-average pharmacokinetics.

Patient-Specific PK Models
We first use patient-specific PK models by incorporating demographic factors-i.e., body weight, age, and sex-in the clearance and in the volume of the distribution, as per Equations (3) and (4) (Widmer et al., 2006)(see section 3). The parameter settings of RedCRAB optmization for this case study are shown in Table S6.
In Figure 2, the PK curves corresponding to the 22 patients (in blue) and the average PK model (in red) over a selected time window ([0, 48] h) are displayed.

Defining Personalized Optimized Administration at Diagnosis Time
CT4TD can be employed when CML is diagnosed, in order to identify optimized therapeutic strategies that lead to drug concentrations as close as possible to given targets. We here present the application of CT4TD to two distinct targets.
The first target concentration is C targ (t) = 0.57 [mg/L], which is currently the most widely employed in the clinic (Peng et al., 2005). It is hypothesized that any effective therapy should ensure a drug concentration close to, but strictly larger than this value, in order to lead to a good performance, while minimizing the AEs (Graham et al., 2002;Faber et al., 2016). In this case, we consider this concentration as a lower-bound target, and the goal of the CT4TD framework will be to design an optimized therapy to be close to, but strictly larger than this concentration value, by employing an opportune distance notion (see section 3).
The second target concentration is C targ (t) = 1 [mg/L] and is supposed to provide a more effective therapy, but at the cost of an increased likelihood of AEs and toxicity (Gambacorti-Passerini et al., 1997;Picard et al., 2007;Baccarani et al., 2014). From common practice, an effective therapy is that leading to values of drug concentrations around this target. For this reason, CT4TD will return an optimized therapy ensuring a drug concentration as close as possible to this optimal target (see section 3). Note that one could select any arbitrary target concentration, or even combinations of targets, and this would not affect the validity of our approach. For both targets we tested distinct settings, in which we considered, respectively, 1 and 3 doses per day at fixed times (i.e., 1 dose each 24 and 8 h, respectively) 3 .
In Figure 3, we present the application of CT4TD to a selected patient (n. 0001 00004 AJR, male), with respect to the lower-bound target C targ (t) = 0.57 [mg/L] (left panels), and the optimal target C targ (t) = 1 [mg/L] (right panels). In particular, we compared the standard administration (red), the 1-dose optimized therapy (blue) and the 3-doses optimized therapy (green), on a temporal window of 14 days, with respect to drug dosage (Figures 3A-E),drug concentration in blood (Figures 3B-F), cumulative (Euclidean) distance with respect to the target concentration (Figures 3C-E), and AUC (Figures 3D-H).
When assessing the goodness of a therapy in the lowerbound scenario-i.e., C targ = 0.57 [mg/L]-it is important to look at both the distance to the target and the overall time in which the drug concentration is above such target. In Figures 3A-D, one can see that the optimized 1-dose strategy displays higher cumulative distance and area under the curve-AUC-with respect to the standard schedule, due to the fact that drug concentration is always strictly larger than the lowerbound target. This is proven by the proportion of time spent above the target (computed on the whole period), which is 100, 100, and 88.6%, for the 1-dose, the 3-doses, and the standard administrations, respectively. The 3-doses optimized strategy displays a remarkable improvement also with respect to cumulative distance and AUC, proving to be an effective therapeutic choice for this specific patient. This expected result shows the effectiveness of our methodological approach in producing biologically-plausible experimental hypotheses.
With respect to the optimal target-i.e., C targ (t) = 1 [mg/L]-, an effective therapy should ensure a drug concentration as close as possible to the target, thus reducing the drug surpluses, while minimizing the cases of insufficient dosage. In this case, the 1-dose optimized scenario almost overlaps with the standard administration (yet, this is not always the case as one can see, for example, with respect to 0006 00007 RJW in Figure S18), whereas the 3-doses optimized strategy displays an improvement in terms of cumulative distance, as the drug concentration is constantly kept much closer to the desired target, thus importantly reducing under-and over-dosing (Figures 3E-H). In Figures S8-S28, one can find the results of the analyses on the other 21 patients included the dataset.

Adjusting Treatment for Tumor Burden Reduction
The CT4TD framework can be employed in order to identify optimized therapeutic strategies for patients that are currently treated with a standard regime, and for which longitudinal data on tumor burden variation are available. In this case, in order to estimate personalized PD models from experimental datawhich describe the individual therapeutic response to identical drug concentrations-, CT4TD employs a module which fits FIGURE 3 | Patient-specific optimized therapy with fixed target drug concentrations. Optimized Imatinib administration returned by CT4TD for patient 0001 00004 AJR from Michor et al. (2005), in the cases of: 1-dose/day (blue) and 3-doses/day (green), with respect to: lower-bound target concentration C targ = 0.57 [mg/L] (A-D), and optimal target concentration C targ = 1 [mg/L] (E-H). Standard administration-i.e., 400 mg Imatinib/day-is shown with a red dashed line. In this case, the optimization is obtained on patient-specific PK parameters, without considering the PD models. In particular, we fitted each patient's data with a biphasic exponential, which in log-scale describes the presence of two straight lines with distinct slopes, as proposed by Michor et al. (2005), Tang et al. (2011), andOlshen et al. (2014) (see section 3 for further details). With a few assumptions, the slope of such lines can be used to estimate the parameters of a 2-compartment population dynamics model of CML and, in particular, the  (stem) cancer subpopulation death rate in presence of a standard Imatinib therapy-i.e., 400 mg per day-in each patient. This allows to estimate the patient-specific parameters of the PD model. The results of the data analysis on all patients are presented in Table S3 and in Figures S4-S6. In Figure 4A, one can see the personalized PD curves for the 22 patients, computed via Equation (10), as compared to the average one.
We also assessed the relative relevance of the personalized parameters of the PD and PK models with respect to the efficacy of the therapy. The heat-map in Figure 4B returns the variation of the efficacy with respect to combination of timeaverage concentrationC and EC 50 , highlighting the personalized parameters of the 22 patients. As a first result, one can see that much of the variance in our dataset is due to differences in PK, rather than to PD, which however is still relevant. Notice also that the two visible clusters basically overlap with the male and the female groups, providing a possible explanation of the distinct therapeutic response observed in clinical studies (Branford et al., 2013). We stress that the estimation of personalized PD models from experimental data of patients under treatment is one of the major novelties of our approach, and, in combination with the demographics-based PK models, allow to identify patientspecific therapeutic regimes that are optimized to minimize the tumor burden.
In order to identify personalized optimized therapies, we finally defined a cost function with the goals of: (i) maximizing the reduction of the tumor burden, and (ii) minimizing the toxicity and possible AEs, in terms of AUC (see section 3 for further details). Such cost function requires to set opportune weights W 1 and W 2 for the two terms, respectively. In particular, a parameter φ = W 1 W 2 is defined, which can be opportunely tuned to favor either the first or the second term. However, the choice FIGURE 6 | Adjusting therapy for tumor burden minimization. Imatinib administration optimized for tumor burden minimization in patient 0001 00004 AJR (male) from Michor et al. (2005), in the cases of: 1-dose/day (purple) and 3-doses/day (green), with φ = 60, as compared to standard administration (red). In this case, the optimization is obtained on patient-specific PK and PD models. (A) Imatinib scheduled dosage in mg (y axis), displayed on 14 days (x axis). (B) Imatinib concentration in blood in [mg/L] (y axis). (C) Temporal variation of the AUC in [mg · h/L] (y axis). (D) Longitudinal data points on tumor burden recorded in the interval t ∈ [0, 2340] days (purple). The best fit is shown with red lines. The slope of the right-most line is used to determine the cancer stem cell death rate and, in turn, the patient-specific PD parameters. The blue (green) line represents the predicted cancer subpopulation decay in case the 1-dose (3-doses) optimized therapy was adopted from day 2, 340 to day 3, 500.
of a specific value for φ is arbitrary and depends on subjective research and clinical criteria.
To investigate the sensitivity of our framework to the variation of this parameter, we repeatedly applied the CT4TD framework to the CML dataset, by scanning various values of φ, and eventually assessed the differences in: (i) the time-average AUC as computed on a 14-days temporal window, and (ii) the efficiency computed on the time-average concentration in the same period, with respect to a 1-dose optimization scenario (the 3-doses scenario can be found in Figure S50). In Figure 5, one can see the distribution of both quantities with respect to the 22 patients in the dataset, divided in males (blue) and females (pink), as compared to the average AUC and efficiency for the standard administration case (red).
A first important thing to notice is that the results are highly sensitive with respect to the choice of φ, and can either display improvements (e.g., higher AUC and/or lower efficiency) or worsening with respect to the standard case in distinct cases. Moreover, male and female groups show significantly different distributions, thus pointing at physiological differences that should be considered in therapy design. As a rule-of-thumb, we suggest to select a value of φ for which a slightly larger value of efficiency is observed, while not inducing a too high increase in AUC. In our case, we selected a value of φ equal to 60 for men and of 75 for women.
In Figures 6A-D, we show the comparison among the actual therapeutic regime administered to a selected patient (n.0001 00004 AJR, male-code ID in Table S3) and the optimized therapies identified via CT4TD by setting φ = 60, in both 1-dose (blue) and 3-doses (green) scenarios, in terms of: (i) drug dosage, (ii) drug concentration, (iii) AUC, and (iv) variation of the tumor burden in time. In particular, the temporal evolution of the tumor burden from diagnosis to the present is displayed by showing the experimental data points (purple) and the best fit (red), whereas the predicted future evolution is shown with respect to the 1-dose (blue) and the 3-doses (green) optimized strategies.
A result is that, given similar AUC curves (i.e., similar toxicity and AEs), both the 1-dose and the 3-doses optimized strategies lead to a significantly faster predicted tumor burden decay. In particular, the tumor burden decay is, respectively 3.07 and 1.82 times faster for the 1-dose and the 3-doses regimes, with respect to standard administration. This result paves the way for an automated strategy for therapy adjustment design, which might be further developed by employing closed-loop controllers.
In Figures S29-S49, one can find the results of the analyses on the other 21 patients included the dataset. FIGURE 7 | Robustness analysis with respect to intra-patient variability. Intra-patient variability is here defined as stochastic noise randomly picked in the uniform distribution χ r = [−σ r , σ r ] and applied to distinct PK parameters. In particular, we here show the average and the standard deviation of the relative variation of the cost L/L 0 , with respect to distinct levels of noise, in the cases: r = k a (A), r = v in (B), and r = CL (C). Values of the fit are provided in Table S5.

Robustness Analysis
In order to assess the reliability of the results produced by CT4TD, we tested its robustness with respect to intra-patient variability and to possible systematic errors. To account for intrapatient variability, we introduced a stochastic and uniformly distributed noise, i.e., χ r = [−σ r , σ r ] with r = k a , CL, v, to the following parameters of the PK model: k a , CL, and v, for every time point in the analysis. We performed 700 distinct PK simulations, on the average patient, in the specific scenario of a target concentration C targ (t) = 0.57 [mg/L] and 1 dose per day. We then analyze the relative variation of the average cost L, as compared to the noise-free case, with respect to the width of the distribution of noise σ r . In Figure 7, one can notice that L variation with respect to the noise level follows an approximately quadratic trend, which is proven by fitting the data points with a curve with equation b + aσ 2 r (the complete results of the fit are provided in Table S5). Note that such results are in agreement with other works that use quantum optimal control (Montangero et al., 2007;Kallush et al., 2014;Hoeb and et al., 2017). We performed a further robustness analysis, to assess the impact of systematic errors, as those possibly due to scarce reliability of the demographic study and/or to small or imbalanced datasets, and which may result in errors in the estimation of the PK parameters. To this end, we generated an optimized schedule for a set of PK parameters {k a , CL, v} and then we applied such schedule to a simulated patient where a parameter at time is FIGURE 8 | Robustness analysis with respect to systematic errors. Relative variation of the cost L/L 0 with respect to the relative value of the systematic error δ ka /k a . different, e.g., {k ′ a , CL, v} with k ′ a = k a + δ k a . We finally measured the difference of L, as a function of δ k a . Also in this case, we show in Figure 8 that the results produced by CT4TD are robust with respect to possible technical or measurement errors. In fact, with respect to an error of ≈ ±30%, we observe a maximum difference ≈ 10% in performance, as compared to the noise-free case.

DISCUSSION
The introduction of the CT4TD framework aims at providing an automated and data-driven procedure for decision support in health care and personalized therapy design in cancer, especially by exploiting the increasing available computational power, which allows one to perform large-scale simulations and efficient search in the parameter space, and to deal with noisy and imperfect data.
In particular, CT4TD aims at overcoming the limitations of current control-based methods for therapeutic hypothesis generation. First, its completely general theoretical approach allows to consider: (i) any disease for which a PK/PD model can be derived and its parameters measured, (ii) any kind of administration, e.g., continuous drug infusion or discrete doses, (iii) any measurable term that is considered as relevant in the definition of a therapeutic cost. CT4TD eventually allows to evaluate in silico the outcome of the designed therapy.
Furthermore, CT4TD introduces the possibility of designing optimized therapeutic strategies based on experimental data concerning the disease progression. The identification of databased patient-specific PK/PD models is one of the major novelties of CT4TD and has a profound impact on the characterization of tumor heterogeneity and, accordingly, on the customization of cancer therapies.
One of the main limitations of CT4TD derives from the adoption of highly simplified models of cancer population dynamics. Unfortunately, the shortage of adequate longitudinal data on tumor dynamics prevents to estimate the parameters of more sophisticate and biologically realistic models, which may take into account, for instance, the existence of various competing cancer subpopulations, or the complex interplay occurring within the tumor microenvironment. However, we claim that our theoretical approach is completely general and it will hold whether and when higher-resolution longitudinal data on disease progression would become available, allowing for instance to measure the (sub)clonal prevalence variation in time (as proposed, e.g., by Acar et al., 2019).
Several developments of CT4TD are underway. In particular, the possibility of tuning the PK/PD models to include information on the somatic evolutionary history of the tumors (Ramazzotti et al., 2015;Caravagna et al., 2016Caravagna et al., , 2018 will be essential in delivering more effective personalized therapeutic strategies. This is especially important for tumors displaying high levels of intra-tumor heterogeneity, which is known to be responsible for drug resistance, therapy failure and relapse (McGranahan and Swanton, 2015).
As CT4TD relies on the RedCRAB optimization framework (Heck et al., 2018b;Omran et al., 2019), the overall procedure could be implemented in remote, paving the way for a wireless decision support system for therapy design, to be used directly by clinicians (Jeong et al., 2015). In this respect, as a future development, an open-source computational tool will be made available to the scientific community, allowing to perform individual-specific analysis for a wide range of disease.

DATA AVAILABILITY STATEMENT
All data used in this paper are available from the supplementary material of Michor et al. (2005). We provide the source code and the input data to reproduce the case studies at: https://github.com/BIMIB-DISCo/CT4TD.

AUTHOR CONTRIBUTIONS
FA, AG, SM, and MA designed the research. FA and MR implemented the method. FA, AG, and SM performed the research. FA, AG, DM, TC, and RP analyzed the data. All authors wrote and revised the paper.

ACKNOWLEDGMENTS
This work was partially supported by the Elixir Italian Chapter and the SysBioNet project, a Ministero dell'Istruzione, dell'Universitá e della Ricerca initiative for the Italian Roadmap of European Strategy Forum on Research Infrastructures. Furthermore we acknowledge financial support from IQST alliance. Partial support was also given by the CRUK/AIRC Accelerator Award #22790, Single-cell Cancer Evolution in the Clinic. We thank Giulio Caravagna for helpful discussions. This manuscript has been released as a pre-print at bioRxiv (Angaroni et al., 2019).