# An Optimal Control Framework for the Automated Design of Personalized Cancer Treatments

^{1}Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy^{2}Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy^{3}Center for Integrated Quantum Science and Technologies, Institute for Quantum Optics, Universitat Ulm, Ulm, Germany^{4}Istituto Nazionale di Fisica Nucleare (INFN), Padova, Italy^{5}Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy^{6}Forschungszentrum Jülich, Institute of Quantum Control (PGI-8), Jülich, Germany^{7}Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy^{8}Hematology and Clinical Research Unit, San Gerardo Hospital, Monza, Italy^{9}Department of Physics and Astronomy “G. Galilei”, University of Padova, Padova, Italy^{10}Bicocca Bioinformatics Biostatistics and Bioimaging Centre - B4, Milan, Italy

One of the key challenges in current cancer research is the development of computational strategies to support clinicians in the identification of successful personalized treatments. Control theory might be an effective approach to this end, as proven by the long-established application to therapy design and testing. In this respect, we here introduce the Control Theory for Therapy Design (CT4TD) framework, which employs optimal control theory on patient-specific pharmacokinetics (PK) and pharmacodynamics (PD) models, to deliver optimized therapeutic strategies. The definition of personalized PK/PD models allows to explicitly consider the physiological heterogeneity of individuals and to adapt the therapy accordingly, as opposed to standard clinical practices. CT4TD can be used in two distinct scenarios. At the time of the diagnosis, CT4TD allows to set optimized personalized administration strategies, aimed at reaching selected target drug concentrations, while minimizing the costs in terms of toxicity and adverse effects. Moreover, if longitudinal data on patients under treatment are available, our approach allows to adjust the ongoing therapy, by relying on simplified models of cancer population dynamics, with the goal of minimizing or controlling the tumor burden. CT4TD is highly scalable, as it employs the efficient dCRAB/RedCRAB optimization algorithm, and the results are robust, as proven by extensive tests on synthetic data. Furthermore, the theoretical framework is general, and it might be applied to any therapy for which a PK/PD model can be estimated, and for any kind of administration and cost. As a proof of principle, we present the application of CT4TD to Imatinib administration in Chronic Myeloid leukemia, in which we adopt a simplified model of cancer population dynamics. In particular, we show that the optimized therapeutic strategies are diversified among patients, and display improvements with respect to the current standard regime.

## 1. Introduction

The increasing availability of reliable experimental data on cancer patients and the concurrent decreasing costs of computational power are fueling the development of algorithmic strategies for the automated generation of experimental hypotheses in cancer research. This is particularly relevant in the sphere of precision and personalized medicine, as efficient methods are urgently needed to make sense of available data and support clinicians in delivering patient-specific therapeutic strategies (Salgado et al., 2018). To this end, methods borrowed from optimal control theory (e.g., Bertsekas, 1995; Bailey and Haddad, 2005; Lenhart and Workman, 2007; Aström and Murray, 2010) can be employed in combination with efficient techniques for data analysis (Michor et al., 2005; Tang et al., 2011; Olshen et al., 2014; Rainero et al., 2018), to produce accurate predictive model of the clinical outcome of a given therapy in single cancer patients.

Here, we introduce a theoretical framework named CT4TD (Control Theory for Therapy Design), which employs the RedCRAB optimal control algorithm (Heck et al., 2018b; Omran et al., 2019), on patient-specific pharmacokinetics and pharmacodynamics (PK/PD) models (Welling, 1997), with the goal of delivering an optimized drug administration schedule (see Figure 1 for a schematic representation of the framework).

**Figure 1**. CT4TD pipeline. **(A)** The CT4TD framework employs demographic factors such as body weight, age, and sex to define patient-specific parameters of the pharmacokinetic models. We here focus on the case of Imatinib administration in Chronic Myeloid leukemia (CML). **(B)** CT4TD manages two working scenarios: (i) at time of diagnosis, CT4TD can be used to reach given optimal/lower-bound drug concentration targets, e.g., from clinical studies (*steady state optimization*); (ii) when longitudinal data on tumor burden variation under standard therapy are available, CT4TD fits the data points with a hierarchical population dynamics model of CML, and this allows to estimate patient-specific pharmacodynamics (PD) parameters, based on the observed cancer cell death rate. In both scenarios, optimization on pharmacokinetics/pharmacodynamics (PK/PD) models is performed via RedCRAB, on distinct cost functions, aimed at: either being close to given target concentrations (and strictly larger in the lower-bound case)—WS (i); minimizing the Area Under the Curve (AUC) *and* the tumor burden—Working Scenario (WS) (ii). **(C)** optimized personalized dosage and schedule are returned, allowing to measure *in silico* the differences with respect to standard administration, in terms of dosage, drug concentration, cost, and AUC. WS (ii) allows to predict the tumor burden evolution in case of an optimized therapy.

In brief, PK models describe the temporal dynamics of the concentration of a given drug in a certain tissue or organ, whereas PD models depict the efficacy of the drug with respect to distinct concentration values. The CT4TD framework first defines patient-specific PK models based on demographic factors, such as, e.g., age, sex, and body weight. Such models are employed to automatically identify optimized therapy dosages and/or schedules to reach given target drug concentrations, as those commonly used in clinical practice, also by respecting any desired constraint such as, e.g., the maximum allowed number of doses per day or the maximum dosage. In this way, our framework can guide clinicians in the setting of optimized regimes at diagnosis, allowing for an either more or less aggressive tuning; this approach mimics the *steady state* optimization commonly proposed in pharmacological studies.

Furthermore, when longitudinal experimental data on tumor burden—e.g., the fraction of tumor cells on the total, in liquid tumors—are available for patients under standard treatment, CT4TD allows to determine optimized personalized strategies to be used in order to minimize or even eradicating the cancer cell subpopulation. In fact, with the CT4TD framework it is possible to fit experimental data with a hierarchical population dynamics model, which describes the temporal evolution of cancer subpopulations in a given tumor (Michor et al., 2005; Stiehl and Marciniak-Czochra, 2012; Werner et al., 2016; Stiehl et al., 2018). Such model allows to measure the impact of a given therapy over the tumor's ability to expand and develop and, accordingly, to estimate patient-specific PD models from clinical data, which are then employed to design optimized therapeutic regimes aimed at reducing the tumor burden.

Therefore, CT4TD can support clinicians in designing personalized therapies both at diagnosis and when longitudinal data on disease progression have become available. In all scenarios, with our approach it is possible to compare the actual therapeutic regime with the optimized one, showing improvements in terms of efficacy, toxicity, and overall costs.

The CT4TD theoretical framework is general and applicable to any kind of drugs, as long as PK/PD models can be retrieved or estimated. Yet, liquid tumors allow to safely adopt several simplifications and define simple and reliable models of population dynamics, avoiding possible complications due to the spatial and morphological properties of solid tumors (Graudenzi et al., 2014, 2018).

For this reason, in this work we apply the CT4TD to the specific case of Imatinib administration in patients with Chronic Myeloid leukemia (CML), and we show the advantages of employing our automated and data-driven framework in terms of increased efficacy of the therapy and reduction of the overall costs and toxicity. In particular, we here present the results of the application of the CT4TD framework in two ideally subsequent scenarios.

In the first case, CT4TD is used to identify the best therapeutic regime to reach selected target drug concentrations, as those commonly used in the clinic (e.g., Gambacorti-Passerini et al., 1997; Peng et al., 2005; Baccarani et al., 2014). This scenario provides indications which can be employed by clinicians at the time of the diagnosis. Importantly, the inclusion of demographic factors within the PK models (Widmer et al., 2006) allows to define personalized drug schedules that are different from standard practice. A *robustness analysis* to assess the impact of intra-patient variability and of possible systematic errors proves the safety of the hypotheses generated with our approach, especially with respect to possible technical or measurement errors.

In the second case, we employ longitudinal data on tumor burden of a selected cohort of CML patients under standard treatment, in order to retrieve personalized PD models and, accordingly, to identify an *adjusted* therapy that is most effective in minimizing cancer subpopulation, once the major molecular response has been observed. In both cases the results allow to explicitly evaluate the advantages in costs and improved efficacy with respect to standard therapies.

## 2. Background

### 2.1. Pharmacokinetic and Pharmacodynamic Models

Pharmacokinetic (PK) models (Welling, 1997) are mathematical models that describe the temporal evolution of the concentration of a substance in a certain tissue of the body. Commonly used techniques to study such processes are the so-called *compartmental* models (Welling, 1997), i.e., dynamical models based on the law of conservation of mass, and which assume that the body is composed by a certain number of macroscopic coupled subsystem, namely compartments. Such models assume an instantaneous mixing of the drug in a compartment and a perfect transport among them, and are usually defined via systems of differential equations (e.g., Schwilden, 1981). The solution of such systems provide predictions about the variation of drug concentration in time, in a certain tissue. A limitation of PK models is the employment of coarse-grained oversimplifications, which require *ad-hoc* assumptions and are valid only for sufficiently long timescales.

Pharmacodynamic (PD) models (Welling, 1997; Rowland et al., 2011) study the relationship between the concentration of a drug and the resulting effect, in terms of efficacy and possible adverse effects (AEs). The effects of a certain substance are estimated by modeling relevant biochemical reactions, usually by exploiting the law of mass action (see, e.g., Goutelle et al., 2008). One of the major limitations of PD models is that it is usually impossible to have all the measurements necessary to determine the kinetic constants of the involved chemical reactions. For this reason, the efficacy of a drug is usually estimated with statistical methods and target concentrations are defined with respect to some arbitrary criteria (Peng et al., 2005; Larson et al., 2008a; Takahashi et al., 2010; von Mehren and Widmer, 2011; Baccarani et al., 2014).

PK/PD models are increasingly used to define new drug dosage guidelines and protocols (e.g., Peng et al., 2005). Nonetheless, standard approaches to this end are affected by several major issues. Usually the optimal dose is identified in phase I dose escalating clinical trials. Moreover, such trials may suffer from possible idiosyncrasies of the study, from the presence of unknown confounding factors and from the often limited sample size. Another problem is that the recommended dosage is often defined as optimal for an ideal—and non existing—*average* patient, because the efficacy is only defined statistically. As a consequence, the same drug dosage/schedule might be either insufficient or exceeding for different patients. In the former case, this might lead to a non-optimal clinical outcome, in terms of lower efficacy of the treatment, whereas in the latter case an excess of drug may raise the probability of AEs, as well as the economic cost of the therapy, an aspect that is particular relevant for oncological therapies (Fojo and Grady, 2009; Himmelstein et al., 2009; Experts in Chronic Myeloid Leukemia, 2013; Gomez-de León et al., 2017; Jabbour et al., 2017).

Therefore, effective strategies for the identification of optimized personalized therapy schedules are needed, in order to possibly reduce the amount of drug and minimize the probability of related adverse effects, while providing the same or an even better efficacy—i.e., clinical outcome—, with respect to the standard administration schedule. As a side effect, an optimized personalized schedule would also deliver a minimal economic cost, i.e., more patients will be able to afford its costs.

In this respect, CT4TD allows to: (i) define patient-specific PK models that depend on a number of demographic factors and biological covariates, such as age, sex, ethnicity, and body weight, as proposed by Widmer et al. (2006); (ii) estimate personalized PD models from longitudinal experimental data on tumor burden (if available). This allows to identify personalized therapeutic strategies, which explicitly account for the expected differences in PK and PD, due to the physiological heterogeneity of the individuals. It is important to stress that population PK/PD models are employed in a wide range of distinct diseases such, e.g., cancer (Yoshitsuga and et al., 2012), HIV (Chan et al., 2011), diabetes (Landersdorfer and Jusko, 2008), as well as in anesthesia administration (Potts et al., 2008).

### 2.2. Applications of Optimal Control Theory in Medicine

Control theory is an interdisciplinary field bridging engineering and mathematics, whose main objective is to define an opportune control function that modifies the state of a given dynamical system in order to perform a specific task, while minimizing the cost and maximizing the performance (Bertsekas, 1995; Lenhart and Workman, 2007; Aström and Murray, 2010) (see section 3 for a technical description).

Two main classes of controls exist: (i) *open-loop* control, and (ii) *closed-loop* (feedback) control. In the former case, the set and sequence of control actions is chosen a priori, by exploiting theoretical study on the models. In this case, the input is independent with respect to the output (e.g., possible measurements on the system) (Lenhart and Workman, 2007). Closed-loop control, instead, introduces in the procedure one or more feedback loops, which are able to quantify the real response of the system to variations of the control functions, and adjust them according to the differences recorded between the theoretical and real behaviors of the system (Aström and Murray, 2010).

There are several examples of successful applications of control theory in pharmacology (see Bailey and Haddad, 2005; Shi et al., 2014). In this respect, the final goal is to determine the optimal set of therapeutic choices—e.g., dosages and schedules—to obtained a desired efficacy, while minimizing the overall costs. Closed-loop controls are extremely effective in achieving this goal and have been often implemented in real-world health-care settings (Haddad et al., 2006; Steil, 2013; Jayachandran et al., 2014; Shi et al., 2014; Babaei and Salamci, 2015; Fuentes-Garí et al., 2015; Naşcu et al., 2015). Nonetheless, technological problems, such the absence of real-time measurements, as well as possible problems in titrating drugs to the right concentration, are still limiting real-life applications (Bailey and Haddad, 2005; Cunningham et al., 2018). For this reason, open-loop controls are still a viable option, mostly because of the applicability in a wide range of real-world scenarios for which, for instance, real-time measurements and/or therapy adjustments are unfeasible. Moreover, open-loop controls have proven to identify more effective drug concentration in therapeutic ranges than standard clinical practice (Barbolosi and Iliadis, 2001; Ledzewicz and Schättler, 2006; Zhu and Qian, 2014; Bara et al., 2017; Rocha et al., 2018; Yoon et al., 2018).

However, many approaches in both categories are based on limiting assumptions. Certain techniques, for instance, assume continuous—yet unrealistic—drug infusion procedures (e.g., Pefani et al., 2013). Some methods rely on often speculative mathematical models, which cannot be evaluated due to the lack of opportune experimental data (Yoon et al., 2018).

The CT4TD framework aims at improving the current state-of-the-art, by solving an open-loop control problem on PK/PD models via RedCRAB (Heck et al., 2018b; Omran et al., 2019), a remote suite based on dCRAB (Doria et al., 2011; Rach et al., 2015), an algorithm for optimization and control. The dCRAB algorithm is particularly suitable for complex optimization problems when it is neither possible or efficient to build the gradient from the set of differential equations, defined by the main dynamics. In the aforementioned case, the standard gradient-based methods could not be efficient or failed the gradient calculation. The dCRAB optimal control tool has the peculiarity to avoid local traps by changing the optimization basis and paves also the possibility to perform a closed-loop optimization, using the feedback provided by the patient's response. Extensions in this sense are underway.

### 2.3. Mathematical Modeling of Cell Population Dynamics

Many healthy and aberrant biological tissues are characterized by a hierarchical organization, constituted by an ordered sequence of discrete maturation states, as driven by differentiation processes. In this respect, a number of mathematical models have been proposed to study the cell population dynamics, both in healthy systems (Marciniak-Czochra and Stiehl, 2013) and in cancer (Michor et al., 2005; Tang et al., 2011; Stiehl and Marciniak-Czochra, 2012; Olshen et al., 2014; Altrock et al., 2015; Werner et al., 2016; Stiehl et al., 2018).

In such models, cells are divided in *n* non-intersecting compartments, with every ensemble representing a certain stage of cell differentiation. The time ordering of the differentiation stage defines an explicit hierarchy among such ensembles. Accordingly, a *lineage* is defined as a collection of compartments that fully describe all the stages of differentiation of cells within a certain tissue (see Figure S7).

Various approaches are employed to model the dynamics of such systems such as, e.g., ordinary differential equations (ODEs), discrete-time and continuous-time Markov chains, master equations, etc. (see Altrock et al., 2015 for a recent review). Each strategy displays a specific trade-off in terms of expressivity and computational complexity. For instance, ODEs are very convenient from the computational perspective, but they are not suitable in certain cases, e.g., when representing low numbers of cells. Conversely, probabilistic models allow for a richer representation of the system, yet at the cost of a higher computational burden and mathematical complexity.

For sake of simplicity, the CT4TD framework employs a ODEs hierarchical model of cell population dynamics to fit longitudinal data on tumor burden (Michor et al., 2005; Tang et al., 2011; Stiehl and Marciniak-Czochra, 2012; Olshen et al., 2014; Altrock et al., 2015; Werner et al., 2016; Stiehl et al., 2018). On the one hand, this model provides a description of cell population dynamics in time for any given patient. On the other hand, it allows to estimate the efficacy of the therapy in each patient, on the basis of the observed cancer subpopulation decay, which is then used to estimate patient-specific PD models (see section 3 for further details).

## 3. Materials and Methods

### 3.1. Estimation of Patient-Specific PK Models of Imatinib in CML

In order to describe the various steps of the CT4TD pipeline in detail, we here present its application to the specific case of Imatinib administration in CML. Yet, we stress that the theoretical approach is general and could be applied for any therapy for which PK/PD models can be estimated.

We here employ the PK model of oral administration of Imatinib introduced by Widmer et al. (2006) (see Figure S1): if χ_{g}(*t*) is the amount of Imatinib in the gastrointestinal tract, *k*_{a} is the first order absorption rate, *f* is the bioavailability, i.e., the fraction of an administered dose of unchanged drug that reaches the circulatory system, and *D* the ingested dose, then:

so if *v* is the volume of the distribution, i.e., the theoretical volume needed to account for the overall amount of drug in the body in case the drug was evenly distributed throughout the body, *CL* the clearance, i.e., the volume of plasma cleared of the drug per unit time, *C*(*t*) the concentration in the blood, χ_{b}(*t*) the amount of Imatinib in the blood ($C(t)=\frac{{\chi}_{b}(t)}{v}$), then:

An example of the solution of Equation (2) can be found in Figure S2.

Both equations can be tuned to consider demographic factors as body weight, age and sex, thus providing patient-specific PK models. More in detail, such demographic factors can be incorporated in the clearance *CL* and in the volume of the distribution *v*, as initially proposed by Widmer et al. (2006):

where θ_{i}, for *i* = *a, b*, 1, 2, 3, 4 are constants, *BW* is the body weight of the patient and $\overline{BW}$ is its population-average, *AGE* is the age of the patient and $\overline{AGE}$ its population-average and *q* is a binary variable which takes value 1 for male and 0 for female. Estimation of such parameters is provided by Widmer et al. (2006) and shown in Tables S1, S2. As in the dataset used in the case study, only the information about age and sex was available, we estimated the corresponding BW in each patient on the basis of average measures provided by McDowell et al. (2005)^{1}.

#### 3.1.1. Therapy Simulation

In order to simulate a therapy, we need to model a multi-dose oral administration. Let *t*_{in} and *t*_{fin} be the initial and the final time of the therapy, we suppose to give *n*+1 doses at time *t*_{0}, *t*_{1}, *t*_{2}, …, *t*_{n}, with a dose amount *D*_{i} (*i* = 0, 1, 2, 3, …*n*), respectively. Thus, we have *n* first order differential equations like Equation (1); by using the superposition principle the solution will be:

where ${\chi}_{gi}(t)={\psi}_{gi}({t}_{i}){e}^{-{k}_{a}(t-{t}_{i})}$ is a solution of Equation (1), with ψ_{gi}(*t*_{i}) = *fD*_{i}. Then, substituting χ_{g}(*t*) into the Equation (2) it is possible to study the dynamics of blood concentration of a certain drug for a multi-dose oral administration (see Figure S2 for an example). Notice that $\frac{{\chi}_{g}(t)}{v}\text{}=C(t)$.

### 3.2. Estimation of Patient-Specific PD Models From Experimental Data

CT4TD includes a data analysis module, aimed at identifying patient-specific parameters of the PD model from experimental data, and which relies on a widely-used model of cancer population dynamics. The following subsection include details on each pipeline step.

#### 3.2.1. Population Dynamics Model of Leukemia

In CT4TD we use the simplest compartmental model of population dynamics for which it is possible to estimate the parameters from available experimental data.

In detail, the organization of leukemic systems is characterized by a hierarchy that is analogous to the healthy hematopoietic counterpart, and which can be modeled in the simplest case with two compartments: (i) cancer stem cells (CSC) and (ii) progressively differentiated cancer cells (Michor et al., 2005; Tang et al., 2011; Stiehl and Marciniak-Czochra, 2012; Olshen et al., 2014; Wodarz et al., 2014; Werner et al., 2016; Stiehl et al., 2018).

Note that the model could be generalized to account for *m* lineages, in order to represent the possible presence of subpopulations of tumor cells with distinct properties (e.g., therapy resistant phenotypes), and to account for complex interaction phenomena (e.g., between lymph nodes/bone-marrow and blood-stream). However, in order to allow for an accurate and robust parameter estimation, more complex models of population dynamics would typically require—among other things—a much higher number of data points than those usually available. In addition, limitations regarding parameter identification of ODE models with inadequate data (i.e., identifiability of a model) were described in Saccomani et al. (2010) and Hong et al. (2019), and justify our choice of adopting highly simplified models, at least until new suitable experimental data will become available.

In this case, first order differential equations are suitable to describe the population dynamics, because experimental evidences show that the proliferation of healthy cells display an exponential increase (Marciniak-Czochra and Stiehl, 2013), whereas cancer cells under therapy display an exponential decay (Michor et al., 2005; Tang et al., 2011; Olshen et al., 2014; Rainero et al., 2018). Thus, we analyse the fluxes between compartments, by defining the following constants:

• *p*_{i,k} is the division rate of the cells in the *i* ensemble of the *k* lineage.

• *d*_{i,k} is the death rate of the cells in the *i* ensemble of the *k* lineage—this rate will be estimated from experimental data.

• *a*_{i,k} ∈ [0, 1] is the probability that, when a cell undergoes mitosis, both of its daughters belong to the *i* ensemble in the *k* lineage; therefore, 1 − *a*_{i,k} is the probability of belonging to the *i* + 1 ensemble. With respect to CSCs (or SC) this quantifies the self-renewal process.

Note that, we consider a symmetric differentiation scheme, according to which after mitosis both cells are of the same type, either stem or differentiated.

In a single individual, the dynamics of the healthy system—which includes stem cells, progenitors and differentiated cells—and that of the leukemic system coexist. Yet, we here assume that CML cells are cytokine-independent (as formulated by Werner et al., 2016), so the equations describing the dynamics of leukemic subpopulation do not include terms related to the healthy counterpart, (i.e., the ODE system of healthy and leukemic subpopulations becomes uncoupled). As a consequence, the dynamics of the leukemic system can be defined as follows:

where:

This is a typical example of a linear autonomous system, and the solution could be obtained analytically in a recursive way:

#### 3.2.2. Experimental Data Fitting

Once the leukemia 2-compartments model has been defined, it is possible to estimate its parameters from experimental data. In particular, we here focus on the specific case of CML. As every cancer cell in CML is characterized by the BCR-ABL mutation, it is possible to distinguish healthy from cancer cells with Q-PCR measurement, and this allows to have longitudinal experimental data returning the fraction of cancer cells over the total, i.e., the tumor burden (Michor et al., 2005; Tang et al., 2011; Olshen et al., 2014; Rainero et al., 2018).

The CT4TD fits the longitudinal data on tumor burden in each patient with a biphasic exponential, which in log-scale describes two distinct and intersecting lines, as proposed in Michor et al. (2005), Tang et al. (2011), and Olshen et al. (2014). In particular, we selected the combination of straight lines minimizing the value of *R*^{2} (i.e., the standard coefficient of the goodness of a linear regression, which quantifies the portion of the response that is explained by a linear model), by scanning all the points of intersection (with a step of 1 day) and fitting data with two lines with distinct slopes, via a standard non-linear fit (see the Table S3 for all parameter estimation). We also tried to fit data with either one or three distinct lines, yet in our case study the best fit was obtained in the two-lines case.

Once the two best fitting curves have been obtained for each patient, we adopt a simplifying assumption that allows us to estimate the parameters of the compartmental model from data. In Marciniak-Czochra et al. (2009), it is shown that the leftmost curve (with higher slope) is likely to represent the overall population dynamics involving cancer stem cells, cancer progenitors and cancer differentiated cells (decreasing in population size), together with that of healthy blood cells (increasing in population size). Considering that the Q-PCR measurements of the BCR-ABL fusion gene return the ratio between cancer cells and the total number of cells in the system, it would be impossible to disentangle the contribution of each subpopulation to the overall dynamics, and to reliably estimate the values of γ and τ in Equation (7), without *ad-hoc* assumptions and/or further opportune experiments.

Instead, it is possible to hypothesize that the rightmost curve (i.e., the second exponential decay, with lower slope) accounts for the dynamics involving a completely recovered healthy cell subpopulation—thus, healthy cells can be considered as constant in number—and a decaying cancer stem cell subpopulation, with no progenitors and differentiated cancer cells left in the system, as a consequence of the therapy (Michor et al., 2005; Tang et al., 2011; Olshen et al., 2014; Rainero et al., 2018). With this assumption, it is possible to estimate the parameters of the first compartment, and in particular, the cancer stem cell death rate *d*_{1,l} in Equation (9), from experimental data. This also allows us not to explicit consider a model for the healthy hematopoietic system.

In detail, we assume that the exponential decay of the rightmost curve (i.e. the exponential decay of CSC, given by the first equation in Equation 8) accounts for the dynamics of the CSC subpopulation only. In this way, it is possible to evaluate the effect of a standard Imatinib therapy—400 mg per day—directly on the CSC decay, as estimated from any patient's data.

In fact, β_{j}, i.e., the measured slope accounting for the decay of CSCs, will be given by:

where *j* is the patient's index.

#### 3.2.3. Identification of Patient-Specific PD Models

Various PD models can be employed to estimate the efficacy of Imatinib in CML. In our case, we use a PD model based on the maximum-inhibition effect (*E*_{max}) (Peng et al., 2005) (see Figure S3):

where *E*(*t*) is the effect, *E*_{max} is the maximum effect, *C*(*t*) is the concentration of the drug, *EC*_{50} the concentration of the drug that produces half of maximal effect, and *n* is a shape factor.

In order to identify patient-specific PD models from the parameters of the leukemia model estimated from data, we can safely suppose a linear relation between the population-average of the efficacy 〈*E*〉 and the population-average of 〈*d*_{1,l}〉 (Gambacorti-Passerini et al., 1997). Hence, the relation is the following:

where *K* is a conversion constant. In this case, the efficacy of a certain concentration of a drug is directly proportional to the increase of the cancer cell death rate.

It is possible to estimate *K* from the available dataset, by employing patient-specific PK models and an average benchmark PD model (*n* = 1, *EC*_{50} = 0.123 [*mg*/*L*] and *E*_{max} = 1). To do this, we first compute the time-average of the concentration ${\stackrel{-}{C}}_{j}(t)$ for each patient, with respect to a 40-days standard therapy—400 mg Imatib per day—, which we then use to compute the time-average of the efficacy Ē_{j} as per Equation (10), by considering unique average PD parameters for all patients. Finally, we consider the population-average the efficacy 〈*E*〉, as computed on all patients. 〈*d*_{1,l,j}〉 is then obtained by using formula (Equation 9), setting *a*_{1,l} = 0.87 and ${p}_{1,l}=0.45\text{}\left[day{s}^{-1}\right]$ as proposed (Stiehl et al., 2018), and by taking the mean over all patients. As a result, the conversion factor for this dataset is *K* ≈ 0.377 ± 0.0007 [*days*^{−1}]. Since we suppose a linear relation between *K*, 〈*d*_{1,l}〉 and 〈*E*〉 (see Equation 11), the confidence interval of *K* is determined via standard (linear) error propagation procedure.

At this point, we are able to estimate the personalized parameters of the PD model, and in particular of *EC*_{50}, by supposing that the maximum efficacy is *E*_{max} = 1 and the shape factor is *n* = 1, for all patients (Peng et al., 2005; Weigel et al., 2010). Therefore, the relation is the following:

With this procedure, we can estimate the value of *EC*_{50,j} for each patient from individual longitudinal data on tumor burden. This result leads to the definition of PK/PD personalized models, which integrate demographic factors and Q-PCR data that measure the response of each patient to the therapy, and represents one of the major novelties of our approach.

The results for all patients are presented in Table S4 and in Figure 4. Notice that, if longitudinal data on single patients are not available, the CT4TD allows to employ a unique (average) PD model for all patients, as estimated from experimental studies (see e.g., Gambacorti-Passerini et al., 1997; Peng et al., 2005; Picard et al., 2007; Baccarani et al., 2014).

### 3.3. Definition of the PK/PD Control Problem

We formally define the *PK/PD control problem* for the administration of discrete doses as follows. Let be *t*_{in} and *t*_{fin} the initial and the final time of the therapy. Here we aim at finding: (i) the optimal doses ${D}_{0}^{*},{D}_{1}^{*},\dots {D}_{n}^{*}$, and (ii) the optimal schedule of administration ${t}_{0}^{*},{t}_{1}^{*},\dots ,{t}_{n}^{*}$, such that a functional that represents the *cost* ${L}(C(t,\left\{({D}_{0},{t}_{0}),({D}_{1},{t}_{1}),\dots ,({D}_{n},{t}_{n})\right\}))$ is minimized. Notice that the set $\left\{({D}_{0}^{*},{t}_{0}^{*}),({D}_{1}^{*},{t}_{1}^{*}),\dots ,({D}_{n}^{*},{t}_{n}^{*})\right\}$ are the control functions and *C*^{*} is the optimal unknown drug concentration, described in a general setting with the ODE in Equation (18) (the solution of the particular case of multi-dose oral administration is shown in Equation 5). To simplify the notation, in the following we will refer to ${L}(C(t,\left\{({D}_{0},{t}_{0}),({D}_{1},{t}_{1}),\dots ,({D}_{n},{t}_{n})\right\}))$ as ${L}(C(t,\left\{({D}_{i},{t}_{i})\right\}))$.

The definition of the *cost* functional ${L}$ is the *core* of the PK/PD control problem and can include various weighed terms, which one should wisely select with respect the specific problem and goals. In particular, ${L}$ may (or may not) include distinct terms accounting for: (i) the efficacy *E* of the therapy, as derived via PD models, such as the Hill equation (Goutelle et al., 2008) or the *E*_{max} model (Peng et al., 2005) (if average or patient-specific parameters can be estimated); (ii) the toxicity of the therapy and/or the possible AEs as measured, e.g., via the Area Under the Curve (AUC); (iii) in case the PD model is unknown or indefinable—a distance between the optimized concentration *C*^{*}(*t*) and a target concentration *C*_{tg} as estimated, for instance, from clinical trials (Baccarani et al., 2014); (iv) the economic cost of the therapy; (v) the properties and the temporal evolution of the disease, as in the case of the tumor burden estimation from longitudinal experimental data (Michor et al., 2005; Stiehl and Marciniak-Czochra, 2012; Altrock et al., 2015; Werner et al., 2016; Stiehl et al., 2018) (in this case the goal will be the optimization of the performance of the therapy with respect to the minimization of cancer subpopulations; (vi) the probability of developing resistance to the therapy (Michor et al., 2005; Tang et al., 2011), etc. Obviously, some of these terms are highly correlated, as for example the AUC and the economic cost. Notice also that the choice of opportune weights is crucial in defining an effective control, if more than one term is used, and that it is necessary to fix *n* *a priori* when minimizing ${L}$. The latter choice is related to the applicability to current real-world scenarios, in which practical limitations usually prevent to exceed a certain amount of doses per day, as well as to administer continuous dosages, especially with respect to cancer therapies. In detail, we here define cost functions ${L}$ with respect to two distinct scenarios: (i) optimization of therapy for fixed target concentrations, (ii) optimization of therapy for tumor burden reduction or stabilization (as proposed by West et al., 2018).

#### 3.3.1. Working Scenario (i): Optimal Control With Fixed Target Concentration at Diagnosis Time (Patient-Specific PK Models—No PD Models)

In many real-world scenarios it is not possible to retrieve or estimate the parameters of the PD models, for instance at the time of diagnosis. In this case, CT4TD can be used to find the best personalized therapeutic strategy to either: (i) be as close as possible to a given *optimal* target drug concentration, or (ii) be close to, but strictly larger than a given *lower-bound* target (i.e., *steady state optimization*; Shargel et al., 1999). In the first case—i.e., *optimal* target concentration—CT4TD employs a simple Euclidean distance between two concentrations:

*C*_{tg} is the target drug concentration, necessary to have a major molecular response, estimated, e.g., via clinical trials. For example, in clinical studies, Imatinib concentration in blood is required to be above 0.57 [*mg*/*L*] and with a time average of 1 [*mg*/*L*] (Peng et al., 2005). *C*^{*}(*t*) is the unknown concentration of drug in blood, which will be identified by solving the control problem.

Then, in this case the *cost* is defined as follows:

This *cost* favors the solutions which are close to the target, with no preference between above or under the target. In the second case—i.e., *lower-bound* target concentration—CT4TD uses a *step* distance in the space of concentrations. In this case the distance between two concentrations becomes:

where *G* is a constant. It is then possible to define the *cost* as follows:

In this case, CT4TD will give solutions that display concentrations above the lower-bound target. We stress that working scenario (i) is general, as target drug concentration can be derived from any given clinical trial or practice. In such case, the target concentration is a parameter of the cost functional, which can be opportunely modified according to the considered therapy, both in the optimal target (Equation 14) and the lower-bound target cases (Equation 16).

#### 3.3.2. Working Scenario (ii): Optimal Control for Tumor Burden Reduction (Patient-Specific PK Models—Patient-Specific PD Models)

When it is possible to estimate patient-specific PD parameters from longitudinal data on tumor burden variation under standard treatment, CT4TD can be used to identify an adjusted optimized therapy to reduce such burden in each patient. In particular, our approach can use a *cost* function with the aim of: (i) minimizing the number of cancer stem cells (e.g., at the end of the treatment, (ii) minimizing the AUC of the therapy, which accounts for toxicity and possible AEs. To do this, we need to introduce two arbitrary weights *W*_{1} and *W*_{2}, which account for the relative relevance of the two distinct components. Notice that we cannot consider the exponent of Equation (8) only, as the dynamics of *l*_{1}(*t*) is a monotone function and the minimization of cancer stem cells is reached for *D*_{i} → ∞, which implicates that if the maximum amount of drug is bounded (i.e., *D*_{max}) the optimal solution is trivially reached for *D*_{i} = *D*_{max}, ∀*i*. Therefore, we have:

Notice that cost functional in Equation (17) includes the net growth rate of the CSCs, i.e., λ in Equation (8). This choice allows us not to know or estimate the initial number of CSCs *l*_{1}(0), since is not possible to infer this quantity from tumor burden data only. The ratio $\varphi =\frac{{W}_{1}}{{W}_{2}}$ determines the overall of the optimal solution and should be wisely chosen. In order to provide some indications on this modeling choice, we performed an extensive scan of ϕ (in the range ϕ ∈ [10, 100]), with respect to all patients and we analyzed the variation of the time-average concentration $\stackrel{-}{C}$. The results are shown in Figure 5. From this analysis, one can see that a sound choice for ϕ might be in the range [60–65] for males and in [70–75] for females (note this choice depends on the units of measurement), meaning that the weight corresponding to the time evolution of CSCs is relatively more relevant than that corresponding to the toxic effects.

We finally specify that working scenario (ii) was originally designed for liquid tumor therapies, as it requires the definition and the measurement of the tumor burden, which is used to estimate the CSCs net growth rate. Whether measurements on tumor burden and an appropriate model for cancer population dynamics would be available for distinct cancer types, our framework might be applied without any significant theoretical modification.

### 3.4. Resolution of Control Problem via RedCRAB

In CT4TD the control problem is heuristically solved by using RedCRAB, the remote version of the dressed Chopped RAndom Basis (dCRAB) optimal control via a cloud server (Caneva and et al., 2011; Doria et al., 2011; Rach et al., 2015; Heck et al., 2018a; Omran et al., 2019). Optimal control theory has been used for decades to optimize classical processes, and its quantum counterpart has been increasingly exploited in the last years (Khaneja and et al., 2005; Spörl et al., 2007; Caneva and et al., 2009; Brif et al., 2010; Lloyd and Montangero, 2014; Koch, 2016; Pichler et al., 2016; Sørensen and et al., 2016; van Frank and et al., 2016; Deffner and Campbell, 2017; Goerz et al., 2017). In its simplest version, optimal control drives the state of the system to a goal one, characterized by some desired properties, by using a set of time-dependent controls.

Here, the dynamics of the system is identified by the concentration of the drug in a certain compartment *C*(*t, D*(*t*)), which obeys the time evolution equation:

where *D*(*t*) is the time-dependent control function, i.e., the doses function defined in section (see the section describing the *patient-specific PK models of Imatinib in CML*). The goal here is to optimize the drug administration schedule (see section 3.4.1) while minimize the cost functional as defined in subsections describing *working scenario* (i) and *working scenario* (ii) (see above).

Starting from the standard administration schedule *D*_{0}(*t*), the optimization proceeds by looking for the optimal correction *g*(*t*) such the optimal administration schedule will be *D*(*t*) = *D*_{0}(*t*) + *g*(*t*). Following (Rach et al., 2015), the correction *g*(*t*) is expanded in a truncated function space, specifically in random Fourier components as:

where ω_{k} = 2π(*k* + *r*_{k})/*T* and *r*_{k} ∈ [−0.5, 0.5], *n*_{c} is the total number of frequency used, *T* is the final time, and Γ(*t*) is a fixed scaling function to keep the values at initial and final times unchanged. In conclusion, the control problem is reformulated as maximization of a multivariable function ${L}({A}_{k},{B}_{k})$ with fixed ω_{k}, and can be efficiently solved numerically by searching the best combination of {*A*_{k}, *B*_{k}} with the preferred method of choice, here a direct-search method (Nelder and Mead, 1965). Notice that, each frequency ω_{k} is independently optimized: indeed, after a certain number of iterations, we move to the next ω_{k+1}, by introducing an external loop on the frequencies, i.e., super-iterations. This allows the algorithm to include a high number of Fourier components and efficiently find the optimal solution, by avoiding local traps that can stick the optimization into not the global minimum (Rach et al., 2015).

In the RedCRAB optimization, the server generates and transmits a set of controls to the CT4TD, which evaluates the cost function, by interfacing with MATLAB and communicates it to the server completing one iteration. The optimization continues iteratively by providing the optimal set of controls as well as giving back the figure of merit, until the convergence is reached.

We specify that, as for any heuristic method, the solution provided by our approach might be sub-optimal. This depends on the complexity of the search space and by the computational resources available. Yet, as proven in several real-world applications (Doria et al., 2011; Rach et al., 2015; Hoeb and et al., 2017; Omran et al., 2019) RedCRAB was proven to be a computationally efficient and robust technique.

#### 3.4.1. Optimal Dosage

To solve the optimization problem with respect to dosages, we optimize a control field *D*(*t*) defined between (*t*_{0} ≤ *t* ≤ *t*_{f}) via RedCRAB. Then, we proceed by mapping the *D*(*t*) doses function into (*n* + 1)-integer values which correspond to (*n* + 1)-doses ${D}_{j}^{*}$ (j = 0,…, n), where *n* is the number of total doses given to the patient; *t*_{f} is the final time of the therapy and *t*_{0} = 0 the initial time. Accordingly, we can define the schedule of administration as:

i.e., ${t}_{i}=i\xb7\frac{{t}_{f}}{n+1}$ for *i* = 0, 1, …, *n* + 1 Indeed, the n-doses ${D}_{j}^{*}$ are obtained by integrating the doses function *D*(*t*) between adjacent times in the schedule administration as follows:

with *j* = 0, …, *n*. The more general case where also the time schedule of the administration is optimized (i.e., *optimal schedule* case) is described in the SM.

## 4. Results

### 4.1. Imatinib Administration in Chronic Myeloid Leukemia—CML

We here show the application of CT4TD to the specific case of Imatinib mesylate administration in patients with CML. The final goal is to determine the drug optimized dosage and schedule in two distinct scenarios.

1. In the first case the goal is to optimize personalized therapeutic strategies to reach given target concentrations, as those commonly used in clinical protocols, and by assuming to be at diagnosis time.

2. In the second case, we employ the population dynamics models, as retrieved by fitting longitudinal data on single patients under standard treatment, to deliver patient-specific therapies that are most effective in reducing/eradicating the tumor subpopulation after the major molecular response, on the basis of PK/PD personalized models.

Imatinib is an inhibitor of the BCR-ABL tyrosine kinase, which is known to bind to the inactive form of BCR-ABL at nanomolar concentration, competing with the ATP for its binding pocket and hindering the switch of the fusion kinase to the active form, therefore impairing the catalytic activity of the enzyme (Gambacorti-Passerini et al., 2003). The therapy is in most cases long-life (Michor et al., 2005; Tang et al., 2011; Branford et al., 2013; Olshen et al., 2014; Rainero et al., 2018) and the treatment is expensive (≈ 30,000 US$ per year; Cole and Dusetzina, 2018). Therefore, the impact of an optimized and personalized administration would be two-fold: on the one hand, it could be effective in optimizing the performance, while reducing the toxicity and minimizing the adverse effects for long-term therapies (Larson et al., 2008b; Mughal and Schrieber, 2010; Hu et al., 2012); on the other hand, it could help in reducing the overall economical costs, which currently limit the access to therapy, hence making long-term health care more sustainable (Fojo and Grady, 2009; Himmelstein et al., 2009)^{2}.

### 4.2. Datasets

We applied the CT4TD framework to a longitudinal dataset from Michor et al. (2005), in which 29 CML patients have been monitored with a peripheral blood draw taken every 90 days, from the time of diagnosis up to a maximum time of about 3, 500 days (average time ≈ 2, 659±938 days) For these patients, the administration schedule has been 400 mg Imatinib/day for the whole considered period.

In particular the fraction of cancer cells in blood—that will be referred to as *tumor burden* from now on—can be reliably estimated by analyzing the expression level of the fusion gene BCR-ABL, thus providing an easy way to monitor the disease progression, as well the response to therapy. As BCR-ABL transcript is solely expressed by the leukemic cells, its measurement by mean of quantitative PCR (Q-PCR) is considered one of the most sensitive and specific techniques to indirectly assess the tumor burden, and is the standard de facto for monitoring minimal residual disease in CML.

More in detail, we selected a subset of the dataset provided in Michor et al. (2005), by removing all the patients that displayed too few data points (i.e., < 3), or that were characterized by resistant mutations, i.e., specific DNA alterations that render the therapy via Imatinib ineffective, usually due to steric impediments (Shah et al., 2004). In such cases, it is common practice to employ an alternative therapy, based either on Dasatinib, Nilotinib, Ponatinib, or Bosutinib (Shah et al., 2004). We decided to leave resistant patients out of the analysis for two distinct technical reasons. First, this scenario would require a more complex population dynamics model—i.e., with more subpopulations—, characterized by many more parameters, often impossible to estimate. Second, in this case the identification of an optimized therapy should involve two distinct controls, and even if theoretically possible, this would require to obtain data concerning the effect of Dasatinib/Nilotinib/Ponatinib/Bosutinib on tumor burden, which are not present in the used dataset.

We eventually selected 22 (out of 29) patients, for which the therapy led to a successful major molecular response (MMR), i.e., the ratio of cells with BCR-ABL mutation is ≤ 0.1 on the international scale (Griffiths et al., 2014).

### 4.3. Patient-Specific PK Models

We first use patient-specific PK models by incorporating demographic factors—i.e., body weight, age, and sex—in the clearance and in the volume of the distribution, as per Equations (3) and (4) (Widmer et al., 2006)(see section 3). The parameter settings of RedCRAB optmization for this case study are shown in Table S6.

In Figure 2, the PK curves corresponding to the 22 patients (in blue) and the average PK model (in red) over a selected time window ([0, 48] h) are displayed.

**Figure 2**. Patient-specific PK models. Personalized pharmacokinetic curves (blue solid lines), as estimated from demographic factors, such as age, sex, and body weight, as per Equations (3) and (4), in the range *t* ∈ [0, 48] h (x axis); y axis describes the drug concentration *C*(*t*) in [*mg*/*L*]. The blue curves correspond to the 22 distinct patients included in the dataset, whereas the dashed red curve represents the population-average pharmacokinetics.

### 4.4. Defining Personalized Optimized Administration at Diagnosis Time

CT4TD can be employed when CML is diagnosed, in order to identify optimized therapeutic strategies that lead to drug concentrations as close as possible to given targets. We here present the application of CT4TD to two distinct targets.

The first target concentration is *C*_{targ}(*t*) = 0.57 [*mg*/*L*], which is currently the most widely employed in the clinic (Peng et al., 2005). It is hypothesized that any effective therapy should ensure a drug concentration close to, but strictly larger than this value, in order to lead to a good performance, while minimizing the AEs (Graham et al., 2002; Faber et al., 2016). In this case, we consider this concentration as a *lower-bound* target, and the goal of the CT4TD framework will be to design an optimized therapy to be close to, but strictly larger than this concentration value, by employing an opportune distance notion (see section 3).

The second target concentration is *C*_{targ}(*t*) = 1 [*mg*/*L*] and is supposed to provide a more effective therapy, but at the cost of an increased likelihood of AEs and toxicity (Gambacorti-Passerini et al., 1997; Picard et al., 2007; Baccarani et al., 2014). From common practice, an effective therapy is that leading to values of drug concentrations *around* this target. For this reason, CT4TD will return an optimized therapy ensuring a drug concentration *as close as possible* to this *optimal* target (see section 3). Note that one could select any arbitrary target concentration, or even combinations of targets, and this would not affect the validity of our approach. For both targets we tested distinct settings, in which we considered, respectively, 1 and 3 doses per day at fixed times (i.e., 1 dose each 24 and 8 h, respectively)^{3}.

In Figure 3, we present the application of CT4TD to a selected patient (n. 0001 00004 AJR, male), with respect to the *lower-bound* target *C*_{targ}(*t*) = 0.57 [*mg*/*L*] (left panels), and the *optimal* target *C*_{targ}(*t*) = 1 [*mg*/*L*] (right panels). In particular, we compared the standard administration (red), the 1-dose optimized therapy (blue) and the 3-doses optimized therapy (green), on a temporal window of 14 days, with respect to drug dosage (Figures 3A–E), drug concentration in blood (Figures 3B–F), cumulative (Euclidean) distance with respect to the target concentration (Figures 3C–E), and AUC (Figures 3D–H).

**Figure 3**. Patient-specific optimized therapy with fixed target drug concentrations. Optimized Imatinib administration returned by CT4TD for patient 0001 00004 AJR from Michor et al. (2005), in the cases of: 1-dose/day (blue) and 3-doses/day (green), with respect to: *lower-bound* target concentration *C*_{targ} = 0.57 [*mg*/*L*] **(A–D)**, and *optimal* target concentration *C*_{targ} = 1 [*mg*/*L*] **(E–H)**. Standard administration—i.e., 400 mg Imatinib/day—is shown with a red dashed line. In this case, the optimization is obtained on patient-specific PK parameters, without considering the PD models. **(A,E)** Imatinib scheduled dosage in *mg* (y axis), displayed on 14 days (x axis). **(B–F)** Imatinib concentration in blood in [*mg*/*L*] (y axis). **(C–G)** Variation of the cumulative distances between the observed concentration and the selected target in time. **(D–H)** Temporal variation of the AUC in [*mg*·*h*/*L*].

When assessing the goodness of a therapy in the lower-bound scenario—i.e., *C*_{targ} = 0.57 [*mg*/*L*]—it is important to look at both the distance to the target *and* the overall time in which the drug concentration is above such target. In Figures 3A–D, one can see that the optimized 1-dose strategy displays higher cumulative distance and area under the curve—AUC—with respect to the standard schedule, due to the fact that drug concentration is always strictly larger than the lower-bound target. This is proven by the proportion of time spent above the target (computed on the whole period), which is 100, 100, and 88.6%, for the 1-dose, the 3-doses, and the standard administrations, respectively. The 3-doses optimized strategy displays a remarkable improvement also with respect to cumulative distance and AUC, proving to be an effective therapeutic choice for this specific patient. This expected result shows the effectiveness of our methodological approach in producing biologically-plausible experimental hypotheses.

With respect to the optimal target—i.e., *C*_{targ}(*t*) = 1 [*mg*/*L*]—, an effective therapy should ensure a drug concentration as close as possible to the target, thus reducing the drug surpluses, while minimizing the cases of insufficient dosage. In this case, the 1-dose optimized scenario almost overlaps with the standard administration (yet, this is not always the case as one can see, for example, with respect to 0006 00007 RJW in Figure S18), whereas the 3-doses optimized strategy displays an improvement in terms of cumulative distance, as the drug concentration is constantly kept much closer to the desired target, thus importantly reducing under- and over-dosing (Figures 3E–H). In Figures S8–S28, one can find the results of the analyses on the other 21 patients included the dataset.

### 4.5. Adjusting Treatment for Tumor Burden Reduction

The CT4TD framework can be employed in order to identify optimized therapeutic strategies for patients that are currently treated with a standard regime, and for which longitudinal data on tumor burden variation are available. In this case, in order to estimate personalized PD models from experimental data—which describe the individual therapeutic response to identical drug concentrations—, CT4TD employs a module which fits longitudinal data on tumor burden with a hierarchical model of cancer population dynamics^{4}.

In particular, we fitted each patient's data with a biphasic exponential, which in log-scale describes the presence of two straight lines with distinct slopes, as proposed by Michor et al. (2005), Tang et al. (2011), and Olshen et al. (2014) (see section 3 for further details). With a few assumptions, the slope of such lines can be used to estimate the parameters of a 2-compartment population dynamics model of CML and, in particular, the (stem) cancer subpopulation death rate in presence of a standard Imatinib therapy—i.e., 400 *mg* per day—in each patient. This allows to estimate the patient-specific parameters of the PD model. The results of the data analysis on all patients are presented in Table S3 and in Figures S4–S6. In Figure 4A, one can see the personalized PD curves for the 22 patients, computed via Equation (10), as compared to the average one.

**Figure 4**. Patient-specific PD models. **(A)** personalized PD curves obtained from Equation (10) by using *E*_{max} = 1 and *n* = 1 for all patients, and distinct values of *EC*_{50}, based on the death rate of cancer stem cells, as estimated from longitudinal data on tumor burden; x axis (*Log*_{10} scale) describes the concentration in the range *C* ∈ [0, 1.2] [*mg*/*L*], on y axis the efficiency *E* is displayed. The solid blue curves correspond to the 18 distinct male patients included in the dataset and the solid pink curves correspond to the 4 distinct female patients and the dashed red curve represents the population-average pharmacodynamics. **(B)** Heat-map returning the variation of efficiency *E*, computed via Equation (10), with respect to distinct parameters of the PK model—i.e., patient-specific time-average concentration $\stackrel{-}{C}$ (y axis)—and of the PD model—i.e., patient-specific *EC*_{50} (x axis). Red triangles represent the 22 patients in the dataset.

We also assessed the relative relevance of the personalized parameters of the PD and PK models with respect to the efficacy of the therapy. The heat-map in Figure 4B returns the variation of the efficacy with respect to combination of time-average concentration $\stackrel{-}{C}$ and *EC*_{50}, highlighting the personalized parameters of the 22 patients. As a first result, one can see that much of the variance in our dataset is due to differences in PK, rather than to PD, which however is still relevant. Notice also that the two visible clusters basically overlap with the male and the female groups, providing a possible explanation of the distinct therapeutic response observed in clinical studies (Branford et al., 2013). We stress that the estimation of personalized PD models from experimental data of patients under treatment is one of the major novelties of our approach, and, in combination with the demographics-based PK models, allow to identify patient-specific therapeutic regimes that are optimized to minimize the tumor burden.

In order to identify personalized optimized therapies, we finally defined a cost function with the goals of: (i) maximizing the reduction of the tumor burden, and (ii) minimizing the toxicity and possible AEs, in terms of AUC (see section 3 for further details). Such cost function requires to set opportune weights *W*_{1} and *W*_{2} for the two terms, respectively. In particular, a parameter $\varphi =\frac{{W}_{1}}{{W}_{2}}$ is defined, which can be opportunely tuned to favor either the first or the second term. However, the choice of a specific value for ϕ is arbitrary and depends on subjective research and clinical criteria.

To investigate the sensitivity of our framework to the variation of this parameter, we repeatedly applied the CT4TD framework to the CML dataset, by scanning various values of ϕ, and eventually assessed the differences in: (i) the time-average AUC as computed on a 14-days temporal window, and (ii) the efficiency computed on the time-average concentration in the same period, with respect to a 1-dose optimization scenario (the 3-doses scenario can be found in Figure S50). In Figure 5, one can see the distribution of both quantities with respect to the 22 patients in the dataset, divided in males (blue) and females (pink), as compared to the average AUC and efficiency for the standard administration case (red).

**Figure 5**. Assessment of term weights in cost function definition. The definition of the cost function for the adjusting treatment scenario requires to set the weights of the different terms. We here considered two terms, in order to: (i) minimize the tumor burden (weight *W*_{1}), and (ii) minimize the AUC (weight *W*_{2}) (see section 3 for further details). We scanned the values of $\varphi =\frac{{W}_{1}}{{W}_{2}}$ in the range [10, 100], by repeatedly applying the CT4TD framework to the 22-patients CML dataset from Michor et al. (2005). **(A)** Distribution of the value of the AUC after 14-days of the optimized therapy retrieved by CT4TD (1-dose case), for distinct values of ϕ, with respect to the 22 samples in the datasets, divided in males (blue) and females (pink), and compared to the average AUC values returned by standard administration (400 mg Imatinib/day) in males (red solid line) and females (red dashed line). **(B)** Distribution of efficiency computed via Equation (10) on the time-average concentration over 14 days of the optimized therapy retrieved by CT4TD (1-dose case), for distinct values of ϕ, and compared to the average efficiency in the standard administration scenario (solid and dashed red lines overlap).

A first important thing to notice is that the results are highly sensitive with respect to the choice of ϕ, and can either display improvements (e.g., higher AUC and/or lower efficiency) or worsening with respect to the standard case in distinct cases. Moreover, male and female groups show significantly different distributions, thus pointing at physiological differences that should be considered in therapy design. As a rule-of-thumb, we suggest to select a value of ϕ for which a slightly larger value of efficiency is observed, while not inducing a too high increase in AUC. In our case, we selected a value of ϕ equal to 60 for men and of 75 for women.

In Figures 6A–D, we show the comparison among the actual therapeutic regime administered to a selected patient (n.0001 00004 AJR, male—code ID in Table S3) and the optimized therapies identified via CT4TD by setting ϕ = 60, in both 1-dose (blue) and 3-doses (green) scenarios, in terms of: (i) drug dosage, (ii) drug concentration, (iii) AUC, and (iv) variation of the tumor burden in time. In particular, the temporal evolution of the tumor burden from diagnosis to the present is displayed by showing the experimental data points (purple) and the best fit (red), whereas the predicted future evolution is shown with respect to the 1-dose (blue) and the 3-doses (green) optimized strategies.

**Figure 6**. Adjusting therapy for tumor burden minimization. Imatinib administration optimized for tumor burden minimization in patient 0001 00004 AJR (male) from Michor et al. (2005), in the cases of: 1-dose/day (purple) and 3-doses/day (green), with ϕ = 60, as compared to standard administration (red). In this case, the optimization is obtained on patient-specific PK and PD models. **(A)** Imatinib scheduled dosage in *mg* (y axis), displayed on 14 days (x axis). **(B)** Imatinib concentration in blood in [*mg*/*L*] (y axis). **(C)** Temporal variation of the AUC in [*mg*·*h*/*L*] (y axis). **(D)** Longitudinal data points on tumor burden recorded in the interval *t* ∈ [0, 2340] days (purple). The best fit is shown with red lines. The slope of the right-most line is used to determine the cancer stem cell death rate and, in turn, the patient-specific PD parameters. The blue (green) line represents the predicted cancer subpopulation decay in case the 1-dose (3-doses) optimized therapy was adopted from day 2, 340 to day 3, 500.

A result is that, given similar AUC curves (i.e., similar toxicity and AEs), both the 1-dose and the 3-doses optimized strategies lead to a significantly faster predicted tumor burden decay. In particular, the tumor burden decay is, respectively 3.07 and 1.82 times faster for the 1-dose and the 3-doses regimes, with respect to standard administration. This result paves the way for an automated strategy for therapy adjustment design, which might be further developed by employing closed-loop controllers.

In Figures S29–S49, one can find the results of the analyses on the other 21 patients included the dataset.

#### 4.5.1. Robustness Analysis

In order to assess the reliability of the results produced by CT4TD, we tested its robustness with respect to intra-patient variability and to possible systematic errors. To account for intra-patient variability, we introduced a stochastic and uniformly distributed noise, i.e., χ_{r} = [−σ_{r}, σ_{r}] with *r* = *k*_{a}, *CL, v*, to the following parameters of the PK model: *k*_{a}, *CL*, and *v*, for every time point in the analysis. We performed 700 distinct PK simulations, on the average patient, in the specific scenario of a target concentration *C*_{targ}(*t*) = 0.57 [*mg*/*L*] and 1 dose per day. We then analyze the relative variation of the average *cost* $\Delta {L}$, as compared to the noise-free case, with respect to the width of the distribution of noise σ_{r}. In Figure 7, one can notice that $\Delta {L}$ variation with respect to the noise level follows an approximately quadratic trend, which is proven by fitting the data points with a curve with equation $b+a{\sigma}_{r}^{2}$ (the complete results of the fit are provided in Table S5). Note that such results are in agreement with other works that use quantum optimal control (Montangero et al., 2007; Kallush et al., 2014; Hoeb and et al., 2017). We performed a further robustness analysis, to assess the impact of systematic errors, as those possibly due to scarce reliability of the demographic study and/or to small or imbalanced datasets, and which may result in errors in the estimation of the PK parameters. To this end, we generated an optimized schedule for a set of PK parameters {*k*_{a}, *CL, v*} and then we applied such schedule to a simulated patient where a parameter at time is different, e.g., $\left\{{k}_{a}^{\prime},CL,v\right\}$ with ${k}_{a}^{\prime}={k}_{a}+{\delta}_{{k}_{a}}$. We finally measured the difference of $\Delta {L}$, as a function of δ_{ka}. Also in this case, we show in Figure 8 that the results produced by CT4TD are robust with respect to possible technical or measurement errors. In fact, with respect to an error of ≈ ±30%, we observe a maximum difference ≈ 10% in performance, as compared to the noise-free case.

**Figure 7**. Robustness analysis with respect to intra-patient variability. Intra-patient variability is here defined as stochastic noise randomly picked in the uniform distribution χ_{r} = [−σ_{r}, σ_{r}] and applied to distinct PK parameters. In particular, we here show the average and the standard deviation of the relative variation of the *cost* $\Delta {L}/{{L}}_{0}$, with respect to distinct levels of noise, in the cases: *r* = *k*_{a} **(A)**, *r* = *v* in **(B)**, and *r* = *CL* **(C)**. Values of the fit are provided in Table S5.

**Figure 8**. Robustness analysis with respect to systematic errors. Relative variation of the *cost* $\Delta {L}/{{L}}_{0}$ with respect to the relative value of the systematic error δ_{ka}/*k*_{a}.

## 5. Discussion

The introduction of the CT4TD framework aims at providing an automated and data-driven procedure for decision support in health care and personalized therapy design in cancer, especially by exploiting the increasing available computational power, which allows one to perform large-scale simulations and efficient search in the parameter space, and to deal with noisy and imperfect data.

In particular, CT4TD aims at overcoming the limitations of current control-based methods for therapeutic hypothesis generation. First, its completely general theoretical approach allows to consider: (i) any disease for which a PK/PD model can be derived and its parameters measured, (ii) any kind of administration, e.g., continuous drug infusion or discrete doses, (iii) any measurable term that is considered as relevant in the definition of a therapeutic *cost*. CT4TD eventually allows to evaluate *in silico* the outcome of the designed therapy.

Furthermore, CT4TD introduces the possibility of designing optimized therapeutic strategies based on experimental data concerning the disease progression. The identification of data-based patient-specific PK/PD models is one of the major novelties of CT4TD and has a profound impact on the characterization of tumor heterogeneity and, accordingly, on the customization of cancer therapies.

One of the main limitations of CT4TD derives from the adoption of highly simplified models of cancer population dynamics. Unfortunately, the shortage of adequate longitudinal data on tumor dynamics prevents to estimate the parameters of more sophisticate and biologically realistic models, which may take into account, for instance, the existence of various competing cancer subpopulations, or the complex interplay occurring within the tumor microenvironment. However, we claim that our theoretical approach is completely general and it will hold whether and when higher-resolution longitudinal data on disease progression would become available, allowing for instance to measure the (sub)clonal prevalence variation in time (as proposed, e.g., by Acar et al., 2019).

Several developments of CT4TD are underway. In particular, the possibility of tuning the PK/PD models to include information on the somatic evolutionary history of the tumors (Ramazzotti et al., 2015; Caravagna et al., 2016, 2018) will be essential in delivering more effective personalized therapeutic strategies. This is especially important for tumors displaying high levels of intra-tumor heterogeneity, which is known to be responsible for drug resistance, therapy failure and relapse (McGranahan and Swanton, 2015).

As CT4TD relies on the RedCRAB optimization framework (Heck et al., 2018b; Omran et al., 2019), the overall procedure could be implemented in remote, paving the way for a wireless decision support system for therapy design, to be used directly by clinicians (Jeong et al., 2015). In this respect, as a future development, an open-source computational tool will be made available to the scientific community, allowing to perform individual-specific analysis for a wide range of disease.

## Data Availability Statement

All data used in this paper are available from the supplementary material of Michor et al. (2005). We provide the source code and the input data to reproduce the case studies at: https://github.com/BIMIB-DISCo/CT4TD.

## Author Contributions

FA, AG, SM, and MA designed the research. FA and MR implemented the method. FA, AG, and SM performed the research. FA, AG, DM, TC, and RP analyzed the data. All authors wrote and revised the paper.

## Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

This work was partially supported by the Elixir Italian Chapter and the SysBioNet project, a Ministero dell'Istruzione, dell'Universit á e della Ricerca initiative for the Italian Roadmap of European Strategy Forum on Research Infrastructures. Furthermore we acknowledge financial support from IQST alliance. Partial support was also given by the CRUK/AIRC Accelerator Award #22790, Single-cell Cancer Evolution in the Clinic. We thank Giulio Caravagna for helpful discussions. This manuscript has been released as a pre-print at bioRxiv (Angaroni et al., 2019).

## Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2020.00523/full#supplementary-material

## Footnotes

1. ^It is known that other factors can change the value of the clearance and the volume of the distribution. For instance, the concentration of the α_{1}-acid glycoprotein affects the volume of the distribution, whereas the MDR1 genotype, the CYP3A4 activity and the creatinine clearance affect both *CL* and *v* (Widmer et al., 2006). However, we here limit to consider the aforementioned demographic measurements, because of their availability in our and in most datasets.

2. ^Note that it was recently hypothesized that CML CSC could be resistant to the effects of Imatinib and persist in all patients on long-term therapy. (Holyoake and Vetrie, 2017). However, only further experimental studies could unravel this point.

3. ^It would be possible to use our theoretical framework to define a free-time schedule optimization procedure. Yet, we believe that current practices in Imatinib oral administration would make a free-time schedule scarcely usable.

4. ^Notice that any arbitrary ODE mathematical model could be employed, as long it is effective in representing the phenomenological properties of the disease (e.g., multi-stable states), and that sufficient and adequate data are available to estimate its parameters.

## References

Acar, A., Nichol, D., Fernandez, J., Cresswell, G. D., Barozzi, I., Hong, S. P., et al. (2019). Exploiting evolutionary herding to control drug resistance in cancer. *BioRxiv* 566950. doi: 10.1101/566950

Altrock, P. M., Liu, L. L., and Michor, F. (2015). The mathematics of cancer: integrating quantitative models. *Nat. Rev. Cancer* 15:730. doi: 10.1038/nrc4029

Angaroni, F., Graudenzi, A., Rossignolo, M., Maspero, D., Calarco, T., Piazza, R., et al. (2019). Personalized therapy design for liquid tumors via optimal control theory. *bioRxiv* 662858. doi: 10.1101/662858

Aström, K. J., and Murray, R. M. (2010). *Feedback Systems: An Introduction for Scientists and Engineers*. Princeton: Princeton University Press.

Babaei, N., and Salamci, M. U. (2015). Personalized drug administration for cancer treatment using model reference adaptive control. *J. Theor. Biol*. 371, 24–44. doi: 10.1016/j.jtbi.2015.01.038

Baccarani, M., Druker, B. J., Branford, S., Kim, D., Pane, F., Mongay, L., et al. (2014). Long-term response to imatinib is not affected by the initial dose in patients with Philadelphia chromosome-positive chronic myeloid leukemia in chronic phase: final update from the tyrosine Kinase inhibitor optimization and selectivity (tops) study. *Int. J. Hematol*. 99, 616–624. doi: 10.1007/s12185-014-1566-2

Bailey, J. M., and Haddad, W. M. (2005). Drug dosing control in clinical pharmacology. *IEEE Control Syst*. 25, 35–51. doi: 10.1109/MCS.2005.1411383

Bara, O., Djouadi, S., Day, J., and Lenhart, S. (2017). Immune therapeutic strategies using optimal controls with L1 and L2 type objectives. *Math. Biosci*. 290, 9–21. doi: 10.1016/j.mbs.2017.05.010

Barbolosi, D., and Iliadis, A. (2001). Optimizing drug regimens in cancer chemotherapy: a simulation study using a PK/PD model. *Comput. Biol. Med*. 31, 157–172. doi: 10.1016/S0010-4825(00)00032-9

Bertsekas, D. P. (1995). *Dynamic Programming and Optimal Control, Vol. 1*. Belmont, MA: Athena Scientific.

Branford, S., Yeung, D. T., Ross, D. M., Prime, J. A., Field, C. R., Altamura, H. K., et al. (2013). Early molecular response and female sex strongly predict stable undetectable BCR-ABL1, the criteria for imatinib discontinuation in patients with CML. *Blood* 121, 3818–3824. doi: 10.1182/blood-2012-10-462291

Brif, C., Chakrabarti, R., and Rabitz, H. (2010). Control of quantum phenomena: past, present and future. *New J. Phys*. 12:075008. doi: 10.1088/1367-2630/12/7/075008

Caneva, T., Calarco, T., Fazio, R., Santoro, G. E., and Montangero, S. (2011). Speeding up critical system dynamics through optimized evolution. *Phys. Rev. A* 84:012312. doi: 10.1103/PhysRevA.84.012312

Caneva, T., Murphy, M., Calarco, T., Fazio, R., Montangero, S., Giovannetti, V., et al. (2009). Optimal control at the quantum speed limit. *Phys. Rev. Lett*. 103:240501. doi: 10.1103/PhysRevLett.103.240501

Caravagna, G., Giarratano, Y., Ramazzotti, D., Tomlinson, I., Graham, T. A., Sanguinetti, G., et al. (2018). Detecting repeated cancer evolution from multi-region tumor sequencing data. *Nat. Methods* 15:707. doi: 10.1038/s41592-018-0108-x

Caravagna, G., Graudenzi, A., Ramazzotti, D., Sanz-Pamplona, R., De Sano, L., Mauri, G., et al. (2016). Algorithmic methods to infer the evolutionary trajectories in cancer progression. *Proc. Natl. Acad. Sci. U.S.A*. 113, E4025-E4034. doi: 10.1073/pnas.1520213113

Chan, P. L., Jacqmin, P., Lavielle, M., McFadyen, L., and Weatherley, B. (2011). The use of the saem algorithm in monolix software for estimation of population pharmacokinetic-pharmacodynamic-viral dynamics parameters of maraviroc in asymptomatic HIV subjects. *J. Pharmacokinet. Pharmacodyn*. 38, 41–61. doi: 10.1007/s10928-010-9175-z

Cole, A. L., and Dusetzina, S. B. (2018). Generic price competition for specialty drugs: too little, too late? *Health Affairs* 37, 738–742. doi: 10.1377/hlthaff.2017.1684

Cunningham, J. J., Brown, J. S., Gatenby, R. A., and Staňková, K. (2018). Optimal control to develop therapeutic strategies for metastatic castrate resistant prostate cancer. *J. Theor. Biol*. 459, 67–78. doi: 10.1016/j.jtbi.2018.09.022

Deffner, S., and Campbell, S. (2017). Quantum speed limits: from Heisenberg's uncertainty principle to optimal quantum control. *J. Phys. A Math. Theor*. 50:453001. doi: 10.1088/1751-8121/aa86c6

Doria, P., Calarco, T., and Montangero, S. (2011). Optimal control technique for many-body quantum dynamics. *Phys. Rev. Lett*. 106:190501. doi: 10.1103/PhysRevLett.106.190501

Experts in Chronic Myeloid Leukemia (2013). The price of drugs for chronic myeloid leukemia (CML) is a reflection of the unsustainable prices of cancer drugs: from the perspective of a large group of CML experts. *Blood* 121, 4439–4442. doi: 10.1182/blood-2013-03-490003

Faber, E., Divoká, M., Skoumalová, I., Novák, M., Marešová, I., Mičová, K., et al. (2016). A lower dosage of imatinib is sufficient to maintain undetectable disease in patients with chronic myeloid leukemia with long-term low-grade toxicity of the treatment. *Leukemia Lymphoma* 57, 370–375. doi: 10.3109/10428194.2015.1056184

Fojo, T., and Grady, C. (2009). How much is life worth: cetuximab, non-small cell lung cancer, and the 440 billion question. *J. Natl. Cancer Instit*. 101, 1044–1048. doi: 10.1093/jnci/djp177

Fuentes-Garí, M., Velliou, E., Misener, R., Pefani, E., Rende, M., Panoskaltsis, N., et al. (2015). A systematic framework for the design, simulation and optimization of personalized healthcare: making and healing blood. *Comput. Chem. Eng*. 81, 80–93. doi: 10.1016/j.compchemeng.2015.03.008

Gambacorti-Passerini, C., Le Coutre, P., Mologni, L., Fanelli, M., Bertazzoli, C., Marchesi, E., et al. (1997). Inhibition of the ABL kinase activity blocks the proliferation of BCR/ABL+ leukemic cells and induces apoptosis. *Blood Cells Mol. Dis*. 23, 380–394. doi: 10.1006/bcmd.1997.0155

Gambacorti-Passerini, C. B., Gunby, R. H., Piazza, R., Galietta, A., Rostagno, R., and Scapozza, L. (2003). Molecular mechanisms of resistance to imatinib in Philadelphia-chromosome-positive leukaemias. *Lancet Oncol*. 4, 75–85. doi: 10.1016/S1470-2045(03)00979-3

Goerz, M. H., Motzoi, F., Whaley, K. B., and Koch, C. P. (2017). Charting the circuit QED design landscape using optimal control theory. *NPJ Quantum Inf*. 3:37. doi: 10.1038/s41534-017-0036-0

Gomez-de León, A., Gómez-Almaguer, D., Ruiz-Delgado, G. J., and Ruiz-Arguelles, G. J. (2017). Insights into the management of chronic myeloid leukemia in resource-poor settings: a Mexican perspective. *Expert Rev. Hematol*. 10, 809–819. doi: 10.1080/17474086.2017.1360180

Goutelle, S., Maurin, M., Rougier, F., Barbaut, X., Bourguignon, L., Ducher, M., et al. (2008). The hill equation: a review of its capabilities in pharmacological modelling. *Fund. Clin. Pharmacol*. 22, 633–648. doi: 10.1111/j.1472-8206.2008.00633.x

Graham, S. M., Jørgensen, H. G., Allan, E., Pearson, C., Alcorn, M. J., Richmond, L., et al. (2002). Primitive, quiescent, Philadelphia-positive stem cells from patients with chronic myeloid leukemia are insensitive to STI571 *in vitro*. *Blood* 99, 319–325. doi: 10.1182/blood.V99.1.319

Graudenzi, A., Caravagna, G., De Matteis, G., and Antoniotti, M. (2014). Investigating the relation between stochastic differentiation, homeostasis and clonal expansion in intestinal crypts via multiscale modeling. *PLoS ONE* 9:e97272. doi: 10.1371/journal.pone.0097272

Graudenzi, A., Maspero, D., and Damiani, C. (2018). “Modeling spatio-temporal dynamics of metabolic networks with cellular automata and constraint-based methods,” in *International Conference on Cellular Automata* (Cham: Springer), 16–29. doi: 10.1007/978-3-319-99813-8_2

Griffiths, M., Patton, S. J., Grossi, A., Clark, J., Paz, M. F., and Labourier, E. (2014). Conversion, correction, and international scale standardization: results from a multicenter external quality assessment study for bcr-abl1 testing. *Arch. Pathol. Lab. Med*. 139, 522–529. doi: 10.5858/arpa.2013-0754-OA

Haddad, W. M., Hayakawa, T., and Bailey, J. M. (2006). Adaptive control for nonlinear compartmental dynamical systems with applications to clinical pharmacology. *Syst. Control Lett*. 55, 62–70. doi: 10.1016/j.sysconle.2005.05.002

Heck, R., Vuculescu, O., Jakob Srensen, J., Zoller, J., Andreasen, M. G., Bason, M. G., et al. (2018a). Remote optimization of an ultracold atoms experiment by experts and citizen scientists. *Proc. Natl. Acad. Sci. U.S.A*. 115, E11231-E11237.

Heck, R., Vuculescu, O., Sørensen, J. J., Zoller, J., Andreasen, M. G., Bason, M. G., et al. (2018b). Remote optimization of an ultracold atoms experiment by experts and citizen scientists. *Proc. Natl. Acad. Sci. U.S.A*. 115, E11231-E11237. doi: 10.1073/pnas.1716869115

Himmelstein, D. U., Thorne, D., Warren, E., and Woolhandler, S. (2009). Medical bankruptcy in the United States, 2007: results of a national study. *Am. J. Med*. 122, 741–746. doi: 10.1016/j.amjmed.2009.04.012

Hoeb, F., Angaroni, F., Zoller, J., Calarco, T., Strini, G., Montangero, S., et al. (2017). Amplification of the parametric dynamical casimir effect via optimal control. *Phys. Rev. A* 96:033851. doi: 10.1103/PhysRevA.96.033851

Holyoake, T. L., and Vetrie, D. (2017). The chronic myeloid leukemia stem cell: stemming the tide of persistence. *Blood* 129, 1595–1606. doi: 10.1182/blood-2016-09-696013

Hong, H., Ovchinnikov, A., Pogudin, G., and Yap, C. (2019). Sian: software for structural identifiability analysis of ode models. *Bioinformatics* 35, 2873–2874. doi: 10.1093/bioinformatics/bty1069

Hu, W., Lu, S., McAlpine, I., Jamieson, J. D., Lee, D. U. D., Marroquin, L., et al. (2012). Mechanistic investigation of imatinib-induced cardiac toxicity and the involvement of C-ABL kinase. *Toxicol. Sci*. 129, 188–199. doi: 10.1093/toxsci/kfs192

Jabbour, E. J., Lin, J., Siegartel, L. R., Lingohr-Smith, M., Menges, B., and Makenbaeva, D. (2017). Evaluation of healthcare resource utilization and incremental economic burden of patients with chronic myeloid leukemia after disease progression to blast phase. *J. Med. Econ*. 20, 1007–1012. doi: 10.1080/13696998.2017.1345750

Jayachandran, D., Rundell, A. E., Hannemann, R. E., Vik, T. A., and Ramkrishna, D. (2014). Optimal chemotherapy for leukemia: a model-based strategy for individualized treatment. *PLoS ONE* 9:e109623. doi: 10.1371/journal.pone.0109623

Jeong, J.-W., McCall, J. G., Shin, G., Zhang, Y., Al-Hasani, R., Kim, M., et al. (2015). Wireless optofluidic systems for programmable in vivo pharmacology and optogenetics. *Cell* 162, 662–674. doi: 10.1016/j.cell.2015.06.058

Kallush, S., Khasin, M., and Kosloff, R. (2014). Quantum control with noisy fields: computational complexity versus sensitivity to noise. *New J. Phys*. 16:015008. doi: 10.1088/1367-2630/16/1/015008

Khaneja, N., Reiss, T., Kehlet, C., Schulte-Herbrggen, T., and Glaser, S. J. (2005). Optimal control of coupled spin dynamics: Design of NMR pulse sequences by gradient ascent algorithms. *J. Magn. Reson*. 172, 296–305. doi: 10.1016/j.jmr.2004.11.004

Koch, C. P. (2016). Controlling open quantum systems: tools, achievements, and limitations. *J. Phys. Condens. Matter* 28:213001. doi: 10.1088/0953-8984/28/21/213001

Landersdorfer, C. B., and Jusko, W. J. (2008). Pharmacokinetic/pharmacodynamic modelling in diabetes mellitus. *Clin. Pharmacokinet*. 47, 417–448. doi: 10.2165/00003088-200847070-00001

Larson, R. A., Druker, B. J., Guilhot, F., O'Brien, S. G., Riviere, G. J., Krahnke, T., et al. (2008a). Imatinib pharmacokinetics and its correlation with response and safety in chronic-phase chronic myeloid leukemia: a subanalysis of the iris study. *Blood* 111, 4022–4028.

Larson, R. A., Druker, B. J., Guilhot, F., O'Brien, S. G., Riviere, G. J., Krahnke, T., et al. (2008b). Imatinib pharmacokinetics and its correlation with response and safety in chronic-phase chronic myeloid leukemia: a subanalysis of the iris study. *Blood* 111, 4022–4028. doi: 10.1182/blood-2007-10-116475

Ledzewicz, U., and Schättler, H. M. (2006). Drug resistance in cancer chemotherapy as an optimal control problem. *Discrete Cont. Dyn. Syst. Ser. B* 6:129. doi: 10.3934/dcdsb.2006.6.129

Lenhart, S., and Workman, J. T. (2007). *Optimal Control Applied to Biological Models*. London: CRC Press. doi: 10.1201/9781420011418

Lloyd, S., and Montangero, S. (2014). Information theoretical analysis of quantum optimal control. *Phys. Rev. Lett*. 113:010502. doi: 10.1103/PhysRevLett.113.010502

Marciniak-Czochra, A., and Stiehl, T. (2013). “Mathematical models of hematopoietic reconstitution after stem cell transplantation,” in *Model Based Parameter Estimation*, eds H. Bock, T. Carraro, W. Jäger, S. Körkel, R. Rannacher, and J. Schlöder (Berlin; Heidelberg: Springer), 191–206. doi: 10.1007/978-3-642-30367-8_9

Marciniak-Czochra, A., Stiehl, T., Ho, A. D., Jäger, W., and Wagner, W. (2009). Modeling of asymmetric cell division in hematopoietic stem cells-regulation of self-renewal is essential for efficient repopulation. *Stem Cells Dev*. 18, 377–386. doi: 10.1089/scd.2008.0143

McDowell, A. M., Fryar, C., Hirsch, R., and Ogden, C. (2005). Anthropometric reference data for children and adults: U.S. population, 1999-2002. *Adv. Data* 361, 1–5. Available online at: https://stacks.cdc.gov/view/cdc/5630

McGranahan, N., and Swanton, C. (2015). Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. *Cancer Cell* 27, 15–26. doi: 10.1016/j.ccell.2014.12.001

Michor, F., Hughes, T. P., Iwasa, Y., Branford, S., Shah, N. P., Sawyers, C. L., et al. (2005). Dynamics of chronic myeloid leukaemia. *Nature* 435:1267. doi: 10.1038/nature03669

Montangero, S., Calarco, T., and Fazio, R. (2007). Robust optimal quantum gates for josephson charge qubits. *Phys. Rev. Lett*. 99:170501. doi: 10.1103/PhysRevLett.99.170501

Mughal, T. I., and Schrieber, A. (2010). Principal long-term adverse effects of imatinib in patients with chronic myeloid leukemia in chronic phase. *Biologics* 4:315. doi: 10.2147/BTT.S5775

Naşcu, I., Krieger, A., Ionescu, C. M., and Pistikopoulos, E. N. (2015). Advanced model-based control studies for the induction and maintenance of intravenous anaesthesia. *IEEE Trans. Biomed. Eng*. 62, 832–841. doi: 10.1109/TBME.2014.2365726

Nelder, J. A., and Mead, R. (1965). A simplex method for function minimization. *Comput. J*. 7, 308–313. doi: 10.1093/comjnl/7.4.308

Olshen, A., Tang, M., Cortes, J., Gonen, M., Hughes, T., Branford, S., et al. (2014). Dynamics of chronic myeloid leukemia response to dasatinib, nilotinib, and high-dose imatinib. *Haematologica*. 99, 1701–1709. doi: 10.3324/haematol.2013.085977

Omran, A., Levine, H., Keesling, A., Semeghini, G., Wang, T. T., Ebadi, S., et al. (2019). Generation and manipulation of Schrödinger cat states in Rydberg atom arrays. *Science* 365, 570–574. doi: 10.1126/science.aax9743

Pefani, E., Panoskaltsis, N., Mantalaris, A., Georgiadis, M. C., and Pistikopoulos, E. N. (2013). Design of optimal patient-specific chemotherapy protocols for the treatment of acute myeloid leukemia (AML). *Comput. Chem. Eng.* 57, 187–195. doi: 10.1016/j.compchemeng.2013.02.003

Peng, B., Lloyd, P., and Schran, H. (2005). Clinical pharmacokinetics of imatinib. *Clin. Pharmacokinet*. 44, 879–894. doi: 10.2165/00003088-200544090-00001

Picard, S., Titier, K., Etienne, G., Teilhet, E., Ducint, D., Bernard, M. A., et al. (2007). Trough imatinib plasma levels are associated with both cytogenetic and molecular responses to standard-dose imatinib in chronic myeloid leukemia. *Blood* 109, 3496–3499. doi: 10.1182/blood-2006-07-036012

Pichler, T., Caneva, T., Montangero, S., Lukin, M. D., and Calarco, T. (2016). Noise-resistant optimal spin squeezing via quantum control. *Phys. Rev. A* 93:013851. doi: 10.1103/PhysRevA.93.013851

Potts, A. L., Warman, G. R., and Anderson, B. J. (2008). Dexmedetomidine disposition in children: a population analysis. *Pediatr. Anesth*. 18, 722–730. doi: 10.1111/j.1460-9592.2008.02653.x

Rach, N., Müller, M. M., Calarco, T., and Montangero, S. (2015). Dressing the chopped-random-basis optimization: a bandwidth-limited access to the trap-free landscape. *Phys. Rev. A* 92:062343. doi: 10.1103/PhysRevA.92.062343

Rainero, A., Angaroni, F., Conti, A., Pirrone, C., Micheloni, G., Tarará, L., et al. (2018). gDNA qPCR is statistically more reliable than mRNA analysis in detecting leukemic cells to monitor cml. *Cell Death Dis*. 9:349. doi: 10.1038/s41419-018-0387-2

Ramazzotti, D., Caravagna, G., Olde, L. L., Graudenzi, A., Korsunsky, I., Mauri, G., et al. (2015). Capri: efficient inference of cancer progression models from cross-sectional data. *Bioinformatics* 31, 3016–3026. doi: 10.1093/bioinformatics/btv296

Rocha, D., Silva, C. J., and Torres, D. F. M. (2018). Stability and optimal control of a delayed HIV model. *Math. Methods Appl. Sci*. 41, 2251–2260. doi: 10.1002/mma.4207

Rowland, M., Tozer, T. N., Derendorf, H., and Hochhaus, G. (2011). *Clinical Pharmacokinetics and Pharmacodynamics: Concepts and Applications*. Philadelphia, PA: Wolters Kluwer Health; Lippincott William & Wilkins.

Saccomani, M. P., Audoly, S., Bellu, G., and DAngió, L. (2010). Examples of testing global identifiability of biological and biomedical models with the daisy software. *Comput. Biol. Med*. 40, 402–407. doi: 10.1016/j.compbiomed.2010.02.004

Salgado, R., Moore, H., Martens, J. W., Lively, T., Malik, S., McDermott, U., et al. (2018). Steps forward for cancer precision medicine. *Nat. Rev. Drug Discov*. 17, 1–2. doi: 10.1038/nrd.2017.218

Schwilden, H. (1981). A general method for calculating the dosage scheme in linear pharmacokinetics. *Eur. J. Clin. Pharmacol*. 20, 379–386. doi: 10.1007/BF00615409

Shah, N. P., Tran, C., Lee, F. Y., Chen, P., Norris, D., and S, C. L. (2004). Overriding imatinib resistance with a novel abl kinase inhibitor. *Science* 305, 399–401. doi: 10.1126/science.1099480

Shargel, L., Andrew, B., and Wu-Pong, S. (1999). *Applied Biopharmaceutics and Pharmacokinetics*. Stamford: Appleton & Lange Stamford.

Shi, J., Alagoz, O., Erenay, F. S., and Su, Q. (2014). A survey of optimization models on cancer chemotherapy treatment planning. *Ann. Oper. Res*. 221, 331–356. doi: 10.1007/s10479-011-0869-4

Sørensen, J. J. W. H., Pedersen, M. K., Munch, M., Haikka, P., Jensen, J. H., Planke, T., et al. (2016). Exploring the quantum speed limit with computer games. *Nature* 532, 210–213. doi: 10.1038/nature17620

Spörl, A., Schulte-Herbrüggen, T., Glaser, S. J., Bergholm, V., Storcz, M. J., Ferber, J., et al. (2007). Optimal control of coupled Josephson qubits. *Phys. Rev. A* 75:012302. doi: 10.1103/PhysRevA.75.012302

Steil, G. M. (2013). Algorithms for a closed-loop artificial pancreas: the case for proportional-integral-derivative control. *J. Diab. Sci. Technol*. 7, 1621–1631. doi: 10.1177/193229681300700623

Stiehl, T., Ho, A. D., and Marciniak-Czochra, A. (2018). Mathematical modeling of the impact of cytokine response of acute myeloid leukemia cells on patient prognosis. *Sci. Rep*. 8:2809. doi: 10.1038/s41598-018-21115-4

Stiehl, T., and Marciniak-Czochra, A. (2012). Mathematical modeling of leukemogenesis and cancer stem cell dynamics. *Math. Model. Nat. Phenomena* 7, 166–202. doi: 10.1051/mmnp/20127199

Takahashi, N., Wakita, H., Miura, M., Scott, S., Nishii, K., Masuko, M., et al. (2010). Correlation between imatinib pharmacokinetics and clinical response in Japanese patients with chronic-phase chronic myeloid leukemia. *Clin. Pharmacol. Ther*. 88, 809–813. doi: 10.1038/clpt.2010.186

Tang, M., Gonen, M., Quintas-Cardama, A., Cortes, J., Kantarjian, H., Field, C., et al. (2011). Dynamics of chronic myeloid leukemia response to long-term targeted therapy reveal treatment effects on leukemic stem cells. *Blood* 118, 1622–1631. doi: 10.1182/blood-2011-02-339267

van Frank, S., Bonneau, M., Schmiedmayer, J., Hild, S., Gross, C., Cheneau, M., et al. (2016). Optimal control of complex atomic quantum systems. *Sci. Rep*. 6:34187. doi: 10.1038/srep34187

von Mehren, M., and Widmer, N. (2011). Correlations between imatinib pharmacokinetics, pharmacodynamics, adherence, and clinical response in advanced metastatic gastrointestinal stromal tumor (GIST): an emerging role for drug blood level testing? *Cancer Treat. Rev*. 37, 291–299. doi: 10.1016/j.ctrv.2010.10.001

Weigel, M. T., Dahmke, L., Schem, C., Bauerschlag, D. O., Weber, K., Niehoff, P., et al. (2010). In vitro effects of imatinib mesylate on radiosensitivity and chemosensitivity of breast cancer cells. *BMC Cancer* 10:412. doi: 10.1186/1471-2407-10-412

Welling, P. G. (1997). *Pharmacokinetics: Processes, Mathematics, and Applications*. Washington, DC: American Chemical Society (ACS).

Werner, B., Scott, J. G., Sottoriva, A., Anderson, A. R., Traulsen, A., and Altrock, P. M. (2016). The cancer stem cell fraction in hierarchically organized tumors can be estimated using mathematical modeling and patient-specific treatment trajectories. *Cancer Res*. 76, 1705–1713. doi: 10.1158/0008-5472.CAN-15-2069

West, J., You, L., Brown, J., Newton, P. K., and Anderson, A. R. (2018). Towards multi-drug adaptive therapy. *bioRxiv*. 476507. doi: 10.1101/476507

Widmer, N., Decosterd, L., Csajka, C., Leyvraz, S., Duchosal, M., Rosselet, A., et al. (2006). Population pharmacokinetics of imatinib and the role of α1-acid glycoprotein. *Br. J. Clin. Pharmacol*. 62, 97–112. doi: 10.1111/j.1365-2125.2006.02719.x

Wodarz, D., Garg, N., Komarova, N. L., Benjamini, O., Keating, M. J., Wierda, W. G., et al. (2014). Kinetics of CLL cells in tissues and blood during therapy with the BTK inhibitor ibrutinib. *Blood* 123, 4132–4135. doi: 10.1182/blood-2014-02-554220

Yoon, N., Vander, V. R., Marusyk, A., and Scott, J. G. (2018). Optimal therapy scheduling based on a pair of collaterally sensitive drugs. *Bull. Math. Biol*. 80, 1776–1809. doi: 10.1007/s11538-018-0434-2

Yoshitsuga, H., Imai, Y., Seriu, T., and Hiraoka, M. (2012). Markov chain Monte Carlo bayesian analysis for population pharmacokinetics of dasatinib in Japanese adult subjects with chronic myeloid leukemia and Philadelphia chromosome positive acute lymphoblastic leukemia. *J. Clin. Pharmacol. Therap*. 43, 29–41. doi: 10.3999/jscpt.43.29

Keywords: personalized therapy, optimal control theory, pharmacodynamics, pharmacokinetics, RedCRAB, chronic myeloid leukemia

Citation: Angaroni F, Graudenzi A, Rossignolo M, Maspero D, Calarco T, Piazza R, Montangero S and Antoniotti M (2020) An Optimal Control Framework for the Automated Design of Personalized Cancer Treatments. *Front. Bioeng. Biotechnol.* 8:523. doi: 10.3389/fbioe.2020.00523

Received: 06 February 2020; Accepted: 01 May 2020;

Published: 28 May 2020.

Edited by:

Paola Lecca, Free University of Bozen-Bolzano, ItalyReviewed by:

Bruno Carpentieri, Free University of Bozen-Bolzano, ItalyFabio Bagagiolo, University of Trento, Italy

Copyright © 2020 Angaroni, Graudenzi, Rossignolo, Maspero, Calarco, Piazza, Montangero and Antoniotti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Alex Graudenzi, alex.graudenzi@unimib.it

^{†}These authors have contributed equally to this work

^{‡}These authors share senior authorship