^{1}

^{*}

^{†}

^{1}

^{†}

^{2}

^{2}

^{1}

^{1}

^{2}

Edited by: Julio R. Banga, Spanish National Research Council, Spain

Reviewed by: Alexey Goltsov, Abertay University, United Kingdom; Francesco Passamonti, University of Insubria, Italy; Matthew Joseph Simpson, Queensland University of Technology, Australia

This article was submitted to Systems Biology, a section of the journal Frontiers in Physiology

†These authors have contributed equally to this work

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Polycythemia vera (PV) is a slow-growing type of blood cancer, where the production of red blood cells (RBCs) increase considerably. The principal treatment for targeting the symptoms of PV is bloodletting (phlebotomy) at regular intervals based on data derived from blood counts and physician assessments based on experience. Model-based decision support can help to identify optimal and individualized phlebotomy schedules to improve the treatment success and reduce the number of phlebotomies and thus negative side effects of the therapy. We present an extension of a simple compartment model of the production of RBCs in adults to capture patients suffering from PV. We analyze the model's properties to show the plausibility of its assumptions. We complement this with numerical results using exemplary PV patient data. The model is then used to simulate the dynamics of the disease and to compute optimal treatment plans. We discuss heuristics and solution approaches for different settings, which include constraints arising in real-world applications, where the scheduling of phlebotomies depends on appointments between patients and treating physicians. We expect that this research can support personalized clinical decisions in cases of PV.

The disease polycythemia vera (PV) belongs to chronic myeloproliferative neoplasms, meaning that an excess of blood cells are produced. In particular, red blood cells (RBCs) are affected (Lichtman et al.,

In those cases, therapy schedules based on blood image data are proposed by physicians. However, those schedules might not be optimal for each individual (Finazzi and Barbui,

Using our results, it might be possible to enable physicians to schedule therapies individually based on a set of parameters unique to each patient. Thus, on the one hand, the probability of severe complications will decrease, when the time until the next measurement is assumed to be too long. On the other hand, in cases where the frequency of two consecutive measurements is assumed to be too low, the patient will benefit from not needing to go to a hematologist and the patient will be spared additional blood withdrawals.

To our knowledge there is neither a published mathematical model of erythropoiesis, that considers the disease PV, nor a study discussing optimal treatment schedules for PV patients by phlebotomy.

The paper is organized as follows: first, in chapter 2, we present the materials and methods used for this research. In chapter 3 we display the results of the modeling and the optimization approaches. Finally, we summarize and discuss our findings in chapter 4. Given the interdisciplinary nature of this research project, literature surveys are included in the corresponding subsections of this paper.

In this section, we present the concepts and methods for modeling PV and for computing optimal treatment schedules. First, biological properties necessary for the modeling process are summarized. Then, a published compartment model for erythropoiesis in healthy subjects is reviewed. Afterwards, the acquisition of data from real and artificial patients is presented. Finally, computational methods for verifying the proposed model and for generating treatment schedules are discussed.

Understanding the relevant biological processes is crucial for the following modeling process. To this end, basic information about the physiological processes of erythropoiesis and of PV are summarized in this section.

The supply of oxygen from the lungs to tissues and the transport of carbon dioxide back from tissues is central for the maintenance of vital functions in the human body. This exchange of substances is realized by erythrocytes (i.e., RBCs), which are biconcave discoid cells in the blood stream containing the protein complex hemoglobin. This protein complex binds the substances and enables the RBCs to their part. At any given time, a healthy adult human has a total of 2–3·10^{13} erythrocytes, with men and women having about 5–6 million and 4–5 million erythrocytes per microliter of blood, respectively.

Erythropoiesis is the process by which RBCs are produced in the bone marrow. Beginning with stem cells, multi-potent stem cells are matured through several levels of erythroid progenitor cells, i.e., the Blast Forming Unit-Erythroid (BFU-E) and Colony Forming Unit-Eryhroid (CFU-E), and several levels of erythroblasts to bone marrow reticulocytes. These are then released into the blood circulation as blood reticulocytes, which then quickly grow into mature erythrocytes. During this process, which takes ~20 days, the cell undergoes major changes including the removal of nuclei, organelles, and mitochondria to provide more room for hemoglobin. This process is displayed in

Simplified schematic view of erythropoiesis. Certain cell stages over the age of the cell in days are displayed with a corresponding cell partition based on the model by Tetschke et al. (

The hormone erythropoietin (EPO) is mainly responsible for the response of the body to changes in the amount of RBCs. It acts like a negative feedback mechanism for erythropoiesis. The EPO concentration in the blood circulation is inversely related to the concentration of hemoglobin. High EPO concentrations result in an increase to the RBC proliferation rate in the bone marrow. Several precursor cell types are affected, especially CFU-E production. This short summary can be complemented by a more detailed overview of erythropoiesis in Lichtman et al. (

Polycythemia vera, also called primary polycythemia, is a chronic myeloproliferative neoplasm. That is, the production of blood cells increase to pathological levels. Most prominently, erythrocytes (i.e., RBCs) are affected. This causes the main symptoms of the patients: if the ratio of erythrocytes to the total blood volume—which, in medical terms, is called the hematocrit (

If untreated, the mean life expectancy of patients suffering from PV is only ~18 months (Marchioli,

Other symptoms of the disease are not fatal, but can strongly reduce the quality of life of the patient. Most prominently, aquagenic pruritus, a severe itching that patients experience from contact with water, is observed in up to 70% of cases (Siegel et al.,

In low-risk cases, the basic therapy for PV is blood-letting (phlebotomy): ~500 ml of blood on a regular basis (Tefferi et al.,

The most important clinical parameter for the planning of the treatment ist

The regulation of erythropoiesis no longer works in patients suffering from PV. The underlying process has yet to be fully understood, although there are plausible assumptions about it. In the investigation by Eaves and Eaves (

A mathematical model of erythropoiesis in healthy adults was developed in Tetschke et al. (

Basically, the model consists of three ordinary differential equations, that characterize the maturation and differentiation of a stem cell into an RBC until its death. Instead of incorporating EPO directly, the model uses an indirect approach with the help of the feedback function _{3} results in an increased proliferation in _{1}.

The three compartment model for erythropoiesis by Tetschke et al. (

with the following model components:

The compartments _{1} [1] and _{2} [1] reflect certain precursor cells in the bone marrow, that are committed to the erythrocyte lineage. _{1} includes CFU-E and early erythroblasts, which are highly affected by EPO in the blood circulation. _{2} denotes late erythroblasts and reticulocytes, which are unaffected or only slightly affected by EPO.

The compartment _{3} [

_{0} [^{−1}] denotes a constant inflow from the stem cell compartment into the erythroid lineage.

β [1] is a factor for EPO-independent proliferation. This is assumed to be unique to the patient.

γ [^{−1}] is a factor for EPO-dependent proliferation of early precursor cells. This is also assumed to be unique to the patient.

_{1} [^{−1}], _{2} [^{−1}] and α [(^{−1}] are the transition and apoptosis rates given by the literature (Tetschke et al.,

In the case of healthy erythropoiesis, the existence of an average normal erythrocyte level can be assumed, when environmental conditions do not change drastically. The average value is denoted by

_{3}, meaning, that the function decreases with increasing values of _{3} and vice versa. This indirectly incorporates the EPO dependency of the first compartment. By only using this function as a feedback, it was implicitly assumed that this is the only proliferation amplification factor from blood loss. This assumption is reasonable, provided that the blood loss is not too high, as, for example, in the case of severe where anemia emergency reactions like the release of stress reticulocytes (Lichtman et al.,

Blood removal of at most _{max} _{3}:

Here, _{pat} is the subject's total blood volume in

given that _{1}, _{2} and _{3} are positive and _{0}: = α·

The model was verified using data from Pottgiesser et al. (

The clinical parameter

Indeed, our models need to take into account the absolute amount of erythrocytes in the body. As blood counts only provide information relative to the withdrawn amount, the total blood volume is needed for this computation. As described by Ertl et al. (

In cooperation with the Department of Hematology and Oncology at the University Hospital Magdeburg, Germany, we retrospectively collected data from patients suffering from PV. The institutional ethics committee at the University of Magdeburg endorsed the study procedures. Each subject gave written informed consent before participation in this study. Unfortunately, the data were gathered according to routine clinical practice, meaning the quality of the data for use in an optimization study was poor: when treating patients, physicians aim to see patients only when necessary. Thus, the density of the data was quite low. Moreover, only standard blood counts are regularly conducted. Such data suffers the effect of plasma volume deviations and corresponding measurement errors, as described above. Another problem arises with treatment. Phlebotomy is the method of choice, as long as the disease is not too severe. In severe cases, additional therapies with drugs are adopted. For specific medication, a model of the pharmacokinetics and pharmacodynamics of the drug would be helpful. This is beyond the scope of this study, however.

Ultimately, we were able to identify three patient data sets with a reasonable data density and quantity. In

Details about three clinical patients used in 2.3.1.

01 | Male | 45 | 6 | Phlebotomy, chemotherapy since 5 years |

02 | Female | 45 | 9 | Phlebotomy only |

03 | Female | 79 | 15 | Phlebotomy, chemotherapy since 3 years |

As many patients are treated for several years, two of the three data sets cover more than five years. One of the assumptions of the model in Tetschke et al. (

Owing to the described problems arising from the collection of clinical data, we used data from Pottgiesser et al. (

For the artificial generation of parameters for PV patients from those of healthy subjects, the rejection sampling method (von Neumann, _{PV}. These λ_{PV} are suitable, if treatments are necessary and possible with reasonable frequency. For that, a random λ_{PV} was drawn from a uniform distribution on [0, 1]. With the heuristic approach without constraints 2.4.2, a number of necessary treatments within 365 days is generated. A λ_{PV} where that number of treatments is in [1, 26] is accepted. Otherwise, the value is rejected. The interpretation is that the PV patient should be so much affected by the disease that treatment with phlebotomy at least once in a year is necessary. However, it should not be needed more often than twice a month. For patients that are even more sick, physicians proceed with chemotherapy anyway. This process was repeated until, for each subject, five distinct λ_{PV} were found.

This process yielded 140 artificially generated parameter sets of PV patients. The generated values for the five λ_{PV} for each subject were on average in the interval [0.34(±0.12), 0.6(±0.16)] with an overall average number of treatments of 15.56 ± 6.56. The subject parameters with generated λ_{PV} can be found in

Parameter sets of subjects from Tetschke et al. (_{PV} = λ_{i} for each subject as detailed in Section 2.3.2.

_{pat} |
_{1} |
_{2} |
_{3} |
_{4} |
_{5} |
||||
---|---|---|---|---|---|---|---|---|---|

01 | 0.769 | 1.650 | 865.45 | 5530.04 | 0.405 | 0.418 | 0.512 | 0.513 | 0.521 |

02 | 0.388 | 0.867 | 885.42 | 4666.08 | 0.385 | 0.498 | 0.558 | 0.706 | 0.709 |

03 | 0.510 | 1.617 | 863.97 | 5265.93 | 0.326 | 0.393 | 0.413 | 0.480 | 0.549 |

04 | 0.323 | 0.424 | 854.15 | 5984.70 | 0.522 | 0.544 | 0.604 | 0.878 | 0.888 |

05 | 0.061 | 1.381 | 971.67 | 7734.16 | 0.192 | 0.321 | 0.334 | 0.367 | 0.419 |

06 | 0.590 | 2.615 | 1001.42 | 5096.65 | 0.290 | 0.331 | 0.343 | 0.354 | 0.415 |

07 | 0.262 | 1.518 | 964.59 | 7270.17 | 0.343 | 0.349 | 0.370 | 0.424 | 0.480 |

08 | 0.324 | 2.676 | 704.42 | 4091.19 | 0.216 | 0.243 | 0.340 | 0.371 | 0.433 |

09 | 0.356 | 0.891 | 958.55 | 9282.78 | 0.366 | 0.555 | 0.559 | 0.602 | 0.605 |

10 | 0.089 | 2.557 | 851.70 | 4588.62 | 0.199 | 0.207 | 0.298 | 0.391 | 0.396 |

11 | 0.243 | 0.925 | 1006.45 | 4610.27 | 0.384 | 0.385 | 0.533 | 0.615 | 0.652 |

12 | 1.003 | 1.409 | 932.51 | 6127.49 | 0.528 | 0.541 | 0.567 | 0.581 | 0.631 |

13 | 0.057 | 0.879 | 647.98 | 4017.69 | 0.198 | 0.369 | 0.428 | 0.506 | 0.583 |

14 | 0.762 | 0.460 | 1081.34 | 8260.98 | 0.639 | 0.743 | 0.767 | 0.787 | 0.845 |

15 | 0.344 | 2.132 | 939.61 | 6778.40 | 0.289 | 0.334 | 0.387 | 0.397 | 0.408 |

16 | 0.141 | 1.661 | 753.24 | 7102.67 | 0.226 | 0.339 | 0.349 | 0.350 | 0.379 |

17 | 0.470 | 0.544 | 900.53 | 5832.50 | 0.514 | 0.541 | 0.691 | 0.705 | 0.758 |

18 | 0.525 | 0.631 | 841.61 | 4872.18 | 0.529 | 0.661 | 0.689 | 0.695 | 0.847 |

19 | 0.423 | 1.525 | 786.47 | 5109.69 | 0.393 | 0.401 | 0.451 | 0.512 | 0.540 |

20 | 0.661 | 2.798 | 765.99 | 8486.20 | 0.328 | 0.334 | 0.341 | 0.342 | 0.360 |

21 | 0.686 | 1.943 | 908.60 | 5725.97 | 0.345 | 0.389 | 0.404 | 0.408 | 0.463 |

22 | 0.613 | 3.142 | 893.06 | 5438.46 | 0.278 | 0.303 | 0.305 | 0.337 | 0.342 |

23 | 0.421 | 1.528 | 695.05 | 4989.05 | 0.318 | 0.502 | 0.518 | 0.533 | 0.563 |

24 | 0.863 | 2.078 | 768.83 | 6182.47 | 0.435 | 0.454 | 0.469 | 0.476 | 0.479 |

25 | 0.414 | 1.172 | 687.85 | 5733.62 | 0.408 | 0.559 | 0.575 | 0.600 | 0.625 |

26 | 0.635 | 0.836 | 925.62 | 6168.52 | 0.659 | 0.681 | 0.682 | 0.708 | 0.746 |

27 | 0.952 | 1.596 | 869.00 | 6351.31 | 0.440 | 0.444 | 0.466 | 0.490 | 0.555 |

28 | 0.805 | 1.486 | 809.18 | 5987.13 | 0.472 | 0.497 | 0.507 | 0.527 | 0.551 |

_{pat}

In this section, the numerical methods and optimization approaches are described. First, a parameter estimation problem is solved on the available clinical data for proof-of-concept simulations. Then, optimization approaches for the generation of treatment schedules for PV patients are presented and discussed. The software used to evaluate the approach is stated in the corresponding subsection. The most relevant parts of the code are available on GitHub (

One main focus in this paper is the generation of optimal treatment schedules for phlebotomies of PV patients. Important properties of a suitable treatment schedule include the following:

_{3,up}) is identified, which should not be exceeded.

The focus of this work lies on the first three properties. Properties 1 and 3 will be incorporated as constraints of the optimization problem. The minimization of the number of treatments is reflected in the objective function

in the case of a continuous problem formulation. In the integer case it is

where

For improved readability, the schedules generated by the methods presented in the following sections are abbreviated as follows:

The number of treatments for such a schedule is then abbreviated by _{*}, where ^{*} is the one-, two-, or three-letter code of the corresponding method. For example, _{H} describes the number of treatments according to the heuristic approach without constraints. This indexing with the respective letter code also holds for other occurring variables.

As a general test setup for evaluating the optimization methods, a time horizon of 365 days (October 1st to September 30th) is considered. Treatments are possible from Monday to Friday, where the first of October is considered a Monday. In addition, restrictions of the clinic are included as blocked times on days 81–95 and days 280–301. The interpretation of these blocked times is that, around the winter and summer holidays, there are reduced personnel in the clinic, such that routine treatments are not performed. In

Graphical view on the general test setup including restrictions of the clinic in red. Phlebotomies are only allowed during times denoted in white.

The evaluations were performed on 140

To present the following methods, it is sufficient to have a model based on ordinary differential equations, that characterizes PV. In section 3.1.1, the model _{PV} is presented. For our purposes, it here suffices to state that the model includes a fraction λ_{PV} of affected progenitor cells, which influence the severity of the disease. The model dynamics have the following structure:

To get a first impression regarding the validity of the extended model in 3.1.1, the data sets of PV patients presented in 2.3.1 were used to obtain patient specific parameter vectors _{PV} introduced by the model extension.

The following parameter estimation problem with the least-squares objective is formulated:

where

{_{0} = 0, _{1}, …, _{nη}} are the time points where

η_{i} is the measurement value of _{i}.

_{3}(_{i}) is the corresponding model response at time _{i}.

σ_{i} is the standard deviation of the measurement at time _{i}. As all considered data were collected by the same method under similar conditions, σ_{1} = 1 for all measurements.

_{p} entries (including

and the regularization ϕ is selected as

Here, ϕ(

As displayed in 2.1.2 the aim of the treatment is to keep the patient's

Heuristic approach

1: Set initial value _{0} ⊳ (Initialization) |

2: |

3: |

4: _{i, 3}>_{3,up}: |

5: |

6: |

7: |

8: Find largest |

9: Set ^{*} |

10: |

where

_{max} and _{pat} are the constant blood volume per treatment and total blood volume of the patient, respectively.

_{dwell} is the dwell time of the system, which represents the minimal distance between two treatments.

For _{3, lo} into account. Therefore, it is necessary to inspect other approaches.

Another point of view is to see the desired treatment schedule as a solution to an OCP. To apply the solution in clinical practice, we are interested in a mixed integer solution. The next two sections deal with the generation of feasible and optimal integer solutions. First, we showcase the relaxed OCP and generate continuous treatment schedules (C-Schedules). An interpretation of these schedules is that a phlebotomy can be done arbitrarily often with arbitrarily withdrawn blood volumes. An exemplary illustration of a continuous solution with the corresponding

Exemplary result for an optimal relaxed treatment schedule. The continuous control function

The continuous OCP for minimizing the number of phlebotomies while allowing fractional treatments reads as follows:

The objective function only indirectly accounts for the number of necessary treatments. Actually, this formulation minimizes the amount of withdrawn blood over the time horizon. A theoretical analysis of the problem solution is given in ^{*} of the structure:

This optimal control is intuitive in the sense that no treatment is applied when unnecessary. Alternatively, phlebotomies are reduced to a minimum, such that they approach the threshold _{3,up}. The existence of this solution shows that, in general, the OCP is solvable. Computationally, this problem is solved with a non-linear programming formulation in CasADi (3.5.1) (Andersson et al.,

Continuous blood withdrawal, as seen in the case of the relaxed problem, can not be performed in clinical practice with currently available tools. To find an approach that is closer to clinical practice, an MIOCP with a discrete formulation is used. Let _{1}, …, _{N}} and _{1}, …, _{N}}. Then the discrete formulation is given by

Here, _{pat} are the same as in subsection 2.4.2.

Using this objective function, the system solution is not unique. In fact, a solution with _{f} after the end of the schedule. Although it is possible to combine those two objectives, it is unclear how exactly the individual components should be weighted. To circumvent this problem, an iterative approach is proposed.

Using the heuristic approach with schedule constraints _{up} for the necessary number of treatments. Starting with _{up}, we fix the number of treatments in the optimization problem and maximize _{f}. We decrement the number of treatments and repeat, until there are no more feasible solutions. The optimization problem that needs iterative solving is

Here, _{f} reflects the aim of postponing for as long as possible the first phlebotomy after the end of the schedule with the given number of treatments. The algorithm, then, is as follows:

Mixed Integer OCP with _{f}

1: Solve the heuristic algorithm (1) and set _{up} to objective |

2: _{sum} = _{up} |

3: Solve (14) |

4: |

5: _{sum} = _{sum}−1 |

6: Solve (14) |

This problem is solved with BONMIN (Bonami et al.,

Owing to the size of the MIOCP, as described in the previous subsection, computations with standard solvers are only feasible for a rather small number of possible integer control points. Larger problem sizes might be more relevant. Indeed, more control points per week or longer overall time horizons can be included. Thus, it is worthwhile to inspect rounding strategies and to compare them to the heuristic approach.

The SUR approach (Sager, ^{T} is reached. This collection is then reduced by one, and, afterwards, the previous process is repeated.

We use the multiple shooting method on an equidistant time grid for the computation of the relaxed solution ^{*}. The integer solution at the discretization point _{i} using SUR can then be computed as follows:

In the standard SUR approach, ^{T} is set to ^{T} = ε for SUR-Schedules, where ε > 0 is close to zero. This is necessary because only the relaxed solution is non-zero. If the upper bound _{3,up} is already reached, treatment must be done immediately.

This approach has the advantage that it is easy to implement and the computations are extremely fast, once the relaxed problem is solved. Moreover, if the relaxed problem includes blocked times _{j}, _{j} = 0 automatically.

The big disadvantage to SUR is that it is not obvious how to include path constraints. The strategy only takes into account the relaxed solution. There is no guarantee that the upper bound _{3,up} will be respected.

To summarize, although fast and intuitive to implement, SUR-Schedules risk endangering the patient, owing to violations of the treatment aim. Therefore, in clinical applications, the use of this approach should be combined with safety strategies, such as the use of a stricter upper bound _{3,sumup,up} < _{3,up}.

Another approach to generating integer solutions from the relaxed solution is to adopt so-called combinatorial integral approximation (Sager et al.,

Originally, the algorithm was designed to approximate relaxed controls with binary ones. For this purpose, it does not need to take into account the actual states. Therefore, it is unable to deal with path constraints and suffers from the same disadvantage as the SUR approach.

This is why we adapted the algorithm to take into account the states (and especially _{3}) in each iteration through forward integration. If at time point _{i} one of the conditions _{3,lo} ≤ _{3}(_{i}) ≤ _{3,up} is violated, the corresponding branch of the tree is no longer feasible and can be disregarded. This not only helps to include path constraints, but also decreases the size of the B&B tree.

This modified B&B version is able to generate feasible solutions, if we also fix the control

The overall quality of BB-Schedules generated by this approach depends on the maximum number of iterations. As the B&B tree tends to become very large, relatively few iterations search through only part of that tree. This can lead to instances where no solution can be determined, however, even though we implemented the additional pruning of the tree for infeasible solutions. Nevertheless, a large number of iterations leads to very large run times. For our numerical results, we used the default of five million iterations.

A completely different algorithmic idea for the solution of (13) is to generate treatment schedules by dynamic programming (DP-Schedules). Here, discretization is done not only in time, but also in the state space. This approach goes back to Bellman (

First, we introduce an equidistant grid _{x} in state space and tabulate state transitions: for each possible combination of a state value and a possible control value, the corresponding result of an integration over the next time interval must be stored. The result of the integration usually does not match one of the grid points. This is why rounding toward a valid grid point is necessary.

In our provided code this tabulation is stored with the help of indices. Thus, the rounding is done in the following fashion: Let ^{*}, if

The tabulation is then used to compute a so-called cost-to-go function. For each time point and state grid point this function indicates the best possible choice from that state and that time onwards. This is computed backwards in time. The last step is the computation of the optimal control starting in suitable grid points close to _{0} with the help of the tabulation.

This approach is globally optimal with regard to the grid used, as every possible combination of states and controls is evaluated. However, this approach suffers from practical drawbacks, when systems with many states are used, or when there are too many grid points for each state. In the case of the MIOCP (13), only three states have to be regarded and we consider only binary control. For this reason, the algorithm might be a good choice. We used 400 grid points for each of the three states.

After the initial tabulation, the algorithm has a linear complexity in the time discretization. Therefore, this approach is especially suited for schedule generation with large time horizons. It is also easy to include constraints. In our implementation, we worked with sparse matrix structures to account for the exponential growth of the state transition tabulation.

In this section, the results based on the previously introduced methods are presented. The model proposed by Tetschke et al. (

The three-compartment model by Tetschke et al. (

In this section, we describe the proposed model for PV, analyze its properties, and discuss simulation results using clinical data. We generated suitable treatment plans using heuristic and optimization-based approaches. The overall goal of treatment was to ensure the safety of the patient, while aiming to improve quality of life.

Here, we discuss our extension of the model (1) to reflect the relevant dynamics of erythropoiesis in PV patients. For this, we follow the idea in Eaves and Eaves (_{PV}, which corresponds to this fraction and can take values between [0, 1]. Correspondingly, there is a fraction of cells 1−λ_{PV} that responds in a normal way to EPO. A person not affected by PV will correspond to λ_{PV} = 0, whereas higher values give means to quantify the severeness of the disease. As the compartment _{1} mainly consists of CFU-E cells, an intuitive model extension of (1) is given as follows:

with γ^{*} denoting the growth rate of affected cells in _{1}. A phlebotomy can be incorporated in the same way as Equation (2) in section 2.2.

The model components are here discussed with respect to their plausibility in the case of PV.

β, _{1}, _{2}, γ: using this model extension by cell partition with λ_{PV} leads to the assumption that cells affected by PV only proliferate faster in _{1}, and otherwise behave like a healthy cell. We note that there might be physiological processes not covered by this model that affect other components, such as the transition times between the compartments. However, we assume that this is not the case and use the model variables β, _{1}, _{2}, and γ as in Tetschke et al. (

α: there are conflicting studies regarding the average life span of erythrocytes in PV patients. Depending on the investigation, the average life span is either shortened or normal (see London et al., _{PV}, or on patient-specific factors. This could be inspected in a follow-up investigation, once suitable clinical data are available.

γ^{*}: the model variable γ^{*} has a significant impact on proliferation in PV patients, especially in those with a higher number of affected cells described by high values of λ_{PV}. To our knowledge, however, no study has investigated the proliferation rate of CFU-E in PV patients based on the fraction of affected cells. Therefore, an accurate guess for the value of γ^{*} is not possible. In case of unknown model variables, a numerical estimation based on suitable data is optimal. However, there are many unknown patient-specific variables, such as β, γ, λ_{PV}, and (in most cases) ^{*} is unreasonable, given that data of exceptional quality and quantity are unavailable. As the available data do not often meet these criteria, one might opt for a heuristic approach by assuming a dependency of γ^{*} on other model variables, such as γ or β. By definition, γ reflects a proliferation amplification of EPO-affected cells, such that the use of the EPO-independent factor β seems more intuitive. For our investigation, we used

_{0}: the model variable _{0} reflects the inflow from hematopoietic stem cells to the erythrocyte lineage. As the proliferation rate of PV-affected stem cells might also be increased, one might assume _{0} to be higher and to be dependent on λ_{PV}. We assumed that a potentially enhanced stem cell inflow is compensated by the proliferation rate γ^{*}, and we used _{0} as in Tetschke et al. (

In most cases, the system's steady state for the erythrocyte mass _{PV} and the steady state erythrocyte mass

Following the calculations in _{PV} is given by

As described in the _{PV} ∈ [0, 1], such that only the case where λ_{PV} = 1 must be thoroughly investigated. With similar calculations, one can also show that _{PV}(λ_{PV}) increases monotonously for λ_{PV} ∈ [0, 1].

To summarize the results, _{PV} is a continuous, monotonously increasing function with _{PV}(λ_{PV}) ∈ [_{PV} ∈ [0, 1]. This means that an increasing fraction of affected cells can indeed lead to physiological complications, as the system tends to reach critical erythrocyte levels. This is consistent with the main physiological assumptions about the process.

In this section, the numerical results using the proposed model are presented. First, clinical data are evaluated in a proof-of-concept simulation. Then, the computed treatment schedule given by the heuristic method in section 2.4.2 is compared to schedules computed by the other approaches given in 2.4.

In 22 of the available 140 test cases, no H-Schedules could be generated, owing to the constraints. The remaining 118 H-Schedules were thus compared to the schedules from other methods.

The three data sets of patients suffering from PV presented in section 2.3.1 were used to assess the applicability of the model to real-world data. The method described in section 2.4.1 was used to obtain the patient-specific parameter vector _{PV}). The results are displayed in

Erythrocyte trajectories as a result of parameter estimation on three clinical data sets. The computed measurement values are given in red, and the healthy base value

Results of proof-of-concept simulation of clinical data from three patients.

_{PV} |
^{PV} |
^{2} |
||||
---|---|---|---|---|---|---|

01 | 0.6 | 0.3 | 0.89 | 768.25 | 1314.20 | 0.89 |

02 | 0.2 | 0.1 | 0.62 | 501.04 | 607.02 | 0.76 |

03 | 0.2 | 0.1 | 0.61 | 540.17 | 650.21 | 0.45 |

Taking into account all the problems with the collected data, the fits of the trajectories appear satisfying from visual inspection. Objectively, the ^{2} value of the three fits was 0.7. However, for subjects 02 and 03, the parameters β and γ were both equal to the lower bound set, owing to numerical restrictions. This might be a sign of errors in the assumption of

The good fit achieved by this method suggests that our proposed model captures the essential dynamics of this process. However, this must be verified using higher-quality clinical data.

In this section, we compare the HC-Schedules and the IP-Schedules of the MIOCP approach in Algorithm 2. For demonstration purposes, the IP-Schedule was compared to the corresponding H-Schedule and HC-Schedule in one modified test case. For this test case, subject 20 with λ_{PV} = λ_{2} was used with a time horizon of

Results of integer approach run time demonstration.

_{H} |
_{HC} |
_{IP} |
_{Tf}[ |
||
---|---|---|---|---|---|

1 | 4 | 5 | 5 | 0 | 954.8 |

2 | 4 | 5 | 5 | 0 | 17960.4 |

3 | 4 | 5 | 5 | 0 | 72239.5 |

4 | 4 | 5 | 5 | 0 | 195409.4 |

Here, Δ_{Tf}[_{HC} − _{IP}, where both _{HC} and _{IP} are computed as the respective time points in which the first treatment after the observed time horizon occurs. In the documentation, we set Δ_{Tf}: = 0 when

In this test case, the results of the IP-Schedules and HC-Schedules were without notable differences. However, whereas the generation of HC-Schedules had a constant run time of only a few seconds, the run time of IP-Schedules dramatically increased (up to 2.3 days for the largest test case). This demonstrates that the MIOCP approach is only suitable for very small test cases. Therefore, applications for this approach to the general test case in subsection 2.3.2 are unfeasible.

To compare the heuristic approach with the MIOCP approach further, both algorithms were applied to a modified version of the test case. It was modified with a smaller end time

In three cases, the MIOCP approach did not produce a feasible solution. In all other cases, the number of treatments _{IP} and _{HC} were equal. In those cases, differences only occurred with different Δ_{Tf}. In two of the latter cases, the MIOCP schedules were worse by

Exemplary results from three patient configurations are displayed in _{PV} = 0.51 is an example of the general case, in which both generated treatment schedules were identical. By contrast, for patient 02 with λ_{PV} = 0.56, the IP-Schedule was worse, owing to a treatment at approximately _{PV} = 0.71.

Erythrocyte trajectories of three exemplary patients using IP-Schedules and HC-Schedules. The upper threshold (red, dashed) and the end of the time horizon at

The MIOCP optimization approach using BONMIN only rarely yielded an improvement over the heuristic approach. The original problem size (see subsection 2.3.2) had to be reduced by a factor of 17 in terms of the number of integer variables, to produce results in a reasonable amount of time. Nonetheless, the run-times were long (920.22 ± 845.71 s). Therefore, the use of standard integer optimization solvers seems inappropriate for this problem. This motivated the investigation of other heuristic approaches, such as rounding schemes, for generating treatment plans.

In this section, the HC-Schedules and the SUR-Schedules are compared. One relevant property is the difference in the number of treatments _{diff} = _{HC} − _{SUR} of the schedules. The sum-up method does not directly take into account the critical threshold _{3,up}. Therefore, we evaluated the number of days in which the threshold was violated (_{viol}).

In all 140 test cases, SUR-Schedules were successfully computed. In 118 cases where the heuristic also found a feasible solution, the sum-up approach on average had a lower objective function value than the respective HC-Schedules, by an average of

We investigated the reduction in treatment _{diff} by the sum-up method and plotted it over the respective days of violation _{viol} (see _{viol,reg}(_{diff}) = 17.61·_{diff} + 30.04[^{2} = 0.42). The regression only considered the instances with a lower objective function value in the SUR-Schedules.

Duration of constraint violation _{viol} over the difference in the number of treatments for each of the 140 test cases (blue). The purple dashed line shows a linear regression over all instances with _{diff} ≥ 0.

In summary, the SUR-Schedules either had fewer treatments than the respective HC-Schedules, with considerable endangerment to the patient, or were similar or worse than schedules with only slight endangerment in most cases. To overcome these constraint violations, we can decrease the critical threshold _{3, up}, although this would lead to more treatments. Based on our investigation, the sum-up method performed considerably worse, because it did not directly take the upper bound into account.

The BB-Schedule was considered as a rounding approach. In contrast to the SUR-Schedule, the BB-Schedule respects constraints. As a complete B&B tree grows exponentially in the number of variables, the computations were run with a maximal number of iterations. In

Results of BB-Schedule in comparison to HC-Schedule.

HC | 118 | 15.78 ± 7.34 | 7.3 ± 13.3 |

BB small | 111 | 15.47 ± 7.21 | 26.9 ± 3.54 |

BB large | 112 | 15.54 ± 7.21 | 7.3 ± 13.3 |

In comparison to the HC-Schedule, the results of the approach are similar: 22 cases were not feasible with either approach. Additionally, the BB-Schedule failed to find a feasible solution with six patients in the version with a large iteration number (and with seven patients in the faster version). In both cases, there were 13 cases where the heuristic saved one phlebotomy, and two cases where even two phlebotomies were saved in comparison to the BB-schedule.

The results for the BB-Schedule can be improved by increasing the permitted number of iterations even further, although this would increase the average computation time.

The DP approach generates treatment schedules by exploring all possible schedules on a chosen grid. Those DP-Schedules were compared to the corresponding HC-Schedules. Relevant properties were the difference in the number of treatments _{diff,0} = _{HC} − _{DP,0} and _{diff,0.4} = _{HC} − _{DP,0.4}, and the number of failed attempts for both rounding offsets. Moreover, the computation time and the used RAM were documented. The latter was the limiting factor of the approach.

Of all 140 patient data sets, the system memory was exceeded in four configurations of subject 08 (λ_{1}, λ_{2}, λ_{4}, and λ_{5}) for both offsets. Therefore, only the results for the other 136 data sets were available. The system memory per configuration in most cases was close to the maximum available memory (~50 GB). The results are presented in

Erythrocyte trajectories using DP-Schedules for both rounding approaches and HC-Schedules for three exemplary patients. _{3,up} is shown as the red dashed line.

Using the conservative rounding rule with offset

For the commercial rounding rule with

There was no case in which an improvement from the DP method had zero days of violation. However, in some cases, DP-Schedules with only minor violations and a significant reduction in the number of treatments were generated. For example, for subject 02 with λ_{1}, there was a violation of _{viol} = 32.33 days with very small offset from the upper bound, which then reached its _{PV}, slightly below that threshold. As that threshold would probably be selected with some safety region, this subject might not need any treatment at all. Following the HC-Schedule, three treatments were applied, because the upper bound must be strictly respected. A similar case was inspected for subject 04 with λ_{2}, where the number of treatments was reduced by one when _{viol} = 5.5 days of violation were tolerated. There were also some cases in which the DP-Schedules were clearly sub-optimal (see subject 25 with λ_{1}).

Using DP with

To our knowledge, this is the first time that erythropoiesis in PV patients has been described in a framework that can simulate the dynamics of the disease. This is a first step toward clinical decision support, with which medical doctors can use simulation results to predict follow-up treatments. Such a framework has the potential to improve the treatment of PV patients significantly, while decreasing the work-load of clinical personnel by reducing the number of necessary appointments.

There are some drawbacks to the proposed model, however, and these will be addressed in future work. First, the different stages of erythropoiesis are simplified and summarized in few compartments. One can argue that too much information is lost through the agglomeration of complex underlying phenomena.

Second, further investigation in this area is limited by the data available. As PV is a rather rare disease, data sets are difficult to come by. In addition, clinical measurements are performed using

Finally, PV is not yet fully understood. This makes the modeling process difficult, as more black-box components must be introduced. However, the modeling framework can support medical research in this field. For example, investigations are warranted regarding the shortened life span of RBCs which often occurs in PV patients, and regarding the connection between the fraction of affected cells λ_{PV} and the JAK2V617F allele burden. Additional medical parameters might be introduced into this model framework for this purpose, which can, in combination with more clinical data, lead to new insights into the disease.

In the second part of this paper, we evaluated different methods of generating treatment schedules for PV patients based on the proposed model. An overview over the results is given in

Summary of relevant properties of the investigated methods for generating treatment schedules.

Integer solution | ✓ | ✓ | x | ✓ | ✓ | ✓ | ✓ |

No constraint violation | ✓ | ✓ | ✓ | ✓ | x | ✓ | x |

Respecting restrictions of clinic | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Extension for weighted dates | x | x | ✓ | ✓ | ✓ | (✓) | (✓) |

Run time practicable | ✓ | ✓ | ✓ | x | ✓ | (✓) | (✓) |

Memory practicable | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x |

# Computed schedules (of 140) | 140 | 118 | 140 | 0 | 140 | 112 | 136/136 |

# Feasible instances (of 140) | 140 | 118 | 140 | 0 | 33 | 112 | 8/94 |

The heuristic method of generating schedules follows the intuitive treatment design practiced by medical doctors. The resulting H-Schedules and HC-Schedules can be derived quickly and the schedules are integer solutions by design. Unfortunately, the heuristic is less flexible with regard to the inclusion of new features. As this method was sub-optimal in a formal sense, the quality of this approach was evaluated in comparison to formally derived optimization methods.

The investigated methods led to treatment schedules that in most cases had an equal or higher number of treatments in the observed time horizon, or included violations of safety constraints. Both the I-Schedules and the BB-Schedules were often similar to the respective HC-Schedules. The BB-Schedules were in a few cases even slightly better than the HC-Schedules. However, those approaches are difficult to realize, owing to high run times. The generation of I-Schedules is only possible for very limited time horizons and reduced treatment options. BB-Schedules can be generated relatively quickly, but need a higher run time for an increased rate of successful computation.

It is crucial to respect safety constraints to prevent endangering patients. Therefore, the SUR-Schedules and the DP-Schedules, which do not respect these safety constraints, must be used carefully. The SUR-Schedules were in most cases worse than the corresponding HC-Schedules, and often had significant violations of the constraints. Any strategy that uses this approach will require tighter safety constraints. Consequently, this might lead to feasible treatment schedules, but they would be significantly worse than the HC-Schedules. Therefore, the sum-up approach is not recommended for generating treatment schedules. By contrast, DP-Schedules in many cases demonstrated comparable quality, without any or with only minor constraint violations. There were even cases in which the acceptance of a minor violation led to considerably improved treatment plans. The major drawback here is that the order of magnitude of the violations depends on the selected discretization. This considerably influences memory consumption. Although DP-Schedules can be used in conjunction with the corresponding HC-Schedules, the high demand for system memory renders the approach difficult to realize.

Based on our investigation using a test configuration, the heuristic method with its HC-Schedules seemed to be the method of choice for generating treatment schedules. However, the heuristic method is difficult to extend when the properties of the treatments change. For example, as a quality-of-life feature for the patient, day-based weights might be introduced, assigning more weight to inconvenient days that are preferably avoided. This would give the patient the opportunity to realize treatment on more convenient days—offering more flexibility than a strictly optimal treatment schedule. The patient can thus avoid appointments that conflict with personal commitments. Such day-based weights can be incorporated into all of the other investigated methods. This would make BB-Schedules, DP-Schedules, and (in smaller instances) IP-Schedules desirable suggestions for patient treatment. In all cases, treatment schedules can be used to support decision-making by medical doctors when planning therapy.

Continuous treatment schedules were briefly discussed, but only as a foundation for other approaches, such as the sum-up method and the B&B method. Currently, continuous phlebotomy is technologically impractical in clinics, which makes C-Schedules inapplicable. With increasing technological progress, however, such a method might be derived in the future. Based on the results of this paper, this would lead to superior treatment compared to discrete approaches.

In this paper, a novel compartment model for PV patients was derived from the data-driven model in Tetschke et al. (

This gives the opportunity to simulate the disease patient individually and to provide phlebotomy schedules based on this information. Due to the model structure this can be achieved using tools of mathematical optimization. Thus, in the future many different further aspects of the clinical practice can be included in the treatment design. For example, also a treatment with chemotherapy could be included into the model to also capture more severe cases of the disease. This is a first step toward clinical decision support in the case of the disease PV.

All datasets generated for this study are included in the article/

The studies involving human participants were reviewed and approved by Ethics Committee of the Otto von-Guericke University at the Medical Faculty and at the University Hospital Magdeburg. The patients/participants provided their written informed consent to participate in this study.

PL and MT developed the model, performed the analysis and experiments, and wrote the paper. TF and ES contributed to the design of the research, provided clinical data for the experiments, and analyzed the data. SS contributed to the design of the research and to the discussion of the mathematical model.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Mirko Hahn and Clemens Zeile from the Mathopt Group of the Faculty of Mathematics at Otto-von-Guericke University Magdeburg for their support with the implementation of the branch and bound method in pycombina. We also express our gratitude to Phillip Klein from the Institute of Mathematical Stochastics of the Faculty of Mathematics at Otto-von-Guericke University Magdeburg for his assistance with the realization of the moving sum approach with clinical data. We would like to thank Editage (

The Supplementary Material for this article can be found online at:

branch and bound

blast forming unit-hematopoietic

colony forming unit-hematopoietic

dynamic programming

erythropoietin

hematocrit

mean corpuscular hemoglobin

mixed-integer optimal control problem

optimal control problem

polycythemia vera

red blood cell

sum up rounding

total hemoglobin mass.