Statistical Evaluation of Different Mathematical Models for Diffusion Weighted Imaging of Prostate Cancer Xenografts in Mice

Purpose To evaluate fitting quality and repeatability of four mathematical models for diffusion weighted imaging (DWI) during tumor progression in mouse xenograft model of prostate cancer. Methods Human prostate cancer cells (PC-3) were implanted subcutaneously in right hind limbs of 11 immunodeficient mice. Tumor growth was followed by weekly DWI examinations using a 7T MR scanner. Additional DWI examination was performed after repositioning following the fourth DWI examination to evaluate short term repeatability. DWI was performed using 15 and 12 b-values in the ranges of 0-500 and 0-2000 s/mm2, respectively. Corrected Akaike information criteria and F-ratio were used to evaluate fitting quality of each model (mono-exponential, stretched exponential, kurtosis, and bi-exponential). Results Significant changes were observed in DWI data during the tumor growth, indicated by ADCm, ADCs, and ADCk. Similar results were obtained using low as well as high b-values. No marked changes in model preference were present between the weeks 1−4. The parameters of the mono-exponential, stretched exponential, and kurtosis models had smaller confidence interval and coefficient of repeatability values than the parameters of the bi-exponential model. Conclusion Stretched exponential and kurtosis models showed better fit to DWI data than the mono-exponential model and presented with good repeatability.


INTRODUCTION
Diffusion weighed imaging (DWI) has extensively been used for cancer characterization in both pre-clinical (1,2) and clinical settings (3) during the last decade. Furthermore, DWI is increasingly being used for monitoring cancer therapy responses (4). In biological tissue, DWI contrast is predominantly affected by microscopic motion of water molecules and water interactions with surroundings. The most recognized DWI imaging acquisition method is the Stejskal-Tanner pulsed field gradient method. With this method, motion caused by self diffusion of a proton is acquired by applying a pair of motion-encoding gradients. The first gradient dephases and second one rephrases stationary protons, while moving water protons stays dephased resulting to decreased signal intensity. The signal attenuation depends on water diffusion coefficient (D [mm 2 /s]) as well as direction of the self diffusion of water (5).
Several different mathematical models have been proposed to describe the DWI signal decay. The mono-exponential model is the simplest and widely used, in which one parameter D (or often the apparent diffusion coefficient, ADC) describes the diffusion. This model fits well to DWI data measured from pure water without any restrictions. At low b-values DWI signal decay deviates from the mono-exponential function due to presence of intra-voxel incoherent motion (IVIM), as originally proposed by Le Bihan and co-workers (6,7). The search for new noninvasive imaging markers has led to increased interests in IVIMderived parameters, which have demonstrated correlation with microvessel density in colorectal cancer (HT29) model (8). Nevertheless, IVIM-derived parameters are not directly related to tissue perfusion (6,7), but perfusion and IVIM-derived parameters are rather related to the capillary structure (9,10). Similarly to low b-values, DWI signal decay at high b-values deviates from the mono-exponential function, and it is better described by non-Gaussian mathematical models (11)(12)(13)(14).
In general, a single mono-exponential decay provides oversimplified description of the complicated water motion in the tissue. However, the modeling with several free parameters could lead to "over-fitting" of data, and poor repeatability of the fitted parameters. Optimal model would have the highest information content and provides independent parameters, which are related to physical quantities (e.g. cell and/or vessel density) while still retain high repeatability/reliability of fitted parameters.
The Akaike information criteria (AIC) has been widely used for model selection in previous studies (15)(16)(17). A model with smaller AIC value would be a preferred model due to less information loss as compared with a model presenting with higher AIC. Similarly to AIC, F-ratio is commonly being applied for model selection (18). Model selection based on F-ratio tends to prefer a more simplified model (19) in contrast to AIC.
In the current study, we evaluated four different mathematical models for DWI within a study applying PC-3 prostate cancer cells grown in immunodeficient mice using both low (0-500 s/mm 2 ) and high b-values (0-2000 s/mm 2 ). The tumor growth was followed for four weeks with repeated MR examinations performed once a week. Corrected Akaike information criteria (AIC c ) and F-ratio were used to evaluate information content of the models. Non-Gaussian DWI models provided better fit to DWI data obtained using both low and high b-values. However, non-Gaussian DWI models were not clearly preferred over the mono-exponential model for DWI data obtained using low bvalues in contrast to DWI data obtained using high b-values. Furthermore, DWI fitted parameters changed significantly during tumor progression.

Animal Tumor Model
One million PC-3 (Anticancer Inc., USA) human prostate cancer cells were inoculated subcutaneously in immunodeficient mice (n=11, HSD: Athymic Nude Foxn 1nu, Harlan Laboratories, Indianapolis, IN, USA). The cells also expressed red florescent protein, while this property was not applied in the present study. Mice were housed in individually ventilated cages under controlled conditions of light (12h light/12h dark), temperature (21 ± 3°C), and humidity (55% ± 15%) in specific pathogen-free conditions at the Central Animal Laboratory, University of Turku for the first 5 days, and thereafter in similar conditions at the University of Eastern Finland Kuopio campus. Mice were provided with irradiated soy-free naturalingredient feed (RM3 (E), Special Diets Services, Essex, UK) and autoclaved tap water ad libitum, and were housed complying with international guidelines on the care and use of laboratory animals. All animal handling was conducted in accordance with the Finnish Committee for the use and care of laboratory animals and the institutional animal care policies, which fully meet the requirements as defined in the U.S. National Institutes of Health guidelines on animal experimentation.

MR Imaging
The first MR examination was performed 8 days after cell implantation. Tumor growth was followed for four weeks with repeated MR examinations once a week. Immediately following the fourth MR examination, six and seven mice had repeated DWI scan performed using low and high b-values, respectively. The second repetition was performed following animal and coil repositioning approximately 60 minutes after the first set of DWI. The repeated DWI examinations were used to evaluate short term repeatability of the measured parameters. The anesthetized mice (1.5% isoflurane in 70%N 2 /30%O 2 ) were imaged using a 7T animal MR scanner (7T Pharmascan, Bruker GmbH, Ettlingen, Germany) with 72 mm volume transmitter (Bruker GmbH) and 10 mm surface receiver coil (Bruker GmbH). Multislice T 2 -weighted anatomical images covering the whole tumor area were obtained (TR/TE 2500 ms/33 ms, field of view (FOV) = 30 × 30 mm 2 , matrix size 256 × 256, 15 slices) to localize a slice with maximum tumor diameter for DWI measurements. Diffusion weighted single shot spin-echo echo planar imaging was applied with the parameters: TR/TE 3750/25.3 (low b-value set) 3000/30 ms (high b-value set), FOV 3 × 1.5 cm 2 , matrix 128 ×64, slice thickness 1 mm, three orthogonal diffusion directions, and two different sets of b-values: low b-value set (15 b-values in total): 0, 2, 4, 6,9,12,14,18,23,25,28,50, 100, 300, 500 s/mm 2 , and high b-value set (12 b-values in total): 0, 100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2000 s/mm 2 . For further analysis, the mean value of the signal from three directions was calculated.

Data Modeling
The following four mathematical models were applied to the DWI signal obtained using low and high b-values: 1. Mono-exponential model: where b is the b-value, S(0) is the signal intensity at b-value of 0 s/ mm 2 , and ADC m is the apparent diffusion coefficient calculated using the mono-exponential model. 2. Stretched exponential model also known as Kohlrausch-Williams-Watts model (20): where ADC s is the apparent diffusion coefficient calculated using the stretched exponential model, and a is the heterogeneity index. The dimensionless a parameter varies from 0 to 1. During the fitting procedure, a parameter was constrained to be in the range of 0 to 1. 3. Kurtosis model: where ADC k is the apparent diffusion coefficient calculated using the kurtosis model, and K is the kurtosis. Jensen at al. (21) originally developed the kurtosis model to fit deviation of diffusion tensor signal from the mono-exponential function. The dimensionless positive K parameter characterizes the deviation from the mono-exponential signal decay. 4a. Bi-exponential model for low b-values: where f p is the "pseudodiffusion" fraction, D f is the fast diffusion coefficient, and D p is the "pseudodiffusion" coefficient. The intravoxel incoherent motion (IVIM) theory is an advanced method to separate diffusion and perfusion effects using DWI (6) at low b-values. According to the IVIM theory, the blood flow in the capillaries causes a dephasing of the magnetization when motion-encoding gradients are applied. This means that the motion of water molecules due to microcirculation of blood in the capillaries has a similar effect on the resulting DWI signal as their motion due to molecular diffusion. 4b. Bi-exponential model for high b-values: where f f is the fraction of fast diffusion, D f is the fast diffusion coefficient, and D s is the slow diffusion coefficient.
The DWI signal decay of each individual voxel has been fitted using four mathematical models, as described above, to generate parametric maps of the parameters. The fitting procedure has been performed using in-house written C++ code utilizing Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (22) in dlib library (23).
Following multiple initializations values were used to prevent local minima in the fitting procedure in order to avoid local minima in the fitting procedure (initializations values for high bvalues data are in brackets): 1. Mono-exponential:

Image Analysis
The tumor area was manually delineated on T2-weighted anatomical images and the regions of interest (ROIs) were defined to the corresponding parametric images. Voxels with ADC m values higher than 8.0^10 -3 s 2 /mm were discarded as those voxels were considered to represent necrosis. Median values of the fitted parameters of each ROI between repeated scans were compared using one-way analysis of variance with Bonferroni test (p<0.05 statistically significant).

EVALUATION OF FITTING QUALITY
Corrected Akaike information criteria difference (DAIC c ) (15) was used to evaluate model fit to DWI data of each individual voxel: where N is the sample size, P is the number of parameters, SS is the sum of squares between data points and fitted curve, A subscript represents the simpler model, B subscripts represents the more complex model. In addition to DAIC c , F-ratio (F) with 1% level of significance was used to evaluate model fit to data: where DF (= number of data points − number of parameters) is the degree of freedom, A subscript represents the simpler model, B subscripts represents the more complex model.

REPEATABILITY OF THE FITTED PARAMETERS
Repeatability of the fitted parameters was evaluated using the same methodology as in a previous human DWI drug intervention study (24). The difference (d) in median values per ROI between two repeated scans performed 4 weeks after the initial scan was calculated for a subset of mice (six mice for low b-values and seven for high b-values). Mean squared difference (msd) was calculated as follows: where d is the difference between two repeated scans, n is the number of mice with repeated scan. Subsequently, 95% confidence interval (CI) for changes in the study cohort was calculated: where msd is the mean squared difference, n is the number of mice with repeated scan.Finally, coefficient of repeatability (r) was calculated as follows:

RESULTS
PC-3 cancer cell growth was followed for 4 weeks with repeated MR examinations performed once a week. The changes in diffusion parameters are visualized in parametric maps on top of T2-weighted images for a representative case (Figures 1 and 2) while the rest of data are shown in Supplementary material ( Figures S1-S20). Median signal intensity of tumor ROI and the correcting fitted curves of week 1 and week 4 are shown in Figure 3.
The data is shown for the same representative tumor as shown in Figure 1.
Median values of the fitted parameters ADC m , ADC s , ADC k , and D s between week 1 and those measured at weeks 2, 3 and 4 differed significantly while the differences between weeks 2, 3 and 4 were not significant using the low b-value data. Similarly, median values of K parameter increased significantly between week 1 and week 3 and 4. In contrast, no significant differences were present between median values of different weeks for a, D p , and f f parameters (Figure 4). Using high b-values, significant changes were present in median values of ADC m , ADC s , a, and ADC k between week 1 and weeks 2, 3 and 4, while differences between week 2, 3 and 4 were not significant. The changes in K parameter were significant only between week 1 and week 4, while differences between week 1 and weeks 3 and 4 were significant for D f and D s parameters ( Figure 5).

Model Selection
In general, DWI data obtained using low b-values fitted better by the stretched exponential model as compared with monoexponential, kurtosis and bi-exponential models based on AIC c and F-ratio. In more than 50% of voxels the kurtosis model had lower AIC c values than the mono-exponential model. However, the kurtosis model did not provide significantly better fit to data than the mono-exponential model in more than 50% of voxels (averaged medians of 11 mice) based on F-test. Similarly, the biexponential model was not preferred over mono-exponential in more than 50% of voxels (averaged medians of 11 mice). No dramatic changes in model preference were present between different time points (Tables 1 and 2).
In contrast to low b-values, in vast majority of voxels stretched exponential, kurtosis and bi-exponential models fitted DWI data obtained using high b-values better than the mono-exponential model based on AIC c and F-test. The kurtosis model was preferred over the stretched exponential model in average in~75% of voxels based on AIC c . The bi-exponential models still provided significantly better fit to data than the stretched exponential and kurtosis models based on F-test.

Repeatability of the Fitted Parameters
The parameters of mono-exponential, stretched exponential and kurtosis models had confidence interval values smaller than 25% of the corresponding averaged median values ( Table 3) for DWI data obtained using both low as well as high b-values. Similarly, coefficients of repeatability were smaller than 45% of the corresponding averaged median values (Table 3), with the exception of K parameter for low b-value DWI data (r% 59.7%). In contrast, the parameters of the bi-exponential model had much larger confidence interval and coefficient of repeatability values, especially for low b-value DWI data. Large confidence interval and coefficient of repeatability values for the parameters of bi-exponential model implicate poor measurement repeatability. Confidence interval and coefficient of repeatability values for mono-exponential, stretched exponential and kurtosis models we similar for DWI data acquired using low and high b-values. However, K parameter of kurtosis models had approximately 2-times higher relative coefficient of repeatability values than the ADC m , ADC s , ADC k parameters for low as well as high b-values.
Signal intensities at the second repeated DWI examination differed systematically from those measured at the first DWI examination, and the median values of ADC m , ADC s , and ADC k parameters were lower in the repeated DWI in all mice.

DISCUSSION
The use of DWI for cancer detection, characterization and cancer therapy response monitoring continues to increase in both preclinical and clinical settings. Despite wide use of DWI, accurate and robust modeling of DWI signal decay remains a challenge. In the current study, we have evaluated four different mathematical models for DWI data (low and high b-values) of PC-3-cell derived human prostate cancer xenografts in mice, in terms of fitting quality and repeatability. Significant changes were observed in median values of ADC parameters detected between week 1 and those measured 1-3 weeks later, while the difference between the values obtained at weeks 2-4 were not significant. In previous studies cell density was shown to correlate with ADC m parameter (25,26). In the light of these previous studies, our findings indicated lower cell density of the tumors at the first time point measured.
According to our finding, non-Gaussian DWI models provided better fit to DWI data than the most commonly applied mono-exponential model, which is in line with previous findings (6,16,21,27). The use of high b-values and non-Gaussian DWI models for early therapy response evaluation has demonstrated promising results in human brain tumor (28), colon cancer mouse model (29), and glioma mouse model (30). In a study by Hoff and co-workers (30), fast diffusion component of the bi-exponential model had the largest percent change from baseline in glioma mouse model, suggesting a role of non-Gaussian DWI models for prediction of early therapy response.
Several recent pre-clinical (31)(32)(33) and clinical (34)(35)(36) studies demonstrated promising results for IVIM derived parameters especially in organs with high perfusion, such as liver or kidney. In these organs the intravoxel incoherent motion present with relatively larger contribution to signal decay. The perfusion fraction was shown to be in the range of > 20-30%, being substantially more than that observed in brain, for example (37). In the current study, the averaged median perfusion fraction value was 9%, thus, being similar to human brain. Due to relatively low contribution of the intravoxel incoherent motion   (38). The small contribution of "pseudodiffusion" component (f p , D p ) to the final fitting residuals during least square fitting procedure, calls into question the validity of IVIM parameters that are estimated using least square fitting procedure for organs with small "pseudodiffusion" component (39)(40)(41). Efforts to increase fitting robustness of the bi-exponential model resulted into wider use of "segmented analysis" (32,42) where each exponential component is being fitted individually in subsequent fashion. However, it should be noted that the resulting fitting residual is likely to be higher for "segmented analysis" than for "simultaneous" least square fitting of the bi-exponential model. Information content and bias of "segmented analysis" in comparison with other mathematical models remains to be established. Orton and the co-workers (43) have proposed the use of a Bayesian approach for improved estimation accuracy of IVIM parameters. Bayesian approach shrinks the distribution of parameters and "moves" outliers closer to the central distribution. Despite very promising results (36), this approach might not be applicable for cases with limited number of fitting voxels with large physiological voxel heterogeneity. Bayesian approach is a balance between improving quality of parametric maps and suppressing heterogeneity. Other possible approach is the use of neighborhood information during the fitting procedure (44). Deviation of the DWI signal from mono-exponential decay at low b-values is, according to IVIM theory (6), due to intravoxel incoherent motion associated with capillary perfusion. "Perfusion fraction" (f p in eq. 4) was proposed to reflex fractional volume (%) of capillary blood flow while "pseudodiffusion" (D p in eq. 4) probably relates to blood velocity. Biological reasons for non-Gaussian DWI signal decay at high b-values remains open despite several proposed theories (7,27,45,46). In the current study, the highest used b-value was 2000 s/mm 2 , which could have resulted in a less accurate estimation of slow diffusion component. Nevertheless, in vast majority of voxels non-Gaussian models provided better fit to DWI obtained using b-values up to 2000 s/mm 2 . As shown in prior studies (47,48) of DWI models functions for PCa, that b-value distribution and DWI acquisition parameters may contribute to the fitting performance, and exploration of these factors is left for future studies. Our study is limited by relatively small sample size. Furthermore, no attempts to histologically validate our findings with cell density have been made. Thus, further studies are needed to better investigate correlation of parameters derived from non-Gaussian DWI models with histopathological markers. Signal intensities differed systematically in the second DWI examination of the same tumor performed approximately 60   cause of the bias is a temperature drop of the tumor (not mouse core temperature) in the second repeated DWI examination. Nevertheless, it is beyond the scope of the current study to fully explore an effect of temperature and anesthesia (26) on DWI decay curve derived parameters.
In conclusion, we have evaluated four different mathematical models for DWI of PC-3 prostate cancer cell derived xenografts in mice. Significant changes in the fitted parameters were present during tumor progression potentially due to increased cell density in later stages. The "pseudodiffusion" component in the analyzed tumors was shown to be less than 10% of the bi-exponential model. Due to low repeatability of the bi-exponential model parameters derived from low and high b-values DWI data using independent least square fitting on a voxel level, a degree of caution should be applied if these parameters are used for cancer characterization and therapy response monitoring. On the other hand, monoexponential, stretched exponential, and kurtosis models shown high information content and robust repeatability.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The animal study was reviewed and approved by Turku Center for Disease Modeling, University of Turku, Turku, Finland. avg median, averaged median values per mouse; 95% CI (%), 95% confidence interval expressed as percentage of the averaged median values per mouse (avg median); r% (%), coefficient of repeatability as percentage of the averaged median value per mouse (avg median); NA, not applicable.