Detection of Dominant Intra-prostatic Lesions in Patients With Prostate Cancer Using an Artificial Neural Network and MR Multi-modal Radiomics Analysis

Purpose: The aim of this study was to identify and rank discriminant radiomics features extracted from multi-modal MR images in order to construct an adaptive model that distinguishes dominant intra-prostatic lesions (DILs) from normal prostatic glandular tissue (NT). Methods and Materials: Two cohorts were studied retrospectively: Group A consisted of 98 patients and Group B of 19 patients. Two image modalities were acquired on a 3.0T MR scanner: axial T2-weighted (T2W) and axial diffusion-weighted (DW) imaging. Apparent diffusion coefficient (ADC) maps were constructed from the DW images using linear regression. DILs and the NT in the mirrored location were contoured on each modality, and 168 radiomics features were extracted from each DIL and NT. Partial Least Squares Correlation (PLSC) combined with one-way ANOVA and bootstrap ratios was used to identify and rank the most discriminant latent variables. An artificial neural network (ANN) was constructed on the optimal latent-variable feature set to classify DILs and NTs. Nineteen patients were randomly chosen to test the effect of contour variability on the radiomics analysis and on ANN performance. Finally, the trained ANN was combined with a two-dimensional (2D) convolutional sampling method to estimate DIL-NT probability maps for two test cases. Results: Of the 168 radiomics-based latent variables, only the first four variables of each modality in the PLSC space differed significantly between DILs and NTs. The area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), and negative predictive value (NPV) for the conventional method were 94%, 0.95, and 0.92, respectively. When the feature vector was randomly permuted 10,000 times, a very strong permutation-invariant efficiency (p < 0.0001) was achieved.
The radiomics-based latent variables of the NTs and DILs showed no statistically significant differences under contour variability (F statistic < F_c = 4.11 at the 95% confidence level for all 8 variables). Dice coefficients between the DIL-NT probability maps and the physician contours for the two test cases were 0.82 and 0.71, respectively. Conclusion: This study demonstrates the high performance of combining radiomics information extracted from multi-modal MR images, such as T2WI and ADC maps, with adaptive models to detect DILs in patients with PCa.


INTRODUCTION
Radiation therapy (RT) has proven to be an effective treatment for prostate cancer (PCa) and is still considered one of the standard treatment options. Current practice is to treat the entire prostate with a homogeneous dose distribution (1,2). Dose-escalated conformal radiotherapy has shown an advantage in biochemical progression-free survival, but it is associated with increased acute and late toxicities (3). Simultaneous dose escalation to the dominant intra-prostatic lesions (DILs), while maintaining acceptable doses to the whole prostate gland, has the potential to improve the therapeutic ratio for prostate cancer patients. A median dose to the entire gland could prevent disease recurrence from satellite tumors in the prostate while substantially reducing the side effects associated with escalating the dose to the entire gland. A boost dose to the DIL, the main determinant of tumor progression and prognosis, can maintain the effectiveness of focal therapy. For this strategy to succeed, a key requirement is the ability to identify clinically significant tumors in the prostate gland accurately and reliably.
Among imaging techniques, magnetic resonance imaging (MRI) is increasingly used and provides clinicians and researchers with useful information for delineating the prostate gland and clinically significant tumors in PCa patients (1,2,4). Although multi-parametric (MP) MRI is well established (5,6) for lesion detection and disease staging, its sensitivity for small and lower-grade lesions, as well as for sparse tumors, has been low (7), and MP-MRI has failed to improve the detection accuracy of lesions in the central gland (8). Furthermore, accurate automatic delineation of DILs from prostate glandular tissue, which is not yet common practice, remains a challenge. Radiomics analysis, defined as the post-processing of medical images for high-throughput extraction of textural and intensity-based information, can play a central role in identifying biomarkers for the diagnosis and/or therapy of patients with cancer (9,10).
This study aims to identify discriminant radiomics features both in the original radiomics-feature space and in the latent-variable space constructed from those features by Partial Least Squares Correlation (PLSC), in order to build an adaptive model that classifies DILs and NTs. The discriminant feature set in the PLSC latent-variable space can also be used for intra-tumoral segmentation and treatment response evaluation.

METHODS AND MATERIALS
Patient Population and Pre-processing
A total of 117 patients in the following two groups were studied. Group A: This group consisted of 98 PCa patients collected at Radboud University Nijmegen Medical Centre (11) and evaluated with computer-aided diagnosis (CAD) (12,13). Each MR study was read and reported by, or under the supervision of, an expert radiologist (Barentsz) with more than 20 years of experience in prostate MR. The radiologist indicated areas of suspicion with a point marker and assigned a score per modality.
If an area was considered likely to be cancerous, a biopsy was performed. All biopsies were performed under MR guidance, and confirmation scans with the biopsy needle in situ were acquired to verify accurate localization. Biopsy specimens were subsequently graded by a pathologist, and the results were used as ground truth. Gleason grade groups for these patients are listed in Table 1 (Group A).
All MR studies included T2-weighted (T2W) and diffusion-weighted (DW) imaging. The images were acquired on two […]. The voxel-wise apparent diffusion coefficient (ADC) map was constructed from two DW images acquired at two different b-values. A large-field-of-view transverse T2W sequence was also acquired to assess the pelvic bones and lymph nodes. Image registration and lesion contouring were performed with in-house software.
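The ADC computation can be sketched as a voxel-wise log-linear regression under the mono-exponential diffusion model ln S(b) = ln S0 - b·ADC. This is a minimal NumPy sketch, not the authors' MATLAB code; the function name `adc_map` and the b-values used in the check (0 and 800 s/mm²) are illustrative assumptions, since the acquisition b-values are not given here. With exactly two b-values the least-squares slope reduces to ADC = ln(S1/S2)/(b2 - b1).

```python
import numpy as np


def adc_map(dwi, b_values, eps=1e-6):
    """Voxel-wise ADC from a log-linear least-squares fit of ln S(b) = ln S0 - b * ADC.

    dwi      : array of shape (n_b, rows, cols), one DW image per b-value
    b_values : sequence of n_b b-values (s/mm^2)
    returns  : ADC map (mm^2/s) with shape (rows, cols)
    """
    b = np.asarray(b_values, dtype=float)
    # Clip to a small positive value so the logarithm is defined in background voxels.
    log_s = np.log(np.maximum(np.asarray(dwi, dtype=float), eps))
    b_centred = (b - b.mean()).reshape((-1,) + (1,) * (log_s.ndim - 1))
    # Ordinary least-squares slope of ln S versus b, computed for every voxel at once.
    slope = (b_centred * (log_s - log_s.mean(axis=0))).sum(axis=0) / (b_centred ** 2).sum()
    return -slope
```

Passing more than two b-values to the same function gives the general regression fit, which is why the sketch uses the least-squares slope rather than the two-point ratio.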

Data Contouring and Harmonization
For each patient in group B, a radiologist with over 20 years of experience evaluated the axial T2WI and ADC maps and delineated the DIL using the following criterion: a well-circumscribed, hypo-intense area with the highest Gleason score in the prostate on both T2WI and the ADC map. The DIL and the equivalent region in the contralateral gland (normal prostatic glandular tissue, NT) were contoured separately on the axial T2WI and on the ADC map. To harmonize the data and make them independent of MR scanner gain (which can affect weighted images), the signal intensity of each patient's DIL, in both groups, was normalized to the mean value of the corresponding normal volume prior to radiomics analysis.
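The harmonization step can be sketched as a per-patient ratio normalization. This sketch assumes that "normalized to the mean value of the normal volume" means dividing intensities by the NT mean, and it applies the same factor to the NT voxels (whose mean then becomes 1); the function name `normalize_to_nt` is hypothetical. The check shows why this removes a global scanner gain: multiplying the whole image by a constant leaves the normalized values unchanged.

```python
import numpy as np


def normalize_to_nt(image, dil_mask, nt_mask):
    """Express voxel intensities relative to the mean of the patient's NT region.

    image    : 2D or 3D array (T2W image or ADC map)
    dil_mask : boolean array marking the DIL contour
    nt_mask  : boolean array marking the mirrored normal-tissue contour
    Returns the normalized DIL and NT voxel values.
    """
    image = np.asarray(image, dtype=float)
    nt_mean = image[nt_mask].mean()
    if nt_mean == 0:
        raise ValueError("normal-tissue ROI has zero mean intensity")
    normalized = image / nt_mean
    return normalized[dil_mask], normalized[nt_mask]
```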

Radiomics Analysis
All data processing was performed off-line using a commercial software package (MATLAB 2016a, The MathWorks Inc., Natick, MA, 2000). For each patient, 168 radiomics features (15) from eight different categories were extracted from the DIL and NT volumes contoured on the ADC maps and T2W images. The eight feature categories (15) are detailed below and in Table 2.

Feature Selection and Statistical Analysis
A Partial Least Squares Correlation (PLSC) (16) technique combined with one-way analysis of variance (ANOVA) was used to identify the most discriminant PLSC latent variables constructed from the radiomics features extracted from NTs and DILs on multi-modal MR images (T2WI and ADC map). PLSC, also called projection to latent structures, relates the information in two data tables, here the two MR modalities, that collect measurements on the same set of observations (16,17). The goal of PLSC is to find pairs of latent vectors with maximal covariance, under the additional constraints that latent vectors with different indices are uncorrelated and that the coefficients used to compute the latent variables are normalized. As shown in Figure 1, two observation matrices were constructed from the 168 radiomics features extracted from the two image modalities (T2WI and ADC) for all patients. Singular value decomposition (SVD) was used to analyze the common and discriminant information between the two observation matrices. For each MR modality, the latent vectors computed by SVD were tested with ANOVA (assuming homoscedasticity, at a confidence level of 0.95) to identify the latent variables that best discriminated between features extracted from DIL and NT volumes in both groups. The Holm-Bonferroni method (18) was used to correct the p-values for multiple comparisons. This step-down adjustment controls the family-wise error rate and is uniformly more powerful than the classic Bonferroni correction (18). From the discriminant latent variables identified by ANOVA, an optimal feature set for both modalities was constructed.
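The PLSC-ANOVA pipeline can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' code: it assumes rows are observations (one row per DIL or NT) and that columns are standardized before forming the cross-product matrix R = Zyᵀ Zx, whose SVD gives the saliences. The function names are hypothetical. Converting each F statistic to a p-value needs the F distribution, for example `scipy.stats.f.sf(F, k - 1, n - k)`, before the Holm step is applied.

```python
import numpy as np


def standardize(A):
    """Centre each column and scale it to unit variance (constant columns are only centred)."""
    A = np.asarray(A, dtype=float)
    sd = A.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (A - A.mean(axis=0)) / sd


def plsc(X, Y):
    """PLSC between two tables measured on the same rows (e.g. T2WI and ADC features).

    Returns the latent variables of X and Y (one column per dimension) and the singular values.
    """
    Zx, Zy = standardize(X), standardize(Y)
    R = Zy.T @ Zx                      # cross-product matrix between the two tables
    U, delta, Vt = np.linalg.svd(R, full_matrices=False)
    Lx = Zx @ Vt.T                     # latent variables of X (saliences = rows of Vt)
    Ly = Zy @ U                        # latent variables of Y (saliences = columns of U)
    return Lx, Ly, delta


def one_way_anova_f(values, labels):
    """One-way ANOVA F statistic (equal-variance assumption) for one latent variable."""
    values, labels = np.asarray(values, dtype=float), np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    k, n, grand = len(groups), values.size, values.mean()
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure; returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] > alpha / (m - rank):
            break                      # stop at the first non-significant ordered p-value
        reject[idx] = True
    return reject
```

A useful property to check is that Lxᵀ Ly is diagonal, with the singular values on the diagonal. This is the "uncorrelated across indices" constraint described above.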

Feature Ranking Using Bootstrapping Ratio Technique
A bootstrap ratio analysis (16,19,20) and a permutation test (10,000 random repetitions) were performed on the latent vectors of the feature sets (extracted from T2WI and ADC). The SVD was recomputed for each configuration, and the resulting distribution of eigenvalues was used to estimate the ranking of the radiomics features and their efficiency against random permutation. For radiomics feature ranking, bootstrap ratios were computed by dividing the mean of the bootstrapped distribution of a significant latent variable by its standard deviation. The bootstrap ratio is akin to a Student t statistic, so if a ratio is large enough (> 2.00, which roughly corresponds to a 95% confidence level for a t-test), the variable is considered significant/important for that dimension. The bootstrap estimates the sampling distribution of a statistic by computing multiple instances of the statistic from bootstrapped samples, obtained by sampling with replacement from the original sample (16,19,20).
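Both resampling procedures can be sketched as follows, assuming the standard PLSC formulation in which bootstrap samples are projected onto the fixed singular vectors of the original solution. That projection avoids the sign and axis flips of recomputed SVDs. The inertia is the sum of squared singular values, which equals the squared Frobenius norm of the cross-product matrix. The function names, `n_boot`, and the seed are assumptions. Resampled rows are not re-standardized in this sketch.

```python
import numpy as np


def _standardize(A):
    A = np.asarray(A, dtype=float)
    sd = A.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (A - A.mean(axis=0)) / sd


def bootstrap_ratios(X, Y, dim=0, n_boot=1000, seed=0):
    """Bootstrap ratio (mean / SD) of the X-table saliences on one PLSC dimension."""
    rng = np.random.default_rng(seed)
    Zx, Zy = _standardize(X), _standardize(Y)
    U, delta, Vt = np.linalg.svd(Zy.T @ Zx, full_matrices=False)
    n = Zx.shape[0]
    samples = np.empty((n_boot, Zx.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample observations with replacement
        R_b = Zy[idx].T @ Zx[idx]
        # Project onto the fixed singular vector so every sample shares one orientation.
        samples[i] = R_b.T @ U[:, dim] / delta[dim]
    return samples.mean(axis=0) / samples.std(axis=0, ddof=1)


def permutation_test_inertia(X, Y, n_perm=10_000, seed=0):
    """Observed PLSC inertia and its permutation p-value (rows of Y shuffled)."""
    rng = np.random.default_rng(seed)
    Zx, Zy = _standardize(X), _standardize(Y)
    observed = np.sum((Zy.T @ Zx) ** 2)           # = sum of squared singular values
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(Zy.shape[0])
        if np.sum((Zy[perm].T @ Zx) ** 2) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Features with |ratio| > 2 would be flagged as important for the dimension, matching the threshold stated above. With 10,000 permutations, the smallest attainable p-value under this (+1)/(n+1) convention is about 0.0001.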

DOST 18 features
The two-dimensional matrix of discrete orthonormal Stockwell transform (DOST) coefficients was divided into nine equal segments. The energy and entropy of each segment were averaged over the tumor volume, yielding eighteen features (nine energies and nine entropies) that were used as the DOST radiomics features.
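The segment statistics can be sketched as below. The DOST itself is not implemented here: `coeff_slices` is assumed to hold one precomputed 2D DOST coefficient matrix per tumor slice. Entropy is assumed to be the Shannon entropy (in bits) of the normalized squared coefficient magnitudes within each segment. When a dimension is not divisible by three, `np.array_split` produces near-equal segments. The function names are hypothetical.

```python
import numpy as np


def segment_energy_entropy(coeffs, n_side=3):
    """Energy and Shannon entropy of each block of an n_side x n_side partition of a 2D matrix."""
    mag2 = np.abs(np.asarray(coeffs)) ** 2          # squared magnitude (coefficients may be complex)
    row_blocks = np.array_split(np.arange(mag2.shape[0]), n_side)
    col_blocks = np.array_split(np.arange(mag2.shape[1]), n_side)
    energies, entropies = [], []
    for rows in row_blocks:
        for cols in col_blocks:
            block = mag2[np.ix_(rows, cols)]
            energy = block.sum()
            p = block.ravel() / energy if energy > 0 else np.zeros(block.size)
            p = p[p > 0]
            energies.append(energy)
            entropies.append(-np.sum(p * np.log2(p)))
    return np.array(energies), np.array(entropies)


def dost_features(coeff_slices):
    """Average the 9 energies and 9 entropies over all tumor slices -> 18 features."""
    per_slice = [np.concatenate(segment_energy_entropy(c)) for c in coeff_slices]
    return np.mean(per_slice, axis=0)
```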

LBP 6 features
A local binary pattern (LBP) algorithm with a radial filter (eight-neighborhood) was used to generate a two-dimensional LBP map. The energy, entropy, mean, standard deviation, skewness, and kurtosis of the LBP maps were used as the six LBP radiomics features.
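A basic LBP computation can be sketched as follows. Several choices here are assumptions, not details from the source. The radial filter is approximated by the 3×3 square neighborhood without interpolation. Energy and entropy are computed from the normalized 256-bin histogram of LBP codes. Skewness and kurtosis are the (non-excess) moments of the code values. The function names are hypothetical.

```python
import numpy as np

# Eight neighbours at radius 1, visited clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp8(image):
    """Basic 8-neighbour LBP code (0-255) for every interior pixel of a 2D image."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_features(image):
    """Six LBP statistics: histogram energy and entropy, plus moments of the code values."""
    codes = lbp8(image).ravel()
    hist = np.bincount(codes, minlength=256) / codes.size
    nz = hist[hist > 0]
    values = codes.astype(float)
    mean, std = values.mean(), values.std()
    z = (values - mean) / std if std > 0 else np.zeros_like(values)
    return {
        "energy": float(np.sum(hist ** 2)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
    }
```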

2DWT 48 features
A two-dimensional wavelet transform (2DWT) with six decomposition levels and four information attributes (multi-resolution approximation image, vertical, horizontal, and diagonal details) was used to generate 24 2DWT information maps. The energy and entropy of these maps were calculated and used as the 48 2DWT radiomics features.
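The feature count (6 levels × 4 maps × 2 statistics = 48) can be made concrete with a self-contained sketch. The wavelet family is not stated, so an orthonormal Haar wavelet is assumed. The "multi-resolution image" is taken to be the approximation sub-band at each level. Six levels require an input of at least 64 × 64 pixels. A library implementation such as PyWavelets' `pywt.wavedec2` could replace the hand-written transform. The function names are hypothetical.

```python
import numpy as np


def haar2d(a):
    """One level of the orthonormal 2D Haar transform (even-sized crop)."""
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    tl, tr = a[0::2, 0::2], a[0::2, 1::2]
    bl, br = a[1::2, 0::2], a[1::2, 1::2]
    approx = (tl + tr + bl + br) / 2.0
    horizontal = (tl + tr - bl - br) / 2.0   # responds to horizontal edges
    vertical = (tl - tr + bl - br) / 2.0     # responds to vertical edges
    diagonal = (tl - tr - bl + br) / 2.0
    return approx, horizontal, vertical, diagonal


def map_entropy(m):
    """Shannon entropy (bits) of the normalized squared coefficients of one map."""
    e = np.sum(m ** 2)
    if e == 0:
        return 0.0
    p = (m ** 2).ravel() / e
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def wavelet_features(image, levels=6):
    """Energy and entropy of 4 maps per level -> 8 * levels features (48 for 6 levels)."""
    current = np.asarray(image, dtype=float)
    feats = []
    for _ in range(levels):
        if min(current.shape) < 2:
            raise ValueError("image too small for the requested number of levels")
        maps = haar2d(current)
        for m in maps:
            feats.extend([float(np.sum(m ** 2)), map_entropy(m)])
        current = maps[0]   # decompose the approximation again at the next level
    return np.array(feats)
```

Because the transform is orthonormal, the four level-1 energies sum to the energy of the input image. This gives a quick sanity check on the implementation.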

Artificial Neural Networks: Architecture Optimization, Training, and Validation Strategies
Eight latent variables constructed from the radiomics information were identified as the optimal feature set and used as the input to an artificial neural network (ANN) with a feed-forward multilayer perceptron (MLP) architecture and a back-propagation training algorithm (21) for the classification of DILs and NTs. In this type of ANN, the nodes are organized in multiple layers; the ANN used in our study had three layers: an input layer, a single intermediate (hidden) layer, and an output layer (21,22). Nodes were interconnected by weights so that information propagates from one layer to the next through a bipolar sigmoid activation function (22). The learning rate and momentum factor, which control the inter-node weight adjustments during training, were both set to 0.01. A back-propagation learning strategy (21) was used to train the ANN in supervised mode. In this strategy, a trial set of weights (one weight vector for each layer of the ANN) is proposed. The initial weights were assigned randomly, and the same set of initial weights was saved and reused for every trial of the leave-one-out method. The weight vectors were then adjusted to minimize a measure of error (here, the mean squared error, MSE) between the output of the ANN and the training targets. This procedure was performed iteratively across the entire data set in batch mode to improve the convergence rate and the stability of training: the weight changes obtained from each training case were accumulated, and the weights were updated only after the entire set of training cases had been evaluated. Batch processing improves stability, but with a trade-off in convergence speed (21)(22)(23).
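An 8:5:1 network trained with batch back-propagation with momentum can be sketched in NumPy. The bipolar sigmoid f(x) = 2/(1 + e^(-x)) - 1 is used in both layers. Its derivative is 0.5·(1 + f)(1 - f). Several details are assumptions: targets are coded as +1 (DIL) and -1 (NT), initial weights are drawn uniformly from [-0.5, 0.5], and the class name `MLP` is hypothetical. A fixed `seed` reproduces the same initial weights for every leave-one-out trial, as described above.

```python
import numpy as np


def bipolar_sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0


class MLP:
    """8:5:1 feed-forward network trained by batch back-propagation with momentum."""

    def __init__(self, n_in=8, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)          # same seed -> same initial weights
        self.W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-0.5, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.velocity = [np.zeros_like(p) for p in self.params()]

    def params(self):
        return [self.W1, self.b1, self.W2, self.b2]

    def forward(self, X):
        self.h = bipolar_sigmoid(X @ self.W1 + self.b1)
        self.out = bipolar_sigmoid(self.h @ self.W2 + self.b2)
        return self.out.ravel()

    def train_epoch(self, X, t, lr=0.01, momentum=0.01):
        """One batch update: accumulate MSE gradients over all cases, then take a single step."""
        y = self.forward(X)
        n = X.shape[0]
        err = (y - t).reshape(-1, 1)
        # dMSE/dy = 2 (y - t) / n, times the bipolar-sigmoid derivative 0.5 (1 + f)(1 - f)
        d_out = err * (2.0 / n) * 0.5 * (1 + self.out) * (1 - self.out)
        d_hidden = (d_out @ self.W2.T) * 0.5 * (1 + self.h) * (1 - self.h)
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0), self.h.T @ d_out, d_out.sum(axis=0)]
        for p, v, g in zip(self.params(), self.velocity, grads):
            v *= momentum        # keep a fraction of the previous weight change
            v -= lr * g          # add the new gradient step
            p += v               # update the weights in place
        return float(np.mean((y - t) ** 2))
```

Because the update happens once per pass over all training cases, `train_epoch` corresponds to one training epoch in the batch mode described above. The returned MSE is measured before that epoch's update.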
Two different training and validation strategies were used and tested, as follows. Strategy 1: Leave-one-out cross-validation (LOOCV), a special case of leave-p-out cross-validation (an exhaustive method), was used for training, testing, and optimization of the ANN architecture (21,22,24-26). LOOCV was used to find the optimal structure and termination error and to validate the ANN. As shown in Figure 2, this approach leaves one data point out of the training data: if there are N data points in the original sample, N-1 are used to train the model and the remaining one is used for validation. This is repeated for every way in which the original sample can be separated like this, and the error is averaged over all trials to estimate overall performance with low bias (27). LOOCV is generally preferred over leave-p-out cross-validation when the sample size is small because it does not suffer from intensive computation: the number of possible combinations equals the number of data points in the original sample, N (28). Finally, to evaluate the stability of the optimal ANN with respect to the number of training epochs, a series of ROC curves was generated by applying a threshold to the output of the ANN trained randomly 100 times. Then the optimal cut-point, the point closest to the top-left corner of the ROC plane, was calculated.
FIGURE 1 | The flowchart demonstrates the steps for extracting radiomics features from T2W images and ADC maps for DILs and normal tissues. For each MR modality, 168 radiomics features are extracted from normal and DIL volumes. The optimal feature set for the two MR modalities is identified using ANOVA applied to the latent variables generated by the PLSC technique for features with a Silhouette coefficient of 0.5 or greater.
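The Discussion notes that each LOOCV fold held out one patient's DIL and NT together. That grouping can be sketched as leave-one-patient-out splitting. In this sketch, `fit` and `predict` are hypothetical caller-supplied callables (for example, wrappers around an MLP trainer), and the squared-error average stands in for whatever error measure is tracked.

```python
import numpy as np


def leave_one_patient_out(patient_ids):
    """Yield (patient, train_idx, test_idx); each fold holds out every sample of one patient."""
    ids = np.asarray(patient_ids)
    for pid in np.unique(ids):
        yield pid, np.flatnonzero(ids != pid), np.flatnonzero(ids == pid)


def loocv_error(X, y, patient_ids, fit, predict):
    """Average squared validation error over all leave-one-patient-out folds.

    fit(X_train, y_train) -> model and predict(model, X_test) -> scores are
    hypothetical caller-supplied callables.
    """
    errors = []
    for _, train, test in leave_one_patient_out(patient_ids):
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))
```

Grouping by patient keeps a patient's NT out of training while their DIL is being validated. This prevents patient-specific intensity information from leaking between the two.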
The optimal cut-point is defined as the point minimizing the Euclidean distance between the ROC curve and the point (0, 1) (29). As sensitivity (the true-positive rate) increases, the ANN identifies more cases with DIL, while accuracy in identifying NTs (specificity) is sacrificed. A cut-point dichotomizes the test values and thereby provides the classification (DIL or not). Sensitivity and specificity are assessed simultaneously to estimate the cut-point, which is considered optimal when it classifies most individuals correctly (29,30).
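The closest-to-(0, 1) rule can be sketched directly. Every distinct output score is tried as a threshold, the false-positive and true-positive rates are computed, and the threshold minimizing √(FPR² + (1 - TPR)²) is kept. The function names are hypothetical, and labels are assumed to be 1 for DIL and 0 for NT.

```python
import numpy as np


def roc_points(scores, labels):
    """(threshold, FPR, TPR) for every distinct score, thresholds in descending order."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)[::-1]
    tpr = np.array([np.mean(scores[labels] >= t) for t in thresholds])
    fpr = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    return thresholds, fpr, tpr


def optimal_cut_point(scores, labels):
    """Threshold whose ROC point is closest (Euclidean) to the ideal corner (0, 1)."""
    thresholds, fpr, tpr = roc_points(scores, labels)
    dist = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)
    best = int(np.argmin(dist))
    return thresholds[best], fpr[best], tpr[best]
```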
To measure how accurately the ANN's outputs matched the reference labels across the whole input dataset, the ANN's correct classification fraction (CCF: true positives plus true negatives, TP+TN) curve was generated at different numbers of epochs during the LOOCV procedure. The area under the receiver operating characteristic curve (AUROC, Az-value) (21,22,24,25), an index of predictive performance, was used to compare ANN architectures, to determine the optimal one, and to find the termination error for training the optimal ANN (to avoid overfitting). Strategy 2: For each discriminant latent variable, the data of patient group A (98 patients) were split 100 times into training and validation components. In each split, two-thirds (67%) of the dataset was randomly sampled and used as the training set, and the remaining one-third (33%) was used as the unseen validation dataset (31). Using the training and validation sets of each of the 100 iterations, the ANN was trained and validated separately for each discriminant latent variable. The same procedure was repeated for the set of eight latent variables. The AUROC, positive predictive value (PPV), and negative predictive value (NPV) were computed for each trial and averaged to evaluate ANN classification performance for each discriminant latent variable and for the set of eight latent variables.
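The per-trial metrics of Strategy 2 can be sketched as follows. AUROC is computed via the Mann-Whitney formulation: it is the probability that a randomly chosen DIL scores higher than a randomly chosen NT, with ties counting one half. PPV and NPV are computed from thresholded predictions. The random 2/3 to 1/3 splits are assumed to be drawn at the patient level. The function names and the seed are hypothetical.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC as P(score_DIL > score_NT) + 0.5 * P(tie) over all DIL-NT pairs."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def ppv_npv(predicted, labels):
    """Positive and negative predictive values from binary predictions (NaN if undefined)."""
    pred = np.asarray(predicted, dtype=bool)
    y = np.asarray(labels, dtype=bool)
    tp, fp = np.sum(pred & y), np.sum(pred & ~y)
    tn, fn = np.sum(~pred & ~y), np.sum(~pred & y)
    ppv = float(tp) / (tp + fp) if tp + fp else float("nan")
    npv = float(tn) / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


def random_splits(n_patients, n_iter=100, train_frac=2 / 3, seed=0):
    """Yield (train_idx, validation_idx) for n_iter random patient-level splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_patients))
    for _ in range(n_iter):
        perm = rng.permutation(n_patients)
        yield perm[:n_train], perm[n_train:]
```

Averaging `auroc` and `ppv_npv` over the 100 splits gives the per-variable summaries described above.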
All data processing and classifier implementation were performed using a series of in-house codes developed in the MATLAB environment.

Testing of Data Harmonization, Feature Consistency, and Generalization Error
Data harmonization refers to all efforts to combine datasets collected by different scanners at different institutions. To test the consistency of the identified discriminant latent variables under data harmonization, and to test the performance of the classifier on prospective/unseen datasets (ANN generalization error), the following sub-analysis was conducted. An ANN was trained using the eight discriminant latent variables (constructed from radiomics information) extracted from the patients of group A. The trained ANN was then applied to the eight discriminant latent variables extracted from the patients of group B (the test set, or unseen patient cohort). Finally, a ROC analysis was performed on the predictions of the trained ANN, and the AUROC, NPV, and PPV for the unseen testing cohort (group B) were calculated.

Contour Variability Test
Nineteen of the 117 patients were randomly chosen, and their DIL and NT contours were modified by scaling by a factor of 1.2 in all directions, followed by a 1-voxel shift in all directions. The modified contours were used to repeat the radiomics and PLSC analyses, and ANOVA was used to test the sensitivity of the latent variables to contour variability.
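The contour perturbation and the per-variable F-test can be sketched in 2D. Contours are represented as polygon vertices scaled about their vertex mean. The "1 voxel shift in all directions" is interpreted here as a (+1, +1) translation; this is one possible reading and an assumption. The reported F_c = 4.11 appears consistent with an F(1, 36) critical value at α = 0.05, i.e., two groups of 19 values. The function names are hypothetical.

```python
import numpy as np


def perturb_contour(vertices, scale=1.2, shift=(1.0, 1.0)):
    """Scale polygon vertices about their mean by `scale`, then translate by `shift` voxels."""
    v = np.asarray(vertices, dtype=float)
    centre = v.mean(axis=0)
    return centre + scale * (v - centre) + np.asarray(shift, dtype=float)


def anova_f_two_groups(a, b):
    """One-way ANOVA F statistic for two groups (e.g. original vs perturbed contours)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    grand = np.concatenate([a, b]).mean()
    ss_between = a.size * (a.mean() - grand) ** 2 + b.size * (b.mean() - grand) ** 2
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    # k = 2 groups: between-groups df = 1, within-groups df = n_a + n_b - 2
    return ss_between / (ss_within / (a.size + b.size - 2))
```

A latent variable would be judged insensitive to contour variability when its F statistic stays below the critical value.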

Tumor Probability Map
The trained ANN and a two-dimensional (2D) convolutional sampling method (window size = 25 × 25) were combined to estimate DIL-NT probability maps for two test cases. Dice coefficients between the DIL contours and the DIL patches estimated from the probability maps (P_thr > 0.001) were calculated and compared for the two cases.
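The sampling step can be sketched as a sliding 25 × 25 window with stride 1, storing the classifier output at each window center. Here `classify_patch` is a hypothetical callable that would compute the radiomics features of a patch, project them onto the PLSC latent variables, and apply the trained ANN. Border pixels without a full window are left as NaN. The Dice coefficient is 2|A ∩ B| / (|A| + |B|); for example, `dice(prob > 0.001, dil_mask)` would mirror the threshold above.

```python
import numpy as np


def probability_map(image, classify_patch, window=25, step=1):
    """Slide a window x window patch over a 2D image; store the classifier output at its centre."""
    img = np.asarray(image, dtype=float)
    half = window // 2
    prob = np.full(img.shape, np.nan)                 # NaN where no full window fits
    for r in range(half, img.shape[0] - half, step):
        for c in range(half, img.shape[1] - half, step):
            patch = img[r - half:r + half + 1, c - half:c + half + 1]
            prob[r, c] = classify_patch(patch)
    return prob


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```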

RESULTS
A flowchart demonstrating the steps for extracting radiomics features from T2W images and ADC maps for DILs and NTs is shown in Figure 1. For each MR modality, 168 radiomics features were extracted from each of the NTs and DILs, and the optimal discriminant latent feature set for the two MR modalities was identified using PLSC and ANOVA. Table 3 shows the feature ranking, based on the PLSC and bootstrap ratio techniques, for the first 10 significant radiomics features of the two MR modalities. Figures 3A,B show scatter plots of the first three PLSC latent variables for T2WI and ADC, respectively. Figures 3C,D show the permutation tests for the inertia explained by the PLSC of the T2WI and ADC map, along with the observed inertia, for 10,000 permutations. Figure 4A shows the correct classification fraction (CCF = TP + TN) of the optimal ANN at different training epochs for the LOOCV technique. The epoch corresponding to a 10% change in the plateau for the optimum architecture (8:5:1) was used as the stopping epoch (epoch = 17) of the ANN. Figure 4B shows the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) of the optimal ANN at different training epochs.
The AUCCF values for different ANN structures for the LOOCV technique are shown in Figure 4C. The ANN with five neurons in its only hidden layer shows the highest performance (Az = 0.95) and was chosen as the ANN with optimal structure. Figure 4D shows the average AUROC of the ANN trained randomly 100 times, along with the optimal cut-point (OCP = 0.96). Given the average AUROC (Az-test ≈ 0.96), the optimal cut-point of the ANN, and the eigenvalue distributions for the randomly permuted (10,000 permutations) radiomics features, the generalization error of the ANN was about 4%, with a very strong permutation-invariant efficiency (p < 0.0001) against the order of the latent variables.
AUROC, PPV, and NPV for the conventional method were 94%, 0.95, and 0.92, respectively. ROC analyses for the eight individual latent variables (4 for T2WI and 4 for ADC) are shown in Figure 5. Figures 5A-D show the ROC analyses of the ANN for the first 4 latent variables constructed from T2WI over 100 random iterations, each corresponding to a different division of group A into training and validation data, while Figures 5E-H show the corresponding results for the ADC map. Table 4 lists the AUROC, NPV, and PPV values, with their confidence intervals, for each individual latent variable over the 100 iterations.
As shown in Figure 5I, for the conventional training and validation method, the average AUROC, PPV, and NPV were 95%, 0.96, and 0.93, respectively. Figure 5J shows the response of the ANN trained on group A when applied to group B; on this unseen cohort, its sensitivity/specificity was 0.95/0.94. The radiomics-based latent variables of the NTs and DILs showed no statistically significant differences under contour variability (the F statistic for all 8 latent variables was smaller than F critical = 4.11 at the 95% confidence level). Figures 6A-F illustrate the T2WI, ADC map, and lesion probability map for a slice of the prostate gland of two different patients, estimated by the trained ANN using a 2D convolutional sampling method (window size = 25 × 25). Dice coefficients between the DIL-NT probability maps and the physician contours for the two test cases were 0.82 and 0.71, respectively.

DISCUSSION
Recent studies have shown that cancerous tissues are spatially heterogeneous due to factors such as cell structure, genes, protein content, cell morphology, tumor microenvironment, and physiology (32). Indeed, the main purpose of radiomics is to extract additional information from medical images in the form of macroscopic and microscopic image-based features that have the potential to serve as surrogates for pathophysiological and radiological parameters, such as tumor heterogeneity, pathology, response to a given therapy, the arrangement and distribution of information in images, and structural patterns in digital images. In our study, given the variation and nature of radiomics features, we extracted multiscale features from the prostate gland to characterize normal prostatic tissue and tumor phenotypes on multi-modal MRI.
The PLSC technique used in this study made it possible to find the information shared between the two image modalities (T2WI and ADC). This approach is equivalent to a correlation problem (16,17,33) and provides descriptive features from multivariate information in the form of latent variables, which are optimal linear combinations of the variables extracted from the two image modalities. The partial least squares (PLS) method, which projects feature information onto latent structures, relates the information present in two data tables (modalities) that collect measurements on the same set of observations (16). PLSC latent variables constructed from the radiomics information extracted from DILs and NTs combine all radiomics features and can help reveal variations in descriptive features or discriminant parameters for classifying DILs versus NTs. An adaptive classifier (such as an ANN) can implicitly detect complex non-linear relationships between dependent and independent radiomics variables (here, the optimal feature set already identified in the latent-variable space) and their variations, model their non-linear changes, and detect possible interactions between the predictor variables.
FIGURE 3 | (A,B) Clusters of NTs and DILs for each latent variable are well separated with little diffusion. This confirms that the distribution of the identified latent variable (PLSC-ANOVA) in the feature space is well matched to its own cluster (less scattered) and poorly diffused into neighboring clusters for both MR modalities. (C,D) Results of the permutation tests for the inertia explained by the PLSC of T2WI and ADC map for 10,000 permutations. The observed values (vertical arrows) were never obtained in the 10,000 permutations for either modality. Therefore, PLSC extracted a significant amount of common variance between the two modalities (P < 0.0001).
As shown in Figures 3A,B, clusters of NTs and DILs for each latent variable are well separated, with few diffuse marginal points in the feature space. This confirms that the distribution of the identified latent variable (PLSC-ANOVA) in the PLSC space is well matched to its own cluster (less scattered) and poorly diffused into neighboring clusters. Figures 3C,D show the results of the permutation tests for the inertia explained by the PLSC of T2WI and ADC map for 10,000 permutations. The observed values (vertical arrows) were never obtained in the 10,000 permutations for either modality. Therefore, the PLSC technique successfully extracted a significant amount of common variance between the two modalities (p < 0.0001). Using PLSC and ANOVA in this study allowed a robust comparison of the correlation and descriptive power of different radiomics features extracted from the two MR modalities, while providing higher predictive accuracy and a much lower risk of the two feature sets affecting each other. A major limitation could be the sensitivity of this approach to the relative scaling of the descriptor variables, which was addressed by the standardization and harmonization steps prior to feature extraction.
Recent studies (34)(35)(36)(37)(38)(39) have shown that ADC measurements are affected by the user-selected repetition time (TR), especially when it is comparable to the relaxation time. The degree of TR dependence also depends on another parameter, the number of diffusion preparation pulses. By analogy with the TR dependence of ADC values, an echo time (TE) dependence can be expected; indeed, Wang et al. (39) found a modest correlation between TE and ADC values in the prostate. It has been shown that tissue-specific relaxation parameters such as T1 and T2, and imaging parameters such as TR and TE, affect the optimum b-value for different anatomies, tissues, and even lesion types within the same organ. Therefore, since ADC values can be strongly and non-linearly affected by MR imaging parameters (34)(35)(36)(37)(38)(39), normalization to the normal volume was performed in this study, as part of data harmonization, to suppress the effect of imaging parameters on ADC values. Such normalization made the ANN less sensitive to the MR imaging parameters for prospective patients who may be scanned with different scanners or imaging parameters.
As shown in Figure 5 and by the statistical measures in Table 4, for each modality, from left to right (Figures 5A-D or Figures 5E-H), the information content and discrimination power of the latent variables for DIL classification decrease, as expected, as the order of the latent variable increases. Table 4 and Figure 5 also strongly indicate that, compared with T2WI, the ADC map is more discriminative, with higher information content for classifying DILs and NTs.
Novel machine learning techniques for detecting prostate cancer, such as Bayesian approaches, support vector machines (SVMs) with polynomial, radial basis function (RBF), and Gaussian kernels, and decision trees, have been proposed by several research groups (40)(41)(42). Moreover, different feature extraction strategies have been proposed to improve DIL detection performance (40). ANNs have been applied to a variety of tasks, such as computer vision, speech recognition, machine translation, social network filtering, and medical diagnosis, and there have been numerous applications of ANNs in medical decision-making (26,43,44). ANNs have been shown to perform robustly with noisy or incomplete input patterns, to have high fault tolerance, and to generalize from the training data (26,43). The adaptive model constructed in this study can benefit from these properties and distinguishes DILs from NTs with almost uniform sensitivity at different levels of specificity (see Figures 4A,B, 5I). The stability of the predicted DILs and NTs in the probability maps (lesions appearing uniform rather than patchy; Figure 6) supports the robustness of the PLSC-ANN technique in extracting information from the two MR modalities. The ANN in this study was trained without any data augmentation. The results suggest that the trained ANN can also evaluate suspicious lesions in different zones of the prostate gland (PZ or TZ), regardless of Gleason score.
Our study also confirms that the most discriminant features are texture-based. The bootstrap feature ranking indicates that frequency- or arrangement-based features (LBP, GLRL, DOST, and 2DGF; see Table 3), which measure the arrangement or disorder of the information distribution within a region and are associated with the subtle, descriptive information content of the two image modalities, play a key role in discriminating DILs from NTs. We did not include morphological features such as volume, shape, solidity, convexity, and eccentricity, in order to eliminate any possible bias arising from manual contouring of DILs and NTs.
In this study, DIL and NT contours were drawn separately on each image modality. Although this could increase contour variability and hence the variation of the data, it had the advantage that the two image modalities (T2W images and ADC map) did not need to be co-registered prior to radiomics analysis and adaptive modeling, so the results were not affected by possible co-registration errors. DILs and NTs contoured on unregistered image modalities were used directly for training and testing the ANN. The two image modalities (T2WI and ADC map) were co-registered, using a rigid co-registration [affine transform (45)] method, only for the two test cases (see Figure 6), in order to predict DIL-NT probability maps with the trained ANN and the 2D convolutional sampling method.
Current major computer-aided diagnosis systems have reported AUROC values ranging from 0.77 to 0.89, with a focus on detecting lesions in the peripheral zone. Most image features that were effective, individually or in combination, in differentiating prostate cancer are volume-averaged quantities, such as the 10th percentile of the ADC and T2W signal-intensity skewness (46). Niaf et al. studied texture features extracted from MP-MRI in 30 fully annotated patients using four different feature selection and classification methods (47). They achieved a diagnostic performance of 0.89, but the study was limited to the peripheral zone, and performance was poorer when all features were used for classification because of overfitting.
In this study, despite using 117 subjects (two cohorts: 98 and 19) with two different training and validation strategies, several challenges remain. Compared with the number of radiomics features, the study is limited by the number of patients, which affects the selection of optimal features and might also render a predictive model susceptible to Type II errors. A larger sample size would also allow the construction of a more reliable ANN from which reliable and unequivocal conclusions could be drawn.
In this study, two different training and validation strategies were employed, and the strong agreement between their results confirmed the robustness of the identified features. In the first strategy, LOOCV allowed us to use a high fraction of the available data for training (1 − 1/K ≈ 0.99 for K = 117), while making use of all the data to estimate the generalization error or agreement. The cost is that the process can be lengthy, since the network must be trained and evaluated K times. According to the literature, K ≈ 10 is typically considered reasonable (48). In this study, K was set to 117 for the 117 patients (one case, with its DIL and NT, held out in each fold), and the ANN had a single output to predict the outcome. The selected radiomics features might be affected by the intensities, the contour size, and the contrast of the NT. Since the regions of interest were delineated manually, the accuracy and variability of the ROIs could affect the optimal feature selection and the training results.
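The subject-level LOOCV scheme described above can be sketched as follows. Each fold holds out one patient, with both the DIL and the NT sample of that patient kept out together, so K equals the number of patients and each model trains on 1 − 1/K of them. The `fit` and `predict` callables are hypothetical placeholders for ANN training and inference; this is a sketch of the fold logic, not the authors' implementation.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_ids): one fold per subject, so K = number of subjects."""
    for held_out in subject_ids:
        yield held_out, [s for s in subject_ids if s != held_out]


def loocv_predictions(features, fit, predict):
    """Run LOOCV at the subject level.

    features: dict subject -> (nt_vector, dil_vector); both samples of a subject
              are always held out together.
    fit(samples) -> model, where samples is a list of (vector, label), label 1 = DIL.
    predict(model, vector) -> score, e.g. P(DIL).
    Returns dict subject -> (nt_score, dil_score) from the fold that held it out.
    """
    scores = {}
    subjects = list(features)
    for held_out, train in leave_one_subject_out(subjects):
        # Index 0 of each tuple is the NT vector (label 0), index 1 the DIL vector (label 1).
        samples = [(features[s][label], label) for s in train for label in (0, 1)]
        model = fit(samples)
        nt_vec, dil_vec = features[held_out]
        scores[held_out] = (predict(model, nt_vec), predict(model, dil_vec))
    return scores
```

With 117 subjects this yields 117 folds of 116 training subjects (232 samples) each, which is why training must be repeated K times.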
The Az-test result for the average ROC analysis of the ANN is 1% higher than that of the optimal ANN (see Figures 3C,D). This difference stems from how the two tests are conducted: for the average AUROC, each NT or DIL from each subject is treated as a sample (234 samples in total), whereas in the ordinary Az-test for the optimal ANN, each subject's NT-DIL pair is treated as a sample (117 samples in total). The strong agreement between the statistical measures of the LOOCV and conventional methods, together with the high predictive power of the ANN trained on group A when applied to group B (as a prospective, unseen cohort), confirms the consistency and high information content of the discriminant features identified in this study.
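Why the sample definition changes the area under the curve can be shown with the Mann-Whitney form of the AUROC. The sketch contrasts a pooled measure, where every DIL score is compared with every NT score, against a per-subject measure, where each DIL is compared only with its own NT. The per-subject measure is an illustrative interpretation of a pair-based test and is not claimed to be the exact Az computation used in this study; all names are hypothetical.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUROC as P(random positive scores above random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def pooled_auc(pairs):
    """Sample-level: every DIL and every NT is its own sample (2 per subject)."""
    return auc_mann_whitney([dil for _, dil in pairs], [nt for nt, _ in pairs])


def paired_concordance(pairs):
    """Subject-level: one sample per (NT, DIL) pair; fraction where DIL > NT."""
    hits = sum(1.0 if dil > nt else 0.5 if dil == nt else 0.0 for nt, dil in pairs)
    return hits / len(pairs)
```

For the pairs `[(0.1, 0.9), (0.2, 0.8), (0.6, 0.5)]` the pooled AUROC is 8/9, while the per-subject concordance is 2/3: a misranked pair counts once per subject but only partially in the pooled comparison.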
The 2D-convolutional sampling results presented in Figure 6 imply that the trained ANN is capable of estimating DIL and normal tissue probabilities when the target contour (the 2D window) contains mixed radiomic information extracted from both DIL and normal tissue.
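The 2D-convolutional sampling idea can be sketched as a sliding window: a fixed-size patch is moved across the image, features are computed for each patch, and the classifier's output is stored at that position to form a probability map. In this sketch the stride of 1, the "valid"-only positions, the single mean-intensity feature, and the logistic stand-in for the trained ANN are all hypothetical simplifications; a real pipeline would compute the radiomics features on co-registered T2WI and ADC patches.

```python
import math


def probability_map(image, window, feature_fn, classifier):
    """Slide a window x window patch over a 2D image (stride 1, valid positions only).

    feature_fn(patch) -> features for the flattened patch;
    classifier(features) -> P(DIL).
    The output has (rows - window + 1) x (cols - window + 1) entries.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - window + 1):
        row_out = []
        for c in range(cols - window + 1):
            patch = [image[r + i][c + j] for i in range(window) for j in range(window)]
            row_out.append(classifier(feature_fn(patch)))
        out.append(row_out)
    return out


def mean_feature(patch):
    """Hypothetical stand-in for radiomic features: the patch mean only."""
    return sum(patch) / len(patch)


def logistic_classifier(mean_value, center=5.0, scale=1.0):
    """Hypothetical stand-in for the trained ANN: a logistic curve on the mean."""
    return 1.0 / (1.0 + math.exp(-(mean_value - center) / scale))
```

On a 4x4 image whose left half is 0 and right half is 10, a 2x2 window gives probabilities near 0 over the left region, exactly 0.5 where the window straddles the boundary, and near 1 over the right region, mirroring how a window with mixed DIL and NT content receives an intermediate probability.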
An ANN was implemented as the classifier because it has high tolerance to variation in the input feature components and contours (according to the contour variability test results) and is less sensitive to random noise (49), which allows the construction of a variation- and noise-insensitive adaptive classifier with higher accuracy and speed. Most importantly, an ANN captures non-linear relationships among input data that cannot always be recognized by conventional analyses. The permutation test results also imply that the discriminant features used for training are reliable and efficient for classification.
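A permutation test of the kind referred to above can be sketched by shuffling the class labels, recomputing a separation statistic, and counting how often the shuffled statistic matches or exceeds the observed one. Permuting labels (rather than the feature vector itself) and using the difference of class means as the statistic are assumptions made for this illustration; the function names are hypothetical.

```python
import random


def permutation_p_value(values, labels, statistic, n_perm=10_000, seed=0):
    """One-sided permutation test: shuffle labels and recompute the statistic.

    p = (number of permuted statistics >= observed, plus 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


def mean_difference(values, labels):
    """Mean score of DIL samples (label 1) minus mean score of NT samples (label 0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)
```

Under this (count + 1)/(n_perm + 1) convention, 10,000 permutations give a smallest attainable p-value of 1/10,001; well-separated classes approach that floor, whereas labels unrelated to the scores give a large p-value.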

CONCLUSION
In conclusion, this study demonstrates the high performance of combining radiomics analysis, the PLSC technique, and an adaptive model for extracting and ranking features from multimodal MR information, such as T2WI and ADC maps, to discriminate DILs from NTs in patients with PCa. The radiomics information of the ADC modality was shown to have higher discrimination power than the corresponding features extracted from the T2WI modality. The results suggest that integrating quantitative image analysis methods, such as radiomics analysis and the PLSC technique, with an adaptive model can help identify imaging biomarkers and shows great potential to help clinicians improve the classification of clinically significant prostate lesions for prostate cancer therapy.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Internal Review Board at Henry Ford Health System. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
HB-E and NW designed the research and methodology. HB-E, CL, BJ, and NW performed the research. HB-E and CL contributed to the data pre-processing. HB-E developed the statistical analysis, PLSC, ANN training and validation, and the analytical tools. HB-E and NW wrote the paper. MP and BJ investigated the data and contoured and labeled the tumors and normal tissues on the MR images using the pathology images. DH helped with the implementation of the MR pulse sequences. HB-E, ME, BM, IC, and NW advised and mentored the study.

FUNDING
This work was supported in part by Research Scholar Grant RSG-15-137-01-CCE from the American Cancer Society and by Dykastra Steele Family Foundation award F60570. The authors of this manuscript have no other relevant financial interest or relationship to disclose with regard to the subject matter of this study.