Personalizing Medicine Through Hybrid Imaging and Medical Big Data Analysis

Papp, Laszlo; Spielvogel, Clemens P.; Rausch, Ivo; Hacker, Marcus; Beyer, Thomas

doi:10.3389/fphy.2018.00051

REVIEW article

Front. Phys., 07 June 2018

Sec. Medical Physics and Imaging

Volume 6 - 2018 | https://doi.org/10.3389/fphy.2018.00051

This article is part of the Research Topic Multimodality Molecular Imaging


Laszlo Papp¹

Clemens P. Spielvogel²

Ivo Rausch¹

Marcus Hacker²

Thomas Beyer¹*

¹QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
²Division of Nuclear Medicine, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria

Medical imaging has evolved from a pure visualization tool into a primary source of analytic approaches toward in vivo disease characterization. Hybrid imaging is an integral part of this approach, as it provides complementary visual and quantitative information in the form of morphological and functional insights into the living body. As such, non-invasive imaging modalities no longer provide only images, but data, as recently stated by pioneers in the field. Today, such information, together with other, non-imaging medical data, creates highly heterogeneous data sets that underpin the concept of medical big data. While the exponential growth of medical big data challenges their processing, these data inherently contain information that benefits patient-centric personalized healthcare. Novel machine learning approaches combined with high-performance distributed cloud computing technologies help explore medical big data. Such exploration and the subsequent generation of knowledge require a profound understanding of the technical challenges. These challenges increase in complexity when hybrid, aka dual- or even multi-modality, image data are used as input to big data repositories. This paper provides a general insight into medical big data analysis in light of the use of hybrid imaging information. First, hybrid imaging is introduced (see further contributions to this special Research Topic), also in the context of medical big data. Then, the technological background of machine learning as well as state-of-the-art distributed cloud computing technologies are presented, followed by a discussion of data preservation and data sharing trends. Joint data exploration endeavors in the context of in vivo radiomics and hybrid imaging are then presented, and standardization challenges regarding imaging protocols, delineation, feature engineering, and machine learning evaluation are detailed.
Last, the paper provides an outlook on the future role of hybrid imaging in view of personalized medicine, with a focus on the derivation of prediction models as part of clinical decision support systems, to which machine learning approaches and hybrid imaging can be anchored.

Introduction

Hybrid Imaging

Patient management today entails the use of non-invasive imaging methods. These fall into two categories: anatomical or morphological imaging and molecular or functional imaging. The first category includes imaging methods such as X-ray, Computed Tomography (CT), and Ultrasound imaging (US), while the second category is the domain of nuclear medicine imaging, employing techniques such as Single Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET). Magnetic Resonance Imaging (MRI) lies somewhat in between both categories, as it provides anatomical details with high visual contrast while also probing function and providing insights into metabolic pathways [1].

Standalone imaging methods have been used for decades in patient diagnosis. Patients suffering from cancer, cardiovascular diseases, or neurodegenerative diseases, however, have been shown to benefit from the use of so-called combined, or hybrid, imaging methods. Hybrid imaging describes the physical combination of complementary imaging systems, such as SPECT/CT [2], PET/CT [3], and PET/MRI [4], all of which provide “anato-metabolic” image information [5], that is, information based on intrinsically aligned morphological and functional data.

Routine diagnosis based on hybrid imaging mainly employs visual data interpretation [6, 7]. There is, however, much more information in these data that can be turned into knowledge [8]. The extraction and analysis of simple to higher-order radiomic features represents a revolution in the field of in vivo disease characterization [7]. This concept appears to be particularly promising in cancer research [8, 9], where tumors are typically diagnosed with ex vivo approaches, such as biopsy. Beyond being invasive, biopsies have the drawback that they provide information only about the small region of the tumor from which the sample is taken (Figure 1). Furthermore, the majority of tumors are heterogeneous across all scales [10]. Hybrid imaging can help describe the overall heterogeneity of tumors on both the morphological and the metabolic level [10, 11]. Therefore, hybrid imaging appears to be a key technology for building accurate in vivo tumor characterization models [7].

FIGURE 1

Figure 1. Tumor heterogeneity cannot be fully assessed with a core needle biopsy. Depending on the sampled region, histopathology results may differ and may thus point to different tumor biomarkers that subsequently affect the choice of therapy.

Hybrid Imaging and Medical Big Data

Our civilization has been seeking ways to handle large-scale data sets for thousands of years [12, 13] (Figure 2). Thanks to multiple technological advances, our approach toward handling such data has continuously advanced, eventually giving rise to the term “Big Data” [14–16]. The so-called 4V model is one of the simplest ways to characterize Big Data [17] through four major features: volume, velocity, variety, and veracity (Figure 3), all of which help describe key observations of big data during their evaluation (Table 1).

FIGURE 2

Figure 2. Data and big data during the past eras of mankind [12, 13]. People have faced the challenges of data storage since their beginnings. In the modern history of mankind, big data is bound to digital data storage, distributed supercomputing, and novel evaluation endeavors.

FIGURE 3

Figure 3. The 4V model of Big Data [17] referring to Volume (data size), Velocity (speed of change), Variety (different sources and formats of data), and Veracity (uncertainty in data).

TABLE 1

Table 1. Four major features of Big Data according to the 4V model [17].

In recent years, the volume of Big Data has grown exponentially. According to a 2013 estimate, 90% of the world's data had been generated in the preceding two years (2011–2013) [18]. The Ponemon Institute estimated that 30% of all electronic data in 2012 was generated by healthcare alone [19]. According to a 2016 estimate by IBM researchers, 90% of all Medical Big Data is imaging data [20].
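
To make "exponential" concrete, the 2013 figure can be turned into a growth rate. This is a worked example under our own simplifying assumptions, not part of the cited estimate: the cumulative world data volume grows as V(t) = V_0 e^{kt} at a constant rate k per year, and no data are deleted. The share generated in the last two years then fixes k and the doubling time T_2:

```latex
\begin{aligned}
\frac{V(t)-V(t-2)}{V(t)} &= 1-e^{-2k} = 0.9
  &&\Rightarrow\quad k=\frac{\ln 10}{2}\approx 1.15\ \mathrm{yr}^{-1},\\
e^{k} &= \sqrt{10}\approx 3.16 && \text{(annual growth factor)},\\
T_{2} &= \frac{\ln 2}{k} = \frac{2\ln 2}{\ln 10}\approx 0.60\ \mathrm{yr} && (\approx 7\ \text{months}).
\end{aligned}
```

Under these assumptions, the estimate implies that the stored data volume roughly triples every year; the derived rate is an illustration only and is not a figure reported in [18].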

Modern PET/CT and PET/MRI systems generate gigabytes of data per study [21, 22]. Furthermore, hybrid imaging combines different modalities with different image sizes and resolution levels and, thus, inherently results in datasets of high variety. Dynamic and gated acquisitions, which aim to capture the velocity of events in the living body, are subject to low sensitivity and longer acquisition times [23]. The veracity issue, i.e., the uncertainty in imaging patterns arising from differences in imaging hardware, acquisition protocols, and image reconstructions across vendors and system generations, is understood, but it remains a challenge both for a better understanding of disease and for multi-center data pooling [24]. Together, these features make hybrid imaging data a major component of Medical Big Data.

Machine Learning for Medical Big Data Analysis

Medical Big Data cannot be handled by traditional data processing applications [25]; thus, novel data handling and evaluation approaches are required that extend beyond conventional software processing capabilities. Machine learning is a promising approach to deal with large-scale medical data [26]. In the context of hybrid imaging, several groups have reported promising results for disease characterization by applying robust machine learning methods to combined in vivo analysis [27–29].

Furthermore, holistic approaches that combine clinical and imaging information are becoming more popular [30–32]. Current technological advances support the collection of large-scale, heterogeneous information from living organisms, not only by hybrid imaging, but also by genomics [31], proteomics [33], or histopathology [34]. Such highly heterogeneous datasets are very challenging to match, given the different time points and frequencies at which they are acquired. Nevertheless, the exploration of Medical Big Data is linked to fully personalized patient care [14, 35]. With the emergence of the Medical Internet of Things (MIoT), real-time remote monitoring becomes feasible [36–38]. Collecting and combining MIoT information with hybrid imaging data can significantly increase the accuracy of predictive analytics [38] and can result in automated Clinical Decision Support Systems (CDSS) [39] (Figure 4). This leads to healthcare approaches that help improve patient comfort and reduce healthcare costs as part of a fully patient-centric, personalized medicine [40] (Figure 5). As a major constituent of this process, hybrid imaging data play a central role in personalized medicine [41].
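
At its simplest, matching data acquired at different time points and frequencies is a temporal join. The sketch below is a hypothetical illustration: the record layout, the `align_to_imaging` helper, and the 7-day look-back window are our assumptions, not part of any cited system. For each imaging study it selects the most recent MIoT or wearable reading taken no later than the scan, and reports nothing if that reading is too old.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def align_to_imaging(scan_times, readings, max_gap=timedelta(days=7)):
    """For each scan time, return the latest reading at or before the scan.

    readings: list of (timestamp, value) tuples sorted by timestamp.
    Returns one value per scan, or None if no reading lies within max_gap.
    """
    reading_times = [t for t, _ in readings]
    aligned = []
    for scan in scan_times:
        # Index of the first reading strictly after the scan time.
        idx = bisect_right(reading_times, scan)
        if idx == 0:
            aligned.append(None)  # no reading before this scan
            continue
        t, value = readings[idx - 1]
        aligned.append(value if scan - t <= max_gap else None)
    return aligned

# Hypothetical daily heart-rate readings (Jan 1-10) and two PET/CT scan times.
readings = [(datetime(2018, 1, d), 60 + d) for d in range(1, 11)]
scans = [datetime(2018, 1, 5, 14, 0), datetime(2018, 2, 1)]
print(align_to_imaging(scans, readings))
```

The first scan is matched to the reading of the same morning, whereas the second scan falls outside the look-back window and receives no value; real pipelines would additionally have to choose how to aggregate, interpolate, or impute such gaps.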

FIGURE 4

Figure 4. Representation of Medical Big Data as the result of various information capturing systems and examinations. The variety of Medical Big Data is manifested not only in the different structural nature of the collected data, but also in the various frequencies at which the data can be collected from living beings. Machine learning approaches can help in automatically exploring and analyzing this highly heterogeneous dataset, resulting in predictive models. These technological approaches can result in a Clinical Decision Support System (CDSS) that can help physicians shift diagnosis and treatment toward precision medicine.

FIGURE 5

Figure 5. Brief introduction to the evolution of medical image interpretation and its incorporation into other Medical Big Data. From the early 1900s on, medical imaging data were interpreted manually through human pattern recognition. According to current trends, the Medical Internet of Things [37] together with wearable sensors will support real-time health status tracking. This trend will help combine and analyze heterogeneous Medical Big Data (Holomics), driven by expert AI systems operating in the cloud.

Technology

Machine Learning

Machine Learning (ML) refers to approaches that are able to identify and learn from patterns in datasets [7, 42, 43]. Even though ML algorithms can differ significantly, each of them can be characterized by the same logical structure composed of a model, a fitness measurement, and an optimizer (Figure 6).

FIGURE 6

Figure 6. The general scheme of machine learning approaches. The data serve as input from which the optimizer explores and builds a decision-making model. A fitness measurement characterizes the performance of the model. This scheme can symbolize a one-step or iterative process, depending on the given ML approach. Of note, the three modules may have different relationships with regard to the actual ML approach.

The model, which is the result of any ML training process, predicts new information from the data. The optimizer generates the model such that its fitness value is maximized. Specific algorithms may interpret the relationship between the model, the fitness measure, and the optimizer in different ways. In unsupervised machine learning, for example, the fitness measure can be an internal step of the optimizer, e.g., by measuring within-cluster variances [44]. In contrast, supervised ML approaches may treat the fitness measure as a component independent of the optimizer [45]. Furthermore, the optimizer and the model may not be independent but may form an integrated structure [46, 47].
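
The three-part structure can be made concrete with a deliberately simple sketch. This is our own illustration, not a method from the cited works: the model is a straight line y = w·x + b, the fitness is the negative mean squared error, and the optimizer is plain gradient ascent on that fitness. Here the three components are kept as separate functions; as noted above, real ML algorithms often fuse them.

```python
def model(params, x):
    """Model: maps an input to a prediction using parameters (w, b)."""
    w, b = params
    return w * x + b

def fitness(params, xs, ys):
    """Fitness: negative mean squared error (higher is better)."""
    errors = [(model(params, x) - y) ** 2 for x, y in zip(xs, ys)]
    return -sum(errors) / len(errors)

def optimizer(xs, ys, lr=0.05, steps=2000):
    """Optimizer: gradient ascent on the fitness, i.e. descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1
params = optimizer(xs, ys)
print(params, fitness(params, xs, ys))
```

On these noise-free data the optimizer recovers w ≈ 2 and b ≈ 1, and the fitness approaches its maximum of zero.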

There are numerous ways to categorize ML algorithms. One way of categorization relies on the nature of the data to be analyzed, thus, leading to the categories of supervised, unsupervised, and reinforcement methods (Figure 7). Supervised machine learning is a classification or regression approach that builds a predictive model based on labeled reference data [45]. Several research groups focusing on imaging analysis apply supervised machine learning to retrospective datasets [7, 10, 43].

FIGURE 7

Figure 7. Three main types of machine learning based on the nature of the input data or environment: supervised, unsupervised and reinforcement learning and their main properties by the means of data, fitness, and model.

Unsupervised machine learning operates on unlabeled data and is typically characterized by clustering approaches [44]. Research groups focusing on the exploratory analysis of imaging datasets without a ground truth utilize unsupervised machine learning methodologies [46, 48–50].
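
As an example of an unsupervised method whose fitness is the within-cluster variance mentioned above, the following is a minimal k-means (Lloyd's algorithm) sketch on one-dimensional values, for example a single image feature per patient. The data and the deliberately poor starting centers are invented for illustration. Both the assignment step and the update step never increase the within-cluster sum of squares, which is why the fitness acts as an internal part of the optimizer here.

```python
def kmeans_1d(values, centers, iterations=100):
    """Lloyd's k-means on scalar values, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters

def within_cluster_ss(centers, clusters):
    """Within-cluster sum of squares: the quantity k-means minimizes."""
    return sum((v - centers[i]) ** 2
               for i, c in enumerate(clusters) for v in c)

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 1.0])
print(centers, within_cluster_ss(centers, clusters))
```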

Reinforcement learning mimics human learning: it assumes an environment with a certain state that can be changed by an ML “agent” through certain actions [46, 51]. Whenever an action is taken and the environment state changes, a reward or punishment is issued back to the agent. The goal is to learn a set of actions that maximizes the reward and minimizes the punishment regardless of the current state of the environment. Reinforcement learning can be utilized when a given environment is very complex, prone to changes, or, generally speaking, of unknown nature [51]. To date, reinforcement learning is underrepresented in the field of medical imaging, with only limited applications [46, 52].
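
The agent, environment, and reward loop can be illustrated with tabular Q-learning. We chose this algorithm for illustration only; the text does not prescribe one. The toy environment below is hypothetical: a corridor of five states in which the agent moves left or right. Reaching the right end yields a reward of +1, and every other step incurs a small punishment of -0.01, so the agent should learn to always move right.

```python
import random

N_STATES, ACTIONS = 5, (-1, +1)   # corridor positions 0..4; move left/right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), GOAL)   # walls at both ends
    if nxt == GOAL:
        return nxt, 1.0, True      # reward for reaching the goal
    return nxt, -0.01, False       # small punishment for every other step

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learn action values Q(state, action) by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Move Q toward reward plus discounted best value of the next state.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # learned action (+1 = right, -1 = left) for each non-goal state
```

The agent is never told that moving right is correct; the policy emerges solely from the rewards and punishments returned by the environment.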

Another way of categorizing ML methods follows the nature of the feature extraction and analysis of the data. In this case, two main groups can be identified, namely shallow learning methods building on engineered (or handcrafted) features [8, 53] and deep learning (DL) methods building on automatically learned, multi-layer representations of the data [54]. Several related works utilize machine learning built on engineered features [55–59]. These approaches typically employ feature selection [60] in the form of feature redundancy reduction [61] or feature ranking [8].
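
Feature redundancy reduction can be sketched as a simple correlation filter. Features are visited in a given order (e.g., a prior ranking), and a feature is kept only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The 0.9 threshold, the feature names, and the values below are illustrative assumptions, not settings from the cited studies.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def drop_redundant(features, threshold=0.9):
    """Greedy redundancy filter over a dict of name -> values (order matters)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-patient feature values (five patients).
features = {
    "SUVmax":  [4.1, 6.3, 2.2, 8.0, 5.5],
    "SUVpeak": [3.8, 5.9, 2.0, 7.6, 5.1],   # almost a rescaled copy of SUVmax
    "entropy": [3.1, 2.9, 3.3, 3.0, 3.4],
}
print(drop_redundant(features))
```

Here SUVpeak is discarded because it is almost perfectly correlated with SUVmax, whereas the textural feature is retained. Because the filter is greedy, the outcome depends on the visiting order, which is why it is often combined with a feature ranking.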

Deep learning is reported to generally outperform shallow ML approaches, as it is able to decompose and analyze the data on different levels of information complexity [54, 62]. Nevertheless, the true potential of DL is seen only for data having a complex, hierarchical structure. This is a challenging requirement to fulfill, and, thus, DL is to date underrepresented in the context of tumor characterization and hybrid imaging [62, 63]. On the other hand, DL appears to be a promising approach for synthesizing artificial CT images from MRI for the attenuation correction of PET images acquired with hybrid PET/MRI systems [64]. In addition, DL appears helpful for brain disease characterization based on PET/MRI data, such as that of Alzheimer's disease (AD) and mild cognitive impairment (MCI) [65, 66].

Despite the wide range of ML approaches available today, no single ML method generally outperforms all others [67]. Therefore, testing multiple ML approaches is encouraged to identify the most suitable one for a particular evaluation [68]. Table 2 provides an overview of the most common ML algorithms applied in medical science.
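
Several supervised approaches are commonly compared on the same labeled data with k-fold cross-validation. The sketch below uses invented two-feature data and two deliberately simple classifiers: nearest centroid and 1-nearest neighbor, chosen only for brevity and not taken from Table 2. It reports the mean held-out accuracy of each. On this well-separated toy data both reach perfect accuracy, whereas on real data the scores usually differ, and that difference is what the comparison is meant to reveal.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_centroid(train, x):
    """Predict the label whose class mean (centroid) is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [p for p, y in train if y == label]
        centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def one_nn(train, x):
    """Predict the label of the single closest training sample."""
    return min(train, key=lambda s: sq_dist(s[0], x))[1]

def k_fold_accuracy(data, classifier, k=5):
    """Mean accuracy over k folds; fold i holds every k-th sample from index i."""
    accuracies = []
    for i in range(k):
        test = data[i::k]
        train = [s for j, s in enumerate(data) if j % k != i]
        correct = sum(classifier(train, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Hypothetical labeled samples: (feature vector, label).
data = [((1.0, 1.2), "benign"), ((5.1, 4.8), "malignant"),
        ((0.8, 1.0), "benign"), ((4.9, 5.3), "malignant"),
        ((1.3, 0.7), "benign"), ((5.4, 5.0), "malignant"),
        ((0.9, 1.4), "benign"), ((4.6, 5.2), "malignant"),
        ((1.1, 0.9), "benign"), ((5.2, 4.7), "malignant")]

for name, clf in [("nearest centroid", nearest_centroid), ("1-NN", one_nn)]:
    print(name, k_fold_accuracy(data, clf))
```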

TABLE 2

Table 2. Overview of machine learning algorithms and example uses in medical imaging.

High-Performance and Cloud Computing

Machine learning evaluation of Medical Big Data requires high-performance computational resources [50]. As the amount of data increases exponentially, increasingly complex computational architectures are needed for the storage, processing, and analysis of the data [98].

Distributed systems, such as the Hadoop ecosystem, are potential solutions for dealing with Medical Big Data [99, 100]. The foundation of the Hadoop ecosystem is Apache Hadoop, with two major functional components: the MapReduce model for data processing and the Hadoop Distributed File System (HDFS) for storage. MapReduce splits the input data set into independent chunks that are processed in parallel by map tasks, while reduce tasks subsequently combine the outputs of the map tasks [101]. HDFS is a fault-tolerant distributed file system designed to run on low-cost hardware, which makes it suitable for medical image data applications [102]. Hadoop has been applied to numerous tasks in medical imaging, such as parameter optimization for lung texture segmentation, content-based medical image indexing, and 3D directional wavelet analysis for solid texture classification [99, 103, 104].
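
The map/shuffle/reduce data flow can be mimicked in a few lines of plain Python. The sketch below is not Hadoop code and uses no Hadoop API; the study records and their fields are hypothetical. The map function emits (key, value) pairs independently for each record, a shuffle groups the pairs by key, and the reduce function aggregates each group, here counting studies and summing stored gigabytes per modality.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical study records: (modality, size in GB).
records = [("PET/CT", 2.5), ("PET/MRI", 4.0), ("PET/CT", 3.0),
           ("CT", 0.5), ("PET/MRI", 3.5), ("CT", 0.7)]

def map_phase(record):
    """Map: each record independently emits (key, value) pairs."""
    modality, size_gb = record
    return [(modality, (1, size_gb))]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Reduce: combine one key's values into (study count, total GB)."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)

# In Hadoop, map and reduce tasks run in parallel across cluster nodes;
# here they simply run one after another in a single process.
pairs = [pair for r in records for pair in map_phase(r)]
result = {key: reduce_phase(vals) for key, vals in shuffle(pairs).items()}
print(result)
```

Because map_phase looks at a single record and reduce_phase at a single key, both can be distributed without coordination, and this independence is what allows MapReduce to scale out.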

Next to Apache Hadoop, several other distributed high-performance computing platforms are available [99, 100]. One example is the open-source Apache Spark [105, 106], which offers better computational performance than Hadoop [107, 108]. Spark has been shown to be up to 20 times faster than Hadoop for iterative applications, to accelerate a real-world data analytics report by a factor of 40, and to scan a 1 TB dataset interactively within a few seconds [109]. These characteristics make Spark an efficient tool for medical imaging data analysis tasks, such as the computation of voxel-wise local clustering coefficients of fMRI data [107].

Today, major industry players provide cloud storage combined with cloud ML engines to address the need for Big Data evaluation [15, 110]. These systems are potentially ideal frameworks for large-scale hybrid imaging data evaluation [111, 112]. One example is the Google Cloud Platform, which is used both by academic research institutions and by a variety of healthcare companies. Here, the Google Cloud Machine Learning Engine can be utilized to submit anonymized MRI scans to an ML-enabled AI platform to help diagnose prostate cancer [113].

Similar to the Google Cloud Platform, Amazon provides a suite of services called Amazon Web Services, including cloud computing and machine learning tools. The Amazon Elastic Compute Cloud, for example, is an Infrastructure as a Service offering that allows users to rent virtual computers. It is being used to develop technology that supports radiologists in identifying abnormalities in medical images across different modalities, as well as to provide a blood flow imaging solution that enables doctors to render MRI scans as multi-dimensional models and better diagnose cardiovascular diseases [114, 115]¹.

Another suite of cloud-based services is Azure, provided by Microsoft. Azure has been employed for medical image classification using algorithms such as support vector machines and k-nearest neighbors [116, 117]. A few machine learning examples for medical imaging analysis utilizing Azure are covered in Criminisi [118].

Data Handling

Data Preservation and Reproduction

Beyond the technical considerations of evaluating large-scale medical data, persistent storage is a challenge for many medical institutions. To date, hospitals that generate and collect medical data are also responsible for archiving them [119]. This process is further shaped by the triangle of legal obligations, practical considerations, and financial resources. The period for which a hospital needs to archive patient data varies by country; however, mandatory preservation periods are generally between 10 and 100 years [120]. These storage periods are much longer than the estimated durability of the storage media available today (Table 3). Medical imaging research, particularly research involving longitudinal and/or large-scale population analyses, may require data sets that have been acquired over decades. In the context of hybrid imaging, state-of-the-art PET/CT and PET/MRI systems produce large datasets: raw PET list-mode data may occupy several gigabytes of storage space, while a multi-slice CT may correspond to ~2 gigabytes of data [21]. Furthermore, a wide range of MRI sequences can be acquired as part of a PET/MRI study [22], which further adds to the challenge of dealing with large data. Since PET raw data (aka list-mode files) are considerably large, they are frequently not archived at all [122]. This prevents scientists from retrospectively optimizing image quality with new image reconstruction approaches and, eventually, from standardizing protocols for accurate, population-wide evaluations. Since the required data preservation for both routine and research purposes is not feasible with conventional tools, there is an urgent need to shift focus toward more persistent solutions. As an example, cloud storage and evaluation approaches [111, 112] can support convenient data sharing and the repeatability of published results across research groups.
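
The mismatch between retention periods and media lifespans can be quantified with simple arithmetic. All inputs below (annual study count, per-study size, retention period, media lifespan) are hypothetical placeholders and are not taken from Table 3 or the cited sources. The sketch shows the shape of the calculation: the archive grows linearly with the retention period, and each record must be migrated to fresh media roughly once per media lifespan.

```python
from math import ceil

def archive_plan(studies_per_year, gb_per_study, retention_years, media_lifespan_years):
    """Return (steady-state archive size in TB, media migrations per record).

    Assumes a constant yearly study volume and that each record is copied
    to fresh media before its current medium reaches the end of its life.
    """
    yearly_tb = studies_per_year * gb_per_study / 1000
    # Once retention is reached, the archive holds `retention_years` of studies.
    steady_state_tb = yearly_tb * retention_years
    # Copies needed beyond the first medium for a record to outlive retention.
    migrations = max(ceil(retention_years / media_lifespan_years) - 1, 0)
    return steady_state_tb, migrations

# Hypothetical site: 4,000 hybrid studies per year at 5 GB each (incl. list mode),
# 30-year retention, storage media assumed to last 10 years.
size_tb, moves = archive_plan(4000, 5, 30, 10)
print(f"{size_tb:.0f} TB archived, {moves} migrations per record")
```

With these placeholder inputs, the site would hold 600 TB in steady state, and each record would have to be migrated twice during its retention period. Keeping raw list-mode data would mainly raise the per-study size term.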

TABLE 3

Table 3. Estimated lifespan of some media storage technologies according to Morgan [121].

Data Sharing

Clinical research is an essential building block of efficient patient management. Research studies are generally complex, and the resulting data are valuable not only to the principal investigator but to society as a whole [123, 124]. Nonetheless, many researchers remain reluctant to share their data with an expert audience [125, 126] beyond describing them in peer-reviewed publications. In contrast, sharing research data in a structured and tangible way has been shown to benefit both the principal investigators and other experts in the field, who may re-use the data with alternative evaluation approaches to extract new information that may subsequently benefit patient management [124]. Journals such as “Science” or “Nature” expect data to be made public and, therefore, provide the necessary means [124]. However, the quality of public supplementary material collections of published studies is variable, and re-use of these data is frequently not possible [127]. The same holds true for alternative public data archives, which were shown to contain incomplete or only partially archived data in over 56% of cases, preventing re-use [126].

In the light of medical imaging and hybrid imaging in particular, a wide range of imaging data from different fields is already available for researchers worldwide (Table 4). Some of these databases are dedicated to the collection and sharing of very heterogeneous data from different modalities, various diseases, and different body regions.

TABLE 4

Table 4. Selection of online medical image sources.

One such data source is The Cancer Imaging Archive (TCIA). TCIA is an open-access repository, funded by the National Cancer Institute, containing several million medical images of various cancer types². The data are organized into collections of images that share common characteristics, such as cancer type or affected region. Apart from supplying users with massive amounts of high-quality data, TCIA also provides an application programming interface (API) for automated data access. EURORAD, which is operated by the European Society of Radiology, also includes cancer images but is not limited to them. However, it mainly focuses on the training of radiologists and provides no automated data access³. Open-i is a service hosted by the National Library of Medicine (NLM) [130]. It provides a search engine and a download API for accessing images from PubMed Central articles, the NLM History of Medicine collection, and other sources. However, not all images that can be retrieved using Open-i are free to use. Another major source of in vivo medical images is the National Biomedical Imaging Archive (NBIA), which includes clinical and genetic data associated with the images [136].

In addition to these more general data repositories, there are many specialized data sources for medical images. The Open Access Series of Imaging Studies (OASIS), for example, includes comparative data of patients with Alzheimer's disease and of subjects with normal physiological conditions [132]. In addition to neuroimaging data, it also includes clinical and biomarker information. Another specialized database is the medical image repository of the Johns Hopkins Medical Institute [134]. It includes MRI images of human, mouse, and monkey brains. Further data sources for medical images can be found in Table 4.

Joint Data Exploration

To date, in vivo disease characterization with hybrid imaging data, especially in oncological applications, is performed mainly by analyzing engineered features [63, 129]. This process is widely referred to as “Radiomics,” even though this kind of approach was originally applied to morphological images only [8]. In an early publication, “Radiomics” was defined as “the high-throughput extraction of large amounts of image features from radiographic images” [128]. For the sake of consistency, we employ the term “radiomics” in the context of in vivo feature analysis, including features derived from functional and hybrid imaging. However, we introduce a different term, “Holomics,” to address combinations of imaging and non-imaging data, which we consider more appropriate than the term “imiomics” suggested by Strand et al. [142] for the combined analysis of imaging and -omics data.

Radiomics

In general, any radiomics-based exploration of imaging data requires object delineation, followed by feature extraction and evaluation [128, 143, 144]. Extracted features range from first- to higher-order features and include wavelet, Laplacian, and fractal features [6, 8, 145, 146]. Some of these features, referred to as textural features, characterize certain spatial patterns in images. The concept of textural image evaluation was first introduced by Haralick et al. [147] in the 1970s. At that time, neither the image quality nor the computational capacity was sufficient to exploit textural features in medical imaging. Thanks to recent advances in imaging and computing, several groups have investigated the potential of textural features for in vivo disease characterization [8, 28, 56, 148–150].
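
Haralick-style textural features are computed from a gray-level co-occurrence matrix (GLCM), which counts how often pairs of gray levels occur at a given pixel offset. The following minimal sketch covers a tiny, already quantized 2D image and a single horizontal offset, and computes three common Haralick-type features: contrast, energy (angular second moment), and homogeneity (inverse difference moment). Radiomics software additionally handles 3D offsets, symmetric matrices, averaging over directions, and gray-level discretization, all of which this sketch omits.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric co-occurrence matrix for one offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def haralick_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a GLCM."""
    n = len(p)
    pairs = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in pairs)
    energy = sum(v ** 2 for _, _, v in pairs)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in pairs)
    return contrast, energy, homogeneity

# Hypothetical 4x4 image already quantized to 4 gray levels (0..3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
print(haralick_features(glcm(image, levels=4)))
```

Contrast grows when neighboring pixels differ strongly, whereas energy and homogeneity grow when a few similar gray-level pairs dominate. This is how such features summarize spatial heterogeneity within a delineated lesion.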

Routine clinical evaluation of PET images relies on the statistical analysis of Standardized Uptake Values (SUV) [151]. In oncology imaging, semi-quantitative parameters, such as SUVmax, SUVpeak [152–154], total lesion glycolysis (TLG) [155, 156], and metabolic tumor volume (MTV) [157, 158], are used to differentiate benign from malignant lesions. However, these values are insufficient to describe the stage of a tumor or to account for tumor heterogeneity [6]. Textural features, by contrast, appear by their nature to be ideal candidates for analyzing metabolic tumor heterogeneity [159, 160]. Accordingly, promising results have been reported for treatment outcome [161], therapy response [156, 162], survival [163–166], and prognostic stratification [167]. Several studies perform conventional correlation analysis [156, 160, 165, 168, 169], as well as robust machine learning evaluation [10, 170, 171], of textural features to characterize tumors in vivo. An oncological review of PET-based radiomic approaches concluded that radiomics is a promising method for personalized medicine, as it can enhance cancer management [172].
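
The semi-quantitative PET parameters listed above follow from simple formulas. The minimal sketch below assumes body-weight SUV, an activity concentration already decay-corrected to injection time, a tissue density of 1 g/mL, and MTV delineation with a fixed threshold of 41% of SUVmax (one of several thresholds in use). SUVpeak is omitted because it requires a spherical 3D neighborhood. The voxel values, injected activity, body weight, and voxel volume are invented.

```python
def suv(concentration_kbq_ml, injected_mbq, weight_kg):
    """Body-weight SUV. Since 1 MBq/kg = 1 kBq/g, no scale factor is needed
    (assuming a tissue density of 1 g/mL)."""
    return concentration_kbq_ml / (injected_mbq / weight_kg)

def lesion_metrics(voxel_suvs, voxel_ml, threshold_fraction=0.41):
    """SUVmax, MTV (mL), SUVmean within the MTV, and TLG = SUVmean x MTV."""
    suv_max = max(voxel_suvs)
    cutoff = threshold_fraction * suv_max
    in_tumor = [s for s in voxel_suvs if s >= cutoff]   # thresholded volume
    mtv = len(in_tumor) * voxel_ml
    suv_mean = sum(in_tumor) / len(in_tumor)
    tlg = suv_mean * mtv
    return {"SUVmax": suv_max, "MTV": mtv, "SUVmean": suv_mean, "TLG": tlg}

# Hypothetical patient: 70 kg, 350 MBq injected; voxel concentrations in kBq/mL.
concentrations = [10.0, 22.0, 40.0, 50.0, 3.0]
suvs = [suv(c, injected_mbq=350, weight_kg=70) for c in concentrations]
print(lesion_metrics(suvs, voxel_ml=4.0))
```

All four parameters are single summary numbers over the delineated volume. Two lesions with the same MTV and SUVmean, but very different internal uptake patterns, therefore receive the same TLG, which is the limitation that textural features aim to address.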

State-of-the-art ML approaches and hybrid imaging appear to be synergistic partners [7, 63, 173]. There is increasing evidence for in vivo tissue characterization with both PET/CT and PET/MRI hybrid imaging [11, 146, 174, 175]. In one study, PET, CT, and PET/CT features were used to predict local tumor control in head and neck cancer [129] by multivariate Cox regression, yielding a concordance index (CI) of 0.73 for both the CT and the PET/CT models; however, CT-based radiomics overestimated the probability of tumor control in the poor prognostic groups. Another study found a correlation between PET and CT features using lymph node density [176] and concluded that CT density measurements together with PET uptake analysis improve the differentiation between malignant and benign lymph nodes. Disease-free survival in non-small cell lung cancer patients can be predicted from PET/CT images with an area under the receiver operating characteristic curve (AUC) of 0.68 when combined PET/CT features are employed [177]. Another group combined PET uptake measures with CT textural features for radiation pneumonitis diagnosis [178]. They reported an AUC increase of 0.04–0.08 for the combined model compared to single-modality classifiers (AUC 0.71–0.81).
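
The AUC values quoted in this section have a simple probabilistic reading. The AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counting one half; this is the normalized Mann-Whitney U statistic. A minimal sketch with invented scores:

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for malignant (positive) and benign (negative) lesions.
malignant = [0.9, 0.8, 0.55, 0.4]
benign = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(malignant, benign))
```

Here 17 of the 20 malignant/benign pairs are ranked correctly, which gives an AUC of 0.85. An AUC of 0.5 corresponds to random ranking, and 1.0 to perfect separation.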

Outcome prediction in locally advanced cervical cancer based on PET/CT and MRI is a topic of ongoing research [174]. The combined analysis of PET and apparent diffusion coefficient (ADC) features resulted in an accuracy of 94% for predicting recurrence and 100% for predicting lack of loco-regional control, compared to 51–60% accuracy for clinical parameters. Combined PET and CT analysis was used to predict FMISO uptake in head-and-neck cancer [179]. The group found that the combined PET and CT features provide the highest AUC (0.79) for the prediction of tumor hypoxia as evaluated by FMISO PET. Joint fusion features can also be used, for example, to predict lung metastasis in soft-tissue sarcomas [57]. Here, the best performance was achieved with a combined PET/T1 and PET/T2FS textural analysis, resulting in an AUC of 0.984, which was significantly higher than that of single-modality approaches. A systematic review of oncological applications of radiomics approaches is presented in Avanzo et al. [175].

Holomics

Several studies go beyond the utilization of hybrid imaging and incorporate additional non-imaging data for increased predictive accuracy [180–182]. This kind of approach improved the risk assessment of head-and-neck cancer by applying random forest ML to combined in vivo and clinical variables [29]. Independent cross-cohort validation revealed an AUC of 0.69 and a concordance index of 0.67 for predicting loco-regional recurrences, while distant metastases were predicted with an AUC of 0.86 and a concordance index of 0.88.
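
The concordance index extends the AUC to time-to-event outcomes such as recurrence or survival, where some patients are censored. In Harrell's formulation, a pair of patients is comparable when the patient with the shorter follow-up actually had the event. The pair is concordant when that patient also received the higher predicted risk. The minimal sketch below uses invented data and, for brevity, ignores tied event times.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which higher risk fails earlier.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risks: predicted risk scores (higher = expected to fail sooner).
    Assumes at least one comparable pair; tied event times are not handled.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable: i had the event and was followed for less time than j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort: months to recurrence (or censoring) and model risk scores.
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.6, 0.7, 0.3, 0.2]
print(concordance_index(times, events, risks))
```

In this example, 7 of 8 comparable pairs are concordant, which gives a C-index of 0.875. The only discordant pair involves the censored patient at 12 months, who received a higher risk score than the patient who relapsed at 10 months.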

Associations between tumor vascularity, vascular endothelial growth factor (VEGF) expression, and PET/MRI features in primary clear-cell renal cell carcinoma have been discussed in Yin et al. [32]. The authors reported that tumor microvascular density correlated more strongly with combined PET/MRI features than with PET or MRI features alone. Correlation of [18F]FDG PET textural features with gene expression in pharyngeal cancer was performed in Chen et al. [183]. The study demonstrated that the overexpression status of VEGF together with PET features was prognostic, thus allowing better stratification of treatment response than PET-only parameters. Late-life depression classification and response prediction with ML based on clinical and imaging features were presented in Patel et al. [30]. The study revealed an accuracy (ACC) of 87% for the classification of late-life depression and an ACC of 89% for predicting treatment response. A combined ML analysis of in vivo, ex vivo, and patient demographic features to predict 36-month glioma survival was presented in Papp et al. [184]. Comparison of the combined model (M36_IEP) with the ex vivo and patient (M36_EP), imaging and patient (M36_IP), and imaging-only (M36_I) models revealed AUCs of 0.9, 0.87, 0.77, and 0.72, respectively, in a Monte Carlo cross-validation scheme.

Holomics also introduces several technological challenges. The “curse of dimensionality” refers to the phenomenon that, as the dimensionality of the data increases, the volume of the feature space grows so quickly that the available data become sparse [185]. Therefore, it is suggested that the number of data points should increase exponentially with the number of dimensions in order to derive accurate models from high-dimensional data. To overcome this issue, several dimensionality reduction methods have been applied to the combined analysis of imaging and non-imaging data [186]. Similarly, feature selection methods, such as pre-filtering [8, 61] or wrapper and embedded approaches [187–189], can be utilized.
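
The exponential growth in required data points follows from covering the feature space. If each feature axis is to be sampled at m distinct levels, a grid covering d features needs m^d points. Conversely, a fixed cohort of n patients provides only about n^(1/d) distinct levels per axis. A short illustration, assuming 10 levels per axis and a hypothetical cohort of 1,000 patients:

```python
def points_needed(levels_per_axis, dims):
    """Grid points required to sample every axis at the given number of levels."""
    return levels_per_axis ** dims

def levels_per_axis(n_samples, dims):
    """Effective per-axis resolution that n samples provide in d dimensions."""
    return n_samples ** (1.0 / dims)

for d in (1, 2, 3, 10, 50):
    print(d, points_needed(10, d), round(levels_per_axis(1000, d), 2))
```

With 1,000 patients, three features can still be sampled at about 10 levels each, but with 50 features the cohort covers only about 1.15 levels per axis. In other words, the data barely distinguish different values along any single feature. This is the practical motivation for the dimensionality reduction and feature selection methods mentioned above.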

Standardization

Machine learning approaches operating over medical big data require a large amount of standardized data to generate accurate predictive models [10, 63]. Nevertheless, access to standardized multi-center data in the field of hybrid imaging remains a challenge [54, 63], which necessitates multi-center standardization efforts [7]. Standardization of hybrid imaging techniques, covering patient preparation, imaging protocols, and data evaluation, is already of general interest in the field of medical imaging [22, 28, 55, 190–193].

Imaging Protocol

Functional imaging with SPECT and PET aims at the assessment of physiological parameters, such as metabolic activity or perfusion. However, these parameters depend on various factors and can, thus, vary considerably. For example, the uptake of glucose, and concomitantly of [18F]FDG, in brown fat depends on its activation, which seems to be triggered by the ambient temperature [194]. The uptake of glucose in the myocardium depends on the current metabolic pathway of the heart. The heart gains its energy almost exclusively from carbohydrates (primarily glucose) or from metabolizing fatty acids, and the pathway used depends on the availability of these substrates and, therefore, on the diet the patient followed prior to the examination [195]. As a consequence of these variabilities, standardized procedures in functional imaging demand the standardization of the entire workflow, including appropriate patient preparation [196].

International organizations, such as the IAEA, EANM, SNM, or ACR, have proposed guidelines for patient preparation, imaging, and evaluation approaches [197–199]. Accreditation programmes, such as EARL or the accreditation programmes of the ACR, have been set up to ensure at least a minimum level of comparability of imaging data between centers. However, despite such standardization efforts, site- and system-specific configurations still result in highly heterogeneous imaging patterns [24]. The reasons are manifold. As explained above, patient preparation affects the outcome of a functional study. The physiological mechanisms behind this are generally understood and can be handled using appropriate protocols [196]. In clinical practice, however, it is often difficult to adhere to these protocols in every detail. For example, for outpatients it is almost impossible to verify to what extent a required diet was followed.

Differences in imaging systems and image processing chains are another source of variability. Different imaging systems come with different detectors, detector arrangements, and electronics, leading to differences in sensitivity and resolution [200]. Further, differences in image reconstruction algorithms, data correction techniques such as scatter and attenuation correction, image matrix and voxel size, and applied post-filtering steps can substantially influence the image appearance, quantitative readings, and noise properties of the image data [201]. All of these issues broaden the variability of data quality between systems and imaging centers and, therefore, limit the comparability of image-based ML studies [10, 28, 202, 203].

Delineation

Feature engineering requires the object of interest to be delineated first [8, 145, 204–206], and this delineation must be reproducible [28]. Lesion segmentation can be performed manually, semi-automatically, or automatically [151, 207]. Manual delineation employs slice-by-slice contouring tools and is subject to inter-observer variability that depends on the operator's level of expertise [158, 208].

Semi-automated delineation with fixed thresholds is a popular approach for delineating objects in functional images [151]. These approaches set the threshold either at an absolute SUV level [209] or at a percentage of the maximum SUV in a given lesion [210]. Unfortunately, fixed thresholds are sensitive to differing noise patterns that originate from differences in acquisition and reconstruction protocols [207, 211]. Consequently, research groups that dichotomize PET tracer-avid lesions using fixed thresholds have reported contradicting results [212–214]. Inter-observer variability can be reduced by training programmes [215] or by collecting and pre-selecting contours of the same lesion drawn by multiple observers to obtain an average or consensus contour [151].
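The two fixed-threshold strategies and the consensus idea can be sketched on a toy 2D SUV slice stored as nested lists. The absolute level of 2.5 SUV and the relative level of 40% of SUVmax used as defaults below are illustrative assumptions, since the paragraph does not prescribe values. The consensus rule is a simple per-voxel majority vote, one of several possible ways to combine observers' contours, and is not claimed to be the method of [151]. All function names are hypothetical.

```python
def threshold_absolute(image, suv_threshold=2.5):
    """Binary mask of voxels at or above a fixed SUV level."""
    return [[1 if v >= suv_threshold else 0 for v in row] for row in image]


def threshold_relative(image, fraction=0.4):
    """Binary mask of voxels at or above a fraction of the SUVmax.
    Assumes `image` has been cropped to a region containing a single lesion."""
    suv_max = max(max(row) for row in image)
    return threshold_absolute(image, fraction * suv_max)


def majority_consensus(masks):
    """Per-voxel majority vote: a voxel is included when more than half of the
    observers included it."""
    n_obs = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[r][c] for m in masks) > n_obs else 0
             for c in range(n_cols)]
            for r in range(n_rows)]


slice_suv = [[0.5, 1.2, 0.8],
             [2.0, 6.0, 3.1],
             [0.9, 2.6, 1.0]]
print(threshold_absolute(slice_suv))       # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
print(threshold_relative(slice_suv, 0.3))  # [[0, 0, 0], [1, 1, 1], [0, 1, 0]]
```

Because `threshold_relative` rescales with each lesion's SUVmax while `threshold_absolute` does not, the two rules can disagree on the same lesion. Both shift when noise alters the SUVmax or the voxel values near the lesion boundary, which is the sensitivity described above.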

Automated delineation methods, such as the Fuzzy Locally Adaptive Bayesian (FLAB) approach, were reported to be robust for a variety of delineation tasks in PET, including heterogeneous objects [151, 205, 211]. This approach has also been successfully applied to hybrid imaging data, such as PET/CT and PET/MRI [216]. Similarly, random walk approaches have proven to be effective delineation tools, especially for noisy images [216–218]. Hatt et al. [207] presented a comparative study of 13 PET segmentation methods on 157 simulated, phantom, and clinical PET images, in which a method built on a convolutional neural network (CNN) was found to be the most accurate.

Despite the known drawbacks of manual and semi-automated approaches and the emerging success of automated contouring, automated methods remain underrepresented in clinical routine to date [151, 219–221]. This indicates the need to extend the evaluation and cross-validation of popular delineation approaches to large-scale multi-center settings.

Feature Engineering

Given the popularity of textural features in functional and hybrid imaging, their variability with respect to noise, acquisition protocols, and sample size is reasonably well understood [222, 223]. However, technical parameters, such as the bin size used to build textural matrices and the value range interval, also appear to strongly affect textural feature repeatability [55, 224]. Some in vivo features are not yet unified with regard to a common naming convention and their underlying equations [10]. Discussions are ongoing about the impact of variations in imaging protocols, reconstruction parameters, and choice of delineation on textural parameters [10, 160, 225–227]. As an example, while numerous studies used a fixed number of bins for textural analysis, recent studies suggested that a fixed bin size, with a variable number of bins per lesion, provides better comparability and reproducibility of textural features [10, 55, 184]. Furthermore, image resolution normalization [228] and normalization of already extracted radiomic features in the feature space [229] have been proposed. In addition, guidelines are available on the standardization of imaging, feature extraction, analysis, and cross-validation in radiomic studies [230–232]. Even though these initiatives point toward repeatable radiomics research, no standardized, widely accepted and followed radiomics protocols have been established in the field to date [232].
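The difference between a fixed number of bins and a fixed bin size can be shown on two toy lesions. The sketch assumes the discretization formulas commonly used in radiomics software: with a fixed bin number, each lesion's own value range is split into N bins; with a fixed bin size, bins of a common SUV width start from a common lower bound. The default of 64 bins, the example of 4 bins, the width of 0.5 SUV, and the lower bound of 0 are illustrative assumptions, and the cited studies may differ in implementation details.

```python
import math


def discretize_fixed_bin_number(values, n_bins=64):
    """Fixed bin number: the lesion's own [min, max] range is split into n_bins,
    so the SUV width of one bin differs from lesion to lesion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # uniform lesion: everything in the first bin
    # The maximum value would land in bin n_bins + 1, so it is clipped to n_bins
    return [min(n_bins, math.floor(n_bins * (v - lo) / (hi - lo)) + 1)
            for v in values]


def discretize_fixed_bin_size(values, bin_width=0.5, lower_bound=0.0):
    """Fixed bin size: bins share the same SUV width and origin for every lesion,
    so the number of occupied bins varies from lesion to lesion."""
    return [math.floor((v - lower_bound) / bin_width) + 1 for v in values]


lesion_a = [2.0, 3.0, 4.0]
lesion_b = [2.0, 6.0, 10.0]
print(discretize_fixed_bin_number(lesion_a, 4), discretize_fixed_bin_number(lesion_b, 4))
# [1, 3, 4] [1, 3, 4]
print(discretize_fixed_bin_size(lesion_a), discretize_fixed_bin_size(lesion_b))
# [5, 7, 9] [5, 13, 21]
```

With 4 fixed bins, both lesions map to the same bin sequence, so texture matrices built from them lose the difference in absolute uptake. With a fixed width of 0.5 SUV, the two lesions remain distinguishable, which illustrates the comparability argument made above.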

Machine Learning Performance Evaluation

Machine learning methods can establish highly accurate predictive models [42, 184, 233]. Nevertheless, inaccurate reporting of performance values may lead to misinterpretation of results. Even though this issue is not exclusive to ML approaches [234], predictive models established by ML are prone to such misrepresentations. The training phase of each ML approach optimizes a predictive model over a training dataset. This implies that the resulting model may become over-fitted to the training data, resulting in poor performance on independent data. Striking a balance between training and validation errors is challenging and is referred to as the bias-variance trade-off [235].

To estimate model performance in single-center studies, cross-validation approaches should be used [236], such as the leave-one-out method [188, 237], k-fold and stratified k-fold cross-validation techniques [233, 238], and Monte Carlo approaches [184, 239]. Likewise, multi-center validation schemes [69, 128, 170] should be preferred over single-center schemes when estimating the reproducibility of reported results.
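The cross-validation schemes named above differ mainly in how sample indices are divided into training and validation parts. The following sketch, with hypothetical function names and no model fitting, generates index splits for k-fold cross-validation, for leave-one-out (the special case k = n), and for Monte Carlo cross-validation (repeated random splits). The 20% validation fraction and the fixed seed are illustrative assumptions.

```python
import random


def k_fold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation
    (contiguous folds, no shuffling, fold sizes differing by at most one)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold cross-validation with k equal to n_samples."""
    return k_fold_splits(n_samples, n_samples)


def monte_carlo_splits(n_samples, n_repeats, val_fraction=0.2, seed=0):
    """Monte Carlo cross-validation: repeated random train/validation splits.
    Validation sets of different repeats may overlap."""
    rng = random.Random(seed)
    n_val = max(1, round(val_fraction * n_samples))
    for _ in range(n_repeats):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_val:]), sorted(shuffled[:n_val])


for train, val in k_fold_splits(6, 3):
    print(train, val)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

In k-fold and leave-one-out schemes every sample is used for validation exactly once, whereas in Monte Carlo validation some samples may be drawn in several repeats and others in none.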

Robust machine learning approaches that intend to properly estimate the performance of their predictive models generally split the data into three subsets [240]. First, a part of the whole data set is set aside as the test set, and the remaining samples form the training set. Test and training sets are supposed to follow the same distribution, so that they correctly represent the same underlying sample population [241, 242]. Splitting is usually conducted to obtain approximately 70% and 30% of the original data set for the training and test sets, respectively [243, 244]. The training set is then partitioned using one of the techniques listed in Table 5, which further divide it into an actual training set and a validation set. Training and validation sets are again supposed to follow the same distribution. The resulting training-validation pairs are used to train and tune the ML-established models, respectively, while the remaining test set is used to estimate the performance of the models on an independent dataset. For most of the listed techniques, this procedure is repeated over several training-validation pairs. A common approach when partitioning the data into training and validation sets is stratification [246]. In stratified validation, each set has the same fraction of labels as the data of origin. This is particularly important for data sets in which the numbers of samples corresponding to the different labels are imbalanced.
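A stratified 70/30 train-test split and stratified folds for the training part can be sketched as follows, assuming categorical labels and using hypothetical function names. Samples are grouped by label, and each group is split (or dealt across folds) separately, so that each subset keeps approximately the label fraction of the original data. The sketch shuffles with a fixed seed for reproducibility; because of rounding, the preserved fractions are only approximate for small groups.

```python
import random
from collections import defaultdict


def _group_by_label(labels):
    groups = defaultdict(list)
    for idx, y in enumerate(labels):
        groups[y].append(idx)
    return groups


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with each label split in the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for _, idxs in sorted(_group_by_label(labels).items()):
        rng.shuffle(idxs)
        n_test = round(test_fraction * len(idxs))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; samples of each label are dealt
    round-robin across the k folds so every fold has a similar label ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across labels to balance fold sizes
    for _, idxs in sorted(_group_by_label(labels).items()):
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    all_idx = set(range(len(labels)))
    for fold in folds:
        yield sorted(all_idx - set(fold)), sorted(fold)


labels = [1] * 20 + [0] * 80  # imbalanced toy cohort: 20% positives
train, test = stratified_split(labels)
print(len(train), sum(labels[i] for i in train))  # 70 14
print(len(test), sum(labels[i] for i in test))    # 30 6
```

In this toy cohort, both the training and the test set keep the 20% positive rate. Applying `stratified_k_fold` to the training labels in the same way would yield validation folds with matching label ratios.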

Table 5. List of cross-validation techniques as discussed in Upadhaya et al. [164], Mi et al. [188], Beleites et al. [238], Xu and Liang [239], and Ross et al. [245].

In summary, ML performance should never be reported on training sets, as performance values on these sets are overestimated, especially in the case of overfitting. If a validation set is used to guide model selection or optimization, its performance should not be reported either, as it has become part of the ML optimization process. To properly estimate model performance, independent test sets should be used that have not been part of any ML decision-making process in the given cross-validation fold.
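The reporting rule above can be illustrated with a deliberately tiny model, a single-feature threshold classifier. In the hypothetical sketch below, the training set supplies the candidate thresholds, the validation set selects the best candidate, and only the test set, which played no part in either step, yields the figure to report. The data and function names are made up for illustration.

```python
def accuracy(threshold, xs, ys):
    """Accuracy of the rule 'predict 1 if feature >= threshold' on (xs, ys)."""
    preds = [1 if x >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def select_and_evaluate(train, val, test):
    """Each argument is a (feature_values, labels) pair from disjoint samples."""
    # "Training": candidate models are thresholds placed at the training values
    candidates = sorted(set(train[0]))
    # Model selection on the validation set (ties go to the lowest threshold)
    best = max(candidates, key=lambda t: accuracy(t, *val))
    return {
        "threshold": best,
        "validation_acc": accuracy(best, *val),  # optimistic: used for selection
        "test_acc": accuracy(best, *test),       # the only figure to report
    }


train = ([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
val = ([1.5, 2.5, 3.5, 4.5], [0, 0, 1, 1])
test = ([2.2, 3.8, 4.1, 5.5], [0, 0, 1, 1])
print(select_and_evaluate(train, val, test))
# {'threshold': 3, 'validation_acc': 1.0, 'test_acc': 0.75}
```

On the toy data, the selected threshold reaches 100% validation accuracy but only 75% on the test set. The gap is the optimism that arises when the set used for selection is also used for reporting.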

Outlook

Images are data [8] and data is knowledge; this statement applies to all types of data, not only in medicine. It is our task to turn this knowledge into a patient benefit. One option is to build clinical decision support systems that are trained and validated on these data, and, therefore, embrace non-invasive imaging and non-imaging data potentially linked through machine learning as described here.

Nonetheless, restricted data access and variable data formats challenge the build-up of knowledge databases and the adoption of CDSS in modern healthcare. Across all disciplines and specialties, data come in different formats. In medicine alone, available data come in the form of 2D and 3D images, which may entail serial information and may have additional raw (measured) data attached; data further include clinical tests, blood samples, genomic analyses, and so on. From a patient's perspective, this information is scattered across multiple systems, including the electronic medical record (EMR) system, laboratory systems, picture archiving and communication systems (PACS), and the like. These data must be made available, accessible, and tangible before any type of knowledge generation can be applied.

The sourcing of knowledge, whether to help an individual patient now or to derive new therapeutics for more patients in the future, is inherently linked to the concept of “big data.” Big data, in combination with ML, can help uncover associations between various types of data (assuming that data silos can be torn down and data can be accessed), and it can help build prediction models for diagnosis and disease progression as well as therapy response assessment [247]. The use of big data requires a so-called end-to-end strategy in which “IT departments or groups are the technical enablers; but key executives, business groups, and other stakeholders help set objectives, identify critical success factors, and make relevant decisions” [247]. Such a strategy entails multiple milestones, including the validation of ML algorithms, the standardization of features, and the general accessibility of data (and the willingness to share them).

Novel Heterogeneity Phantoms

In view of the rapid growth of ML and hybrid imaging, the emphasis of image quality naturally shifts from visual interpretation toward quantitative, automated evaluation. This trend requires a change of focus toward standardization efforts across all scales of hybrid imaging. To date, there are no standardized, commercially available physical “heterogeneity” phantoms that allow us to validate and optimize hybrid imaging procedures for ML evaluation on-site or across multiple sites. We believe such efforts are required to support the adoption of ML in the context of hybrid imaging.

Open Data, Open Cloud

The combination of open science and cloud computing is the focus of several public initiatives. In 2016, the European Commission announced the creation of the European Open Science Cloud to promote scientific data access and evaluation with high-performance cloud computing technologies [248]. According to its report, all scientific data created under the umbrella of the Horizon 2020 research and innovation programme will be open data to support the scientific community. In addition, an initiative to accelerate quantum computing technology will be launched by 2018 to support the construction of the next generation of supercomputers. By 2020, a large-scale European high-performance computing, data storage, and network infrastructure will be deployed to establish the foundation for future research and innovation in Europe [248]. Such efforts, in combination with existing open-source data, can help synchronize hypothesis-driven data cohorts for an efficient application of ML approaches aimed at generating knowledge from image data. To date, hybrid image data are not yet widely represented in such data initiatives, but increased awareness and ease of use of data repositories may facilitate growth in accessible hybrid image data.

Doctor in Pocket

Machine learning, together with widely accessible medical big data, promises an era in which computer-aided diagnosis (CAD) and clinical decision support systems (CDSS) will contribute to routine decision making [39]. Artificial intelligence (AI) assistants [249] will be able to process individuals' medical big data and provide personalized, real-time feedback through their smartphones. Such AI assistants could monitor our physiological and mental wellness. While they could learn from massive, population-wide medical information, they could also dynamically adapt their model of each individual, ultimately yielding fully personalized predictive models (Figure 5).

Conclusions

Medical imaging originated from technological progress and innovation proposed by cross-specialists, including physicists, engineers, medical doctors, biologists, mathematicians, chemists, and the like. Medical imaging research has always been a data-driven science. Lately, medical practice, and healthcare in general, has moved into big data, a modernist's view of data-driven science.

Medical big data offers the ability to source unique knowledge from the available data, which, however, are spread across various formats and information contents and which may not be equally accessible and assessable. Joint efforts are needed to turn medical big data into useful medical big data, for example by harmonizing data access and by moving from single-site to multi-centric data cohorts and repositories.

While we have access to computer algorithms that can deduce higher-order information from available data, their validation hinges on the availability of large-scale, high-quality, and standardized reference data. Only recently have we seen a growing awareness of the need for standardized imaging and data collection procedures as prerequisites for the use of machine learning and the construction of clinical decision support systems that can be employed in routine practice.

In this context, hybrid imaging provides a multitude of valuable information that, if combined with complementary non-imaging data, has been shown to yield surprisingly accurate insights into the causes of disease. If adopted carefully in the context of CDSS, hybrid imaging may contribute to improved diagnosis of patients and, in turn, to more efficient therapy planning.

Author Contributions

All authors contributed to drafting and establishing the main skeleton of the paper. LP contributed to the Technology, Data Handling, Joint Data Exploration and Standardization sections. CS contributed to the Technology section. IR contributed to the Standardization section. MH contributed to the Data Handling section. TB was the primary reviewer of the paper and contributed to the Outlook and Conclusions sections. All authors reviewed and approved the paper for submission.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

1. ^AWS Case Study: Arterys. Available online at: https://aws.amazon.com/de/solutions/case-studies/arterys/

2. ^Cancer Imaging Archive. Available online at: http://www.cancerimagingarchive.net/.

3. ^Eurorad: Radiological Case Database. Available online at: http://www.eurorad.org/.

References

1. Rinck PA. Magnetic Resonance in Medicine. 11th ed. The Round Table Foundation: BoD (2017). Available online at: http://www.magnetic-resonance.org/

2. Beyer T, Freudenberg LS, Townsend DW, Czernin J. The future of hybrid imaging—part 1: hybrid imaging technologies and SPECT/CT. Insights Imaging (2011) 2:161–9. doi: 10.1007/s13244-010-0063-2

3. Beyer T, Townsend DW, Czernin J, Freudenberg LS. The future of hybrid imaging—part 2: PET/CT. Insights Imaging (2011) 2:225–34. doi: 10.1007/s13244-011-0069-4

4. Beyer T, Freudenberg LS, Czernin J, Townsend DW. The future of hybrid imaging-part 3: PET/MR, small-animal imaging and beyond. Insights Imaging (2012) 3:189. doi: 10.1007/s13244-011-0136-x

5. Wahl RL, Quint LE, Cieslak RD, Aisen AM, Koeppe RA, Meyer CR. “Anatometabolic” tumor imaging: fusion of FDG PET with CT or MRI to localize foci of increased activity. J Nucl Med. (1993) 34:1190–7.

6. Visvikis D, Hatt M, Tixier F, Cheze Le Rest C. The age of reason for FDG PET image-derived indices. Eur J Nucl Med Mol Imaging (2012) 39:1670–2. doi: 10.1007/s00259-012-2239-0

7. Gillies RJ, Beyer T. PET and MRI: is the whole greater than the sum of its parts? Cancer Res. (2016) 76:6163–6. doi: 10.1158/0008-5472.CAN-16-2121

8. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology (2015) 278:563–77. doi: 10.1148/radiol.2015151169

9. Court LE, Fave X, Mackin D, Lee J, Yang J, Zhang L. Computational resources for radiomics. Transl Cancer Res. (2016) 5:340–8. doi: 10.21037/tcr.2016.06.17