SuperHistopath: A Deep Learning Pipeline for Mapping Tumor Heterogeneity on Low-Resolution Whole-Slide Digital Histopathology Images

High computational cost associated with digital pathology image analysis approaches is a challenge towards their translation in routine pathology clinic. Here, we propose a computationally efficient framework (SuperHistopath), designed to map global context features reflecting the rich tumor morphological heterogeneity. SuperHistopath efficiently combines i) a segmentation approach using the linear iterative clustering (SLIC) superpixels algorithm applied directly on the whole-slide images at low resolution (5x magnification) to adhere to region boundaries and form homogeneous spatial units at tissue-level, followed by ii) classification of superpixels using a convolution neural network (CNN). To demonstrate how versatile SuperHistopath was in accomplishing histopathology tasks, we classified tumor tissue, stroma, necrosis, lymphocytes clusters, differentiating regions, fat, hemorrhage and normal tissue, in 127 melanomas, 23 triple-negative breast cancers, and 73 samples from transgenic mouse models of high-risk childhood neuroblastoma with high accuracy (98.8%, 93.1% and 98.3% respectively). Furthermore, SuperHistopath enabled discovery of significant differences in tumor phenotype of neuroblastoma mouse models emulating genomic variants of high-risk disease, and stratification of melanoma patients (high ratio of lymphocyte-to-tumor superpixels (p = 0.015) and low stroma-to-tumor ratio (p = 0.028) were associated with a favorable prognosis). Finally, SuperHistopath is efficient for annotation of ground-truth datasets (as there is no need of boundary delineation), training and application (~5 min for classifying a whole-slide image and as low as ~30 min for network training). These attributes make SuperHistopath particularly attractive for research in rich datasets and could also facilitate its adoption in the clinic to accelerate pathologist workflow with the quantification of phenotypes, predictive/prognosis markers.


INTRODUCTION
The analysis of histopathological images of surgical tissue specimens stained with hematoxylin and eosin (H&E) remains a critical decision-making tool used for the routine management of patients with cancer and the evaluation of new therapeutic strategies in clinical trials (1)(2)(3). In several precision medicine settings, there is an increasing demand for accurate quantification of histological features. However, in their diagnostic practice, pathologists exercise a predominantly qualitative or semi-quantitative assessment with an inherent degree of inter-and intra-observer variability, which occasionally hampers their consistency (4)(5)(6)(7). In the new era of digital pathology, advanced computational image analysis techniques are revolutionizing the field of histopathology by providing objective, robust and reproducible quantification of tumor components, thereby assisting pathologists in tasks such as tumor identification and tumor grading (8,9). Histopathological image analysis can now be performed in high-resolution H&Estained whole-slide images (WSI) using state-of-the-art deep learning and classical machine learning approaches for single cell segmentation and/or classification. The new ability to map the spatial context of each single cell also opened new avenues for the study of the tumor micro-environment (10)(11)(12)(13)(14)(15)(16), which is key to guide the delivery of precision medicine including immunotherapy.
However, computational pathology is still not widely adopted in the oncological setting. One of the challenges lies in the gigabyte sizes of high-resolution WSIs, which result in computationally expensive approaches. WSIs need to be divided into images patches (typical size: 256x256) before being processed by deep networks such as convolutional neural networks (CNNs) (17). Secondly, single-cell approaches provide markers that are often hard-to-be-evaluated or even interpreted by the pathologists and can be prone to the generalization errors when applied in new unseen dataset. As a result, many promising markers eventually fail to reach the clinic due to a lack of cross-validation in new independent datasets. On the other hand, tissue classification approaches, which target multicellular assemblies and paucicellular areas where individual cells are incorporated into the region segmentation, would be accessible for visual validation by pathologists. Such algorithms would enable the characterization of the distribution and interrelationship of global features that are currently detectable by human perception but not quantifiable without artificial intelligence-(AI-)assisted numerical expression.
Current computed pathology tools primarily focus on individual cell analysis at high-resolution (40x/20x magnification) with limited local context features, whereas pathologists frequently employ collateral information, taking into account the overall tissue microarchitecture. Many established clinical markers are actually identified at low or intermediate magnifications, including tumor architecture-based grading systems (18,19), stroma-tumor ratio (20,21), infiltrating lymphocytes (TILs) (22,23) and necrosis (24)(25)(26). This has not been yet fully emulated by computational pathology methodologies. However, some methods for the classification of tissue components have been suggested either using image patch classification typically with a CNN or pixellevel classification/segmentation typically with a U-Net-like architecture (27), mainly for tasks such as the dichotomized classification of tissue (e.g. cancerous vs non-cancerous) (28,29), the segmentation of a feature of interest (e.g. glands) (16,30) or multi-type tissue classification (9,(31)(32)(33)(34)(35). For segmentation purposes, U-Net-like architectures are usually preferred over CNNs, which have established limitations in conforming to object contours. Yet, CNNs have also resulted in promising segmentation approaches (36)(37)(38) with the enhanced capability of classifying a large number of categories (39). Multi-scale approaches incorporating information from various image resolutions have also been proposed (40)(41)(42)(43). Different approaches have been explored for the classification of epithelium or stroma using superpixels-based segmentation of image patches with either hand-crafted or deep learning features (44,45). Bejnordi and colleagues used a similar method for their multi-scale approach for the classification of tissue or non-tissue components on low resolution images and stroma and background regions from intermediate and high resolution images (46). However, these methods are typically performed on high-magnifications image patches (20-40x and more rarely 10x) and are associated with a high computational cost.
Here, we propose a framework (SuperHistopath), which can map most of the global context features that contribute to the rich tumor morphological heterogeneity visible to pathologists at low resolution and used for clinical decision making in a computationally efficient manner. We first apply the well-established simple linear iterative clustering (SLIC) superpixels algorithm (47) directly on the WSI at low resolution (5x magnification) and subsequently classify the superpixels into different tumor region categories using a CNN based on pathologists' annotations. SuperHistopath particularly capitalizes on: i. the use of superpixels which provide visually homogeneous areas of similar size respecting the region boundaries and avoid the potential degradation of classification performance associated with image patches, (no matter how small) spanning over multiple tissue categories. ii. the use of CNN necessary to accurately classify and map the multiple tissue categories that constitute the rich and complex histological intratumoral heterogeneity. iii. the computational efficiency, faster processing speed and lower memory requirements associated with processing the WSI at low resolution.
We applied SuperHistopath to H&E-stained images from three different cancer types: clinical cutaneous melanoma, triple-negative breast cancer and tumors arising in geneticallyengineered mouse models of high-risk childhood neuroblastoma.

Datasets
All digitized whole-slide images (WSI) used in this study were H&E-stained, formalin-fixed and paraffin-embedded (FFPE) sections, and scaled to 5x magnification as presented in Table 1 (image sizes at 5x varied from~8000x8000 to ~12000x12000 pixels). We applied our framework to clinical patient samples of cutaneous melanoma and triple-negative breast cancer, in addition to tumor samples from transgenic mouse models of childhood neuroblastoma. Both the Th-MYCN and Th-ALK F1174L /MYCN mouse models have been shown to spontaneously develop abdominal tumors, which mirror the major histopathological characteristics of childhood high-risk disease (50,51).

Region Classification
First, each dataset was pre-processed using the Reinhard stain normalization (52) to account for stain variabilities that could affect classification. Then, all images were segmented using the simple linear iterative clustering (SLIC) superpixels algorithm, which groups together similar neighboring pixels. With our pathologist's input, we selected the optimal number of superpixels by visually identifying a superpixel size that capture only homogeneous areas and adhere to image boundaries. This is a critical step for ensuring accurate tissue segmentation, and therefore, classification ( Figure 1). The number of superpixels was adapted for each image to ensure a homogenous superpixel size across the datasets and was automatically set based on the image size according to Equation 1 (53).
where Ni is the number of superpixels in the i th image, Si is the size of image i in pixels, and U is a constant held across all images that defined the desired superpixels size.
The SLIC algorithm inherently provides a roughly uniform superpixel size. Setting U = 1500, Equation 1 gave a mean superpixels size of 51 × 51 pixels, equivalent to an area of approximately 117 × 117 mm 2 . Bilinear interpolation was subsequently use to resize each superpixel to a fixed size of 56 x 56 or 75 x 75 pixels (the minimum input size for inceptionlike network architectures).
Region annotations were provided by a senior pathologist with over 20 years of experience for the melanoma and breast cancer clinical datasets, and a senior pediatric neuropathologist with over 20 years of experience for the neuroblastoma mouse datasets. For training and testing, superpixels were assigned to each category based on their isocenter location within the annotated regions. Note that region annotations for our algorithm do not need to delineate boundaries as illustrated in Figure 1B.
The numbers of clinically relevant tissue categories, number of WSIs and superpixels used for training and testing are summarized for each tumor types in Table 2. Standard image  augmentations, such as rotations (90°, -90°, 180°), flips (horizontal and vertical), and contrast (histogram equalization) were performed in each case to capture more variation and even out the training dataset imbalances.

Training of the Convolutional Neural Networks
Our custom-designed CNN for superpixel classification consists of 6 convolutional layers (32,32,64,64,128, 128 neurons, respectively) of 3 x 3 filter size and 3 max-pooling layers, followed by a "flatten" layer and a dense layer of 256 neurons ( Figure 2). A superpixel RGB image (post-interpolation) was used as input into the network and normalized from range 0-255 to range 0-1 using the maximum value. The output of the network was a label assigned to each superpixel based on which region category it belonged to. After empirical experimentation, a ReLU activation function was used in all layers except for the last layer where standard softmax was used for classification. The weights incident to each hidden unit were constrained to have a norm value less than or equal to 3 and a dropout unit of 0.2 was used before every max-pooling operation to avoid overfitting (54). The weights of the layers were randomly initialized using "Glorot uniform" initialization (55), and the network was optimized using the Adam method (56) with a learning rate of 10 -3 and a categorical cross-entropy cost function. The number of trainable parameters for our custom-made network is~1.9 M. The network was implemented in python (v. 3.6.5) using the Keras/Tensorflow libraries (v. 2.2.4/1.12.0, respectively).
To choose the best network for our framework, we tested other known neural network architectures as implemented in the Keras framework, including InceptionV3 (57), Xception (58), InceptionResNetV2 (59), and ResNet (60). We initialized the weights using the pre-trained ImageNet weights. To optimize each network, we excluded the final classification layer, and added three additional layers, i) a global average pooling layer, ii) a dense layer of 256 neurons with ReLu activation, constrained to have a norm value less than or equal to 3, and iii) a dense layer tailored to the number of classes of each cancer type using the softmax function for classification.
For inception-like architectures (Inception v3, Inception ResNetV2, Xception) only superpixels of size 75 x 75 were used. We trained all the networks for 50 epochs using batch sizes of 150 and 256 for superpixels of sizes 75 x 75 and 56 x 56, respectively, and kept the models with the highest validation accuracy.
The Xception and custom-made networks were re-trained from the beginning for each cancer type, without applying any further changes.

Application of SuperHistopath for the Quantification of Clinical Features of Interest
In the melanoma dataset, we calculated the number of pixels belonging to each classified category. For each patient we derived i) the ratio of pixels classified as stroma region to all pixels in tumor compartments, and ii) the ratio of pixels classified as clusters of lymphocytes to all pixels in tumor compartments; we evaluated the prognostic value of these quantitative indices using survival analysis. Patients were divided into high-and low-risk groups based on split at the median value of all scores to ensure both groups were of similar size. Kaplan-Meier estimation was used to compare overall survival in the 127 patients. Differences between survival estimates were assessed with the log-rank test and hazard ratios were calculated using Cox's proportional-hazard regression.
In the neuroblastoma dataset, we evaluated the differences in phenotype between the Th-ALK F1174L /MYCN (n=7) and Th-MYCN tumors (n=6) by quantifying the proportion of pixels classified by our SuperHistopath as regions rich in undifferentiated neuroblasts, differentiating neuroblasts, tissue damage (necrosis/ apoptosis) hemorrhage and clusters of lymphocytes. Note that i) we did not quantify stroma in these tumors as they faithfully mirror the stroma-poor phenotype which define high-risk disease ii) lymphocytes clusters universally correspond to encapsulation of lymph node by the tumor, rather that tumor infiltrates, consistent with the "cold" immune phenotype of high-risk disease. We focus on identifying any significant difference in the ratio of differentiation or the ratio of hemorrhagic regions to all tumor compartments between the two tumor types using the Mann-Whitney U test, with a 5% level of significance.

Melanoma
We first developed and evaluated our framework on the H&Estained, FFPE sections of clinical specimen of cutaneous melanoma scaled to 5x magnification. Figure 1 shows the results of the segmentation using the simple linear iterative clustering (SLIC) superpixels algorithm, which groups together similar neighboring pixels.
The optimized Xception network achieved the highest score and classified the melanoma sample regions into 6 predefined tissue categories of interest: tumor tissue, stroma, cluster of lymphocytes, normal epidermis, fat, and empty/white space with an overall accuracy of 98.8%, an average precision of 96.9%, and an average recall of 98.5% over 14,092 superpixels in a separate test set of five images ( Tables 3, 4). Our custom CNN also achieved comparable performance to the state-of-the-art networks with an overall accuracy of 96.7%, an average precision of 93.6%, and an average recall of 93.6% (Figure 2, Supplementary Table 1). The confusion matrices for the XCeption and our custom CNN networks are presented in Table 4 and Supplementary Table 1, respectively. Figure 3 shows qualitative results of our approach's regional classification in representative melanoma WSIs using the optimized Xception network.

Breast Cancer
SuperHistopath classified sample regions into 6 predefined tissue categories of interest: tumor, necrosis, stroma, cluster of The lymphocytes, fat, and lumen/empty space with an overall accuracy of 93.1%, an average precision of 93.9%, and an average recall of 93.6% using Xception and 91.7%, 92.5%, 91.8% respectively using our custom-made CNN over 10,349 superpixels in the independent test set of five images. The confusion matrices for the XCeption and our custom CNN networks are presented in Table 5 and Supplementary Table  2, respectively. Figure 4 shows qualitative results our approach's regional classification in representative triple-negative breast cancer WSIs.   Table 6 and Supplementary Table 3, respectively. Figure 5 shows qualitative results of our approach's regional classification in representative WSIs of neuroblastoma arising in the Th-MYCN mouse model.

SuperHistopath Pipeline for the Analysis of Low-Resolution WSI Affords Significant Speed Advantages
The average time for the SLIC superpixels algorithm to segment a WSI in 5x magnification was < 2 min using a 3.5 GHz Intel core i7 processor. The average time for both the Xception and our custom-made CNN network to classify every superpixel in the  images was 1-2 min using the same processor. A quick convergence of the networks (around epoch 30) was observed in all cases, which needed~3 h for Xception and only~30 min for our custom-made CNN using a Tesla P100-PCIE-16GB GPU card, and therefore the latter was used for experimenting.

Stroma-to-Tumor Ratio and Clusters of Lymphocytes Abundance as Predictive Markers of Survival in Melanoma
We first use SuperHistopath to quantify both the stroma-totumor ratio and the immune infiltrate, which have both shown to provide prognostic and predictive information in patient with solid tumors, including melanoma (20,21,23). The important role of immune hotspots has been established based on density analysis of single cell classification of lymphocytes in highresolution images (61,62). Here, we demonstrate in our melanoma dataset of 127 WSIs i) that a high stromal ratio as identified in low resolution WSIs is a predictor of poor prognosis (SuperHistopath: p = 0.028, Coxph-Regression [discretized by median]: HR = 2.1, p = 0.0315; Figure 6A) and ii) that clusters of lymphocytes hold predictive information in our melanoma dataset, with a high lymphocyte ratio being an indicator of favorable prognosis [SuperHistopath: p = 0.015, Coxph-Regression (discretized by median): HR = 0.4, p = 0.018; Figure 6B]. Pearson's correlation showed no significant  correlation between stromal ratio and clusters of lymphocytes ratio (r = -0.13, p = 0.13), and between absolute sizes of stroma and clusters of lymphocytes (r = 0.13, p=0.11). Taken together, our data, captured from low resolution (5x) WSIs, are consistent with those extracted from single-cell analysis in high-resolution WSIs (53).

Necrosis Quantification
We use the SuperHistopath to quantity tumor necrosis in our breast cancer and childhood neuroblastoma preclinical datasets. Tumor necrosis, defined as confluent cell death or large area of tissue damage hold predictive and prognostic information, both at diagnosis and after chemotherapy, in many solid tumors including breast cancer and childhood malignancies (24-26, 63, 64). While visible at 5x objective lens magnification, its quantification can often be a challenging task even for experienced pathologists. Here, we show that SuperHistopath can provide satisfactory quantification of necrosis in clinical breast cancer samples by distinguishing from stroma with high specificity (91.5%) and satisfactory precision (79.5%) and in the high-risk neuroblastoma mouse models with high precision and specificity (93.5% and 98.9% respectively).

Quantification of Neuroblastoma Differentiation
We used SuperHistopath to quantify the phenotype of MYCNdriven transgenic mouse models of high-risk stroma-poor neuroblastoma. We show that SuperHistopath can identify areas of differentiation, a critical feature for the stratification of children neuroblastoma, with both high precision and specificity (100% and 96.9% respectively). SuperHistopath also showed that expression of ALK F1174L mutation significantly shift the MYCNdriven phenotype from poorly-differentiated and hemorrhagic phenotype (Th-MYCN: 1.8 ± 1.3% differentiating area and 29.2 ± 6.7% hemorrhage, Figure 6C) into a differentiating phenotype also characterized by the almost complete abrogation of the hemorrhagic phenotype (Th-ALK F1174L /MYCN: 20.3 ± 3.1% differentiating area and 0.2 ± 0.1% hemorrhage, p=0.0003 and p=0.0008 respectively, Figure 6D) as previously demonstrated (51,65).

DISCUSSION
In this study, we implemented SuperHistopath: a digital pathology pipeline for the classification of tumor regions and the mapping of tumor heterogeneity from low-resolution H&Estained WSIs, which we demonstrated to be highly accurate in three types of cancer. Combining the application of the SLIC superpixels algorithm directly on low magnification WSIs (5x) with a CNN architecture for the classification of superpixels, contributes to SuperHistopath computational efficiency allowing for fast processing, whilst affording the quantification of robust and easily interpretable clinicallyrelevant markers. Applying our computational approach on low-resolution images leads to markedly increased processing speed, for both the classification of new samples and network training. Here, we chose the (5x) magnification as a compromise between tumor structures visibility and computational cost. Specific metrics such as stroma-to-tumor ratio could potentially be derived from images at even lower magnifications (e.g. 1.25x) as recently shown (53). Digital histology images are conventionally processed at 40x (or 20x) magnification where cell morphology is most visible. At those resolutions, WSIs are large (representative size at 20x: 60000 x 60000 pixels), requiring of a lot of memory and images to be divided into patches (tiles) for processing. Under these conditions, the training of new networks for cell segmentation and classification typically requires days and the application to new WSI samples can take hours prior to code optimization. In contrast, the training of our neural network until acceptable convergence needed as little as~30 min and application on new samples~5 min (for both superpixel segmentation and classification) in our study. High-resolution images are essential when studying cell-to-cell interactions, however we show that the processing of low resolution images is appropriate for the extraction of specific global context features.
Furthermore, SuperHistopath combines the main advantages of regional classification and segmentation approaches. On one hand, classification approaches applied on smaller patches resulting from splitting WSIs allow the use of CNN for the robust classification of many categories necessary to capture intratumor heterogeneity (39), yet at the expense of higher risk of misclassification, especially close to regional boundaries where an image patch, regardless of its size, may contain multiple tumor components. Overlapping (sliding) window approaches can improve the issue, yet at an increased computational cost. On the other hand, segmentation approaches such as U-Net-like architectures can resolve the regional boundaries issue but appear to work better for few classes, typically two. SuperHistopath efficiently combines the use of a segmentation approach using superpixels to adhere to region boundaries with CNN classification to cover the rich tumor histological heterogeneity (here 6-8 region categories depending on the cancer type).
Our method also markedly simplifies and accelerates the process of preparing ground-truth (annotations) datasets as i) the use of superpixels alleviate the need for careful boundary delineation of the tumor components of interest ( Figure 1B), a cumbersome and time-consuming process necessary for using U-Net-like architectures and ii) each annotated region contains large numbers of superpixels facilitating the collection of the large datasets traditionally required by deep learning methods.
The appropriate choice of superpixel size is crucial to warrant both accurate tissue segmentation and classification. Equation 1 ensured a uniform superpixel size for every whole-slide image regardless of their original size. The main considerations for choosing superpixels size (i.e. setting the constant U) is to ensure that they only contain a single tissue type, while being large enough to contain sufficient tissue information. In our study, we found that classification is not sensitive to small changes of U.
However larger superpixels (U > 1750) did not adhere well to the tissue boundaries, whereas smaller superpixels (U < 1250) indeed led to a slight decrease in classification performance.
Many promising computational pathology-derived biomarkers ultimately fail to translate in the clinic due to their inherent complexity and the difficulty for pathologists to evaluate them in new datasets. In this proof-of-concept study, we showed that SuperHistopath can quantify well-understood features/markers already used, albeit only qualitatively or semi-quantitatively, by pathologists, including the stroma-to-tumor ratio, lymphocyte infiltration, tumor necrosis, and neuroblastoma differentiation. We also show that SuperHistopath-derived results corroborated those obtained from single-cell analysis on high-resolution samples (53). The computational efficiency of SuperHistopath, combined with the simple superpixels-enabled data collection, could facilitate its adoption in the clinic to accelerate pathologist workflow, could assist in intra-operative pathological diagnosis and should facilitate working with large datasets in clinical research.
Moving forward, we plan to expand the types of global context features extractable from SuperHistopath in more cancer types. We will also evaluate the accuracy of SuperHistopath on digitized frozen tissue sections to demonstrate its potential to assist in the rapid intra-operative pathological diagnostic. We will also update our previous framework (SuperCRF) which incorporates region classification information to improve cell classification (53) using SuperHistopath. Together both SuperHistopath and SuperCRF would provide invaluable tools to study spatial interactions across length scales to provide a deeper understanding of the cancerimmune-stroma interface, key to further unlock the potential of cancer immunotherapy (17).
In this proof-of-concept study, we applied our method to three cancer types with disparate histology without any changes (just retraining). While the approach could thus be virtually extended to any type of cancer, improvement could be made tailored to a specific global feature, cancer type or dataset and could include further exploring i) the use of SVM to combine the CNN-extracted features with handcrafted ones, ii) the use of other image color spaces which has been shown to improve classification in certain cases (66) and iii) alternative superpixel algorithms such as the efficient topology preserving segmentation (ETPS) algorithm (67). Additionally, further improvement of this proof-of-concept framework could be sought via experimentation with hyperparameter tuning, or the use of other custom and wellestablished architectures (59,68). Since superpixels only capture small homogeneous areas, combination with other approaches such as classification of larger image patches with a deepCNN or U-net-like architectures might be more appropriate for the single purpose of segmenting some large and multi-component tumor structures, e.g. certain types of glands (16).
To conclude, our novel pipeline, SuperHistopath can accurately classify and map the complex tumor heterogeneity from lowresolution H&E-stained histology images. The resulting enhanced speed for both training and application (~5 min for classifying a WSI and as low as~30 min for network training) and the efficient and simple collection of ground-truth datasets make SuperHistopath particularly attractive for research in rich datasets and would facilitate its adoption in the clinic to accelerate pathologist workflow in the quantification of predictive/prognosis markers derived from global features of interest.