Impact Factor 5.201 | CiteScore 5.8
More on impact ›

Original Research ARTICLE

Front. Cell Dev. Biol., 29 October 2020 |

Integrative Models of Histopathological Image Features and Omics Data Predict Survival in Head and Neck Squamous Cell Carcinoma

Hao Zeng1†, Linyan Chen1†, Yeqian Huang2, Yuling Luo2 and Xuelei Ma1*
  • 1State Key Laboratory of Biotherapy, Department of Biotherapy, Cancer Center, West China Hospital, Sichuan University Collaborative Innovation Center, Chengdu, China
  • 2West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China

Background: Both histopathological image features and genomics data were associated with survival outcome of cancer patients. However, integrating features of histopathological images, genomics and other omics for improving prognosis prediction has not been reported in head and neck squamous cell carcinoma (HNSCC).

Methods: A dataset of 216 HNSCC patients was derived from the Cancer Genome Atlas (TCGA) with information of clinical characteristics, genetic mutation, RNA sequencing, protein expression and histopathological images. Patients were randomly assigned into training (n = 108) or validation (n = 108) sets. We extracted 593 quantitative image features, and used random forest algorithm with 10-fold cross-validation to build prognostic models for overall survival (OS) in training set, then compared the area under the time-dependent receiver operating characteristic curve (AUC) in validation set.

Results: In validation set, histopathological image features had significant predictive value for OS (5-year AUC = 0.784). The histopathology + omics models showed better predictive performance than genomics, transcriptomics or proteomics alone. Moreover, the multi-omics model incorporating image features, genomics, transcriptomics and proteomics reached the maximal 1-, 3-, and 5-year AUC of 0.871, 0.908, and 0.929, with most significant survival difference (HR = 10.66, 95%CI: 5.06–26.8, p < 0.001). Decision curve analysis also revealed a better net benefit of multi-omics model.

Conclusion: The histopathological images could provide complementary features to improve prognostic performance for HNSCC patients. The integrative model of histopathological image features and omics data might serve as an effective tool for survival prediction and risk stratification in clinical practice.


Head and neck cancer (HNC) comprises a variety of carcinomas that originate from head and neck region, including the nasal cavities and sinuses, oropharyngeal cavities, larynx, major and minor salivary glands (Lydiatt et al., 2017). Moreover, HNC is the sixth most common cancer with yearly incidence of 500,000–600,000 cases worldwide (Suh et al., 2014). The head and neck squamous cell carcinoma (HNSCC) accounts for more than 90% of the cases (Suh et al., 2014). Tobacco, alcohol consumption and human papillomavirus (HPV) infection are common risk factors related with cancer incidence (Chaturvedi et al., 2008; Hashibe et al., 2009). The 5-year mortality rate has remained flat at around 50% and not been improved by significant progress in treatment regime (Chiesa et al., 2016). Prognosis prediction represents a good opportunity to improve patient survival, because prognostic markers contribute to the risk stratification and individualized treatment protocol. Only traditional clinical predictors such as tumor stage and tumor depth are unable to meet the growing demand of precision oncology (Biankin et al., 2015). Therefore, it is of crucial importance to apply more effective prognostic markers and models for patients with HNSCC.

The histopathological images obtained by biopsy or resection of lesions are widely used in the definitive diagnosis, staging and prognosis of cancer patients. In recent years, the computer-aided images analysis systems have been applied to assess digital pathological images, with the advantages of high accuracy, rapidity and consistency, which can make up for the shortage of manual evaluation (Zhang et al., 2015). The extracted histopathological image features (HIF) encompass multiple morphological and histological information, such as cell shape, size, texture patterns of nuclei and cytoplasm (Soliman, 2015). Although these features cannot be recognized by pathologists with visual inspection, previous studies have shown the significant prognostic value of HIF in several cancers, including breast cancer, lung cancer and brain tumor (Sertel et al., 2009; Kong et al., 2013; Wang et al., 2013; Chen et al., 2015; Yu et al., 2016).

In addition to pathological images, other omics profiles including genomics, transcriptomics and proteomics have also been widely used for risk stratification and survival prediction of cancer patients (Wallner et al., 2006; Yanaihara et al., 2006). For example, enhanced TP53 mutation, gene duplication and 3p loss were found in recurrent and metastatic HNSCC with primary HPV infection, while TERT promoter mutation was more frequent in HPV-negative cohort (Morris et al., 2017). The TRAF3 deletion, E2F1 amplification and PIK3CA mutation were related with abnormal activation of NF-κB signaling and other carcinogenic pathways in HPV-positive HNSCC (Cancer Genome Atlas Network, 2015). Moreover, the TP53 gain-of-function (GOF) variant and mTOR pathway activation were predictive of worse survival and early progression in HPV-negative HNSCC patients (Niehr et al., 2018).

However, given the heterogeneity of cancer patients and complexity of survival prediction, the research work is far from stopping. The interconnections between histopathology and omics and how to integrate these features for better outcome prediction and personalized treatments still need further exploration. Previous study has revealed a significant association between several gene expressions (such as HYAL2 and HLA-DRA) and morphological features of nuclei texture in liver hepatocellular carcinoma (Zhong et al., 2019). Also, the obvious interconnections of TP53 mutation and histological features of nuclei and cytoplasm were reported in lung adenocarcinoma (Yu et al., 2017). In addition, some studies established integrative models based on omics and histopathological image data in liver cancer, lung cancer, renal cancer and breast cancer, which showed an improved prognostic accuracy than individual factors (Cheng et al., 2017; Yu et al., 2017; Sun et al., 2018; Zhong et al., 2019). These results indicated the widespread application value and development prospects of histopathological images and omics data for predicting prognosis.

After literature review, we were of opinion that there is still room for improvement. Some studies only focused on single omics profile such as genomics, transcriptomics or proteomics, and lacked the comprehensive survival analyses of each omics combined with histopathological image features. Moreover, the prognostic performance of models that integrate histopathological and omics features for HNSCC patients is still unclear. Therefore, in this study, we aimed to evaluate and compare the prognostic role of histopathological images, genomics, transcriptomics and proteomics in HNSCC patients. Furthermore, different combinations of multi-omics models were established to improve prognostic accuracy, and to highlight the contribution of histopathological images in prognosis modeling.

Materials and Methods

The overall flowchart of image features extraction and multi-omics prognostic models establishment was presented in Figure 1. The histopathological images were divided into small sub-images and analyzed by CellProfiler to extract image features. Afterward, the random forest (RF) algorithm was used to combine images features and omics to build prognostic models and generate the average prediction accuracy. Finally, we applied time-dependent receiver operating characteristic curve, Kaplan-Meier survival curve, and decision curve analysis to estimate and compare the prognostic values between models. The details of each part were described in the following sections.


Figure 1. The workflow of data analysis and integration. (A) The whole-slide histopathological images of head and neck squamous cell carcinoma were cropped into small sub-images of 1,000 × 1,000 pixels. Then we excluded the sub-images containing white space more than 50% and selected 20 sub-images for each patient. Next, CellProfiler estimated the images and obtained mean value of image features related to shape, intensity and texture. (B) We integrated features of histopathological images, genomics, transcriptomics and proteomics to generate improved prognostic models by random forest method with in training set, then evaluated the model predictive performance in validation set.

Data Acquisition and Images Segmentation

We obtained a dataset consisting of clinical, genetic and transcriptomics information of HNSCC patients from the Cancer Genome Atlas (TCGA) data portal1. The corresponding protein profile via reverse phase protein array (RPPA) were downloaded in the Cancer Proteome Atlas (TCPA) repository2. The corresponding hematoxylin and eosin (H&E) histopathological images were downloaded from the Cancer Imaging Archive (TCIA) portal3. Since the whole-slide images (20× or 40× magnification) were too large to extract features, we performed images segmentation by the Openslide Python library (Goode et al., 2013) to facilitate subsequent analyses. Firstly, 216 whole-slide images were divided into 341,649 small sub-images of 1,000 × 1,000 pixels, and changed into tiff format from svs format. Next, we excluded the sub-images containing white space more than 50%. Moreover, for each patient, 20 sub-images were randomly included to reduce sample selection bias and decrease calculation amount.

Histopathological Image Features Extraction

We used CellProfiler4 to automatically measure images and extract histopathological features (Carpenter et al., 2006). The images processing and measurement modules of CellProfiler transformed the color images stained by hematoxylin and eosin into grayscale images, then obtained 10 aspects of image features including area occupied, correlation, granularity, image intensity, image quality, object intensity, object neighbors, object radial distribution, object size shape and texture. These features focus on objective image information, which are different from accustomed pathological characteristics (e.g., cellular pleomorphism, nuclear atypia, and mitoses) recognized by visual inspection of pathologists. For instance, object size shape outputs several cell-level features containing area, perimeter, form factor (4πarea/perimeter2), eccentricity, lengths of major axis and minor axis, Euler number, Zernike shape features and so on. Texture evaluation module of CellProfiler provides information about variations in the spatial distribution of intensities of grayscale images, including Haralick’s features and Gabor “wavelet” features (Haralick et al., 1973). Image intensity and object intensity describe total pixel intensities in images or specific objects (e.g., nuclei or cells), respectively. Finally, we extracted 593 quantitative image features for each sub-image, then calculated the average values of 20 sub-images for each patient.

Statistical Analysis

1. Survival analysis: Patients were divided into two groups based on the median value of each histopathological image feature. The hazard ratio (HR) and 95% confidence interval (CI) for overall survival (OS) were calculated by univariate Cox regression analysis. The least absolute shrinkage and selection operator (LASSO)-Cox regression method was also utilized to show significant image features (Tibshirani, 1997). Then Kaplan-Meier survival curve and log-rank test compared the differences of survival results between two groups. The p-value < 0.05 was regarded as statistically significant.

2. Feature selection: We first randomly divided the HNSCC patients into training (n = 108) or validation (n = 108) sets. The genomics data contained 14,794 features and the transcriptome data contained 19,754 features. By contrast, histopathological images contained 593 features and the proteomics contained 161 features. Therefore, in the training set, we included all features of histopathological images and proteomics, while conducted preliminary screening of genomics and transcriptomics to reduce their dimensionality. The 100 most common somatic mutations were used for further analyses. Next, we defined patients with a survival time more than 60 months as the long-term survival group, while died patients with a survival time of 1–12 months were considered as the short-term survival group. The R DESeq2 package was used for the normalization and analysis of differently expressed genes (DEGs) between groups in training set. Then 100 most significant DEGs were applied to predict survival. The feature selection could reduce the potential bias caused by large difference in feature numbers among omics, and may reduce potential confounders (e.g., low frequency mutations or non-significant expressed genes).

3. Gene set enrichment analysis: To find the differences of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways between short-term and long-term survival groups, we used the Gene Set Enrichment Analysis (GSEA) to sort DEGs according to the degree of differential expression, and examined the enriched gene sets in two groups (Subramanian et al., 2005). Statistical significance was defined as p < 0.05 or false discovery rate q < 0.25.

4. Integrative prognostic models: Each type of data (histopathological image features/HIF, genomics, transcriptomics, and proteomics) and various fusions of multiple features (HIF + genomics, HIF + transcriptomics, HIF + proteomics, HIF + omics) were involved to evaluate and compare the usefulness in prognosis modeling. In the training set, we applied the random forest (RF) method to build prognostic models via R randomForestSRC package (Breiman, 2001; Ishwaran et al., 2014). RF is a widely used machine-learning method in high-dimensional data processing, which can handle thousands of input variables at the same time, evaluate the predictive ability of each feature and exclude uncorrelated ones. At the same time, it can use internal cross-validation to generate unbiased estimation of generalization error and ensure high accuracy. The RF classifier with 1,000 decision trees and 10-fold cross-validation were used in training set. Next, we calculated the area under the curve (AUC) of time-dependent receiver operating characteristic curve (ROC) to verify the performance and robustness of new models in validation set. Moreover, based on the median value of risk score estimated from models, patients of the validation set could be divided into high-risk and low-risk groups. Then we conducted Kaplan-Meier analysis and log-rank test to evaluate the prediction ability of models. Decision curve analysis (DCA) was performed to measure the net benefits of each model based on 5-year OS (Vickers et al., 2008).


Patients Characteristics

A dataset of 216 HNSCC patients (154 males and 62 females) with data of histopathological images and other omics from TCGA project was included (Table 1). The median age at initial diagnosis was 62 years (range 19–90 years) of patients. This cohort comprised squamous cell carcinoma of oral cavity (61 tongue, 6 alveolar ridge, 6 buccal mucosa, 25 floor of mouth, 6 hard palate, and 39 non-specific lesions), larynx (57 patients), tonsil (13 patients), and hypopharynx (3 patients). There were 106 patients died (50 in training set and 56 in validation set) during follow-up, the median survival time was 41.8 months (range 3.5–175.1 months) for alive patients and 15.2 months (range 0.1–213.9 months) for died patients. Moreover, 83 patients had tumor progression including locoregional recurrence, distant metastasis and new primary malignancy. Chi-squared analyses and t-tests showed no statistically significant differences in age, gender, tumor types, cancers stage, cancer status and survival time between training and validation groups.


Table 1. Demographic and clinical characteristics of patients.

Prognostic Value of Histopathological Image Features

To estimate the association between individual histopathological image features and survival results, we firstly divided patients into two groups according to median values of each feature. The results of univariate Cox analyses showed that 163 image features were significantly predictive of OS (p < 0.05, Supplementary Table 1). We also presented 20 representative image features with the most significant differences (Figure 2A), which effectively separated two survival groups. Moreover, after the LASSO-Cox regression analysis, eight histopathological features (four Zernike shape features, three granularity features, and one cells intensity characteristic) were selected. More specifically, Zernike features are a series of 30 shape features based on Zernike polynomials from order 0 to order 9 (Li et al., 2009). Granularity is a texture measurement to show the matching degree between structural elements and images texture (Vincent, 2000). Intensity-Mass Displacement describes the distance between gravity center of gray-level and binary representation of cells. The Kaplan-Meier survival curves of four image features showed the significant differences between high-level and low-level features (Figure 2B). We also analyzed the relation between HPV status and image features in 44 patients (33 HPV and 7 HPV+) by Wilcoxon rank sum test. Then 200 features had different distributions between groups, and four most significant features were provided in Figure 2C.


Figure 2. Univariate survival prediction using histopathological image features. (A) Survival difference between groups separated by median values of image feature in Cox regression analyses. (B) Kaplan-Meier curves for “Mean_Cells_AreaShape_Zernike_9_1,” “Median_Cells_AreaShape_Zernike_6_2,” “Median_Cells_Intensity_MassDisplacement,” and “StDev_Cells_Granularity_9.” (C) The different distribution of image features between HPV- and HPV + patients.

Integrative Model of Histopathological Image Features With Genomics

To decrease the dimension of genomics data and increase stability of analyses, we examined the gene mutation status in the training set, and involved 100 most common somatic mutations in prognostic models (Supplementary Table 2). The waterfall plot showed 15 most frequently altered genes (Figure 3A). Previous studies also reported frequent mutations of TP53, CDKN2A, PIK3CA, NOTCH1, and NSD1 in HNSCC (Huang et al., 2019). Among them, the tumor suppressor protein p53 (TP53) mutation is commonly detected in HNSCC with report rate of 50–80% (Poeta et al., 2007), which can inhibit regulatory function of cell cycle, DNA repair and apoptosis (Vogelstein et al., 2000). In addition, mutations of TP53, p16INK4a, and overexpression of cyclin D1 and MET were regarded as poor predictors of survival and cancer progression in HNSCC patients (Bova et al., 1999; Muzio et al., 2006; Poeta et al., 2007).


Figure 3. Prognostic model integrating histopathological image features with genomics. (A) The waterfall plot of 15 common somatic mutant genes in training set. (B) The 1-, 3-, and 5-year area under the time-dependent receiver operating characteristic curve (AUC), and (C) Kaplan-Meier survival curves showed improved predictive performance of integrative histopathology + genomics model (HIF + G) than histopathological image features alone (HIF) or genomics alone (G) in validation set.

Compared with traditional ROC, the time-dependent ROC is more suitable for time-to-event outcome and can comprehensively describe the predictive models (Kamarudin et al., 2017). In the validation set, we found that histopathological image features (HIF) model reached better AUCs of 1-year (0.711 vs. 0.605), 3-year (0.747 vs. 0.574), and 5-year (0.784 vs. 0.560) than genomics model (G). Furthermore, the model (HIF + G) including image features and genomics mutations had improved predictive accuracy (AUC = 0.751, 0.807, 0.830) than models using HIF or genomics alone (Figure 3B). Afterward, we divided patients into high-risk and low-risk groups by median value of risk score predicted from each model. The integrative model (HIF + G) showed better performance for prognosis (HR = 5.49, 95%CI: 3.17–10.90, p < 0.001, Figure 3C) than single-omics in HNSCC patients (Table 2).


Table 2. Predictive performance of prognostic models.

Integrative Model of Histopathological Image Features With Transcriptomics

Besides genomics analysis, transcriptomics is also an important mean to estimate cells phenotype and function, and provides additional information of tumor features. To reduce the dimensionality, some patients of training set were classified into two groups based on survival status (12 months ≥ uncensored OS ≥ 1 month vs. OS ≥ 60 months), and 100 differently expressed mRNA genes (padj < 0.05) between groups were selected (Supplementary Table 3). Moreover, the GSEA of mRNA sequencing data showed that three KEGG pathways were enriched in the short-term survival group (Figure 4A). Among them, the overexpression of vascular epithelial growth factor (VEGF) can strongly induce angiogenesis in hypoxia environment of tumors, and are related with enhanced risk of death in HNSCC (Kyzas et al., 2005; Haase, 2009). Therefore, the up-regulation of VEGF signaling pathway may indicated the necessity of VEGF-targeting therapy (e.g., tyrosine kinase inhibitors).


Figure 4. Prognostic model integrating histopathological image features with transcriptomics. (A) Three representative signaling pathways in short-term survival patients of training set by Gene Set Enrichment Analyses (GSEA). (B) The multivariate model of histopathological and transcriptomics features (HIF + RNA) reached higher 1-, 3-, and 5-year AUCs, and (C) more significant survival difference of Kaplan-Meier curves than models of image features alone (HIF) or transcriptomics alone (RNA) in validation set.

Next, in the validation set, the transcriptomics features (RNA) yielded a good predictive performance with 1-, 3-, and 5-year AUC of 0.713, 0.732, and 0.632, which was better than genomics, but not more significant than histopathological features (Figure 4B). By combination of transcriptomics and image features, the integrative model (HIF + RNA) increased 1-year AUC to 0.775, 3-year AUC to 0.827, and 5-year AUC to 0.837. Similar results were also revealed in Kaplan-Meier survival curves (Figure 4C), the HIF + RNA model had more significant prognostic value for OS (HR = 6.26, 95%CI: 2.79–9.71, p < 0.001).

Integrative Model of Histopathological Image Features With Proteomics

We included proteomics data of TCPA repository via RPPA technology, which is a cost-effective method to analyze the expression and variation of marker proteins in the samples (Li et al., 2013). Totally 151 patients with protein and histopathological profiles were eligible for analyses. As shown in Figure 5C, the 5-year AUC was increased to 0.817 by incorporating image features and proteomics (HIF + P) compared with AUC of 0.772 and 0.614 for proteomics or image features alone. The 1- and 3-year AUCs were also improved when using combined features (Figures 5A,B). Moreover, the high-risk patients based on risk stratification of integrative model (HIF + P) were significantly associated with worse survival (HR = 3.98, 95%CI: 1.79–6.41, p < 0.001, Figure 5D).


Figure 5. Prognostic model integrating histopathological image features with proteomics. (A–C) Predictive power of histopathological image features (HIF), protein expression (P), and combination of images and proteomics (HIF + P) for survival in validation set. (D) Kaplan-Meier curves revealed a more significant survival difference between high-risk and low-risk groups in HIF + P model.

Multi-Omics Model for Survival Prediction

The previous analyses showed that histopathological image features had individual prognostic ability for OS. Additionally, the histopathology + omics models could improve predictive performance than genomics, transcriptomics or proteomics alone in HNSCC cohort. Finally, we established a multi-omics model to investigate the prognostic power when incorporating all above features. In the validation set, the 1-, 3-, and 5-year AUCs were 0.871, 0.908, and 0.929 (Figure 6A), which were higher than those of HIF + genomics, HIF + transcriptomics and HIF + proteomics models. Kaplan-Meier analysis demonstrated a significant different survival between high-risk and low-risk patients (Figure 6B), with a HR of 10.66 (95%CI: 5.06–26.8, p < 0.001). Furthermore, the multi-omics model had a better net benefit than others if the risk threshold probabilities >10% in DCA analysis (Figure 6C).


Figure 6. Multi-Omics Model integrating histopathological image features with omics data. (A) The time-dependent receiver operating characteristic curve, and (B) Kaplan-Meier curves for multi-omics model involving image features, genomics, transcriptomics, and proteomics in validation set. (C) Decision curve analysis for each model in validation set. The gray oblique line represented net benefit of intervening all patients, and horizontal gray line meant that net benefit of no patients with intervention. The multi-omics had the highest net benefit compared with other models across the range of >10% in risk threshold.


In this study, we extracted the histopathological image features (HIF), utilized machine-learning algorithms to establish prognostic models combining features of histopathological images, gene mutations, RNA and protein expression in training set, and estimated the prognostic capability of models in validation set of HNSCC patients. As far as we know, such finding for HNSCC is firstly reported in this research. The results showed that individual HIF were able to predict OS, especially the Zernike shape features, granularity and cells intensity. The prognostic model based on HIF reached better predictive accuracy than other omics (i.e., genomics, transcriptomics and proteomics). Moreover, the predictive performance of integrative models using more than two types of data outperformed than that of single-omics models (Table 2). The DCA curve also underlined a higher clinical net benefit of multi-omics model compared with others. Taken together, it suggested that multi-omics model integrating histopathological images with omics may be an effective risk stratification approach to improve personalized treatments in clinical practice. For instance, low-risk patients should avoid over-treatment while high-risk patients might benefit from active treatments and strict follow-up (Cheng et al., 2017).

The histopathological examination is regarded as a gold standard for diagnosis and staging in patients with cancer. However, the accuracy of grading would be affected by pathologists’ experience, and cancer patients at the same stage can have diverse survival results. The enormous amount of information in pathological slices is not easily obtained by subjective evaluation of pathologists, which poses great challenges, but also brings opportunities. Recently, the computational systems are developed to assist the image features extraction, and these features are associated with tumor characteristics and survival outcomes (Beck et al., 2011; Romo-Bucheli et al., 2016; Moon et al., 2017). The automated approach also has the strengths of improving efficiency and reducing human resource costs. Unlike previous studies including one representative or entire images (Yu et al., 2016; Cheng et al., 2017), we randomly selected 20 sub-images from whole-slide images, which decreased both computational cost and potential biases.

Additionally, our results of univariate COX analysis, multivariate LASSO and RF model all demonstrated the significant prognostic value of image features for OS. These features provide an objective and quantitative measurement of the morphology and texture of nuclei and cytoplasm. For example, the Zernike shape features mark the nucleic pixels as 1 and cytoplasmic region as 0, then produce Zernike polynomials from binary images (Li et al., 2009). Granularity estimates the size of image texture by using enlarged structure elements to match the texture (Vincent, 2000). It indicated the relation between survival outcomes and cell-level morphological structure (e.g., occupied area and shape) as well as the overall pixels characteristics of images (e.g., texture and intensity) in HNSCC cohort. Therefore, the histopathological images analysis may have potential practical value in predicting survival for HNSCC patients.

Given the heterogeneity and diversity of tumors, molecular and genetic detection are becoming routine approaches to differentiate cancer characteristics such as genotypes and phenotypes, and play a leading role in the field of precision oncology (O’Connor et al., 2008; Kather et al., 2019). Some studies have reported the improved effectiveness of prognostic models combining genomics and image features than individual models in other cancers (Cheng et al., 2017; Yu et al., 2017; Sun et al., 2018; Zhong et al., 2019). Our study had several differences from published articles. Firstly, machine-learning frameworks including LASSO and RF with 10-fold internal cross-validation can achieve more stable estimation of predictive ability. Secondly, instead of classical ROC curve that only determines the discrimination ability of markers at a fixed time point, we utilized time-dependent ROC to describe survival status in a range of time, and yielded dynamic values of AUC throughout the study (Kamarudin et al., 2017). Lastly, we estimated a variety of quantitative molecular biomarkers including somatic gene mutation, RNA sequencing data and protein expression. The prognostic role of each profiles and integration with histopathology were compared, which showed that the histopathology + omics models were better than models using one type of data, and multi-omics model achieved highest accuracy. Our results indicated the complementary effect between histopathological image features and other omics data for survival prediction. Therefore, we suggested that, when omics data were limited, histopathological images might provide effective features to improve prognostic prediction with small additional effort.

Previous studies have showed that patients with HPV+ HNSCC had a better prognosis and therapeutic response (Dayyani et al., 2010). The HPV status was considered as a validated molecular characteristic of HNSCC to guide the treatment strategies (Suh et al., 2014). For example, less intensive treatments are being considered for HPV+ oropharyngeal SCC patients (Mirghani et al., 2015). The CT radiomics features have been reported to distinguish RNA-defined HPV subtypes in HNSCC (Huang et al., 2019). The correlation between HPV status and histopathological image features is also worthy of research. In the situation of limited samples, we only showed the different distribution of image features between HPV+ and HPV groups. However, we hypothesized that identification of HPV status was within the ability of histopathological images analysis, which needs a comprehensive estimation in larger cohorts.

There were some limitations in this study. Firstly, we built models by 10-fold cross-validation in training set and conducted verification in another validation set to make predictive estimation as robust as possible. However, since it was difficult to find other datasets with complete information of histopathology and omics, this study was limited in one cohort and small sample size, lacking external validation. Therefore, the generalizability of current results should be considered within these limitations. Secondly, we balanced the basic clinical characteristics between two sets, but others such as complication and treatment may be potential confounding factors. Moreover, there may exist selection bias in TCGA dataset, because the representative tumor slices were more likely to be uploaded, and its typical histopathological patterns might help machine-learning model for classification (Yu et al., 2016). Nevertheless, clinicians actually examine many slices, thus the feasibility of proposed predictive models in clinical practice needs to be studied. Finally, as a retrospective study, although the integrative models showed prognostic value in our work, it requires prospective estimation by multi-center large-scale studies before routine use. In future research, other machine-learning or deep-learning methods (e.g., convolution neural networks) can be used to generate prognostic models, but the latter require massive samples for training (Mobadersany et al., 2018). Similarly, multiparameter such as immunochemical stained images could provide richer feature sets for predicting survival.


The results indicated that histopathological image features had potential as significant prognostic biomarkers for overall survival in patients with HNSCC. The integrative models of genomics, transcriptomics, and proteomics along with histopathological image features may more accurately predict survival outcome than single-omics models, which might contribute to the risk stratification and personalized treatment for cancer patients.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: the TCGA repository (

Author Contributions

XM and HZ contributed to the conception and design of the work. HZ and LC performed the data analysis, interpretation, and manuscript drafting. YH and YL contributed to the data acquisition. All authors revised the manuscript, approved the submitted version, and agreed to be accountable for all aspects of the work.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at:

Supplementary Table 1 | Univariate Cox analysis of image features for two patient groups.

Supplementary Table 2 | Included mutant genes.

Supplementary Table 3 | Differently expressed mRNA genes.


  1. ^
  2. ^
  3. ^
  4. ^


Beck, A. H., Sangoi, A. R., Leung, S., Marinelli, R. J., Nielsen, T. O., van de Vijver, M. J., et al. (2011). Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3:108ra113. doi: 10.1126/scitranslmed.3002564

PubMed Abstract | CrossRef Full Text | Google Scholar

Biankin, A. V., Piantadosi, S., and Hollingsworth, S. J. (2015). Patient-centric trials for therapeutic development in precision oncology. Nature 526, 361–370. doi: 10.1038/nature15819

PubMed Abstract | CrossRef Full Text | Google Scholar

Bova, R. J., Quinn, D. I., Nankervis, J. S., Cole, I. E., Sheridan, B. F., Jensen, M. J., et al. (1999). Cyclin D1 and p16INK4A expression predict reduced survival in carcinoma of the anterior tongue. Clin. Cancer Res. 5, 2810–2819.

Google Scholar

Breiman, L. (2001). Random Forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324

CrossRef Full Text | Google Scholar

Cancer Genome Atlas Network (2015). Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582. doi: 10.1038/nature14129

PubMed Abstract | CrossRef Full Text | Google Scholar

Carpenter, A. E., Jones, T. R., Lamprecht, M. R., Clarke, C., Kang, I. H., Friman, O., et al. (2006). CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7:R100. doi: 10.1186/gb-2006-7-10-r100

PubMed Abstract | CrossRef Full Text | Google Scholar

Chaturvedi, A. K., Engels, E. A., Anderson, W. F., and Gillison, M. L. (2008). Incidence trends for human papillomavirus-related and -unrelated oral squamous cell carcinomas in the United States. J. Clin. Oncol. 26, 612–619. doi: 10.1200/JCO.2007.14.1713

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, J. M., Qu, A. P., Wang, L. W., Yuan, J. P., Yang, F., Xiang, Q. M., et al. (2015). New breast cancer prognostic factors identified by computer-aided image analysis of HE stained histopathology images. Sci. Rep. 5:10690. doi: 10.1038/srep10690

PubMed Abstract | CrossRef Full Text | Google Scholar

Cheng, J., Zhang, J., Han, Y., Wang, X., Ye, X., Meng, Y., et al. (2017). Integrative Analysis of Histopathological Images and Genomic Data Predicts Clear Cell Renal Cell Carcinoma Prognosis. Cancer Res. 77, 91–100 e. doi: 10.1158/0008-5472.CAN-17-0313

PubMed Abstract | CrossRef Full Text | Google Scholar

Chiesa, F., Ostuni, A., Grigolato, R., Calabrese, L., and Ansarin, M. (2016). “Head and Neck Cancer Prevention,” in Head and Neck Cancer, ed. J. Bernier (Cham: Springer), doi: 10.1007/978-3-319-27601-4_2

CrossRef Full Text | Google Scholar

Dayyani, F., Etzel, C. J., Liu, M., Ho, C. H., Lippman, S. M., and Tsao, A. S. (2010). Meta-analysis of the impact of human papillomavirus(HPV) on cancer risk and overall survival in head and neck squamous cell carcinomas (HNSCC). Head Neck Oncol. 2:15. doi: 10.1186/1758-3284-2-15

PubMed Abstract | CrossRef Full Text | Google Scholar

Goode, A., Gilbert, B., Harkes, J., Jukic, D., and Satyanarayanan, M. (2013). OpenSlide: A vendor-neutral software foundation for digital pathology. J. Pathol. Inform. 4:27. doi: 10.4103/2153-3539.119005

PubMed Abstract | CrossRef Full Text | Google Scholar

Haase, V. H. (2009). The VHL tumor suppressor: master regulator of HIF. Curr. Pharm. Des. 15, 3895–3903. doi: 10.2174/138161209789649394

PubMed Abstract | CrossRef Full Text | Google Scholar

Haralick, R. M., Shanmugam, K., and Dinstein, I. H. (1973). Textural features for image classification. IEEE Trans. Syst. Man Cyber. 3, 610–621. doi: 10.1109/TSMC.1973.4309314

CrossRef Full Text | Google Scholar

Hashibe, M., Brennan, P., Chuang, S. C., Boccia, S., Castellsague, X., Chen, C., et al. (2009). Interaction between tobacco and alcohol use and the risk of head and neck cancer: pooled analysis in the International Head and Neck Cancer Epidemiology Consortium. Cancer Epidemiol. Biomarkers Prev. 18, 541–550. doi: 10.1158/1055-9965.EPI-08-0347

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, C., Cintra, M., Brennan, K., Zhou, M., Colevas, A. D., Fischbein, N., et al. (2019). Development and validation of radiomic signatures of head and neck squamous cell carcinoma molecular features and subtypes. EBioMedicine 45, 70–80. doi: 10.1016/j.ebiom.2019.06.034

PubMed Abstract | CrossRef Full Text | Google Scholar

Ishwaran, H., Gerds, T. A., Kogalur, U. B., Moore, R. D., Gange, S. J., and Lau, B. M. (2014). Random survival forests for competing risks. Biostatistics 15, 757–773. doi: 10.1093/biostatistics/kxu010

PubMed Abstract | CrossRef Full Text | Google Scholar

Kamarudin, A. N., Cox, T., and Kolamunnage-Dona, R. (2017). Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Med. Res. Methodol. 17:53. doi: 10.1186/s12874-017-0332-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Kather, J. N., Krisam, J., Charoentong, P., Luedde, T., Herpel, E., Weis, C. A., et al. (2019). Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med. 16:e1002730. doi: 10.1371/journal.pmed.1002730

PubMed Abstract | CrossRef Full Text | Google Scholar

Kong, J., Cooper, L. A., Wang, F., Gao, J., Teodoro, G., Scarpace, L., et al. (2013). Machine-based morphologic analysis of glioblastoma using whole-slide pathology images uncovers clinically relevant molecular correlates. PLoS One. 8:e81049. doi: 10.1371/journal.pone.0081049

PubMed Abstract | CrossRef Full Text | Google Scholar

Kyzas, P. A., Cunha, I. W., and Ioannidis, J. P. (2005). Prognostic significance of vascular endothelial growth factor immunohistochemical expression in head and neck squamous cell carcinoma: a meta-analysis. Clin. Cancer Res. 11, 1434–1440. doi: 10.1158/1078-0432.CCR-04-1870

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, J., Lu, Y., Akbani, R., Ju, Z., Roebuck, P. L., Liu, W., et al. (2013). TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047. doi: 10.1038/nmeth.2650

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, S., Lee, M. C., and Pun, C. M. (2009). Complex Zernike moments features for shape-based image retrieval. IEEE Trans. Syst. Man Cyber. 39, 227–237. doi: 10.1109/TSMCA.2008.2007988

CrossRef Full Text | Google Scholar

Lydiatt, W. M., Patel, S. G., O’Sullivan, B., Brandwein, M. S., Ridge, J. A., Migliacci, J. C., et al. (2017). Head and Neck cancers-major changes in the American Joint Committee on cancer eighth edition cancer staging manual. CA Cancer J. Clin. 67, 122–137. doi: 10.3322/caac.21389

PubMed Abstract | CrossRef Full Text | Google Scholar

Mirghani, H., Amen, F., Blanchard, P., Moreau, F., Guigay, J., Hartl, D. M., et al. (2015). Treatment de-escalation in HPV-positive oropharyngeal carcinoma: ongoing trials, critical issues and perspectives. Int. J. Cancer. 136, 1494–1503. doi: 10.1002/ijc.28847

PubMed Abstract | CrossRef Full Text | Google Scholar

Mobadersany, P., Yousefi, S., Amgad, M., Gutman, D. A., Barnholtz-Sloan, J. S., Velázquez Vega, J. E., et al. (2018). Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl. Acad. Sci. U S A 115, E2970–E2979. doi: 10.1073/pnas.1717139115

PubMed Abstract | CrossRef Full Text | Google Scholar

Moon, W. K., Lee, Y. W., Huang, Y. S., Lee, S. H., Bae, M. S., Yi, A., et al. (2017). Computer-aided prediction of axillary lymph node status in breast cancer using tumor surrounding tissue features in ultrasound images. Comput. Methods Progr. Biomed. 146, 143–150. doi: 10.1016/j.cmpb.2017.06.001

PubMed Abstract | CrossRef Full Text | Google Scholar

Morris, L., Chandramohan, R., West, L., Zehir, A., Chakravarty, D., Pfister, D. G., et al. (2017). The Molecular Landscape of Recurrent and Metastatic Head and Neck Cancers: Insights From a Precision Oncology Sequencing Platform. JAMA Oncol. 3, 244–255. doi: 10.1001/jamaoncol.2016.1790

PubMed Abstract | CrossRef Full Text | Google Scholar

Muzio, L. L., Farina, A., Rubini, C., Coccia, E., Capogreco, M., Colella, G., et al. (2006). Effect of c-Met expression on survival in head and neck squamous cell carcinoma. Tumour Biol. 27, 115–121. doi: 10.1159/000092716

PubMed Abstract | CrossRef Full Text | Google Scholar

Niehr, F., Eder, T., Pilz, T., Konschak, R., Treue, D., Klauschen, F., et al. (2018). Multilayered Omics-Based Analysis of a Head and Neck Cancer Model of Cisplatin Resistance Reveals Intratumoral Heterogeneity and Treatment-Induced Clonal Selection. Clin. Cancer Res. 24, 158–168. doi: 10.1158/1078-0432.CCR-17-2410

PubMed Abstract | CrossRef Full Text | Google Scholar

O’Connor, J. P., Jackson, A., Asselin, M. C., Buckley, D. L., Parker, G. J., and Jayson, G. C. (2008). Quantitative imaging biomarkers in the clinical development of targeted therapeutics: current and future perspectives. Lancet Oncol. 9, 766–776. doi: 10.1016/S1470-2045(08)70196-7

CrossRef Full Text | Google Scholar

Poeta, M. L., Manola, J., Goldwasser, M. A., Forastiere, A., Benoit, N., Califano, J. A., et al. (2007). TP53 mutations and survival in squamous-cell carcinoma of the head and neck. N. Engl. J. Med. 357, 2552–2561. doi: 10.1056/NEJMoa073770

PubMed Abstract | CrossRef Full Text | Google Scholar

Romo-Bucheli, D., Janowczyk, A., Gilmore, H., Romero, E., and Madabhushi, A. (2016). Automated Tubule Nuclei Quantification and Correlation with Oncotype DX risk categories in ER+ Breast Cancer Whole Slide Images. Sci. Rep. 6:32706. doi: 10.1038/srep32706

PubMed Abstract | CrossRef Full Text | Google Scholar

Sertel, O., Kong, J., Shimada, H., Catalyurek, U. V., Saltz, J. H., and Gurcan, M. N. (2009). Computer-aided Prognosis of Neuroblastoma on Whole-slide Images: Classification of Stromal Development. Pattern Recogn. 42, 1093–1103. doi: 10.1016/j.patcog.2008.08.027

PubMed Abstract | CrossRef Full Text | Google Scholar

Soliman, K. (2015). CellProfiler: Novel Automated Image Segmentation Procedure for Super-Resolution Microscopy. Biol. Proced. Online. 17:11. doi: 10.1186/s12575-015-0023-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U S A 102, 15545–15550. doi: 10.1073/pnas.0506580102

PubMed Abstract | CrossRef Full Text | Google Scholar

Suh, Y., Amelio, I., Guerrero Urbano, T., and Tavassoli, M. (2014). Clinical update on cancer: molecular oncology of head and neck cancer. Cell Death Dis. 5:e1018. doi: 10.1038/cddis.2013.548

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, D., Li, A., Tang, B., and Wang, M. (2018). Integrating genomic data and pathological images to effectively predict breast cancer clinical outcome. Comput. Methods Progr. Biomed. 161, 45–53. doi: 10.1016/j.cmpb.2018.04.008

PubMed Abstract | CrossRef Full Text | Google Scholar

Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Stat. Med 16, 385–395. doi: 10.1002/(sici)1097-0258(19970228)16

CrossRef Full Text | Google Scholar

Vickers, A. J., Cronin, A. M., Elkin, E. B., and Gonen, M. (2008). Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med. Inform. Decis. Mak. 8:53. doi: 10.1186/1472-6947-8-53

PubMed Abstract | CrossRef Full Text | Google Scholar

Vincent, L. (2000). Granulometries and opening trees. Fundam. Inform. 41, 57–90. doi: 10.3233/fi-2000-411203

CrossRef Full Text | Google Scholar

Vogelstein, B., Lane, D., and Levine, A. J. (2000). Surfing the p53 network. Nature 408, 307–310. doi: 10.1038/35042675

PubMed Abstract | CrossRef Full Text | Google Scholar

Wallner, M., Herbst, A., Behrens, A., Crispin, A., Stieber, P., Göke, B., et al. (2006). Methylation of serum DNA is an independent prognostic marker in colorectal cancer. Clin. Cancer Res. 12, 7347–7352. doi: 10.1158/1078-0432.CCR-06-1264

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, C., Pécot, T., Zynger, D. L., Machiraju, R., Shapiro, C. L., and Huang, K. (2013). Identifying survival associated morphological features of triple negative breast cancer using multiple datasets. J. Am. Med. Inform. Assoc. 20, 680–687. doi: 10.1136/amiajnl-2012-001538

PubMed Abstract | CrossRef Full Text | Google Scholar

Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M., et al. (2006). Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell. 9, 189–198. doi: 10.1016/j.ccr.2006.01.025

PubMed Abstract | CrossRef Full Text | Google Scholar

Yu, K. H., Berry, G. J., Rubin, D. L., Ré, C., Altman, R. B., and Snyder, M. (2017). Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma. Cell Syst. 5, 620–627.e. doi: 10.1016/j.cels.2017.10.014

PubMed Abstract | CrossRef Full Text | Google Scholar

Yu, K. H., Zhang, C., Berry, G. J., Altman, R. B., Ré, C., Rubin, D. L., et al. (2016). Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7:12474. doi: 10.1038/ncomms12474

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, X., Xing, F., Su, H., Yang, L., and Zhang, S. (2015). High-throughput histopathological image analysis via robust cell segmentation and hashing. Med. Image Anal. 26, 306–315. doi: 10.1016/

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhong, T., Wu, M., and Ma, S. (2019). Examination of Independent Prognostic Power of Gene Expressions and Histopathological Imaging Features in Cancer. Cancers 11:361. doi: 10.3390/cancers11030361

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: head and neck cancer, histopathological image, genomics, transcriptomics, proteomics, prognosis

Citation: Zeng H, Chen L, Huang Y, Luo Y and Ma X (2020) Integrative Models of Histopathological Image Features and Omics Data Predict Survival in Head and Neck Squamous Cell Carcinoma. Front. Cell Dev. Biol. 8:553099. doi: 10.3389/fcell.2020.553099

Received: 17 April 2020; Accepted: 08 October 2020;
Published: 29 October 2020.

Edited by:

Cecilia Ana Suarez, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

Reviewed by:

Andrea Porzionato, University of Padua, Italy
Yongyue Wei, Nanjing Medical University, China

Copyright © 2020 Zeng, Chen, Huang, Luo and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xuelei Ma,

These authors share first authorship