Edited by: April Khademi, Ryerson University, Canada
Reviewed by: Nitish Kumar Mishra, University of Nebraska Medical Center, United States; Shihao Shen, University of California, Los Angeles, United States
This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Bioengineering and Biotechnology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
This study applied a deep-learning cell identification algorithm to diagnostic images from the colon cancer repository at The Cancer Genome Atlas (TCGA). Within-image sampling improved performance with very little loss of accuracy. The features thus derived were associated with various clinical variables, including metastasis, residual tumor, venous invasion, and lymphatic invasion. The deep-learning algorithm was trained using images from a locally available data set, then applied to the TCGA images by tiling them and identifying cells in each patch defined by the tiling. In this application the average number of patches containing tissue in an image was ~900, so processing a random sample of patches greatly reduced computation costs. The cell identification algorithm was applied directly to each sampled patch, resulting in a list of cells, each labeled with its location and classification (“epithelial,” “inflammatory,” “fibroblast,” or “other”). The number of cells of each type in the patch was calculated, resulting in a patch profile containing four features. A morphological profile for the entire image was obtained by averaging the patch profiles. Two sampling policies were examined: random sampling, which samples patches with uniform weighting, and systematic random sampling, which takes spatial dependencies into account. Compared with processing complete whole-slide images, using systematic random spatial sampling to select 100 tiles from each whole-slide image gave a seven-fold improvement in performance with very little loss of accuracy (~4% on average). We found links between the predicted features and clinical variables in the TCGA colon cancer data set.
Several significant associations were found: increased fibroblast numbers were associated with the presence of metastasis, venous invasion, lymphatic invasion, and residual tumor, while decreased numbers of inflammatory cells were associated with mucinous carcinomas. For each of the four cell types, deep learning has generated a morphological feature that indicates cell density. These features are related to cellularity, the number, degree, or quality of cells present in a tumor. Cellularity has been reported to be related to patient survival and other diagnostic and prognostic indicators, suggesting that the features calculated here may be of general usefulness.
Histopathology, the microscopic examination of diseased tissue, is central to the diagnosis and treatment of cancer. Recent developments in digital microscopy have enabled the extraction of useful information from whole-slide images (WSIs) of cancer tissue using deep learning algorithms that are based on convolutional neural networks (Janowczyk and Madabhushi,
Most deep learning algorithms are trained with relatively small images. To apply a trained algorithm to a whole-slide image, a straightforward approach is to tile the WSI with small patches and apply the deep-learning algorithm to each patch independently. The per-patch results may be averaged over the WSI to generate a collection of features that describe the spatial characteristics of the WSI, a
However, such an approach is computationally costly: on average, each WSI in the data set used in this study contained about 900 patches with significant amounts of tissue. Computational costs can be reduced by sampling a limited number of patches, applying the algorithm to each, and then averaging the per-patch features. In principle, if enough patches are sampled, processing costs can be reduced without significant loss of accuracy. The main aim of this study was to examine the behavior of sampling as applied to WSIs. In addition, we show that profiles generated from sampled patches have significant associations with clinical variables.
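A minimal sketch of the tile-then-average idea follows, assuming the cell identification algorithm returns a list of `(x, y, label)` tuples per patch. The tuple format and the function names are illustrative, not the study's code. Each patch yields a four-feature profile of cell counts, and the WSI profile is the mean of the patch profiles.

```python
from collections import Counter

import numpy as np

# Order of the four features in every profile.
CELL_TYPES = ("epithelial", "inflammatory", "fibroblast", "other")


def patch_profile(cells):
    """Count the cells of each type in one patch.

    cells: list of (x, y, label) tuples (hypothetical detector output format).
    """
    counts = Counter(label for _, _, label in cells)
    return np.array([counts.get(t, 0) for t in CELL_TYPES], dtype=float)


def wsi_profile(per_patch_cells):
    """Average the per-patch profiles over all processed patches."""
    profiles = np.stack([patch_profile(cells) for cells in per_patch_cells])
    return profiles.mean(axis=0)


patches = [
    [(10, 12, "epithelial"), (40, 8, "epithelial"), (30, 30, "fibroblast")],
    [(5, 5, "inflammatory"), (6, 7, "other")],
]
print(wsi_profile(patches).tolist())  # [1.0, 0.5, 0.5, 0.5]
```

Passing every tissue patch gives the full-WSI profile; passing only a sample gives the cheaper estimate discussed above.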
From image to profile: stages.
In the first stage, that of segmentation, the whole-slide image (1a) is separated into foreground and background regions, represented by a binary mask (1b). In the second stage the mask is divided into patches of the same size and resolution as those used to train the algorithm. Each patch is then categorized as foreground or background, depending on the percentage of its pixels that the mask assigns to foreground. Next, as shown in (1c), foreground patches in the grid of tiles are sampled. For each sampled patch (1d), the cell identification algorithm locates and classifies cells (1e). The information concerning cell nuclei is summarized in a
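The patch-categorization step can be sketched as follows, assuming the mask is a boolean NumPy array and using a hypothetical 50% foreground threshold (`min_fraction`); the threshold used in the study is not given in this section.

```python
import numpy as np


def foreground_patches(mask, patch_size, min_fraction=0.5):
    """Grid indices (row, col) of patches classed as foreground.

    mask: 2-D boolean array, True where the segmentation found tissue.
    patch_size: side of a square patch, in mask pixels.
    min_fraction: hypothetical minimum share of foreground pixels.
    Incomplete patches at the right and bottom edges are ignored.
    """
    rows, cols = mask.shape[0] // patch_size, mask.shape[1] // patch_size
    # View the mask as a (rows, patch, cols, patch) block array.
    blocks = mask[: rows * patch_size, : cols * patch_size].reshape(
        rows, patch_size, cols, patch_size
    )
    fraction = blocks.mean(axis=(1, 3))  # share of foreground pixels per patch
    return [(int(r), int(c)) for r, c in np.argwhere(fraction >= min_fraction)]


mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True  # patch (0, 0) is all tissue
mask[2, 2] = True      # patch (1, 1) is 25% tissue
print(foreground_patches(mask, 2))        # [(0, 0)]
print(foreground_patches(mask, 2, 0.25))  # [(0, 0), (1, 1)]
```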
“The degree, quality, or condition of cells that are present” (Farlex Partner Medical Dictionary,
Sampling of regions within an image is a standard procedure in manual pathology. Pathologists are accustomed to rapidly scanning tissue slides under the microscope and selecting interesting regions for intensive consideration. Kayser et al. (
Automated sampling within an image is used in
In another study, sampling was employed in the analysis of cases of colon cancer where pathologists were asked to categorize the tissue type at 300 randomly selected points in a dense region of tissue (West et al.,
As for digital pathology, a description of the use of sampling in the detection of invasive breast cancer in histopathology images can be found in Cruz-Roa et al. (
In histopathology applications the choice of a sampling policy is affected by spatial dependency, whereby characteristics at neighboring locations tend to have similar values. Standard statistical sampling techniques that assume independence among observations do not take spatial dependency into account and are not always the most appropriate. Sampling policies that do take account of spatial dependencies have been developed in geospatial statistics (Delemelle,
In the experiments described in this article, a straightforward approach has been used: sampling a set of patches, followed by cell identification and profile generation. The two sampling policies that have been implemented are random sampling and systematic random sampling.
Note that in some situations non-random sampling, such as uniform spacing, may be adequate. Uniform spacing gives good coverage of the WSI but will fail if there are periodicities in the image, or if there are distance-dependent relationships that should be estimated from the sample.
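To illustrate the periodicity failure mode, the sketch below uses a synthetic one-dimensional "row of patches" in which a feature recurs in every fourth patch; the data are invented purely for illustration. Uniform spacing with a step equal to the period sees only one phase of the pattern, whereas simple random sampling is unbiased on average.

```python
import numpy as np

# Synthetic row of 1,000 patches: a feature that is present in every 4th patch.
signal = np.tile([1.0, 0.0, 0.0, 0.0], 250)
print(signal.mean())  # 0.25 (true mean)

# Uniform spacing whose step equals the period sees only one phase of the pattern.
uniform = signal[0::4]
print(uniform.mean())  # 1.0, a badly biased estimate

# Simple random sampling (without replacement) is unbiased on average.
rng = np.random.default_rng(0)
estimates = [
    signal[rng.choice(signal.size, size=250, replace=False)].mean()
    for _ in range(200)
]
print(round(float(np.mean(estimates)), 2))  # close to 0.25
```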
In the work described here, variants of two sampling policies have been implemented. In the basic form of
In
This article reports on experiments with two sampling policies, RS and SRS. Because there was no prior information to indicate that any specific feature in the morphological profile should be prioritized, we did not consider the use of adaptive sampling. This does not rule out the use of adaptive sampling in future applications, for example, when it is necessary to concentrate on features that are uncommon and sampling should be directed toward areas with such features. If, for instance, a tissue sample consists mainly of normal cells but we wish to analyze the features of abnormal cells, it might be advisable to search near points already sampled that were found to contain abnormal cells.
To enable the comparison of the two sampling policies in the calculation of WSI profiles a cell identification algorithm was trained, based on work described in Sirinukunwattana et al. (
In equation (1) the model accepts an image
The image's morphological profile is a set of
Patch showing types of cells identified.
Training data consisted of 853 hand-marked images, most of which were from the same WSIs described in Sirinukunwattana et al. (
The classification model, based on the TensorFlow “cifar10” model (Krizhevsky,
The training data used in classification was marked with the locations of different types of cells. The data set included the 100 training images described in Sirinukunwattana et al. (
Colorectal cancer data from The Cancer Genome Atlas (TCGA) has yielded a molecular characterization of human colon and rectal cancer (Cancer Genome Atlas Network,
Each WSI was segmented into foreground (tissue present) and background (no tissue present) regions using an entropy-based algorithm which created a foreground mask (See
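One way an entropy-based foreground mask can work is sketched below, under stated assumptions: it computes the Shannon entropy of the grayscale histogram in each block of a low-resolution thumbnail and marks high-entropy blocks as tissue, since blank glass is nearly uniform. The block size, bin count, and threshold are hypothetical, and this is not necessarily the algorithm used in the study.

```python
import numpy as np


def block_entropy(gray, block=16, bins=32):
    """Shannon entropy (bits) of the intensity histogram in each block.

    gray: 2-D uint8 array, e.g. a low-resolution thumbnail of the WSI.
    """
    rows, cols = gray.shape[0] // block, gray.shape[1] // block
    ent = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = gray[r * block:(r + 1) * block, c * block:(c + 1) * block]
            hist, _ = np.histogram(tile, bins=bins, range=(0, 256))
            p = hist[hist > 0] / tile.size  # probabilities of non-empty bins
            ent[r, c] = -(p * np.log2(p)).sum()
    return ent


def foreground_mask(gray, block=16, threshold=1.0):
    """Mark blocks whose entropy exceeds a (hypothetical) threshold as tissue."""
    return block_entropy(gray, block) > threshold


rng = np.random.default_rng(0)
gray = np.full((32, 32), 240, dtype=np.uint8)       # blank glass: uniform
gray[:, 16:] = rng.integers(0, 256, size=(32, 16))  # textured "tissue"
print(foreground_mask(gray).tolist())  # [[False, True], [False, True]]
```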
In the case of RS, for each experimental run nT patches were randomly sampled from the set of nF foreground patches. The cell detection algorithm was applied to each patch individually. The detection component calculated the haematoxylin channel and supplied it to the detection CNN. The classification module extracted small patches around each detected point, normalized them collectively using the average intensities saved from the training stage, and applied the classification algorithm to each small patch individually, generating a set of cell-type labels that were used to calculate morphological profiles.
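The RS step can be sketched as below: nT patches are drawn uniformly at random without replacement from the nF foreground patches, and their profiles are averaged. `profile_fn` is a hypothetical stand-in for detection, classification, and counting; this is a sketch under those assumptions, not the study's implementation.

```python
import numpy as np


def random_sampling_profile(foreground, n_t, profile_fn, rng=None):
    """RS estimate of a WSI profile.

    foreground: sequence of the nF foreground patch identifiers.
    n_t: number of patches to sample (capped at nF).
    profile_fn: hypothetical callable returning the 4-feature profile of a
        patch (standing in for detection, classification and counting).
    """
    rng = np.random.default_rng(rng)
    n_t = min(n_t, len(foreground))
    # Uniform weighting: every foreground patch is equally likely to be chosen.
    chosen = rng.choice(len(foreground), size=n_t, replace=False)
    return np.mean([profile_fn(foreground[i]) for i in chosen], axis=0)


# Toy stand-in: odd-numbered patches contain one epithelial cell each.
def toy(i):
    return np.array([i % 2, 1.0, 0.0, 0.0])


print(random_sampling_profile(list(range(900)), 100, toy, rng=1))
```

Sampling all 900 patches reproduces the full-image profile exactly; smaller `n_t` trades accuracy for speed.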
Denoting the profile of patch
The following version of SRS was implemented. As with RS, a sample size was specified: in this case a nominal sample size nNOM. A coarse tiling of the WSI used
Detail of whole-slide image showing sample grids and selected patches.
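For comparison, a sketch of one common form of systematic random spatial sampling follows. It is an assumption, since the description of the coarse tiling above is incomplete: a coarse step `k` is derived from the nominal sample size nNOM, a single random offset is drawn, and the fine patch at that offset within each coarse cell is kept if it is foreground. Because some of those positions fall on background, the realized sample size varies around nNOM.

```python
import math

import numpy as np


def systematic_random_sample(fg_cells, n_nom, rng=None):
    """Systematic random sample of foreground patches (one common variant).

    fg_cells: set of (row, col) indices of foreground patches on the fine grid.
    n_nom: nominal sample size.
    """
    rng = np.random.default_rng(rng)
    # Coarse step chosen so that roughly n_nom coarse cells cover the tissue.
    k = max(1, round(math.sqrt(len(fg_cells) / n_nom)))
    # One random start shared by all coarse cells.
    dr, dc = (int(v) for v in rng.integers(0, k, size=2))
    max_r = max(r for r, _ in fg_cells)
    max_c = max(c for _, c in fg_cells)
    return [
        (r, c)
        for r in range(dr, max_r + 1, k)
        for c in range(dc, max_c + 1, k)
        if (r, c) in fg_cells
    ]


fg = {(r, c) for r in range(30) for c in range(30)}
fg -= {(r, c) for r in range(10) for c in range(10)}  # a background corner
sample = systematic_random_sample(fg, n_nom=100, rng=0)
print(len(sample))  # realized size, below 100 here
```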
Note that a straightforward gray-detection algorithm was used to identify patches containing artifacts. The percentage γ of patches containing artifacts was estimated by sampling patches.
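A sketch of both steps is given below. The gray test (a pixel counts as gray when its R, G, and B values differ by at most `max_spread`, and a patch is flagged when most of its pixels are gray) and both thresholds are hypothetical; γ is estimated as the flagged fraction in a random sample of patches, with a binomial standard error.

```python
import numpy as np


def is_gray_patch(rgb, max_spread=15, min_fraction=0.9):
    """Hypothetical gray detector for an (H, W, 3) uint8 patch."""
    rgb = rgb.astype(int)
    spread = rgb.max(axis=-1) - rgb.min(axis=-1)  # per-pixel channel spread
    return (spread <= max_spread).mean() >= min_fraction


def estimate_artifact_fraction(patches, n, rng=None):
    """Estimate gamma from n randomly sampled patches.

    Returns (gamma, standard_error).
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(patches), size=min(n, len(patches)), replace=False)
    flags = np.array([is_gray_patch(patches[i]) for i in idx], dtype=float)
    gamma = flags.mean()
    se = np.sqrt(gamma * (1 - gamma) / flags.size)
    return gamma, se


gray_patch = np.full((8, 8, 3), 128, dtype=np.uint8)
stained = np.zeros((8, 8, 3), dtype=np.uint8)
stained[...] = (220, 120, 180)  # pinkish, H&E-like colour
patches = [gray_patch] * 3 + [stained] * 7
print(estimate_artifact_fraction(patches, n=10, rng=0))
```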
In five of the TCGA diagnostic images, 1,500 cells were hand-marked by a pathologist. Cells were classified as normal epithelial cells, malignant epithelial cells, inflammatory cells, or fibroblasts. Patches containing hand-marked cells were run through the cell identification algorithm, and the accuracies of detection and classification were computed. (Note that the two types of epithelial cells were merged into one, because the cell identification algorithm did not distinguish between them.) On average, detection achieved 65% accuracy and classification 76% (
Detection and classification accuracy.
Image | Detection | Classification |
AA-3543 | 0.85 | 0.66 |
AA-3845 | 0.68 | 0.76 |
AA-3864 | 0.62 | 0.81 |
AA-3986 | 0.61 | 0.90 |
AA-A02J | 0.50 | 0.66 |
Average | 0.65 | 0.76 |
Both sampling policies, RS and SRS, were applied using the following nominal sample sizes: 25, 50, and 100. For both RS and SRS, and for each nominal sample size, two batch runs were executed. In each batch run the sampling policy was applied to the 142 whole-slide images. The RS batch runs were done after those for SRS, using the actual sample sizes generated by SRS, so that the runs could be compared for accuracy.
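The exact formula behind the relative batch difference is not given in this section. The sketch below assumes one plausible definition: for each image and feature, the absolute difference between the two runs divided by their mean, averaged over images. Treat it as illustrative only.

```python
import numpy as np


def relative_batch_difference(run_a, run_b):
    """Assumed per-feature relative difference between two batch runs.

    run_a, run_b: arrays of shape (n_images, n_features) holding the profiles
    produced by two independent runs of the same sampling policy.
    Features that are zero in both runs for some image would divide by zero.
    """
    a = np.asarray(run_a, dtype=float)
    b = np.asarray(run_b, dtype=float)
    return np.mean(np.abs(a - b) / ((a + b) / 2), axis=0)


run_1 = [[10.0, 4.0], [20.0, 2.0]]
run_2 = [[12.0, 4.0], [18.0, 2.0]]
print(relative_batch_difference(run_1, run_2))
```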
Scatterplots comparing batch runs of SRS (systematic random sampling).
Comparison of RS and SRS.
Nominal sample size | 25 | 50 | 100 |
Epithelial cells | | | |
RS–relative batch diff. | 11.30% | 8.10% | 5.80% |
SRS–relative batch diff. | 9.20% | 6.00% | 3.50% |
Inflammatory cells | | | |
RS–relative batch diff. | 19.20% | 12.50% | 8.30% |
SRS–relative batch diff. | 17.40% | 7.60% | 6.50% |
Fibroblasts | | | |
RS–relative batch diff. | 14.50% | 11.00% | 8.00% |
SRS–relative batch diff. | 13.80% | 8.10% | 4.80% |
“Other” cells | | | |
RS–relative batch diff. | 24.30% | 16.90% | 9.90% |
SRS–relative batch diff. | 20.10% | 11.50% | 7.80% |
Correlation matrix of cell counts.
Cell type | Epithelial | Inflammatory | Fibroblast | Other |
Epithelial | 1 | | | |
Inflammatory | 0.20 | 1 | | |
Fibroblast | −0.59 | −0.34 | 1 | |
Other | −0.63 | −0.13 | 0.56 | 1 |
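The correlation coefficient used for this matrix is not stated here, so the sketch below assumes Pearson correlation, computed with `numpy.corrcoef` over one averaged profile per WSI, and returns only the lower triangle in the layout of the table.

```python
import numpy as np

FEATURES = ["Epithelial", "Inflammatory", "Fibroblast", "Other"]


def lower_triangle_correlations(profiles):
    """Pearson correlations between the four features across images.

    profiles: array of shape (n_images, 4), one averaged profile per WSI.
    Returns {(row_feature, col_feature): r} for the lower triangle.
    """
    r = np.corrcoef(np.asarray(profiles, dtype=float), rowvar=False)
    return {
        (FEATURES[i], FEATURES[j]): float(r[i, j])
        for i in range(4)
        for j in range(i + 1)
    }


x = np.array([1.0, 2.0, 3.0, 4.0])
toy_profiles = np.column_stack([x, 2 * x + 1, -x, [1.0, 0.0, 1.0, 0.0]])
corr = lower_triangle_correlations(toy_profiles)
print(corr[("Fibroblast", "Epithelial")])  # -1 (up to rounding)
```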
Preprocessing of the clinical data associated with the 142 images in the data set identified 14 clinical variables of interest. (Variables with large numbers of missing values were excluded, as were variables with constant values). Each variable was cross-tabulated against each of the four profile features, or correlation coefficients were calculated, or a MANOVA was performed. Where the clinical variable was a binary categorical variable,
Associations between cell counts and clinical variables.
22.1 | 17.2 | 0.00152 | 0.0411 | Y | |
5.8 | 4.0 | 0.0372 | 0.0411 | Y | |
5.3 | 7.7 | 0.0156 | 0.0429 | Y | |
2.1 | 3.5 | 0.00506 | 0.0482 | Y | |
22.1 | 17.7 | 0.0130 | 0.0438 | Y | |
5.8 | 4.4 | 0.0506 | 0.0393 | ||
5.3 | 7.8 | 0.0179 | 0.0420 | Y | |
2.2 | 3.3 | 0.0100 | 0.0464 | Y | |
4.6 | 6.4 | 0.00661 | 0.0473 | Y | |
22.9 | 19.4 | 0.0111 | 0.0455 | Y | |
4.8 | 6.4 | 0.0116 | 0.0446 | Y | |
5.9 | 3.4 | 0.00361 | 0.0491 | Y | |
5.70 | 3.80 | 0.0488 | 0.0402 |
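The specific test applied to binary clinical variables cannot be recovered from this section. As a generic stand-in, not the study's reported procedure, the sketch below shows a two-sided permutation test for a difference in mean feature value between the two categories of a binary variable.

```python
import numpy as np


def permutation_test_means(x, groups, n_perm=10_000, rng=None):
    """Two-sided permutation test for a difference in means.

    x: feature values (e.g. fibroblast counts), one per image.
    groups: boolean array, True for one category of the clinical variable.
    Returns (mean_true, mean_false, p_value).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    observed = x[groups].mean() - x[~groups].mean()
    count = 0
    for _ in range(n_perm):
        g = rng.permutation(groups)  # shuffle category labels
        if abs(x[g].mean() - x[~g].mean()) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return x[groups].mean(), x[~groups].mean(), (count + 1) / (n_perm + 1)


counts = [1.0] * 10 + [10.0] * 10
labels = [False] * 10 + [True] * 10
print(permutation_test_means(counts, labels, n_perm=2000, rng=0))
```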
Differences between the two categories for
Mucinous carcinomas were associated with fewer inflammatory cells than were non-mucinous carcinomas. Finally, the 12 patients who were recorded as dead when added to the TCGA repository also tended to have fewer detected inflammatory cells than patients recorded as alive, although the associated
Note that the remaining clinical variables, for which no associations were found, were as follows: Gender, Age, T Stage, N Stage, History of colon polyps, History of other malignancy, Anatomic neoplasm subdivision (Tumor Location—left side vs. right side), and CEA level.
In this application, statistical sampling of patches from whole-slide images proved worthwhile: substantial improvements in performance were achieved with very little loss of accuracy. Systematic random sampling was markedly more accurate than basic random sampling. For example, with a sample size of 100 and considering epithelial cell counts, the relative batch difference was 3.5% for systematic random sampling and 5.8% for basic random sampling (
The profiles being computed were particularly suitable for random sampling because the features of interest were
Statistically significant associations between morphology and various clinical variables were found in this study. The TNM staging system used in cancer treatment considers tumor penetration, lymph node involvement, and metastasis (National Cancer Institute,
There were five clinical variables for which we found significant relationships with morphological features. Four clinical variables had significant associations with fibroblast counts: in each case higher fibroblast counts were associated with poorer values of the clinical variable. This is not unexpected (Hewitt et al.,
Two clinical variables were associated with differences in inflammatory cell counts, namely metastasis and mucinous carcinoma. Poorer values of the clinical variables were associated with lower numbers of inflammatory cells, which might be expected in the light of the positive role of tumor-infiltrating lymphocytes in slowing disease progression (Nosho et al.,
Finally, metastasis, residual tumor, and venous invasion were related to lower numbers of epithelial cells.
The morphological features extracted from the 142 diagnostic images from the COAD data set may be regarded as expressions of
In addition to the cellularity features studied here, other features may be calculated using deep learning. Such features include
Jass (
We have shown experimentally that a cell identification algorithm using deep learning can uncover interesting relationships between tissue morphology and a range of clinical variables, and that systematic sampling of tissue regions can improve performance with very little loss of accuracy.
The experimental results in this paper were obtained from a single TCGA site. The analysis should be extended to all sites in the TCGA colon cancer repository. In the experiments carried out here, standardization was straightforward, using the pooled average intensities of a group of WSIs to normalize data. Unfortunately, there is no guarantee that this approach will always be successful. Standardization techniques that cater for the many different originating sites in TCGA should be used. Carried out effectively, standardization ensures that reproducible morphological features are generated.
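The pooled-average standardization described above can be sketched as follows. The exact normalization is not specified, so this assumes the simplest reading: compute per-channel mean intensities pooled over a group of RGB images, then center each patch on those means. Both function names are hypothetical.

```python
import numpy as np


def pooled_channel_means(images):
    """Per-channel mean intensity pooled over a group of (H, W, 3) images.

    Every pixel in the group is weighted equally, whatever its image size.
    """
    total = np.zeros(3)
    n_pixels = 0
    for img in images:
        total += img.reshape(-1, 3).astype(float).sum(axis=0)
        n_pixels += img.shape[0] * img.shape[1]
    return total / n_pixels


def normalize(patch, channel_means):
    """Center a patch on the pooled channel means (hypothetical step)."""
    return patch.astype(float) - channel_means


img_a = np.full((2, 2, 3), 100, dtype=np.uint8)
img_b = np.full((2, 2, 3), 200, dtype=np.uint8)
means = pooled_channel_means([img_a, img_b])
print(means)  # 150 in each channel
print(normalize(img_a, means)[0, 0])  # -50 in each channel
```

A site-aware scheme would compute such statistics per originating site, or replace them with a dedicated stain-normalization method.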
Publicly available datasets were analyzed in this study. This data can be found here:
MS was responsible for aspects of sampling, handled the data, developed the code, did the numerical analysis, and wrote the report. KH did hand-marking of the TCGA histology slides and provided advice on both histology and clinical data. NR contributed the general framework for the deep learning approach and provided both general and specific guidance.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.