Breast Cancer Histopathological Images Recognition Based on Low Dimensional Three-Channel Features

Hao, Yan; Qiao, Shichang; Zhang, Li; Xu, Ting; Bai, Yanping; Hu, Hongping; Zhang, Wendong; Zhang, Guojun

doi:10.3389/fonc.2021.657560

ORIGINAL RESEARCH article

Front. Oncol., 14 June 2021

Sec. Cancer Imaging and Image-directed Interventions

Volume 11 - 2021 | https://doi.org/10.3389/fonc.2021.657560

This article is part of the Research Topic: Bio-inspired Physiological Signal(s) and Medical Image(s) Neural Processing Systems Based on Deep Learning and Mathematical Modeling for Implementing Bio-Engineering Applications in Medical and Industrial Fields

Yan Hao¹

Shichang Qiao²

Li Zhang²

Ting Xu²

Yanping Bai²*

Hongping Hu²

Wendong Zhang³

Guojun Zhang³

¹School of Information and Communication Engineering, North University of China, Taiyuan, China
²Department of Mathematics, School of Science, North University of China, Taiyuan, China
³School of Instrument and Electronics, Key Laboratory of Dynamic Testing Technology, North University of China, Taiyuan, China

Breast cancer (BC) is the primary threat to women’s health, and early diagnosis of breast cancer is imperative. Although there are many ways to diagnose breast cancer, the gold standard is still pathological examination. In this paper, a breast cancer histopathological image recognition method based on low dimensional three-channel features is proposed to achieve fast and accurate recognition of benign and malignant breast tumors. Three-channel features were extracted with 10 descriptors: gray level co-occurrence matrix in one direction (GLCM1), gray level co-occurrence matrix in four directions (GLCM4), average pixel value of each channel (APVEC), Hu invariant moment (HIM), wavelet features, Tamura, completed local binary pattern (CLBP), local binary pattern (LBP), Gabor, and histogram of oriented gradient (Hog). A support vector machine (SVM) was then used to assess their performance. Experiments on the BreaKHis dataset show that GLCM1, GLCM4 and APVEC achieved recognition accuracies of 90.2%-94.97% at the image level and 89.18%-94.24% at the patient level, which is better than many state-of-the-art methods, including many deep learning frameworks. The experimental results show that high dimensional features increase the recognition time without greatly improving the recognition accuracy, and that three-channel features enhance the recognizability of the image and thus achieve higher recognition accuracy than gray-level features.

Introduction

Cancer has become one of the major public health problems that seriously threaten people’s health. The incidence and mortality of breast cancer have been rising continuously in recent years. Early and accurate diagnosis is the key to improving the survival rate of patients. Mammography is the first step of early diagnosis, but it is difficult to detect cancer in the dense breasts of adolescent women, and the X-ray radiation poses a threat to the health of patients and radiologists. Computed tomography (CT) is a localized examination, and the abnormalities it reveals are not sufficient to conclude that a patient is suffering from breast cancer. The gold standard for breast cancer diagnosis is still pathological examination. In a pathological examination, tumor specimens are usually obtained through puncture, excision, etc., and then stained with hematoxylin and eosin (H&E). Hematoxylin binds deoxyribonucleic acid (DNA) to highlight the nucleus, while eosin binds proteins and highlights other structures. Accurate diagnosis of breast cancer requires experienced histopathologists and a lot of time and effort. In addition, different histopathologists may reach different diagnoses, since the result strongly depends on each histopathologist’s prior knowledge. This results in low diagnostic consistency, and the average diagnosis accuracy is only 75% (1).

Currently, breast cancer diagnosis based on histopathological images faces three major challenges. Firstly, there is a shortage of experienced histopathologists around the world, especially in underdeveloped areas and small hospitals. Secondly, the diagnosis made by a histopathologist is subjective and has no objective evaluation basis; whether the diagnosis is correct depends entirely on the histopathologist’s prior knowledge. Thirdly, the diagnosis of breast cancer based on histopathological images is very complicated, time-consuming and labor-intensive, which is inefficient in the era of big data. In the face of these problems, an efficient and objective breast cancer diagnosis method is urgently needed to alleviate the workload of histopathologists.

With the rapid development of computer-aided diagnosis (CAD), it has gradually been applied in clinical practice. A CAD system cannot completely replace the doctor, but it can be used as a “second reader” to assist doctors in diagnosing diseases. However, computers detect many false positive areas, and doctors need a lot of time to re-evaluate the results prompted by the computer, which decreases accuracy and efficiency. Therefore, how to improve the sensitivity of computer-aided tumor detection while greatly reducing the false positive rate, and thereby improve the overall performance of the detection method, remains a subject to be studied.

In recent years, machine learning has been successfully applied to image recognition, object recognition, and text classification. With the advancement of computer-aided diagnosis technology, machine learning has also been successfully applied to breast cancer diagnosis (2–8). There are two common approaches: histopathological image classification based on artificial feature extraction and traditional machine learning, and histopathological image classification based on deep learning. The former requires manual design of features, but it does not require high-performance equipment and has advantages in computing time. The latter, especially with convolutional neural networks (CNNs), often requires a large number of labeled training samples, and labeled data are difficult to obtain. Labeling lesions is time-consuming and laborious, even for very experienced histopathologists.

The key to traditional histopathological image classification is feature extraction. Common features include color features, morphological features, texture features, statistical features, etc. Spanhol et al. (9) introduced a publicly available breast cancer histopathology dataset (BreaKHis) and extracted LBP, CLBP, gray level co-occurrence matrix (GLCM), local phase quantization (LPQ), parameter-free threshold adjacency statistics (PFTAS), and a keypoint descriptor named ORB. They assessed these features with 1-nearest neighbor (1-NN), quadratic discriminant analysis (QDA), support vector machines (SVMs), and random forests (RF), with accuracies ranging from 80% to 85%. Pendar et al. (10) introduced a representation learning-based unsupervised domain adaptation on the basis of (9) and compared it with the results of CNN. Anuranjeeta et al. (11) proposed a breast cancer recognition method based on morphological features: 16 morphological features were extracted and 8 classifiers were used for recognition, with an accuracy of about 80%. The authors in (12–14) proposed breast cancer recognition methods based on texture features. In particular, Carvalho et al. (14) used phylogenetic diversity indexes to characterize the types of breast cancer. Sudharshan et al. (15) compared 12 multi-instance learning methods based on PFTAS and verified that multi-instance learning is more effective than single-instance learning. However, none of them considered the color channels of the image. Fang et al. (16) proposed a framework called Local Receptive Field based Extreme Learning Machine with Three Channels (3C-LRF-ELM), which can automatically extract histopathological features to diagnose whether there is inflammation. In addition, in order to reduce the recognition time and the complexity of the algorithms, this paper is committed to achieving high recognition accuracy with low dimensional features.

Deep learning methods, especially CNNs, can achieve more accurate cancer recognition (17–25) because of their ability to extract powerful high-level features compared with traditional image recognition methods. For example, Spanhol et al. (17) used the existing AlexNet on the BreaKHis dataset, and its recognition accuracy was significantly higher than that of their previous work (9). The authors in (18–21, 25) used different CNN frameworks and obtained recognition accuracies of more than 90% on the two-class problem of the BreaKHis dataset. Benhammou et al. (22) comprehensively surveyed the research based on the BreaKHis dataset from four aspects (magnification-specific binary, magnification-independent binary, magnification-specific multi-category and magnification-independent multi-category) and proposed a magnification-independent multi-category method based on CNN, which had rarely been considered in previous studies. The works (23–26) also achieved good performance on the Bioimaging 2015 dataset. Both BreaKHis and Bioimaging 2015 are challenging datasets for breast cancer detection. Because of the drawbacks of model training, most studies were based on models that had been trained on other datasets and then verified on histopathological images. Few researchers trained a complete model on histopathological images, owing to the lack of labeled data.

In order to reduce the workload of histopathologists and allow them to spend more time on the diagnosis of more complex diseases, efficient and fast computer-aided diagnosis methods are urgently needed. This paper proposes a breast cancer histopathological image recognition method based on low dimensional three-channel features. The features of the three channels of the image are extracted separately, and the three-channel features are then fused to achieve better recognition of breast cancer histopathological images at both the image level and the patient level. The framework is shown in Figure 1.

Figure 1. Proposed framework for histopathological image classification.

The contributions of this paper are as follows:

1) a histopathological image recognition method based on three-channel features is proposed,

2) the method relies on low dimensional features,

3) the method achieves high accuracy and fast recognition speed,

4) the method is easy to implement.

The rest of the paper is organized as follows: in Section 2 the feature extraction methods are introduced, the experiments and results analysis are given in Section 3, and Section 4 concludes the work.

Feature Extraction

Gray Level Co-Occurrence Matrix

The gray level co-occurrence matrix is a common method to describe the texture of an image by studying its spatial correlation characteristics. In 1973, Haralick et al. first used GLCM to describe texture features (27). In our experiments, we calculated the GLCM with 256 gray levels in one direction (0°) and in four directions (0°, 45°, 90°, 135°), respectively. Then, from the GLCM, 22 related features were calculated: autocorrelation, contrast, two correlation measures, cluster prominence, cluster shade, dissimilarity, energy, entropy, two homogeneity measures, maximum probability, sum of squares, sum average, sum variance, sum entropy, difference variance, difference entropy, two information measures of correlation, inverse difference, and inverse difference moment (27–29).
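As a rough illustration of the construction described above, the following Python sketch builds a normalized co-occurrence matrix for a single pixel offset and derives four of the classical statistics from it. It is not the authors' implementation: the function names, the non-symmetric counting, the toy 4-level image, and the choice of these four statistics are all assumptions made for the example, and how the four directions are combined in the paper is not stated in the text.

```python
import math

# Pixel offsets (row, col) commonly used for the four GLCM directions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, levels, offset=(0, 1)):
    """Normalized co-occurrence matrix of a 2-D integer image for one offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

def glcm_features(p):
    """Contrast, energy, entropy and homogeneity of a normalized GLCM."""
    idx = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx),
        "energy": sum(p[i][j] ** 2 for i in idx for j in idx),
        "entropy": -sum(p[i][j] * math.log(p[i][j])
                        for i in idx for j in idx if p[i][j] > 0),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx),
    }

# Toy single-channel image with 4 gray levels (the paper uses 256).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

p0 = glcm(image, levels=4, offset=OFFSETS[0])             # one direction
features_0deg = glcm_features(p0)
features_4dir = [glcm_features(glcm(image, 4, off))       # four directions
                 for off in OFFSETS.values()]
```

For a three-channel descriptor, the same computation would be repeated on the R, G and B planes and the resulting vectors joined.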

Average Pixel Value of Each Channel

The average value reflects the centralized tendency of the data and is an important amplitude feature of images. For an image, the average pixel value of each color channel is expressed as

$$f_{mean} = \frac{1}{MN} \sum_{x_{c} = 1}^{M} \sum_{y_{c} = 1}^{N} f(x_{c}, y_{c}), \tag{1}$$

where f(x_c, y_c) is the pixel value at position (x_c, y_c) in the given channel and M × N is the size of the image in pixels.
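Equation (1) can be sketched in a few lines of Python. The nested-list image layout and the name `channel_means` are assumptions for this example only; applying the formula once per channel gives a 3-D feature vector.

```python
def channel_means(image):
    """Mean pixel value of each channel; image is a list of rows of (R, G, B) tuples."""
    n_pixels = len(image) * len(image[0])
    sums = [0, 0, 0]
    for row in image:
        for pixel in row:
            for k in range(3):
                sums[k] += pixel[k]
    return [s / n_pixels for s in sums]

# Hypothetical 2x2 RGB image.
tiny = [[(255, 0, 10), (205, 0, 20)],
        [(155, 0, 30), (105, 0, 40)]]
apvec = channel_means(tiny)   # [180.0, 0.0, 25.0]
```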

Hu Invariant Moment

Geometric moments were proposed by Hu (30) in 1962. He constructed seven invariant moments from the second-order and third-order normalized central moments, and proved that they are invariant to rotation, scaling and translation. The Hu invariant moment is a region-based image shape descriptor. In the construction of Hu invariant moments, the central moments eliminate the influence of image translation, the normalization eliminates the influence of image scaling, and the polynomial combinations provide invariance to rotation. Moments of different orders reflect different characteristics: the low orders reflect the basic shape of the target, and the high orders reflect its details and complexity.
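To make the construction concrete, the sketch below computes only the first two of the seven Hu invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11², from normalized central moments, and checks numerically that they do not change when a toy shape is shifted or rotated by 90°. The function name and the toy images are assumptions; the remaining five invariants follow the same pattern with third-order moments.

```python
import math

def hu_first_two(image):
    """First two Hu invariants of a 2-D gray image given as a list of rows."""
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    x_bar = sum(x * v for x, _, v in pixels) / m00
    y_bar = sum(y * v for _, y, v in pixels) / m00

    def mu(p, q):   # central moment: removes the effect of translation
        return sum((x - x_bar) ** p * (y - y_bar) ** q * v for x, y, v in pixels)

    def eta(p, q):  # normalized central moment: removes the effect of scale
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

shape = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0]]
shifted = [[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 1, 1, 1, 0],
           [0, 0, 1, 0, 0, 0]]
rotated = [list(row) for row in zip(*shape[::-1])]   # 90 degree rotation

phi_original = hu_first_two(shape)
invariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for other in (hu_first_two(shifted), hu_first_two(rotated))
    for a, b in zip(phi_original, other)
)
```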

Wavelet Features

The result of two-dimensional wavelet decomposition reflects the frequency changes in different directions and the texture characteristics of the image. Since the detail subgraph is the high-frequency component of the original image and contains the main texture information, the energy of the individual detail subgraph is taken as the texture feature, which reflects the energy distribution along the frequency axis with respect to the scale and direction. In this paper, 5-layer wavelet decomposition was carried out, and the energy of high-frequency components in each layer was taken as the feature vector.
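The idea of using detail-subband energies as texture features can be sketched as follows. Note the substitutions: this example uses the Haar wavelet because it can be written in a few lines of plain Python, whereas the paper uses coif5; the energy is taken here as the sum of squared coefficients per detail band, which is one common definition and not necessarily the one used by the authors; and the helper names are hypothetical.

```python
def haar_step(matrix):
    """One level of a 2-D orthonormal Haar transform.

    Returns the approximation band and a tuple of three detail bands.
    A trailing odd row or column is ignored in this sketch.
    """
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for r in range(0, len(matrix) - 1, 2):
        a_row, r_row, c_row, g_row = [], [], [], []
        for c in range(0, len(matrix[0]) - 1, 2):
            a, b = matrix[r][c], matrix[r][c + 1]
            d, e = matrix[r + 1][c], matrix[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2)   # local average
            r_row.append((a + b - d - e) / 2)   # difference between rows
            c_row.append((a - b + d - e) / 2)   # difference between columns
            g_row.append((a - b - d + e) / 2)   # diagonal difference
        approx.append(a_row)
        d_rows.append(r_row)
        d_cols.append(c_row)
        d_diag.append(g_row)
    return approx, (d_rows, d_cols, d_diag)

def detail_energies(matrix, levels):
    """Energy of every detail band at every decomposition level."""
    energies = []
    approx = matrix
    for _ in range(levels):
        approx, details = haar_step(approx)
        for band in details:
            energies.append(sum(v * v for row in band for v in row))
    return energies

# Toy 4x4 checkerboard: all texture energy sits in the finest diagonal band.
checker = [[1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, 1, 0],
           [0, 1, 0, 1]]
wavelet_features = detail_energies(checker, levels=2)
```

With five levels, as in the paper, the input would need to be large enough to be halved five times, which the 700×460 BreaKHis images are.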

Tamura

Tamura et al. (31) proposed a texture feature description method based on psychological research on the visual perception of texture, and defined six characteristics to describe texture: coarseness, contrast, directionality, line-likeness, regularity, and roughness. Coarseness reflects the intensity of change of the image gray level; the larger the texture granularity, the coarser the texture image. Contrast reflects the difference between the lightest and darkest gray levels in a gray image, and the range of this difference determines the contrast. Directionality reflects how strongly the image texture is concentrated along a certain direction. Line-likeness reflects whether the image texture has a linear structure. Regularity reflects the consistency of texture features between a local region and the whole image. Roughness is the sum of coarseness and contrast.

Local Binary Pattern

Local Binary Pattern (32) is an operator used to describe the local texture features of an image. It has significant advantages such as rotation invariance and gray level invariance. The original LBP operator compares the gray values of the eight adjacent pixels in a 3×3 window with a threshold, namely the gray value of the center pixel. If the value of an adjacent pixel is greater than or equal to the value of the center pixel, the position of that pixel is marked as 1; otherwise it is marked as 0. That is, for a pixel (x_c, y_c) on the image,

$$LBP_{P, R}(x_{c}, y_{c}) = \sum_{p = 0}^{P - 1} s(g_{p} - g_{c})\, 2^{p}, \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

where P is the number of sampling points in the neighborhood of the center pixel, R is the radius of the neighborhood, g_c is the gray value of the center pixel, and g_p is the gray value of the p-th pixel adjacent to the center pixel.

In this way, comparing the 8 points in the neighborhood generates an 8-bit binary number, which can take 256 possible values. This number is the LBP value of the center pixel of the 3×3 window and reflects the texture information of the region.
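A minimal Python sketch of Equation (2) for the 3×3 case (P = 8, R = 1) is given below. The clockwise neighbor ordering starting at the top-left corner is an assumption, since the bit order is a convention, and the 256-bin histogram is one common way to turn per-pixel codes into a feature vector rather than a detail confirmed by the text.

```python
def lbp_code(window):
    """LBP value of the center pixel of a 3x3 window."""
    center = window[1][1]
    # Neighbors taken clockwise from the top-left corner (a convention).
    neighbours = [window[0][0], window[0][1], window[0][2], window[1][2],
                  window[2][2], window[2][1], window[2][0], window[1][0]]
    return sum((1 if g >= center else 0) << p for p, g in enumerate(neighbours))

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a gray image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            window = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(window)] += 1
    return hist

window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
code = lbp_code(window)            # bits 0, 4, 5, 6, 7 are set -> 241
histogram = lbp_histogram(window)  # a 3x3 image has a single interior pixel
```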

Completed Local Binary Pattern

Completed local binary pattern (33) is a variant of LBP. The local area of the CLBP operator is represented by its center pixel and a local difference sign-magnitude transformation. After the center pixel is globally thresholded, it is coded with a binary string as CLBP_Center (CLBP_C). At the same time, the local difference sign-magnitude transformation is decomposed into two complementary structural components: the difference sign CLBP-Sign (CLBP_S) and the difference magnitude CLBP-Magnitude (CLBP_M). For a pixel (x_c, y_c) on the image, the components are expressed as:

$$\begin{cases} \mathrm{CLBP\_C}_{P, R}(x_{c}, y_{c}) = s(g_{c} - g_{N}) \\ \mathrm{CLBP\_S}_{P, R}(x_{c}, y_{c}) = \sum_{p = 0}^{P - 1} s(g_{p} - g_{c})\, 2^{p} \\ \mathrm{CLBP\_M}_{P, R}(x_{c}, y_{c}) = \sum_{p = 0}^{P - 1} s(D_{p} - D_{c})\, 2^{p} \end{cases} \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{3}$$

where N is the number of windows, $g_{N} = \frac{1}{N} \sum_{n = 0}^{N - 1} g_{n}$ is the mean gray value of the center pixels g_c as the window moves over the image, $D_{p} = | g_{p} - g_{c} |$ is the difference magnitude, and $D_{c} = \frac{1}{P} \sum_{p = 0}^{P - 1} | g_{p} - g_{c} |$ is the mean magnitude. CLBP_S_{P,R}(x_c, y_c) is equivalent to the traditional LBP operator and describes the difference sign characteristics of the local window. CLBP_M_{P,R}(x_c, y_c) describes the difference magnitude characteristics of the local window. CLBP_C_{P,R}(x_c, y_c) is the gray level information reflected by the pixel at the center. In our experiments, we worked with rotation-invariant uniform patterns, with the standard values P = 8 and R = 1, yielding a 20-D feature vector for each channel.
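The three components of Equation (3) can be sketched for a single 3×3 window as follows. This follows the definitions exactly as printed above (D_c is the mean magnitude within the window, and g_N is supplied by the caller as a precomputed mean gray value) and produces raw codes; the rotation-invariant uniform mapping used in the experiments is not reproduced here. The function name and neighbor ordering are assumptions.

```python
def clbp_codes(window, global_mean):
    """Return (CLBP_S, CLBP_M, CLBP_C) for the center pixel of a 3x3 window."""
    center = window[1][1]
    # Neighbors taken clockwise from the top-left corner (a convention).
    neighbours = [window[0][0], window[0][1], window[0][2], window[1][2],
                  window[2][2], window[2][1], window[2][0], window[1][0]]
    diffs = [g - center for g in neighbours]
    magnitudes = [abs(d) for d in diffs]                  # D_p
    mean_magnitude = sum(magnitudes) / len(magnitudes)    # D_c

    clbp_s = sum((1 if d >= 0 else 0) << p for p, d in enumerate(diffs))
    clbp_m = sum((1 if m >= mean_magnitude else 0) << p
                 for p, m in enumerate(magnitudes))
    clbp_c = 1 if center >= global_mean else 0
    return clbp_s, clbp_m, clbp_c

window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
# global_mean = 5.0 is a made-up value standing in for g_N.
codes = clbp_codes(window, global_mean=5.0)
```

The sign component reproduces the plain LBP code of the same window, which illustrates the equivalence stated in the text.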

Gabor

The Gabor feature can be used to describe the texture information of an image. The frequency and orientation representations of Gabor filters are similar to those of the human visual system, which makes them particularly suitable for texture representation and discrimination. Gabor features mainly rely on the Gabor kernel to window the signal in the frequency domain, so as to describe the local frequency information of the signal. Different textures generally have different center frequencies and bandwidths. According to these frequencies and bandwidths, a set of Gabor filters can be designed to filter texture images. Each Gabor filter only allows the texture corresponding to its frequency to pass smoothly, while the energy of other textures is suppressed. Texture features are analyzed and extracted from the output of each filter for subsequent classification tasks. We used Gabor filters with five scales and eight orientations. The size of the filter bank is 39×39 and the block size is 46×70, yielding a 4000-D feature vector for each channel.

Histogram of Oriented Gradient

Histogram of Oriented Gradient (34) is a feature descriptor used for object detection in computer vision and image processing. It constructs features by calculating and counting the histogram of the gradient direction in local areas of the image. Gradient information reflects the edge information of the target well, and the local appearance and shape of the image can be characterized by the size of the local gradient. It is generally used in pedestrian detection, face recognition and other fields, but it does not perform well on images with complex texture information. It is included in this paper for comparison.

Experiments and Results

Dataset

The BreaKHis dataset (9) contains biopsy images of benign and malignant breast tumors, which were collected through clinical studies from January 2014 to December 2014. During this period, all patients with clinical symptoms of BC were invited to the Brazilian P&D laboratory to participate in the study. Samples were collected by surgical open biopsy (SOB) and stained with hematoxylin and eosin. Hematoxylin is alkaline and mainly stains the chromatin in the nucleus and the nucleic acid in the cytoplasm blue-purple. Eosin is acidic and mainly stains the components in the cytoplasm and the extracellular matrix pink. These images can be used for histological studies and were labeled by pathologists in the P&D laboratory. The BreaKHis dataset consists of 7909 microscopic images of breast tumor tissue from 82 patients, divided into benign and malignant tumors: 2480 benign images (24 patients) and 5429 malignant images (58 patients). The images were acquired in a three-channel RGB (red-green-blue) true color space with magnification factors of 40X, 100X, 200X and 400X, and the size of each image is 700×460. Tables 1 and 2 summarize the image distribution, and Figure 2 shows representative examples from the BreaKHis dataset.

Table 1. Image distribution by magnification factor and class.

Table 2. Image distribution by magnification factor and subclass.

Figure 2. Representative examples of BreaKHis dataset.

Protocol

All of the experiments were conducted on a platform with an Intel Core i7-5820K CPU and 16 GB of memory. The BreaKHis dataset was randomly divided into a training set (70%, 56 patients) and a testing set (30%, 26 patients). We guarantee that patients used to build the training set are not used for the testing set. The results presented in this work are the average of five trials.
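A patient-wise split of this kind can be sketched as below: patients, not images, are shuffled and divided, so that every image of a given patient lands on the same side. The function name, the seed, the rounding rule and the made-up patient identifiers are assumptions for illustration; the sketch does not claim to reproduce the exact 56/26 partition reported above.

```python
import random

def split_by_patient(image_patient_ids, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) with no patient on both sides."""
    patients = sorted(set(image_patient_ids))
    random.Random(seed).shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    train_idx = [i for i, pid in enumerate(image_patient_ids)
                 if pid in train_patients]
    test_idx = [i for i, pid in enumerate(image_patient_ids)
                if pid not in train_patients]
    return train_idx, test_idx

# Hypothetical data: 10 patients with 3 images each.
ids = [f"P{k:02d}" for k in range(10) for _ in range(3)]
train_idx, test_idx = split_by_patient(ids)
train_patients = {ids[i] for i in train_idx}
test_patients = {ids[i] for i in test_idx}
```

Repeating the split with five different seeds and averaging the scores would mirror the five-trial protocol.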

No preprocessing was applied to the images before feature extraction. For the SVM, we chose the RBF kernel. The best penalty factor c=2 and kernel function parameter g=1 were obtained by cross validation. For the wavelet features, we selected the coif5 wavelet function, which has better symmetry than dbN, has the same support length as db3N and sym3N, and has the same number of vanishing moments as db2N and sym2N.

Here, we report the recognition accuracy at both the image level and the patient level. For the image level, let N_{rec_I} be the number of images correctly classified and N the total number of test images. The recognition accuracy at the image level is then defined as

$$\mathrm{Image\_accuracy} = \frac{N_{rec\_I}}{N}. \tag{4}$$

For the patient level, we followed the definition of (9). Let N_P be the number of images of patient P, S the total number of patients, and N_{rec_P} the number of images of patient P that were correctly classified. The patient score is then defined as

$$\mathrm{Patient\ score} = \frac{N_{rec\_P}}{N_{P}}, \tag{5}$$

and the recognition accuracy at the patient level is defined as

$$\mathrm{Patient\_accuracy} = \frac{\sum \mathrm{Patient\ score}}{S}. \tag{6}$$
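The difference between the two accuracy levels in Equations (4) to (6) is easiest to see on a small example. In the sketch below, the function names and the toy labels are assumptions; the patient-level figure averages per-patient scores, so a patient with few images weighs as much as one with many.

```python
from collections import defaultdict

def image_accuracy(y_true, y_pred):
    """Eq. (4): fraction of test images classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def patient_accuracy(y_true, y_pred, patient_ids):
    """Eqs. (5)-(6): mean over patients of the per-patient score."""
    per_patient = defaultdict(lambda: [0, 0])   # patient -> [correct, total]
    for t, p, pid in zip(y_true, y_pred, patient_ids):
        per_patient[pid][0] += int(t == p)
        per_patient[pid][1] += 1
    scores = [correct / total for correct, total in per_patient.values()]
    return sum(scores) / len(scores)

# Toy example: patient A has 4 images (3 correct), patient B has 1 (correct).
patients = ["A", "A", "A", "A", "B"]
truth = ["M", "M", "M", "M", "B"]
preds = ["M", "M", "M", "B", "B"]

img_acc = image_accuracy(truth, preds)             # 4 / 5 = 0.8
pat_acc = patient_accuracy(truth, preds, patients)  # (0.75 + 1.0) / 2 = 0.875
```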

To further assess the performance of the proposed framework, sensitivity (Se), precision (Pr) and F1-score metrics were used and the formulations of the metrics are described as

$$Se = \frac{TP}{TP + FN}, \tag{7}$$

$$Pr = \frac{TP}{TP + FP}, \tag{8}$$

$$F1\text{-}score = \frac{2 \times TP}{2 \times TP + FP + FN}, \tag{9}$$

where true positive (TP) represents the number of malignant samples classified as malignant, whereas true negative (TN) represents the number of benign samples classified as benign. Also, false positive (FP) represents the number of benign samples incorrectly classified as malignant while false negative (FN) represents the number of malignant samples misclassified as benign.
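Equations (7) to (9) translate directly into code once the confusion counts are available. The following sketch treats "malignant" as the positive class, as in the definitions above; the function name and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Sensitivity, precision and F1-score with the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "Se": tp / (tp + fn),
        "Pr": tp / (tp + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }

# Toy example: TP = 3, FN = 1, FP = 1, TN = 1.
truth = ["malignant"] * 4 + ["benign"] * 2
preds = ["malignant", "malignant", "malignant", "benign", "malignant", "benign"]
metrics = binary_metrics(truth, preds)
```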

Experiment Results

Table 3 reports the performance of all descriptors we have assessed. The image level recognition accuracy, the patient level recognition accuracy, sensitivity, precision and F1-score of 10 different three-channel descriptors under 4 magnifications were compared. The descriptors are GLCM1, GLCM4, APVEC, HIM, wavelet feature, Tamura, CLBP. In order to show the effectiveness of low dimensional features, LBP, Gabor, and Hog were introduced for comparison.

Table 3. Classification performance of different descriptors based on three-channel features.

For images at 40X magnification, GLCM1 achieved the highest recognition accuracy of 94.12 ± 2.19% at the image level and 93.48 ± 2.7% at the patient level, as well as the highest precision and F1-score. The second was GLCM4, with an image_accuracy of 93.4 ± 3.54% and a patient_accuracy of 92.95 ± 4.02%, followed by APVEC with an image_accuracy of 92.12 ± 1.09% and a patient_accuracy of 90.55 ± 0.84%. The same ranking was obtained for 100X, where the image level and patient level recognition accuracies were 92.65 ± 3.08% and 91.74 ± 3.89% for GLCM1, 91.98 ± 3.79% and 91.16 ± 3.88% for GLCM4, and 90.2 ± 2.33% and 89.18 ± 3.45% for APVEC. However, for 200X, APVEC achieved the highest image level recognition accuracy of 94.97 ± 1.35%, followed by GLCM1 and GLCM4, while GLCM1 performed best at the patient level with an accuracy of 94.24 ± 2.86%, which is 0.3% higher than APVEC. As for 400X, APVEC performed best at both the image level (92.78 ± 3.14%) and the patient level (93.3 ± 3.25%), followed by GLCM1 and GLCM4. On the whole, GLCM1, GLCM4 and APVEC performed well at both the image level and the patient level, followed by HIM. These four descriptors all achieved their highest recognition accuracy at 200X, and all descriptors except Gabor and Hog performed worst at 400X, which agrees with the conclusion of (18, 35). Although the recognition accuracy of LBP and Gabor is above 82%, which is acceptable, they need more recognition time because of their high feature dimension, as shown in Table 4. Tamura and Hog performed slightly worse than the other descriptors.

Table 4. Running time for feature extraction of each image and classification of different descriptors.

The reason for the above results is that the features extracted by different descriptors have different distributions. A highly dispersed feature distribution makes image recognition more difficult, whereas a more concentrated distribution leads to better recognition performance. Figure 3 illustrates this.

Figure 3. Visualization of feature distribution. (A) Feature distribution of 40X, 100X, 200X, (B) feature distribution of 400X.

Figure 3 visualizes the feature distributions. The ordinate represents the feature values. Since the feature values of 40X, 100X, and 200X are relatively small while those of 400X are relatively large, the distributions cannot be displayed in a single figure. They are therefore shown in two panels: Figure 3(A) shows the feature distribution of 40X, 100X and 200X, and Figure 3(B) shows the feature distribution of 400X. It can be seen from Figure 3 that for 40X, 100X and 200X, GLCM1, GLCM4, APVEC, and HIM have far fewer outliers than the other feature descriptors, indicating that the distributions of these four features are relatively concentrated, which is beneficial for breast cancer identification. In addition, comparing the feature distributions of benign and malignant samples under different magnifications shows that the distributions of benign and malignant samples for Hog are very similar, indicating a weak ability to discriminate between benign and malignant, which is also the reason for its poor performance. GLCM1 and GLCM4 have noticeably more outliers at 400X than at the other magnifications, and the benign and malignant feature distributions of all descriptors are relatively similar, resulting in the poor performance at 400X.

Compared with RGB images, grayscale images only retain the brightness information of the images, but lose the chroma and saturation information of the images. Three-channel features can make up for the lost information of single-channel features, increasing the recognition capability of features, so as to achieve better recognition performance. To further illustrate the advantages of three-channel features, Table 5 shows the performance of different descriptors of gray-level features.

Table 5. Classification performance of different gray-level features.

Comparing Table 3 and Table 5, it can be seen that the performance of the three-channel features is much better than that of gray-level features, especially GLCM1, GLCM4, APVEC, HIM and Gabor. The accuracy for most of them has increased by more than 10% for both the image level and the patient level. Figure 4 shows the average recognition accuracy of three-channel features and gray-level features for the image level and the patient level. The advantages of the three-channel features can be seen more clearly from Figure 4.

Figure 4. Classification accuracy for different features. (A) Image_accuracy for three-channel features and gray-level features, (B) patient_accuracy for three-channel features and gray-level features.

Although the advantages of the three-channel features are obvious, it is not yet clear which channel plays the more important role in the classification results. Table 6 shows the classification performance of single-channel features under different magnifications. The experimental results show that the R channel has a greater impact on the classification results at 40X, 100X and 200X magnifications, while the B channel performs better at 400X. This is consistent with the actual appearance of H&E histopathological images under different magnifications. The images at 40X, 100X, and 200X contain more cytoplasm and appear pink. The images at 400X contain more information about the precise lesion locations, which is usually presented through the nuclei, and appear blue-purple.

Table 6. Classification performance of single-channel features under different magnifications.

Different descriptors extract different features, and a single method often cannot capture all the effective information in an image. Different methods may complement each other, but combining them may also add redundant information. In this paper, GLCM1, which has the best recognition performance, is combined with each of the 8 other methods except GLCM4. Different features are fused in a cascade way. The results are shown in Table 7.
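Cascade fusion, read here as plain concatenation of feature vectors, can be sketched as follows, both across the three color channels and across descriptors. The helper names and the two toy descriptors are assumptions; whether the features are normalized before fusion is not stated in the text, so no scaling is applied here.

```python
def fuse(*feature_vectors):
    """Concatenate feature vectors into one (cascade fusion)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def three_channel_features(image, descriptor):
    """Apply a single-channel descriptor to R, G and B and concatenate the results."""
    channels = [[[pixel[k] for pixel in row] for row in image] for k in range(3)]
    return fuse(*(descriptor(channel) for channel in channels))

# Two toy single-channel descriptors standing in for, e.g., GLCM1 and APVEC.
def mean_descriptor(channel):
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))]

def range_descriptor(channel):
    values = [v for row in channel for v in row]
    return [min(values), max(values)]

tiny = [[(255, 0, 10), (205, 0, 20)],
        [(155, 0, 30), (105, 0, 40)]]

means = three_channel_features(tiny, mean_descriptor)     # 3 values
ranges = three_channel_features(tiny, range_descriptor)   # 6 values
combined = fuse(means, ranges)                            # 9 values
```

The dimension of the fused vector is the sum of the individual dimensions, which is why combining descriptors increases recognition time.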

Table 7. Classification performance of GLCM1 combined with other descriptors.

Table 7 shows that after GLCM1 is combined with APVEC, the recognition accuracy at 40X and 100X is better than that of either single method at both the image level and the patient level, while the accuracy at 200X and 400X is slightly lower than that of APVEC alone. The combination of GLCM1 and HIM improves the image level accuracy, while at the patient level the accuracy at 40X and 100X is slightly lower than that of GLCM1. This shows that GLCM1 is complementary to both APVEC and HIM. The performance of GLCM1 combined with the other methods is lower than that of GLCM1 alone, which shows that the fusion of different texture features increases the redundancy of the features and reduces recognizability.

The recognition accuracy of GLCM1, GLCM4, APVEC, and HIM based on the three-channel features is better than that of many existing studies and, in particular, better than the performance of some deep learning models. Table 8 shows that the method proposed in this paper is superior to many state-of-the-art methods in benign and malignant tumor recognition at both the image level and the patient level. It is worth mentioning that works (35–43) did not split the training and test sets according to the protocol of (9), works (44, 45) adopted the existing protocol, and works (46, 47) randomly divided the data into a training set (70%) and a test set (30%) but did not mention whether the split was the same as the protocol. Although the recognition accuracy of works (37, 39, 41–43, 46, 47) is significantly higher than that of our method, they all use deep learning models, which require a large number of labeled training samples and longer training time. In addition, all of these works except (42) only calculated the image level recognition accuracy. George et al. tested their method only on the 200X data.

Table 8. Comparison of the proposed methods with other state-of-the-art methods.
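Since the comparison distinguishes image-level from patient-level accuracy, a small sketch of the two quantities may be helpful. It assumes the definitions commonly used with the BreaKHis dataset: the image-level accuracy is the fraction of correctly classified images, and the patient-level accuracy is the mean, over patients, of the fraction of each patient's images that are classified correctly. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def image_level_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all test images that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def patient_level_accuracy(
    patient_ids: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int]
) -> float:
    """Mean of the per-patient fractions of correctly classified images."""
    per_patient: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        per_patient[pid][0] += int(t == p)  # correctly classified images
        per_patient[pid][1] += 1            # all images of this patient
    scores = [correct / total for correct, total in per_patient.values()]
    return sum(scores) / len(scores)


# Hypothetical toy case: patient A has 4 images (all correct),
# patient B has 1 image (misclassified). Labels: 0 = benign, 1 = malignant.
patients = ["A", "A", "A", "A", "B"]
truth = [1, 1, 1, 1, 0]
preds = [1, 1, 1, 1, 1]

print(image_level_accuracy(truth, preds))              # 4 of 5 images correct
print(patient_level_accuracy(patients, truth, preds))  # mean of 1.0 and 0.0
```

The toy case shows that the two measures can diverge when patients contribute different numbers of images, which is why reporting only the image-level accuracy gives an incomplete picture.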

Conclusion

In this paper, a breast cancer histopathological image recognition method based on low dimensional three-channel features is proposed. Although there have been many related studies, most traditional methods do not consider the color channels of the image, so the extracted features lose part of the effective information. This paper compares the performance of 10 different feature descriptors in the recognition of breast cancer histopathological images. We extracted the features of each descriptor from the three color channels and fused the features of each channel; SVM was then used to assess their performance. The experimental results show that the recognition accuracy of GLCM1, GLCM4, and APVEC can reach more than 90% at both the image level and the patient level. The performance based on three-channel features is also much better than that of gray-level features, especially for GLCM1 and GLCM4. We also showed that the R channel has a greater impact on the classification results at 40X, 100X, and 200X, whereas the results at 400X depend more on the B channel. In addition, because high dimensional features consume more recognition time, this paper is dedicated to achieving accurate recognition based on low dimensional features. The experimental results verify that the high dimensional features extracted by LBP, Hog, and Gabor require more recognition time without greatly improving the accuracy. Our method builds on existing traditional methods and is easy to implement without complex image preprocessing. The experimental results and the comparison with other methods confirm that our method requires less training time than deep learning methods, an advantage that cannot be ignored in practical applications.
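The idea of three-channel features summarized above can be sketched in a few lines. The code below computes the average pixel value of each channel and a one-direction co-occurrence matrix per channel, then concatenates the per-channel statistics. It is only an illustrative sketch: the quantization to 8 levels, the horizontal offset, the three statistics chosen, and all function names are assumptions and do not reproduce the settings used in the experiments.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = Sequence[Sequence[Pixel]]


def split_channels(image: Image) -> List[List[List[int]]]:
    """Return the R, G and B planes of an RGB image as 2-D lists."""
    return [[[px[c] for px in row] for row in image] for c in range(3)]


def apvec(image: Image) -> List[float]:
    """Average pixel value of each channel (one mean per plane)."""
    feats = []
    for plane in split_channels(image):
        values = [v for row in plane for v in row]
        feats.append(sum(values) / len(values))
    return feats


def glcm_one_direction(plane: Sequence[Sequence[int]], levels: int = 8) -> List[List[float]]:
    """Normalized co-occurrence matrix of horizontally adjacent pixels.

    The number of levels and the (0, 1) offset are assumptions of this sketch.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in plane:
        q = [v * levels // 256 for v in row]  # quantize 0..255 to 0..levels-1
        for a, b in zip(q, q[1:]):
            counts[a][b] += 1
    total = sum(map(sum, counts)) or 1  # guard against one-pixel-wide images
    return [[c / total for c in r] for r in counts]


def glcm_stats(p: Sequence[Sequence[float]]) -> List[float]:
    """Contrast, energy (angular second moment) and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    return [contrast, energy, homogeneity]


def three_channel_glcm(image: Image, levels: int = 8) -> List[float]:
    """Concatenate the GLCM statistics of the R, G and B planes."""
    feats: List[float] = []
    for plane in split_channels(image):
        feats.extend(glcm_stats(glcm_one_direction(plane, levels)))
    return feats


# Hypothetical 2x2 toy image: a red top row and a blue bottom row.
toy = [[(255, 0, 0), (255, 0, 0)], [(0, 0, 255), (0, 0, 255)]]
print(apvec(toy))
print(len(three_channel_glcm(toy)))  # 3 statistics for each of 3 channels
```

A gray-level variant would compute the same statistics on a single plane, yielding one third of the dimensions; the per-channel version keeps the color information while remaining low dimensional.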

In future work, we will continue to develop more efficient and faster methods for breast cancer recognition. The goal is to achieve multi-class recognition of breast cancer, building on the present research on benign and malignant tumor recognition. In addition to improving the recognition accuracy, we also hope to extract more effective information about the cancer, which can help doctors find lesions faster and reduce their workload.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/.

Author Contributions

Data processing: TX and HH. Methodology: YH, YB, and SQ. Software: YH, SQ, and LZ. Supervision: YB, HH, TX, WZ and GZ. Original draft: YH. Review & editing: YH, SQ, YB, WZ and GZ. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China as National Major Scientific Instruments Development Project (Grant No. 61927807), the National Natural Science Foundation of China (Grant No. 51875535, 61774137), the Key Research and Development Projects of Shanxi Province (Grant No. 201903D121156) and the National Key Research and Development Project (Grant No. 2019YFC0119800).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are especially grateful for the support of the Shanxi Provincial Key Laboratory for Biomedical Imaging and Big Data and for the fund of the Shanxi ‘1331 Project’ Key Subject Construction and Innovation Special Zone Project.

References

1. Elmore JG, Longton GM, Carney PA, Geller BM, Onega T, Tosteson ANA, et al. Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens. JAMA (2015) 313(11):1122–32. doi: 10.1001/jama.2015.1405

2. Xu J, Xiang L, Liu Q, Gilmore H, Wu J, Tang J, et al. Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images. IEEE Trans Med Imaging (2016) 35(1):119–30. doi: 10.1109/TMI.2015.2458702

3. Reis S, Gazinska P, Hipwell J, Mertzanidou T, Naidoo K, Williams N, et al. Automated Classification of Breast Cancer Stroma Maturity From Histological Images. IEEE Trans Biomed Eng (2019) 64(10):2344–52. doi: 10.1109/TBME.2017.2665602

4. Niazi MKK, Lin Y, Liu F, Ashok A, Bilgin A. Pathological Image Compression for Big Data Image Analysis: Application to Hotspot Detection in Breast Cancer. Artif Intell Med (2018) 95:82–7. doi: 10.1016/j.artmed.2018.09.002

5. Anji RV, Soni B, Sudheer RK. Breast Cancer Detection by Leveraging Machine Learning. ICT Express (2020) 6(4):320–4. doi: 10.1016/j.icte.2020.04.009

6. Rahman MM, Ghasemi Y, Suley E, Zhou Y, Wang S, Rogers J. Machine Learning Based Computer Aided Diagnosis of Breast Cancer Utilizing Anthropometric and Clinical Features. IRBM (2020). doi: 10.1016/j.irbm.2020.05.005

7. Das A, Nair MS, Peter SD. Sparse Representation Over Learned Dictionaries on the Riemannian Manifold for Automated Grading of Nuclear Pleomorphism in Breast Cancer. IEEE Trans Image Process (2019) 28:1248–60. doi: 10.1109/TIP.2018.2877337

8. Wang Z, Li M, Wang H, Jiang H, Yao Y, Zhang H. Breast Cancer Detection Using Extreme Learning Machine Based on Feature Fusion With CNN Deep Features. IEEE Access (2019) 7:105146–58. doi: 10.1109/ACCESS.2019.2892795

9. Spanhol FA, Oliveira LS, Petitjean C, Heutte L. A Dataset for Breast Cancer Histopathological Image Classification. IEEE Trans Biomed Eng (2016) 63(7):1455–62. doi: 10.1109/TBME.2015.2496264

10. Pendar A, Behzad H, Alireza ME, Abdolhossein F. Representation Learning-Based Unsupervised Domain Adaptation for Classification of Breast Cancer Histopathology Images. Biocybern Biomed Eng (2018) 38:S0208521617304448. doi: 10.1016/j.bbe.2018.04.008

11. Anuranjeeta A, Shukla KK, Tiwari A, Sharma S. Classification of Histopathological Images of Breast Cancerous and Non Cancerous Cells Based on Morphological Features. Biomed Pharmacol J (2017) 10(1):353–66. doi: 10.13005/bpj/1116

12. Belsare AD, Mushrif MM, Pangarkar MA, Meshram N. Classification of Breast Cancer Histopathology Images Using Texture Feature Analysis. In: TENCON 2015 - 2015 IEEE Region 10 Conference. Macao, China: IEEE (2016) 1–5. doi: 10.1109/TENCON.2015.7372809

13. Sharma M, Singh R, Bhattacharya M. Classification of Breast Tumors as Benign and Malignant Using Textural Feature Descriptor. In: 2017 IEEE International Conference on Bioinformatics & Biomedicine. Kansas City, MO, USA: IEEE (2017) 1110–3. doi: 10.1109/BIBM.2017.8217811

14. Carvalho ED, Filho AOC, Silva RRV, Araújo FHD, Diniz JOB, Silva AC. Breast Cancer Diagnosis from Histopathological Images Using Textural Features and CBIR. Artif Intell Med (2020) 105:101845. doi: 10.1016/j.artmed.2020.101845

15. Sudharshan PJ, Petitjean C, Spanhol F, Oliveira LE, Heutte L, Honeine P. Multiple Instance Learning for Histopathological Breast Cancer Image Classification. Expert Syst Appl (2019) 117:103–11. doi: 10.1016/j.eswa.2018.09.049

16. Fang J, Xu X, Liu H, Sun F. Local Receptive Field Based Extreme Learning Machine with Three Channels for Histopathological Image Classification. Int J Mach Learn Cybern (2019) 10(6):1437–47. doi: 10.1007/s13042-018-0825-6

17. Spanhol FA, Oliveira LS, Petitjean C, Heutte L. Breast Cancer Histopathological Image Classification Using Convolutional Neural Networks. In: International Joint Conference on Neural Networks (IJCNN 2016). IEEE (2016) 2560–7. doi: 10.1109/IJCNN.2016.7727519

18. Kumar A, Singh KS, Saxena S, Lakshmanan K, Sangaiah AK, Chauhan H. Deep Feature Learning for Histopathological Image Classification of Canine Mammary Tumors and Human Breast Cancer. Inf Sci (2020) 508:405–21. doi: 10.1016/j.ins.2019.08.072

19. Bardou D, Zhang K, Ahmad SM. Classification of Breast Cancer Based on Histology Images Using Convolutional Neural Networks. IEEE Access (2018) 6:24680–93. doi: 10.1109/ACCESS.2018.2831280

20. Alom MZ, Yakopcic C, Nasrin MS, Taha TM, Asari VK. Breast Cancer Classification From Histopathological Images With Inception Recurrent Residual Convolutional Neural Network. J Digital Imaging (2019) 32(5):605–17. doi: 10.1007/s10278-019-00182-7

21. Toğaçar M, Özkurt KB, Ergen B, Cömert Z. BreastNet: A Novel Convolutional Neural Network Model Through Histopathological Images for the Diagnosis of Breast Cancer. Phys A: Stat Mech its Appl (2019) 545:123592. doi: 10.1016/j.physa.2019.123592

22. Benhammou Y, Achchab B, Herrera F, Tabik S. BreakHis Based Breast Cancer Automatic Diagnosis Using Deep Learning: Taxonomy, Survey and Insights. Neurocomputing (2020) 375:9–24. doi: 10.1016/j.neucom.2019.09.044

23. Araújo T, Aresta G, Castro E, Rouco J, Aguiar P, Eloy C, et al. Classification of Breast Cancer Histology Images Using Convolutional Neural Networks. PloS One (2017) 12(6):e0177544. doi: 10.1371/journal.pone.0177544

24. Li Y, Wu J, Wu Q. Classification of Breast Cancer Histology Images Using Multi-Size and Discriminative Patches Based on Deep Learning. IEEE Access (2019) 7:21400–8. doi: 10.1109/ACCESS.2019.2898044

25. Vo DM, Nguyen NQ, Lee SW. Classification of Breast Cancer Histology Images Using Incremental Boosting Convolution Networks. Inf Sci (2019) 482:123–38. doi: 10.1016/j.ins.2018.12.089

26. Yan R, Ren F, Wang Z, Wang L, Zhang T, Liu Y. Breast Cancer Histopathological Image Classification Using a Hybrid Deep Neural Network. Methods (2020) 173:52–60. doi: 10.1016/j.ymeth.2019.06.014

27. Haralick RM, Shanmugam K, Dinstein I. Textural Features for Image Classification. IEEE Trans Sys Man Cybern (1973) 6:610–21. doi: 10.1109/TSMC.1973.4309314

28. Soh LK, Tsatsoulis C. Texture Analysis of SAR Sea Ice Imagery Using Gray Level Co-Occurrence Matrices. IEEE Trans Geosci Remote Sens (1999) 37(2):780–95. doi: 10.1109/36.752194

29. Clausi DA. An Analysis of Co-Occurrence Texture Statistics as a Function of Grey Level Quantization. Can J Remote Sens (2002) 28(1):45–62. doi: 10.5589/m02-004

30. Hu MK. Visual Pattern Recognition by Moment Invariants. IRE Trans Inf Theory (1962) 8(2):179–87. doi: 10.1109/TIT.1962.1057692

31. Tamura H, Mori S, Yamawaki T. Textural Features Corresponding to Visual Perception. IEEE Trans Sys Man Cybern (1978) 8(6):460–73. doi: 10.1109/TSMC.1978.4309999

32. Ojala T, Pietikainen M, Maenpaa T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell (2002) 24(7):971. doi: 10.1007/3-540-44732-6_41

33. Guo Z, Zhang L, Zhang D. A Completed Modeling of Local Binary Pattern Operator for Texture Classification. IEEE Trans Image Process (2010) 19(6):1657–63. doi: 10.1109/TIP.2010.2044957

34. Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection. In: 2005 IEEE Computer Society Conference on Computer Vision & Pattern Recognition (CVPR'05). San Diego, CA, USA: IEEE (2005) 1:886–93. doi: 10.1109/CVPR.2005.177

35. Gupta V, Bhavsar A. Breast Cancer Histopathological Image Classification: Is Magnification Important? In: 2017 IEEE Conference on Computer Vision & Pattern Recognition Workshops (CVPRW). Honolulu, HI, USA: IEEE (2017):769–76. doi: 10.1109/CVPRW.2017.107

36. Das K, Conjeti S, Roy AG, Chatterjee J, Sheet D. Multiple Instance Learning of Deep Convolutional Neural Networks for Breast Histopathology Whole Slide Classification. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Washington, DC, USA: IEEE (2018) 578–81. doi: 10.1109/ISBI.2018.8363642

37. Das K, Karri S, Roy AG, Chatterjee J, Sheet D. Classifying Histopathology Whole-Slides Using Fusion of Decisions From Deep Convolutional Network on a Collection of Random Multi-Views At Multi-Magnification. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). Melbourne, VIC, Australia: IEEE (2017):1024–7. doi: 10.1109/ISBI.2017.7950690

38. Cascianelli S, Bello-Cerezo R, Bianconi F, Fravolini ML, Kather JN. Dimensionality Reduction Strategies for CNN-Based Classification of Histopathological Images. Int Conf Intelligent Interact Multimed Syst Serv (2018) 76:21–30. doi: 10.1007/978-3-319-59480-4_3

39. Wei B, Han Z, He X, Yin Y. Deep Learning Model Based Breast Cancer Histopathological Image Classification. In: 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA). Chengdu, China: IEEE (2017) 348–53. doi: 10.1109/ICCCBDA.2017.7951937

40. Zhi W, Yueng HWF, Chen Z, Zandavi SD, Lu Z, Chung YY. Using Transfer Learning With Convolutional Neural Networks to Diagnose Breast Cancer From Histopathological Images. In: The 24th International Conference On Neural Information Processing. Guangzhou, China: Springer, Cham (2017) 10637:669–76. doi: 10.1007/978-3-319-70093-9_71

41. Nahid AA, Kong Y. Histopathological Breast-Image Classification Using Concatenated R–G–B Histogram Information. Ann Data Sci (2018) 6:513–29. doi: 10.1007/s40745-018-0162-3

42. Han Z, Wei B, Zheng Y, Yin Y, Li K, Li S. Breast Cancer Multi-Classification From Histopathological Images with Structured Deep Learning Model. Sci Rep (2017) 7(1):4172. doi: 10.1038/s41598-017-04075-z

43. Boumaraf S, Liu X, Zheng Z, Ma X, Ferkous C. A New Transfer Learning Based Approach to Magnification Dependent and Independent Classification of Breast Cancer in Histopathological Images. Biomed Signal Process Control (2021) 63:102192. doi: 10.1016/j.bspc.2020.102192

44. Song Y, Zou J, Chang H, Cai W. Adapting Fisher Vectors for Histopathology Image Classification. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). Melbourne, VIC, Australia: IEEE (2017) 600–3. doi: 10.1109/ISBI.2017.7950592

45. Saxena S, Shukla S, Gyanchandani M. Pre-Trained Convolutional Neural Networks as Feature Extractors for Diagnosis of Breast Cancer Using Histopathology. Int J Imaging Syst Technol (2020) 30:577–91. doi: 10.1002/ima.22399

46. Wang P, Wang J, Li Y, Li P, Li L. Automatic Classification of Breast Cancer Histopathological Images Based on Deep Feature Fusion and Enhanced Routing. Biomed Signal Process Control (2021) 65(6):102341. doi: 10.1016/j.bspc.2020.102341

47. George K, Sankaran P, Joseph KP. Computer Assisted Recognition of Breast Cancer in Biopsy Images Via Fusion of Nucleus-Guided Deep Convolutional Features. Comput Methods Prog Biomed (2020) 194:105531. doi: 10.1016/j.cmpb.2020.105531

Keywords: breast cancer, histopathological images recognition, feature extraction, low dimensional features, three-channel features

Citation: Hao Y, Qiao S, Zhang L, Xu T, Bai Y, Hu H, Zhang W and Zhang G (2021) Breast Cancer Histopathological Images Recognition Based on Low Dimensional Three-Channel Features. Front. Oncol. 11:657560. doi: 10.3389/fonc.2021.657560

Received: 23 January 2021; Accepted: 11 May 2021;
Published: 14 June 2021.

Edited by:

Francesco Rundo, STMicroelectronics, Italy

Reviewed by:

Chen Liu, Army Medical University, China
Alceu Britto, Pontifical Catholic University of Parana, Brazil

Copyright © 2021 Hao, Qiao, Zhang, Xu, Bai, Hu, Zhang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yanping Bai, baiyp666@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.