Cross-attention multi-branch CNN using DCE-MRI to classify breast cancer molecular subtypes

Purpose The aim of this study is to improve the accuracy of classifying luminal and non-luminal subtypes of breast cancer using computer algorithms based on DCE-MRI, and to validate the diagnostic efficacy of the model with respect to the patient's age at menarche and nodule size. Methods DCE-MRI images of patients with non-specific invasive breast cancer admitted to the Second Affiliated Hospital of Dalian Medical University were collected. There were 160 cases in total, with 84 cases of luminal type (luminal A and luminal B) and 76 cases of non-luminal type (HER2-overexpressing and triple-negative). Patients were grouped according to thresholds of 20 mm for nodule size and 14 years for age at menarche. A cross-attention multi-branch net (CAMBNET) was proposed based on the dataset to predict the molecular subtypes of breast cancer. Diagnostic performance was assessed by accuracy, sensitivity, specificity, F1, and area under the ROC curve (AUC). The model was visualized with Grad-CAM. Results Several classical deep learning models were included for diagnostic performance comparison. Using 5-fold cross-validation on the test dataset, all the results of CAMBNET were significantly higher than those of the compared deep learning models. The average prediction recall, accuracy, precision, and AUC for luminal and non-luminal types of the dataset were 89.11%, 88.44%, 88.52%, and 96.10%, respectively. For patients with tumor size ≤20 mm, CAMBNET had an AUC of 83.45% and an ACC of 90.29% for detecting triple-negative breast cancer. When classifying luminal from non-luminal subtypes for patients with age at menarche ≤14 years, our CAMBNET model achieved an ACC of 92.37%, precision of 92.42%, recall of 93.33%, F1 of 92.33%, and AUC of 99.95%. Conclusions CAMBNET can be applied to the molecular subtype classification of breast cancer. For patients with age at menarche ≤14 years, our model can yield more accurate results when classifying luminal and non-luminal subtypes. For patients with tumor sizes ≤20 mm, our model can yield more accurate results in detecting triple-negative breast cancer, which may help improve patient prognosis and survival.


Introduction
Breast cancer is one of the most prevalent cancers in women and one of the main causes of cancer-related death in women under the age of 45. Nearly 410,000 patients die of breast cancer worldwide every year (1,2). Breast cancer is highly heterogeneous, and its molecular subtypes differ significantly in treatment, radiochemotherapy sensitivity, and prognosis (3,4). The luminal A subtype responds well to endocrine therapy, has a low risk of recurrence and metastasis, and has a good prognosis. The luminal B subtype also responds well to endocrine therapy, but it is more proliferative than luminal A and may easily recur in the early stages (5). The HER2-positive and triple-negative subtypes have a high malignancy grade and poor prognosis (6), although the HER2-positive subtype responds well to targeted molecular therapy. Therefore, it is important to distinguish between luminal and non-luminal breast cancer for accurate treatment.
Molecular typing of breast cancer mainly depends on immunohistochemical examination of biopsy specimens. Histopathological examination is not only invasive, time-consuming, and expensive, but can also lead to infection, hematoma, and other complications. Because of the great heterogeneity of the tumor, the biopsy tissue cannot fully represent the biological behavior of the tumor. MRI is non-invasive, resolves soft tissue well, involves no radiation, and has unique advantages for breast examination. Studies show that MRI imaging features are helpful in identifying molecular subtypes of breast cancer. Luminal A and luminal B masses are irregularly shaped and have burr-like edges (7). Triple-negative breast cancer usually shows a well-defined round mass with annular enhancement (8). Therefore, predicting the molecular subtypes of breast cancer from MRI features can effectively reduce the number of biopsies, alleviate the pain of patients, reduce the burden on patients, and provide a reference for individualized treatment.
However, predicting the molecular subtype of a tumor based on the MRI features of breast cancer is difficult because of two issues: (1) the contrast between the lesion area and normal tissue is low; (2) in clinical practice, the shapes of different subtypes of tumors are very similar, and the interpretation by professional physicians is strongly influenced by subjective factors. It is therefore difficult to distinguish the molecular subtypes of breast cancer with the naked eye.
Many traditional machine learning algorithms have been applied to breast cancer analysis tasks (9)(10)(11)(12)(13). However, these methods rely on manual feature extraction with strong priors, generalize poorly, and have low robustness, which makes it difficult to find discriminating features by hand and to solve the classification of breast cancer subtypes. In recent years, with the development of deep learning technology, deep learning algorithms have been widely used in medical image processing, such as tumor detection (14)(15)(16), segmentation (17-19), and benign and malignant differentiation (20-23). Much work (24)(25)(26)(27) has been devoted to breast cancer subtype classification, attempting to extract discriminative features from breast cancer MRI with deep learning models. The experimental results of these studies illustrate the feasibility of predicting the molecular subtypes of tumors based on the MRI features of breast cancer.
However, because these models are only direct applications or simple modifications of existing models, they take no targeted measures to address the above-mentioned problems in breast cancer subtype classification and therefore cannot distinguish well between the different molecular subtypes of breast cancer. To solve these issues and improve the performance of breast cancer subtype classification based on breast MRI images, we propose a new deep network model, CAMBNET, that extracts high-level feature information and focuses on lesion objects. The model includes a multi-branch module, a cross-attention mechanism, and a deep feature extraction module. Specifically, because a single branch may not extract features effectively, the multi-branch module is used to extract richer features. The attention mechanism has been widely used for problems such as the low contrast between lesions and normal areas and the very similar shapes of lesions in different diseases, so we designed the cross-attention module to help the network attend to salient objects. As the depth of the model increases, the model can extract deeper features, so the deep feature extraction module is used to further improve the feature extraction capability of the model. Because limitations such as the rarity of the disease and the shortage of medical expertise for appropriate labeling leave a relatively small dataset for breast cancer subtype classification, we constructed our deep learning model with a small number of layers.
Studies have shown that early age at menarche increases the risk of luminal-type breast cancer, which may be related to endogenous estrogen exposure (28). Tumor size is one of the indices used to evaluate the staging of breast cancer and is important for the selection of surgical methods. Small tumors also offer limited imaging options, which can easily lead to misdiagnosis. The earlier the age of menarche, the higher the rate of axillary lymph node metastasis and the worse the prognosis of breast cancer patients (29). Therefore, the initial aim of this study was to develop a new deep network model for predicting luminal and non-luminal subtypes of breast cancer using DCE-MRI images. We also investigated its diagnostic efficacy in different age-at-menarche groups (≤14 years and >14 years) and tumor size groups (≤20 mm and >20 mm).

Data collection
This retrospective study was approved by the Ethics Committee of the Second Affiliated Hospital of Dalian Medical University. Patients with non-specific invasive breast cancer admitted to our hospital from May 2017 to December 2019 were selected. The inclusion criteria were: (1) non-specific invasive breast cancer confirmed by biopsy or surgical pathology, with complete immunohistochemical results and an identified molecular subtype; (2) DCE-MRI performed within a week before the operation; (3) complete clinical data, including age and menstrual status. The exclusion criteria were: (1) percutaneous biopsy, neoadjuvant chemotherapy, or radiotherapy before the MRI examination; (2) a tumor that was inconclusive because of artifacts or had no visible region of interest (ROI); (3) poor image quality; or (4) incomplete immunohistochemical data for molecular typing in the pathologic diagnosis. Ultimately, 160 patients with breast cancer were enrolled in the study, including 84 with luminal subtypes (luminal A and luminal B) and 76 with non-luminal subtypes (HER2-positive and triple-negative).

MRI technique
Images were obtained with a 3.0-T magnetic resonance imaging scanner (Discovery 750W, GE). A dedicated coil was used to scan the breast. Patients were in the prone position with the head tilted forward and both breasts naturally suspended in the coil. T1WI, T2WI, DWI, and 3D volume imaging of the breast (3D VIBRANT) were performed. The 3D VIBRANT scan parameters were as follows: TR 7.6 ms, TE 3.8 ms, slice thickness 1.2 mm, FOV 320 mm × 320 mm, flip angle 15°, matrix 288 × 288. The contrast agent was injected into the antecubital vein through a high-pressure injector (GE Company, USA) at a flow rate of 2 mL/s and a dose of 0.2 mmol/kg. After injection, 7 consecutive scans without intervals were performed, each lasting 1 minute and 7 seconds.

Image processing
Two senior breast MRI diagnostic physicians outlined the region of interest (ROI) of the tumor on the T1WI, T2WI, and DCE images (the third-phase enhanced images after contrast injection were selected). The ROI had to include all tumor regions, including cystic and necrotic regions.
After the physician outlines the specific contour of the tumor, we derive the minimum bounding rectangle covering the tumor by extracting the outermost points of the contour in the four directions. Each of these four points was moved outward by 10 pixels in its respective direction to crop the contour area, and the cropped images were uniformly resized to 64 × 64 pixels by bilinear interpolation. The image is then normalized with transforms.Normalize. The specific process is shown in Figure 1.
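The cropping and resizing steps can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the clipping of the box to the image borders, and the corner-aligned bilinear sampling are assumptions made for illustration, and images are represented as row-major nested lists.

```python
def padded_bounding_box(contour, image_width, image_height, margin=10):
    """Smallest axis-aligned box around `contour` (a list of (x, y) points),
    grown by `margin` pixels on every side.
    Assumption: the box is clipped to the image borders."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, image_width - 1)
    bottom = min(max(ys) + margin, image_height - 1)
    return left, top, right, bottom


def crop(image, box):
    """Crop a row-major 2D list; the box bounds are inclusive."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_bilinear(image, out_h=64, out_w=64):
    """Resize a 2D list by bilinear interpolation.
    Assumption: corner-aligned sampling (output corners map to input corners)."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            upper = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            lower = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(upper * (1 - dy) + lower * dy)
        out.append(row)
    return out
```

A typical call chain would be `resize_bilinear(crop(image, padded_bounding_box(contour, width, height)))`, followed by the normalization step.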

Data set partitioning
Breast images from 20% of the cases in the dataset were used as the test dataset, and 80% were kept as the training dataset while ensuring that no patient images appeared in both the training and test sets. The number of each sub-dataset in the dataset is shown in Table 1.
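A patient-level split of this kind can be sketched as follows. The helper name `split_by_patient`, the record layout, and the fixed random seed are hypothetical; the sketch only illustrates that splitting on patient identifiers, rather than on individual images, keeps every patient's images on one side of the partition.

```python
import random


def split_by_patient(image_records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_name) records so that no patient appears
    in both the training set and the test set."""
    patient_ids = sorted({pid for pid, _ in image_records})
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])
    train = [rec for rec in image_records if rec[0] not in test_ids]
    test = [rec for rec in image_records if rec[0] in test_ids]
    return train, test
```

With 160 patients and a test fraction of 0.2, this assigns 32 patients to the test set and 128 to the training set. It does not stratify by subtype; whether the original split did so is not stated.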

Deep learning model
In this paper, a multi-branch crossover network is proposed to extract high-level features. Two of the branches fuse the extracted features after passing through the cross-attention module, and then the fused features are fused with the shallow features extracted by SFEpath to improve the classification performance of MRI images for two different subtypes. The proposed framework is shown in Figure 2. From Figure 2, we can see that our proposed network architecture consists of three main parts: the three-branch framework, the cross-attention module, and the deep feature extraction module. The specific model parameters are shown in Table 2. We will explain these modules in detail in the following sections.
Figure 1 Data set processing process. (A) is the original image, (B) is the specific outline of the tumor sketched by the physician, (C) is the four edge points in the specific outline, (D) is the minimum bounding rectangle covering the tumor, (E) shows that 10 pixels are added in each direction of the four points to crop the outline area, and (F) is the cropped image uniformly resized to 64×64 pixels by a bilinear interpolation method.

Three-branch structure
Because the amount of data in the dataset is limited, an overly complex model is prone to overfitting, so three lightweight branch paths are designed. In Figure 3A, SFB refers to the Squeeze-and-Excitation module (30). SFEpath is added to the branching framework to extract depth features as the input of deep attention, and part of the original input feature information is transferred directly to the output features through a residual connection. The residual connection simplifies feature learning, protects the integrity of feature information to a certain extent, and alleviates the problem of model degradation in deep networks. It enables the model to better learn shallow information such as the texture and shape of the breast image and makes the features extracted by the model richer.
Since the contrast between tumor and background is low, a network capable of extracting multiple depth features from different branches is needed. To obtain more depth features, a multi-branch network with two further branches (called LTTpath1 and LTTpath2) was designed, inspired by the Inception model. That is, the main structure of this network uses asymmetric 1×n and n×1 filters to reduce the parameter size and computational cost, instead

Cross-attention Module
For the model to learn the specific differences and relationships between different subtypes of breast cancer, the interference of irrelevant regions must be suppressed. We therefore propose a cross-attention mechanism that focuses on the salient features of each breast cancer subtype. The proposed cross-attention module consists of a spatial attention module and a channel attention module, as shown in Figure 3B. Both are single-path modules rather than the dual-path modules of CBAM, because in our breast cancer subtype classification experiments the combination of single-path with cross-path patterns performed better than the combination of dual-path patterns or of dual-path with cross-path patterns.
To suppress the interference of irrelevant regions, we further utilize the spatial attention module and the channel attention module. Channel attention suppresses the less informative channels by learning channel attention weights that act as feature selectors indicating the importance of each feature channel; it focuses on "what" is meaningful for a given input image. Unlike the channel attention module, the spatial attention module is concerned with "where" the informative part is. As a complement to the channel attention module, spatial attention obtains the importance of each spatial location by learning

Depth feature extraction module
The features learned from the three branches are fused according to their different characteristics. LTTpath1 and LTTpath2 complement each other's information through additive operations. Since the features extracted from SFEpath are shallow information such as the texture and shape of the breast image, the features extracted from SFEpath are used as complementary information to the fused features of LTTpath1 and LTTpath2. The fusion is superimposed by a concatenation operation to reduce the loss of information.
As shown in Figure 3A, the fusion of features from multiple branches by path4 reduces the size of the feature map by half and doubles the number of feature maps, maintaining the complexity of the network layer. The deep features are further extracted by increasing the number of channels and setting stride to 2 to remove the residual connected blocks, and finally, the extracted deep features are used for the classification of breast cancer subtypes.
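The fusion order described in this section can be followed by tracking tensor shapes alone. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, shapes are written as (channels, height, width), and the channel counts in the usage note are invented for the example because the values of Table 2 are not reproduced here.

```python
def add_fuse(shape_a, shape_b):
    """Element-wise addition (LTTpath1 + LTTpath2): shapes must match exactly
    and the result keeps the same shape."""
    if shape_a != shape_b:
        raise ValueError("additive fusion needs identical shapes")
    return shape_a


def concat_fuse(shape_a, shape_b):
    """Channel-wise concatenation (fused LTT features with SFEpath features):
    spatial sizes must match and the channel counts add up."""
    (ca, ha, wa), (cb, hb, wb) = shape_a, shape_b
    if (ha, wa) != (hb, wb):
        raise ValueError("concatenation needs identical spatial sizes")
    return (ca + cb, ha, wa)


def downsample(shape, stride=2):
    """Stride-2 stage as described for path4: the feature-map size is halved
    and the number of feature maps is doubled."""
    channels, height, width = shape
    return (2 * channels, height // stride, width // stride)
```

For example, with hypothetical branch outputs of shape (32, 32, 32), addition keeps (32, 32, 32), concatenation with an SFEpath output of the same shape gives (64, 32, 32), and one stride-2 stage gives (128, 16, 16).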

Parameter setting
We implemented the proposed framework and conducted experiments using the PyTorch library. A graphics processing unit (NVIDIA GeForce GTX 2060) was used to accelerate training and testing. The batch sizes for training and testing were set to 8 and 1, respectively. The maximum number of epochs was set to 200, and the initial learning rate was 0.002, multiplied by 0.95 every 10 epochs. We chose RMSprop as the optimizer for the training phase. Overfitting is prevented by limiting the squared magnitude of the kernel weights and using L2 regularization. The whole framework is trained in an end-to-end manner with the backpropagation algorithm, and the model parameters that perform best on the validation set are saved. The whole training process takes 1 hour. The cross-entropy function is chosen as the classification loss function. For each subtype of disease, we report five metrics: Acc (accuracy), Pre (precision), Rec (recall), F1 (F1 score), and AUC (area under the ROC curve).
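The learning-rate schedule stated above (0.002, multiplied by 0.95 every 10 epochs, for at most 200 epochs) can be written as a one-line step decay. The function name and the zero-based epoch counting are assumptions; in PyTorch a comparable schedule is available as `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.95`, although the text does not say which scheduler was actually used.

```python
def learning_rate(epoch, base_lr=0.002, decay=0.95, step=10):
    """Step decay: multiply the base rate by `decay` once every `step` epochs.
    Assumption: epochs are counted from 0, so the first decay applies at epoch 10."""
    return base_lr * decay ** (epoch // step)
```

Under these assumptions the rate stays at 0.002 for epochs 0 to 9, drops to 0.0019 at epoch 10, and is roughly 0.00075 during the last ten epochs of a 200-epoch run.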

Comparison of CAMBNET and classical CNN
The classification results of the different methods according to the evaluation metrics are shown in Table 3. We also performed transfer learning experiments on our proposed CAMBNET model. We selected 3200 images from the BreakHis database as the training set and 1010 images as the test set to initially train the model. During this training, the model parameters with the best classification results were saved; these parameters were then reloaded and further trained on the training set of the target dataset we collected. During transfer learning, real-time data augmentation was also applied to each breast MRI image in training, mainly by random rotation and by flipping along the diagonal of the image. According to Table 3, our model achieves the best results in all classification metrics, with an accuracy of 88.44%, precision of 88.52%, recall of 89.11%, and F1 of 88.40%. After transfer learning and data augmentation, the model improves further, to 89.46% for accuracy, 89.83% for precision, 90.35% for recall, and 89.44% for F1. As shown in Figure 4, the CAMBNET model has the best performance, with an AUC value of 96%.
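The augmentation described here can be sketched on a small nested-list image. This is one possible reading and not the authors' implementation: it assumes that rotations are by random multiples of 90 degrees and that the diagonal flip is a mirror across the main diagonal (a transpose); the function names are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a row-major 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_main_diagonal(image):
    """Mirror a 2D list across its main diagonal (a transpose)."""
    return [list(row) for row in zip(*image)]


def augment(image, rng):
    """Apply a random number of quarter turns, then flip across the diagonal
    with probability 0.5 (both choices are assumptions)."""
    for _ in range(rng.randrange(4)):
        image = rotate90(image)
    if rng.random() < 0.5:
        image = flip_main_diagonal(image)
    return image
```

Calling `augment(image, random.Random())` on each training image at load time gives a different variant on each pass while leaving the pixel values themselves unchanged.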

Comparison of CAMBNET with other methods
Previous work has classified breast cancer subtypes using existing machine learning methods or by building deep learning models. Tianwen Xie et al. (31) (32) built models for breast cancer subtype classification. We compared our method with the specific methods used in the three papers mentioned above, and the experimental results are shown in Table 4. As can be seen from Table 4, the accuracy of the traditional machine learning methods is significantly lower than the classification accuracy of the deep learning models, while our model exceeds the previously mentioned methods in all metrics, which illustrates the accuracy of our model.

Multi-source data testing
We collected and collated 40 images acquired by a 1.5T magnetic resonance scanner (HDXT, GE, USA) and replaced 40 images of the source data set with these 40 1.5T images, thus collating a multi-source data set. The CAMBNET model was also evaluated on the multi-source data. The specific experimental results are shown in Table 5, from which it can be seen that the metrics decreased. However, the CAMBNET model still achieves 82.29% accuracy on the multi-source data set, which shows that the model is robust. We attribute this to the cross-attention mechanisms, multi-branch paths, dropout, feature fusion, and other measures used in the model.
Figure 4 Receiver operating characteristic (ROC) curves of correlation networks in test dataset.

Effect of age at menarche on molecular subtype classification
Table 6 shows the effect of the patient's age at menarche (≤14 and >14 years) on the classification performance of the CAMBNET model (33). The experimental results showed that the younger the age at menarche, the better the model classified, and the more pronounced its ability to distinguish between luminal and non-luminal types. For age at menarche >14 years, the CAMBNET model classified luminal and non-luminal types with an accuracy of 82.58%, precision of 83.06%, recall of 82.85%, F1 of 82.57%, and AUC of 87.45%. For age at menarche ≤14 years, the CAMBNET model classified luminal and non-luminal types with an accuracy of 92.37%, precision of 92.42%, recall of 93.33%, F1 of 92.33%, and AUC of 99.95%. The accuracy was 69.23% in cases with age at menarche >14 years and 88.44% in cases with age at menarche ≤14 years.
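The accuracy, precision, recall, F1, and AUC values reported for each group can be derived from predicted labels and scores. The sketch below assumes macro-averaging over the two classes and a pairwise (rank-based) AUC; the text does not state which averaging the authors used, and the function names are hypothetical.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall, and F1.
    Assumption: each class is scored one-vs-rest and the scores are averaged."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Macro-averaging is one explanation for why a reported F1 need not equal the harmonic mean of the reported precision and recall, since the average of per-class F1 scores differs from the F1 of the averaged precision and recall.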

Impact of tumor size on molecular subtype classification
As shown in Table 7

Visual analysis of CAMBNET
Although the CAMBNET model has achieved high accuracy in breast cancer subtype classification, the lack of visual analysis severely limits its application in realistic tasks. Therefore, we experimentally demonstrate the reliability and feasibility of this method by performing a visual analysis of the CAMBNET model.
First, we obtained the feature visualizations shown in Figure 5. Higher brightness in the feature map indicates higher attention and a higher contribution to the classification performance, whereas darker pixel regions, such as blue, indicate a smaller proportion of training and a smaller contribution to the classification performance. As shown in Figure 5, the focused regions of the feature map are consistent with the locations of the key lesions that physicians focus on, which demonstrates that the method can localize image features with clinical diagnostic value and supports the effectiveness of the CAMBNET model in breast cancer subtype classification.
To further demonstrate the effectiveness of the designed multi-branch attention network, we used Grad-CAM to visualize the features learned by the CAMBNET model and by the ResNet34 (34), DenseNet121 (35), and Vgg16 (36) networks, which perform better in classification among the classical models. Grad-CAM (gradient-weighted class activation mapping) combines gradient information with feature-map activations. Given an input sample, Grad-CAM first calculates the gradient of the target class score with respect to each feature map in the last convolutional layer and performs a global average pooling of the gradients, which yields the importance weight of each feature map. Then, the weighted activation of the feature maps is calculated based on the importance weights to obtain a gradient-weighted class activation map. This map can be used to locate the important, class-discriminative regions in the input samples. The results are shown in Figure 6. The focus region of our multi-branch attention network is mainly on the tumor itself, whereas the focus of the other classical networks is often not on the tumor but on non-lesion regions. This indicates that, compared with the other classical models, the CAMBNET model can better learn the features of important regions and focus on the discriminative features between different subtypes, and finally achieve accurate classification of subtypes.
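The Grad-CAM computation described above can be illustrated on small nested lists. This is a minimal sketch and not the code used in the study: it assumes the activations and gradients of the last convolutional layer have already been extracted, and it includes the final ReLU from the original Grad-CAM formulation, which the description above does not mention.

```python
def grad_cam(activations, gradients):
    """activations, gradients: K feature maps, each an H x W nested list,
    taken from the last convolutional layer for one input sample."""
    # Importance weight of each feature map: global average pooling of its gradients.
    weights = [
        sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in gradients
    ]
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Weighted sum of the feature maps.
    for w_k, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * fmap[i][j]
    # ReLU keeps only the regions that support the target class.
    return [[max(v, 0.0) for v in row] for row in cam]
```

In practice the resulting map is upsampled to the input size and overlaid on the image, which is how heat maps such as those in Figure 6 are usually produced.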

Discussion
It has been reported that the histological features based on DCE-MRI images of the breast are helpful to differentiate the molecular types of breast cancer. The average prediction specificity, accuracy, precision, and area under the ROC curve were 0.958, 0.852, 0.961, and 0.867, respectively. Another study (26) also used a convolutional neural network (CNN) algorithm to predict the molecular subtype of breast cancer based on the MRI features of breast cancer and achieved good diagnostic efficiency. These studies demonstrate the feasibility of using deep learning to classify the different molecular subtypes of breast cancer. To further improve the performance of breast cancer subtype classification based on breast MRI images, we propose a new deep network model that extracts high-level feature information and focuses on lesion objects. Experiments conducted on the MRI dataset of 160 clinical breast tumor patients obtained from the Second Hospital of Dalian Medical University showed that the recall, accuracy, precision, and area under the ROC curve of our method were 89.11%, 88.44%, 88.52%, and 96.10% for luminal and non-luminal types. These results verify the effectiveness of the model. We also applied transfer learning and data augmentation to the CAMBNET model to further improve its ability to classify breast cancer subtypes, reaching an accuracy of 89.46%, precision of 89.83%, recall of 90.35%, and F1 of 89.44%. Histopathological analysis of breast cancer has achieved high accuracy in recent years. Chuang Zhu et al. (38) proposed a method for histopathological image classification of breast cancer by combining multiple compact convolutional neural networks (CNN). Mustafa I. Jaber et al. (39) developed a deep learning method for subtype classification of tumors using only breast biopsy tissue sections. This related work achieved high accuracy rates; we compared these models with our model and found that the results achieved by both are comparable.
However, MRI has the advantage of being noninvasive and fast, whereas histopathological examination is invasive and returns results more slowly, so our work still has a relative advantage. The interpretability of the machine learning results was achieved through the visual analysis of the CAMBNET model, which shows that the method is reliable and feasible.
TNM staging of breast cancer is of great importance for guiding treatment, evaluating the curative effect, and assessing prognosis (40). Whether the maximum diameter of the tumor exceeds 2 cm is the most important index for distinguishing T1 from T2. Triple-negative breast cancer (TNBC) has the highest invasiveness and the worst prognosis. Some studies (41) have found that the diameter of the primary tumor of TNBC correlates positively with the axillary lymph node metastasis rate. When the tumor diameter exceeds 2 cm, the ipsilateral axillary lymph node metastasis rate increases by 50% (42, 43). In this study, our model performed best in distinguishing TNBC from non-TNBC in the group with tumor diameter ≤ 2 cm. The accuracy is 90.29% and the AUC value is 83.45%, which is helpful for the early diagnosis and treatment of TNBC and for improving the prognosis and survival rate of patients. Early age of menarche is one of the risk factors for breast cancer (44). The younger the age of menarche, the earlier a woman is exposed to estrogen. Studies have shown that the earlier the age of menarche, the worse the degree of differentiation and prognosis of breast cancer patients, and the higher the rate of axillary lymph node metastasis (29). Studies have also reported that the earlier the age of menarche, the higher the incidence and malignancy of non-luminal breast cancer, and that the prognosis of non-luminal breast cancer is worse than that of luminal breast cancer. For the classification of luminal and non-luminal breast cancer, our results show that for menarche age ≤ 14 years the ACC is 92.37%, precision is 92.42%, recall is 93.33%, F1 is 92.33%, and AUC is 99.95%. We also investigated the classification efficiency of our model for luminal A versus non-luminal A. The results showed that for menarche age ≤ 14 years the ACC was 88.44%, precision 83.45%, recall 65.11%, F1 69.35%, and AUC 85.68%.
Our model is more valuable in classifying luminal and non-luminal types of breast cancer patients with menarche age ≤ 14 years.
This study also has some limitations. First, only the classification of luminal versus non-luminal subtypes was performed in this paper, because the dataset is not sufficient for the four-subtype task. Moreover, annotating such images is laborious and time-consuming, so subsequent work could explore weakly supervised or unsupervised learning. Meanwhile, the authors have some supplementary textual information at hand, which could be exploited in subsequent experiments through distillation learning and other methods.

Conclusion
In summary, the experimental results show that our novel deep learning algorithm based on multi-branch feature fusion and an attention mechanism has high accuracy in predicting molecular subtypes of breast cancer. Our model might be more valuable in classifying luminal from non-luminal subtypes for patients with age at menarche ≤14 years. For patients with tumor sizes ≤20 mm, our model could be helpful in more accurately detecting triple-negative breast cancer to improve patient prognosis and survival. Our novel deep learning algorithm therefore has great potential for future clinical applications. In the near future, we will collect more data to build a larger and more comprehensive breast cancer subtype database to better study the problem of breast cancer subtype classification, aiming to comprehensively assist physicians in the clinical diagnosis and treatment of breast cancer subtypes.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the Second Hospital of Dalian Medical University. The patients/participants provided their written informed consent to participate in this study.