Dual-Branch Convolutional Neural Network Based on Ultrasound Imaging in the Early Prediction of Neoadjuvant Chemotherapy Response in Patients With Locally Advanced Breast Cancer

The early prediction of a patient's response to neoadjuvant chemotherapy (NAC) in breast cancer is crucial for guiding therapy decisions. We aimed to develop a deep learning approach, the dual-branch convolutional neural network (DBNN), that uses ultrasound (US) images for the early prediction of NAC response in patients with locally advanced breast cancer (LABC). This retrospective study included 114 women who were monitored with US before treatment (NAC pre) and after one cycle of NAC (NAC1). Pathologic complete response (pCR) was defined as no residual invasive carcinoma in the breast. For predicting pCR, the data were randomly split into a training set and a test set at a ratio of 4:1. DBNN takes US images from both stages as input to predict pCR early in patients receiving NAC. The relationship between the pretreatment data and the data obtained after the first cycle of NAC was captured by sharing features between the two branches, and the relative importance of each stage was emphasized by assigning different weights to the two paths before pCR classification. The optimal DBNN architecture was determined in two ablation experiments. The diagnostic performance of DBNN for predicting pCR was compared with that of four recently published methods. To further validate the potential of DBNN for the early prediction of NAC response, the NAC pre and NAC1 data were also assessed separately.
The highest diagnostic performance for predicting pCR was obtained by combining the US image information from NAC pre and NAC1 (area under the receiver operating characteristic curve (AUC): 0.939; 95% confidence interval (CI): 0.907, 0.972; F1-score: 0.850; overall accuracy: 87.5%; sensitivity: 90.67%; specificity: 85.67%). This performance was significantly better (p<0.01) than that obtained with NAC pre data alone (AUC: 0.730; 95% CI: 0.657, 0.802; F1-score: 0.675; sensitivity: 76.00%; specificity: 68.38%) or NAC1 data alone (AUC: 0.739; 95% CI: 0.664, 0.813; F1-score: 0.611; sensitivity: 53.33%; specificity: 86.32%). As a noninvasive prediction tool, DBNN achieved strong results in the early prediction of NAC response in patients with LABC when the US data from NAC pre and NAC1 were combined.


INTRODUCTION
Breast cancer is the most common cause of cancer-related death among women worldwide (1). Neoadjuvant chemotherapy (NAC) has been used as a systemic preoperative treatment for patients with locally advanced breast cancer (LABC) (2). NAC can downsize breast tumours, allowing breast-conserving surgery, and it makes it possible to assess the response to chemotherapy during treatment. Achieving pathologic complete response (pCR) may be an independent predictor of better disease-free survival (DFS) and overall survival (OS), especially in patients with triple-negative and human epidermal growth factor receptor 2 (HER2)-enriched breast cancer (3). However, despite continuous improvements in chemotherapy regimens, the proportion of patients who achieve pCR remains low (4). Because breast cancers differ in molecular type and histopathology, their responses to chemotherapy can also differ. Therefore, identifying early the patients who respond well to NAC has become an active area of research.
The optimal method for monitoring the response to NAC has not been established (5). Imaging is one of the primary assessment methods: magnetic resonance imaging (MRI), ultrasound (US), and positron emission tomography (PET)/computed tomography (CT) have all been used as evaluation tools (5)(6)(7). However, imaging examinations have clinical limitations because image interpretation relies mainly on a radiologist's visual assessment and is not standardized. Furthermore, MRI and PET/CT are expensive, and PET/CT involves ionizing radiation, which makes them impractical for frequent scans of patients receiving NAC. Among these methods, US may become the primary monitoring tool because it is repeatable, versatile, sensitive, and safe.
With the continuous development of deep learning, computer-aided diagnosis (CAD) has become an important research topic, especially in breast cancer research. CAD research has addressed the classification (8), segmentation (9), and detection (10) of breast tumours. Classification tasks, which mainly focus on differentiating benign from malignant breast tumours, have attracted particular attention (11). Deep convolutional neural networks (CNNs) have been widely applied in healthcare and medical imaging, achieving state-of-the-art results (12-16). In CNN classification, an input image is passed through the network, which learns the essential features, stores them as weights and biases, and uses them to classify the image (17). Recently, several studies have used deep learning to predict breast cancer treatment response from PET/CT and MRI images (18)(19)(20). El Adoui M et al. introduced a two-branch CNN for the early prediction of breast cancer response to chemotherapy using DCE-MRI volumes acquired before and after chemotherapy (18). Braman N et al. developed a CNN for predicting pCR to HER2-targeted NAC from pretreatment DCE-MRI (19). Choi J H et al. used a CNN based on AlexNet to predict the response of advanced breast cancer to NAC from PET and MRI images (20). These studies show that deep learning is a promising tool for predicting breast cancer treatment response.
High-resolution breast US images contain rich texture and echo features that, combined with deep learning techniques, could yield an accurate and noninvasive method for detecting NAC response. Some studies have used CAD with US images to predict the response of breast cancer to NAC (21)(22)(23). However, most of them rely on feature engineering built on semiautomatic intermediate steps, which is labour intensive and time consuming, and deep networks have been reported to far exceed the accuracy of traditional machine learning methods based on handcrafted features (8). Moreover, existing deep learning models ignore the correlation between, and the relative importance of, the data acquired during different chemotherapy courses, and therefore do not fully exploit the characteristics of the data. The purpose of our study was to construct a novel deep learning approach, the dual-branch convolutional neural network (DBNN), that uses US images from different stages of chemotherapy for the early prediction of NAC response in patients with LABC.

Study Participants
This retrospective single-centre study was approved by the Ethics Committee of ShangHai RenJi Hospital (ShangHai, P.R. China), and the requirement for written informed consent was waived. Between February 2015 and June 2019, we enrolled 132 women with LABC who were treated with NAC and surgical resection at our institution. The eligibility criteria were as follows: (a) age 18 to 80 years; (b) histologically confirmed breast cancer with no history of treatment for breast cancer; (c) US performed during NAC; and (d) surgery and pathological evaluation after NAC. Of the 132 patients, 18 were excluded for the following reasons: (a) US was performed at an outside hospital (n=3); (b) no midtreatment US data were available (n=12); and (c) the US images were of poor quality (n=3). The final study group comprised 114 patients (age range: 26-72 years; mean age: 49.92 years) (Figure 1).

US Examination
The ultrasonography examinations were performed by an experienced radiologist at the Department of Ultrasound (C.F.W., with 10 years of experience in breast US) using a MyLab Twice system (Esaote, Genoa, Italy) with a 4-13-MHz LA523 linear transducer. US images were collected before and after the first course of chemotherapy; examples of pCR and non-pCR samples at the different treatment stages are shown in Figure 2. The primary dataset, Renji NAC (RJNAC), contains 1936 US images (800×608 pixels), 968 from each of the two treatment stages, with an average of 16 to 20 images per patient. For the prediction of pCR, the dataset was randomly split into training data (80%) and test data (20%), i.e., a ratio of 4:1, while keeping the proportions of pCR and non-pCR samples similar in both subsets: the pCR-to-non-pCR ratio was approximately 0.63 in both the training set and the test set. Specifically, each stage of the training set contained 776 images (300 pCR and 476 non-pCR), and each stage of the test set contained 192 images (75 pCR and 117 non-pCR) (Figure 1).
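A split that keeps the class proportions similar in both subsets is a stratified split. The sketch below is a minimal standard-library illustration, not the authors' code: it assumes the split unit is one sample with a binary label (1 = pCR, 0 = non-pCR), and the function name and seed are hypothetical. Because each class is rounded separately, the resulting counts need not match the reported ones exactly.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Per-stage class totals implied by the text: 300 + 75 pCR, 476 + 117 non-pCR.
labels = [1] * 375 + [0] * 593
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
n_test_pcr = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), n_test_pcr)
```

If the split is done per patient instead, passing one label per patient and then expanding each selected patient to all of that patient's images keeps every patient's images in a single subset.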

Data Preprocessing
The data collected in this study are US videos. To feed them into the neural network, we extracted individual frames from the videos (24)(25)(26). Four preprocessing steps were applied before training (Figure 3). First, each video, whose length varied, was cut at a fixed frame interval, yielding a variable number M of US images. Second, N high-quality images of breast tissue were selected by removing images containing artifacts, blur, or non-lesion tissue. Two radiologists (Q.D. and C.F.W., with five and ten years of experience in breast US, respectively), blinded to the patients' private information and pathological results, independently read the breast US images and resolved disagreements by discussion to ensure the correctness and repeatability of the dataset. For each patient, N must be the same at both stages, but it can vary between patients, depending on how many clear and usable mass images were contained among that patient's M images; this variation does not affect model learning. The N images of the two stages were paired sequentially so that the two images in each pair were as close as possible in video time. Third, nonrelevant information, such as the instrument model number, scanning or imaging time, and patient information, was removed, and the remaining area was retained as the region of interest (ROI). The ROI images obtained from the videos have the same resolution as ROIs cropped from static single frames, namely 445×445 pixels. Finally, a median filter (27) was used to denoise the US images while preserving edge information. All US images were converted to 128×128 greyscale images before being fed into the deep neural network.
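The four preprocessing steps can be sketched with NumPy as follows. This is a hypothetical illustration rather than the authors' pipeline: the frame interval, the ROI coordinates, the 3×3 median kernel, and nearest-neighbour resizing are assumptions because the text does not specify them, and the manual quality selection of step 2 is represented only by the requirement that both stages contribute the same N frames.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_frames(n_frames, interval):
    """Step 1: indices of frames cut from a video at a fixed frame interval."""
    return list(range(0, n_frames, interval))


def pair_frames(pre_frames, post_frames):
    """Step 2: pair the i-th selected NAC pre frame with the i-th NAC1 frame.

    Both lists are assumed to be time-ordered and already trimmed to length N.
    """
    if len(pre_frames) != len(post_frames):
        raise ValueError("both stages must contribute the same number N of frames")
    return list(zip(pre_frames, post_frames))


def crop_roi(image, top, left, size=445):
    """Step 3: crop a square ROI; `top` and `left` are hypothetical coordinates."""
    return image[top:top + size, left:left + size]


def median_filter(image, k=3):
    """Step 4: k x k median filter with edge padding (k = 3 is an assumption)."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return np.median(windows, axis=(-2, -1))


def resize_nearest(image, out=128):
    """Nearest-neighbour resize to out x out (the interpolation used is not stated)."""
    h, w = image.shape
    rows = np.arange(out) * h // out
    cols = np.arange(out) * w // out
    return image[rows[:, None], cols[None, :]]


def preprocess(frame, top, left):
    roi = crop_roi(frame.astype(np.float32), top, left)
    return resize_nearest(median_filter(roi))


# Example on a synthetic 608 x 800 greyscale frame.
frame = np.random.default_rng(0).integers(0, 256, size=(608, 800))
x = preprocess(frame, top=80, left=180)
print(x.shape)  # (128, 128)
```

The median filter replaces each pixel by the median of its neighbourhood, so isolated speckle-like outliers are removed while step edges are largely preserved, which matches the motivation given in the text.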

Dual-Branch Convolutional Neural Network
In the prediction of NAC response, existing studies have not exploited the correlation among multistage data or the importance of the data at each chemotherapy stage (5)(28)(29)(30). To address this problem, we developed DBNN, a model based on feature sharing and weight assignment that predicts chemotherapy response from US images acquired before and after the first cycle of chemotherapy (NAC pre and NAC1, respectively). Two branches extract features from the NAC pre and NAC1 data, and feature-sharing modules between the branches allow the model to exploit the correlation between the data from the two stages. In addition, a weight assignment module accounts for the different importance of the features from each branch and supplies prior knowledge for accurate classification. As shown in Figure 4, each of the two DBNN branches takes as input a 128×128 breast tumour ROI cropped from the NAC pre or NAC1 images. Each branch contains four convolution blocks with nine convolutional layers in total. Each convolutional layer is followed by a batch normalization layer (31) to speed up convergence and a rectified linear unit (ReLU) activation function (32) to increase the nonlinearity of the network, and each of the four blocks ends with a max-pooling layer (33) that downsamples the feature maps. Finally, DBNN has two fully connected layers for feature weighting, and features are shared between the branches by feature fusion.
The details of DBNN feature sharing are shown in the black dotted box in Figure 5. DBNN consists of four convolutional blocks; the input of each block is the output of the previous block, except for Block 1, whose inputs are the US images from NAC pre and NAC1. Each convolutional layer has 64 kernels in Block 1, 128 in Block 2, 256 in Block 3, and 512 in Block 4, and every kernel is 3×3. The US image from each stage is input into its own branch. The fused feature map then passes through the convolutional, batch normalization, and ReLU layers, is downsampled, and is fed into the subsequent blocks until the convolution operations are complete.
First, the inputs of the network are

$$C_0 = X, \qquad C'_0 = Y,$$

where X denotes the NAC pre input and Y denotes the NAC1 input. Then, C_0 and C'_0 are fed into their respective convolutional layers, where features are extracted by the convolution kernels, generating the feature maps C_1 and C'_1:

$$C_1 = \mathrm{ReLU}\big(\mathrm{BN}(W_1 * C_0 + b_1)\big), \qquad C'_1 = \mathrm{ReLU}\big(\mathrm{BN}(W'_1 * C'_0 + b'_1)\big),$$

where * denotes convolution, W_1 and b_1 (W'_1 and b'_1) are the kernel weights and biases of the first layer of each branch, and BN denotes batch normalization. Similarly, C_j and C'_j represent the feature maps of layer j, j ∈ {2,4,6,9}; C_{j-1} and C'_{j-1} are the inputs of the next layers, C_j and C'_j, respectively. After each convolution block, C_k and C'_k are fed into a max-pooling layer to reduce the size of the feature maps:

$$P_k = \mathrm{MaxPool}(C_k), \qquad P'_k = \mathrm{MaxPool}(C'_k),$$

where C_k and C'_k represent the feature maps of layer k, k ∈ {2,4,7,9}.
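The pooling indices k ∈ {2,4,7,9} determine how the nine convolutional layers are grouped into the four blocks. The short sketch below checks that arithmetic for one branch. It is illustrative only and assumes 'same' padding (3×3 kernels that preserve spatial size), 2×2 max pooling with stride 2, and a single greyscale input channel, none of which is stated in the text; it counts only convolution weights and biases, not batch-normalization parameters.

```python
# Hypothetical layout check for one DBNN branch, assuming 3x3 convolutions with
# padding 1 (spatial size preserved) and 2x2 max pooling with stride 2.
POOL_AFTER = (2, 4, 7, 9)       # layer indices k after which max pooling is applied
CHANNELS = (64, 128, 256, 512)  # kernels per convolutional layer in Blocks 1-4


def block_layout(pool_after=POOL_AFTER, channels=CHANNELS, input_size=128, in_ch=1):
    """Group the nine conv layers into blocks and track channels and spatial size."""
    layout = []
    start, size = 1, input_size
    for block, (end, out_ch) in enumerate(zip(pool_after, channels), start=1):
        layers = list(range(start, end + 1))
        params = 0
        for _ in layers:
            params += in_ch * out_ch * 3 * 3 + out_ch  # 3x3 kernel weights + biases
            in_ch = out_ch
        size //= 2  # 2x2 max pooling halves height and width
        layout.append({"block": block, "layers": layers, "channels": out_ch,
                       "out_size": size, "conv_params": params})
        start = end + 1
    return layout


for row in block_layout():
    print(row)
```

Under these assumptions the blocks contain 2, 2, 3, and 2 layers, and a 128×128 input leaves a 512×8×8 feature map after the fourth pooling layer.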
Unlike methods that fuse the two branches only in the fully connected layer, DBNN shares features between the branches while low-level features are being extracted. As a result, the model can be trained to screen out crucial features, including changes in the lesion area before and after NAC, which inform the prediction of chemotherapy response. As shown in Figure 6, the weight fusion strategy of DBNN is simple; the black dotted box shows the details of the red dotted box. First, the feature vector F(X) from the NAC pre branch and the feature vector F(Y) from the NAC1 branch are multiplied by the weights a=0.2 and b=0.8, respectively, giving the updated feature vectors F(X') and F(Y'). These are then summed to obtain the fused feature vector F(Z):

$$F(X') = a\,F(X), \qquad F(Y') = b\,F(Y), \qquad F(Z) = F(X') + F(Y') = a\,F(X) + b\,F(Y).$$

After the fully connected layer, a dropout strategy (34) with a rate of 0.5 was used to help prevent overfitting during training. The two branches were summed after the fully connected layer with 1024 hidden units, and a softmax function was applied for pCR classification.
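The weighted fusion F(Z) = aF(X) + bF(Y) followed by softmax can be sketched in NumPy as below. This is an illustrative stand-in, not the authors' PyTorch code: the final 1024×2 linear layer (`w`, `bias`) is an assumption about how the fused vector is mapped to two classes, and dropout is omitted because it is only active during training.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def weighted_fusion(f_x, f_y, a=0.2, b=0.8):
    """F(Z) = a*F(X) + b*F(Y): weighted element-wise sum of the branch vectors."""
    return a * f_x + b * f_y


def classify(f_x, f_y, w, bias, a=0.2, b=0.8):
    """Fuse the two 1024-d branch vectors, apply a linear layer, then softmax.

    `w` (1024 x 2) and `bias` (2,) are hypothetical stand-ins for the final layer.
    """
    f_z = weighted_fusion(f_x, f_y, a, b)
    return softmax(f_z @ w + bias)


rng = np.random.default_rng(0)
f_x = rng.normal(size=(4, 1024))  # NAC pre branch output for a batch of 4
f_y = rng.normal(size=(4, 1024))  # NAC1 branch output
w = rng.normal(scale=0.01, size=(1024, 2))
bias = np.zeros(2)
probs = classify(f_x, f_y, w, bias)
print(probs.round(3))
```

Because a + b = 1, the fusion is a convex combination: if both branches produced the same vector, it would pass through unchanged, and the weights only shift emphasis toward the NAC1 features.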
The performance of machine learning algorithms depends strongly on their hyperparameters, and it suffers when these values are poorly chosen (35). Deep learning models in particular rely on good hyperparameter values to converge quickly and perform well. To train and evaluate each model, we used cross entropy (36) as the loss function and standard accuracy, the mean rate of correct predictions across all samples, as the metric. Table 1 shows the hyperparameter settings. The loss curves show no evident overfitting or underfitting (Figure 7).
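For reference, the loss and metric named above can be written in a few lines of NumPy. This sketch works on softmax probabilities; PyTorch's `torch.nn.CrossEntropyLoss` instead takes raw logits and applies log-softmax internally, so the two agree when the probabilities come from a softmax. The example probabilities and labels are made up for illustration.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class; probs has shape (n, n_classes)."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


def accuracy(probs, labels):
    """Fraction of samples whose arg-max class equals the label."""
    return float(np.mean(probs.argmax(axis=1) == labels))


probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])  # 1 = pCR, 0 = non-pCR
print(accuracy(probs, labels))  # 0.75
print(round(cross_entropy(probs, labels), 4))
```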
All experiments were performed on a Dell T640 tower server workstation with two NVIDIA GeForce RTX 2080Ti graphics cards, two Intel Xeon Silver 4110 CPUs, and 64 GB of RAM. DBNN was implemented in Python 3.7 with the PyTorch deep learning framework.

Histopathologic Assessment
A pathologist with more than 20 years of experience in breast pathology assessed the histologic results, and all pathologic results from outside biopsies were reviewed at our institution. Tumour pathologic characteristics were obtained from the histopathologic reports of US-guided core biopsies performed before NAC. The histologic type, grade, and expression of HER2, the oestrogen receptor (ER), the progesterone receptor (PR), and Ki-67 were assessed. Tumours with >1% nuclear staining were considered ER/PR positive. The cut-off for high Ki-67 expression was 30%. Tumours with an immunohistochemical (IHC) HER2 score of 0 or 1+ were considered HER2 negative, and those with a score of 3+ were considered HER2 positive. If the HER2 status was equivocal (IHC score: 2+ or 1+ to 2+), further testing with in situ hybridization (ISH) was required. In our study, pCR was defined as no residual invasive carcinoma in the breast at surgical resection. Molecular subtypes were classified according to the St. Gallen Consensus (38).
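The biomarker rules above can be expressed as small decision functions. This is a hedged sketch: the thresholds follow the text, but treating exactly 30% Ki-67 as "high" and mapping an ISH result to HER2 positive or negative are assumptions, and the function names are hypothetical.

```python
def receptor_positive(percent_nuclear_staining):
    """ER/PR status as stated in the text: positive if >1% of tumour nuclei stain."""
    return percent_nuclear_staining > 1.0


def ki67_high(percent_positive, cutoff=30.0):
    """High Ki-67 expression; whether exactly 30% counts as high is an assumption."""
    return percent_positive >= cutoff


def her2_status(ihc_score, ish_amplified=None):
    """HER2 from the IHC score ('0', '1+', '2+', '3+'); equivocal 2+ cases need ISH."""
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        if ish_amplified is None:
            return "equivocal: ISH required"
        # Assumed mapping: ISH amplification -> positive, otherwise negative.
        return "positive" if ish_amplified else "negative"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")


print(her2_status("2+"), her2_status("2+", ish_amplified=True))
```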

Statistical Analysis
Statistical analysis was performed using IBM SPSS Statistics 22 (Armonk, NY, USA). Clinicopathological characteristics and US findings before and after the first cycle of chemotherapy, including maximum tumour diameter and tumour histologic type, were collected. Continuous variables are described as the range, mean, and standard deviation, and categorical variables as counts with percentages. Independent-samples t-tests, chi-squared tests, or Fisher exact tests were used to identify significant differences between the pCR and non-pCR groups. To evaluate the developed models, we calculated six performance metrics: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. Predictive performance was also assessed with receiver operating characteristic (ROC) curves, and the areas under the curve (AUCs) were compared; the results were then used to select the best model for predicting NAC response from breast US images. P<.05 was considered to indicate a significant difference. The performance results of the model and the other methods were compared with the Mann-Whitney U test, and the 95% CIs for the AUC were estimated with the DeLong method (39)(40)(41). Statistical computations were implemented with SciPy, an open-source Python package for scientific computing. For the prediction of pCR, DBNN was trained on the training set and then validated on the test set.
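As an illustration of how an AUC and its DeLong 95% CI can be computed, the sketch below follows DeLong's structural-components variance with a normal approximation. It is not the authors' code (they report SPSS and SciPy); the function names and the clipping of the interval to [0, 1] are choices made here. The AUC itself equals the Mann-Whitney U statistic of the positive scores divided by m·n.

```python
import math
from statistics import NormalDist

import numpy as np


def _psi(pos, neg):
    """Pairwise kernel: 1 if pos > neg, 0.5 if tied, 0 otherwise; shape (m, n)."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).astype(float) + 0.5 * (diff == 0)


def delong_auc_ci(scores, labels, alpha=0.05):
    """AUC with a DeLong (normal-approximation) confidence interval."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    m, n = len(pos), len(neg)  # DeLong variance needs m >= 2 and n >= 2
    psi = _psi(pos, neg)
    auc = psi.mean()        # equals U / (m * n)
    v10 = psi.mean(axis=1)  # structural component of each positive sample
    v01 = psi.mean(axis=0)  # structural component of each negative sample
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var)
    return auc, (max(0.0, auc - half), min(1.0, auc + half))


auc, ci = delong_auc_ci([0.9, 0.8, 0.4, 0.7, 0.3], [1, 1, 1, 0, 0])
print(round(auc, 4), ci)
```

In recent SciPy versions (1.7 and later), `scipy.stats.mannwhitneyu(pos, neg).statistic` returns U for the first sample, so dividing it by m·n reproduces the same AUC.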
The F1-score conveys the balance between PPV and sensitivity; the closer it is to 1, the better the method performs. It is defined as:

$$F_1 = \frac{2 \times \mathrm{PPV} \times \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}.$$
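All six metrics follow from the four confusion-matrix counts. In the worked example below, the counts (TP=57, FN=18, TN=80, FP=37) are not reported in the paper; they are back-calculated assumptions that are consistent with the reported NAC pre sensitivity (76.00%), specificity (68.38%), and F1-score (0.675) on a test set of 75 pCR and 117 non-pCR samples.

```python
def classification_metrics(tp, fn, tn, fp):
    """Six metrics from a binary confusion matrix (pCR = positive class)."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv, "npv": npv, "f1": f1}


# Assumed counts, back-calculated from the reported NAC pre results.
m = classification_metrics(tp=57, fn=18, tn=80, fp=37)
print(round(m["sensitivity"], 4), round(m["specificity"], 4), round(m["f1"], 3))
# 0.76 0.6838 0.675
```

Algebraically, the F1-score also equals 2TP / (2TP + FP + FN), which shows that it ignores true negatives.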

Patient Characteristics
One hundred and fourteen women comprised the final study group (age range: 26-72 years; mean age: 49.92 years). The median maximum tumour diameter on pretreatment US was 3.82 cm (range: 1.35-8.2 cm). The patient characteristics and tumour sizes in the pCR and non-pCR groups are listed in

Performance Analysis of DBNN Feature Sharing
The number of layers in a CNN affects the prediction and classification performance of the model, so CNNs with different numbers of layers were designed and compared to determine the best layer number for the dual-branch network. The performance of the different layer numbers is shown in the first five rows of Table 5, where X in CNN-X denotes the number of layers in each branch. As the network deepened, the performance of the dual-branch model generally first increased and then decreased. CNN-9 performed best among the models with different numbers of layers, with an accuracy of 81.77%, and it also ranked highest in specificity, PPV, and F1-score. Therefore, the nine-layer CNN was selected as the backbone of the model. Next, we explored the influence of feature sharing and the weight assignment strategy on the model. Among the many methods of feature sharing, the feature element sum and feature concatenation are classic feature fusion methods (42)(43)(44)(45)(46), so we compared these two strategies. The last two rows of Table 5 compare the models with different feature-sharing strategies: CNN-9 FSS uses the feature element sum method, and CNN-9 FSC uses the feature concatenation method. The model performed better with the feature element sum method: its accuracy, sensitivity, NPV, and F1-score were higher than those of the CNN with feature concatenation and of CNN-9 without feature sharing. Therefore, DBNN adopts the feature element sum method for feature sharing.
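The practical difference between the two feature-sharing strategies shows up in tensor shapes. The NumPy sketch below is illustrative only (the function name is hypothetical) and uses channel-first (C, H, W) feature maps: the element sum keeps the channel count, whereas concatenation doubles it, so the next convolution would need twice as many input channels and therefore more parameters.

```python
import numpy as np


def share_features(a, b, mode="sum"):
    """Fuse two (C, H, W) feature maps from the NAC pre and NAC1 branches."""
    if a.shape != b.shape:
        raise ValueError("branch feature maps must have the same shape")
    if mode == "sum":     # FSS: element-wise sum, channel count unchanged
        return a + b
    if mode == "concat":  # FSC: stack along the channel axis, channel count doubles
        return np.concatenate([a, b], axis=0)
    raise ValueError(f"unknown mode: {mode!r}")


a = np.ones((64, 32, 32))
b = np.ones((64, 32, 32))
print(share_features(a, b, "sum").shape, share_features(a, b, "concat").shape)
# (64, 32, 32) (128, 32, 32)
```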

Weight Assignment of DBNN Feature Connection
DBNN is a dual-branch network with two inputs, the NAC pre and NAC1 chemotherapy data, and one output, the predicted probability of the pathological result. The feature maps from the two branches must therefore be connected and then mapped from a high-dimensional vector to a low-dimensional vector to complete the classification task. To explore different feature connection methods, we compared the feature element sum, feature concatenation, and feature weight assignment methods in the dual-branch network (Table 6). CNN-9 FSS_concat denotes the model with feature concatenation, CNN-9 FSS_sum the model with the feature element sum, and CNN-9 FSS (A, B) the model with weighted connection, where A is the weight of the NAC pre branch and B is the weight of the NAC1 branch. As shown in Table 6, the model performed best, with an accuracy of 87.50%, when the weight of the NAC pre branch was 0.2 and that of the NAC1 branch was 0.8. Its F1-score was also higher than those of the other models, possibly because the NAC1 data contribute more to the prediction than the NAC pre data. The last nine rows of Table 6 show that the average accuracy and F1-score are higher when the NAC1 branch is weighted more heavily than the NAC pre branch. Therefore, the weighted connection method was adopted in the model, and in the following experiments CNN-9 FSS (0.2, 0.8) is referred to as DBNN.
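The weight search can be sketched as a small grid over branch weights. This assumes, without confirmation from the text, that A + B = 1 with a step of 0.1, which yields nine (A, B) pairs including (0.2, 0.8); `evaluate` is a hypothetical callback that would train CNN-9 FSS with the given weights and return a score such as accuracy.

```python
def weight_grid(step=0.1):
    """Candidate (A, B) branch weights with A + B = 1 and A = step, ..., 1 - step."""
    n = round(1 / step)
    return [(round(i * step, 10), round(1 - i * step, 10)) for i in range(1, n)]


def select_weights(evaluate, step=0.1):
    """Return the (A, B) pair that maximizes the score returned by `evaluate`.

    `evaluate(a, b)` is hypothetical: it would train the dual-branch model with
    NAC pre weight a and NAC1 weight b and return a score such as accuracy.
    """
    return max(weight_grid(step), key=lambda ab: evaluate(*ab))


print(weight_grid())
```

In practice, `evaluate` would ideally score a validation split kept separate from the final test set, so that the chosen weights do not overfit the test data.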

Results of DBNN Data Augmentation
As noted above, the classes in RJNAC are imbalanced: the amount of non-pCR data was approximately 1.6 times that of pCR data (476 vs. 300 training images per stage), which can affect model performance. Therefore, we explored the impact of different data augmentation strategies on the performance of DBNN, comparing nonaugmented data, geometrically transformed data (47), Mixup data (48), and small amounts of upsampled data. The geometric transformations include rotations, flips, and zooming, which generate new training samples while maintaining realistic tumour shapes. The small-scale upsampling applies geometric transformations to examples from the minority (pCR) class to balance the number of samples in the two categories, addressing the data imbalance manually. As Table 7 shows, the model performed better without data augmentation. First, the model trained on nonaugmented data outperformed the model trained on geometrically transformed data; augmenting both classes increases the absolute gap between them, which may explain this degradation. Mixup augmentation also degraded model performance.
Mixup may also be unsuitable for augmenting medical datasets because it disrupts the relationship between a lesion and its surrounding area, which can lead the model to learn incorrect information. Finally, we upsampled a small number of samples so that the two classes were of equal size. The results on these augmented data were still not as good as those on the nonaugmented data, possibly because DBNN learned redundant features from the duplicated samples, which degraded its performance.
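The three augmentation strategies can be sketched in NumPy as follows. These are generic illustrations, not the authors' implementation: the flip probability, the 90-100% zoom range, the Mixup parameter `alpha=0.2`, and the function names are assumptions, and rotation is omitted. The Mixup function makes the concern above concrete, since every pixel of the result blends two different lesions and their surroundings.

```python
import numpy as np


def random_zoom(image, rng, min_scale=0.9):
    """Crop a random window covering 90-100% of each side and resize it back (nearest)."""
    h, w = image.shape
    s = rng.uniform(min_scale, 1.0)
    ch, cw = max(1, int(h * s)), max(1, int(w * s))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols[None, :]]


def geometric_augment(image, rng):
    """Random horizontal flip followed by a mild random zoom."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    return random_zoom(image, rng)


def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Mixup: the same weight lam blends both the images and the one-hot labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam


def oversample_minority(labels, rng):
    """Indices that repeat random minority-class samples until both classes are equal."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    minority = classes[counts.argmin()]
    extra = rng.choice(np.flatnonzero(labels == minority),
                       size=counts.max() - counts.min(), replace=True)
    return np.concatenate([np.arange(len(labels)), extra])


rng = np.random.default_rng(0)
labels = np.array([1] * 300 + [0] * 476)  # 1 = pCR, 0 = non-pCR (training counts per stage)
idx = oversample_minority(labels, rng)
print(np.bincount(labels[idx]))  # [476 476]
```

In an upsampling scheme like the one described, each duplicated index would typically be passed through `geometric_augment` so that the added samples are not exact copies.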

Comparison With the Single Branch Models
To further validate the potential of DBNN in predicting the efficacy of NAC, we used it to predict the pathological outcome early from the data of each NAC treatment stage in the RJNAC dataset. Comparing the AUC values in the first two rows and the last row of Table 8 shows that a single-branch network trained on single-stage data did not perform as well as the model using multistage data. Among the single-stage models, the one trained on NAC1 data performed slightly better than the one trained on NAC pre data, which supports the use of weight assignment in DBNN. As shown in Table 8 and Figure 8, the areas under the ROC curve for NAC pre (Az_pre), NAC1 (Az_1), and NAC pre + NAC1 (Az_pre+1) were 0.730, 0.739, and 0.939, respectively. The model trained on NAC1 data had higher specificity than the model trained on NAC pre data, whereas the model trained on NAC pre data had higher sensitivity. Az_pre+1 was significantly higher than both Az_pre and Az_1 (P<0.01), whereas Az_pre and Az_1 did not differ significantly (P=0.3244). Because CNN-9 was used to train the single-branch models, we also tested several more sophisticated deep learning models for single-branch classification (Table 9). CNN-9 achieved the highest AUC on NAC pre data (0.730), and its AUC on NAC1 data was very close to the best value (0.739 vs. 0.756). Therefore, we consider CNN-9 a reasonable representative of classical single-branch networks.

Comparison With the Latest Studies
Few studies have predicted NAC response in breast cancer from US images, and the datasets and imaging protocols differ between studies, so direct comparison of results is difficult. Nevertheless, to verify the research value of DBNN, we reproduced the methods of four recent papers according to the technical details described in the articles and applied them to the RJNAC dataset (5,18,19,28). For (5), two identical Inception-ResNet-V2 CNNs forming a Siamese model without fine-tuning were reimplemented to extract generic features, and the difference between the feature vectors was used to train a logistic regression model for prediction. For (18), we reimplemented a two-input CNN in which each input branch consisted of four blocks of 2D convolutional layers, each followed by a ReLU activation function and a max-pooling layer, with a dropout layer after every two convolutional blocks. The two branches were concatenated after a fully connected layer, followed by ReLU, dropout (rate 40%), and a sigmoid function for the final classification; for (19), two dense layers were used to yield the final output. For (28), the multi-input deep learning architecture contained two parallel sub-architectures with layers similar to the single architecture, each consisting of six blocks of convolutional layers followed by a ReLU activation function and a max-pooling layer; the two sub-architectures were concatenated, followed by 50% dropout and a fully connected layer that provided the classification result. Three of these approaches were based on MRI data (18,19,28), and one was based on US data (5). As shown in Figure 9, the area under the ROC curve for DBNN (Az_DBNN) was significantly higher than that of the method of Byra et al. (Az_Byra) (5) (P=0.004).
However, Az_DBNN did not differ significantly from the areas under the ROC curve of the other methods (Table 10). Figure 10 shows the model's predictions on NAC pre and NAC1 images together with the pathology labels and the predicted probabilities.

DISCUSSION
The early prediction of chemotherapy response in patients with breast cancer is crucial for improving and personalizing treatment. In this study, we proposed and validated DBNN, a novel deep learning method based on US images for the early prediction of NAC response in patients with breast cancer. The best prediction performance was obtained with the DBNN model using feature sharing and weight assignment. Note that all the results in Tables 5-10 were obtained on the test set. The highest diagnostic performance was achieved when the US image information from NAC pre and NAC1 was combined, with accuracy, sensitivity, specificity, F1-score, and AUC values of 87.50%, 90.67%, 85.67%, 0.850, and 0.939, respectively.
The DBNN approach proposed in this study for the early prediction of NAC response has several advantages.
First, unlike traditional machine learning methods, which depend on feature engineering and require domain knowledge to build feature extractors, our deep learning approach is automatic and needs no handcrafted features, which limit the performance of such methods. Moreover, by using entire breast tumour images, our model considers not only the tumour itself but also its surrounding tissue; supplementing the US features extracted from the tumour with features computed in the surrounding tissue, such as the peritumoural region, may improve the prediction of pCR from US images (49,50). Second, unlike existing deep learning algorithms, DBNN fuses the features of the two branches while extracting low-level features, which may allow the trained network to screen out important features and produce more accurate early predictions. Third, in contrast to existing methods that predict NAC response from two-stage data, we assume that the data before and after chemotherapy are not equally important. DBNN therefore uses a weight assignment strategy that increases the weight of the post-chemotherapy features, using prior knowledge to guide network training and the NAC response prediction.
It is difficult to compare our results directly with those reported in other studies because of differences in data acquisition techniques, analysis protocols and subject groups. Moreover, few studies have used deep learning for the early prediction of NAC response in breast cancer on the basis of US images. Nevertheless, we can compare our results with those of existing models trained on our datasets. The studies by El Adoui et al. (28), Braman et al. (19) and El Adoui et al. (18) were based on MRI data, whereas the study by Byra et al. (5) was based on US image data. All four methods are two-input CNN architectures that predict the NAC response of breast tumours from follow-up images. In each method, every branch processes its input through a series of convolution-based operations and summarizes it as a set of deep features. The features of the two branches are then fused to generate a final score representing the response probability. However, these methods consider only the late fusion of deep features. Their branches cannot effectively share features at intermediate stages and may even filter out crucial features, such as changes in lesion areas. Therefore, they cannot make full use of the relationship between data from different time points during model training. Table 10 compares the performance of these state-of-the-art methods and our method on seven indices: accuracy, sensitivity, specificity, PPV, NPV, F1-score and AUC. Our method obtained better results on most of these indices. Figure 9 shows the ROC curves, plotted as true positive rate (TPR) against false positive rate (FPR), for the existing methods and our proposed method. The AUC values of all the algorithms exceeded 0.8, and our model obtained the largest AUC (0.939). The area under the ROC curve obtained by DBNN (Az(DBNN)) was significantly higher than that obtained by the method of Byra et al. (5) (Az(Byra); P = 0.004). The model developed by Byra et al. (5) was based on a small dataset with images from 30 patients, whereas our dataset contained images from 114 patients. Our larger dataset allowed us to train the deep learning model from scratch, which matters because a model pretrained on natural images is often not optimal when applied to medical images. Moreover, we shared the data features of the two streams during training and assigned weights to the different stages using prior knowledge to obtain more accurate results.

Although the proposed method improved the accuracy of NAC response prediction, this study has some limitations. First, because the dataset of US images was small and collected from a single centre, the generalization ability of the model needs further improvement. Since no public dataset currently contains US images acquired before and after the first cycle of NAC, our next step will be to collect data from multiple centres to further verify the generalization ability of our model. It is generally accepted that the larger a dataset is, the better deep learning models perform (51,52), and limited datasets are a prevalent challenge in medical image analysis. Second, the breast cancers included in our study were heterogeneous in histopathologic and molecular subtype, which may have affected the pathologic response to NAC and introduced selection bias. Finally, we did not incorporate breast cancer molecular subtype into our method, although it may help predict the response of breast cancer to NAC early. DBNN is still at a preliminary stage of application; therefore, how to extend our method to clinical decision-making warrants in-depth study.
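The seven indices in Table 10 follow standard definitions. For readers reproducing such a comparison, the minimal pure-Python sketch below shows how they are conventionally computed from binary pCR labels and predicted pCR probabilities. It is not the authors' evaluation code: the 0.5 decision threshold, the function names and the toy labels and scores are assumptions for illustration, since the source does not state how the operating threshold was chosen. AUC is computed here with the rank (Mann-Whitney) formulation, which does not depend on any threshold.

```python
def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def rank_auc(y_true, scores):
    """AUC as the probability that a random pCR case outscores a random non-pCR case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def evaluation_indices(y_true, scores, threshold=0.5):
    """Seven indices from binary labels (1 = pCR) and predicted pCR probabilities.

    The threshold of 0.5 is an assumption; it only affects the thresholded indices, not AUC.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # pCR predicted as pCR
    tn = pairs.count((0, 0))  # non-pCR predicted as non-pCR
    fp = pairs.count((0, 1))  # non-pCR predicted as pCR
    fn = pairs.count((1, 0))  # pCR predicted as non-pCR
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),  # equals 2 * PPV * sensitivity / (PPV + sensitivity)
        "AUC": rank_auc(y_true, scores),
    }


# Toy example (hypothetical labels and scores, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
indices = evaluation_indices(y_true, scores)
# accuracy, sensitivity, specificity, PPV, NPV and F1 are all 0.75; AUC is 0.9375
```

F1 is written in its count form, 2TP / (2TP + FP + FN), so that it stays defined even when PPV is undefined.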
In the future, at least two aspects of NAC response prediction models that use data from different treatment stages can be developed further. On the one hand, DBNN could incorporate additional feature-combination methods, such as combining low-level and high-level features through residual cross-branch connections. Moreover, adaptive weight allocation could be adopted as the weight assignment strategy instead of weights set from prior knowledge. On the other hand, the robustness and generalization ability of DBNN need further verification.
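Adaptive weight allocation can be illustrated with a minimal pure-Python sketch. For illustration only, we assume that the weights act on the two branches' output pCR probabilities; DBNN may instead weight features or losses. The fixed weights 0.4/0.6, the parameterization of the NAC1 weight as a = sigmoid(z) (which keeps both weights positive and summing to 1), the learning rate, the step count and the toy branch outputs are all hypothetical. The single weight parameter is learned by gradient descent on the mean binary cross-entropy.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fixed_combination(p_pre, p_1, w_pre=0.4, w_1=0.6):
    """Prior-knowledge weighting with hypothetical fixed weights for the NAC pre and NAC1 branches."""
    return [w_pre * a + w_1 * b for a, b in zip(p_pre, p_1)]


def fit_adaptive_weight(p_pre, p_1, y, lr=1.0, steps=200):
    """Learn the NAC1 weight a = sigmoid(z) (NAC pre weight 1 - a) by gradient descent on mean BCE."""
    z = 0.0  # start from equal weights, a = 0.5
    n = len(y)
    for _ in range(steps):
        a = sigmoid(z)
        grad = 0.0
        for q0, q1, t in zip(p_pre, p_1, y):
            p = a * q1 + (1.0 - a) * q0          # combined pCR probability, stays in (0, 1)
            dl_dp = (p - t) / (p * (1.0 - p))    # derivative of binary cross-entropy w.r.t. p
            grad += dl_dp * (q1 - q0) * a * (1.0 - a)  # chain rule through p(a) and a(z)
        z -= lr * grad / n
    a = sigmoid(z)
    return 1.0 - a, a


# Toy branch outputs (hypothetical): the NAC pre branch is uninformative, the NAC1 branch is not.
p_pre = [0.5, 0.5, 0.5, 0.5]
p_1 = [0.9, 0.8, 0.2, 0.1]
y = [1, 1, 0, 0]
w_pre, w_1 = fit_adaptive_weight(p_pre, p_1, y)
```

Because the toy NAC1 branch separates pCR from non-pCR cases while the pre-treatment branch outputs 0.5 for every case, the learned NAC1 weight rises above its starting value of 0.5. In a full model such a weight would be trained jointly with the branch parameters rather than fitted afterwards.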
In conclusion, our study proposes DBNN, a novel dual-branch convolutional neural network based on feature sharing and weight assignment, to predict the efficacy of NAC for breast cancer using greyscale US images. These two mechanisms are the main advantages of DBNN. Feature sharing allows the model to account for the correlations between data from different stages of NAC during training, and weight assignment, which incorporates prior knowledge, emphasizes the importance of data from different NAC treatment stages. The results show that DBNN has the potential to enable the early prediction of pCR, and it achieved good predictive performance when applied to NAC pre and NAC1 data. However, a further large-scale study with an independent external validation dataset is needed before this approach can be used for clinical decision-making; if validated, DBNN may become an important monitoring tool for the early prediction of NAC response in patients with breast cancer.
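To make the contrast between late fusion and feature sharing concrete, the following NumPy sketch runs two toy branches side by side. It is not the DBNN architecture: the dense-plus-ReLU stages standing in for convolutional blocks, the additive sharing rule controlled by the hypothetical parameter `share`, the branch weights 0.4/0.6 and all toy dimensions are assumptions. Setting `share=0.0` reproduces late fusion, where the NAC1 branch never sees NAC pre features before the final combination.

```python
import numpy as np


def conv_stage(h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a convolutional block: linear map followed by ReLU."""
    return np.maximum(0.0, w @ h)


def run_branches(x_pre, x_1, w_pre, w_1, share=0.0):
    """Run the NAC pre and NAC1 branches stage by stage.

    share == 0.0 gives late fusion: neither branch sees the other before the head.
    share > 0.0 adds a fraction of each branch's features to the other after every stage
    (a hypothetical feature-sharing rule, not DBNN's published design).
    """
    h_pre, h_1 = x_pre, x_1
    for wp, w1 in zip(w_pre, w_1):
        h_pre, h_1 = conv_stage(h_pre, wp), conv_stage(h_1, w1)
        # Tuple assignment uses the pre-sharing features on both right-hand sides.
        h_pre, h_1 = h_pre + share * h_1, h_1 + share * h_pre
    return h_pre, h_1


def pcr_probability(h_pre, h_1, head_pre, head_1, weight_pre=0.4, weight_1=0.6):
    """Weight the two branch scores (hypothetical prior weights) and map them to a probability."""
    logit = weight_pre * float(head_pre @ h_pre) + weight_1 * float(head_1 @ h_1)
    return 1.0 / (1.0 + np.exp(-logit))


# Toy setup: positive weights and inputs keep every ReLU active, so input changes propagate.
rng = np.random.default_rng(0)
dim, n_stages = 8, 3
w_pre = [rng.uniform(0.1, 1.0, size=(dim, dim)) / dim for _ in range(n_stages)]
w_1 = [rng.uniform(0.1, 1.0, size=(dim, dim)) / dim for _ in range(n_stages)]
head_pre, head_1 = rng.normal(size=dim), rng.normal(size=dim)
x_pre, x_1 = rng.uniform(0.5, 1.5, size=dim), rng.uniform(0.5, 1.5, size=dim)

# Change only the NAC pre input and inspect the NAC1 branch features.
_, h1_late = run_branches(x_pre, x_1, w_pre, w_1, share=0.0)
_, h1_late_perturbed = run_branches(2 * x_pre, x_1, w_pre, w_1, share=0.0)
_, h1_shared = run_branches(x_pre, x_1, w_pre, w_1, share=0.5)
_, h1_shared_perturbed = run_branches(2 * x_pre, x_1, w_pre, w_1, share=0.5)

p = pcr_probability(*run_branches(x_pre, x_1, w_pre, w_1, share=0.5), head_pre, head_1)
```

With late fusion the NAC1 features are identical whether or not the NAC pre input changes, whereas with sharing they differ. This is the cross-stage dependence that feature sharing is meant to expose to the network during training.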