Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Objectives: To develop and validate a deep learning (DL)-based primary tumor biopsy signature for preoperatively predicting axillary lymph node (ALN) metastasis in patients with early breast cancer (EBC) and clinically negative ALN. Methods: A total of 1,058 EBC patients with pathologically confirmed ALN status were enrolled from May 2010 to August 2020. A DL core-needle biopsy (DL-CNB) model was built on the attention-based multiple instance learning (AMIL) framework to predict ALN status from DL features extracted from the cancer areas of digitized whole-slide images (WSIs) of breast CNB specimens annotated by two pathologists. Accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and areas under the ROC curve (AUCs) were used to evaluate the model. Results: The best-performing DL-CNB model, with VGG16_BN as the feature extractor, achieved an AUC of 0.816 (95% confidence interval (CI): 0.758, 0.865) for predicting ALN metastasis in the independent test cohort. The model incorporating clinical data, called DL-CNB+C, yielded the best accuracy of 0.831 (95% CI: 0.775, 0.878), especially for patients younger than 50 years (AUC: 0.918, 95% CI: 0.825, 0.971). Interpretation of the DL-CNB model showed that the signatures most predictive of ALN metastasis were characterized by nucleus features including density (p = 0.015), circumference (p = 0.009), circularity (p = 0.010), and orientation (p = 0.012). Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides for preoperatively predicting the metastatic status of ALN in patients with EBC.


INTRODUCTION
Breast cancer (BC) has become the greatest threat to women's health worldwide (1). Clinically, identifying axillary lymph node (ALN) metastasis is important for evaluating prognosis and guiding treatment in BC patients (2). Sentinel lymph node biopsy (SLNB) has gradually replaced ALN dissection (ALND) for determining ALN status, especially in early BC (EBC) patients with clinically negative lymph nodes. Although SLNB is less invasive than ALND, it can still cause complications such as lymphedema, axillary seroma, paraesthesia, and impaired shoulder function (3,4). Moreover, SLNB remains controversial because its performance depends on the availability of radionuclide tracers and on the surgeon's experience (5,6). SLNB could therefore be avoided if reliable methods were available to predict ALN status preoperatively in EBC patients.
Several studies have attempted to predict ALN status from clinicopathological data and genetic testing scores (7,8). However, these methods are limited by their relatively poor predictive value and the high cost of genetic testing. Deep learning (DL) can perform high-throughput feature extraction on medical images and analyze the correlation between primary tumor features and ALN metastasis. In a previous study, deep features extracted from conventional ultrasound and shear wave elastography (SWE) were used to predict ALN metastasis, yielding an area under the curve (AUC) of 0.796 in the test set (9). Nevertheless, SWE has not been integrated into routine clinical breast examination in many hospitals. Another recent study showed that a DL model based on diffusion-weighted magnetic resonance imaging (DWI-MRI) of 172 patients achieved an AUC of 0.852 for preoperative prediction of ALN metastasis (10), but its small sample size may limit the representativeness of the results.
DL has enabled rapid advances in computational pathology (11,12). For example, DL methods have been applied to segment and classify glomeruli with different stains and various pathologic changes, enabling automatic analysis of renal biopsies (13,14); DL-based automatic segmentation and classification of colonoscopy tissue has shown promise for colorectal cancer detection (15,16); and the analysis of gastric carcinoma and precancerous conditions can also benefit from DL (17,18). For ALN metastasis detection, DL algorithms applied to digitized lymph node pathology images have been reported to achieve better diagnostic performance than pathologists (19,20). In particular, algorithm assistance significantly increased the sensitivity of detecting ALN micrometastases (21). Beyond diagnosis, several previous studies indicated that deep features from whole-slide images (WSIs) of postoperative tumor samples may improve the prediction of lymph node metastasis in a variety of cancers (20,22). So far, no study has predicted ALN metastasis preoperatively from WSIs of primary BC samples. In this study, we investigated a clinical data set of EBC patients who underwent preoperative core-needle biopsy (CNB) to determine whether DL models based on primary tumor biopsy slides could improve the prediction of ALN metastasis.

Patients
With the approval of the Institutional Ethics Committee of Beijing Chaoyang Hospital, affiliated with Capital Medical University, we retrospectively analyzed data from EBC patients with clinically negative ALN enrolled from May 2010 to August 2020. Written informed consent was obtained from all patients and their families.
The detailed inclusion criteria were as follows: 1) primary invasive BC pathologically confirmed by CNB; 2) breast surgery with SLNB or ALND; 3) complete baseline clinicopathological data, including age, tumor size, tumor type, ER/PR/HER-2 status, and the number of metastatic ALNs; 4) complete concordance of molecular status between CNB and excision specimens; 5) no history of preoperative radiotherapy or chemotherapy; and 6) adequate biopsy material, with three or more cores per patient.
The exclusion criteria were as follows: 1) ALN positive on physical examination or imaging; 2) missing postoperative pathology information; 3) missing paraffin (wax) blocks or hematoxylin and eosin (H&E) slides; and 4) low-quality H&E slides or WSIs. The patient recruitment workflow is shown in Figure 1.

Deep Learning Model Development
To avoid inter-observer heterogeneity, all available tumor regions in each CNB slide were examined and annotated by two independent, experienced pathologists blinded to all patient-related information. Each WSI was classified as positive (N(+)) or negative (N0) by the proposed DL CNB (DL-CNB) model, which was constructed with the attention-based multiple-instance learning (MIL) approach (23). In MIL, each training sample is called a bag and consists of multiple instances (24-26); here, each instance corresponds to an image patch of 256 × 256 pixels. Unlike the fully supervised setting, in which every sample has a label, MIL provides labels only for bags, and the goal is to predict the bag label by jointly considering all of its instances. The whole algorithm pipeline comprised the following five steps:
(1) Training data preparation (Figure 2A). For each raw WSI, a set of non-overlapping square patches was first cropped from the selected tumor regions. Each WSI could then be represented as a bag of N randomly selected patches. To increase the number of training samples, M bags were built for each WSI. All M bags were labeled positive if the slide came from a case with ALN metastasis, and negative otherwise. The clinical information of the slide could also be added to all M bags to provide additional predictive information; in this setting, the model was called DL-CNB+C.
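A minimal sketch of step (1), assuming patches are referred to by identifiers rather than pixel arrays: it draws M bags of N patches from one WSI and copies the slide label (and, optionally, a clinical vector) onto every bag. The helper name `build_bags` and the choice to sample without replacement inside a bag (falling back to replacement when a slide has fewer than N patches) are illustrative assumptions, not details reported by the study.

```python
import numpy as np

def build_bags(patch_ids, slide_label, n_per_bag, n_bags, clinical=None, seed=0):
    """Hypothetical helper: build n_bags bags of n_per_bag patches from one WSI.

    patch_ids   : identifiers of the non-overlapping tumor patches of the slide
    slide_label : 1 for ALN metastasis (N(+)), 0 for N0
    clinical    : optional clinical feature vector shared by all bags (DL-CNB+C)
    """
    rng = np.random.default_rng(seed)
    patch_ids = np.asarray(patch_ids)
    # Assumption: sample without replacement unless the slide has too few patches.
    replace = len(patch_ids) < n_per_bag
    bags = []
    for _ in range(n_bags):
        chosen = rng.choice(patch_ids, size=n_per_bag, replace=replace)
        bag = {"instances": chosen, "label": slide_label}  # bag inherits slide label
        if clinical is not None:
            bag["clinical"] = np.asarray(clinical, dtype=float)
        bags.append(bag)
    return bags
```

For example, `build_bags(range(40), 1, n_per_bag=10, n_bags=5)` returns five positive bags of ten distinct patch identifiers each; in a real pipeline the identifiers would index cropped 256 × 256 patches.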
(2) Feature extraction (left part of Figure 2B). N feature vectors were extracted from the N image instances in each bag using a convolutional neural network (CNN). AlexNet (27), VGG16 (28) with batch normalization (VGG16_BN), ResNet50 (29), DenseNet121 (30), and Inception-v3 (31) were compared to find the best feature extractor. At this stage, the clinical data were also preprocessed. Numerical clinical variables were standardized by removing the mean and scaling to unit variance, eliminating the effects of differing ranges and scales. Because there is no natural ordinal relationship between the values of categorical variables, these were encoded as one-hot vectors so that all values are represented equally.
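The clinical preprocessing described above can be sketched with numpy alone: z-score standardization for numerical variables and one-hot encoding for categorical ones. Fitting the mean and standard deviation on the training cohort only, and the example category list, are assumptions for illustration; the study does not state how the statistics were fitted.

```python
import numpy as np

def fit_standardizer(train_values):
    """Return mean and std of a numeric column (assumed: fit on training data only)."""
    x = np.asarray(train_values, dtype=float)
    mean, std = x.mean(), x.std()
    return mean, (std if std > 0 else 1.0)  # guard against constant columns

def standardize(values, mean, std):
    # Remove the mean and scale to unit variance.
    return (np.asarray(values, dtype=float) - mean) / std

def one_hot(values, categories):
    """Encode each categorical value as a one-hot row over a fixed category list."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out
```

As a usage sketch, ages could be standardized with `standardize(ages, *fit_standardizer(train_ages))`, HER-2 status encoded with a hypothetical category list such as `["neg", "pos"]`, and the columns joined with `np.hstack` into one clinical vector per slide. One-hot rows keep categories equidistant, which matches the stated rationale of not imposing an order.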
(3) MIL (right part of Figure 2B). The N extracted feature vectors were first processed by max-pooling (32-34) and reshaping and then passed through a two-layer fully connected (FC) network, yielding N weight factors for the instances in the bag. These weights were multiplied with the original feature vectors (23) to adaptively adjust the contribution of each instance. Finally, the weighted image features and the clinical features were fused by concatenation; because the image features have far more dimensions than the clinical features, the clinical features were replicated 10 times before fusion. The fused features were fed into the classifier, and the cross-entropy loss was computed from the outputs and the ground-truth labels.
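A forward pass for one bag can be sketched in numpy under several assumptions. The attention scorer follows the tanh-based attention pooling of Ilse et al., the weighted instance features are summed into a single bag embedding (the text only says the weights were multiplied with the features), and the max-pooling and reshaping step is omitted by taking already-pooled feature vectors as input. The parameter names `V`, `w`, `W_cls`, and `b_cls` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mil_forward(H, clinical, params, n_repeat=10):
    """Forward pass for one bag (illustrative sketch, not the study's code).

    H        : (N, D) instance feature vectors from the CNN extractor
    clinical : (C,) preprocessed clinical vector
    params   : V (L, D), w (L,), W_cls (2, D + n_repeat * C), b_cls (2,)
    """
    # Two-layer FC attention scorer: tanh hidden layer, then one score per instance.
    scores = np.tanh(H @ params["V"].T) @ params["w"]   # (N,)
    weights = softmax(scores)                           # (N,), sums to 1
    # Weight each instance feature and sum into one bag embedding (assumption).
    bag_embedding = weights @ H                         # (D,)
    # Replicate clinical features n_repeat times, then fuse by concatenation.
    fused = np.concatenate([bag_embedding, np.tile(clinical, n_repeat)])
    probs = softmax(params["W_cls"] @ fused + params["b_cls"])  # (2,): [N0, N(+)]
    return probs, weights

def cross_entropy(probs, label):
    # Clip to avoid log(0); result is always >= 0.
    return float(-np.log(np.clip(probs[label], 1e-12, 1.0)))
```

The returned `weights` are the per-instance attention values that the heat-map visualization later reuses; in a trained model all parameters would be learned by backpropagating the cross-entropy loss.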
(4) Model training and testing. The WSIs were randomly divided into a training cohort and an independent test cohort at a ratio of 4:1, and 25% of the training cohort was randomly selected as the validation cohort. Model parameters were updated with the Adam optimizer (learning rate 1e−4, weight decay 1e−3 for regularization), and a cosine annealing warm restarts strategy was used to adjust the learning rate during training (35).
In the testing phase, the ALN status of a slide was predicted by aggregating the model outputs of all bags from that slide (Figure 2C).
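The data split and learning-rate schedule of step (4) can be sketched as follows. The cycle length `t0`, multiplier `t_mult`, and minimum rate `lr_min` are hypothetical, since the text gives only the initial learning rate; the schedule uses the standard SGDR formula, lr = lr_min + (lr_max − lr_min)(1 + cos(π·T_cur/T_i))/2, evaluated per epoch.

```python
import math
import numpy as np

def split_slides(slide_ids, seed=0):
    """Random 4:1 train/test split, then 25% of the training part held out for validation."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(slide_ids))
    n_test = round(len(ids) / 5)
    test, train_full = ids[:n_test], ids[n_test:]
    n_val = round(len(train_full) * 0.25)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

def cosine_warm_restart_lr(epoch, lr_max=1e-4, lr_min=0.0, t0=10, t_mult=2):
    """SGDR-style learning rate at an integer epoch (t0, t_mult, lr_min are hypothetical)."""
    t_i, t_cur = t0, epoch
    # Locate the current cycle; each restart multiplies the cycle length by t_mult.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

In PyTorch, the same setup would typically be `torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-3)` combined with `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`; the numpy version above only makes the schedule explicit. Note that this sketch splits at the slide level, as the text describes.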
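The text does not specify how bag outputs are aggregated into a slide-level prediction. A minimal sketch, assuming either averaging of the bags' positive-class probabilities or majority voting, with a hypothetical decision threshold of 0.5:

```python
import numpy as np

def predict_slide(bag_probs, threshold=0.5, rule="mean"):
    """Aggregate the N(+) probabilities of all bags from one WSI (rules are assumptions).

    rule="mean": average the bag probabilities.
    rule="vote": fraction of bags whose probability reaches the threshold.
    Returns (slide score, predicted label: 1 = N(+), 0 = N0).
    """
    p = np.asarray(bag_probs, dtype=float)
    if rule == "mean":
        score = float(p.mean())
    elif rule == "vote":
        score = float((p >= threshold).mean())
    else:
        raise ValueError(f"unknown rule: {rule}")
    return score, int(score >= threshold)
```

Averaging keeps a continuous slide score, which is what ROC analysis needs; voting is less sensitive to a single overconfident bag.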

Visualization of Salient Regions From Deep Learning Core-Needle Biopsy Model
We visualized the regions most strongly associated with metastatic status. After attention-based MIL pooling, a weight is obtained for each patch, and the corresponding feature maps are weighted accordingly in the subsequent FC layers to predict ALN status. Using these attention weights, we created a heat map to visualize the salient regions in each WSI.
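A heat map of this kind can be sketched by placing each patch's attention weight at its position on the slide's patch grid. Because a patch may appear in several bags, this sketch averages its weights across bags and then min-max normalizes for display; both choices, and the grid-coordinate input format, are assumptions. Since attention weights are normalized within each bag, averaging across bags is only an approximation.

```python
import numpy as np

def attention_heatmap(coords, weights, grid_shape):
    """Build a patch-level heat map from attention weights (illustrative sketch).

    coords     : list of (row, col) patch positions on the WSI patch grid
    weights    : attention weight for each listed patch (patches may repeat across bags)
    grid_shape : (n_rows, n_cols) of the patch grid
    Returns an array in [0, 1]; patches that were never scored are NaN.
    """
    total = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    for (r, c), w in zip(coords, weights):
        total[r, c] += w
        count[r, c] += 1
    heat = np.full(grid_shape, np.nan)
    seen = count > 0
    heat[seen] = total[seen] / count[seen]  # mean weight per patch
    lo, hi = heat[seen].min(), heat[seen].max()
    if hi > lo:
        heat[seen] = (heat[seen] - lo) / (hi - lo)  # min-max normalize for display
    else:
        heat[seen] = 1.0
    return heat
```

The resulting array can be upsampled by the patch size and overlaid on a WSI thumbnail with any plotting library.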

Interpretability of Deep Learning Core-Needle Biopsy Model With Nucleus Features
To study the contribution of different nuclear morphological characteristics to the prediction of lymph node metastasis, we examined the interpretability of the DL-CNB model with nucleus features (36,37). Multiple specially designed nucleus features were first extracted for each WSI, and together these features formed a training bag. The DL-CNB model was then re-trained on the constructed feature bags. The weight of each feature (instance) was obtained from the attention-based MIL pooling, which yielded the contribution of each feature. The specific process is described in Figure 3.
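The quantization of nucleus features into histogram matrices can be sketched as below, assuming per-nucleus measurements are already available for each patch. The layout (one normalized histogram per patch, stacked into a patches × bins matrix per feature), the bin edges, and the helper names are hypothetical; the circularity formula 4πA/P² is a common definition (1.0 for a perfect circle) and is an assumption here, since the study does not define it. Patch-level features such as density would be handled analogously.

```python
import numpy as np

def circularity(area, perimeter):
    """Common definition 4*pi*A / P**2 (assumed; 1.0 for a perfect circle)."""
    return 4.0 * np.pi * np.asarray(area) / np.asarray(perimeter) ** 2

def feature_bag(per_patch_features, bin_edges):
    """Quantize nucleus features into histograms and stack them into matrices.

    per_patch_features : dict feature_name -> list over patches of 1-D arrays
                         (one value per segmented nucleus in that patch)
    bin_edges          : dict feature_name -> histogram bin edges for that feature
    Returns dict feature_name -> (n_patches, n_bins) matrix of normalized
    histograms; each matrix is one instance of the WSI's bag.
    """
    bag = {}
    for name, patches in per_patch_features.items():
        rows = []
        for values in patches:
            counts, _ = np.histogram(values, bins=bin_edges[name])
            total = counts.sum()
            rows.append(counts / total if total > 0 else counts.astype(float))
        bag[name] = np.vstack(rows)
    return bag
```

Each resulting matrix plays the role of one instance, so the re-trained attention pooling assigns one weight per feature type rather than per image patch.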

Figure 3. … For each patch, nucleus segmentation was performed (a weakly supervised segmentation framework was applied to obtain the nuclei), multiple nucleus morphometric features were defined (such as major axis, minor axis, area, orientation, circumference, density, circularity, and rectangularity, denoted $f_1, f_2, f_3, \ldots, f_n$), and the corresponding n feature parameters were extracted. (C) All n kinds of feature parameters from a WSI were quantized into n distribution histograms and saved to n feature matrices ($m_1, m_2, m_3, \ldots, m_n$). (D) The matrices from a WSI were treated as the instances of a bag and served as the input of the DL-CNB model; the re-trained DL-CNB model generated a score for each feature (instance) in the bag, representing the weight of that feature in the pathological diagnosis.

Statistical Analysis
Logistic regression was used to build the clinical-data-only model for predicting ALN status. Clinical characteristics of the N0 and N(+) groups were compared using the Mann-Whitney U test and the chi-square test. The AUCs of different methods were compared using the DeLong test (38). Accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), and negative predictive value (NPV) were also used to evaluate model performance. All statistical tests were two-sided, and a p-value less than 0.05 was considered statistically significant. All statistical analyses were performed with MedCalc software (V 19.6.1; 2020 MedCalc Software bvba, Mariakerke, Belgium), Python 3.7, and SPSS 24.0 (IBM, Armonk, NY, USA).
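The threshold-based metrics and the AUC can be sketched directly from their definitions. The AUC here uses its equivalence to the Mann-Whitney statistic (the probability that a random positive case scores higher than a random negative one, with ties counted as one half). The DeLong comparison of correlated AUCs and the confidence intervals reported in the study are not implemented, and the sketch assumes both classes and both predicted labels are present (otherwise some ratios divide by zero).

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """ACC, SENS, SPEC, PPV, NPV from binary labels (1 = N(+), 0 = N0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "ACC": (tp + tn) / len(y_true),
        "SENS": tp / (tp + fn),
        "SPEC": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }

def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()  # all positive-negative pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

The pairwise comparison is O(n_pos × n_neg), which is fine for cohorts of a few hundred slides.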

Clinical Characteristics
A total of 1,058 patients with EBC were enrolled for analysis. Of these, 957 (90.5%) had invasive ductal carcinoma and 101 (9.5%) had invasive lobular carcinoma. After all WSIs were randomly divided, with N0 as the negative reference standard and all other cases as positive, there were 840 patients in the training cohort and 218 patients in the independent test cohort. The mean patient age was 57.6 years (range, 26-90 years) in the training and validation sets and 56.7 years (range, 22-87 years) in the test set. The mean tumor size on ultrasound was 2.23 cm (range, 0.5-4.5 cm). A total of 556 patients (52.6%) had T1 tumors, and 502 patients (47.4%) had T2 tumors. According to the results of SLNB or ALND, positive lymph nodes were found in 403 patients; of these, 210 (52.1%) had one or two positive lymph nodes (N+(1-2)) and 193 (47.9%) had three or more positive lymph nodes (N+(≥3)). As shown in Table 1, there were no significant differences in the detailed characteristics between the training and independent test cohorts (all p > 0.05).

Convolutional Neural Network Model Selection
The detailed results are summarized in Supplementary Table 1. Based on the overall analysis, the VGG16_BN model pre-trained on ImageNet (39) provided the best performance in both the validation cohort and the independent test cohort (AUC: 0.808 and 0.816, respectively), compared with AlexNet (AUC: 0.764, 0.780), ResNet50 (AUC: 0.644, 0.607), DenseNet121 (AUC: 0.714, 0.739), and Inception-v3 (AUC: 0.753, 0.762). Considering the other metrics, VGG16_BN also achieved the best ACC, SPEC, and PPV in the independent test cohort. The basic block of VGG16_BN consists of a convolution layer, a batch normalization layer, and a rectified linear unit (ReLU), which serves as the activation function and provides non-linearity; max-pooling layers are inserted between basic blocks for down-sampling, and an adaptive average pooling layer at the end of the network produces features of a fixed size. The details of VGG16_BN are described in Supplementary Table 2.
In the independent test cohort, the DL-CNB+C model achieved the highest AUC of 0.831, which was higher than that of DL-CNB alone (AUC: 0.816, p = 0.453; not statistically significant) and that of classification by clinical data alone (AUC: 0.613, p < 0.0001). The ACC, SENS, and NPV of DL-CNB+C were also better than those of the other methods. The detailed statistical results are summarized in Table 2, and the corresponding ROC curves are shown in Figure 4. We further divided N(+) into low metastatic potential (N+(1-2)) and high metastatic potential (N+(≥3)) according to the number of metastatic ALNs. Using N0 as the negative reference standard, the combined model discriminated between N0 and N+(1-2) with an AUC of 0.878 and between N0 and N+(≥3) with an AUC of 0.838.
The detailed statistical results are summarized in Supplementary Tables 3, 4, and the corresponding ROCs are shown in Supplementary Figures 1, 2.

Interpretability of Deep Learning Core-Needle Biopsy Model
To investigate the interpretability of DL-CNB, we conducted two studies to identify the factors associated with ALN status prediction. In the first study, we used the attention-based MIL pooling to find the regions that contributed most to the prediction; the resulting heat map is shown in Figure 6A. In the second study, we designed and extracted multiple nucleus features for each WSI. The weights of the different features were then obtained with the same attention-based MIL pooling used in DL-CNB, highlighting the nucleus features most relevant to the ALN status prediction of each WSI. Compared with the N0 group, WSIs in the N(+) group had higher nuclear density (p = 0.015) and orientation (p = 0.012) but lower circumference (p = 0.009), circularity (p = 0.010), and area (p = 0.024) (Figures 6B, C). There were no significant differences between N0 and N(+) in the other nucleus features, including major axis (p = 0.083), minor axis (p = 0.065), and rectangularity (p = 0.149).

DISCUSSION
In most previous studies, DL signatures of ALN metastasis were based on medical images such as ultrasound, CT, and MRI (10,40,41). However, many patients have already undergone CNB at the time of imaging, and reactive changes in the tumor, such as the needle tract, may reduce the predictive accuracy of imaging information. This study focused on preoperative CNB WSIs; CNB also plays an important role in BC management and is increasingly performed in clinical practice. Preoperative CNB provides not only the histopathological diagnosis of BC but also the molecular status, including ER/PR/HER-2 status, which is associated with ALN metastasis (42). In addition, the morphological features of tumor cells can be visualized on CNB WSIs. Therefore, primary tumor biopsy WSIs, as a complementary imaging modality, have the potential to predict ALN metastasis. To the best of our knowledge, this is the first study to apply DL-based histopathological features extracted from primary tumor WSIs to ALN prediction.
Here, the best-performing DL-CNB model yielded satisfactory predictions on the test set, with an AUC of 0.816, a SENS of 81.0%, and a SPEC of 70.9%, outperforming the model based on clinical data alone. Unlike other combined models incorporating clinical data (7,9), adding clinical data to our model only slightly improved performance, with the DL-CNB+C model reaching an ACC of 0.831, which indicates that our results derived mainly from the DL-CNB model. In the subgroup analysis stratified by age, the DL-CNB+C model achieved an AUC of 0.918 for patients younger than 50 years, suggesting that age is an important factor in predicting ALN status. Regarding the number of metastatic ALNs, the DL-CNB+C model discriminated well between N0 and N+(1-2) and between N0 and N+(≥3). However, discrimination between N+(1-2) and N+(≥3) was poor. This is consistent with the study of Zheng et al. (9), who also reported poor performance in distinguishing N+(1-2) from N+(≥3) with a DL radiomics model. Further exploration of ALN staging prediction is needed.
Indeed, computer-assisted histopathological analysis can provide more practical and objective output (43). For example, BC molecular subtypes (44) and the Oncotype DX risk score (45) can be predicted directly from H&E slides. On the one hand, our DL model can provide important information for risk stratification and axillary staging, which may allow axillary surgery to be avoided and reduce complications and hospitalization costs. On the other hand, our results support the development of artificial intelligence-based algorithms that could reduce pathologists' workload. Similar approaches may be applied to the pathology of other organs.
In our study, we are the first to quantitatively assess the role of nuclear disorder in predicting ALN metastasis in BC. Our finding is consistent with several recent studies demonstrating the strong predictive value of nuclear disorder for patient survival (46,47). Interestingly, the top predictive signatures distinguishing N0 from N(+) were characterized by nucleus features including density, circumference, circularity, and orientation. We found that WSIs of N(+) cases had higher nuclear density and polarity but lower circularity. This is plausible because, in tumors with ALN metastasis, tumor cells become poorly differentiated as a result of rapid cell growth, which encourages the nuclei to form highly clustered and consistently metastatic patterns. Our results suggest that nuanced patterns of the density and orientation of tumor cell nuclei are important determinants of ALN metastasis.
There are some limitations in our study. First, the selection of regions of interest within each CNB slide required pathologist guidance; future studies will explore more advanced methods for automatic segmentation of tumor regions. Second, this is a retrospective study, and prospective validation of our model in a large multicenter cohort of EBC patients is necessary to assess the clinical applicability of the biomarker. Third, recent evidence indicated that a set of features related to tumor-infiltrating lymphocytes (TILs) was associated with positive LNs in bladder cancer (22). However, because there were few TILs on breast CNB slides, we selected only regions with sufficient tumor cells, rather than whole slides, for identifying salient regions. Finally, we used only H&E-stained images of CNB samples; the utility of immunohistochemically stained images remains to be established.

CONCLUSION
In brief, we demonstrated that a novel DL-based biomarker on primary tumor CNB slides predicted ALN metastasis preoperatively in EBC patients with clinically negative ALN, especially in younger patients. Based on widely available H&E-stained histopathology slides, our method could help avoid unnecessary axillary surgery, thereby contributing to precision oncology.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Copyright © 2021 Xu, Zhu, Tang, Wang, Zhang, Li, Jiang, Shi, Liu and Jin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.