Deep Learning-Based Prediction Model for Breast Cancer Recurrence Using Adjuvant Breast Cancer Cohort in Tertiary Cancer Center Registry

Several prognosis prediction models have been developed for breast cancer (BC) patients with curative surgery, but there is still an unmet need to precisely determine BC prognosis for individual BC patients in real time. This is an analysis of retrospectively collected data from the adjuvant BC registry at Samsung Medical Center between January 2000 and December 2016. The initial data set contained 325 clinical data elements: baseline characteristics with demographics, clinical and pathologic information, and follow-up clinical information including laboratory and imaging data during surveillance. Weibull Time To Event Recurrent Neural Network (WTTE-RNN) by Martinsson was implemented for machine learning. We searched for the optimal window size as time-stamped inputs. To develop the prediction model, data from 13,117 patients were split into training (60%), validation (20%), and test (20%) sets. The median follow-up duration was 4.7 years and the median number of visits was 8.4. We identified 32 features related to BC recurrence and considered them in further analyses. Performance was assessed using Harrell's C-index and the area under the curve (AUC) at the 2-, 5-, and 7-year points. After 200 training epochs with a batch size of 100, the C-index reached 0.92 for the training data set and 0.89 for the validation and test data sets. The AUC values were 0.90 at the 2-year point, 0.91 at the 5-year point, and 0.91 at the 7-year point. The deep learning-based final model outperformed three other machine learning-based models. In terms of pathologic characteristics, the median absolute error (MAE) and weighted mean absolute error (wMAE) were as low as 3.5%. This BC prognosis model, which determines the probability of BC recurrence in real time, was developed using information from the time of BC diagnosis and the follow-up period in an RNN machine learning model.

Keywords: breast cancer, surveillance, machine learning, recurrence model, real time prediction, adjuvant cohort

INTRODUCTION
Breast cancer (BC) is the most common cancer affecting women worldwide and the most frequent cause of cancer death in women (1,2). Recent advances in treatment strategies have improved BC-related mortality and morbidity; however, almost 30% of BC patients show recurrence in the follow-up. Therefore, to improve BC outcomes, it is necessary to focus on research such as improving screening methods for early detection of recurrence according to risk stratification, identifying new biomarkers, and developing new innovative treatment strategies.
There is an urgent unmet need to identify innovative methods to determine the prognosis of individual patients. Traditionally, clinicopathologic characteristics such as tumor size, axillary nodal status, histologic and nuclear grade, hormone receptors [estrogen receptor (ER) and progesterone receptor (PR)], and human epidermal growth factor receptor 2 (HER2) status have been used to identify risk groups and to predict patient prognoses (3,4). In addition to clinicopathologic characteristics, multigene signature panels offer an additional benefit in predicting patient prognoses (5,6).
Several models for predicting the survival of individual BC patients have been proposed. Adjuvant! Online and PREDICT® are online tools that predict the durations of overall survival (OS) and disease-free survival (DFS) based on clinicopathologic factors (7-9), and CancerMath® shows the cancer-related mortality and life expectancy of BC patients (10). Survival rates predicted by those tools are used to determine the benefit of adjuvant chemotherapy after surgery or as reference data for shared decisions with patients about multiple treatments and surveillance (11). Most prediction models predict OS or DFS after BC diagnosis or curative surgery. However, they do not reflect newly developed comorbidities and test results, which may affect BC-specific recurrence or death over time during surveillance (12,13).
Using recent advances in various machine learning algorithms (14,15), some researchers have worked to develop models that can consider a large amount of complex data, and many efforts are being made to more accurately predict the survival of individual BC patients. The attention-based multi-NMF DNN (AMND) model, based on a deep neural network, was proposed to predict BC survival from the gene expression profiles and clinical data of 1,489 patients (16). The area under the curve (AUC) value of the AMND model was 87.04%. The rule-based trees random forest model (TRF) was developed for the prediction of BC survival with 900 patients (17). The classification performance of this method showed an AUC of 93%. In addition, breast cancer recurrence prediction based on SVM (BCRSVM) was developed to predict BC recurrence within 5 years after BC surgery with 679 patients. This model achieved an AUC of 85% using seven time-independent variables identified as the most informative for predicting recurrence (18). Despite these successes, there is still no predictive tool for individual survival available to determine appropriate follow-up periods and test methods for individual patients who have completed curative surgery and adjuvant treatment (16,17). Therefore, this study developed a recurrence prediction model for individual BC patients using a machine learning method. This model was developed using BC-related clinicopathologic factors at the time of curative surgery and consecutive clinical factors identified during the BC surveillance period.

Study Population
This is a retrospective data analysis from the BC registry composed of BC patients who received curative surgery followed by adjuvant treatment including chemotherapy, radiotherapy, endocrine therapy, and targeted therapy at Samsung Medical Center between January 2000 and December 2016. Patients who received neoadjuvant chemotherapy before surgery, were diagnosed with ductal carcinoma in situ, were male or foreign nationals, or had a history of BC surgery at another hospital were excluded. Among the remaining 13,370 patients in this registry, we also excluded 253 patients with any of the following conditions: (1) restricted access to electronic medical record (EMR) (n = 1), (2) double primary cancer (n = 127), (3) no follow-up after surgery (n = 98), or (4) presence of distant metastases (n = 27) (Figure 1). Therefore, we analyzed the data of 13,117 patients. This study was reviewed and approved by the Institutional Review Board (IRB) of Samsung Medical Center, Seoul, Korea (IRB No. 2018-06-137), with an informed consent waiver, due to the use of retrospective clinical data.

Measurements
Detailed information on surgery, adjuvant chemotherapy, radiotherapy, endocrine therapy, and targeted therapy was obtained from the EMR. The pathologic stage was based on the criteria of the American Joint Committee on Cancer, 7th Edition (4). Two experienced pathologists reviewed and determined the primary tumor characteristics based on size, axillary nodal status, and receptor status (ER, PR, and HER2) by immunohistochemical (IHC) staining. ER positivity and PR positivity were defined as an Allred score of 3-8 based on IHC staining with antibodies against ER (Immunotech, France) and PR (Novocastra, UK), respectively. HER2 status was evaluated using the appropriate antibody (Dako, CA) and/or silver in situ hybridization (SISH). HER2 grades 0 and 1+ indicated a negative result, while grade 3+ indicated a positive result. For results of 2+, amplification of HER2 was confirmed by SISH. Triple negative BC was defined as BC with negative ER and PR expression and a lack of HER2 overexpression. In terms of radiologic tests, the categories of mammography and breast sonography were reported according to BI-RADS® (Breast Imaging Reporting and Data System) (19), which is a risk assessment and quality assurance tool developed by the American College of Radiology. In this study, recurrence was defined as the first detected event of local and/or distant BC recurrence.

Data Pre-processing
The initial data set contained 325 clinical data elements: baseline characteristics with patient demographics, clinical information, and laboratory test results at the time of BC diagnosis; pathologic information including tumor size, nodal status, histologic characteristics, and IHC information for ER, PR, HER2, Ki-67, cytokeratin 5/6 (CK5/6), and epidermal growth factor receptor (EGFR); and follow-up clinical information including laboratory and imaging data during surveillance (Supplementary Table 1).
Characteristic values including the subtype and stage were transformed into nominal or ordinal numeric values. For continuous variables, log transformation was used to deal with skewed data as needed, and Z-score normalization was applied. To reduce the number of discrete intervals for a continuous attribute, data binning divided continuous features (Ki-67) into a pre-specified number of categories (25% or 10% units), thereby making the data discrete. Categorical variables were one-hot encoded for the data analysis. For missing data, we used the average method for the data at the first time point and the last observation carried forward (LOCF) method for the data at a later time point.
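The pre-processing steps above can be sketched in a few small functions. This is a minimal illustration using only the Python standard library, not the pipeline used in the study: the function names are hypothetical, `log1p` is an assumed choice of log transform (it tolerates zero values), and the bin width is a parameter because the text mentions both 25% and 10% units.

```python
import math
from statistics import mean, pstdev


def log_zscore(values):
    """Log-transform a skewed, non-negative variable, then Z-score normalize it.

    Assumption: log1p is used so that zero values are allowed.
    The values must not all be identical (the standard deviation would be 0).
    """
    logged = [math.log1p(v) for v in values]
    mu, sigma = mean(logged), pstdev(logged)
    return [(x - mu) / sigma for x in logged]


def bin_percentage(value, width=10):
    """Map a percentage such as Ki-67 to the index of its `width`-point bin."""
    last_bin = 100 // width - 1
    return min(int(value // width), last_bin)


def one_hot(value, categories):
    """One-hot encode a categorical value against a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def impute_series(series, first_visit_mean):
    """Fill missing values (None) in one patient's chronological series.

    The first time point uses the cohort average; later time points use
    the last observation carried forward (LOCF).
    """
    filled = []
    for i, v in enumerate(series):
        if v is None:
            v = first_visit_mean if i == 0 else filled[-1]
        filled.append(v)
    return filled
```

For example, `impute_series([None, 5.0, None], 4.0)` fills the first gap with the cohort mean and the second with the preceding observation.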

Feature Selection
Potential independent variables selected by univariate analyses were considered in a time-dependent Cox regression model. A backward selection procedure applying the Akaike information criterion (AIC) (20) was used to select the final multivariable model. The backward stepwise selection procedure began with a model that included potential independent variables, then, the least significant variable was removed, and the model was run again based on AIC, in a stepwise manner until there were no variables left to remove. All analyses were performed using the R v3.6 software, and the significance level was set at 0.05. A few variables not significant at the 0.05 level were manually selected from the clinical point of view for the final multivariable model.
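The backward elimination loop can be illustrated generically. The sketch below is not the R code used in the study: `aic` is a hypothetical callable that fits a model on a feature list and returns its AIC (in practice it would wrap a time-dependent Cox fit), and the rule shown removes, at each step, the variable whose removal lowers the AIC the most.

```python
def backward_select(features, aic):
    """Backward stepwise selection driven by AIC.

    features: list of candidate variable names.
    aic: hypothetical callable(list_of_features) -> float; lower is better.
    Stops when no single removal lowers the AIC any further.
    """
    current = list(features)
    best = aic(current)
    while len(current) > 1:
        # Score every model obtained by dropping exactly one variable.
        candidates = [
            (aic([f for f in current if f != drop]), drop) for drop in current
        ]
        score, drop = min(candidates)
        if score >= best:
            break  # no removal improves the criterion
        current.remove(drop)
        best = score
    return current


def toy_aic(selected):
    """Toy criterion for illustration only: penalize model size and
    heavily penalize the absence of two 'informative' variables."""
    informative = {"T stage", "N stage"}
    return 2 * len(selected) + 100 * len(informative - set(selected))
```

With the toy criterion, starting from `["T stage", "N stage", "noise1", "noise2"]` the loop drops the two noise variables and then stops, because removing either remaining variable would raise the criterion.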

Deep Learning-Based Survival Algorithm
Before constructing the deep learning models, we randomly created 60/20/20 mutually exclusive sets for training, validation, and testing while preserving the same proportion of recurrence events in all three sets.
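A stratified 60/20/20 split of this kind can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study's code: the function name, the fixed seed, and the per-class rounding rule are all hypothetical choices.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split indices into mutually exclusive train/validation/test lists.

    Each class (recurrence vs. no recurrence) is shuffled and cut separately,
    so the three sets keep approximately the same event proportion.
    """
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut1 = round(fractions[0] * len(idx))
        cut2 = cut1 + round(fractions[1] * len(idx))
        train.extend(idx[:cut1])
        valid.extend(idx[cut1:cut2])
        test.extend(idx[cut2:])
    return train, valid, test
```

Splitting per class rather than over the whole cohort matters here because recurrence is a minority event; a plain random split could leave the validation or test set with a noticeably different event rate.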
Weibull Time To Event Recurrent Neural Network (WTTE-RNN) by Martinsson was implemented as an open-source Python module (https://github.com/ragulpr/wtte-rnn) (21). The deep survival model took data about each patient's time-independent features (age at operation, molecular tests results, hormone receptor stage, pathologic stage, etc.) and time-dependent features (i.e., lab test results, mammography, etc.) as input. The system represented those inputs as matrix A of size m × n × k, where m is the total number of follow-ups in patient records, n is the window size of the follow-ups, and k is the total number of features. We did a grid search to find the optimal window size as time-stamped inputs. The network contained 32 cells at the first hidden layer and 20 cells at the second layer. We used hyperbolic tangent activation layers after the recurrent layers with gated recurrent units that take the time domain into account. Adam was used as the optimizer with an initial learning rate of 0.001, which was reduced by a factor of 10 when the model stopped improving after iterations. The model was trained with the batch size set to 100. The dropout rate was set to 0.25. The network structure was implemented in Python, using Keras with a Tensorflow backend (Python 3.5, Keras 2.1.2, Tensorflow 1.4.0).
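In a WTTE-RNN, the network outputs the two Weibull parameters at each time step and is trained by maximizing a censored log-likelihood. The sketch below shows that objective for the continuous-time Weibull case in plain Python; whether the study used the continuous or the discrete variant is not stated, so the continuous form is an assumption, and the names `alpha` (scale), `beta` (shape), and `u` (event indicator) are illustrative.

```python
import math


def weibull_loglik(t, u, alpha, beta):
    """Censored log-likelihood of one record under Weibull(alpha, beta).

    t: time to recurrence, or to last follow-up if censored (t > 0)
    u: 1 if recurrence was observed, 0 if the record is censored
    An observed event contributes log-hazard + log-survival;
    a censored record contributes log-survival only.
    """
    cumulative_hazard = (t / alpha) ** beta
    log_hazard = math.log(beta / alpha) + (beta - 1) * math.log(t / alpha)
    return u * log_hazard - cumulative_hazard


def weibull_loss(records, alpha, beta):
    """Mean negative log-likelihood over (t, u) pairs; lower is better."""
    total = sum(weibull_loglik(t, u, alpha, beta) for t, u in records)
    return -total / len(records)
```

Because censored patients still contribute through the survival term, records without recurrence are used during training rather than discarded.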
To compare our model with existing models, we also developed logistic regression, random forest, and gradient boosting machine learning models because they are the methods most typically used in medical applications. The optimal configuration for the hyperparameters of each machine learning model was set by testing a wide range of parameters in a grid search.

Performance Evaluation
We used Harrell's concordance index (C-index), as implemented in the lifelines package in Python, to measure the concordance between the predicted recurrence time and the actual recurrence time (22,23).
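To make the metric concrete, the following is a simplified pure-Python sketch of Harrell's C-index. It is not the library implementation (the lifelines function `concordance_index` treats ties and censoring more carefully); the function and argument names here are illustrative.

```python
def harrell_c_index(times, predicted, events):
    """Simplified Harrell's C-index.

    times: observed time to recurrence or to last follow-up
    predicted: predicted time to recurrence (larger means later)
    events: 1 if recurrence was observed, 0 if censored
    A pair (i, j) is comparable when i had an observed event and
    times[i] < times[j]; it is concordant when the prediction agrees.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier member of a pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering.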
The AUC was used to assess the 2-, 5-, and 7-year recurrence predictions. To evaluate the model performance based on the main BC features, we stratified patients by their pathologic T stage, pathologic N stage, hormone receptor and HER2 (ER/PR/HER2) status, and EGFR and CK5/6 status. The predicted recurrence values calculated by our model were compared with the actual recurrence in each group over 2, 5, and 7 years. Model specificity and accuracy were assessed using the median absolute error (MAE), mean absolute error, weighted mean absolute error (wMAE), and maximum error.
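The group-level error metrics can be sketched as below. The text does not define the weighting used for the wMAE, so weighting each group by its number of patients is an assumption, and the function name and dictionary keys are hypothetical.

```python
from statistics import mean, median


def group_errors(groups):
    """Summarize predicted vs. actual recurrence proportions across strata.

    groups: list of (n_patients, predicted_rate, actual_rate), one per
    stratum (for example, one per pathologic T stage).
    Assumption: the weighted mean uses group size as the weight.
    """
    errors = [abs(pred - actual) for _, pred, actual in groups]
    total = sum(n for n, _, _ in groups)
    weighted = sum(n * e for (n, _, _), e in zip(groups, errors)) / total
    return {
        "median_abs_error": median(errors),
        "mean_abs_error": mean(errors),
        "weighted_mean_abs_error": weighted,
        "max_error": max(errors),
    }
```

Weighting by group size keeps a small stratum with a large error (for example, a rare stage) from dominating the summary, while the maximum error still exposes it.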
Data sets for which the follow-up period was shorter than the prediction period and no recurrence occurred were excluded from the performance evaluation.
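The exclusion rule and the fixed-horizon AUC can be illustrated together. In this sketch the function names are hypothetical, `follow_up_years` is taken to mean time to recurrence or to last follow-up, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library.

```python
def horizon_labels(follow_up_years, recurred, horizon):
    """Return (index, label) pairs that are evaluable at `horizon` years.

    label 1: recurrence observed within the horizon
    label 0: known to be recurrence-free through the horizon
    Patients censored before the horizon without recurrence are excluded.
    """
    evaluable = []
    for i, (t, e) in enumerate(zip(follow_up_years, recurred)):
        if e and t <= horizon:
            evaluable.append((i, 1))
        elif t >= horizon:
            evaluable.append((i, 0))
    return evaluable


def auc(labels, scores):
    """Probability that a positive case is scored above a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A patient followed for 1.5 years without recurrence cannot be labeled at the 2-year point, so that record is dropped for the 2-year AUC only; it neither counts as a recurrence nor as a confirmed non-recurrence.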

Clinical Characteristics
The baseline characteristics of the patients are summarized in Table 1. The median follow-up was 4.7 years (interquartile range: 3.0-7.7 years). Of the 13,117 patients in the study population, BC recurrence occurred in 1,214 (9.2%) patients during the follow-up period. The median age at BC curative surgery was 48 years (interquartile range, 43-55), and patients who did not experience BC recurrence were slightly older than those who did (median age: 48 vs. 46, p < 0.001). The proportion of BC subtypes differed between patients with and without BC recurrence. Hormone receptor (HR)+ (defined as ER+ and/or PR+), HER2- BCs were more frequently observed in patients without recurrence (64.7 vs. 50.8%, p < 0.001), whereas ER-, HER2- BCs were less frequent in patients without recurrence (12.5 vs. 22.2%, p < 0.001). Pathologic stage was also associated with BC recurrence: higher T (T3 and T4) and N (N2 and N3) stages were more frequently observed in patients with BC recurrence (p < 0.001 for all).
The number of follow-up visits was measured from the date of surgery to the last follow-up, including all-cause mortality. Overall, the mean number of follow-up visits within 1 year

Model Training and Performance
To learn from data with varying measurement periods, we needed to fix the window size (n) during the training stage. Longer look-back periods generally provide more information, but a grid search showed that a look-back period of 12 months was the optimal window size for our training data set.
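A fixed window size turns each patient's variable-length follow-up history into equally shaped inputs. The sketch below builds one look-back window per follow-up, giving the m × n × k layout described in the methods (m follow-ups, window size n, k features). It assumes the window is counted in time steps and that early windows are left-padded with zeros; both are illustrative assumptions, as is the function name.

```python
def look_back_windows(visits, window):
    """Build one fixed-length look-back window per follow-up visit.

    visits: chronologically ordered feature vectors (each of length k)
    window: number of time steps to look back (n)
    Returns a list of m windows, each a list of n feature vectors.
    Assumption: windows that reach before the first visit are zero-padded.
    """
    k = len(visits[0])
    padding = [[0.0] * k for _ in range(window - 1)]
    padded = padding + [list(v) for v in visits]
    return [padded[i:i + window] for i in range(len(visits))]
```

For example, three visits with two features each and a window of 2 yield three windows of shape 2 × 2, the first of which starts with a zero vector.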
The AUC and C-index were used to evaluate the performance. The C-index eventually reached 0.92 for the training data set and 0.89 for the validation and test data sets. The AUC value was 0.90 at the 2-year point, 0.91 at the 5-year point, and 0.91 at the 7-year point (Figure 3A). We also compared our model with the performances of three other machine learning prediction models (Figures 3B-D). The logistic regression model produced AUC values in the range of 0.69-0.72, and the random forest and gradient boosting methods, which are the most popular ensemble models, showed similar AUC values in the range of 0.80-0.83. Only the deep learning-based final model exceeded an AUC of 0.90, outperforming the existing machine learning-based models.
We also evaluated the model from a clinical point of view with pathologic T stage, pathologic N stage, subtypes according to ER/PR and HER2 status, EGFR status, and CK5/6 status by comparing the predicted recurrence proportion with the actual recurrence proportion (Figure 4). The MAE and wMAE of each group were as low as 3.5%. The model errors for pathologic T stage and N stage features were similar to each other but differed from those for the other pathologic features. The subtypes had similar error values of around 2.5%. The wMAE showed only small differences across the prediction times (2, 5, and 7 years) (Supplementary Tables 2-6).

DISCUSSION
In this study, we developed an individually conditional BC recurrence prediction model using machine learning and an adjuvant BC cohort in a tertiary cancer hospital. We used baseline patient clinical characteristics, pathologic characteristics after curative surgery, and the results of follow-up tests, including laboratory tests, mammography, and breast sonography.
Machine learning is currently used for a wide range of applications in cancer research (24). Imaging diagnosis and pathologic diagnosis of BCs have been broadly supported by machine learning algorithms. A large-scale retrospective analysis indicated that artificial intelligence (AI) algorithms can detect BC on mammography better than radiologists can (25). In addition, AI has improved breast MRI interpretation (26) and predictions of the response to neoadjuvant chemotherapy (27). In terms of pathologic diagnosis, AI has helped pathologists to precisely diagnose BC using digital image analysis (28). Moreover, AI can interpret comprehensive genetic information to predict tumor site of origin (29).
Previous machine learning models of BC prognosis prediction were developed using baseline clinical and pathologic information (30)(31)(32). Most of those studies used pathologic information and additional molecular information, such as the intrinsic subtype at BC diagnosis. However, molecular information is not given in routine clinical practice.
Current surveillance studies suggest that BC recurrence is influenced by clinical information during the follow-up period as well as at the time of diagnosis. For example, alcohol consumption (33) and obesity at postmenopausal status (34) are well-known risk factors for BC recurrence. In terms of laboratory tests, changes in CA-15-3 during follow-up have traditionally been used to detect BC recurrence (35,36), and C-reactive protein has also been considered as a predictive biomarker for BC recurrence (37). Moreover, surveillance guidelines recommend annual mammography in BC patients after curative BC surgery (38). However, previous machine learning studies of BC prognosis prediction models have not considered follow-up exam data such as imaging and laboratory tests.
We have used the same BC surveillance guidelines since early 2010 (39,40). Those guidelines recommend taking a careful history and performing a physical examination every 6-12 months, including regular mammography 6 months after the completion of definitive radiation therapy. In addition, the use of complete blood counts, chemistry panels, and tumor markers (CEA, CA-15-3) is not recommended for routine follow-up in an otherwise asymptomatic patient with no specific findings on clinical examination according to those guidelines. Understanding of the nature and biology of BC has improved, and it is now known that the timing and pattern of BC recurrence differ for patients with different BC subtypes (41). Moreover, the current concept of oligometastasis in BC, defined as low-volume metastatic disease with a few small metastatic lesions, considers BC patients with oligometastasis to be a distinct subgroup with a more favorable long-term prognosis than other patients with metastatic BC (42). This suggests that an early diagnosis of BC recurrence, rather than waiting for patients to show symptoms, might confer a survival benefit. Therefore, improved screening programs that incorporate the biology of individual BC patients and a method to precisely predict the risk of recurrence for individual patients are urgently needed.
For this study, we utilized an RNN model that weights features at BC diagnosis, treatment, and follow-up. In RNNs, the output of a hidden unit at the current time step is fed back into the hidden unit so that it forms part of the input for the subsequent time steps. This allows RNNs to fit and make predictions from sequences of events ordered chronologically. In addition, we applied the Weibull distribution instead of the well-known Cox regression model to our deep learning framework. The Weibull distribution allows more flexibility than other survival models because the associated hazard rate is not constant with respect to time, which helps to estimate the length of the hazard during the cancer recurrence period when using follow-up data.
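The flexibility of the Weibull hazard can be seen directly from its formula, h(t) = (β/α)(t/α)^(β-1), where α is the scale and β is the shape parameter. The short sketch below is illustrative only (the function name and the parameter values are not from the study): with β < 1 the hazard decreases over time, with β = 1 it is constant (the exponential case), and with β > 1 it increases.

```python
def weibull_hazard(t, alpha, beta):
    """Weibull hazard rate at time t for scale alpha and shape beta."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)


# Illustrative shapes: decreasing, constant, and increasing hazard over time.
for beta in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, alpha=2.0, beta=beta)
    late = weibull_hazard(5.0, alpha=2.0, beta=beta)
    trend = "constant" if early == late else ("increasing" if late > early else "decreasing")
    print(f"beta={beta}: hazard is {trend}")
```

Because the network predicts both parameters at each follow-up, the estimated hazard shape can change as new laboratory and imaging results arrive.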
In this model, the T and N stages and lymphovascular invasion at the time of curative surgery affected BC prognosis as stationary variables. In terms of IHC, neither ER, PR, nor HER2 affected BC prognosis. These three IHC components are used to categorize BC when choosing endocrine therapy, chemotherapy, and targeted therapy (38). Although these factors have been understood as important prognostic and predictive biomarkers for BC recurrence, proper treatment according to the BC subtype would neutralize their prognostic effects (43).
Our machine learning prognostic model uses baseline, treatment, and follow-up variables. In this analysis, we focused on laboratory tests during the follow-up period. An increase in the white blood cell count, hemoglobin, and total protein had a protective effect against BC recurrence, whereas elevated levels of serum glucose, absolute neutrophil count, and CA-15-3 increased the risk of BC recurrence (Supplementary Table 1). A previous study to find the relationship between BC prognosis and laboratory tests indicated that hemoglobin, alkaline phosphatase, and prothrombin time were associated with BC prognosis (44). In other cancer types, the association between the lymphocyte-monocyte ratio and cancer recurrence was studied (45,46). However, those previous studies used the results from perioperative blood tests, not serial follow-up data.
In terms of follow-up imaging tests, the results of mammography and ultrasonography were, as expected, associated with BC recurrence. Current studies indicate that adding supplementary ultrasonography to mammography would help to detect BC recurrence but would also increase false-positive findings; therefore, guidelines for BC surveillance do not recommend it (47,48).
The tests we used in our model were routinely performed at every follow-up visit. Thus, our machine learning model for BC prognosis was made with maximal use of the laboratory test results from current surveillance practices without requiring other laboratory work, such as intrinsic subtyping. Therefore, our BC prognosis model could fit into routine clinical practice better than previous machine learning models. Moreover, we can adapt this prognosis model into our EMRs using a website and thereby acquire information about BC recurrence in real time. This model could thus present the recurrence risk at each follow-up point using all available laboratory and imaging test results.
The results of this study should be interpreted in light of some limitations. Because our study was limited to a single institution, our results might not be generalizable to other cancer patients in other settings. Therefore, the findings from our study should be validated using samples from other institutions to confirm generalizability. Nonetheless, our model is the first machine learning-based BC prognosis model developed using clinical information at both BC diagnosis and follow-up. Moreover, our model produced high AUC scores that remained consistent for several years after the completion of BC treatment.
In conclusion, we used an RNN machine learning model and data from an adjuvant BC cohort in a tertiary cancer institute to develop a BC prognosis model that considers information from the time of BC diagnosis and during the follow-up period. This model can rapidly and precisely predict the probability of BC recurrence. A retrospective validation study using another adjuvant BC cohort and a prospective validation study are warranted.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
This study was reviewed and approved by the Institutional Review Board (IRB) of Samsung Medical Center, Seoul, Korea (IRB No. 2018-06-137). This study was performed in accordance with the Declaration of Helsinki. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.