AUTHOR=Tang Jun, Fang Yu, Xu Zhe

TITLE=Establishment of prognostic models of adrenocortical carcinoma using machine learning and big data

JOURNAL=Frontiers in Surgery

VOLUME=Volume 9 - 2022

YEAR=2023

URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2022.966307

DOI=10.3389/fsurg.2022.966307

ISSN=2296-875X

ABSTRACT=Background: Adrenocortical carcinoma (ACC) is a rare malignant tumor that is often fatal within a short time, so it is important to identify high-risk patients so that clinicians can adopt more aggressive treatment regimens. Machine learning has advantages in processing complex data. To date, no study has used machine learning algorithms and big data to construct prognostic models for patients with ACC.
Methods: We downloaded clinical data of patients with ACC from the Surveillance, Epidemiology, and End Results (SEER) database. These records were then screened according to inclusion and exclusion criteria. The remaining records were subjected to univariate survival analysis to select candidate variables associated with outcome. We chose back propagation artificial neural network (BP-ANN), random forest (RF), support vector machine (SVM) and Naive Bayes classifier (NBC) as candidate algorithms. The included cases were split into training and testing sets at a ratio of 8:2, and 10-fold cross-validation repeated 10 times was performed. The area under the receiver operating characteristic curve (AUROC) was used as the measure of model performance.
Results: The calculated 1-, 3-, 5- and 10-year overall survival rates were 62.3%, 42.0%, 34.9% and 26.1%, respectively. The final number of patients was 825. In the training set, the AUCs of BP-ANN, RF, SVM and NBC for predicting 1-year survival status were 0.921, 0.885, 0.865 and 0.854, respectively; those for predicting 3-year survival status were 0.859, 0.865, 0.837 and 0.831; those for 5-year survival status were 0.888, 0.872, 0.852 and 0.841. In the testing set, the AUCs of these four models for 1-year status were 0.899, 0.875, 0.886 and 0.862; those for 3-year status were 0.871, 0.858, 0.853 and 0.869; those for 5-year status were 0.841, 0.783, 0.836 and 0.867. The results of 10-fold cross-validation repeated 10 times indicated that the mean 1-, 3- and 5-year AUROCs of BP-ANN were 0.890, 0.847 and 0.854, which were higher than those of the other classifiers (P<0.008).
Conclusion: The model combining BP-ANN with big data can accurately predict the survival status of patients with ACC and has potential for clinical use.
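
The evaluation protocol named in the Methods (an 8:2 hold-out split, 10-fold cross-validation repeated 10 times, and AUROC as the metric) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The data are synthetic, `centroid_scorer` is a hypothetical stand-in model rather than any of the four classifiers in the abstract, and the folds are unstratified and drawn from the full synthetic sample, which is an assumption because the abstract does not state those details.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independent shuffles of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


def centroid_scorer(X, y, train):
    """Hypothetical stand-in model: project each row onto the difference of class means."""
    dims = range(len(X[0]))
    mu = {c: [mean(X[i][d] for i in train if y[i] == c) for d in dims] for c in (0, 1)}
    w = [mu[1][d] - mu[0][d] for d in dims]
    return lambda row: sum(wd * xd for wd, xd in zip(w, row))


# Synthetic two-feature data (assumption): only the first feature carries signal.
rng = random.Random(42)
n = 200
y = [i % 2 for i in range(n)]
X = [[rng.gauss(1.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# 8:2 hold-out split: fit on 80% of cases, report AUROC on the remaining 20%.
order = list(range(n))
rng.shuffle(order)
cut = int(0.8 * n)
train_idx, test_idx = order[:cut], order[cut:]
score = centroid_scorer(X, y, train_idx)
holdout_auc = auroc([y[i] for i in test_idx], [score(X[i]) for i in test_idx])

# 10-fold cross-validation repeated 10 times: 100 AUROC values, then their mean.
cv_aucs = []
for tr, te in repeated_kfold(n, k=10, repeats=10, seed=1):
    score = centroid_scorer(X, y, tr)
    cv_aucs.append(auroc([y[i] for i in te], [score(X[i]) for i in te]))
mean_cv_auc = mean(cv_aucs)

print(f"hold-out AUROC: {holdout_auc:.3f}")
print(f"mean repeated-CV AUROC: {mean_cv_auc:.3f} over {len(cv_aucs)} folds")
```

Repeating the 10-fold procedure with fresh shuffles yields 100 AUROC values per model, which is what makes a comparison of mean AUROCs between classifiers possible; a single hold-out split gives only one value per model.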