AUTHOR=Jia Pingping, Zhao Qianqian, Wu Xiaoxiao, Shen Fangqi, Sun Kai, Wang Xiaolin TITLE=Identification of cachexia in lung cancer patients with an ensemble learning approach JOURNAL=Frontiers in Nutrition VOLUME=Volume 11 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/nutrition/articles/10.3389/fnut.2024.1380949 DOI=10.3389/fnut.2024.1380949 ISSN=2296-861X ABSTRACT=Objective: Nutritional intervention prior to the occurrence of cachexia will greatly improve the survival rate of lung cancer patients. This study aimed to establish an ensemble learning model, based on anthropometric and blood indicators and without body weight loss information, to identify the risk of cachexia so that nutritional support can be administered earlier and cachexia prevented in lung cancer patients. Methods: This multicenter study included 4712 patients with lung cancer. The least absolute shrinkage and selection operator (LASSO) method was used to select the key variables. The features excluded weight loss information, and the cohort data were randomly divided into a training set (70%) and a test set (30%). The training set was used to select the optimal model among 18 candidate machine learning models and to verify model performance. The 18 models were evaluated for predicting the occurrence of cachexia, with performance measured by area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC). Results: Among 4712 patients, 1392 (29.5%) were diagnosed with cachexia based on the framework of Fearon et al. A 17-variable Gradient Boosting Classifier (GBC) model, including BMI, feeding situation, tumor stage, neutrophil-to-lymphocyte ratio (NLR), and some gastrointestinal symptoms, was selected among the 18 machine learning models. The GBC model showed good performance for predicting cachexia in the training set (AUC = 0.854, accuracy = 0.819, precision = 0.771, recall = 0.574, F1 score = 0.658, MCC = 0.549, and kappa = 0.538).
These metrics were also confirmed in the test set (AUC = 0.859, accuracy = 0.818, precision = 0.801, recall = 0.550, F1 score = 0.652, MCC = 0.552, and kappa = 0.535). The learning curve, decision boundary, precision-recall (PR) curve, receiver operating characteristic (ROC) curve, classification report, and confusion matrix for the test set demonstrated good performance. The feature importance diagram showed the contribution of each feature to the model. Conclusions: The GBC model established in this study could facilitate the identification of cancer cachexia in lung cancer patients without weight loss information, which would guide early nutritional intervention to decrease the occurrence
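
The reported metrics can be cross-checked against one another, since the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that check, not part of the study's own analysis: it uses only the figures quoted in the abstract, and the function and variable names are hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 exactly as reported in the abstract.
reported = {
    "training": {"precision": 0.771, "recall": 0.574, "f1": 0.658},
    "test": {"precision": 0.801, "recall": 0.550, "f1": 0.652},
}

# Recompute F1 from the reported precision and recall, rounded to the
# three decimals used in the abstract.
recomputed_f1 = {
    split: round(f1_from_precision_recall(m["precision"], m["recall"]), 3)
    for split, m in reported.items()
}

# Cachexia prevalence in the cohort: 1392 of 4712 patients, in percent.
prevalence_pct = round(100 * 1392 / 4712, 1)

for split, m in reported.items():
    print(split, "reported F1:", m["f1"], "recomputed F1:", recomputed_f1[split])
print("prevalence (%):", prevalence_pct)
```

Under these assumptions the recomputed values agree with the reported F1 scores and with the stated 29.5% prevalence at the precision given in the abstract.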