AUTHOR=Avram Mihaela-Flavia , Lupa Nicolae , Koukoulas Dimitrios , Lazăr Daniela-Cornelia , Mariș Mihaela-Ioana , Murariu Marius-Sorin , Olariu Sorin TITLE=Random forests algorithm using basic medical data for predicting the presence of colonic polyps JOURNAL=Frontiers in Surgery VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2025.1523684 DOI=10.3389/fsurg.2025.1523684 ISSN=2296-875X ABSTRACT=Background: Colorectal cancer is considered to be triggered by the malignant transformation of colorectal polyps. Early diagnosis and excision of colorectal polyps have been found to lower the mortality and morbidity associated with colorectal cancer. Objective: The aim of this study is to offer a predictive model for the presence of colorectal polyps based on the Random Forests machine learning algorithm, using basic patient information and common laboratory test results. Materials and methods: 164 patients were included in the study. The following data were collected: sex, residence, age, diabetes mellitus, body mass index, fasting blood glucose levels, hemoglobin, platelets, total, LDL, and HDL cholesterol, triglycerides, serum glutamic-oxaloacetic transaminase, chronic gastritis, and presence of colonic polyps at colonoscopy. 80% of patients were assigned to the training set used to build the Random Forests model, and 20% to the test set. External validation was performed on data from 42 patients. The performance of the Random Forests model was compared with that of a generalized linear model (GLM) and a support vector machine (SVM) built and tested on the same datasets. Results: The Random Forests prediction model gave an AUC of 0.820 on the test set. The top five variables in order of importance were: body mass index, platelets, hemoglobin, triglycerides, and glutamic-oxaloacetic transaminase. For external validation, the AUC was 0.79. The GLM achieved an AUC of 0.788 in internal validation and 0.65 in external validation.
For the SVM, the AUC was 0.785 for internal validation and 0.685 for the external validation dataset. Conclusions: A random forest prediction model was developed using patients' demographic data, medical history, and common blood test results. This model predicts the presence of colonic polyps with good predictive power.
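
The evaluation protocol summarized in the abstract (an 80%/20% train/test split and AUC as the performance measure) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which software, random seed, or rounding rule was used, so the function names `split_indices` and `auc`, the seed, and the truncating split are all assumptions made for illustration. The AUC is computed here through its rank interpretation, the probability that a randomly chosen patient with polyps receives a higher score than a randomly chosen patient without.

```python
import random


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists.

    Assumption: the training size is int(n * train_fraction); the paper's
    exact rounding rule is not given in the abstract.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical cohort size matching the 164 patients in the abstract.
    train, test = split_indices(164)
    print(len(train), len(test))

    # Toy labels (1 = polyps present) and model scores, invented for illustration.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.7, 0.8, 0.6, 0.3, 0.2]
    print(auc(toy_labels, toy_scores))
```

In practice a library routine would normally replace the hand-written `auc`; the pairwise form is shown only because it makes the meaning of values such as 0.820 or 0.79 explicit.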