AUTHOR=Sayyari Aliakbar , Magsudy Amin , Moeinipour Yasamin , Hosseini Amirhossein , Amiri Hamidreza , Arzaghi Mohammadreza , Sohrabivafa Fereshteh , Hamzavi Seyedeh Fatemeh , Azizi Ashkan , Hatamii Tahereh , Okhovat AmirAli , Dara Naghi , Imanzadeh Negar , Imanzadeh Farid , Hajipour Mahmoud TITLE=Investigation of predictive factors for fatty liver in children and adolescents using artificial intelligence JOURNAL=Frontiers in Pediatrics VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2025.1537098 DOI=10.3389/fped.2025.1537098 ISSN=2296-2360 ABSTRACT=Background: Childhood obesity is a growing problem worldwide, leading to non-alcoholic fatty liver disease (NAFLD), which is the most common liver disease in children. Liver biopsy is the gold standard for NAFLD diagnosis. Machine learning algorithms could assist in early diagnosis and thereby lead to a more favorable prognosis. Objective: This study aimed to identify predictive factors for NAFLD in children and adolescents using machine learning models, focusing on liver biopsy outcomes such as fibrosis, infiltration, ballooning, and steatosis. Methods: Data from 659 children suspected of NAFLD, who underwent liver biopsy at Mofid Children's Hospital between 2011 and 2023, were analyzed. The dataset included categorical and numerical variables, which were processed using one-hot encoding and standardization. Several machine learning models were trained and evaluated, including CatBoost, AdaBoost, Random Forest, and others. Model performance was assessed using cross-validation with accuracy, precision, recall, F1 score, and ROC-AUC metrics. Feature importance was determined through permutation analysis. Results: Among NAFLD patients, the CatBoost Classifier achieved the highest accuracy (91.8%) and ROC-AUC score (92.3%) in cross-validation. In addition, the adjusted models showed better results. 
Specifically, the F1 score for CatBoost rose from 83% to 89% (AUC: 0.86–0.92), for GradientBoosting from 76% to 81% (AUC: 0.81–0.85), and for Bernoulli Naive Bayes from 78% to 82% (AUC: 0.82–0.85). Conclusion: Machine learning models, particularly CatBoost, demonstrated strong predictive capabilities for NAFLD diagnosis in children.
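
The abstract mentions F1 scoring and permutation-based feature importance without describing either. The sketch below illustrates both ideas in plain Python. It is not the authors' pipeline: the toy threshold model `toy_predict`, the synthetic two-column data (a hypothetical BMI z-score plus a pure-noise column), and the choice of F1 as the scoring metric for the permutation step are all assumptions made for illustration.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 when a single feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical stand-in for a trained classifier: it looks only at column 0.
def toy_predict(row):
    return 1 if row[0] > 1.0 else 0


# Synthetic data: column 0 is a made-up BMI z-score, column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 3), data_rng.uniform(0, 1)] for _ in range(200)]
y = [1 if row[0] > 1.0 else 0 for row in X]

baseline, importances = permutation_importance(toy_predict, X, y)
print("baseline F1:", baseline)
print("importance per feature:", importances)
```

A feature the model relies on shows a large drop in F1 when shuffled, while an unused feature shows no drop. In a real analysis the same procedure would be applied to a fitted model on held-out data, for example inside each cross-validation fold.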