AUTHOR=Dong Zhou , Chen Hong , Yang Yuchen , Hao Hairong TITLE=Research on the optimization model of anti-breast cancer candidate drugs based on machine learning JOURNAL=Frontiers in Genetics VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2025.1523015 DOI=10.3389/fgene.2025.1523015 ISSN=1664-8021 ABSTRACT=Breast cancer is one of the most common malignancies among women globally, with its incidence rate continuously increasing, posing a serious threat to women’s health. Although current treatments, such as drugs targeting estrogen receptor alpha (ERα), have extended patient survival, issues such as drug resistance and severe side effects remain widespread. This study proposes a machine learning-based optimization model for anti-breast cancer candidate drugs, aimed at enhancing biological activity and optimizing ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties through multi-objective optimization. Initially, grey relational analysis and Spearman correlation analysis were performed on the molecular descriptors of 1,974 compounds, identifying 91 key descriptors. A Random Forest model combined with Shapley Additive Explanations (SHAP) values was then used to further select the top 20 descriptors with the greatest impact on biological activity. The constructed Quantitative Structure-Activity Relationship (QSAR) model, using algorithms such as LightGBM, Random Forest, and XGBoost, achieved an R2 value of 0.743 for biological activity prediction, demonstrating strong predictive performance. Additionally, a multi-model fusion strategy and Particle Swarm Optimization (PSO) algorithm were employed to optimize both biological activity and ADMET properties, thereby improving the prediction of Caco-2, CYP3A4, hERG, HOB, and MN properties. For example, the best model for predicting Caco-2 achieved an F1 score of 0.8905, while the model for predicting CYP3A4 reached an F1 score of 0.9733. This multi-objective optimization model provides a novel and efficient tool for drug development, offering significant improvements in both biological activity and pharmacokinetic properties, with practical implications for the optimization of future anti-breast cancer drugs.