A contemporary review of breast cancer risk factors and the role of artificial intelligence

Background Breast cancer continues to be a significant global health issue, necessitating advancements in prevention and early detection strategies. This review aims to assess and synthesize research conducted from 2020 to the present, focusing on breast cancer risk factors, including genetic, lifestyle, and environmental aspects, as well as the innovative role of artificial intelligence (AI) in prediction and diagnostics. Methods A comprehensive literature search, covering studies from 2020 to the present, was conducted to evaluate the diversity of breast cancer risk factors and the latest advances in Artificial Intelligence (AI) in this field. The review prioritized high-quality peer-reviewed research articles and meta-analyses. Results Our analysis reveals a complex interplay of genetic, lifestyle, and environmental risk factors for breast cancer, with significant variability across different populations. Furthermore, AI has emerged as a promising tool in enhancing the accuracy of breast cancer risk prediction and the personalization of prevention strategies. Conclusion The review highlights the necessity for personalized breast cancer prevention and detection approaches that account for individual risk factor profiles. It underscores the potential of AI to revolutionize these strategies, offering clear recommendations for future research directions and clinical practice improvements.


Introduction
Over the past decade, breast cancer has remained a leading cause of mortality among women globally, driving an intensive search for effective prevention and early detection strategies. During 2020, more than 2.3 million women were diagnosed, of which 33.5% died (1). Despite significant advances in understanding the biological mechanisms and risk factors of breast cancer, substantial challenges persist in personalized clinical management and preventive intervention. This work aims to evaluate and synthesize the available evidence on breast cancer risk factors, ranging from genetic predispositions and lifestyle to environmental influences, with a particular interest in recent technological advancements, including AI, in predicting and detecting the disease. We pose two critical research questions: 1) What are the main risk factors associated with the development of breast cancer, and how do these vary among different populations and age groups? 2) How do recent technological advancements based on Artificial Intelligence (AI) help the detection and prevention of breast cancer? Our guiding hypothesis is that the variability in breast cancer risk factors among different populations means that prevention and early detection strategies must be personalized, considering genetic, lifestyle, and environmental factors, to be effective. On this basis, the review seeks to identify areas of consensus and discrepancy in the scientific literature. Highlighting the need for personalized strategies that consider variability among populations and age groups, we aim to provide clear recommendations that guide future research and clinical practice towards more effective prevention and early detection of breast cancer.
The paper is organized as follows. In Section 2, the methodology for selecting and reviewing papers is described. Section 3 shows the results, with particular emphasis on the bibliometric study and the risk factor categories. A discussion and some conclusions are in Sections 5 and 6, respectively.

Methodology
The methodology of the paper involved a comprehensive bibliographic search and analysis, whose steps are described in Figure 1.

Literature search and eligibility criteria
Our review concentrated on studies published between 2020 and 2024, with a focus on breast cancer risk factors. We sourced these from databases such as PubMed, Scopus, and Web of Science. We included research papers that provided insights into demographic, genetic, lifestyle, and environmental influences on breast cancer risk, alongside studies utilizing AI for enhancing risk prediction and classification. Articles published prior to 2020 and those not directly examining the outlined risk factors were excluded. The selection was mainly restricted to articles written in English.

Study selection and data extraction
The study selection process filtered approximately 250 articles by title, abstract, and keywords to determine their relevance to breast cancer risk factors and AI applications. A second stage, based on a complete reading of the papers, narrowed the focus to 112 articles that met our inclusion criteria and offered important information on the topic. This approach ensured that only the most relevant studies were included, providing a detailed exploration of breast cancer risk factors and the role of AI in risk management. A bibliometric analysis was carried out to establish frequencies and relationships among risk factors. Finally, these risk factors were systematically classified into categories, as detailed in Table 1.

Analysis and classification
This classification was based on the analysis of risk factors available in various articles, which were then grouped according to characteristics to derive the respective classifications. Regarding risk factors, they were classified into groups corresponding to "Demographic and Genetic Factors", "Reproductive and Hormonal Factors", "Metabolic Factors", "Medical History" and "Lifestyle and Environmental Factors." Additionally, a new independent category was created to group papers that include studies with artificial intelligence models, named "Use of AI in Risk Prediction". A simple Natural Language Processing (NLP) word count was used to identify the risk factors most frequently mentioned in each paper.
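The word-count step described above can be illustrated with a minimal sketch. The term list, the function names, and the sample text are hypothetical and only serve the illustration; the review does not specify its vocabulary or tooling for this step.

```python
import re
from collections import Counter

# Hypothetical vocabulary of risk-factor terms (an assumption for illustration;
# the exact term set used in the review is not given).
RISK_TERMS = ["family history", "brca1", "brca2", "obesity",
              "alcohol", "breast density", "radiation"]

def count_risk_terms(text, terms=RISK_TERMS):
    """Count case-insensitive, whole-word occurrences of each term in a text."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts

def top_terms(text, n=3):
    """Return up to n (term, count) pairs, most frequent first, skipping zero counts."""
    return [(t, c) for t, c in count_risk_terms(text).most_common(n) if c > 0]

# Hypothetical excerpt standing in for the text of one reviewed paper.
sample = ("Obesity and alcohol intake were assessed. Obesity was associated with risk "
          "in women with a family history. BRCA1 carriers with obesity were analysed separately.")
print(top_terms(sample))
```

Applied to each of the selected papers, a count of this kind yields a per-paper ranking of risk factors that can then be aggregated by category.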

Documentation and conclusion
This methodology involved the following steps: conducting an exhaustive literature search across major scientific databases;

Results
By applying the above methodology, we present the results of a systematic literature review of the 112 selected papers and describe the main findings for each category of risk according to Table 2.

Bibliometric analysis
In this section we provide a bibliometric analysis using the Bibliometrix package of R software (114).
To facilitate a deeper understanding of how keywords interconnect across the collection of reviewed papers, a keyword network graph is shown in Figure 2. The graph highlights the thematic ties and focal points within the research landscape under examination. In Figure 2, the most interconnected and frequent keywords are: female, breast tumor, breast cancer and breast neoplasms.
Figure 3 indicates that the United States is at the forefront in terms of the volume of scientific publications, with significant contributions in both national (SCP) and international (MCP) collaborations, followed by China, evidencing a robust level of scientific output and cooperative engagement in these nations.
Conversely, the author network depicted in Figure 4 illustrates clustering among authors who have contributed to more than five publications. Those with a higher publication frequency are represented by larger circles, visually highlighting the most prolific contributors within the network.

Breast cancer risk factors
In this Section, we provide a detailed analysis of breast cancer risk factors identified by the reviewed works as represented in Table 2.
29), highlight the need to adopt personalized approaches. These findings emphasize the multifaceted nature of breast cancer risk and treatment strategies across diverse populations.
• Family History: The presence of a family history significantly impacts the assessment and management of breast cancer risk. (110) reveals that 35.5% of women with a familial history face a high lifetime risk, yet only 23.9% receive enhanced screening. (13) demonstrates the effectiveness of machine learning, achieving 77.78% precision in risk prediction. In addition, (77) identifies specific germline variants linked to susceptibility. Furthermore, the integration of polygenic risk scores with family history, as demonstrated by (91), significantly alters surveillance recommendations. Overall, these findings underscore the crucial role of family history in personalized breast cancer care and risk management.
• Genetic mutations, such as BRCA1 (Breast Cancer Gene 1) and BRCA2 (Breast Cancer Gene 2): Genetic mutations, particularly in the BRCA1 and BRCA2 genes, significantly increase hereditary breast cancer risk. Studies like (92) analyze the role of germline CHEK2 (Checkpoint Kinase 2) variants, while (97) advocates personalized prevention strategies. (98) identifies genetic loci associated with contralateral breast cancer risk, and (3) explores molecular links between obesity and breast cancer. These findings emphasize the multifactorial nature of breast cancer, requiring tailored risk assessment and management.
• Economic factors: Economic factors significantly impact breast cancer risk and outcomes. (86) reveals disparities in access to systemic anticancer therapies based on geographic and sociodemographic factors. Similarly, (36) notes a social gradient in cancer incidence in Costa Rica. (51) links higher education levels to increased breast cancer risk. (2) emphasizes local demographic factors in TNBC (Triple-Negative Breast Cancer) treatment, while (32) highlights access disparities in Colombia. Finally, (70) stresses the importance of socio-demographic indices and public health policies in addressing the breast cancer burden in developing countries.
Figure 4. Co-authorship network analysis in scientific research.
(64) studies parity's impact on breast cancer incidence, highlighting rising rates in younger women. The meta-analysis in (72) reveals subtype-specific risks, emphasizing tailored prevention strategies.
• Hormonal factors (use of hormone replacement therapy, contraceptives, etc.): Hormonal factors like hormone replacement therapy and contraceptives influence breast cancer risk. (3) highlights obesity's role in breast cancer, especially in postmenopausal women. (10) emphasizes the impact of hormonal imbalances, urging further research. (59) finds no significant difference in breast cancer risk with Hormone Replacement Therapy among BRCA mutation carriers. These findings emphasize the importance of hormonal markers like estrogen and progesterone receptors in breast cancer treatment (3, 10, 59). Additionally, (21) and (72) explore lifestyle factors like diet and reproductive behaviors, highlighting hormonal influences on breast cancer risk.

Metabolic factors
• Diabetes: Elevated levels of insulin can promote cellular proliferation and reduce apoptosis, thus facilitating the development and progression of mammary neoplasms. (3) elucidate obesity's pivotal role in breast cancer (BC) risk, particularly in postmenopausal women, citing hormonal imbalances and insulin resistance among its mechanisms. They reveal how obesity-driven molecular changes, like increased estrogen and insulin levels, contribute to BC via specific signaling pathways. Conversely, (34) find a significant correlation between genetic predisposition to Type 2 Diabetes Mellitus (T2DM) and poorer breast cancer-specific survival (HR = 1.10, 95% CI = 1.04-1.18, P = 0.003), emphasizing the potential causal impact of T2DM on BC outcomes.
• Metabolism: Metabolic processes play a crucial role in modulating breast cancer risk, significantly influencing hormonal levels and cellular dynamics. Alterations in metabolism, including imbalances in lipid and glucose metabolism, can lead to endocrine changes and alterations in the cellular microenvironment that favor mammary carcinogenesis. Various factors influence susceptibility. (113) found that high-density lipoprotein cholesterol (HDL-C) significantly affects breast cancer risk, suggesting a metabolic component to cancer development. (9) identified associations between insulin-like growth factor 1 (IGF-1) levels and fasting blood glucose with breast cancer risk, emphasizing the complexity of metabolic factors. Additionally, (13) integrated genetic mutations and demographic factors to predict breast cancer risk, highlighting the importance of considering metabolic pathways in risk assessment. These findings underscore the multifaceted nature of metabolism-related risk factors in breast cancer susceptibility (9, 13, 113).
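As a reading aid for the hazard ratio quoted above (HR = 1.10, 95% CI = 1.04-1.18, P = 0.003), the following sketch shows how a two-sided p-value can be recovered from a ratio estimate and its confidence interval. It assumes a Wald-type interval that is symmetric on the log scale, which is a common convention but is not stated in the cited study; the function name is hypothetical.

```python
import math

def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover SE, z and the two-sided p-value implied by a ratio and its 95% CI.

    Assumes the interval was built as exp(log(estimate) +/- z_crit * SE),
    i.e. a Wald interval on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

se, z, p = p_from_ratio_ci(1.10, 1.04, 1.18)
print(f"SE(log HR) = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption the implied p-value is close to 0.003, so the reported interval and p-value are mutually consistent.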

Medical history
• Breast density: Breast density complicates cancer detection because the thickness or opacity of the tissue can make it more difficult for mammograms to identify cancerous tumors. Additionally, high breast density is considered an independent risk factor for developing breast cancer: denser breast tissue contains more connective and glandular tissue, which can hide tumors and is also associated with a higher likelihood of cancer development. (11) found a sixfold risk difference between the densest and least dense categories. (42) investigated this relationship across a cohort of 21,150 women, confirming the effectiveness of automated density assessments in predicting breast cancer risk. Similarly, (69) emphasizes higher risk in younger women with lower BMI. (46) explores mammography-based risk assessment for early screening. These studies underscore the importance of considering mammographic density in breast cancer risk assessment and screening.
• Other cancers and diseases: The presence of other cancers may indicate heightened risk for breast cancer. (107) developed prognostic nomograms for breast cancer patients with lung metastasis. (66) addressed disparities in colorectal and breast cancer screenings. (83) revealed screening rate disparities among females with schizophrenia. (106) noted a slight increase in primary lung cancer risk post-radiotherapy for breast cancer.

Lifestyle factors
• Alcohol consumption: Alcohol consumption significantly increases breast cancer risk, even with moderate intake. (85) revealed odds ratios between 1.82 and 5.67, indicating a notable association. (40) highlighted a high prevalence (18.34%) of risky drinking among Australian women, exceeding weekly guidelines. These findings underscore the link between alcohol intake and breast cancer risk and the need for preventive measures (35, 51).
(103), positively impact survivors' quality of life. (49) link low physical activity to higher risk, especially in post-menopausal women. Additionally, (91) propose personalized surveillance integrating lifestyle factors for better outcomes.
• Stress, anxiety, or depression: Chronic stress may impact breast cancer risk. (57) links stress, anxiety, and depression to reduced quality of life in survivors. (103) shows positive outcomes in QoL (Quality of Life) indicators with home-based interventions despite pandemic challenges.
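Several of the lifestyle findings above are reported as odds ratios. As a reminder of how such a figure arises from a case-control table, the sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical and are not taken from any of the cited studies.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z_crit=1.959964):
    """Odds ratio from a 2x2 table with a Woolf (log-scale Wald) confidence interval."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: 40 exposed cases, 60 exposed controls,
# 20 unexposed cases, 80 unexposed controls.
or_value, lower, upper = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that lies entirely above 1, as in this toy table, is what supports describing an exposure as associated with higher odds of disease.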

Environmental factors
• Exposure to radiation: Exposure to ionizing radiation, such as from radiotherapy, elevates breast cancer risk, especially when received at a young age. Studies explore various factors. (38) concluded that exposure to chest radiation therapy significantly elevates breast cancer risk, with individuals who have undergone such treatments facing a notably higher likelihood of developing the disease. Similarly, (57) mention that receiving chest radiation therapy was significantly associated with a higher risk of breast cancer, with an Adjusted Odds Ratio (AOR) of 6.43, indicating a more than sixfold increase in the odds compared to those who had not received such therapy. (98) found that genetic variations can influence an individual's susceptibility to radiation toxicity; (106) discusses lung cancer risk post-radiotherapy; (111) links menopause to chemotherapy side effects; and (22) reported a high radiodermatitis incidence (98.2%) in breast cancer patients undergoing radiotherapy, with BMI and statin use affecting severity, and hydrogel showing protective effects.
• Exposure to chemicals: Chemicals like endocrine disruptors may disrupt hormonal balance, potentially contributing to breast cancer. (105) evaluates the toxicity of CDK4/6 inhibitors in metastatic breast cancer, stressing personalized treatment strategies due to varying drug profiles.
• Environmental pollutants, specific exposures and heavy metals: Environmental pollutants, including heavy metals and air pollution, contribute to breast cancer risk. (6) found altered levels of metals like copper and cadmium in breast cancer patients. (96) investigated air pollution's association with postmenopausal breast cancer risk, finding a significant 18% risk increase with a 10 µg/m³ rise in PM10 levels in 2007.

The role of artificial intelligence models for detecting breast cancer
The integration of artificial intelligence (AI) in breast cancer management spans various aspects, including diagnosis, recurrence prediction, survival rate estimation, and treatment response assessment. Studies like (5) demonstrate the effectiveness of machine learning models, achieving 80.23% accuracy in diagnosing early-stage breast cancer. Key risk factors identified for breast cancer included levels of glucose, age, and resistin. This approach demonstrates the potential of machine learning in enhancing breast cancer diagnostic processes by effectively selecting critical risk factors. Similarly, (8) utilizes NLP and machine learning to predict breast cancer recurrence, emphasizing the efficacy of the OneR algorithm. The main clinical data used in that paper for predicting breast cancer recurrence involve a wide range of factors extracted from electronic health records (EHR). These include diagnostic symptoms, medications, lab results, medical recommendations, past medical history, procedures, family history, imaging, endoscopic assessments, anesthesia types, allergies, and other clinical documents. NLP algorithms were developed to extract these key features from the medical records. Notably, (81) highlights Support Vector Machine (SVM) as the most accurate algorithm for breast cancer prediction, achieving an accuracy of 97.2%. The characteristics of the cell nuclei present in the images are used as inputs for the SVM. They include radius, texture, area, perimeter, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. These attributes are determined from the digitized images and serve as the basis for the SVM model to classify instances into benign or malignant categories.
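For readers unfamiliar with the OneR algorithm mentioned above, the following sketch shows its general idea: for each feature, build a one-level rule that maps every feature value to its majority class, then keep the feature whose rule makes the fewest training errors. This is a generic illustration, not the implementation used in (8); the feature names and records are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (best_feature, rule, training_errors) for categorical features.

    rows is a list of dicts mapping feature name to a categorical value.
    The first feature with the lowest error count wins ties.
    """
    best = None
    for feature in rows[0]:
        # Tally class labels for every observed value of this feature.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[feature]][label] += 1
        # One-level rule: each value predicts its majority class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(rule[row[feature]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best

# Hypothetical toy records, for illustration only.
rows = [
    {"family_history": "yes", "tumor_grade": "high"},
    {"family_history": "yes", "tumor_grade": "high"},
    {"family_history": "no", "tumor_grade": "high"},
    {"family_history": "no", "tumor_grade": "low"},
    {"family_history": "yes", "tumor_grade": "low"},
    {"family_history": "no", "tumor_grade": "low"},
]
labels = ["recurrence", "recurrence", "recurrence",
          "no_recurrence", "no_recurrence", "no_recurrence"]

feature, rule, errors = one_r(rows, labels)
print(feature, rule, errors)
```

Because the resulting model is a single readable rule, OneR is often used as an interpretable baseline against which more complex classifiers are compared.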
For detection purposes, most of the papers use mammography images for training deep learning models, assuming that these algorithms are able to detect anomalies in the breast tissue. In this context, a comprehensive review is provided by (14). Clinical data have instead been considered by (17), which developed a Machine Learning (ML) system for classifying breast cancer and diagnosing cancer metastases using clinical data extracted from Electronic Medical Records (EMRs). The best results were obtained by a decision tree classifier, which achieved 83% accuracy and an AUC (Area Under the Curve) of 0.87, demonstrating the potential of ML models based on blood profile data to aid professionals in identifying breast cancer patients at high risk of metastasis, thereby improving survival outcomes.
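Since the studies above report both accuracy and AUC, a small sketch may help to distinguish the two metrics. Accuracy depends on a decision threshold, whereas AUC measures how well the scores rank positive cases above negative ones. The labels and scores below are hypothetical, and the function names are chosen only for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = malignant) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # threshold of 0.5 is an assumption

print(f"accuracy = {accuracy(y_true, y_pred):.3f}, AUC = {roc_auc(y_true, scores):.3f}")
```

In this toy case the AUC is higher than the accuracy, which illustrates that a model can rank patients well even when a fixed threshold misclassifies some of them.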
Regarding treatment response assessment, (28) employs CNNs to predict treatment response in breast cancer patients undergoing chemotherapy, achieving high accuracies for various parameters. The study integrates both imaging and non-imaging data: the model inputs included longitudinal multiparametric MRI data (dynamic contrast-enhanced MRI and T2-weighted MRI), demographics, and molecular subtypes. The use of advanced imaging techniques alongside clinical and molecular data indicates the need for personalized treatment planning and assessment in breast cancer care. (73) demonstrates deep learning's superior performance in risk identification compared to traditional Machine Learning (ML) methods. Important inputs for their models include age, resistin levels, global burden of disease (GBD) relative risk upper values, glucose, adiponectin, high BMI (binary), MCP-1, leptin, relative risks from meta-analyses, obesity (binary), and insulin levels. These inputs were selected based on their relevance and low redundancy for predicting breast cancer, highlighting the potential of deep learning to complement traditional screening methods by identifying individuals at risk non-invasively and affordably. In survival rate prediction, (63) evaluates ML's role, highlighting challenges like data preprocessing and model validation. The authors review 31 studies, mainly from Asia, that predict the 5-year survival rate of breast cancer. Among the papers reviewed, the most used algorithms are decision trees (61.3%), artificial neural networks (58.1%), and support vector machines (51.6%), with clinical and molecular information used to build the predictive models. (73) used a database of 116 women, of whom 52 were healthy and 64 had been diagnosed with breast cancer. The information included demographic and anthropometric data. Deep learning was rated the best-performing method for breast cancer prediction among algorithms such as SVM, Neural Networks, Logistic Regression, XGBoost, Random Forest, Naive Bayes and Stochastic Gradient. Lastly, studies like (88) predict patient satisfaction post-mastectomy, revealing that 45.2% of women experienced improved satisfaction with their breasts. These findings underscore the potential of AI in enhancing various aspects of breast cancer management, from diagnosis to patient satisfaction assessment.
A novel approach that integrates Machine Learning (ML) algorithms with Explainable Artificial Intelligence (XAI) has recently been developed to enhance the understanding and interpretation of predictions made by ML models. In the context of breast cancer research, (95) introduced a Hybrid Algorithm combining ML and XAI techniques aimed at preventing breast cancer. This methodology enables the identification and extraction of key risk factors, such as high-fat diets and breastfeeding habits, to accurately differentiate between patients with and without breast cancer among Indonesian women. Risk indicators, such as axillary nodes and breast density, can also be extracted from the images by using deep learning (7, 56, 84).
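The algorithm shares reported for the survey of 31 survival-prediction studies (61.3%, 58.1%, and 51.6%) sum to more than 100%, which is expected because a single study can use several algorithms. The sketch below converts each share back to the number of studies it implies and checks that the rounding is consistent; the function name and the rounding tolerance are assumptions made for this illustration.

```python
def implied_count(percentage, total, tolerance=0.05):
    """Return the study count implied by a percentage of total.

    Also reports whether that count reproduces the percentage to one decimal place.
    """
    count = round(percentage / 100 * total)
    consistent = abs(100 * count / total - percentage) < tolerance
    return count, consistent

TOTAL_STUDIES = 31  # number of studies in the survey, as stated in the text
reported = {
    "decision trees": 61.3,
    "artificial neural networks": 58.1,
    "support vector machines": 51.6,
}

for algorithm, share in reported.items():
    count, consistent = implied_count(share, TOTAL_STUDIES)
    print(f"{algorithm}: {count} of {TOTAL_STUDIES} studies (consistent: {consistent})")
```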

Discussion
Upon reviewing multiple studies on breast cancer and its associated risk factors, several key findings emerge. A detailed description of the results of each work is presented in Section 3.2. This analysis advocates for a multifaceted approach to prevention, screening, and treatment, reflecting the complex nature of breast cancer risk factors.

Conclusion
Our review points to notable progress in the early detection of breast cancer, with machine learning models demonstrating a diagnostic accuracy of 80.23%. The bibliographic review and analysis of the last 5 years in this field allowed us to identify the transformative impact of AI both in the identification of risk factors and in the improvement of diagnostic accuracy. Our analysis, unlike previous studies such as those by (69), (89), and (35), goes beyond updating risk factor inventories to show the fundamental role of sophisticated AI risk algorithms. These tools, particularly SVM, have achieved an accuracy of up to 97.2% in identifying breast cancer, a significant improvement over traditional diagnostic methods, by using a wider range of data, including images and clinical details such as risk factors, for diagnosis.
Future explorations should delve into AI's ability to tailor breast cancer detection and treatments, thereby improving patient-specific outcomes.
Figure 1. Flow chart of the methodology.
The remaining steps of the methodology were: applying inclusion and exclusion criteria to narrow the selection from approximately 250 papers to the 112 most relevant ones; employing techniques for a deeper analysis of the risk factors mentioned across the selected papers; and categorizing the identified risk factors into specific groups for a structured analysis. This methodology not only ensures a comprehensive understanding of the existing research landscape but also supports the identification of key risk factors for breast cancer, facilitating a more precise and evidence-based analysis.
Figure 3 displays the distribution of bibliographic authors by country. In this chart, 'MCP' represents Multiple Country Publications, indicating research papers co-authored by individuals from various nations, while 'SCP' signifies Single Country Publications, denoting research carried out solely by authors within the same country.

TABLE 2
Summary of risk factors and characteristics in breast cancer research literature.
The review in (14) focuses on various ANN models such as Spiking Neural Network (SNN), Deep Belief Network (DBN), Convolutional Neural Network (CNN), Multilayer Neural Network (MLNN), Stacked Autoencoders (SAE), and Stacked De-noising Autoencoders (SDAE). It highlights the effectiveness of these models in improving diagnostic accuracy, precision, recall, and other metrics, with particular success noted in models like ResNet-50 and ResNet-101 within the CNN algorithm framework.
• Medical history, specifically breast density and the history of other cancers, can influence breast cancer risk. In particular, dense breast tissue can obscure mammograms, making detection more challenging, and high breast density is itself an independent risk factor. Additionally, a history of other cancers may indicate an elevated risk for breast cancer. This work underscores the importance of considering an individual's medical history in breast cancer risk assessments and the need for personalized screening strategies.
• Lifestyle factors such as alcohol consumption, cigarette smoking, obesity, poor nutrition, and physical inactivity play significant roles in increasing breast cancer risk, and these modifiable risk factors need to be addressed through public health interventions and individual lifestyle changes to reduce breast cancer incidence. This review underscores the potential of preventive measures and lifestyle modifications in mitigating breast cancer risk, emphasizing the importance of holistic approaches in breast cancer prevention strategies.
• Environmental factors like radiation exposure, chemicals, and pollutants play a significant role in breast cancer risk. The cited works emphasize the need for awareness and protective measures against these exposures. Highlighting the complexity of breast cancer etiology, our work calls for comprehensive research to better understand the interactions between environmental factors and genetic predisposition, and for public health strategies to minimize exposure and mitigate breast cancer risk.
• The description of the role of artificial intelligence (AI) models in detecting breast cancer illustrates the significant potential AI has in enhancing diagnostic accuracy, predicting recurrence, estimating survival rates, and assessing treatment response. Highlighting various studies, this review shows that machine learning algorithms, such as Support Vector Machines (SVM) and Convolutional Neural Networks (CNNs), have achieved notable success. This discussion emphasizes AI's transformative impact on breast cancer management, advocating for further research and integration of AI technologies to tailor detection and treatment approaches, ultimately improving patient outcomes.