- Department of Pharmacy, Chongqing University Cancer Hospital, Chongqing, China
Objective: Prediction models, which estimate disease or outcome probabilities, are widely used in cancer research. This study aims to identify hotspots and future directions of cancer-related prediction models using bibliometrics.
Methods: A comprehensive literature search was conducted in the Science Citation Index Expanded (SCIE) from the Web of Science Core Collection (WoSCC) up to November 15, 2024, focusing on cancer-related prediction models research. Co-occurrence analyses of countries, institutions, authors, journals, and keywords were conducted using VOSviewer 1.6.20. Additionally, keyword clustering, timeline visualization, and burst term analysis were performed with CiteSpace 6.3.
Results: A total of 1,661 records were retrieved from the SCIE. After deduplication and eligibility screening, 1,556 publications were included in the analysis. The bibliometric analysis revealed a consistent annual increase in cancer-related prediction model research, with China and the United States emerging as the leading contributors. The United States, England, and the Netherlands had the strongest collaborative networks. The most frequent keywords, excluding “prediction model” and “predictive model”, included nomogram (frequency=192), survival (191), risk (121), prognosis (112), breast cancer (103), carcinoma (93), validation (87), surgery (85), diagnosis (83), chemotherapy (80), and machine learning (77). Besides, the timeline view analysis indicated that the “#7 machine learning” cluster was experiencing vigorous growth.
Conclusion: Research on cancer-related prediction models is advancing rapidly, especially for prognostic models. Emerging modeling techniques, such as neural networks and deep learning algorithms, are likely to play a pivotal role in current and future cancer-related prediction model research. Systematic reviews of cancer-related predictive models, which could help clinicians select the optimal model for specific clinical conditions, may emerge as a potential research direction in this field.
1 Introduction
Cancer remains a paramount concern in global public health, imposing a significant burden on both healthcare systems and society due to its rising incidence and mortality rates (1–3). According to statistics from the International Agency for Research on Cancer (IARC), the number of new cancer cases worldwide has surged from 14.1 million in 2012 to nearly 19.98 million in 2022, with corresponding fatalities increasing to 9.74 million (4). The investigation into the etiology, progression, and prognosis of cancer, a complex condition posing a grave threat to human health, has remained a central and challenging area of medical research (5, 6).
The emergence and advancement of bioinformatics, big data analytics, and machine learning have led to the extensive study and application of clinical prediction models (CPMs) in cancer. These models offer novel opportunities for early detection, risk assessment, personalized therapy, and prognostic management of cancer (7–9). CPMs in cancer are generally classified into two main types: cancer incidence prediction models (10–12) and cancer prognosis prediction models (13–15). The former is designed to pinpoint populations at high risk for proactive intervention, while the latter concentrates on predicting post-diagnostic disease progression, recurrence risk, risk of cancer-related complications, and survival probabilities, thereby guiding treatment planning strategies.
Despite the proliferation of studies on cancer-related prediction models, comprehensive reviews and analyses of research trends, technical methodologies, international collaboration networks, and academic influence in this field remain lacking. This study utilizes bibliometric techniques to conduct an extensive review and in-depth analysis of the publications on cancer-related prediction models, providing a thorough synopsis of cancer prediction modeling research. To assist researchers in keeping pace with the latest developments in the field, this study delineates the research momentum, development trajectories, collaborative networks, and the distribution of key authors and institutions, while highlighting key areas of interest and potential future directions.
2 Methods
2.1 Eligibility criteria
The inclusion criteria for publications were as follows: (1) the publications pertained to cancer-related prediction models; (2) the publications were published in English; (3) the publication date ranged from the inception of the database up to November 15, 2024. The following were excluded: (1) reviews; (2) editorial material; (3) letters, replies, and corrections; (4) duplicate publications; (5) retracted publications; (6) news items.
2.2 Search strategy
The primary database for our literature search was the Science Citation Index Expanded (SCIE) of the Web of Science Core Collection (WoSCC). The search was conducted using the following strategy: (“neoplasm” OR “tumor” OR “cancer” OR “oncology” [Title]) AND (“predictive model” OR “prediction model” OR “forecasting model” [Title]).
2.3 Bibliometric and visualization analysis
Our study used VOSviewer 1.6.20 to perform co-occurrence analysis on countries, institutions, authors, journals, and keywords within the included publications. Keyword clustering, timeline view, and burst analysis were conducted by CiteSpace 6.3. CiteSpace enables the generation of timeline views and burst term emergence maps across time slices, thereby delineating the evolutionary trajectory of a research field and the historical context of publications within clusters (16). This facilitates an elucidation of the development process, research hotspots, and trends within the field. In contrast, VOSviewer emphasizes the graphical representation of bibliometric data, offering a diverse array of visualizations for areas including keywords, institutions, and authors (17). The integration of these two tools results in a comprehensive and multidimensional analysis, thoroughly uncovering the current state and future trajectory of research in cancer-related prediction models.
Publication deduplication and screening were carried out using EndNote X8. The records that met the eligibility criteria were subsequently imported into both CiteSpace 6.3 and VOSviewer 1.6.20 in plain text format. In CiteSpace, the time slice unit was set to one year, and to keep the timeline view legible, only keywords with a frequency of 20 or higher were displayed (Figure 1).
In analyzing authors, Price’s Law (18) and Lotka’s Law (19) were applied to estimate the minimum number of publications for core authors within the field. This established the threshold for author analysis, thereby identifying representative scholars and the core research strengths within the field. ($M_{\min} = 0.749 \times \sqrt{N_{\max}}$, where $M_{\min}$ denotes the minimum number of publications for core authors, and $N_{\max}$ represents the number of publications by the most productive author.) Additionally, Bradford’s Law (20) was utilized as a bibliometric indicator for identifying core journals. This law describes the distribution of scientific literature within specific disciplines and facilitates the identification of the most productive and influential journals within a specific scientific domain.
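As a worked illustration of the core-author threshold, Price’s Law is commonly stated as $M_{\min} = 0.749 \times \sqrt{N_{\max}}$. The minimal sketch below evaluates it for $N_{\max} = 8$, the publication count of the most productive authors reported in this study. The function name is hypothetical, and rounding up to the next whole publication is an assumption about how the threshold is applied.

```python
import math


def price_core_threshold(n_max: int) -> int:
    """Smallest whole number of publications that satisfies Price's Law.

    Assumes the common form M_min = 0.749 * sqrt(N_max) and rounds up,
    because publication counts are integers.
    """
    m_min = 0.749 * math.sqrt(n_max)
    return math.ceil(m_min)


n_max = 8  # publications by the most productive author in this study
m_min = 0.749 * math.sqrt(n_max)
threshold = price_core_threshold(n_max)
print(f"M_min = {m_min:.2f}; core authors have >= {threshold} publications")
```

Under these assumptions the threshold is three publications, which matches the cut-off used for core authors in the Results.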
3 Results
3.1 Literature screening
This study retrieved a total of 1,661 records from the SCIE database. After deduplication and screening, 1,556 eligible records were ultimately selected for inclusion. Figure 2 illustrates the flowchart detailing the literature screening process.
3.2 Types and annual distribution of publications
Publication types and publication years in the field of cancer-related prediction models were reviewed for the period from the inception of the SCIE database to 2024 (Figure 3). During this timeframe, a total of 1,556 relevant publications were released, comprising 1,095 articles (70.37%), 431 meeting abstracts (27.70%), 20 early access articles (1.29%), and 10 proceedings papers (0.64%). The cumulative citation count reached 18,422, an average of 11.84 citations per publication. Prior to 2008, only a limited number of publications on cancer-related prediction models were released annually, suggesting that the field was in its nascent stage. From 2008 to 2023, however, the volume of publications increased markedly, marking a period of rapid development and maturation for the field. The 202 publications recorded for 2024 cover only the period up to November 15 and therefore fall below the annual total for 2023. In terms of citation metrics, the average citation frequency per publication for the years 2002, 2004, 2005, 2006, 2008, and 2009 was 50 or higher, with 2002 and 2004 standing out at nearly 140 citations per publication. These findings emphasize the growing academic and clinical interest among researchers in cancer-related prediction models.
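The shares and the average citation frequency quoted above follow directly from the reported counts. The short sketch below recomputes them as an arithmetic check. The variable names are illustrative and all numbers are taken from the text.

```python
# Publication counts by type, as reported in the text.
counts = {
    "articles": 1095,
    "meeting abstracts": 431,
    "early access articles": 20,
    "proceedings papers": 10,
}
total_citations = 18422  # cumulative citation count reported in the text

total = sum(counts.values())
# Percentage share of each publication type, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
mean_citations = round(total_citations / total, 2)

print(total)           # total number of included publications
print(shares)          # percentage share per publication type
print(mean_citations)  # average citations per publication
```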

Figure 3. Distribution of publication types and annual publication volume. (A) Annual publication volume and citations of publications; (B) Distribution of publication types.
3.3 Countries/regions and institutions
The included publications originate from 2,334 institutions across 65 countries/regions, with each contributing at least one relevant publication. Among these countries, China has the highest number of publications (n=625, 40.17%) (Table 1). The United States follows in second place, with 346 publications (22.24%). Other countries with significant publication volumes include South Korea (120, 7.71%), the Netherlands (111, 7.13%), Japan (106, 6.81%), England (97, 6.23%), and Italy (78, 5.01%). The collaboration network among these countries/regions is depicted in Figure 4A. Specifically, the United States has the closest collaboration ties with other countries/regions, followed by England, the Netherlands, Germany, Italy, France, Canada, and China.

Figure 4. Distribution of countries/regions and institutions. (A) A visual mapping of the collaborative networks among countries/regions in relevant publications. Each circle represents a country/region, with the size of the circle proportional to the number of publications; larger circles imply a greater number of publications. (B) A visual mapping of the collaborative networks among institutions. Each circle represents an institution, with the size of the circle proportional to the number of publications; larger circles imply a greater number of publications.
The institutions with the highest publication output include Sun Yat-sen University (36 publications, 2.31%), the University of Texas MD Anderson Cancer Center (35, 2.25%), Seoul National University (31, 1.99%), and the Chinese Academy of Medical Sciences & Peking Union Medical College (29, 1.86%). A visual representation of the collaborative networks among these institutions is presented in Figure 4B. Notably, the University of Texas MD Anderson Cancer Center exhibits the strongest collaborative ties with other institutions, followed by Seoul National University, Harvard Medical School, the University of California (San Francisco), Erasmus Medical Center, and Sun Yat-sen University (Table 2).
3.4 Authors and journals
In accordance with Price’s Law ($M_{\min} = 0.749 \times \sqrt{8} \approx 2.12$), authors with three or more publications are designated as core authors. Among the 11,318 authors, 401 are identified as core authors, collectively contributing 1,433 articles (92.10% of the overall publications). A visual representation of authors with four or more publications is depicted in Figure 5A. Notably, Antoniou, Antonis C., Easton, Douglas F., Lambin, P., and Valentini, V. emerge as the most prolific authors, each publishing eight articles (Table 3). The citations of these authors are 517, 502, 2, and 1, with average citations per publication being 64.63, 62.75, 0.25, and 0.13, respectively.

Figure 5. Distribution of authors and journals. (A) Visual mapping of the collaboration networks among authors. Each circle represents an author, and a larger circle indicates more publications. (B) Visual mapping of the journals. Each circle represents a journal, and a larger circle indicates more publications.
Regarding journal distribution, the included publications span 478 journals. Based on Bradford’s Law, the top 36 journals with the highest publication volume are recognized as core journals within the field of cancer-related prediction models. JOURNAL OF CLINICAL ONCOLOGY leads the list with 82 publications (Table 4), followed by FRONTIERS IN ONCOLOGY (45 publications) and INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS (44 publications). A visual mapping of journals with five or more publications is presented in Figure 5B. Within the top 36 journals, ANNALS OF ONCOLOGY has the highest impact factor (IF) of 56.7. Meanwhile, BRITISH JOURNAL OF CANCER achieves the highest average citation per publication at 61.18, followed by JOURNAL OF UROLOGY with 35.10 (Table 4).
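Bradford’s Law is often operationalized by ranking journals by output and splitting the ranked list into zones that each hold roughly one third of the articles, with the first zone treated as the core. The sketch below illustrates that idea only. The function name, journal names, and counts are hypothetical, and the exact procedure that produced the 36 core journals reported here is not described in the text and may differ.

```python
def bradford_core(journal_counts: dict[str, int]) -> list[str]:
    """Return the journals in the first Bradford zone.

    Journals are ranked by publication count, then added to the core
    until the core holds at least one third of all publications.
    """
    ranked = sorted(journal_counts.items(), key=lambda kv: kv[1], reverse=True)
    one_third = sum(journal_counts.values()) / 3
    core, running = [], 0
    for name, n in ranked:
        if running >= one_third:
            break
        core.append(name)
        running += n
    return core


# Hypothetical toy data (80 publications across 11 journals).
toy_counts = {
    "Journal A": 20, "Journal B": 12, "Journal C": 10, "Journal D": 8,
    "Journal E": 6, "Journal F": 5, "Journal G": 5, "Journal H": 4,
    "Journal I": 4, "Journal J": 3, "Journal K": 3,
}
print(bradford_core(toy_counts))
```

In this toy example the first two journals already account for more than one third of the output, so they form the core zone.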
3.5 Keywords
3.5.1 Co-occurrence and cluster analysis of keywords
In the co-occurrence analysis of keywords, a total of 4,225 keywords are identified. Table 5 presents the top 30 keywords. With the exception of “prediction model” and “predictive model,” the most frequently occurring keywords include: “nomogram” (192 occurrences), “survival” (191), “risk” (121), “prognosis” (112), “breast cancer” (103), “carcinoma” (93), “validation” (87), “surgery” (85), “diagnosis” (83), “chemotherapy” (80), and “machine learning” (77). These keywords highlight the current primary research directions within the field. The visual representation of the keywords is illustrated in Figure 6A.
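The counting behind a keyword co-occurrence analysis can be sketched in a few lines: each publication contributes one count to every keyword it lists and one count to every unordered pair of its keywords. The example below uses three hypothetical records. It is a simplified illustration of the idea, not a reproduction of VOSviewer’s processing, which may apply its own keyword cleaning and normalization.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: the keyword list of each publication.
records = [
    ["nomogram", "survival", "prognosis"],
    ["machine learning", "survival"],
    ["nomogram", "survival"],
]

# Occurrences: number of publications that list each keyword.
keyword_freq = Counter(kw for rec in records for kw in set(rec))

# Co-occurrences: number of publications that list both keywords of a pair.
# Sorting makes ("a", "b") and ("b", "a") count as the same pair.
pair_freq = Counter(
    pair for rec in records for pair in combinations(sorted(set(rec)), 2)
)

print(keyword_freq.most_common(2))
print(pair_freq[("nomogram", "survival")])
```

The pair counts correspond to link strengths in a co-occurrence map, and the keyword counts correspond to circle sizes.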

Figure 6. Co-occurrence and cluster of keywords. (A) VOSviewer keyword co-occurrence map: Each circle represents a keyword, and a larger circle indicates a higher number of publications associated with that keyword. To ensure readability, only keywords with a frequency of occurrence ≥20 are visually mapped in the VOSviewer keyword co-occurrence map. (B) CiteSpace keyword clustering map: Different colored areas represent different clusters of keywords.
Through further cluster analysis of keywords, a structured outline of the research landscape in this field is presented, enabling researchers and clinicians to grasp a series of knowledge threads that constitute the structure of the field and swiftly comprehend the hotspots within the research area. Figure 6B displays the visual mapping of nine keyword clusters, which primarily include “#0 prediction model,” “#1 breast cancer,” “#2 carcinoma,” “#3 predictive model,” “#4 risk score,” “#5 online application,” “#6 risk factors,” “#7 machine learning,” and “#8 prostate cancer”.
3.5.2 Burst term and timeline view analysis
A total of 16 burst terms, each with strengths exceeding 3, are detected. The burst strength of each term is visually displayed in Figure 7A, where the length of the red line signifies the duration of the burst. Notably, “breast cancer” was the first burst term to emerge, spanning from 2004 to 2014, with a burst strength of 6.91. The burst terms “breast cancer” and “women” share the longest burst duration, extending from 2008 to 2018. Additionally, “women” exhibits the highest burst strength, with a value of 7.73.
The timeline view analysis provides a profound longitudinal perspective on the evolution of cancer-related prediction models (Figure 7B). Clusters such as “#0 prediction model,” “#2 carcinoma,” “#3 predictive model,” “#4 risk score,” and “#6 risk factors” demonstrate sustained vitality, reflecting the enduring interest of the research community. Furthermore, the lifespan of the “#7 machine learning” cluster emphasizes its emerging or continued significance within this field.

Figure 7. Keyword burst and timeline. (A) CiteSpace burst term map: Burst terms typically represent emerging research directions or shifts in field hotspots. The red segment indicates the burst period of the keyword (i.e., the timeframe when its frequency surged abruptly), while the blue segment corresponds to conventional active periods before or after the burst. Strength refers to the burst strength: the higher the value, the more rapidly the attention to the keyword has grown. (B) CiteSpace timeline map: Temporal analysis of keyword clusters, highlighting longitudinal trends and pivotal milestones. The horizontal axis represents years, while the vertical axis displays keyword clusters. Keywords within the same-color cluster are thematically related. Connecting lines indicate co-occurrence relationships between keywords, and thicker lines signify stronger associations.
3.5.3 Density visualization and timeline view analysis of machine learning
The dual-perspective visualization reveals the evolving dynamics of machine learning applications in the cancer-related prediction model research (Figures 8A, B). Figure 8A displays a co-occurrence density map where “machine learning” serves as a central hub, forming an interconnected radiating network with clinical decision nodes (diagnosis and prognosis), technical components (predictive models, risk factors and nomogram), and specific malignancies including breast cancer and colorectal cancer. The timeline network map in Figure 8B, organized along a chronological axis (2011-2024) with clustered networks, delineates the evolutionary trajectory of keyword clusters. Notably, prediction and predictive model clusters (#2, #5) demonstrate marked surges in research density following technological breakthroughs in deep learning, artificial intelligence, bagging algorithms, and artificial neural networks. Multidimensional analysis indicates that machine learning applications are progressively extending from predictive models in well-established cancer types (breast cancer, colorectal cancer, and lung cancer) to more complex malignancies such as pancreatic cancer.

Figure 8. Density visualization and timeline of machine learning. (A) VOSviewer keyword density visualization: Each circle represents a keyword, and a brighter circle indicates a higher number of publications associated with that keyword. To ensure readability, only keywords with a frequency of occurrence ≥5 are visually mapped in the density visualization. (B) CiteSpace timeline map: Temporal analysis of keyword clusters, highlighting longitudinal trends and pivotal milestones. The horizontal axis represents years, while the vertical axis displays keyword clusters. Keywords within the same-color cluster are thematically related. Connecting lines indicate co-occurrence relationships between keywords, and thicker lines signify stronger associations.
4 Discussion
To a certain extent, the distribution of publication dates can provide intuitive insights into the pace of development within a particular research field. As illustrated in Figure 3A, the annual number of publications on cancer-related prediction models has shown a sharp upward trend since 2015, suggesting growing academic and clinical interest in this field. The number of publications by countries/regions and institutions reflects the core research capabilities and influential regions within this research field. China has the highest number of publications (625 publications, 40.17%), followed by the United States (346, 22.24%), South Korea (120, 7.71%), and the Netherlands (111, 7.13%). These countries are prominent scientific contributors and have made substantial contributions to the advancement of cancer-related prediction models. Analysis of the international collaboration networks reveals that the United States, England, and the Netherlands have the closest collaborations with other countries. This close cooperation and the enhancement of international exchanges are conducive to the development of this field and may be one of the pivotal factors underlying its rapid progression in recent years. By collaborating across borders, researchers can combine their knowledge, skills, and data, leading to more comprehensive and impactful studies. For example, through international collaboration, researchers can access patient populations in different countries, which can improve the generalizability of cancer-related prediction models (21).
Regarding individual contributions, Antoniou, Antonis C., Easton, Douglas F., Lambin, P., and Valentini, V. have published the highest number of publications (eight each), highlighting their significant contributions to the development of cancer-related prediction models. Based on an analysis of these authors’ publications, Antoniou, Antonis C., and Easton, Douglas F., from the University of Cambridge, have focused on cancer diagnosis prediction models, including breast cancer risk prediction models (22, 23), epithelial tubo-ovarian cancer risk prediction models (24), and colorectal cancer risk prediction models (25). Meanwhile, Lambin, P. from MAASTRO Clinic, and Valentini, V. from the Università Cattolica del Sacro Cuore, have primarily concentrated on lung cancer prognosis prediction models (26, 27) and colorectal cancer prognosis models (28, 29), respectively.
Frank, I., Weaver, A.L., Cheville, J.C., Blute, M.L., Lohse, C.M., and Zincke, H. received the most citations (914). In 2002, they developed a scoring system (SSIGN score) based on features such as tumor stage, size, grade, and necrosis to predict the prognosis of patients undergoing radical nephrectomy for clear cell renal cell carcinoma (30). Subsequently, Tyrer, J., Duffy, S.W., and Cuzick, J., et al. (895 citations) established a breast cancer prediction model in 2004 that integrates familial and personal risk factors (31) and incorporated it into a computer program to provide personalized risk estimates.
Among the journals, JOURNAL OF CLINICAL ONCOLOGY (82 publications) holds the highest number of publications, followed by FRONTIERS IN ONCOLOGY (45 publications) and INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS (44 publications). Among the top 36 core journals, ANNALS OF ONCOLOGY has the highest impact factor (IF = 56.7), followed by JOURNAL OF CLINICAL ONCOLOGY (IF = 42.1). The journal with the highest average citation per publication is BRITISH JOURNAL OF CANCER (61.18 citations), followed by JOURNAL OF UROLOGY (35.10 citations). These journals exert a significant influence on research into cancer-related prediction models.
4.1 Hotspots and future directions
Keywords, serving as pivotal indicators of the content in scholarly publications, provide a crucial tool for identifying research hotspots and developmental trajectories. By conducting a keyword co-occurrence analysis, we can elucidate the relationships among various research topics. This, consequently, offers forward-looking guidance for researchers and clinicians. Among the top-ranked keywords in publications pertaining to cancer-related prediction models, we discerned “nomogram” (frequency = 192), “survival” (191), “risk” (121), “prognosis” (112), “breast cancer” (103), “carcinoma” (93), “validation” (87), “surgery” (85), “diagnosis” (83), “chemotherapy” (80), and “machine learning” (77). Further clustering analysis of these keywords yielded nine clusters, including “#0 prediction model,” “#1 breast cancer,” “#2 carcinoma,” “#3 predictive model,” “#4 risk score,” “#5 online application,” “#6 risk factors,” “#7 machine learning,” and “#8 prostate cancer.” Moreover, in the burst analysis of keywords, we identified a total of 16 burst terms with burst intensities exceeding 3. Recent burst terms include “adjuvant chemotherapy,” “chemotherapy,” “adenocarcinoma,” “risk score,” “overall survival,” and “score.” The results of the keyword co-occurrence, clustering, and burst analyses suggest the following current research hotspots in this field: 1) prediction models for breast cancer (32, 33) and prostate cancer (34, 35); 2) prediction models for cancer prognosis (36), including the prediction of cancer-related complications (37) and responses or adverse reactions subsequent to surgery or chemotherapy in cancer patients (38–40); 3) applications of novel modeling methods, such as machine learning (41, 42); and 4) utilization of tools like risk scores and nomograms in cancer-related prediction models (43–45).
The timeline view analysis further reveals potential future trends in this field, highlighting a notable technological transition in the field of cancer-related prediction models, shifting from traditional risk assessment tools like nomograms and risk scores rooted in conventional statistical models (e.g., logistic regression) toward advanced methodologies such as machine learning and deep learning. The timeline view indicates that the life-cycle of the “#7 machine learning” cluster is exhibiting robust vitality, which reflects a burgeoning interest in machine learning algorithms and artificial intelligence [such as neural networks (46, 47) and deep learning (48, 49)] for enhancing the precision of cancer-related prediction models.
The enormous potential of machine learning and deep learning in biomedicine is increasingly recognized as transformative. As a sophisticated subset of machine learning algorithms, deep learning has been extensively implemented in domains such as image recognition and speech processing. Current and future research priorities in this field primarily focus on two key directions: (1) leveraging deep learning to integrate multimodal data (including radiomics, genomics, and metabolomics) to enhance the predictive accuracy and clinical utility of cancer-related prediction models (50–52); and (2) developing interpretability tools to elucidate model decision-making processes, thereby improving clinician confidence and adoption of machine learning/deep learning-based cancer-related prediction models (53). Emerging applications also demonstrate the feasibility of deep learning in predicting therapeutic efficacy and adverse effects of novel antitumor agents (54–58). For example, Yan, K. et al. (51) developed a dual-channel attention neural network (DANN) that utilizes in-born gene signatures to predict melanoma patients’ responses to immune checkpoint inhibitor therapy. This provides a tool for optimizing therapeutic regimens and minimizing adverse drug reactions.
In recent years, a substantial number of prediction models have been developed in the field of cancer, inevitably leading to multiple models for the same health issue (59). This poses challenges for clinical application selection. Additionally, these prediction model studies may be plagued by issues such as inadequate reporting quality, conflicting conclusions, high risks of bias, and limitations in accuracy and applicability, thereby impeding their clinical use. Systematic reviews may be an important method to select the best model, facilitating the interpretation of the potential applicability and generalizability of prediction models and providing a foundation for further evaluation and validation of models (60, 61). Systematic reviews on cancer-related prediction models may emerge as another research direction in this field (62, 63).
4.2 Limitations
This study used bibliometric analysis to provide a multidimensional and comprehensive perspective, as well as quantitative and qualitative insights, into the field of cancer-related prediction models. However, it also has certain limitations: 1) The relevant publications were exclusively sourced from the SCIE database and published in English, excluding publications from other databases and in other languages, which may introduce bias. 2) Searching all fields might retrieve many irrelevant publications. To ensure that the retrieved publications were highly relevant to cancer-related prediction models, we restricted the search to the title field. However, this may have excluded relevant publications whose titles do not contain the search terms. 3) Authors are difficult to identify accurately because of workplace changes, identical names within the same institution, and typographical errors or spelling discrepancies in names. This makes it difficult to evaluate author contributions precisely and is an inherent limitation of bibliometric analysis.
5 Conclusion
This bibliometric analysis highlights research hotspots and trends in cancer-related prediction models. In recent years, there has been a substantial increase in the number of publications on cancer-related prediction models, with researchers focusing predominantly on adenocarcinoma diagnostic and prognostic models. Furthermore, novel modeling techniques, such as machine learning algorithms and particularly deep learning algorithms, are likely to be a pivotal research direction both currently and in the future. Systematic reviews of cancer-related predictive models, which could help clinicians select the optimal model for specific clinical conditions, may emerge as a potential research direction.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.
Author contributions
SL: Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing. WL: Conceptualization, Methodology, Supervision, Writing – review & editing. XW: Conceptualization, Funding acquisition, Writing – review & editing. WC: Conceptualization, Funding acquisition, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the Chongqing Medical Scientific Research Project (Joint project of the Chongqing Health Commission and the Science and Technology Bureau, Grant Nos. 2024QNXM016 and 2024ZDXM025), the Chongqing Medical Youth Top Talent Project (Grant No. YXQN202456), the Wu Jieping Medical Foundation (Grant No. 320.6750.2024-6-46) and the National Key Laboratory of Neuro-Oncology Drug Research Open Project (Youth Project Grant) (Grant No. SKLSIM-2024079).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Keywords: cancer, prediction models, machine learning, bibliometrics, visualization analysis, hotspots and trends
Citation: Li S, Li W, Wang X and Chen W (2025) Progress and current trends in prediction models for the occurrence and prognosis of cancer and cancer-related complications: a bibliometric and visualization analysis. Front. Oncol. 15:1556521. doi: 10.3389/fonc.2025.1556521
Received: 07 January 2025; Accepted: 10 June 2025; Published: 08 July 2025.
Edited by:
Shigao Huang, Air Force Medical University, China
Reviewed by:
Wenjing Zhu, University of Health and Rehabilitation Sciences (Qingdao Municipal Hospital), China
Shiwei Ma, Central South University, China
Michela Giulii Capponi, Santo Spirito in Sassia Hospital, Italy
Copyright © 2025 Li, Li, Wang and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Wanyi Chen, chenwanyi@cqu.edu.cn
†These authors have contributed equally to this work