Use and accuracy of decision support systems using artificial intelligence for tumor diseases: a systematic review and meta-analysis

Background: For therapy planning in cancer patients, multidisciplinary team meetings (MDM) are mandatory. Because of the high number of cases discussed and the significant workload of clinicians, a Clinical Decision Support System (CDSS) may improve the clinical workflow. Methods: This review and meta-analysis aims to provide an overview of the systems utilized and to evaluate the concordance between a CDSS and MDM. Results: A total of 31 studies were identified for final analysis. Analysis of different cancers shows a concordance rate (CR) of 72.7% for stage I-II and 73.4% for stage III-IV. For breast carcinoma, CR was 72.8% for stage I-II and 84.1% for stage III-IV, P ≤ 0.00001. CR for colorectal carcinoma is 63% for stage I-II and 67% for stage III-IV, for gastric carcinoma 55% and 45%, and for lung carcinoma 85% and 83%, respectively, all P > 0.05. Analysis of SCLC and NSCLC yields a CR of 94.3% and 82.7%, P = 0.004, and for adenocarcinoma and squamous cell carcinoma in lung cancer a CR of 90% and 86%, P = 0.02. Conclusion: CDSS has already been implemented in clinical practice, and while the findings suggest that its use is feasible for some cancers, further research is needed to fully evaluate its effectiveness.


Introduction
Cancer is one of the leading causes of death worldwide (1). In 2020, 10 million people worldwide died from cancer (2). Interdisciplinary tumor boards or multidisciplinary team meetings (MDMs) are the backbone of treatment planning for patients with tumor disease (3). MDMs are usually held on a weekly basis, with the goal of finding the best treatment based on current guidelines and medical evidence. Indeed, medical guidelines strongly recommend discussing patients in MDMs prior to the actual treatment (4).
The goal of MDMs is to weigh potential treatment options based on available patient data and radiological exams. A complete set of patient data, including performance status, tumor stage and co-morbidities, is required for effective decision-making (5). In most countries, data are currently entered manually into simple online forms such as the Giessen Tumor Documentation System (GTDS) in preparation for MDMs (6). Administrative and procedural difficulties in retrieving patient information are not uncommon, usually due to missing pathology and radiology results or incomplete information on referral forms from other medical institutions (7). Thus, missing data can lead to delays in diagnosis and treatment (8). Moreover, excessive workload and time pressure adversely affect MDMs (9), which can in turn lead to unstructured case discussions and variability in the quality of decision-making.
To overcome the current problems in conventional MDMs, automated processes and decision support systems might help. There is increasing research on artificial intelligence (AI) and machine learning (ML) techniques applied in MDM (Figure 1). AI is currently viewed as a branch of engineering that implements novel concepts and solutions to resolve complex challenges. With rapid advancements in technology, computers may someday be as intelligent as humans (10). Today, the natural language processing (NLP) model ChatGPT can hold conversations and produce meaningful text such as e-mails or essays when given prompts in a dialogue format (11). In medicine, AI can be divided into two main branches: virtual and physical (10). ML is an area of AI that aims to process large amounts of qualitative information to identify patterns of relevant information.
The objective of this review is to provide an overview and systematic analysis of the current usage and accuracy of AI-based decision support systems in MDM. Specifically, the review focuses on studies that evaluate the consistency between AI-based decision support systems and MDM decisions.

Methods
This review was conducted according to the PRISMA guidelines for systematic reviews (12) and was registered with the International Prospective Register of Systematic Reviews (PROSPERO ID: 411462).

Eligibility criteria
The studies considered for this review met the following criteria:
• The studies verified the consistency of AI-based systems in MDM, regardless of cancer type.
• The studies thoroughly compared the consistency of treatment regimens established by AI and MDM, specifically the correspondence between AI decisions and those made by a multidisciplinary team or using established standards such as guidelines.
• Only studies with adult patients aged 18 and above were included.
• The studies were available in full text and written in English.
• Only retrospective and prospective studies were considered.

FIGURE 1
Possible workflow of AI supporting MDMs. An automated program using artificial intelligence (machine learning, natural language processing) runs in the background of the hospital information system and can extract relevant data for MDM from the system. Afterwards, the tumor board protocol can be automatically prepared and filled out with all relevant patient data in preparation for the MDM. At the same time, the program could provide treatment suggestions based on the available data and support these with existing guidelines or studies. Based on this, the physicians in the MDM can then make the therapy decision. In the end, both physicians and patients could benefit. Created with BioRender.com.

Exclusion criteria
• The study does not fulfill the inclusion criteria.
• The article is a systematic review or meta-analysis.

Literature search methodology
The present review was conducted according to the PRISMA guideline for systematic reviews (Figure 2) (13). For the search terms "Watson for Oncology" and "IBM Watson for Oncology", the search was limited to literature from 2015 onwards, because commercial use of Watson for Oncology began in 2015 (14). For all other search terms, no time limit was set. In total, 4078 records were identified through database searching. Preliminary screening of titles and abstracts and the removal of duplicates yielded 139 articles. The aim of the paper was to include studies that focused on CDSS and then review the concordance. However, the very general selection of search terms resulted in a large list of papers that deal with AI in oncology but do not cover any CDSS; in most cases, this was recognizable from the title and abstract. In addition, 823 duplicates were removed. After the initial selection process, the articles were read in full and care was taken to review both treatment recommendations and concordance between CDSS and MDM. Articles were excluded if they compared CDSS to a guideline, investigated how CDSS influences the actions of MDMs, investigated the acceptance of CDSS by physicians or patients, or used CDSS to provide a prognosis or to decide on possible inclusion in a trial. Meta-analyses and reviews on the topic were also excluded. Finally, after independent assessment of the full-text articles by two researchers (RO, SN), 31 articles were included. No separate checks on study quality, such as patient selection or study population, were done. Studies were included when they had performed an analysis of the concordance rate between MDM and CDSS. If there was disagreement on this, an additional independent arbitrator (FK) was consulted for resolution.

Statistical analysis
Review Manager (RevMan) 5.4.1 (The Cochrane Collaboration, 2020) was used to analyze the extracted data. Forest plots were generated to make the results easier to interpret. The primary objective was to assess the level of agreement between treatment decisions made by Watson for Oncology (WFO) and the multidisciplinary team (MDT) for various cancer types. The data were analyzed dichotomously, and odds ratios (ORs) with corresponding 95% confidence intervals were calculated for each variable (stage, histology type, etc.). Heterogeneity among the studies was evaluated using the I² statistic. I² > 50% indicated considerable heterogeneity; otherwise, heterogeneity was not considered to be present. P < 0.05 was considered significant. Because not all studies could be included in the meta-analysis owing to unavailable data, data that could not be meta-analyzed were analyzed descriptively only.

FIGURE 2
Flow diagram of the study selection process. This figure was designed according to the PRISMA-Statement (13).
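
As a rough illustration of the quantities named in the statistical analysis, the following Python sketch computes an odds ratio with a 95% confidence interval from a 2×2 table and the I² statistic from Cochran's Q. It is a minimal sketch and not the RevMan procedure: inverse-variance weighting on the log odds ratio is assumed here for simplicity, the function names are hypothetical, and the study counts are invented for the example.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table.

    a, b: concordant / discordant cases in group 1 (e.g. stage I-II)
    c, d: concordant / discordant cases in group 2 (e.g. stage III-IV)
    All four cells must be greater than zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a normal-approximation 95% confidence interval."""
    log_or, se = log_odds_ratio(a, b, c, d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def i_squared(log_ors, ses):
    """I² in percent, derived from Cochran's Q with inverse-variance weights (assumption)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    if q <= df:
        return 0.0
    return 100 * (q - df) / q


# Invented counts per study:
# (concordant I-II, discordant I-II, concordant III-IV, discordant III-IV)
studies = [(80, 20, 70, 30), (150, 50, 90, 60), (40, 10, 45, 5)]
effects = [log_odds_ratio(*s) for s in studies]

for s in studies:
    or_value, lower, upper = odds_ratio_ci(*s)
    print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")

i2 = i_squared([e[0] for e in effects], [e[1] for e in effects])
print(f"I^2 = {i2:.0f}%")
```

With the threshold stated above, an I² value above 50% from such a calculation would be read as considerable heterogeneity.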

Study characteristics
Most of the studies that matched the review criteria used Watson for Oncology (WFO). Twenty-three of the 31 studies were on WFO and concordance (Table 1). Other Clinical Decision Support Systems (CDSSs) included were OncoDoc (15), Lung Cancer Assistant (LCA) (17) and the Multidisciplinary meeting Assistant and Treatment sElector (MATE) (16). Two studies using a decision tree model based on Dutch guidelines were included (41). In addition to the CDSSs mentioned above, there were two prototype decision tree models, created by the working groups of Andrew et al. and Lin et al., each of which was evaluated in a concordance study (18,42).
Across all studies, a total of 16,472 participants were included. The number of included subjects varied greatly between the included studies. Five studies had a very small number of cases (< 100) (20,24,25,27,40), while others had a relatively large number of included cases (> 1000) (16-18, 36, 43). Three studies examined multiple tumor entities and included only a small number of participants in the subgroups (21,38,44).
Of the analyses evaluating the concordance rate between therapy decisions and CDSS, the majority were retrospective. Only three analyses were prospective (15,16,44).

Clinical decision support systems
As seen in Table 1, several AI-based CDSSs are used regularly in clinical oncology. The most common is Watson for Oncology (WFO); its use is widespread in the US and in Asia.

TABLE 1
Overview of studies on decision support systems using artificial intelligence for tumor diseases; n refers to the actual number analyzed.
Team | Year | Journal | n | Tumor entity | Decision instance that is being compared to AI

Watson for oncology
WFO is an AI CDSS developed by IBM Corporation (USA) in cooperation with oncologists from Memorial Sloan Kettering Cancer Center (USA) (46). For supported cases, the treatment recommendations provided by WFO fall into three possible categories: 'Recommended', 'For consideration' and 'Not recommended' (14).

OncoDoc
OncoDoc is a CDSS based on clinical practice guidelines (CPGs) that allows physician discretion in the decision-making process. CPGs are organized in decision trees. Decision parameters are dynamically instantiated by the physicians. It was developed in collaboration with the medical oncology department of the Pitié-Salpêtrière Hospital (France) and was first applied to the treatment of breast cancer (47).

Lung cancer assistant
LCA is a CDSS prototype designed in the United Kingdom. Probabilistic and guideline rule-based decision support are used to aid clinicians' decision-making in lung cancer MDMs (17).

Oncoguide
Oncoguide is an open access, interactive decision support software developed in the Netherlands with the help of a multidisciplinary team. The Dutch CPGs for colorectal cancer were converted into decision trees and then validated with patient data. Supporting information from the CPGs, such as scientific evidence for specific treatment decisions, is presented with the recommendations (41, 44).

MATE
MATE (Multidisciplinary meeting Assistant and Treatment sElector) is a CDSS developed in the United Kingdom and used in breast cancer MDMs. It requires manual input of patient data by a physician, assesses patient eligibility for clinical trials and presents ranked recommendations together with supporting evidence (16).

Results of meta-analysis and concordance rate
First, we conducted an overall meta-analysis of patients with different cancer stages (see Figure 3). In studies concerning WFO, treatment was deemed concordant if it was categorized as 'Recommended' or 'For consideration'. A total of 18 studies were included in the analysis. The results showed a concordance rate of 72.7% (1992/2739) for stages I-II and 73.4% (2289/3117) for stages III-IV across various carcinomas, although this difference was not statistically significant (P = 0.18). However, the meta-analysis revealed significant statistical heterogeneity (I² = 88%) across different cancer stages. As a result, we conducted a subgroup meta-analysis to examine specific cancer types and stages. In the case of breast cancer, five studies were included in the analysis (see Figure 4), revealing a concordance rate of 72.8% (1209/1661) for stages I-II and 84.1% (557/662) for stages III-IV, P ≤ 0.00001.
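
The reported percentages follow directly from the counts given in parentheses. The short Python sketch below recomputes them; the helper name `concordance_rate` is hypothetical, and only the counts quoted in the text are used.

```python
def concordance_rate(concordant, total):
    """Concordance rate in percent: concordant cases divided by all analyzed cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * concordant / total


# Counts as reported in the text: (concordant cases, analyzed cases).
reported = {
    "all cancers, stage I-II": (1992, 2739),
    "all cancers, stage III-IV": (2289, 3117),
    "breast cancer, stage I-II": (1209, 1661),
    "breast cancer, stage III-IV": (557, 662),
}

for label, (concordant, total) in reported.items():
    print(f"{label}: {concordance_rate(concordant, total):.1f}%")
```

These are pooled proportions only; they do not reproduce the study-level weighting behind the P values.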
Breast cancer has been analyzed by various CDSSs, showing generally high concordance. In the study of Somashekhar et al., the overall concordance rate between WFO and MDM was near 93%, with 62% at the 'Recommended' level and 31% at the 'For consideration' level (19). Across the different stages, the concordance was above 80% (19), as it was in the study of Zhou et al. (21). As for the other CDSSs, there was also a high concordance rate of 93.4%, 93.2% and 85.3% using OncoDoc2, MATE and the clinical decision tree system based on Oncoguide, respectively (15,16,44). McNamara et al. conducted a study to analyze the concordance of WFO with decisions made by oncology experts and its impact on decisions made by newcomers to oncology. In breast cancer, the overall concordance rate among experts was found to be 87.9%. Novice oncologists had a concordance rate of 75.5% without the use of WFO, which improved to 95.3% with WFO (27).
In a study by Zhao et al., concordance rates between MDM and WFO were found to be only 77% for the adjuvant treatment group and 27.5% for the metastatic group (34). Xu et al. conducted an interesting study on the influence of WFO on treatment decisions, which showed that treatment decisions changed in only 5% of cases after reviewing the treatment options recommended by WFO (36). However, there were also studies on breast cancer with low concordance rates, such as a study by Suwanvecho et al., which found a concordance rate of only 59.3% (38). In a study by Pan et al., the overall concordance rate was only 69.4%. Interestingly, the concordance rate was worse in the adjuvant chemotherapy group, whereas in the neoadjuvant chemotherapy group, the overall concordance rate was 96.7% (43).
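Studies evaluating the use of WFO in patients with colorectal carcinoma have shown highly variable results. Some studies, such as those by Zhou et al., Lee et al., and Mao et al., reported low overall concordance rates of 64%, 48.9%, and 66.9%, respectively, for colorectal cancer (21, 23, 37). However, other studies, such as Kim et al. and Aikemu et al., reported good agreement rates with overall concordance of 87% and 91%, respectively, for colorectal cancer (25,39). Additionally, two studies that did not use WFO as a clinical decision support system also reported good overall concordance rates above 80% (41,44).

FIGURE 4
Overall concordance in breast cancer in stages I-II and III-IV.

FIGURE 5
Overall concordance in colorectal cancer in stages I-II and III-IV.
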
Several studies have been conducted on lung cancer and WFO. Kim et al. achieved a high concordance rate of 92.4% (35). Zhou et al. showed an overall concordance rate of 83%, with 92% for SCLC and 80% for NSCLC (21). In contrast, Liu et al. reported an overall concordance rate of only 65.8%, with 83% for SCLC but only 61.1% for NSCLC (22). Two of the studies discussed in this paper were conducted for NSCLC only: You et al. recorded a high overall concordance rate of 85.16% compared to the other studies, and Yao et al. achieved 73.3%, which was higher than in the work of Liu et al. (30,31). Sesen et al. used the LCA system, in which the guideline rule-based decision support of LCA achieved an exact concordance rate of 0.57 with the recorded treatments. For the probabilistic LCA decision aid, the result was worse, with 0.27 and 0.76 for the exact and partial concordance rates, respectively. In this study, MDM was not performed; instead, patient treatment from the English National Lung Cancer Audit Database was compared with the LCA decision (17).
The overall concordance rate for gastric cancer was low at 54.5% in the study by Tian et al. (32). In a study by Choi et al., concordance at the recommended level was also low at 41.5%, but higher at the recommendation level at 87.5%. For various stages and low ECOG scores, consensus was also low (24).
Two cervical cancer studies were found for this review. In both studies, overall agreement was below 75%, with 64% and 72.8%, respectively (21,32).
Yu et al. showed an overall concordance of 73.6% for prostate cancer. Across the different stages, concordance was higher for lower stages (33). Ebben et al. showed a similar overall concordance (78.8%) in their study, but using a different CDSS (44).
For thyroid cancer the results are diverse. The study of Yun et al. showed an overall concordance of only 48% (40), in contrast to the 77% overall concordance shown in the study by Kim et al. (26).
For ovarian cancer, Zhou et al. showed a concordance rate above 90%, both overall and for the individual stages (21).
Andrew et al. studied a machine-learning algorithm to predict multidisciplinary team treatment recommendations in the management of basal cell carcinoma (42). They stated that the choice of conventional treatment (surgical excision or radiotherapy) by the MDT could be reliably predicted based on the patient's age, tumor phenotype and lesion size. The algorithm reliably predicted the MDT decision outcome of 45.1% of nasal basal cell cancer (42).
Zhang et al. conducted a study on hepatocellular carcinoma (HCC) in which only surgically treated patients were included. The study aimed to compare the decision made by WFO with the decision made by surgeons regarding the need for surgery, without a comparison with MDM. The overall concordance rate was found to be 72%.

Overall concordance in cervical cancer in stages I-II and III-IV.

FIGURE 8
Overall concordance in lung cancer in stages I-II and III-IV.

Overall concordance in gastric cancer in stages I-II and III-IV.

FIGURE 9
Overall concordance in different lung cancer types for SCLC and NSCLC.

FIGURE 10
Overall concordance in NSCLC for histopathology type for adenocarcinoma and squamous cell carcinoma.

Discussion
The objective of this review was to provide an overview and systematic analysis of the current research landscape, usage and accuracy of AI-based decision support systems in MDM. The consistency between AI-based CDSS decisions and MDM decisions was evaluated.

Limitation and disadvantages
While conducting this review on concordance, we found that many studies from Asia focused on the use of the WFO system. WFO was originally based on vast cancer treatment experience in North America and the National Comprehensive Cancer Network guidelines (14). It is therefore not surprising that numerous studies have examined its concordance in other countries, and this origin could affect the results and match rates. For example, treatment recommendations for a given type of cancer can differ significantly between countries, as with gastric cancer treatment in the US and Chinese populations (48). As another example, WFO recommends three immunotherapies, namely pembrolizumab, nivolumab, and atezolizumab, for metastatic NSCLC, but these are not yet approved by the China Food and Drug Administration (CFDA) (21). Although WFO does not require all information, studies have shown that entering more data into the system could increase the concordance rate (20). It therefore also seems important to collect as much data as possible from the population in which the system is used.
When considering other CDSSs used, the studies available for analysis are limited, making it more challenging to draw conclusions about the consistency of treatment decisions compared to MDM.
Another significant issue is the variability in the definition of concordance. The overall WFO concordance rate is often reported as 'Recommended' and 'For consideration' combined, with these two categories sometimes being reported separately. It is crucial to examine carefully how the overall match is evaluated, as a high overall agreement may not always reflect a high share of 'Recommended' options but may rest on options that are only 'For consideration'.
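
To make the effect of the definition concrete, the sketch below scores the same set of cases under a strict definition (only 'Recommended' counts as concordant) and a broad one ('Recommended' or 'For consideration'). The function name and the 100 example cases are hypothetical. The split of 62 and 31 cases mirrors the proportions reported by Somashekhar et al. (19), and the remaining 7 cases are labeled 'Not recommended' purely for illustration.

```python
from collections import Counter

# Two possible definitions of a concordant case (WFO category of the MDM-chosen treatment).
STRICT = {"Recommended"}
BROAD = {"Recommended", "For consideration"}


def concordance(categories, accepted):
    """Percent of cases whose MDM-chosen treatment falls in an accepted WFO category."""
    if not categories:
        raise ValueError("no cases given")
    counts = Counter(categories)
    hits = sum(counts[category] for category in accepted)
    return 100 * hits / len(categories)


# Hypothetical cohort of 100 cases (illustrative only).
cases = (
    ["Recommended"] * 62
    + ["For consideration"] * 31
    + ["Not recommended"] * 7
)

print(f"strict definition: {concordance(cases, STRICT):.1f}%")
print(f"broad definition:  {concordance(cases, BROAD):.1f}%")
```

The same cohort therefore yields two quite different headline figures, which is why the definition used by each study needs to be checked before rates are compared.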
When treating cancer patients, an MDM is an integral part of treatment planning and approach (3,49). Studies have already shown that oncology patients benefit from a multidisciplinary approach to health care (50-52). Therefore, discussion in an MDM should be considered fundamental in treatment decision-making. Consequently, the decisions of the CDSS should only be compared with the decisions of the MDM. In some studies, however, the CDSS was compared only with the actual treatment that patients received (26-28, 30,36,38,43). In some studies, the CDSS decision was even compared only with national guidance (17,40). Moreover, the decisions of an MDM or the actual treatment are not always consistent with the guidelines (53).

Concordance analyses
This review highlights a range of different tumor types with particular focus on breast, lung and colorectal cancer. These cancers are among the most frequently diagnosed worldwide (2), so it is understandable that more studies have been conducted on these types. The number of studies conducted for each of these cancer types allows for reasonable conclusions to be drawn about the agreement rate between the CDSS and MDM. However, for other tumor types, such as HCC, thyroid, prostate, cervical, ovarian and basal cell cancer, only a few studies have been conducted, making it difficult to draw definitive conclusions. Among the various tumor types, breast cancer studies are the most consistent, with high agreement rates observed across different CDSSs. These studies also tend to involve larger sample sizes, with most studies including more than 1000 patients compared to studies on other tumor types (16-18, 36, 43).
The review demonstrates a wide range of concordance rates across different studies, with some studies showing rates above 90% (19, 35,39), while others are below 60% (29,40). Therefore, it is crucial to use a CDSS in clinical practice only when there is a high concordance rate, to ensure high confidence in decision-making. Breast cancer studies have shown the highest overall concordance rate, exceeding 90% in some studies (15,16,19), but still showing a wide range, with some reported concordance rates below 60% (38). The concordance rates for gastric, thyroid and basal cell cancer are consistently the lowest. Regarding the agreement rates for individual stages, no general statement can be made, as they vary between studies and tumor types. The meta-analysis for different carcinomas showed no significant difference between stage I-II and stage III-IV (Figure 3). For breast cancer, however, there was a significant difference, with a higher concordance rate at advanced stages. For colorectal carcinoma, the studies that performed a staging analysis also showed low concordance rates. Thus, it is important to note that some studies showed a high overall concordance rate when no differential stage analysis was performed.
However, a lower ECOG score seems to be associated with a higher concordance in the results. Furthermore, in studies comparing the treatment recommendation for NSCLC and SCLC, SCLC shows a higher concordance rate than NSCLC. In NSCLC, adenocarcinoma has a higher concordance rate than squamous cell carcinoma.

Comparison to work done in this field
Jie et al. (14) published a meta-analysis on the application of WFO in 2021. However, only studies on WFO were included there (n = 9). Since then, multiple studies on AI in MDM have been published. The main purpose of the review by Jie et al. was to analyze the concordance rate between MDM and CDSS, which was similar to our review. In comparison to our study, only WFO was analyzed and fewer studies were included. One important difference was the concordance rate between the stages. The study by Jie et al. showed a higher agreement for lower stages, but without statistical significance, and a slightly higher overall agreement in comparison to our study. In our study, there was no significant difference in this regard, except for breast cancer. However, the subdivision was different: in contrast to us, Jie et al. subdivided into stages I-III and IV. Gastric cancer also showed the lowest agreement rates in Jie et al. A low ECOG score also seemed to be associated with a higher agreement rate in Jie et al. They also showed a higher consistency for SCLC compared to NSCLC, which was similar to our study.

Future perspective
In the near future, CDSS could be used in daily clinical routine. However, the various systems must first be trained on large patient data sets, and the accuracy of these systems must be verified in large patient collectives. The highest level of medical evidence is desirable and can be reached by conducting multicenter studies. This is certainly a major obstacle, since many hospitals use their own hospital information systems, which makes it more difficult to develop systems that can be used across different hospitals. Should these systems prove to be highly accurate, the use of CDSS in MDM could both save time and improve quality. Complete decision-making power should not yet be granted to a CDSS, given the importance and complexity of the decisions made during MDMs. It is conceivable, though, that the CDSS proposes decisions and the medical staff only has to approve them. Furthermore, the system should recognize and flag complex or individual cases and provide the latest scientific studies relevant to them. Lastly, the automatic preparation of MDM cases is also a conceivable support for the medical staff.

Conclusion
This review and meta-analysis provides a basic overview of previous work in the field of AI and MDM. In particular, the concordance rate between CDSS and MDM was assessed and compared. WFO is certainly the most widely used system, especially in the USA and Asia, and therefore most of the currently available studies and data concern this system. The use of WFO already allows some conclusions to be drawn, although the results are very heterogeneous. Some tumors show higher concordance rates than others. For instance, breast and lung cancer exhibit higher concordance rates than gastric cancer when using CDSS. WFO does not appear to be utilized in Europe; however, promising alternatives such as OncoDoc2 and Oncoguide exist in this region. AI holds the potential to revolutionize hospital workflows and enhance diagnostics and therapies for patients. To fully realize these benefits, it is crucial to conduct further studies on the concordance between CDSS and MDM decisions. This systematic review provides a comprehensive overview of the current state of research and indicates that the use of CDSS in clinical practice is feasible, but additional research is required to fully evaluate its potential impact.
The literature search on PubMed (MEDLINE) was carried out until November 2022 using MeSH keyword search. The search terms were the following: (machine learning) AND (tumor board); (machine learning) AND (multidisciplinary team meetings); (machine learning) AND (multidisciplinary cancer teams); (artificial intelligence) AND (multidisciplinary cancer teams); (artificial intelligence) AND (multidisciplinary team meetings); (artificial intelligence) AND (tumor board); IBM Watson for Oncology; (machine learning) AND (multidisciplinary team); Watson for Oncology; (artificial intelligence) AND (multidisciplinary team); (clinical decision support system) AND (multidisciplinary team meetings); (clinical decision support system) AND (multidisciplinary team); (clinical decision support system) AND (multidisciplinary cancer teams); (clinical decision support system) AND (tumor board).
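
Apart from the two Watson-specific terms, the search strings listed above are all pairings of three technology terms with four meeting terms. A small Python sketch of how such a query list could be generated is shown below; the variable names are hypothetical, and the snippet only builds the strings, it does not query PubMed.

```python
from itertools import product

# Terms taken from the search strategy described in the text.
technology_terms = [
    "machine learning",
    "artificial intelligence",
    "clinical decision support system",
]
meeting_terms = [
    "tumor board",
    "multidisciplinary team meetings",
    "multidisciplinary cancer teams",
    "multidisciplinary team",
]
standalone_terms = ["IBM Watson for Oncology", "Watson for Oncology"]

# Every technology term is combined with every meeting term.
queries = [
    f"({technology}) AND ({meeting})"
    for technology, meeting in product(technology_terms, meeting_terms)
]
queries += standalone_terms

for query in queries:
    print(query)
```

This yields twelve combined queries plus the two standalone Watson for Oncology terms.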

FIGURE 3
Overall concordance of various cancers in stages I-II and III-IV.

TABLE 1 Continued
Zhou et al., Lee et al., and Mao et al., reported low overall concordance rates of 64%, 48.9%, and 66.9%, respectively, for colorectal cancer (21, 23, 37).However, other studies, such as Kim et al. and Aikemu et al., reported good agreement rates with overall concordance of 87% and 91%, respectively, for colorectal cancer including

TABLE 2
Concordance rate between AI system and MDM; Not every subgroup analysis has been included in the table.

TABLE 2 Continued
- 5% of patients could be reliably predicted to be triaged to Mohs micrographic surgery (MMS), based on tumor location and age
- choice of conventional treatment (surgical excision or radiotherapy) by the MDT could be reliably predicted based on the patient's age, tumor phenotype and lesion size
- the algorithm reliably predicted the MDT decision outcome of 45.1% of nasal basal cell cancer