Economic evaluations of artificial intelligence-based healthcare interventions: a systematic literature review of best practices in their conduct and reporting

Objectives: Health economic evaluations (HEEs) help healthcare decision makers understand the value of new technologies. Artificial intelligence (AI) is increasingly being used in healthcare interventions. We sought to review the conduct and reporting of published HEEs for AI-based health interventions. Methods: We conducted a systematic literature review with a 15-month search window (April 2021 to June 2022) on 17th June 2022 to identify HEEs of AI health interventions and update a previous review. Records were identified from 3 databases (Medline, Embase, and Cochrane Central). Two reviewers screened papers against predefined study selection criteria. Data were extracted from included studies using prespecified data extraction tables. Included studies were quality assessed using the National Institute for Health and Care Excellence (NICE) checklist. Results were synthesized narratively. Results: A total of 21 studies were included. The most common type of AI intervention was automated image analysis (9/21, 43%) mainly used for screening or diagnosis in general medicine and oncology. Nearly all were cost-utility (10/21, 48%) or cost-effectiveness analyses (8/21, 38%) that took a healthcare system or payer perspective. Decision-analytic models were used in 16/21 (76%) studies, mostly Markov models and decision trees. Three (3/16, 19%) used a short-term decision tree followed by a longer-term Markov component. Thirteen studies (13/21, 62%) reported the AI intervention to be cost effective or dominant. Limitations tended to result from the input data, authorship conflicts of interest, and a lack of transparent reporting, especially regarding the AI nature of the intervention. Conclusion: Published HEEs of AI-based health interventions are rapidly increasing in number. Despite the potentially innovative nature of AI, most have used traditional methods like Markov models or decision trees. 
Most attempted to assess the impact on quality of life to present the cost per QALY gained. However, studies have not been comprehensively reported. Specific reporting standards for the economic evaluation of AI interventions would help improve transparency and promote their usefulness for decision making. This is fundamental for reimbursement decisions, which in turn will generate the necessary data to develop flexible models better suited to capturing the potentially dynamic nature of AI interventions.


Introduction
The use of artificial intelligence (AI) has grown significantly in the healthcare sector. Its ability to streamline tasks, provide real-time analytics, and process larger quantities of data has contributed to its increased prominence (Panch et al., 2018). Additionally, it may have the potential to deliver quality care at lower costs. AI is being used to address challenges ranging from staff shortages to ageing populations and rising costs (Dall et al., 2013). The number of AI technologies approved by the US Food and Drug Administration (FDA) was nearly 350 between 2016 and mid-2021, compared to fewer than 30 in the preceding 19 years (Miller, 2021).
Several systematic reviews have been published that examine health economic evaluations (HEEs) for AI in healthcare. The most recent is by Voets et al. (Voets et al., 2022), whose search ran up to 1 April 2021 and covered the preceding 5 years. They included 20 full texts and discussed the methods, reporting quality and challenges. They found that automated medical image analysis was the most common type of AI technology, just under half of studies reported a model-based HEE, and the reporting quality was moderate. Overall, Voets et al. concluded that HEEs of AI in healthcare often focus on costs rather than health impact, and insight into benefits is lagging behind the technological developments of AI.
An up-to-date representation of the economic evidence base may be insightful. Clearly, AI is a rapidly developing area in healthcare, demonstrated by the National Institute for Health and Care Excellence (NICE) recently incorporating AI technologies into its Evidence Standards Framework (Unsworth et al., 2021; National Institute for Health and Care Excellence, 2022). While some of this rise may be attributable to changes in legislation, it indicates the importance of AI in the current healthcare climate and the need to have a contemporary understanding of its economic value. Additionally, the COVID-19 pandemic has led to a rapid increase in the digitalization of data and health services including teleconsultations, online prescriptions and remote monitoring (Gunasekeran et al., 2021). Therefore, we sought to update the Voets et al. systematic review. We report updated results consistent with the original review, by disaggregating the HEEs into costs, clinical effectiveness, modelling characteristics and methodologies to understand common techniques, limitations, assumptions, and uncertainties. This update allows us to advance the discussion around whether existing modelling methods and reporting standards are suitable to appropriately assess the cost effectiveness of AI technologies compared to non-AI technologies in healthcare.
This review was undertaken to inform ongoing work within the HTx project. HTx is a Horizon 2020 project supported by the European Union lasting for 5 years from January 2019. The main aim of HTx is to create a framework for the Next-Generation Health Technology Assessment (HTA) to support patient-centred, societally oriented, real-time decision-making on access to and reimbursement for health technologies throughout Europe.
Data and methods

Literature search strategy
The search covered the period from 1 April 2021 to 17 June 2022, in order to update the original search conducted by Voets et al. (Voets et al., 2022). The original search used the PubMed and Scopus databases. For the present update, the original search strategy was translated for use in MEDLINE and EMBASE, via the Ovid platform, and Cochrane Central, via Wiley. These databases were preferred due to their accessibility, and searching all 3 was considered to provide comparable coverage to PubMed and Scopus (Ramlal et al., 2021).
The search strategy was simplified into 2 concept pathways: 1. "Artificial intelligence" and 2. "Health economic evaluations". The search queries in Supplementary Appendix SA show the strategies for each database. Terms in the AI pathway included "artificial intelligence", "machine learning", and "data driven". The second pathway included terms such as "cost effectiveness", "health outcomes", "cost", and "budget". An English language limit was applied to the search strategy. The initial database selection and search strategies were guided by NICE information specialists. The review and search protocol were not registered.

Inclusion and exclusion criteria
Studies were included if they reported a HEE comparing an AI intervention with a comparator, such as current standard of care or a non-AI intervention. This included trial-based economic evaluations and model-based studies. There were no exclusion criteria on types of economic evaluation, such that cost-effectiveness analyses (CEAs), cost-utility analyses (CUAs), cost-minimization analyses (CMAs) and budget impact analyses (BIAs) were included. We term all of these HEEs, which are defined as the "comparative analysis of alternative courses of action in terms of both their costs and consequences" (Rudmik and Drummond, 2013). CEAs evaluate whether an intervention provides relative value, in terms of cost and health outcomes, to a respective comparator. CUAs are a subset of CEAs where the health outcome includes a preference-based measure such as the Quality Adjusted Life Year (QALY). BIA studies evaluate the affordability of an intervention for payers to allocate resources. Included studies reported a quantitative health economic outcome such as costs, or costs in relation to effectiveness. In the initial screening of titles and abstracts, publications that were not original research or systematic reviews (such as commentaries, letters, and editorials) were excluded. Overall, the inclusion and exclusion criteria were consistent with Voets et al. (Voets et al., 2022).
After duplicates were removed, 2 reviewers independently screened titles and abstracts. The reviewers discussed any discrepancies, and where agreement could not be reached, an independent third reviewer was consulted. The same process was followed for subsequent full-text screening.

Data extraction
The data extraction was initially completed by 1 reviewer, and then validated by a second reviewer who independently extracted and compared data from the included studies. The extraction strategy was divided into three components. The first and second components covered the characteristics and the methodological details of the studies. The former included aspects such as the purpose of the AI technology, medical field, funding, care pathway phase (prevention, diagnostics, monitoring, treatment) and the type of AI (i.e., pattern recognition, risk prediction, etc.). The second table of methodological details included aspects such as the type of HEE, the comparator, and the outcome measure. The third component was relevant only for model-based HEEs, extracting parameters such as model states, time horizon, and details of sensitivity analyses.

Data analysis
The extracted data were synthesised using a narrative approach as heterogeneity between studies inhibited the utility of a quantitative synthesis. Descriptive statistics were used to summarize the characteristics of the retrieved studies, where appropriate.

Quality assessment
The quality assessment of all included studies was conducted using the NICE quality appraisal checklist for economic evaluations (National Institute for Health and Care Excellence, 2012). This checklist has been adopted in the literature of economic evaluation reviews (Elvidge et al., 2022) and is used by NICE when assessing HEE evidence for all public health guidelines. Included studies with a decision-analytic model were quality assessed independently by 2 reviewers using the methodological checklist section of the quality appraisal checklist. The checklist has 11 individual questions to create an overall assessment of whether there are minor, potentially serious, or very serious limitations that affect the robustness of the results. Quality assessment was not used as part of the exclusion criteria, as one of the research aims was to explore the reporting standards.
Although it is not possible to fully remove the potential for bias due to the subjective nature of the assessment, pre-set criteria were created to minimize its effects. The criteria are as follows. Studies with very serious limitations had significant modelling discrepancies that could materially change the cost-effectiveness conclusion (e.g., the intervention changing from dominant to dominated). Very serious limitations also arose from a financial conflict of interest, where the developer of the AI technology also funded the HEE. Potentially serious limitations refer to methodological uncertainties which may change the quantitative result (e.g., an increase in the cost-effectiveness ratio) while the conclusion could stay the same (e.g., the increase is not meaningful). All other limitations were considered to be minor limitations. The reviewers discussed any discrepancies in their quality assessments, and if major disagreements emerged, an independent third reviewer was consulted.
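The pre-set criteria can be read as a simple decision rule. The following is a minimal sketch in Python, assuming each study has already been reduced to three yes/no reviewer judgements; the function and argument names are hypothetical and are not part of the NICE checklist.

```python
def overall_assessment(could_change_conclusion: bool,
                       developer_funded_hee: bool,
                       could_change_estimate: bool) -> str:
    """Map reviewer judgements to the three categories used in this review.

    All argument names are illustrative assumptions, not checklist items.
    """
    # A possible change in the cost-effectiveness conclusion, or a financial
    # conflict of interest, leads to the most severe category.
    if could_change_conclusion or developer_funded_hee:
        return "very serious limitations"
    # Uncertainty that may move the estimate without changing the conclusion.
    if could_change_estimate:
        return "potentially serious limitations"
    return "minor limitations"


print(overall_assessment(False, True, False))
print(overall_assessment(False, False, True))
print(overall_assessment(False, False, False))
```

In practice the judgements themselves remain subjective; the sketch only makes the ordering of the categories explicit.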

Search results
The searches across the 3 databases yielded 4,475 records, resulting in 3,033 unique records following deduplication (Table 1). After screening titles and abstracts against the study selection criteria, 2,993 were excluded due to not relating to a human health intervention, not reporting a HEE, not relating to an AI-based intervention, or being an excludable study type (e.g., commentary). Therefore, 40 studies proceeded to full-text screening. Of those, 16 were excluded based on the selection criteria, and 2 were excluded as duplicates that had already been included in the Voets et al. review (Voets et al., 2022). We excluded a further study due to unclear reporting about whether it was a primary analysis or a review of other economic models. Therefore, 21 studies remained which were suitable for data extraction. See Figure 1 for the PRISMA flowchart showing the inclusion and exclusion stages.
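The study-flow counts reported above can be checked with simple arithmetic. The short Python sketch below uses only the figures stated in the text; the variable names are our own.

```python
# Study-flow counts as reported in the text.
records_identified = 4475
unique_records = 3033
excluded_at_title_abstract = 2993
excluded_at_full_text = 16
already_in_voets_review = 2
excluded_unclear_reporting = 1

duplicates_removed = records_identified - unique_records
full_text_screened = unique_records - excluded_at_title_abstract
included = (full_text_screened
            - excluded_at_full_text
            - already_in_voets_review
            - excluded_unclear_reporting)

print(f"Duplicates removed: {duplicates_removed}")
print(f"Full texts screened: {full_text_screened}")
print(f"Studies included: {included}")
```

The derived totals (40 full texts screened and 21 studies included) agree with the numbers given in the paragraph.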

Modelling characteristics
Of the 21 HEEs, 16 (16/21, 76%) included a decision analytic model. The modelling characteristics of these are summarized in Table 4. The most common model types were Markov models (6/16, 38%) and decision trees (4/16, 25%), with 3 (3/16, 19%) using a short-term decision tree followed by a longer-term Markov component. Of the remaining 3 studies, there was 1 cost simulation, 1 Markov chain Monte Carlo simulation, and 1 hybrid decision tree and microsimulation model. Authors typically justified their chosen model type by linking the decision to the type of AI intervention, the outcome measure, and the time horizon. Most Markov models used a cycle length of 1 year, and the rest used 1 month or 1 day. Studies that used decision tree models stated their primary reason for doing so was for their simplicity.
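To illustrate the kind of structure most of these studies used, the following is a minimal sketch of a Markov cohort model with a 1-year cycle. It is not taken from any included study: the three states, transition probabilities, costs, utilities and 3.5% discount rate are all hypothetical, and no half-cycle correction is applied.

```python
def run_markov_cohort(transition, cycle_costs, cycle_utilities,
                      start, n_cycles, discount_rate):
    """Return discounted (cost, QALYs) per person for an annual-cycle cohort model.

    No half-cycle correction is applied, to keep the sketch short.
    """
    n_states = len(start)
    for row in transition:
        assert abs(sum(row) - 1.0) < 1e-9, "each transition row must sum to 1"
    dist = list(start)  # proportion of the cohort in each state
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(n_cycles):
        discount = 1.0 / (1.0 + discount_rate) ** cycle
        total_cost += discount * sum(p * c for p, c in zip(dist, cycle_costs))
        total_qalys += discount * sum(p * u for p, u in zip(dist, cycle_utilities))
        # Move the cohort forward by one cycle.
        dist = [sum(dist[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)]
    return total_cost, total_qalys


# Hypothetical inputs: states are (well, sick, dead).
transition = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
cost, qalys = run_markov_cohort(
    transition,
    cycle_costs=[100.0, 2000.0, 0.0],
    cycle_utilities=[0.90, 0.60, 0.0],
    start=[1.0, 0.0, 0.0],
    n_cycles=40,
    discount_rate=0.035,
)
print(f"Discounted cost per person: {cost:.0f}")
print(f"Discounted QALYs per person: {qalys:.2f}")
```

Running the same function with a second transition matrix for the comparator would give the incremental costs and QALYs on which a cost-effectiveness conclusion is based.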
In terms of results, 7 (7/21, 33%) HEEs reported the AI intervention was cost effective versus the comparator relative to an appropriate threshold value, 5 (5/21, 24%) demonstrated that the AI intervention was dominant, and 2 (2/21, 10%) demonstrated equivalence. In 1 (1/21, 5%) study the AI intervention was cost effective versus one comparator and dominant versus the other. In 2 (2/21, 10%) studies the AI interventions produced savings. Three (3/21, 14%) studies did not state a preferred cost-effectiveness threshold to determine if the result was cost effective. The AI intervention was found to be cost ineffective in 1 (1/21, 5%) study.
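The result categories above follow from the incremental costs and QALYs of the AI intervention versus its comparator and, where needed, a willingness-to-pay threshold. The sketch below shows one common way to express this using net monetary benefit; it is an illustration under our own assumptions (function names and the example threshold of 20,000 per QALY are hypothetical), not the method of any specific included study.

```python
def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio, or None when QALYs do not differ."""
    if delta_qalys == 0:
        return None
    return delta_cost / delta_qalys


def classify(delta_cost, delta_qalys, threshold):
    """Classify an intervention against a comparator.

    delta_cost and delta_qalys are intervention minus comparator;
    threshold is the willingness to pay per QALY gained.
    """
    if delta_cost == 0 and delta_qalys == 0:
        return "equivalent"
    if delta_cost <= 0 and delta_qalys >= 0:
        return "dominant"      # no more costly and no less effective
    if delta_cost >= 0 and delta_qalys <= 0:
        return "dominated"     # no less costly and no more effective
    # Remaining cases trade cost against health, so a threshold is required.
    net_monetary_benefit = threshold * delta_qalys - delta_cost
    return "cost effective" if net_monetary_benefit >= 0 else "not cost effective"


threshold = 20_000  # hypothetical willingness to pay per QALY
print(classify(-500.0, 0.10, threshold))
print(classify(1_000.0, 0.10, threshold), icer(1_000.0, 0.10))
print(classify(5_000.0, 0.10, threshold), icer(5_000.0, 0.10))
```

The sketch also shows why a study that does not state a threshold cannot be placed in the "cost effective" or "not cost effective" categories unless the intervention is dominant or dominated.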

Quality assessment
A summary of the results from the quality appraisal checklist is shown in Table 5. The assessment resulted in 6 (6/21, 29%) studies with very serious limitations, 11 (11/21, 52%) with potentially serious limitations, and 4 (4/21, 19%) with minor limitations. Initially the two reviewers disagreed on the assessment for two of the studies (Ericson et al., 2022; Mital and Nguyen, 2022). Both were upgraded for the reasons given below. Studies deemed to have very serious limitations were those where an issue in 1 or more quality criteria was highly likely to materially change the cost-effectiveness conclusion for the AI intervention. There were several key reasons which led to this assessment for 5 of the included studies. In one there was an acknowledged overestimation of cost data, representation issues between the dataset and target population, and a short 6-month horizon rather than the 12-month time horizon deemed best practice by the American College of Radiology (Rosenthal and Dudley, 2007). In another, adverse health effects were not captured, which the authors suggested would increase the cost-effectiveness estimate (Fusfeld et al., 2022). This study also had a financial conflict of interest where research was funded by the company which developed the AI intervention. This was true for another 2 studies (Ericson et al., 2022; Szymanski et al., 2022). In another study, the result changed from intervention dominant to cost ineffective when input data, arising from multiple sources and assumptions, were varied during the sensitivity analyses (Ziegelmayer et al., 2022).
Studies with potentially serious limitations tended to have a paucity of appropriate input data. Instead, alternative sources or multiple sources were used, with resulting generalizability issues. It was common for studies to make assumptions about the cost and effectiveness of the AI intervention, compliance, and the impact of the AI intervention on the subsequent treatment pathway. Examples of this are 1 study that assumed all patients would consent to a test (Mallow and Belk, 2021); 1 study that used a primary outcome that was patient reported (Delgadillo et al., 2022); and 1 study that assumed the effectiveness of the AI intervention lasted for 10 years, despite having data for only 5 years (Mital and Nguyen, 2022). These studies did account for the key uncertainties in sensitivity analyses and the effect was either minor or the initial assumptions were shown to be robust. Some studies were assessed as having potentially serious limitations due to unclear reporting, which reduced transparency around key information such as whether a cost had been applied for the AI intervention, how it would integrate with clinical care, and who the anticipated user of the AI intervention was.
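The role of sensitivity analysis described above can be illustrated with a one-way analysis on a toy screening model. The sketch below is an assumption-laden example: the model structure, parameter names and all values (prevalence, accuracy, costs, QALY gain per detected case, threshold) are hypothetical and are not drawn from any included study.

```python
def outcomes(sensitivity, specificity, test_cost,
             prevalence, false_positive_cost, qalys_per_true_positive):
    """Expected cost and QALYs per person screened in a toy screening model."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    cost = test_cost + false_positives * false_positive_cost
    qalys = true_positives * qalys_per_true_positive
    return cost, qalys


def incremental_nmb(params, threshold):
    """Net monetary benefit of AI screening versus standard screening."""
    shared = (params["prevalence"], params["false_positive_cost"],
              params["qalys_per_true_positive"])
    cost_ai, qalys_ai = outcomes(params["se_ai"], params["sp_ai"],
                                 params["cost_ai"], *shared)
    cost_std, qalys_std = outcomes(params["se_std"], params["sp_std"],
                                   params["cost_std"], *shared)
    return threshold * (qalys_ai - qalys_std) - (cost_ai - cost_std)


def one_way(params, name, values, threshold):
    """Vary a single parameter while holding all others at base-case values."""
    return [(value, incremental_nmb(dict(params, **{name: value}), threshold))
            for value in values]


# Hypothetical base case.
base = {
    "prevalence": 0.05, "false_positive_cost": 200.0,
    "qalys_per_true_positive": 0.5,
    "se_ai": 0.90, "sp_ai": 0.90, "cost_ai": 10.0,
    "se_std": 0.80, "sp_std": 0.90, "cost_std": 10.0,
}
results = one_way(base, "cost_ai", [10.0, 40.0, 70.0], threshold=20_000)
for value, nmb in results:
    verdict = "cost effective" if nmb >= 0 else "not cost effective"
    print(f"cost_ai={value:.0f}: incremental NMB={nmb:.1f} ({verdict})")
```

In this toy example the conclusion reverses as the assumed cost of the AI test rises, which is the kind of result that would point towards a very serious rather than a potentially serious limitation under the criteria used in this review.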

Discussion
This paper systematically reviewed 21 HEEs of AI interventions. The studies mainly evaluated AI-based automated image analysis interventions for diagnosis and screening in general medicine, oncology and ophthalmology. Nearly all were CUAs and CEAs that took a healthcare system or payer perspective, and most had lifetime time horizons. Some of the HEEs were trial-based analyses, but the large majority were model-based, mostly using Markov models. In terms of the HEE results, the AI interventions were cost effective or dominant in 13 of the 21 studies (62%), and all the studies performed sensitivity analyses.
This study reports an updated search to the review conducted by Voets et al. (Voets et al., 2022), providing a contemporary snapshot of the HEE evidence base for AI health technologies. Our update captures an additional 15-month period at a time when AI-based health technologies are rising rapidly, evidenced by the near quadrupling of initial unique search results since April 2021 (Voets et al., 2022). It appears there has been no change in the most commonly evaluated purpose of AI as a healthcare intervention, as Voets et al. also found the most common to be automated image analysis (Voets et al., 2022). In the original review, ophthalmology and screening were the dominant specialty and phase of the care pathway at which the AI intervention was used, and these were also prevalent in this updated review. The prevailing type of HEE in the original review was cost minimization with the preferred outcome measure of cost saved per case identified. This was common among our included studies, although we termed it CEA, but CUA was the most common study type in this update. There was a difference between the two reviews in how many of the technologies were found to be cost saving. Voets et al. found the majority were, whilst this was true for only 2 studies in this review. This could be due to differences in applying the terms 'cost-saving' and 'cost-effective', as a large proportion of studies in this updated review were cost-effective.
Another difference was that the large majority of HEEs in our review were model-based, compared to 45% of those in Voets et al. (Voets et al., 2022). This could suggest a shift towards using models to estimate future costs and benefits of AI technologies, permitting longer time horizons than trial-based evaluations (the most common time horizon in our review was lifetime, compared to 1 year in Voets et al.). Furthermore, the increasing use of model-based evaluations may suggest AI interventions are moving towards traditional value assessment frameworks that are commonplace in the health technology assessment of medicines. This increase in model-based evaluations may also explain the differences in results regarding cost saving versus cost effective. Perhaps it is easier or more expected to generate cost-effectiveness estimates when using a model compared to non-model HEEs where it may be more common to focus on costs.
Voets et al. (2022) found that the evidence supporting the chosen analytical methods, assessment of uncertainty, and model structures was underreported. Our quality assessment determined that most studies had potentially serious limitations tending to arise from the sources and assumptions regarding the input data. These findings are consistent, which suggests that despite an increase in the use of more sophisticated economic evaluation techniques, the evidence supporting them remains limited. In some cases, the uncertainty and lack of clarity for the reader were due to the reporting of the HEE rather than the data quality. In numerous studies it was hard to determine fundamentals such as whether a cost had been applied for the AI intervention, how it would integrate with clinical care and who the anticipated user of the AI intervention was. As mentioned, not all of the studies we identified clearly stated how the AI intervention would integrate with clinical care. Studies did not typically thoroughly or transparently estimate subsequent care and downstream health outcomes resulting from the use of an AI intervention. Our findings from this literature review suggest this is an area that needs to be better considered and reported.
AI-based interventions have the potential to be distinct from traditional medical interventions if they can learn (from data) over time. Theoretically, this means the relationship between the intervention and outcome may not be fixed; an AI intervention could get more effective over time, unlike the typical effect waning assumption associated with medicines. This has implications when considering future benefits and how to extrapolate these over the time horizon of the HEE. The prevailing model structures used in HEEs of AI interventions to date (Markov models, decision trees, and hybrids of the 2) may limit the extent to which studies have been able to capture and examine the dynamic nature of AI interventions. Therefore, there is the possibility that the existing HEE evidence base has not captured the true potential value of many AI interventions due to limitations imposed by their model structures, and only a third of our included studies explored the impact of structural uncertainty in sensitivity analysis. Furthermore, traditional, 'simple' models may not facilitate easy modelling of downstream costs and benefits, as they quickly become slow or unwieldy. This potentially fails to show the full benefit of the AI intervention, inhibiting implementation. Guo et al. (Guo et al., 2020) acknowledge this through a paradox of "no evidence, no implementation - no implementation, no evidence". More sophisticated types of model that are less restricted by the structural limitations that affect simple decision tree and Markov models may be better placed to capture full pathway effects in addition to potential time-dependent effectiveness of AI-based interventions.

Study | Notable limitations identified | Assessment
Adams et al. (2021) | Strict assumptions regarding underlying parameters, such as an overestimation of costs, which directly determine the intervention outcome. The 6-month time horizon was short of the 12 months deemed best practice by the American College of Radiology, also potentially impacting cost-effectiveness. Finally, the dataset used was not representative of the target populations, notably "overrepresenting white persons and underrepresenting racial minorities" | Very serious limitations
Areia et al. (2022) | Misrepresentation of population data from clinical trials to clinical practice. The overall death rate modelled was lower than the actual. Assumption of compliance of tests and the linear relationship between cancer prevention effect and increased ADR were made, however impact on cost-effectiveness is not severe | Potentially serious limitations
de Vos et al. (2022) | Short time horizon due to literature available for input parameters. Made assumptions from non-Dutch sources which was controlled for with sensitivity analysis, but limits generalisability of results | [...]
Delgadillo et al. (2022) | There were weaknesses regarding the internal validity. The primary outcome was patient reported, and used a general measure rather than disorder specific measures | [...]
[...] | The data estimates for the baseline (SOLVD) probabilities and effects were based on a study published 30 years ago from the last RCT. The model was calibrated to use a prespecified threshold which was not varied in the sensitivity analyses. There is also a conflict of interest where the research was funded by the organization which developed the AI technology | Very serious limitations
Turino et al. (2021) | Patients with severe chronic pathologies were excluded which could limit the generalizability of results and the follow-up period is relatively short. The study collected EQ-5D data but did not report utility data | Potentially serious limitations
van Leeuwen et al. (2021) | Model relied on two key inputs that were assumptions: percentage of missed LVOs in practice, and the capability of the AI to reduce missed LVOs. These were both varied in the sensitivity analyses and result did not change. The model only included early presenters but IAT would also include late presenters which limits generalizability. The authors also assumed that false positives would be neutralized by the reader and would not lead to unnecessary care | Minor limitations
Frontiers in Pharmacology frontiersin.org Vithlani et al. 10.3389/fphar.2023.1220950
Simulation-based modelling presents the opportunity to build flexible, sophisticated models that can overcome several limitations of Markov models and decision trees. They can easily incorporate the history of past events, model factors that can vary between patients and have a non-linear relationship with outcomes, and do not use discrete time intervals (Davis et al., 2014). They can also track the path of each person over time and estimate individual-level effects or mean group-level effects for a population (Davis et al., 2014). These possibilities may lead to models capable of addressing the potential dynamic nature of AI interventions learning over time and the impact on linked decision points and subsequent care in a clinical pathway. As data on AI-based interventions continues to be collected and reported, the ability to develop these models should improve. One thing to note, however, is that for these models to underpin reimbursement decisions, HTA agencies would need to be able to critique and utilize them. This may require new skills, knowledge and experience and present other challenges. Utilizing these sorts of models also leads to the debate of whether HTA should be more 'living'. This refers to regular and scheduled updates of recommendations instead of the more traditional 'one-off' decisions. Living HTA presents opportunities as well as challenges (Thokala et al., 2023) and is not yet common practice.
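As a rough illustration of how an individual-level simulation could represent an AI intervention whose performance changes over time, the sketch below follows simulated patients year by year and compares a fixed detection sensitivity with a hypothetical learning curve. Everything in it is assumed for illustration: the learning curve, the incidence, the patient-level risk variation and the function names are not taken from any included study, and annual time steps are used for brevity even though simulation models need not use discrete intervals.

```python
import random


def ai_sensitivity(year, start=0.80, ceiling=0.95, rate=0.30):
    """Hypothetical learning curve: sensitivity rises from start towards ceiling."""
    return ceiling - (ceiling - start) * (1.0 - rate) ** year


def simulate_patient(rng, horizon_years, annual_risk, ai_learns):
    """Follow one patient and count the years in which disease goes undetected."""
    diseased = False
    detected = False
    missed_years = 0
    for year in range(horizon_years):
        if not diseased and rng.random() < annual_risk:
            diseased = True
        if diseased and not detected:
            # A learning AI uses the sensitivity for the current year;
            # a static AI keeps its year-0 sensitivity throughout.
            sensitivity = ai_sensitivity(year if ai_learns else 0)
            if rng.random() < sensitivity:
                detected = True
            else:
                missed_years += 1
    return missed_years


def mean_missed_years(n_patients, horizon_years, ai_learns, seed=0):
    """Average undetected years per patient across a simulated population."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_patients):
        # Patient-level heterogeneity in risk (hypothetical range).
        annual_risk = 0.05 * rng.uniform(0.5, 1.5)
        total += simulate_patient(rng, horizon_years, annual_risk, ai_learns)
    return total / n_patients


fixed = mean_missed_years(10_000, 10, ai_learns=False)
learning = mean_missed_years(10_000, 10, ai_learns=True)
print(f"Mean undetected years, fixed sensitivity: {fixed:.3f}")
print(f"Mean undetected years, learning AI: {learning:.3f}")
```

Because each patient's history is tracked explicitly, costs and QALYs could be attached to events such as late detection, which is difficult to do in a cohort model without adding many extra states.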
The usefulness of a published HEE for decision making depends on how well it is conducted and reported. Reporting guidelines play an important role in improving transparency and completeness and, as new technologies emerge, can help drive best practice. A prominent reporting standard within the field of HEEs is the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) (Husereau et al., 2022). This outlines minimum reporting standards and was recently updated in 2022. It includes a 28-item checklist covering methodological approach, data identification, model inputs, assumptions, uncertainty analysis, and conflicts of interest. It does not include any reporting items that are specific to any AI components of the intervention, but the authors did recognize that CHEERS could be more specific for certain situations and welcomed opportunities to create additional reporting guidance. An extension to CHEERS covering AI-specific items could improve the reporting, transparency and ultimately decision making for AI interventions. This could also help mitigate the paradox of poor reporting inhibiting adoption of AI interventions.
The system-wide need and motivation for improving best practice around data collection and transparency for AI health interventions is evident. Extensions for AI technologies have already been developed for other checklists. CONSORT-AI (Liu et al., 2020) contains AI-specific items for the reporting of RCTs, and it was done in collaboration with the SPIRIT-AI extension for trial protocols (Rivera et al., 2020). Including AI-specific items in the reporting of HEEs may be a logical step to contribute to this standard setting and help to ensure that all relevant information is available to decision makers.

Limitations
This study has some limitations. We updated the Voets et al. systematic literature review, but searched different databases. It is possible there may have been relevant studies within our search window that we missed by not searching the same databases; however, we believe the databases we searched should give at least equivalent, and probably superior, sensitivity to the original review. Indeed, the sensitivity of our search strategy is evidenced by the large number of studies excluded at primary screening (2,993) relative to the total number of unique records (3,033). The sensitivity of HEE search filters is well known (Hubbard et al., 2022). While this means our review is highly likely to have identified all relevant published studies, it does mean further updates may be labor intensive, with many records to screen to identify a relatively small number of relevant studies.

Study | Notable limitations identified | Assessment
Xiao et al. (2021) | The predictive accuracy of the intervention came from the literature and may not be generalizable to the setting. Any varying of this was not reported. There was a lack of robust data on the efficacy of treatment that followed a positive screening result which was accounted for in the sensitivity analysis | Potentially serious limitations
Ziegelmayer et al. (2022) | Input parameters came from multiple sources including assumptions and numerous published studies, leading to a degree of bias. Varying the specificity of the AI or CT and cost of AI greatly increased the ICER changing the result from intervention dominant to not cost-effective | Very serious limitations
Our review specifically focused on economic evaluations and, whilst out of scope, some studies, such as those only reporting patient reported outcome measures, may have been of interest to readers. Additionally, a potential limitation is that our search only covered the period from 1 April 2021 to 17 June 2022. This relatively short search period remains informative due to the rapid advent of AI in healthcare, but it also means that it is likely that relevant economic evaluations have been published since our review.
Another limitation relates to the subjective nature of the NICE quality appraisal checklist. Although the checklist allowed for a further level of analysis regarding the quality of the economic evaluation, it should be used as a broad interpretation rather than a critique of any given study. Despite mitigating potential bias by having 2 reviewers, it is possible that different reviewers may have implemented the checklist differently and produced different results. Additionally, other, similar checklists exist (Philips et al., 2004; Drummond, 2015; Adarkwah et al., 2016), and although they broadly serve a similar purpose of understanding the methodological limitations of HEEs, they may have resulted in different or more nuanced quality assessments.

Conclusion
This updated review, while covering just a 15-month window, found more economic evaluations of AI health interventions since the last comprehensive systematic literature review which covered the preceding 5 years. Many of the included studies were model-based evaluations and the most common AI intervention was automated image analysis used for screening or diagnosis in the areas of general medicine and oncology. Most evaluations reported the cost per QALY gained.
Overall, the reporting of the studies exhibited limitations. Only a small number of studies were judged to have just minor limitations, according to application of the NICE quality assessment checklist. The majority had potentially serious or very serious limitations resulting from conflicts between research funding and authorship, uncertainty in input data changing the outcome of the evaluation, and lack of transparent reporting of key elements, such as the cost of the technology and how it will be implemented into clinical practice. Specific reporting standards for the economic evaluation of AI interventions would help to improve transparency, reproducibility and trust, and promote their usefulness for decision making. This is fundamental for implementation and coverage decisions which in turn will generate the necessary data to develop flexible models better suited to capture the potentially dynamic nature of the AI intervention.

TABLE 1
Database search results.
FIGURE 1 PRISMA flowchart describing study selection and reasons for exclusion during full-text screening.

TABLE 2
Characteristics of the included studies.


TABLE 3
Health economic details of included studies.

TABLE 4
Summary of economic evaluation parameters and outcomes.

TABLE 5
Summary of quality assessment of included studies.
Study | Notable limitations identified | Assessment
[...] | [...]nts were white which has generalizability implications | [...]
[...] | Limitations arise from patients who should have been included for Sepsis, not included. The model base case was purposely set to be conservative to not exaggerate the positive effects, however the assumptions made limits the validity of the outcomes. Finally, the research and funding were funded by the company who developed the intervention, creating potential for bias | Very serious limitations
Fusfeld et al. (2022) | The model does not capture adverse events due to antirejection medication which they suspect MMDx would increase leading to uncertainty in the result. There is also a potential conflict of interest where the research was funded by the company which developed the AI technology | Very serious limitations
Huang et al. (2022) | Limited data available from study population led to values derived from other countries which were accounted for in sensitivity analysis. Data regarding sensitivity and specificity of the AI screening derived from one paper, but did not greatly affect cost effectiveness in the sensitivity analyses | [...]
[...] | Trial-based analysis with small number of events and short follow up resulted in less precise treatment estimates. Study presence in the clinic may have modified health worker behaviour for standard of care. Alternative diagnoses to TB were not investigated | [...]
[...] | Range of sources for input data which will lead to a degree of bias, although accounted for in sensitivity analyses. Lacked validity as in practice treatment decision would not be based on image analysis only | [...]