Diagnostic Utility of Non-invasive Tests for Inflammatory Bowel Disease: An Umbrella Review

Background This study aims to consolidate evidence from published systematic reviews and meta-analyses evaluating the diagnostic performance of non-invasive tests for inflammatory bowel disease (IBD) across clinical conditions and age groups. Methods Two independent reviewers systematically identified and appraised systematic reviews and meta-analyses assessing the diagnostic utility of non-invasive tests for IBD. Each association was categorized as adults, children, or mixed population based on the age ranges of patients included in the primary studies. We classified clinical scenarios into diagnosis, activity assessment, and prediction of recurrence. Results In total, 106 assessments of 17 non-invasive tests from 43 reviews were included. Fecal calprotectin (FC) and fecal lactoferrin (FL) were the most sensitive tests for distinguishing IBD from non-IBD, whereas anti-neutrophil cytoplasmic antibodies (ANCA) and FL were the most specific. FC and FL were the most sensitive and most specific tests, respectively, for distinguishing IBD from irritable bowel syndrome (IBS). Anti-Saccharomyces cerevisiae antibody (ASCA) IgA was the best test for distinguishing Crohn’s disease (CD) from ulcerative colitis (UC). The interferon-γ release assay was the best test for distinguishing CD from intestinal tuberculosis (ITB). Ultrasound (US) and magnetic resonance enterography (MRE) were both sensitive and specific for disease activity, and FC was also highly sensitive. Small intestine contrast ultrasonography (SICUS) had the highest sensitivity, and FC the highest specificity, for postoperative CD recurrence. Conclusion In this umbrella review, we summarized the diagnostic performance of non-invasive tests for IBD in various clinical conditions and age groups. Clinicians can select the suggested non-invasive test according to the clinical situation in patients with IBD.


INTRODUCTION
Inflammatory bowel diseases (IBD) [Crohn's disease (CD) and ulcerative colitis (UC)] are idiopathic disorders causing inflammation of the gastrointestinal tract. IBD is emerging as a globally important disease with increasing incidence. Although incidence has begun to stabilize in Western countries, the disease burden remains high, with prevalence exceeding 0.3% (1).
Gastrointestinal endoscopy remains the reference standard for the diagnosis, management, prognostication, and surveillance of IBD, but it is invasive. Endoscopy can be associated with considerable cost, risk, and burden to patients and healthcare systems, and it is the least acceptable test for patients (2).
Accurate non-invasive tests such as biomarkers and radiological examinations would be ideal (3,4). Several promising non-invasive tests that could fulfill this role, including fecal calprotectin (FC) (5) and ultrasound (US) (6), have been studied. Despite many studies assessing the diagnostic performance of non-invasive tests for IBD, to the best of our knowledge, there has been no systematic effort to summarize and critically appraise this body of evidence. Therefore, we performed an umbrella review of meta-analyses, based on different clinical conditions (including diagnosis, activity assessment, and recurrence) and age groups (children, adults, and mixed population), to provide a comprehensive synopsis of the diagnostic performance and validity of reported non-invasive tests for IBD.

METHODS
Search Strategy
Two reviewers (J-TS and Z-QW) independently searched PubMed, Embase, Web of Science, and Cochrane Library databases from inception to 16 April 2020. The search was limited to systematic reviews and meta-analyses without language restrictions. Supplementary Appendix 1 provides a detailed search strategy.

Study Selection and Data Extraction
Systematic reviews or meta-analyses were included if they met the following criteria: they described the conduct of the systematic review in adequate detail, and they attempted to identify all relevant primary studies in at least one database, with a reported search strategy and a quality appraisal of the primary studies (7). Guidelines, narrative reviews, literature reviews, genetic studies, protocols, conference abstracts, and reviews assessing scoring indices were excluded.
Two reviewers (J-TS and Z-QW) independently carried out the study selection and data extraction from the eligible articles. Extracted data included author, year of publication, number of participants, number and type of studies included, appraisal instrument used, reference standard, outcomes assessed, heterogeneity, and study findings.

Quality Assessment
The methodological quality of the included reviews was assessed independently by J-TS and Z-QW using the online AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews) checklist (8). AMSTAR 2 is a validated and reliable tool for appraising systematic reviews; it comprises 16 domains, seven of which are considered critical. Weaknesses in any critical domain can affect the overall validity of a review. The tool rates the overall methodological quality of a review as one of four grades: high, moderate, low, or critically low (9).
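The step from domain-level judgments to the overall grade follows a simple decision rule, sketched below under stated assumptions. This reflects our reading of the published AMSTAR 2 guidance: no or one non-critical weakness gives "high", more than one non-critical weakness gives "moderate", one critical flaw gives "low", and more than one critical flaw gives "critically low". The set of critical item numbers is the default listed in that guidance and is an assumption here, since review teams may adapt which domains they treat as critical; the function name `amstar2_rating` is hypothetical.

```python
# ASSUMPTION: default AMSTAR 2 critical items (protocol, search, excluded-study
# justification, risk of bias, meta-analytic methods, RoB in interpretation,
# publication bias). Teams may adapt this set.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}
ALL_ITEMS = set(range(1, 17))  # AMSTAR 2 has 16 items


def amstar2_rating(weak_items: set[int]) -> str:
    """Overall confidence rating from the set of items judged deficient."""
    unknown = weak_items - ALL_ITEMS
    if unknown:
        raise ValueError(f"not AMSTAR 2 item numbers: {sorted(unknown)}")
    critical_flaws = len(weak_items & CRITICAL_ITEMS)
    noncritical_weaknesses = len(weak_items - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"  # no or one non-critical weakness


# A review lacking only a registered protocol (item 2) is rated "low".
print(amstar2_rating({2}))
```

Because a single missing protocol is a critical flaw, this rule also explains why lack of a protocol alone can drive a review down to "low".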

Identification of Age Groups
Based on the age ranges of the included primary studies, associations were categorized as adults, children, or mixed population. We defined children as individuals under 18 years of age (10). If a systematic review purporting to assess accuracy in adults included people younger than 18 years, it was classified as a mixed population. Supplementary Appendix 2 presents the process of identifying age groups.
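The age-group rule can be written as a short classification function. This is a minimal sketch assuming that each primary study reports a minimum and maximum age in completed years, so that "under 18" means a maximum age below 18; studies with unreported ranges are not handled. The helper name `classify_age_group` is hypothetical.

```python
from typing import Iterable, Tuple

ADULT_AGE = 18  # children defined as under 18 years


def classify_age_group(age_ranges: Iterable[Tuple[float, float]]) -> str:
    """Classify an assessment from the (min_age, max_age) of its primary studies.

    ASSUMPTION: ages are in completed years and every study reports a range.
    """
    ranges = list(age_ranges)
    if not ranges:
        raise ValueError("need at least one primary study age range")
    if all(max_age < ADULT_AGE for _, max_age in ranges):
        return "children"
    if all(min_age >= ADULT_AGE for min_age, _ in ranges):
        return "adults"
    # Any study spanning the threshold (e.g., an "adult" review that
    # enrolled 16-year-olds) makes the assessment mixed.
    return "mixed"


print(classify_age_group([(16, 70), (18, 60)]))
```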

Overlapping and Outdated Associations
Associations in two or more reviews were considered overlapping if they evaluated the same test in the same clinical condition and the same age group. Incorporating the results of overlapping reviews could lead to double counting of primary studies, biasing findings and estimates (11,12). In addition, up to 50% of published systematic reviews are considered out of date after 5.5 years. We therefore categorized overlapping systematic reviews as outdated (published before October 2015) or contemporary (published after October 2015).
For contemporary reviews with overlapping assessments, we generated a graphical cross-tabulation (citation matrix) of the overlapping reviews (in columns) and the included primary studies (in rows) (13). The corrected covered area (CCA) is a validated method for quantifying the degree of overlap between two or more reviews, and we calculated it from the citation matrix. Based on the CCA, overlap can be categorized as very high (CCA > 15%), high (CCA 11-15%), moderate (CCA 6-10%), or slight (CCA 0-5%) (14).
Among the systematic reviews that met the inclusion criteria, all non-overlapping reviews were included, and a rigorous management tool was applied to the overlapping reviews. Supplementary Appendix 3 shows the citation matrices for all overlapping studies. Supplementary Appendix 4 presents the management of overlapping reviews.
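The CCA is computed from the citation matrix as (N - r) / (rc - r), where N is the total number of inclusions counted across all reviews (duplicates included), r is the number of unique primary studies (rows), and c is the number of reviews (columns). The sketch below implements that formula and maps the result onto the overlap bands above. The matrix is hypothetical, and assigning values that fall between the published integer bands (e.g., 5.5%) to the next higher band is our assumption.

```python
def corrected_covered_area(matrix: list[list[int]]) -> float:
    """CCA from a citation matrix.

    matrix[i][j] is 1 if primary study i (row) is included in review j
    (column), else 0. CCA = (N - r) / (r * c - r).
    """
    r = len(matrix)
    if r == 0:
        raise ValueError("citation matrix has no primary studies")
    c = len(matrix[0])
    if c < 2 or any(len(row) != c for row in matrix):
        raise ValueError("need a rectangular matrix with at least two reviews")
    if any(sum(row) == 0 for row in matrix):
        raise ValueError("every primary study must appear in at least one review")
    n = sum(sum(row) for row in matrix)  # inclusions, counting duplicates
    return (n - r) / (r * c - r)


def overlap_category(cca: float) -> str:
    """Map a CCA (as a fraction) to the overlap bands.

    ASSUMPTION: values between the integer bands go to the higher band.
    """
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Hypothetical citation matrix: 4 primary studies (rows) x 3 reviews (columns).
matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
]
cca = corrected_covered_area(matrix)
print(f"CCA = {cca:.0%} ({overlap_category(cca)})")  # prints: CCA = 50% (very high)
```

Here N = 8, r = 4, and c = 3, so CCA = 4/8 = 50%. Two reviews sharing no primary studies give a CCA of 0.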

Data Synthesis
Systematic reviews that met the inclusion criteria formed the unit of analysis. Only data available from the systematic reviews were presented. Results were synthesized narratively, with a tabular presentation of findings and forest plots for reviews that performed a meta-analysis. Summary tables describing review characteristics and findings were also presented.
To determine whether a review needed updating, we considered the following:
• A focused or abbreviated search of primary studies, using the key search terms from the search strategy of the existing review, to identify newly published studies meeting the review's inclusion criteria.
• Whether the findings from newly published studies would change the conclusion or credibility of the review.
Supplementary Appendix 5 describes the search strategy used to identify newly published studies. YXZ and YHS performed the initial screening of potentially eligible newly published studies, and JTS and ZQW performed full-text screening and data extraction.
For reviews with eligible newly published studies, we pooled the new data with the data from the original meta-analysis (17) using the bivariate model (16) (for meta-analyses), or consulted senior authors (for reviews without meta-analyses), to determine whether a full update of the existing review was needed (18).
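The update check pools newly published studies with the original ones and asks whether the summary estimate moves materially. The bivariate model (16) does this jointly for sensitivity and specificity, including their correlation, and normally requires a mixed-model fitting routine. As a lighter, self-contained illustration only, the sketch below swaps in a different and simpler technique: univariate DerSimonian-Laird random-effects pooling of logit-transformed proportions, applied separately to sensitivity (TP, FN) or specificity (TN, FP). It ignores the correlation between the two, so it is a rough check, not the method used in this review. The study counts are hypothetical, and applying a 0.5 continuity correction to every study is an assumption.

```python
import math
from typing import NamedTuple

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


class Pooled(NamedTuple):
    estimate: float  # pooled proportion (e.g., sensitivity)
    lower: float     # 95% CI lower bound
    upper: float     # 95% CI upper bound
    tau2: float      # between-study variance on the logit scale


def _logit_and_var(events: int, non_events: int) -> tuple[float, float]:
    # ASSUMPTION: 0.5 continuity correction for every study.
    a, b = events + 0.5, non_events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def _expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportion_dl(counts: list[tuple[int, int]]) -> Pooled:
    """DerSimonian-Laird random-effects pooling of logit proportions.

    counts: (events, non_events) per study, e.g. (TP, FN) for sensitivity.
    """
    if len(counts) < 2:
        raise ValueError("need at least two studies")
    ys, vs = zip(*(_logit_and_var(e, n) for e, n in counts))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(counts) - 1)) / c)
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    theta = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return Pooled(_expit(theta), _expit(theta - Z95 * se),
                  _expit(theta + Z95 * se), tau2)


# Hypothetical (TP, FN) counts: original studies, then one new study.
original = [(45, 5), (38, 7), (60, 6)]
new = [(30, 10)]
before = pool_proportion_dl(original)
after = pool_proportion_dl(original + new)
print(f"pooled sensitivity {before.estimate:.3f} -> {after.estimate:.3f}")
```

Whether such a shift is large enough to warrant a full update remains a judgment call, which is why discussion with senior authors was part of the process.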
If an update was considered necessary, the original methods used in the conduct of the existing review were replicated. Supplementary Appendix 6 summarizes the evaluation process for considering reviews for updates adapted from Ahmadzai et al. (19).

RESULTS
Literature Search
The search retrieved 1,897 articles. After removal of duplicates and screening of titles and abstracts, 113 articles qualified for full-text screening. Seven outdated reviews were further excluded, and 46 reviews were finally included. Supplementary Appendix 7 summarizes the study selection process with exact numbers of studies. Supplementary Appendix 8 lists the excluded studies with reasons for exclusion.

Overlapping and Non-overlapping Assessment
Seventeen reviews reported overlapping assessments (5, 6, 29, 32, 36, 37, 46, 49-52, 54, 58, 59, 61-63). Supplementary Appendix 10 describes the general characteristics of overlapping reviews, including the decision to retain or exclude an assessment. Supplementary Appendix 3 provides the citation matrices used to assess the degree of overlap. Supplementary Appendix 11 lists the 46 included reviews with non-overlapping assessments and the one contemporary review excluded because of overlap.

Study Characteristics of Reviews With Non-overlapping Assessments
Non-invasive tests for IBD assessed in the included reviews were FC, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), platelet count (PLT), hemoglobin (Hb), albumin (Alb), anti-Saccharomyces cerevisiae antibodies (ASCA), anti-neutrophil cytoplasmic antibodies (ANCA), fecal lactoferrin (FL), US, computed tomography (CT), magnetic resonance enterography (MRE), scintigraphy, autoantibodies to glycoprotein 2 (anti-GP2), interferon-γ release assays (IGRA), fecal immunochemical test (FIT), microRNA, and S100A12. Of the 46 reviews included, 43 conducted meta-analyses. Supplementary Table 1 summarizes the general characteristics of the reviews and meta-analyses included in the umbrella review. Table 1 shows the diagnostic utility of non-invasive tests for IBD in different clinical scenarios and age groups, and Tables 2, 3 show the diagnostic utility of non-invasive tests for CD and UC, respectively. The clinical scenarios include diagnosis (IBD vs. non-IBD), diagnosis (IBD vs. IBS), diagnosis (IBD vs. …). Tables 2, 3 show the findings of meta-analyses and narrative synthesis from systematic reviews.

Children
For IBD, FC with a cutoff of 50 µg/g showed the highest AUC, 0.96 (21). The AUCs of other biomarkers [FC, CRP, ESR, PLT, Hb, and Alb (30)] ranged from 0.76 to 0.95. One review presented US results from three primary studies, with sensitivity ranging from 0.39 to 0.55 and specificity from 0.90 to 1.00 (35).

Mixed Population
To differentiate CD from UC, the sensitivity of the tests, including anti-GP2 and ASCA, was generally low (20,54). ASCA IgA showed the highest specificity, 0.955 (0.938-0.967) (20). To differentiate UC from CD, the only test included in our analysis was ANCA.

Children
For IBD, US performed well: Se, 0.876 (0.542-0.977); Sp, 1.00 (6). One review reported the diagnostic accuracy of transabdominal ultrasound (TAUS) but found it inconclusive (Supplementary Table 3). … recurrence presented with better sensitivity (26). In addition, MRE and other subtypes of US performed well in both sensitivity and specificity (50,51,57).

Reviews Eligible for Update
We searched for newly published studies for each moderate-quality review (Supplementary Appendix 6). After screening, 8 reviews (20, 22, 26, 28-30, 32, 35) had eligible newly published studies. However, after pooling, none of the reviews needed to be updated. An overview of the updating process is presented in Supplementary Appendix 12.

DISCUSSION
Our detailed umbrella review synthesized existing systematic reviews and meta-analyses into one user-friendly document, covering 106 associations across 17 non-invasive tests.

Main Findings
Evidence from this umbrella review suggests that FC (0.99) and FL (0.82) were the most sensitive markers for distinguishing IBD from non-IBD, whereas ANCA (0.971) and FL (0.95) were the most specific. To distinguish IBD from IBS, the most sensitive marker was FC (cutoff 50 µg/g, 0.97; cutoff 100 µg/g, 0.92) and the most specific was FL (0.94). To distinguish CD from UC, all tests had low sensitivity, and ASCA IgA had the highest specificity (0.955). IGRA (Se, 0.828; Sp, 0.867) was the best test to distinguish CD from ITB. Only one test was available to distinguish IBD from functional gastrointestinal disorders (FGID), namely FC, and only one to distinguish UC from CD, namely ANCA. For activity assessment, US (Se, 0.864; Sp, 0.883) and MRE (Se, 0.83; Sp, 0.93) performed well, and FC also showed good sensitivity (0.85). For postoperative recurrence of CD, SICUS had the highest sensitivity (0.99) and FC the highest specificity (CR: 0.88). Overall, biomarkers performed well for diagnosis, whereas radiological examinations, especially MRE and US, were more useful for assessing activity and predicting recurrence. Supplementary Table 4 presents the diagnostic performance characteristics and clinical use of each test.
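Sensitivity and specificity are easier to compare across tests when combined into likelihood ratios, LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp. The sketch below applies these standard formulas to the pooled pairs quoted above for which both values are given. The resulting ratios are derived here for illustration and were not reported by the included reviews; they ignore the uncertainty around each pooled estimate and the fact that the pairs come from different populations and meta-analyses.

```python
from typing import NamedTuple


class LikelihoodRatios(NamedTuple):
    positive: float  # LR+ = Se / (1 - Sp)
    negative: float  # LR- = (1 - Se) / Sp


def likelihood_ratios(se: float, sp: float) -> LikelihoodRatios:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    if not (0 < se <= 1 and 0 < sp < 1):
        raise ValueError("need 0 < Se <= 1 and 0 < Sp < 1")
    return LikelihoodRatios(se / (1 - sp), (1 - se) / sp)


# Pooled (Se, Sp) pairs quoted in the Main Findings paragraph.
reported = {
    "IGRA, CD vs. ITB": (0.828, 0.867),
    "US, activity": (0.864, 0.883),
    "MRE, activity": (0.83, 0.93),
}
for label, (se, sp) in reported.items():
    lr = likelihood_ratios(se, sp)
    print(f"{label}: LR+ = {lr.positive:.2f}, LR- = {lr.negative:.2f}")
```

A higher LR+ means a positive result raises the probability of disease more; a lower LR- means a negative result lowers it more. On these point estimates, MRE's higher specificity gives it the largest LR+ of the three.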

Strengths and Limitations
Compared with other studies summarizing non-invasive tests for IBD (65,66), our umbrella review provides the first systematic appraisal of the evidence using robust criteria. We used the AMSTAR 2 tool to assess the quality of reviews and CCA to evaluate the degree of overlap, reporting the highest-quality and most current review. In addition, our umbrella review included blood and stool biomarkers as well as radiological examinations. Furthermore, we rigorously classified the assessments into age groups based on the exact age ranges of the included primary studies, and into clinical scenarios, allowing a more rigorous discussion of diagnostic performance in each clinical condition.
This review has several limitations. Missing data, including missing meta-data, hindered reporting of some elements of the umbrella review, and few reviews addressed children or adults alone. In addition, one review (20) could not undergo the normal updating process because it did not report the studies included in each assessment. Some reviews were rated as low quality, most commonly because of the lack of a protocol; however, protocol registration has been rare, especially in the IBD field. Moreover, because most articles did not report AUC values, we could not meaningfully compare or analyze AUCs.

Implications for Practice and Future Research
This comprehensive umbrella review could help clinicians make better decisions about appropriate tests prior to endoscopy. In terms of diagnosis, we suggest that in patients with symptoms suggestive of IBD in whom the clinician is considering endoscopy, FC can serve as a sensitive test for safely excluding IBD. Patients with a negative result should continue to be monitored rather than undergo immediate endoscopy, unless clinical urgency dictates otherwise. In patients with a positive result, FL is a good choice because of its low false-positive rate and the consequent reduction in unnecessary endoscopies, provided patients are willing to undergo a stool test; if not, ANCA is an alternative. Clinicians can use our results to select a specific marker based on the practical situation. If both tests are positive, the patient is highly likely to have IBD, and endoscopic examination can follow to confirm the diagnosis and disease classification. Radiological examinations, especially US and MRE, performed well in activity assessment and in predicting relapse. For patients with CD, we recommend regular FC or US testing to monitor disease activity, and US or MRE for patients requiring postoperative recurrence monitoring. For patients with UC, MRE is the best choice to assess activity and predict relapse.
Our results show that few reviews address children, especially for activity assessment and predicting recurrence. However, endoscopy in children is invasive and requires general anesthesia, so it may be poorly tolerated by children and unwelcome to parents, while a "wait and see" approach may cause unnecessary concern and loss of wellbeing in children with IBD. High-quality prospective studies on non-invasive testing in children are therefore needed.
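The rule-out and rule-in reasoning above can be made concrete with Bayes' theorem in odds form: post-test odds = pre-test odds × LR, chained across sequential tests. The sketch below is illustrative only. The FC sensitivity (0.99) and the FL sensitivity and specificity (0.82 and 0.95) are the IBD vs. non-IBD values from Main Findings, but the FC specificity (0.70) and the 20% pre-test probability are HYPOTHETICAL placeholders, not results of this review. Chaining the two tests also assumes they are conditionally independent given disease status, which is unlikely to hold exactly for two fecal markers of inflammation, so the combined probability is probably overstated.

```python
def post_test_probability(pretest: float, *likelihood_ratios: float) -> float:
    """Post-test odds = pre-test odds x LR1 x LR2 ...

    Chaining several likelihood ratios ASSUMES conditional independence of
    the tests given disease status.
    """
    if not 0 < pretest < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# FL sensitivity/specificity for IBD vs. non-IBD, as quoted in Main Findings.
FL_SE, FL_SP = 0.82, 0.95
# FC sensitivity from Main Findings; FC specificity is a HYPOTHETICAL placeholder.
FC_SE, FC_SP_ASSUMED = 0.99, 0.70
PRETEST_ASSUMED = 0.20  # HYPOTHETICAL pre-test probability of IBD

fc_neg = (1 - FC_SE) / FC_SP_ASSUMED  # LR- of FC
fc_pos = FC_SE / (1 - FC_SP_ASSUMED)  # LR+ of FC
fl_pos = FL_SE / (1 - FL_SP)          # LR+ of FL

print(f"FC negative: {post_test_probability(PRETEST_ASSUMED, fc_neg):.3f}")
print(f"FC and FL positive: "
      f"{post_test_probability(PRETEST_ASSUMED, fc_pos, fl_pos):.3f}")
```

The pattern, rather than the exact numbers, is the point: a very sensitive first test pushes the probability after a negative result close to zero, and a highly specific second test pushes it well up after two positives.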

CONCLUSION
In summary, this umbrella review summarized the diagnostic performance of non-invasive tests for IBD in different clinical conditions and age groups and offered our suggestions on how to use the non-invasive tests appropriately. Researchers and clinicians could choose a suitable test based on our results. Further studies on non-invasive tests in children are needed.

DATA AVAILABILITY STATEMENT
The original contributions presented in this study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.