Computed Tomography Severity Index vs. Other Indices in the Prediction of Severity and Mortality in Acute Pancreatitis: A Predictive Accuracy Meta-analysis

Background: The management of the moderate and severe forms of acute pancreatitis (AP) with necrosis and multiorgan failure remains a challenge. Multiple clinical, laboratory, and imaging-based scoring systems are available to predict the severity and mortality of AP. Aim: To investigate whether the computed tomography severity index (CTSI) can predict the outcomes of AP better than other scoring systems. Methods: A systematic search was performed in three databases: PubMed, Embase, and the Cochrane Library. Eligible records provided data from consecutive AP cases and used CTSI or modified CTSI (mCTSI) alone or in combination with other prognostic scores [Ranson, bedside index of severity in acute pancreatitis (BISAP), Acute Physiology and Chronic Health Evaluation II (APACHE II), C-reactive protein (CRP)] for the evaluation of the severity or mortality of AP. Areas under the curve (AUCs) with 95% confidence intervals (CIs) were calculated and aggregated with STATA 14 software using the metandi module. Results: Altogether, 30 studies containing the data of 5,988 AP cases were included in our meta-analysis. The pooled AUC for the prediction of mortality was 0.79 (CI 0.73–0.86) for CTSI; 0.87 (CI 0.83–0.90) for BISAP; 0.80 (CI 0.72–0.89) for mCTSI; 0.73 (CI 0.66–0.81) for CRP level; 0.87 (CI 0.81–0.92) for the Ranson score; and 0.91 (CI 0.88–0.93) for the APACHE II score. The APACHE II scoring system had a significantly higher predictive value for mortality than CTSI and CRP (p = 0.001 and p < 0.001, respectively), while the predictive value of CTSI was not statistically different from that of BISAP, mCTSI, CRP, or Ranson criteria. The pooled AUCs for the prediction of AP severity were 0.80 (CI 0.76–0.85) for CTSI; 0.79 (CI 0.72–0.86) for BISAP; 0.83 (CI 0.75–0.91) for mCTSI; 0.73 (CI 0.64–0.83) for CRP level; 0.81 (CI 0.75–0.87) for the Ranson score; and 0.80 (CI 0.77–0.83) for the APACHE II score. Regarding severity, all tools performed similarly, with no statistically significant differences.
Conclusion: Although APACHE II is the most accurate predictor of mortality, CTSI is a good predictor of both mortality and AP severity. When a CT scan has been performed, CTSI is an easily calculable and informative tool that should be used more often in routine clinical practice.


INTRODUCTION

Rationale
Acute pancreatitis (AP) is an inflammatory disease of the pancreas and one of the most common causes of hospitalization among gastrointestinal diseases (Lankisch et al., 2015). Based on the revised Atlanta classification, AP may be mild, moderately severe, or severe (Banks et al., 2013). Most cases of AP are mild, but the management of the moderate and severe forms of the disease with necrosis and multiorgan failure remains a challenge. The severe form occurs in 8.8% of AP cases and has a poor prognosis: the mortality of severe AP (SAP) may reach 28% (Parniczky et al., 2016). Early prediction of severity is therefore necessary, because early escalation of care and aggressive therapy may prevent complications and adverse outcomes in high-risk patients. Unfortunately, research on pancreatitis is in danger; therefore, efforts to obtain clinically relevant data are of very high importance.

Objectives
Currently, various scoring systems are used for the early prediction of SAP. The Ranson score was the first to be used (Ranson and Pasternack, 1977), but later the Acute Physiology and Chronic Health Evaluation II (APACHE II) scoring system seemed to be more accurate (Yeung et al., 2006). Moreover, several inflammatory parameters, such as C-reactive protein (CRP) and interleukin-6, were documented to be clinically relevant in differentiating mild from non-mild AP (Sternby et al., 2017). In 2008, a new, easy-to-implement bedside index of severity in acute pancreatitis (BISAP) was proposed for use within 24 h of hospitalization to predict in-hospital mortality (Wu et al., 2008). With improvements in imaging techniques, contrast-enhanced computed tomography (CT) has gained an important role in the diagnosis of AP and its complications. In 1990, Balthazar and coworkers developed a numerical scoring system, the CT severity index (CTSI), for estimating the severity of AP (Balthazar et al., 1990). It combines the quantification of pancreatic and peripancreatic inflammation with the extent of pancreatic parenchymal necrosis. In 2004, Mortele et al. formulated the modified CTSI (mCTSI), which simplifies the evaluation of peripancreatic inflammation and of the extent of pancreatic parenchymal necrosis and incorporates extrapancreatic complications (vascular, gastrointestinal, and extrapancreatic parenchymal complications, as well as pleural effusion and/or ascites) into the assessment. This modified index correlated more closely with the outcome of AP (Mortele et al., 2004). Tables 1A,B show the components of CTSI and mCTSI.
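To illustrate how the two indices add up, the sketch below encodes the point values as they are commonly published for CTSI (Balthazar et al., 1990) and mCTSI (Mortele et al., 2004). It is a hedged sketch, not a reproduction of Tables 1A,B: the function names, the string keys for the inflammation categories, and the handling of necrosis exactly at the 30% and 50% boundaries are assumptions, so the point tables should be checked against Tables 1A,B before any use.

```python
# Hedged sketch of the CTSI and mCTSI point systems as commonly published.
# Function names, category keys, and boundary handling at exactly 30% / 50%
# necrosis are assumptions of this sketch.

# Balthazar CT grade -> points
# A: normal pancreas; B: focal or diffuse enlargement; C: pancreatic abnormalities
# with peripancreatic inflammation; D: single fluid collection;
# E: two or more fluid collections and/or gas
BALTHAZAR_POINTS = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

# mCTSI pancreatic inflammation -> points (hypothetical keys chosen for this sketch)
MCTSI_INFLAMMATION_POINTS = {
    "normal": 0,      # normal pancreas
    "pancreatic": 2,  # intrinsic pancreatic abnormalities +/- peripancreatic fat changes
    "collection": 4,  # pancreatic/peripancreatic fluid collection or peripancreatic fat necrosis
}


def _check_percent(necrosis_pct: float) -> None:
    """Reject necrosis percentages outside 0-100."""
    if not 0 <= necrosis_pct <= 100:
        raise ValueError("necrosis_pct must be between 0 and 100")


def ctsi(balthazar_grade: str, necrosis_pct: float) -> int:
    """CTSI (0-10) = Balthazar grade points (0-4) + necrosis points (0, 2, 4, or 6)."""
    _check_percent(necrosis_pct)
    if necrosis_pct == 0:
        necrosis_points = 0
    elif necrosis_pct < 30:
        necrosis_points = 2
    elif necrosis_pct <= 50:
        necrosis_points = 4
    else:
        necrosis_points = 6
    return BALTHAZAR_POINTS[balthazar_grade.upper()] + necrosis_points


def mctsi(inflammation: str, necrosis_pct: float, extrapancreatic: bool) -> int:
    """mCTSI (0-10) = inflammation (0, 2, 4) + necrosis (0, 2, 4) + extrapancreatic (0 or 2).

    extrapancreatic: True if any of pleural effusion, ascites, or vascular,
    parenchymal, or gastrointestinal complications is present.
    """
    _check_percent(necrosis_pct)
    if necrosis_pct == 0:
        necrosis_points = 0
    elif necrosis_pct <= 30:
        necrosis_points = 2
    else:
        necrosis_points = 4
    return (MCTSI_INFLAMMATION_POINTS[inflammation] + necrosis_points
            + (2 if extrapancreatic else 0))


print(ctsi("D", 40))                  # 3 + 4 = 7
print(mctsi("collection", 40, True))  # 4 + 4 + 2 = 10
```

Both indices share the same 0-10 range, but mCTSI collapses the necrosis categories to three levels and adds a flat 2 points for any extrapancreatic complication, which is the simplification described in the paragraph above.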

Research Question
It is still not clear which scoring system has the highest predictive accuracy for the severity and mortality of AP. The aim of this meta-analysis was to investigate how accurately CT-based severity indices predict the severity and mortality of AP in comparison with other widely accepted and used scoring systems.

Search Strategy
A systematic search was performed in PubMed, Embase, and the Cochrane Library (CENTRAL), using the following search query: "acute pancreatitis" AND ("computed tomography severity index" OR CTSI OR "modified computed tomography severity index" OR MCTSI

Study Selection and Data Extraction
Two independent investigators (EV, AM) selected the studies, and disagreements were resolved by a third reviewer (PH). Records were eligible for the meta-analysis if (1) AP patients of any severity were enrolled consecutively; (2) CTSI or mCTSI was used to predict the severity or mortality of AP; and (3) sensitivity and specificity values, the absolute numbers of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), and/or the area under the curve (AUC) were reported for CTSI/mCTSI regarding AP severity and/or mortality. If other prognostic scores (Ranson, BISAP, APACHE II) or CRP values were also assessed in the selected articles, those results were extracted as well. Only full-text articles were included; studies that met the inclusion criteria underwent full-text evaluation. The following data were extracted from the articles: first author; year of publication; study period; study design; AP scoring systems used; time of score evaluation; definition of SAP used; sample size by severity; mean age; male/female ratio; cut-off value; and clinical end-points. In addition, data on the statistical analysis were reviewed for the risk of bias assessment.

Frontiers in Physiology | www.frontiersin.org

FIGURE 2 | Area under the curve (AUC) summarizing the predictive performance of scoring systems regarding mortality in acute pancreatitis. Size of squares for effect size reflects weight of studies in pooled analysis. Horizontal bars represent 95% confidence intervals (CI). CTSI, computed tomography severity index; BISAP, bedside index of severity in acute pancreatitis; mCTSI, modified computed tomography severity index; CRP, C-reactive protein; APACHE II, Acute Physiology and Chronic Health Evaluation II. The vertical line represents the line of no effect.

Data Analysis
To construct 2 × 2 contingency tables, true positive, false positive, false negative, and true negative values were extracted. These served as input to fit hierarchical summary receiver operating characteristic (HSROC) curves and to estimate summary sensitivity, specificity, and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs). For each method and outcome, we also collected the AUC values and their CIs and performed a meta-analysis using the random-effects model to obtain pooled AUC estimates with 95% CIs. The statistical analysis was performed with Stata 14 software using the metandi module. Heterogeneity was assessed using the I² measure and the corresponding χ² test; p < 0.1 indicated significant heterogeneity. Based on the Cochrane Handbook, I² = 100% × (Q − df)/Q, where Q is Cochran's heterogeneity statistic and df its degrees of freedom (the number of studies minus one), represents the magnitude of heterogeneity (moderate: 30–60%, substantial: 50–90%, considerable: 75–100%) (Higgins and Green, 2011).
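The random-effects pooling of AUCs and the I² formula above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reimplementation of the authors' Stata workflow (metandi itself fits the bivariate/HSROC model): the paper does not name its between-study variance estimator, so the DerSimonian-Laird estimator is assumed here, each study's standard error is back-calculated from its 95% CI under a normal approximation, and the helper names and study values are hypothetical.

```python
import math
from typing import Sequence

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate a standard error from a 95% CI (assumes a symmetric normal CI)."""
    return (upper - lower) / (2 * Z95)


def dersimonian_laird(estimates: Sequence[float], ses: Sequence[float]):
    """Random-effects pooling (DerSimonian-Laird, assumed).

    Returns (pooled estimate, (CI lower, CI upper), Cochran's Q, I2 in %).
    """
    k = len(estimates)
    if k == 0 or k != len(ses):
        raise ValueError("need one standard error per estimate")
    w = [1 / s ** 2 for s in ses]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    # I2 = 100% x (Q - df) / Q, floored at 0 when Q < df
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return pooled, (pooled - Z95 * se_pooled, pooled + Z95 * se_pooled), q, i2


# Hypothetical per-study AUCs with 95% CIs (illustrative only, not data from the included studies)
studies = [(0.75, 0.65, 0.85), (0.82, 0.76, 0.88), (0.88, 0.83, 0.93)]
aucs = [auc for auc, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
pooled, ci, q, i2 = dersimonian_laird(aucs, ses)
print(f"pooled AUC {pooled:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), Q = {q:.2f}, I2 = {i2:.1f}%")
```

When Q does not exceed its degrees of freedom, both the between-study variance and I² are floored at zero and the random-effects result collapses to the fixed-effect one; larger Q widens the pooled CI through the added between-study variance.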

Risk of Bias and Applicability Assessment
The Prediction model Risk Of Bias ASsessment Tool (PROBAST) (Wolff et al., 2019) was used to assess the risk of bias and applicability of the primary studies, in accordance with the recommendation of the Cochrane Collaboration. This tool assesses the risk of bias in four domains: participants, predictors, outcome, and analysis. It also addresses concerns regarding applicability in three domains: participants, predictors, and outcome.

Prediction of Mortality
From the 30 articles, 11 contained AUC data for the prediction of mortality (Figure 2). Table 3 summarizes the study numbers, sample sizes, AUCs, and heterogeneity data of the different severity scores for each AP outcome. For CTSI, based on data from 10 articles, the pooled AUC for mortality was 0.79 (CI 0.73-0.86; heterogeneity I² = 83%, p < 0.001). Eight articles included AUC data on mortality for BISAP; the pooled AUC was 0.87 (CI 0.83-0.90; heterogeneity I² = 0%, p = 0.578). The pooled AUC for mCTSI was 0.80 (CI 0.72-0.89; heterogeneity I² = 79.4%, p = 0.001) according to five studies. Only two studies reported AUC data on mortality prediction for CRP level, with a pooled AUC of 0.73 (CI 0.66-0.81; heterogeneity I² = 0%, p = 0.708). Six articles included AUC data on mortality for the Ranson score, with a pooled AUC of 0.87 (CI 0.81-0.92; heterogeneity I² = 65.6%, p = 0.013), and for the APACHE II score, with a pooled AUC of 0.91 (CI 0.88-0.93; heterogeneity I² = 4.8%, p = 0.386). Based on these meta-analytical calculations, the APACHE II scoring system had significantly higher predictive accuracy for mortality than CTSI or CRP level (p = 0.001 and p < 0.001, respectively). However, CTSI did not differ significantly from the BISAP, mCTSI, CRP, or Ranson scores in the prediction of mortality of AP, and these scores can be classified as good or fair predictors.

Prediction of Severity
AUC data for severity were included in 19 studies (Figure 3).
There was no statistically significant difference between the severity-predicting values of the different scoring systems. The heterogeneity values were I² = 86.2%, p < 0.001; I² = 89.7%, p < 0.001; I² = 68.1%, p = 0.043; I² = 77%, p = 0.001; I² = 87.5%, p < 0.001; and I² = 36.8%, p = 0.105 for the CTSI, BISAP, mCTSI, CRP, Ranson, and APACHE II scores, respectively. Thus, heterogeneity across the studies was significant for all scoring systems except the APACHE II score.

Sensitivity, specificity, and DOR data of all scores predicting mortality and severity are summarized in Table 4. For the prediction of mortality, sensitivity was highest for the mCTSI, Ranson, and APACHE II scores, whereas specificity was highest for the APACHE II, BISAP, and Ranson scores. For the prediction of severity, sensitivity was highest for the CTSI, mCTSI, and Ranson scores, whereas specificity was highest for CRP and CTSI.
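To make explicit how the sensitivity, specificity, and DOR values summarized in Table 4 derive from a single study's 2 × 2 table, a minimal sketch follows. The class name and the counts are hypothetical and are not taken from any included study; the 0.5 continuity correction applied when any cell is zero and the log-scale (Woolf) confidence interval are common conventions assumed here, not necessarily the choices made by metandi.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one score at one cut-off against one outcome (death or severe AP)."""
    tp: int  # score positive, outcome present
    fp: int  # score positive, outcome absent
    fn: int  # score negative, outcome present
    tn: int  # score negative, outcome absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def _cells(self, correction: float) -> tuple:
        cells = (self.tp, self.fp, self.fn, self.tn)
        # Assumption: add a continuity correction to every cell if any cell is zero
        if 0 in cells:
            return tuple(c + correction for c in cells)
        return tuple(float(c) for c in cells)

    def dor(self, correction: float = 0.5) -> float:
        """Diagnostic odds ratio = (TP x TN) / (FP x FN)."""
        tp, fp, fn, tn = self._cells(correction)
        return (tp * tn) / (fp * fn)

    def dor_ci(self, correction: float = 0.5, z: float = 1.959964) -> tuple:
        """Approximate 95% CI for the DOR on the log scale (Woolf method)."""
        tp, fp, fn, tn = self._cells(correction)
        se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
        log_dor = math.log((tp * tn) / (fp * fn))
        return math.exp(log_dor - z * se_log), math.exp(log_dor + z * se_log)


# Hypothetical counts, for illustration only
table = TwoByTwo(tp=18, fp=30, fn=6, tn=146)
print(table.sensitivity())  # 18 / 24 = 0.75
print(table.specificity())  # 146 / 176, about 0.83
print(table.dor())          # (18 x 146) / (30 x 6) = 14.6, up to floating-point rounding
print(table.dor_ci())
```

The DOR equals the odds of a positive score among patients with the outcome divided by the odds among those without it, so a score can have a high DOR while trading sensitivity for specificity, which is why Table 4 reports all three measures.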

Quality Assessment and Risk of Bias
In this study, the two main outcomes were mortality and disease severity; therefore, we presented the data in two separate tables. Most of the included studies met the predefined criteria for the definition of AP and included all grades of severity; therefore, the risk of bias regarding the included populations was deemed low. The CTSI data from all studies were substantially limited by the variable interval between admission (or symptom onset and the diagnosis of AP) and the CT scan, which is the main limitation on the applicability of our CTSI results. The results of the risk of bias and applicability assessment are shown in Tables 5A,B.

Summary of Main Findings
Severe acute pancreatitis is a serious condition with high mortality, and it imposes high costs on the health care system. More accurate prediction of severity on admission allows optimal therapy to be started immediately, which may reduce the risk of death.
Acute pancreatitis is diagnosed when two or more of the following three criteria are present: abdominal pain consistent with the diagnosis; elevated pancreatic enzymes to more than three times the upper limit of normal; and characteristic findings on abdominal imaging. Radiological modalities (ultrasound, CT) can not only support the diagnosis of AP but, by visualizing the gallbladder and biliary tract, can also reveal whether its etiology is biliary or non-biliary. Furthermore, with morphological scoring systems such as CTSI or mCTSI, a CT scan can be useful for assessing the severity of AP. This is the first meta-analysis that quantifies the accuracy of the CTSI and mCTSI scores for the prediction of the severity and mortality of AP and compares them with other commonly used scoring systems. Two previous meta-analyses (Gao et al., 2015; Yang and Li, 2016) assessed the predictive accuracy of the BISAP score, but neither included CTSI or mCTSI. Yang and Li (2016) found that the pooled sensitivity and specificity of BISAP for the prediction of SAP were 0.65 (CI: 0.54-0.74) and 0.84 (CI: 0.70-0.92), respectively, with a pooled AUC of 0.77 (CI: 0.73-0.80). Gao et al. (2015) calculated a pooled sensitivity of 0.51 (CI: 0.43-0.60), a specificity of 0.91 (CI: 0.89-0.92), and an AUC of 0.87. In our analysis, the pooled sensitivity was 0.73 (CI: 0.53-0.87), the specificity was 0.80 (CI: 0.72-0.88), and the pooled AUC was 0.79. These results are similar; the difference in specificity between our results and those of Gao et al. (2015) may be explained by the higher number of articles included in our analysis.
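The two-of-three diagnostic rule stated at the start of the preceding paragraph can be written as a small function. This is a hypothetical sketch: the function and parameter names are assumptions, and the enzyme criterion is encoded as serum amylase or lipase expressed as a multiple of the upper limit of normal.

```python
def meets_ap_criteria(typical_pain: bool, enzyme_fold_uln: float, imaging_findings: bool) -> bool:
    """Return True when at least two of the three diagnostic criteria for AP are met.

    typical_pain: abdominal pain consistent with AP
    enzyme_fold_uln: serum amylase or lipase as a multiple of the upper limit of normal
    imaging_findings: characteristic findings on abdominal imaging
    """
    criteria = [typical_pain, enzyme_fold_uln > 3, imaging_findings]  # enzymes must exceed 3x ULN
    return sum(criteria) >= 2


print(meets_ap_criteria(True, 4.0, False))  # True: pain plus enzymes, no imaging needed
print(meets_ap_criteria(True, 2.5, False))  # False: only one criterion met
```

The first call shows why imaging is not strictly required for the diagnosis: typical pain combined with a sufficient enzyme elevation already satisfies the rule, which leaves CT mainly for etiology, complications, and severity assessment.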
In our meta-analysis, APACHE II proved to be the most accurate scoring system for the prediction of mortality. It is the most widely used mortality prediction score in critically ill patients; however, it comprises 12 physiological variables in addition to age and chronic health status, so its application can be cumbersome, which limits its widespread use. In addition, APACHE II was designed for patients admitted to the intensive care unit; therefore, it is not suitable for the early prediction of the severity of AP. The AUC confidence intervals of the BISAP, mCTSI, and Ranson scores overlapped with that of APACHE II, while the AUCs of CTSI and CRP were somewhat lower.
CTSI predicts severity accurately, and its accuracy did not differ from that of the other scoring systems. The Ranson, APACHE II, and BISAP scores, by contrast, require several clinical parameters. There is a good correlation between morphological severity according to CT scoring systems and clinical scoring systems based on clinical data and laboratory parameters.
Recent AP guidelines recommend a CT scan 72-96 h after symptom onset (Working Group IAP/APA Acute Pancreatitis Guidelines, 2013; Hritz et al., 2015), because pancreatic parenchymal necrosis rarely appears on contrast-enhanced CT within 48 h (Ryu, 2009). The guidelines allow an earlier CT scan in case of diagnostic uncertainty.
However, contrast-enhanced CT cannot be performed in every patient. In extremely obese patients, body weight and size can preclude CT investigation. Contrast-enhanced CT requires an intravenous injection of iodinated contrast medium to detect hypoperfused areas in the pancreatic parenchyma; therefore, allergy to intravenous contrast media, impaired renal function, and hyperthyroidism are contraindications.
Because of the risk of radiation exposure, repeated CT scans should be avoided and reserved for patients who fail to improve clinically. CT examination has shown an advantage in the evaluation of local complications (Ju et al., 2006), which can modify the therapeutic strategy. Unlike the Ranson, BISAP, and APACHE II scores, contrast-enhanced CT assessment and CTSI calculation require radiological expertise.

Tables 5A,B legend: red, high risk of bias or concern; yellow, unclear risk of bias or concern; green, low risk of bias or concern; +, high risk of bias or concern; ?, unclear risk of bias or concern; −, low risk of bias or concern.

Limitations
Three of the articles included data from children (Fabre et al., 2012; Lautz et al., 2012; Hashimoto et al., 2016); two of these used the DeBanto score, a pediatric-specific score, to evaluate the severity of AP. The AP population of the studies is not necessarily representative of the whole AP population, because CT scans are mostly performed in the more severe cases. Several studies did not include all etiologies of AP: Yang et al. included only patients with hyperlipidemic etiology, while Alper and coworkers included only biliary AP cases. Fifteen of the included studies were retrospective, which might have caused selection bias. The timing of CTSI and mCTSI assessment varied across the studies: in several studies, CT was performed after a longer delay, while in others it was carried out on admission, leading to higher heterogeneity. While APACHE II, Ranson, BISAP, and CRP values were established mostly on admission or within the first 48 h, the optimal timing of the CT examination is 72-96 h after symptom onset, and in several studies CT was performed later than the other prognostic scores were obtained. This may limit the prognostic value of CTSI, because the other scores can predict severity or mortality earlier with similar accuracy. It is also questionable whether the radiologist can determine when necrosis developed. In patients with previous necrotizing pancreatitis, severity cannot be assessed accurately.
Regarding the prediction of mortality, heterogeneity was considerable for CTSI and substantial for the Ranson score; regarding the prediction of AP severity, heterogeneity was considerable for the CTSI, BISAP, CRP, and Ranson scores and substantial for mCTSI. We suspect that the high heterogeneity among the studies is caused by confounding factors: differences in the study populations (e.g., ethnicity, BMI, age, and etiology), differences in the timing and interpretation of imaging, and potential interobserver variability among the radiologists interpreting the CT images. Because the included studies span a long period, severity was assessed according to different versions of the Atlanta classification and other definitions.
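Because the Cochrane Handbook bands quoted in the Data Analysis section overlap (moderate 30-60%, substantial 50-90%, considerable 75-100%), a label such as "substantial" depends on how a value falling into two bands is assigned. The sketch below assumes the highest matching band is reported; under that assumption it reproduces the labels used in this paragraph for the I² values given in the Results. The function name is hypothetical, and the "might not be important" label for low values refers to the Handbook's 0-40% band.

```python
def cochrane_heterogeneity_label(i2: float) -> str:
    """Return the highest Cochrane Handbook band that contains I2 (given in %).

    Assumption: when bands overlap, the highest matching band is reported.
    """
    if not 0 <= i2 <= 100:
        raise ValueError("I2 must be a percentage between 0 and 100")
    if i2 >= 75:
        return "considerable"
    if i2 >= 50:
        return "substantial"
    if i2 >= 30:
        return "moderate"
    return "might not be important"  # Handbook band 0-40%


# I2 values reported in the Results of this meta-analysis
reported = {
    "mortality, CTSI": 83.0,
    "mortality, Ranson": 65.6,
    "severity, CTSI": 86.2,
    "severity, BISAP": 89.7,
    "severity, mCTSI": 68.1,
    "severity, CRP": 77.0,
    "severity, Ranson": 87.5,
    "severity, APACHE II": 36.8,
}
for outcome_score, value in reported.items():
    print(f"{outcome_score}: I2 = {value}% -> {cochrane_heterogeneity_label(value)}")
```

A value such as 65.6% sits in both the moderate and substantial bands; reporting the higher band is the more conservative reading and matches the wording of this section.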

Implications for Practice
In the prediction of mortality in AP, CTSI proved as valuable as BISAP, mCTSI, CRP, or the Ranson score; only the APACHE II score surpassed its predictive ability. Regarding severity, there was no difference in the predictive value of the scores. If CT scans are performed, CTSI and mCTSI can be easily calculated and should be used in addition to the other scoring systems.

FIGURE 7 | Hierarchical summary receiver operating characteristic (HSROC) curves for C-reactive protein (CRP) for predicting severity of acute pancreatitis.

Implications for Research
Further research is warranted to assess the effect of early CT and its predictive value in AP.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.