Diagnostic Performance of Diffusion-Weighted Imaging for Differentiating Malignant From Benign Intraductal Papillary Mucinous Neoplasms of the Pancreas: A Systematic Review and Meta-Analysis

Objectives: To assess the diagnostic accuracy of diffusion-weighted imaging (DWI) in predicting the malignant potential of intraductal papillary mucinous neoplasms (IPMNs) of the pancreas.
Methods: A systematic search for articles investigating the diagnostic performance of DWI for predicting the malignant potential of IPMNs was conducted in PubMed, Embase, and Web of Science from January 1997 to 10 February 2020. The QUADAS-2 tool was used to evaluate study quality. Pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and their 95% confidence intervals (CIs) were calculated. The summary receiver operating characteristic (SROC) curve was then plotted, and meta-regression was performed to explore the heterogeneity.
Results: Five articles with 307 patients were included. The pooled sensitivity and specificity of DWI in evaluating the malignant potential of IPMNs were 0.74 (95% CI: 0.65, 0.82) and 0.94 (95% CI: 0.78, 0.99), respectively. The PLR was 13.5 (95% CI: 3.1, 58.7), the NLR was 0.27 (95% CI: 0.20, 0.37), and the DOR was 50.0 (95% CI: 11.0, 224.0). The area under the SROC curve (AUC) was 0.84 (95% CI: 0.80, 0.87). The meta-regression showed that the slice thickness of DWI (p = 0.02) and the DWI parameter (p = 0.01) were significant factors affecting the heterogeneity.
Conclusions: DWI is an effective modality for the differential diagnosis between benign and malignant IPMNs. The slice thickness of DWI and the DWI parameter were the main factors influencing diagnostic specificity.


INTRODUCTION
Intraductal papillary mucinous neoplasms (IPMNs) of the pancreas, which originate from the mucinous epithelium of the pancreatic duct system, are the most common type of pancreatic cystic neoplasm; they can overproduce mucin and lead to duct dilation (1,2). Histologically, IPMNs are classified as low-grade, intermediate-grade, high-grade, or even invasive carcinoma depending on the degree of dysplasia, and these types have been found to be associated with different prognoses (3). Because the risk of malignancy varies from 6 to 40% (4,5), it is crucial to accurately predict the malignant potential of IPMNs in order to choose an appropriate surveillance and management strategy.
Diffusion-weighted imaging (DWI) is a functional MRI technique that reflects Brownian motion of free water and provides a qualitative or quantitative measurement of the motion of water molecules in various diseases by measuring the apparent diffusion coefficient (ADC) using the monoexponential model (6-9). Many studies have shown that high-grade or invasive IPMNs demonstrate significantly lower ADC values than low- or moderate-grade IPMNs (10-12). Although a recent meta-analysis by Liu et al. (13) found that MRI/MRCP had the highest pooled diagnostic accuracy and DWI had the highest pooled specificity in distinguishing benign from malignant IPMNs, few details about DWI were described in that study.
Therefore, the purpose of this study was to systematically evaluate the diagnostic performance of DWI for predicting the malignant potential of pancreatic IPMNs using a meta-analysis.

Literature Search
We performed a comprehensive literature search in PubMed, Embase, and Web of Science to select original studies evaluating the accuracy of DWI in predicting the malignant potential of pancreatic IPMNs from January 1997 to February 10, 2020, according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines (14). The following search terms were used: (1) "Diffusion weighted imaging" or "diffusion-weighted" or "diffusion weighted MR" or "diffusion-weighted magnetic resonance imaging" or "DWI" or "apparent diffusion coefficient" or "ADC"; and (2) "pancreatic cyst" or "pancreatic cystic neoplasm" or "pancreatic cystic tumors" or "intraductal papillary mucinous neoplasm" or "IPMN". In addition, the reference lists of all included studies were checked and screened to ensure a comprehensive search. Two reviewers (FX and YL, with 5 and 7 years of experience) screened the literature independently; any discrepancies were resolved by discussion.

Inclusion and Exclusion Criteria
The retrieved articles were first screened according to their titles and abstracts, and the full texts of potentially eligible articles were then reviewed independently by the same two reviewers. Any discrepancies were resolved by discussion.
The inclusion criteria were as follows: 1) original studies focused on evaluating the diagnostic performance of DWI in predicting the malignant potential of pancreatic IPMNs; 2) sufficient data to calculate the 2 × 2 table including the true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs); 3) pathological results as the reference standard; and 4) articles published in English.
The exclusion criteria were as follows: 1) articles in the form of conference abstracts, reviews, case reports, editorials, letters, or animal studies; 2) studies not in the field of interest; and 3) studies with overlapping patients and data (the study with the largest study population was included).
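The 2 × 2 table required by inclusion criterion 2 is what makes per-study sensitivity and specificity computable. The short Python sketch below shows the standard definitions; the class name and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a diagnostic 2 x 2 table (index test vs. reference standard)."""
    tp: int  # true positives: malignant on DWI and on pathology
    fp: int  # false positives: malignant on DWI, benign on pathology
    fn: int  # false negatives: benign on DWI, malignant on pathology
    tn: int  # true negatives: benign on DWI and on pathology

    def sensitivity(self) -> float:
        # Proportion of pathologically malignant lesions detected by the test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of pathologically benign lesions correctly ruled out
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only
example = TwoByTwo(tp=30, fp=5, fn=10, tn=55)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

A study that reports only a summary accuracy, without enough information to recover all four counts, cannot be entered into this calculation and would therefore fail criterion 2.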

Data Extraction and Quality Assessment
The following data were extracted from the included studies: 1) study characteristics: publication year, authors, country, study period, study design, patient recruitment, blinding, reader experience, number and age of patients, number of lesions, reference standard, and time interval between the imaging test and surgery; 2) MRI techniques: vendor, scanner model, magnetic field strength, coil channels, DWI sequence, respiration, b values, slice thickness, diffusion restriction, and ADC cutoff values; and 3) diagnostic data: the numbers of TPs, FPs, FNs, and TNs.
Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool (15). Data extraction and quality assessment were performed independently by the same two reviewers, and disagreements were resolved by consensus.

Data Synthesis and Statistical Analysis
Forest plots of sensitivity and specificity were generated for the included studies. The pooled sensitivity and specificity and their 95% confidence intervals (CIs) were obtained using a bivariate random-effects model. In addition, the positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) with their 95% CIs were obtained. The summary receiver operating characteristic (SROC) curve was then constructed, and the area under the SROC curve (AUC) was computed to evaluate the value of DWI in diagnosing the malignant potential of IPMNs; diagnostic value was considered good for an AUC >0.9 and medium for an AUC from 0.7 to 0.9.
Heterogeneity among the studies was evaluated by Cochran's Q-test (P < 0.05 indicating the presence of heterogeneity) and the Higgins inconsistency index (I²) test [I² >50% indicating the presence of heterogeneity (16)]. The Spearman correlation coefficient was calculated, and a P-value less than 0.05 was taken to indicate the presence of a threshold effect.
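The likelihood ratios and the DOR are functions of sensitivity and specificity. As a rough illustration, the Python sketch below applies the standard definitions to the rounded pooled estimates given in the abstract (0.74 and 0.94). It does not reproduce the bivariate model fitted in Stata, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (PLR, NLR, DOR) from a sensitivity/specificity pair.

    PLR = sens / (1 - spec), NLR = (1 - sens) / spec, DOR = PLR / NLR.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    dor = plr / nlr
    return plr, nlr, dor


# Rounded pooled point estimates taken from the abstract
plr, nlr, dor = likelihood_ratios(0.74, 0.94)
print(f"PLR = {plr:.1f}, NLR = {nlr:.2f}, DOR = {dor:.1f}")
```

With these rounded inputs the sketch gives roughly 12.3, 0.28, and 44.6, which is close to but not identical to the reported 13.5, 0.27, and 50.0, presumably because the model works with unrounded estimates.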
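For readers unfamiliar with these heterogeneity statistics, the following Python sketch shows the usual univariate forms of Cochran's Q and I² = max(0, (Q - df)/Q) × 100. The input values are hypothetical, and the calculation is an assumption about the general technique rather than the exact routine run by the Stata module.

```python
def cochran_q(estimates: list[float], variances: list[float]) -> float:
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q: float, df: int) -> float:
    """Higgins I^2 in percent: share of total variation attributed to heterogeneity."""
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical study-level effect estimates and variances (illustration only)
estimates = [1.0, 2.0, 3.0]
variances = [0.25, 0.25, 0.25]

q = cochran_q(estimates, variances)
i2 = i_squared(q, df=len(estimates) - 1)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

In this toy example I² exceeds the 50% cutoff used above, so the three hypothetical studies would be flagged as heterogeneous.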
Data analyses were performed using the Midas module in Stata 15.0 (StataCorp, College Station, TX, USA). A value of p < 0.05 was considered statistically significant.

RESULTS
Figure 1 shows a flowchart of the selection process. A total of 123 studies were identified using the described search strategy, and 33 duplicate articles were removed. Subsequently, 85 studies were excluded for the following reasons: letter to the editor (n = 1), animal studies (n = 3), case reports (n = 14), conference abstracts (n = 6), reviews (n = 10), non-English article (n = 1), insufficient data to construct a 2 × 2 table (n = 1), not in the field of interest (n = 48), or studies with overlapping patients and data (n = 1) (10) (Figure 1). Finally, five eligible articles with six studies were included in this meta-analysis.
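The selection counts reported above can be checked with simple arithmetic. The Python sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph
identified = 123
duplicates_removed = 33

exclusions = {
    "letter to the editor": 1,
    "animal studies": 3,
    "case reports": 14,
    "conference abstracts": 6,
    "reviews": 10,
    "non-English article": 1,
    "insufficient data for 2 x 2 table": 1,
    "not in the field of interest": 48,
    "overlapping patients and data": 1,
}

screened = identified - duplicates_removed  # records left after deduplication
excluded = sum(exclusions.values())         # total across all exclusion reasons
included = screened - excluded              # articles entering the meta-analysis

print(f"screened = {screened}, excluded = {excluded}, included = {included}")
```

The per-reason exclusions sum to the reported 85, and the remainder matches the five included articles.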

Characteristics of the Included Studies
The main study and MRI features are shown in Tables 1 and 2. All studies were retrospective. A total of 307 patients with 307 lesions were included, with the number per study ranging from 35 to 132. The number of malignant IPMNs (39.7%, 122/307) ranged from 15 to 49 per study. All patients had histopathology after surgery as the reference standard. 1.5T scanners were used in two studies (11,17) and 3.0T scanners in three studies (18-20). The slice thickness of DWI was 5 mm in two articles (19,20) and 7 mm in three articles (11,17,18). The maximum b value was ≥1,000 in two articles (11,18) and <1,000 in three articles (17,19,20).

Study Quality of the Included Studies
The detailed quality assessment of the included studies is shown in Figure 2. Eighty percent of the studies (4/5) (11,17-19) had an unclear risk of bias in patient selection because they did not report the type of patient enrollment (consecutive or random). All five included studies were graded as having an unclear risk of bias in the reference standard because they did not describe whether the reference standard was applied blind. Twenty percent of the studies (1/5) (11) were graded as having an unclear risk of bias in flow and timing because of a lack of information about the interval between the MRI examination and the reference standard.
The Q test revealed no heterogeneity (Q = 3.948, p = 0.069). However, the Higgins I² test demonstrated heterogeneity in specificity (I² = 55.36%, p = 0.05) but not in sensitivity (I² = 0, p = 0.45). The Spearman correlation coefficient of DWI was 0.771 (P = 0.072), indicating that a threshold effect was absent in this meta-analysis. The Deeks' funnel plot and asymmetry test (P = 0.30 for the slope coefficient) both indicated no influence of publication bias on our meta-analysis (Figure 5).

Meta-Regression Analyses
The results of the meta-regression analyses are summarized in Table 3. Among the variables considered potential sources of heterogeneity, the slice thickness of DWI (5 vs. 7 mm; p = 0.02) and the DWI parameter (quantitative DWI vs. qualitative DWI; p = 0.01) were significant factors. Specifically, studies using a slice thickness of 5 mm showed a higher sensitivity (0.

DISCUSSION
Our study demonstrated that DWI can accurately assess the malignant potential of pancreatic IPMNs, with an overall pooled sensitivity of 74%, specificity of 94%, and AUC of 0.84. Heterogeneity was found in specificity (I² = 55.36%) but not in sensitivity (I² = 0). In meta-regression analyses, the DWI parameter and the slice thickness of DWI were significant factors affecting the diagnostic performance of DWI in predicting the malignant potential of IPMNs. The differential diagnosis of benign and malignant IPMNs is crucial for appropriate treatment and for improving prognosis. Thus, it is imperative to identify an efficient and non-invasive method to detect the malignant potential of IPMNs. DWI, as a non-invasive imaging technique, has been widely used in pancreatic disease (21-24). A recent meta-analysis by Liu et al. (13) also found that the pooled sensitivity, specificity, and AUC of DWI were 0.72, 0.97, and 0.82 in the differentiation of benign and malignant IPMNs, which is consistent with our results. MRCP is recommended for the diagnosis and follow-up of IPMNs according to international guidelines. Kawakami et al. reported that the sensitivity and specificity of MRCP in differentiating malignant from benign IPMNs were only 60.5 and 93.9%, whereas the sensitivity and specificity of MRCP combined with DWI reached 92.1 and 91.2% (25). A study by Bertagna et al. (26) showed that 18F-FDG PET or PET/CT could achieve better diagnostic performance in distinguishing benign from malignant IPMNs, with pooled sensitivity and specificity of 88 and 98%, respectively. ADC entropy obtained from histogram analysis has also been shown to be an effective predictive factor for identifying the malignant potential of IPMNs, with comparable sensitivity (100 and 80%, respectively) and specificity (70 and 70.59%, respectively) (27,28).
Therefore, more prospective studies are required to confirm the diagnostic performance of DWI combined with other advanced imaging techniques, which may help establish the final diagnosis of IPMNs.
Many studies have shown that quantitative and qualitative DWI can be used to differentiate between malignant and benign tissues or to assess tumor grade in various organs, including the lung (29,30), liver (31), gallbladder (32), pancreas (10,12,17,18), kidney (33,34), and vertebral bone marrow (35,36), because it reflects the microscopic motion of water protons at the cellular level (32,33,35,37,38). However, there is some controversy about the performance of DWI in the diagnosis of IPMNs. Fatima et al. (39) reported that all IPMNs had low-to-iso signal on DWI without mentioning the malignancy of the IPMNs, while other studies showed that malignant IPMNs demonstrated higher signal intensity and lower ADC values than benign IPMNs (10,12,17,18).
Meta-regression analysis indicated that the slice thickness of DWI and the DWI parameter were sources of study heterogeneity. In particular, the pooled sensitivity was higher in studies with a thinner slice thickness (5 mm) than in those with a thicker slice thickness (7 mm). This suggests that a thinner slice thickness (5 mm) is more appropriate for the quantitative assessment of ADC in differentiating between benign and malignant IPMNs. You et al. (32) reported that qualitative DWI, with a sensitivity of 90% and specificity of 87%, had higher diagnostic performance than quantitative DWI, with a sensitivity of 82% and specificity of 86%, in discriminating benign from malignant gallbladder lesions. However, Shen et al. (29) reported that qualitative assessment and quantitative ADC could both differentiate malignant from benign pulmonary lesions with reasonable accuracy (sensitivity: 0.88 vs 0.84; specificity: 0.75 vs 0.84), with no significant difference between them. In contrast, our results indicated that the pooled specificity of quantitative DWI was higher than that of qualitative DWI for differentiating benign from malignant IPMNs (0.97 vs 0.86, P = 0.01). This might be attributed to the subjective nature of qualitative DWI assessment. Therefore, based on our results, a slice thickness of 5 mm and quantitative DWI are strongly recommended for differentiating benign from malignant IPMNs. However, because of the limited number of included studies, further studies are needed to confirm our results.
There were several limitations in this study. First, the relatively small number of included studies, which lacked a standard method of measuring ADC values, was a major limitation and prevented calculation of diagnostic values in different patient subgroups. Second, only surgical series were included, which probably underestimated the number of benign IPMNs. Third, meta-regression was still insufficient to explain the reasons for the heterogeneity, because large heterogeneity was found between the studies. Finally, all the included studies were retrospective, which may overestimate the diagnostic performance (40). Thus, further prospective studies are needed to confirm the diagnostic performance of DWI.
In conclusion, our study demonstrated that DWI has considerable potential and value in the differential diagnosis of benign and malignant IPMNs. The slice thickness and the DWI parameter affected the diagnostic performance of DWI. More prospective studies are needed to validate the diagnostic value of DWI in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.