Value of IVIM in Differential Diagnoses between Benign and Malignant Solitary Lung Nodules and Masses: A Meta-analysis

Purpose This study aims to evaluate the accuracy of intravoxel incoherent motion diffusion-weighted imaging (IVIM-DWI) in distinguishing malignant from benign solitary pulmonary nodules and masses. Methods Studies investigating the diagnostic accuracy of IVIM-DWI in lung lesions published through December 2020 were searched. The standardized mean differences (SMDs) of the apparent diffusion coefficient (ADC), tissue diffusivity (D), pseudo-diffusivity (D*), and perfusion fraction (f) were calculated. The sensitivity, specificity, area under the curve (AUC), publication bias, and heterogeneity were then summarized, and the sources of heterogeneity and the reliability of the combined results were explored by meta-regression and sensitivity analysis. Results A total of 16 studies including 714 malignant and 355 benign lesions were included. Malignant pulmonary lesions had significantly lower ADC, D, and f values than benign lesions. The D value showed the best diagnostic performance (sensitivity = 0.90, specificity = 0.71, AUC = 0.91), followed by ADC (sensitivity = 0.84, specificity = 0.75, AUC = 0.88), f (sensitivity = 0.70, specificity = 0.62, AUC = 0.71), and D* (sensitivity = 0.67, specificity = 0.61, AUC = 0.67). There was no obvious publication bias for ADC, D, D*, or f values; heterogeneity was moderate for ADC and high for D, D*, and f values. Subgroup analysis suggested that both ADC and D values had significantly higher sensitivity in “nodules or masses” than in “nodules.” Conclusions The parameters derived from IVIM-DWI, especially the D value, could further improve the differential diagnosis between malignant and benign solitary pulmonary nodules and masses. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/#myprospero, identifier: CRD42021226664


INTRODUCTION
Lung cancer is the leading cause of cancer-related death worldwide, with 1.8 million deaths (18% of all cancer deaths) reported in 2020 (1). Lung cancer has a poor prognosis: approximately 70% of patients already have advanced disease at the time of diagnosis, and more than half of those diagnosed die within one year. The 5-year survival rate is <18% (2,3).
Early detection and characterization of solitary pulmonary lesions, especially the differentiation of benign from malignant pulmonary nodules, are important for risk assessment and management. Low-dose CT (LDCT), which uses less radiation than standard chest CT, has been proven effective in detecting early lung cancer and reducing mortality, especially among high-risk individuals (4). However, the wide application of LDCT has also increased the number of detected pulmonary nodules of uncertain malignant potential, which complicates treatment decisions (5,6). The major limitations of LDCT are (a) its inability to reliably differentiate benign from malignant pulmonary lesions (7,8), (b) its unsuitability for long-term screening programs because of cumulative radiation dose (9), and (c) its restriction to certain patients (e.g., it is not recommended for pregnant women).
Diffusion-weighted imaging (DWI) is a magnetic resonance imaging (MRI) technique that involves no ionizing radiation and requires no intravenous contrast agent. It measures the random Brownian motion of water molecules within a voxel of tissue and thereby reflects changes at the cellular level (10). The apparent diffusion coefficient (ADC) derived from DWI is usually lower in malignant lesions than in benign lesions. However, the ADC of conventional monoexponential DWI does not accurately reflect true tissue diffusivity because it is also influenced by microcirculation (11,12).
Intravoxel incoherent motion (IVIM), proposed by Le Bihan et al. (13) in 1988, separates the contributions of the random microscopic motion of water molecules and of blood microcirculation by fitting a biexponential signal model. It has recently been applied to distinguish benign from malignant pulmonary lesions with promising results. Nonetheless, the number of related studies is too small to yield reliable conclusions, so its application is still debated. Thus, the aim of this study was to systematically assess the diagnostic performance of IVIM-DWI in differentiating benign from malignant nodules and masses using meta-analysis.
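For orientation, the biexponential IVIM model is commonly written as S(b)/S0 = f·exp(−b·D*) + (1 − f)·exp(−b·D), and the monoexponential model as S(b)/S0 = exp(−b·ADC). The Python sketch below simulates an IVIM signal and recovers D, D*, and f with a segmented (two-step) fit, which is only one of several fitting strategies in use. The b-value set, the 200 s/mm² split point, and the tissue parameters are illustrative assumptions, not values from the included studies. A two-point ADC is computed from the same signal to show how the perfusion term pushes ADC above D.

```python
import math

# Illustrative (assumed) parameters; D and D* in mm^2/s, b in s/mm^2.
B_VALUES = [0, 10, 20, 50, 100, 200, 400, 600, 800]
TRUE_D, TRUE_D_STAR, TRUE_F = 1.0e-3, 20.0e-3, 0.15


def ivim_signal(b, s0, d, d_star, f):
    """Biexponential IVIM model: perfusion (D*) plus tissue diffusion (D)."""
    return s0 * (f * math.exp(-b * d_star) + (1 - f) * math.exp(-b * d))


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def segmented_ivim_fit(bvals, signals, b_split=200):
    """Two-step fit (an assumed strategy, not the included studies' method).

    1) For b >= b_split the perfusion term is treated as negligible, so
       ln S(b) ~ ln[S0 * (1 - f)] - b * D; D comes from the slope.
    2) f comes from the extrapolated intercept relative to S(0).
    3) D* comes from a log-linear fit of the low-b residual signal.
    """
    s0 = signals[bvals.index(0)]
    high = [(b, s) for b, s in zip(bvals, signals) if b >= b_split]
    slope, intercept = linear_fit([b for b, _ in high],
                                  [math.log(s) for _, s in high])
    d = -slope
    f = 1 - math.exp(intercept) / s0
    # Residual after removing the tissue term ~ S0 * f * exp(-b * D*).
    low = [(b, s - s0 * (1 - f) * math.exp(-b * d))
           for b, s in zip(bvals, signals) if 0 < b < b_split]
    slope_p, _ = linear_fit([b for b, _ in low],
                            [math.log(r) for _, r in low])
    return d, -slope_p, f


def adc_two_point(s0, sb, b):
    """Monoexponential ADC from the b = 0 signal and one higher b value."""
    return math.log(s0 / sb) / b


signals = [ivim_signal(b, 1000.0, TRUE_D, TRUE_D_STAR, TRUE_F) for b in B_VALUES]
d, d_star, f = segmented_ivim_fit(B_VALUES, signals)
adc = adc_two_point(signals[0], signals[-1], B_VALUES[-1])
print(f"D={d:.2e}  D*={d_star:.2e}  f={f:.3f}  ADC(b=800)={adc:.2e}")
```

With noise-free input the fit recovers the simulated values closely; with real, noisy lung data a segmented fit may be sensitive to the chosen split point and to the number of low b values.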

Literature Search
Studies published through December 2020 in English or Chinese were searched in the PubMed, Web of Science, Cochrane Library, and China National Knowledge Infrastructure databases using the following keywords: (Lung Neoplasm OR Pulmonary Neoplasm OR Lung Cancer OR Pulmonary Cancer) AND (Intravoxel Incoherent Motion OR IVIM OR multiple b-value DWI OR biexponential). The reference lists of eligible studies were also searched manually.

Study Selection
The following inclusion criteria were applied: (a) IVIM-DWI was used to differentiate benign from malignant solitary pulmonary nodules and masses; (b) the main purpose of the study was to explore the diagnostic performance of IVIM-DWI; (c) pathological findings were used as the reference standard for diagnosis; and (d) sensitivity and specificity were reported, or enough information was provided to calculate the numbers of true-positive (TP), false-negative (FN), false-positive (FP), and true-negative (TN) results. The exclusion criteria were as follows: (a) reviews, meta-analyses, conference abstracts, or dissertations; (b) duplicate publications of the same data from the same institution; and (c) animal experiments.

Data Extraction
The mean values and standard deviations (SDs) of ADC, D, D*, and f, together with the sensitivity, specificity, threshold, and area under the curve (AUC) describing the diagnostic performance of each parameter, were extracted. Other information, including the first author, year of publication, study design, number and age of patients, field strength and vendor, b values, repetition time, and echo time, was also recorded. Data extraction was performed by one author and checked by another. When a study reported the numbers of malignant and benign lung lesions together with sensitivity and specificity, the TP, FP, FN, and TN counts were back-calculated.
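The back-calculation of a 2×2 table can be sketched as follows, assuming sensitivity and specificity are reported as proportions and that counts are rounded half-up to the nearest integer (the rounding rule is our assumption; it is not stated in the text). The function name and the study sizes in the example are hypothetical.

```python
import math


def reconstruct_2x2(n_malignant, n_benign, sensitivity, specificity):
    """Back-calculate TP, FP, FN, TN from lesion counts and reported accuracy.

    Malignant lesions are the condition-positive group, so
    TP = sensitivity * n_malignant and TN = specificity * n_benign.
    Counts are rounded half-up to the nearest integer (an assumption).
    """
    for p in (sensitivity, specificity):
        if not 0.0 <= p <= 1.0:
            raise ValueError("sensitivity and specificity must be in [0, 1]")
    tp = math.floor(sensitivity * n_malignant + 0.5)
    tn = math.floor(specificity * n_benign + 0.5)
    fn = n_malignant - tp
    fp = n_benign - tn
    return tp, fp, fn, tn


# Hypothetical study: 50 malignant and 20 benign lesions, sens 0.90, spec 0.75.
tp, fp, fn, tn = reconstruct_2x2(50, 20, 0.90, 0.75)
print(tp, fp, fn, tn)  # 45 5 5 15
```

Recomputing TP / (TP + FN) and TN / (TN + FP) from the reconstructed table is a quick check that rounding has not shifted the reported values noticeably.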

Quality Assessment
We assessed the quality of each included study using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool (14), which covers four domains (patient selection, index test, reference standard, and flow and timing); the signaling questions in each domain are answered with "yes," "no," or "unclear." In our study, IVIM-DWI was defined as the index test and histopathologic confirmation as the reference standard. All assessment results were entered into RevMan version 5.3 (The Nordic Cochrane Centre, Copenhagen, Denmark).

Publication Bias and Heterogeneity Evaluation
Publication bias of continuous variables was assessed with funnel plots, Begg's test, and Egger's test; publication bias of diagnostic performance was assessed with Deeks' funnel plot asymmetry test in Stata version 14.0 (StataCorp, College Station, TX, USA). An asymmetric funnel plot, or p < 0.05 in Begg's test or Deeks' test, was taken to indicate possible publication bias (15). Heterogeneity among the included studies was evaluated with the inconsistency index (I²) and Cochran's Q-test. A random-effects model was used for pooling when I² > 50% or p < 0.05 for Cochran's Q-test (suggesting statistically significant heterogeneity); a fixed-effects model was used when I² < 50% (16).
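The heterogeneity statistics and the model-selection rule above can be sketched in Python. Here Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect estimate, I² = max(0, (Q − df)/Q) × 100%, and the Q-test p-value comes from a closed-form chi-square tail for integer degrees of freedom. For the random-effects branch we assume the DerSimonian-Laird estimator of between-study variance; whether this matches the exact settings used in RevMan or Stata for this analysis is an assumption. The effect sizes are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def cochran_q_i2(effects, variances):
    """Cochran's Q, its p-value, and I^2 (%) from inverse-variance weights."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, chi2_sf(q, df), i2


def pool(effects, variances):
    """Pick the model with the stated rule, then pool (DerSimonian-Laird if random)."""
    q, p, i2 = cochran_q_i2(effects, variances)
    model = "random" if (i2 > 50 or p < 0.05) else "fixed"
    w = [1 / v for v in variances]
    tau2 = 0.0
    if model == "random":
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    ws = [1 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(ws, effects)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return model, est, (est - 1.96 * se, est + 1.96 * se), i2, p


# Hypothetical per-study SMDs and their variances.
effects = [-1.2, -0.8, -1.5, -0.3]
variances = [0.04, 0.05, 0.06, 0.05]
model, est, (lo, hi), i2, p = pool(effects, variances)
print(f"{model} effects: SMD {est:.2f} [{lo:.2f}, {hi:.2f}], I2 = {i2:.0f}%, p = {p:.4f}")
```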

Meta-Regression and Subgroup Analysis
The threshold effect, one of the primary causes of heterogeneity in diagnostic accuracy studies, was assessed with Meta-DiSc version 1.4 (Universidad Complutense, Madrid, Spain) using the Spearman correlation between the logit of sensitivity and the logit of 1 − specificity; p < 0.05 indicated a potential threshold effect. If heterogeneity arising from a threshold effect was found, data were pooled by fitting a hierarchical summary receiver operating characteristic (HSROC) curve, and performance was summarized by the area under that curve (AUC). Factors other than the threshold effect may also contribute to heterogeneity in diagnostic accuracy studies, so meta-regression was used to explore whether study design, lesion type, or machine type significantly influenced diagnostic performance. Pooling within homogeneous subgroups was performed only if heterogeneity was related to these other factors rather than to the threshold effect. Sensitivity analysis was used to evaluate the stability and reliability of the combined results and whether they were unduly influenced by any single study; it was carried out in Stata version 14.0 by omitting one study at a time.
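The threshold-effect check can be sketched in a few lines of Python. This is an illustrative reimplementation, not Meta-DiSc's code; the per-study sensitivities and specificities are hypothetical, and the p-value step is omitted. Because Spearman's coefficient depends only on ranks and the logit is monotonic, applying the logit does not change rho.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1 - s) for s in specificities]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study estimates: sensitivity rises as specificity falls,
# the pattern a threshold effect would produce (strong positive rho).
sens = [0.78, 0.92, 0.85, 0.70, 0.95, 0.88]
spec = [0.80, 0.60, 0.72, 0.85, 0.55, 0.70]
print(f"Spearman rho = {threshold_effect_rho(sens, spec):.2f}")
```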

Data Synthesis
Forest plots were used for continuous variables, and the standardized mean difference (SMD) between malignant and benign lesions was calculated with RevMan version 5.3. Diagnostic performance measures, including sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and AUC, were pooled with a bivariate regression model in Stata version 14.0. Likelihood ratios and post-test probabilities are also clinically important (17) because they express the probability that a patient has the disease given the result of an MRI parameter (18). Summary receiver operating characteristic (SROC) curves and Fagan's nomograms were used to evaluate the diagnostic value and predict the post-test probabilities of ADC, D, D*, and f values.
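As an illustration of the SMD step, the sketch below computes Hedges' g (the small-sample-corrected SMD) for one study, using the usual large-sample variance approximation. RevMan's SMD is generally described as Hedges' adjusted g, but its exact variance formula may differ slightly from the textbook version assumed here. The means, SDs, and group sizes are hypothetical, in units of 10⁻³ mm²/s; a negative value means lower values in malignant lesions.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD (group 1 minus group 2) with Hedges' small-sample correction."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)      # correction factor J = 1 - 3/(4*df - 1)
    g = j * d
    # Large-sample variance of d, scaled by J^2 (textbook approximation).
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return g, j ** 2 * var_d


# Hypothetical ADC values: malignant (group 1) vs. benign (group 2).
g, var_g = hedges_g(1.20, 0.30, 30, 1.60, 0.40, 20)
se = math.sqrt(var_g)
print(f"SMD = {g:.2f} [95% CI {g - 1.96 * se:.2f}, {g + 1.96 * se:.2f}]")
```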

Literature Search and Selection
A total of 310 studies were identified, of which 133 duplicates were excluded. Screening of titles and abstracts led to the exclusion of 139 additional studies (reviews, meta-analyses, dissertations, or studies in which IVIM-DWI was not the main diagnostic measurement). The full texts of the remaining 38 studies were reviewed in detail, and 22 were excluded for the following reasons: (a) insufficient data, (b) low quality, or (c) IVIM-DWI applied for other purposes. Finally, 16 eligible studies (19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34) comprising 714 malignant and 355 benign lesions were included in the analysis. A flowchart of the study selection process is shown in Figure 1. The basic information and diagnostic performance of each study are presented in Tables 1 and 2, respectively.

Quality Assessment
The results of the QUADAS-2 assessment are shown in Figure 2. The overall quality of the included studies was acceptable. Six studies were rated "unclear" in the patient selection domain because their patient selection method was not clearly described. In the index test domain, six studies were rated "unclear" or "high risk of bias" because of uncertainty about how the results were interpreted, and four because it was uncertain whether the threshold used had been prespecified. Applicability concerns for the index test were unclear because threshold values of some parameters were missing (n = 5). Eight studies were rated "unclear" in the reference standard domain because it was unclear whether the reference standard was interpreted blinded to the index test. In the flow and timing domain, 10 studies were rated "unclear" or "high risk of bias" because it was unclear whether there was an appropriate interval between the index test and the reference standard; four of these ten studies were rated high risk of bias, three (25,30,33) because the reference standard was not applied consistently and one (28) because four patients were excluded from the statistical analysis. In the heterogeneity analysis, Cochran's Q-test suggested moderate heterogeneity in ADC (I² = 66%, p = 0.001) and high heterogeneity in the D value (I² = 74%, p < 0.001), D* value (I² = 82%, p < 0.001), and f value (I² =

Differential Diagnosis of Solitary Pulmonary Nodules and Masses by ADC
Eleven studies on ADC applied in differentiating solitary pulmonary nodules and masses were included in the analysis. The forest plot in Figure 4 presents the distribution of ADC values between the malignant and benign lesions.

Differential Diagnosis of Solitary Pulmonary Nodules and Masses by the D Value
Thirteen studies on the D value used to distinguish solitary pulmonary nodules and masses were included for analysis. The forest plot in Figure 5 presents the distribution of the D value between the malignant and benign lesions. An SMD of −1.08 [95% CI, −1.41, −0.76] (p < 0.001) between the malignant and benign lesions was calculated by a random-effects model.

Differential Diagnosis of Solitary Pulmonary Nodules and Masses by the D* Value
Twelve studies on the D* value used to differentiate solitary pulmonary nodules and masses were included for analysis. The forest plot in Figure 6 presents the distribution of the D* value between the malignant and benign nodules and masses. An SMD of −0.02 [95% CI, −0.41, 0.37] (p > 0.05) between the malignant and benign nodules and masses was calculated by a random-effects model.

Differential Diagnosis of Solitary Pulmonary Nodules and Masses by the f Value
Thirteen studies on the f value used to differentiate solitary pulmonary nodules and masses were included for analysis. The forest plot in Figure 7 presents the distribution of the f value between the malignant and benign lesions. An SMD of −0.44 [95% CI, −0.7, −0.09] (p < 0.05) between the malignant and benign lesions was calculated by a random-effects model.

Meta-Regression and Subgroup Analysis
There was no statistically significant influence of study design, machine type, or lesion type on the pooled ADC, D, D*, and f values (all p > 0.05). There was a significant difference concerning study design in the D value
We then used meta-regression to explore potential factors other than the threshold effect that could explain the heterogeneity of ADC, D, D*, and f values (Table 3). For ADC, study design ("prospective design" vs. "retrospective design") (p < 0.001), lesion type ("nodules" vs. "nodules or masses") (p < 0.001), and machine type ("3.0 T" vs. "1.5 T") (p < 0.01) had statistically significant effects on pooled sensitivity. For the D value, lesion type had a statistically significant effect on pooled sensitivity (p < 0.001). For the f value, machine type had a statistically significant effect on pooled sensitivity (p < 0.05). Table 4 shows the combined DOR and 95% CI obtained after omitting one study at a time. The combined DOR did not change significantly regardless of which study was omitted, indicating that the result was not overly dependent on any single study and that the conclusion was stable.
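The leave-one-out procedure reported in Table 4 can be sketched as follows. The analysis in the text pooled DOR with a bivariate model in Stata; for brevity this illustration instead pools log DORs univariately with a DerSimonian-Laird random-effects estimator and adds 0.5 to every cell of any table containing a zero. Both simplifications are our assumptions, and the 2×2 tables are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance (0.5 added if any cell is 0)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pool_dersimonian_laird(ys, vs):
    """Random-effects pooled estimate and its standard error."""
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    ws = [1 / (v + tau2) for v in vs]
    est = sum(wi * yi for wi, yi in zip(ws, ys)) / sum(ws)
    return est, math.sqrt(1 / sum(ws))


def leave_one_out(tables):
    """Pooled DOR and 95% CI after omitting each study in turn."""
    out = []
    for i in range(len(tables)):
        ys, vs = zip(*(log_dor(*t) for j, t in enumerate(tables) if j != i))
        est, se = pool_dersimonian_laird(ys, vs)
        out.append((i, math.exp(est),
                    math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)))
    return out


# Hypothetical (TP, FP, FN, TN) tables for five studies.
TABLES = [(45, 5, 5, 15), (30, 8, 6, 20), (60, 10, 12, 25),
          (20, 4, 3, 11), (38, 9, 7, 18)]
for omitted, dor, lo, hi in leave_one_out(TABLES):
    print(f"omit study {omitted + 1}: DOR = {dor:.1f} [{lo:.1f}, {hi:.1f}]")
```

A stable analysis shows overlapping intervals and no single omission that moves the pooled DOR far from the others.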

Diagnostic Performance
The results of pooled analysis of ADC, D, D*, and f values are shown in Table 5. Deeks' funnel plots (Figure 8

Post-Test Probabilities
Fagan's nomograms of ADC, D, D*, and f values were used to predict post-test probabilities (Figure 10). All pretest probabilities were set to the default of 20%. A low ADC or D value was treated as a positive result (suggesting a malignant nodule or mass), and a high ADC or D value as a negative result (suggesting a benign lesion). For ADC, the post-test probability increased to 46% with a PLR of 3.0 and decreased to 5% with an NLR of 0.21. Thus, a low ADC increased the probability of a malignant pulmonary lesion, whereas a high ADC reduced that probability considerably, to 5%. Likewise, for the D value, the post-test probability was 44% after a positive result with a PLR of 3.0 and fell to 3% after a negative result with an NLR of 0.14. For the D* value, the post-test probability was 30% after a positive result with a PLR of 2.0 and 12% after a negative result with an NLR of 0.54. For the f value, the post-test probability was 32% after a positive result with a PLR of 2.0 and 11% after a negative result with an NLR of 0.48. These data suggest that both ADC and IVIM parameters can improve the accuracy of differentiating benign from malignant pulmonary nodules and masses.
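The arithmetic behind a Fagan nomogram is Bayes' theorem in odds form: post-test odds = pretest odds × likelihood ratio. The sketch below derives PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity from the pooled point estimates given in the abstract and applies them to the 20% pretest probability. Rounded to whole percentages, it reproduces the post-test probabilities reported above; deriving likelihood ratios directly from rounded pooled sensitivity and specificity is a simplification of the bivariate model.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Bayes' theorem in odds form, as used by Fagan's nomogram."""
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


POOLED = {  # parameter: (sensitivity, specificity), from the abstract
    "ADC": (0.84, 0.75),
    "D": (0.90, 0.71),
    "D*": (0.67, 0.61),
    "f": (0.70, 0.62),
}

PRETEST = 0.20
results = {}
for name, (sens, spec) in POOLED.items():
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    pos = post_test_probability(PRETEST, plr)
    neg = post_test_probability(PRETEST, nlr)
    results[name] = (round(100 * pos), round(100 * neg))
    print(f"{name:>3}: PLR={plr:.2f} -> {pos:.0%}, NLR={nlr:.2f} -> {neg:.0%}")
```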

DISCUSSION
This meta-analysis assessed the diagnostic performance of IVIM-DWI for the differential diagnosis of solitary pulmonary masses and nodules. The pooled results suggested that the D value of IVIM-DWI had better diagnostic performance than the monoexponential ADC value. In addition to pooling, the threshold effect, meta-regression, subgroup analysis, and sensitivity analysis were all assessed, which helps support the robustness of the pooled outcomes.
The SMDs showed that ADC, D, and f values were significantly lower in malignant lesions than in benign lesions. The ADC value quantitatively expresses the diffusion characteristics of tissue and is associated with tissue cellularity, cell density, and the extracellular-intracellular composition. The lower ADC of malignant tissue usually results from a microstructural environment of dense cell membranes, larger nuclei, and higher cellular density, which together act as barriers to diffusion (35).
The D value, which represents the pure diffusion coefficient, correlates negatively with tumor cellularity (36). In IVIM theory, the D* value is proportional to the blood velocity and the capillary segment length (13). An increased D* value in lung cancer may result from angiogenesis of immature vessels, which increases blood flow velocity and capillary segment length (20). Because increased angiogenesis is also characteristic of some benign lesions, a higher D* may also be seen in benign lesions (37). The f value primarily reflects the proportion of perfusion in the total diffusion signal of the tumor and can be regarded as the volume fraction of capillary blood within the voxel (38). Previous studies reported a higher f value in benign lesions than in malignant lesions (20,30), possibly because many benign lesions are inflammatory granulomas or sclerosing hemangiomas with hypervascular features. However, T2 relaxation effects may also influence the f value. In addition, previous studies have found that the f value does not differ significantly between lung cancer and benign lesions (20,28).
The homogeneity test showed moderate or high heterogeneity in the sensitivity or specificity of each parameter. In this situation, pooling sensitivity or specificity alone is insufficient; the sources of heterogeneity, including the threshold effect, must also be explored, and this meta-analysis therefore investigated them. No threshold effect was found by Spearman correlation analysis, suggesting that other sources caused the heterogeneity. We therefore explored study design, lesion type, and machine type as potential factors in the meta-regression analysis.
Statistically significant effects were found on the pooled sensitivity of ADC for study design ("retrospective study" vs. "prospective study"), lesion type ("nodules" vs. "nodules or masses"), and machine type ("3.0 T" vs. "1.5 T"); on the pooled sensitivity of the D value for lesion type; and on the pooled sensitivity of the f value for machine type. This indicates that these factors may contribute to heterogeneity. Furthermore, in the subgroup analysis of ADC, significant differences in sensitivity were found for study design, lesion type, and machine type: ADC had higher sensitivity in "retrospective study" than in "prospective study," in "nodules or masses" than in "nodules," and with "3.0 T" than with "1.5 T" MRI scanners. In the subgroup analysis of the D value, a statistically significant difference in sensitivity was found for lesion type, with higher sensitivity in "nodules or masses" than in "nodules." In the subgroup analysis of the f value, a significant difference in sensitivity was found for machine type, with higher sensitivity with "1.5 T" than with "3.0 T" MRI scanners. The combined DOR did not change significantly when any single study was excluded, indicating that the results of our meta-analysis were generally stable and reliable.
Study design likely contributed to heterogeneity, since bias and confounding are more common in retrospective than in prospective studies (39). "Nodules or masses" showed higher sensitivity than "nodules" for both ADC and D values. Pulmonary nodules are defined as focal opacities measuring up to 3 cm in diameter, whereas pulmonary masses measure more than 3 cm. Regier et al. (40) found that the sensitivity of DWI was only 43.8% for nodules ≤5 mm in diameter, increasing to 86.4% for nodules 6-9 mm and 97% for nodules ≥10 mm. Moreover, Jiang and colleagues (19) suggested that motion, susceptibility artifacts, and the partial volume effect have an obvious impact on smaller lung lesions. Jiang et al. (41) reported that a nodule diameter smaller than 2 cm or a lower lung zone location negatively affected reproducibility. In addition, Koo et al. (42) found that most nodules (74%) in their study were <2 cm and that nearly half of the lesions were in the lower lobes. The "3.0 T" MRI scanners showed higher sensitivity for ADC than the "1.5 T" scanners, whereas the sensitivity for the f value was higher with "1.5 T" scanners. Ohba et al. (43) reported that 1.5 T and 3.0 T MRI scanners performed similarly in assessing malignant pulmonary nodules. Schmidt et al. (44), in a comparative study of high-resolution whole-body MRI applications at 1.5 T and 3.0 T, concluded that 1.5 T MRI was the preferred imaging modality.
Apart from the factors above, the influence of b values on heterogeneity cannot be ignored. Increasing the number of b values has been reported to improve the separation of diffusion and perfusion (45), and low b values are important for capturing perfusion-sensitive information (46). However, the number and range of b values used in published studies varied substantially, reflecting a clear lack of consensus. It is therefore essential to reach a consensus on the number and range of b values in future research.
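To make the role of low b values concrete, the sketch below evaluates the biexponential IVIM model at several b values and reports the share of the signal contributed by the perfusion term. The tissue parameters (f = 0.15, D* = 20 × 10⁻³ mm²/s, D = 1.0 × 10⁻³ mm²/s) are assumed for illustration only and are not taken from the included studies. Under these assumptions the perfusion share drops below 1% by b = 200 s/mm², so D* and f can only be estimated from signal acquired at lower b values.

```python
import math

F, D_STAR, D = 0.15, 20e-3, 1.0e-3   # assumed values; D, D* in mm^2/s


def perfusion_share(b, f=F, d_star=D_STAR, d=D):
    """Fraction of the IVIM signal at b (s/mm^2) contributed by perfusion."""
    perf = f * math.exp(-b * d_star)
    tissue = (1 - f) * math.exp(-b * d)
    return perf / (perf + tissue)


for b in (0, 10, 25, 50, 100, 200, 400, 800):
    print(f"b = {b:>3} s/mm^2: perfusion share = {perfusion_share(b):.2%}")
```

Because D* is much larger than D, the ratio of perfusion to tissue signal falls exponentially with b, which is why sampling several b values in the low range matters more for f and D* than adding high b values.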
Moreover, advanced MRI technologies, such as respiratory triggering and advanced navigation platforms, should be applied to overcome motion and breathing artifacts, as well as the susceptibility artifacts caused by interfaces between different tissues and the overall low proton density of the lung.
This study has several limitations. First, this meta-analysis was based only on published studies, which might have led to overestimation of the true effect. Second, because only a limited number of publications included patients with solitary pulmonary nodules, we were unable to analyze the diagnostic performance of IVIM-DWI according to nodule size. Third, although meta-regression identified several factors contributing to heterogeneity, it could not fully account for heterogeneity arising from differences in scanning methods and acquisition protocols. Variables such as b values, cutoff values, repetition time (TR), and echo time (TE) varied too widely across studies to permit subgroup analysis.

CONCLUSION
Overall, the pooled results suggest that IVIM-DWI could be a valuable technique for evaluating pulmonary nodules and masses. To our knowledge, this is the first meta-analysis to explore heterogeneity related to lesion type (nodules vs. masses). The diagnostic performance in the subgroup of studies including masses or nodules was superior to that in studies that included only nodules. With continuing progress in MRI scanner hardware and sequence development, IVIM-DWI might become an alternative diagnostic technique for differentiating malignant from benign pulmonary masses and nodules.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
YC, QH, and ZX: critical revision of the manuscript, substantial contribution toward the conception and design of the work, and manuscript drafting. YC, QH, ZH, and ML: acquisition, analysis, and interpretation of the data. YC, ZA, YL, HY, and MW: revising the manuscript critically. All authors contributed to the article and approved the submitted version.