A new biomarker combining multimodal MRI radiomics and clinical indicators for differentiating inverted papilloma from nasal polyp that may invade the olfactory nerve

Background and purpose Inverted papilloma (IP) and nasal polyp (NP), two benign lesions, are difficult to distinguish on MRI and clinically, especially when predicting whether the olfactory nerve is damaged, which is an important aspect of treatment and prognosis. We aimed to establish a new biomarker to distinguish IP and NP that may invade the olfactory nerve and to analyze its diagnostic efficacy.
Materials and methods A total of 74 cases of IP and 55 cases of NP were collected. A total of 80% of the 129 patients were used as the training set (59 IP and 44 NP); the remainder were used as the testing set. As a multimodal study (two MRI sequences and clinical indicators), preoperative MR images including T2-weighted magnetic resonance imaging (T2-WI) and contrast-enhanced T1-weighted magnetic resonance imaging (CE-T1WI) were collected. Radiomic features were extracted from the MR images. The least absolute shrinkage and selection operator (LASSO) regression method was then used to reduce the high degree of redundancy and irrelevance among the features. Subsequently, the radiomics model was constructed with the rad-score formula. The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the model were calculated. Finally, decision curve analysis (DCA) was used to evaluate the clinical practicability of the model.
Results There were significant differences in age, nasal bleeding, and hyposmia between the two lesions (p < 0.05). In total, 1,906 radiomic features were extracted from the T2-WI and CE-T1WI images. After feature selection, 12 key features were used to build the model. The AUC, sensitivity, specificity, and accuracy of the optimal model on the testing cohort were 0.9121, 0.828, 0.9091, and 0.899, respectively.
Conclusion A new biomarker combining multimodal MRI radiomics and clinical indicators can effectively distinguish between IP and NP that may invade the olfactory nerve, which can provide a valuable decision basis for individualized treatment.


Introduction
Inverted papilloma (IP) is a common benign epithelial-derived tumor of the nasal cavity and sinuses, accounting for approximately 0.5-4.0% of nasal tumors (1); complete surgical excision is crucial for treatment efficacy and prognosis (2). Nasal polyp (NP), also known as polypoid degeneration, has a high clinical incidence and can be treated by nasal irrigation or nasal endoscopic surgery combined with glucocorticoid medication (3). Some studies have reported that approximately 60% of patients require multiple intraoperative biopsies before an accurate pathological diagnosis can be made (4). Accurate preoperative diagnosis is therefore critical for patients' treatment. Both lesions can manifest as persistent nasal congestion, runny nose, nasal bleeding, facial pain, and hyposmia, making them clinically difficult to distinguish (5). Hyposmia is often overlooked in the clinical treatment of IP and NP. IP is characterized by local aggressiveness, a high recurrence rate, and malignant transformation, and it easily invades the olfactory nerve through the skull base (2). NP is often accompanied by chronic rhinosinusitis, and histological changes of the mucous membrane secondary to the inflammatory process may reduce the number of olfactory neurons (6). Even after the lesion is surgically removed, the patient's olfaction is not restored, which can seriously affect quality of life; intervening in advance in IP and NP that may invade the olfactory nerve is therefore beneficial to the patient's prognosis.
In recent years, the integration of artificial intelligence (AI), computer technology, and medical imaging in the context of big data has led to the rapid development of radiomics. Radiomics (7) refers to extracting a large number of advanced quantitative features from images to extend conventional image reading and, after tumor segmentation and feature extraction, applying suitable machine learning algorithms to construct predictive models. These quantitative features differ from what is perceived visually and can aid physicians in making rapid diagnoses. In this field, machine learning (ML) algorithms are used to select the best features and to develop and improve models, which has the potential to improve predictive power (8). In the last 2 years, artificial intelligence in IP has gradually become a hot research topic (9)(10)(11)(12)(13). In one study, Li et al. (14) designed a deep learning framework based on convolutional neural networks to automatically identify IP and NP, with a high AUC of 0.95. In another study, Ren et al. (15) used a deep convolutional neural network (CNN) that combines a densely connected convolutional network (DenseNet) and a squeeze-and-excitation network (SENet) to classify IP and NP on CT and achieved a relatively high diagnostic value. Although the models in these two studies achieved excellent results, they did not analyze IP and NP from a clinical perspective. Compared with CT, MRI, one of the common examination methods for sinus tumors, has the advantages of no radiation, high soft tissue resolution, and multiplanar imaging, and it clearly shows the signal changes of the internal structure of the tumor. IP and NP frequently show a lobulated shape with hyperintensity on T2-WI and isointensity to hypointensity on T1-WI (16). The convoluted cerebriform pattern (CCP) is a reliable MRI feature of IP on CE-T1-WI (17), but not all IPs have such characteristics (18).
There are no relevant studies discussing the construction of machine learning models based on multi-parameter MRI to distinguish between the two lesions.
In this study, we aimed to use multimodal MRI sequences of the nasal cavity (T2WI, CE-T1WI) to construct radiomics models combined with clinical indicators, to identify IP and NP that may invade the olfactory nerve effectively and with high accuracy, and thereby to provide more comprehensive information for treatment planning.

Patients
This retrospective study included two groups of patients who underwent an MRI examination in the Second Hospital of Jilin University from March 2014 to May 2020 and were confirmed as NP or IP by pathological diagnosis. Ethical approval was obtained, and the informed consent requirement was waived by our institutional review board. Inclusion criteria were as follows: (1) As shown in Figure 1, a total of 170 patients' data were collected from the hospital database, and a total of 129 patients were included in the study. Clinical indicators included persistent nasal congestion, runny nose, nasal bleeding, facial pain, and hyposmia. In addition, individual cases were found to have decreased sense of hearing and tinnitus, considered to be caused by an oversized inverted papilloma compressing the eustachian tube or by inflammatory infiltration, so these symptoms were also included. Based on the inclusion criteria, a total of 129 patients, including 74 patients with NP (47 men and 27 women: 43.55 ± 16.81 years; range, 14-85 years) and 55 patients with IP (36 men and 19 women: 52.72 ± 10.44 years; range, 12-82 years), were randomly assigned to a training or testing cohort to explore and validate the diagnostic performance of the model in distinguishing NP from IP.
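The random assignment described above can be sketched as a per-group 80/20 split. This is only an illustration: the text does not say whether the split was stratified, so stratification, the seed, the helper name `stratified_split`, and the placeholder labels `group_a` and `group_b` are all assumptions. Only the group sizes (74 and 55) and the 80% fraction come from the text.

```python
import random

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of each class."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is repeatable
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Group sizes from the text: 74 patients in one lesion group, 55 in the other.
labels = ["group_a"] * 74 + ["group_b"] * 55
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 103 26
```

Under these assumptions the training set holds 59 and 44 patients from the two groups, which matches the training-set sizes given in the abstract.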

Image acquisition
The equipment for image acquisition was a 3 T MRI system (Discovery MR750; GE Healthcare, Waukesha, WI). Axial T2WI (TR/TE 3,000-3,500/120-130 ms, NEX 1) images were obtained by conventional plain scan examination. Then, patients were given an injection of 0.1 mmol/kg of contrast-enhancing agent (gadopentetate glucosamine), and axial CE-T1WI images were obtained with an adjusted layer thickness of 4-5 mm, a layer spacing of 0.5 mm, a matrix of 320 × 256, and a field of view (FOV) of 20 cm × 20 cm.

Image preprocessing
The flowchart of image processing, feature extraction, feature selection, and model construction is given in Figure 2A. In this study, open-source medical image processing software (3D Slicer, version 4.11.0) was used as the analysis platform. First, DICOM data of the qualified axial T2-WI and CE-T1WI original scanning images were imported into 3D Slicer. The region of interest (ROI) was then drawn slice by slice, with the tumor contour determined on the CE-T1WI image, and the CE-T1WI image was used as a reference when delineating the T2-WI image.
Frontiers in Neurology 03 frontiersin.org
Two radiologists, with 8 and 20 years of experience, respectively, worked together to assess image quality and confirm the location of the primary tumor. Disagreements were resolved by discussion between the two radiologists. The intraclass correlation coefficient (ICC) of the radiomic features assessed by the different radiologists was calculated. Features with an ICC greater than 0.80 were considered to be in good agreement and were retained for further analysis (19). These two radiologists were blinded
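The agreement check can be illustrated with a small ICC calculation. The text does not state which ICC form was used; the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name `icc_2_1` and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): rows are subjects (lesions), columns are raters."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical values of one radiomic feature measured by two readers.
reader_a = [0.82, 1.10, 0.95, 1.30, 0.70]
reader_b = [0.80, 1.12, 0.97, 1.28, 0.73]
icc = icc_2_1(list(zip(reader_a, reader_b)))
print(f"ICC = {icc:.3f}, keep feature: {icc > 0.80}")
```

A feature would pass the filter described above only if its ICC exceeds 0.80.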

Features extraction
3D Slicer automatically generated a 3D volume of interest and saved it in nii or nii.gz format. The features were then extracted using a radiomics module (Pyradiomics) based on Python (version 3.7.10). Based on the MRI images, we carried out pre-processing and image transformation. Several image preprocessing methods were used, including Wavelet, LoG, and Square, and their parameters were tuned. After image processing, we obtained a total of 1,906 features from each ROI, as shown in Figure 3, in three categories: first-order features, shape features, and texture features (20).
FIGURE 1 Filter flowchart. Through the inclusion and exclusion criteria of each step, a total of 129 patients were enrolled in the study and randomly assigned to training or testing sets.
First-order features (396 features) use basic first-order statistics such as mean, variance, entropy, and standard deviation to describe the pixel intensity and distribution within the ROI. Shape features (14 features) describe the shape and size information of the ROI in 2D or 3D, such as volume, diameter, and roundness. Texture features mainly include gray level emphasis, gray level nonuniformity, gray level nonuniformity normalized, gray level variance, and gray level run emphasis, which describe the gray-level relationship between a pixel and its surrounding pixels.
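To make the first-order statistics named above concrete, the following minimal sketch computes mean, variance, standard deviation, and histogram entropy for a list of ROI intensities. It is not the Pyradiomics implementation; the function name, the bin width of 25, and the intensity values are assumptions chosen for illustration.

```python
import math
from collections import Counter

def first_order_features(intensities, bin_width=25.0):
    """Mean, variance, standard deviation, and histogram entropy of ROI voxels."""
    n = len(intensities)
    mean = sum(intensities) / n
    variance = sum((x - mean) ** 2 for x in intensities) / n  # population variance
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = Counter(math.floor(x / bin_width) for x in intensities)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "entropy": entropy,
    }

# Hypothetical voxel intensities inside one ROI.
roi = [100, 110, 120, 130, 180, 190, 200, 210]
features = first_order_features(roi)
print(features)
```

A more heterogeneous ROI spreads its voxels over more bins and therefore yields a higher entropy.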

Feature selection
A total of 80% of the dataset was randomly assigned to the training set, and the remainder was used as the testing set. Before feature selection, all radiomics features were standardized with the StandardScaler function by subtracting the mean and dividing by the standard deviation, so that each feature had a mean of 0 and a variance of 1. The Pearson correlation coefficient (PCC) was then used to reduce the dimensionality of the features by excluding those with a correlation coefficient higher than 0.9. Next, the least absolute shrinkage and selection operator (LASSO) regression method was used to shrink the regression coefficients of redundant predictor variables. The best λ, the regularization coefficient of the LASSO method, was selected by inner 5-fold cross-validation in the training set as the value giving the minimum average mean square error (MSE). Subsequently, the radiomics features with non-zero coefficients in the LASSO model were combined into the rad-score formula.
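The standardization, correlation filter, and rad-score steps can be sketched in plain Python as below. Several details are assumptions, since the text does not give them: the greedy rule that keeps the first feature of a highly correlated pair, the helper names, and every number in the example, including the intercept and coefficients passed to `rad_score`. The LASSO fit itself is omitted.

```python
import math

def standardize(column):
    """Z-score a feature column: subtract the mean, divide by the standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]  # assumes a non-constant column

def pearson(a, b):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold with every kept feature."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) <= threshold for other in kept.values()):
            kept[name] = column
    return kept

def rad_score(intercept, coefficients, sample):
    """Linear combination of the selected features for one patient."""
    return intercept + sum(w * sample[name] for name, w in coefficients.items())

# Hypothetical feature table: f2 is almost a multiple of f1, f3 is unrelated.
raw = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 10.5], "f3": [5, 1, 4, 2, 3]}
scaled = {name: standardize(column) for name, column in raw.items()}
kept = drop_correlated(scaled)
print(list(kept))  # ['f1', 'f3']

# Hypothetical intercept and weights, standing in for non-zero LASSO coefficients.
score = rad_score(0.5, {"f1": 2.0, "f3": -1.0}, {"f1": 1.0, "f3": 0.5})
print(score)  # 2.0
```

In this form, the rad-score is the intercept plus the weighted sum of the features that keep a non-zero coefficient after LASSO.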

Model construction
Our study explored and verified 11 ML models, namely, logistic regression (LR), support vector machine (SVM), random forest (RF), AdaBoost (AB), Gradient Boosting (GB), Gaussian Naive Bayes (NB), K-nearest neighbors (KNN), ExtraTrees, XGBoost (XGB), LightGBM, and multilayer perceptron (MLP). All data were randomly divided into training and test cohorts at a ratio of 8:2. The diagnostic performance of the different models was evaluated using the receiver operating characteristic (ROC) curve. The AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of each model were calculated. The confusion matrix for the test set was constructed based on the predicted values. Finally, decision curve analysis (DCA) was used to evaluate the clinical usefulness of the model.
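The evaluation metrics listed above follow directly from the confusion matrix and the predicted scores. The sketch below shows one way to compute them; the labels, scores, and the 0.5 cutoff are hypothetical, and the AUC is computed as the fraction of positive-negative pairs ranked correctly, which is equivalent to the area under the ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(y_true, scores):
    """Probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test-set labels (1 = positive class) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff of 0.5

metrics = confusion_metrics(y_true, y_pred)
print(metrics)
print(auc(y_true, scores))  # 0.9375
```

The threshold-dependent metrics change with the chosen cutoff, whereas the AUC summarizes ranking quality over all cutoffs.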

Statistical analysis
Statistical analyses in this study were conducted in the SPSS software package (version 25.0; IBM, Armonk, NY). Clinical characteristics were described using mean, standard deviation, frequency, and percentage. Independent two-sample t-tests were used for continuous variables conforming to a normal distribution, the Mann-Whitney U-test was used for variables with a skewed distribution, and ROC analysis was used to evaluate the diagnostic performance of the ML classifiers and visual assessment [95% confidence intervals (CIs), specificity, and sensitivity were also calculated]. A p < 0.05 was considered statistically significant.

Clinical characteristics
Patients with NP were significantly younger than those with IP (p < 0.05), and patients with IP were more prone to nasal bleeding and hyposmia than those with NP (p < 0.05), but there were no significant differences in gender, persistent nasal congestion, runny nose, facial pain, and decreased sense of hearing between the two lesions (p > 0.05) (Table 1).

Reproducibility and feature selection
The ICCs calculated for agreement of features extracted by the two radiologists ranged from 0.865 to 0.968 for T2-WI and from 0.934 to 0.991 for CE-T1WI, reflecting good agreement.
After feature selection with the LASSO method, the rad-score radiomic signature was constructed at the λ value that gave the smallest binomial deviance (Figure 4A), at log(λ) = 0.068 (Figure 4B); the weighting coefficients of the radiomic features used to construct the rad-score are shown in Figure 4C. The rad-score formula is as follows:

Diagnostic performance of various classifier models
The diagnostic performance and cutoff values of the classifier models in discriminating IP from NP in the training and testing sets are summarized in Table 2. The ExtraTrees model showed the highest AUC (0.9121), accuracy (0.8461), sensitivity (0.8000), and specificity (0.9091) on the test cohort. The LR, NB, SVM, KNN, RF, XGB, LightGBM, GB, AB, and MLP models achieved AUCs of 0.8182, 0.7515, 0.8060, 0.8394, 0.7455, 0.7758, 0.7879, 0.7636, 0.8364, and 0.8121, respectively. In this study, the performance of the models was therefore in the following order: ExtraTrees > KNN > AB > LR > MLP > SVM > LightGBM > others. The remaining models performed only moderately on the test cohort, with AUC and other values lower than those of these seven models. Finally, the ROC curves of the 11 machine learning models and the confusion matrix for the ExtraTrees model in the test set are shown in Figure 5; as a visual summary of the classification results, the confusion matrix indicated that the classification model had a high accuracy rate. As shown in Figure 6, DCA showed that the curve of the predictive model lay clearly away from the two extreme lines, indicating a good overall net benefit in the population.
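The net benefit plotted in a decision curve, and the two extreme lines it is compared against, can be sketched as follows. This uses the standard net-benefit definition, TP/N − FP/N × pt/(1 − pt), where pt is the threshold probability; the labels, predicted probabilities, and thresholds are hypothetical and are not the study's data.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Extreme line 1: everyone is treated. Extreme line 2 (treat none) is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

for pt in (0.2, 0.5, 0.8):
    model_nb = net_benefit(y_true, probs, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"pt={pt}: model={model_nb:.3f}, treat all={all_nb:.3f}, treat none=0.000")
```

A model is considered clinically useful over the range of thresholds where its net benefit lies above both the treat-all and treat-none lines.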

Discussion
Inverted papilloma (IP) and NP have a high probability of recurrence; therefore, effective preoperative assessment of these lesions is crucial for symptomatic treatment and for reducing the recurrence rate (16). Pathological biopsy is the gold standard, but it is an invasive and limited examination. MRI has a very high soft tissue resolution and can clearly distinguish between the tumor itself and the surrounding environment, which can compensate for the limitations of tissue biopsy (21). Even though traditional radiological diagnostic methods are convenient and cost-effective in routine clinical practice (13), in a large number of cases, the clinical radiological characteristics alone are not enough to accurately distinguish IP and NP.
The analysis of clinical characteristics shows that IP is more common in the elderly and that its incidence of nasal bleeding is greater than that of NP, which is consistent with the results of this study. Interestingly, the difference in hyposmia between the two lesions was statistically significant, which has not been mentioned in previous studies. In both IP and NP, hyposmia can result from conductive olfactory disturbance caused by nasal tract obstruction, in which fewer odorant molecules reach the olfactory neuroepithelium. In IP, it can also result from neurological olfactory disturbance caused by damage to the olfactory nerve or olfactory center, in which an oversized tumor squeezes the olfactory bundle or the mucosal receptors at the top of the nasal cavity, the upper nasal septum, and the medial aspect of the superior turbinate, which are innervated by the olfactory nerve (22).
Although olfaction in humans is less appreciated than other senses, it affects our lives all the time (23). Because the olfactory nerve is the cranial nerve exposed to the external environment, it is vulnerable to tumor compression and inflammatory stimulation, resulting in reduced olfactory function (24). Olfactory nerve injury cannot be recovered if the lesion persists for a prolonged period and the degree of damage is relatively severe (25). Most nasal cavity or sinus tumors may damage the olfactory nerve, and in case of olfactory disturbance, timely treatment is required. Our study also aims to highlight the necessity of predicting olfactory nerve invasion in clinical practice. Radiomics is automated and objective, and thus does not rely on human-derived measurements of image features (26). In our study, the best 12 radiomic features that could distinguish IP from NP included eight texture features, two shape features, and two first-order features. Among them, texture features reflecting gray-level nonuniformity account for a large proportion, which may be explained by a higher heterogeneity of the images (27). This may be due to differences in tissue composition. In IP, tumors derived from the Schneiderian membrane grow and replace areas of the mucous and serous glands and ducts (28), whereas nasal polyps contain mainly fibrin and water (29). The small number of shape features is related to the irregular multinodular mass observed under endoscopy (30). Radiological features provide more systematic, comprehensive, and quantitative information on tumor heterogeneity than traditional morphological features, which can help explain the potential relationship between pathophysiological and radiological imaging phenotypes (31).
We built a radiomics model based on preoperative multimodality MRI parameters and constructed radiomic signature labels through feature dimensionality reduction and the optimization of multiple models. Among all models, ExtraTrees achieved the most satisfactory result, with an AUC of 0.9121. In Li's research, which used a neural network and analyzed its ability to discriminate between the lesions, the diagnosis between IP and NP reached a sensitivity of 90.60%, a specificity of 86.40%, and an AUC of 0.884, which are similar to our diagnostic efficacy. In another study of automatic identification of IP and NP by convolutional neural networks, the result reached an accuracy of 89.30%, a sensitivity of 89.01%, a specificity of 89.70%, and an AUC of 0.95. Numerically, these results were slightly higher than the diagnostic performance of our study. This may be due to differences in radiomics models or algorithms, but our model basically reaches the diagnostic performance of diagnostic radiologists. We combined clinical indicators through a multimodal study, which makes the data more comprehensive, provides more useful information for the clinic, controls for bias due to missing clinical information, and makes the results more convincing. Compliance with the principle of early diagnosis and early treatment can clarify the disease as early as possible and achieve an improved prognosis. This study also has some limitations. On the one hand, it is a retrospective, single-center study with a small data set, and a more rigorous study using multicenter and large-scale data sets is needed to avoid overfitting. On the other hand, we only analyzed the radiomics of T2WI and CE-T1WI sequences; in the future, we will integrate clinical data and radiomic features of other MRI sequences, such as T1WI and diffusion-weighted imaging (DWI), to further improve the diagnostic accuracy of the model.
FIGURE 6 DCA for ExtraTrees model in the test set. The predictive model curves are significantly farther away from the two extreme lines, indicating a good overall net benefit in the population.

Conclusion
In summary, a new biomarker combining multimodal MRI radiomics and clinical indicators can effectively distinguish between IP and NP that may invade the olfactory nerve and can be a valuable addition to routine clinical practice, thus providing a more accurate and objective basis for individualized treatment decisions, showing the potential application value and prospects of radiomic models in nasal cavity diseases.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the ethics committee of the Second Hospital of Jilin University (no. SB-2021-012). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.