Diagnostic accuracy of Raman spectroscopy in oral squamous cell carcinoma

Background Raman spectroscopy (RS) has shown great potential in the diagnosis of oral squamous cell carcinoma (OSCC). Although many single-center original studies have been carried out, the limited evidence currently available makes it difficult to use RS in real clinical settings. We therefore conducted this meta-analysis of diagnostic studies to evaluate the overall performance of RS in OSCC diagnosis. Methods We systematically searched databases including Medline, Embase, and Web of Science for studies from January 2000 to March 2022. Data on true positives, true negatives, false positives, and false negatives were extracted from the included studies to calculate the pooled sensitivity, specificity, accuracy, positive and negative likelihood ratios (LRs), and diagnostic odds ratio (DOR) with 95% confidence intervals. We then plotted the summary receiver operating characteristic (SROC) curve and calculated the area under the curve (AUC) to evaluate the overall performance of RS. Quality assessment was performed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) checklist in Review Manager 5.3, and publication bias was also evaluated. The statistical parameters were calculated with StataSE version 12 and MetaDiSc 1.4. Results In total, 13 studies were included in our meta-analysis. The pooled diagnostic sensitivity and specificity of RS in OSCC were 0.89 (95% CI, 0.85–0.92) and 0.84 (95% CI, 0.78–0.89), respectively. The AUC of the SROC curve was 0.93 (95% CI, 0.91–0.95). Conclusions RS is a non-invasive diagnostic technology with high specificity and sensitivity for detecting OSCC and has the potential to be applied clinically.


Introduction
Oral cancer is one of the most common malignant tumors of the head and neck, which are collectively known as head and neck squamous cell carcinoma (HNSCC) (1). Oral squamous cell carcinoma (OSCC) accounts for 90% of the incidence of HNSCC. HNSCC ranks sixth among cancers worldwide in mortality, and the 5-year survival rate after diagnosis is less than 50%, making it a disfiguring disease with a poor prognosis (2). The current "gold standard" for OSCC diagnosis is visual inspection followed by tissue biopsy and histopathological examination, which is invasive and time-consuming (3). Even after successful treatment of the primary cancer, there remains a risk of recurrence or second primary tumors. Thus, screening and early detection of OSCC are key to reducing the mortality of OSCC patients.
Many non-invasive techniques, such as vital staining techniques, optical imaging devices, and exfoliated cytology tools, have been developed to assist in the screening and early detection of OSCC (4). In 1928, the Indian physicist Raman discovered the Raman scattering effect, and the technique that analyzes the spectrum produced by this effect is called Raman spectroscopy (RS). RS can identify the functional groups present in a molecule and thereby provide spectral characteristics specific to the internal structure and conformation of cells, referred to as a "molecular fingerprint". Known for its high specificity, high analysis efficiency, and ability to analyze complex samples without staining or labeling, RS can provide real-time molecular information and high-resolution imaging at a relatively low cost (5). Moreover, biological samples such as tissue, plasma, and saliva can be directly inspected by RS. The non-invasive nature of RS greatly reduces the pain and economic burden of patients, and RS has become a novel tool for cancer diagnosis, treatment, and prognosis evaluation (6). RS has been reported to have high diagnostic accuracy for multiple types of human cancers, including OSCC (7), breast cancer (8), bladder cancer (9), colorectal cancer (10), and gastric cancer (11).
Although many single-center original studies have been carried out, small sample sizes, varied diagnostic algorithms and analysis tools, and different RS settings mean that previous research cannot fully reveal the value of RS in OSCC diagnosis. We therefore conducted this meta-analysis of diagnostic studies to evaluate the overall performance of RS in OSCC diagnosis.

Search strategy
A systematic search of the Medline, Embase, and Web of Science databases was performed for English-language studies published from January 2000 to July 2022. The keywords used in the database searches were [(OSCC) OR (oral cancer)] AND [(Raman spectroscopy) OR (Raman spectra) OR (Raman spectrum)] AND [(histopathology) OR (biopsy)]. Reference lists of retrieved articles and unpublished clinical trials were also examined to identify additional eligible studies.

Study selection
All records were screened for eligibility by two reviewers independently; disagreements were referred to a third reviewer and resolved by discussion. Studies were selected according to the following criteria: 1) The samples used should only be human tissue. 2) RS should be an independent diagnostic tool to identify OSCC or differentiate it from normal tissue. 3) Original data including true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) should be provided or be calculable from sensitivity and specificity values. 4) A control group (normal tissues) should be included with a total number of more than 10 samples. 5) Articles should be published in English. The exclusion criteria were as follows: 1) studies using combined diagnostic methods; 2) animal studies; 3) irrelevant article types such as reviews and case reports; 4) studies that did not provide the exact original data; 5) studies without a control group; 6) studies using fewer than 10 samples.

Data extraction and quality assessment
Two reviewers extracted the information from all eligible studies independently according to the selection criteria and organized all of the information into Table 1, including principal author, year of publication, country, number of specimens, number of patients, type of RS, diagnostic algorithm, sensitivity, specificity, accuracy, sample type, type of study design (in vivo or in vitro), spectra, and gold standard. Then, we evaluated the quality of all eligible studies in Review Manager 5.3 software (Cochrane Collaboration, Oxford, England) using Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) (12).

Statistical analysis
The summary receiver operating characteristic (SROC) curve model of Lee et al. (13) was used for the meta-analysis. TP, FP, TN, FN, sensitivity, specificity, accuracy, and positive and negative likelihood ratios were calculated directly or indirectly using the method described in (14). Then, the pooled diagnostic odds ratio (DOR), publication bias, and the area under the SROC curve (AUC) were evaluated. The AUC is correlated with diagnostic value: 0.5 ≤ AUC < 0.7, 0.7 ≤ AUC < 0.9, and AUC ≥ 0.9 represent low, moderate, and high diagnostic value, respectively (15). Publication bias was assessed with Deeks' funnel plot asymmetry test (16). The I² statistic was used to evaluate the heterogeneity of the included studies, and a random-effects model was applied if there was heterogeneity between studies (17). Because the studies focused on different kinds of samples, such as tissues, serum, oral cells, and saliva, we conducted subgroup analysis to address the heterogeneity of the study effects due to sample type. To avoid inflation of the type I error, the false discovery rate (FDR) method was used to correct p-values.
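As an illustration of the per-study measures listed above, the following minimal Python sketch derives them from a single 2 × 2 table and applies the AUC grading quoted in the text. The class `TwoByTwo`, the functions `accuracy_measures` and `auc_grade`, and the example counts are hypothetical and are not taken from the included studies. The sketch assumes that no cell is zero (a continuity correction is commonly applied otherwise) and does not reproduce the pooling performed in StataSE or MetaDiSc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic study (index test vs. reference standard)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy_measures(t: TwoByTwo) -> dict:
    """Per-study diagnostic measures; assumes every cell is non-zero."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    accuracy = (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                      # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


def auc_grade(auc: float) -> str:
    """Grade an AUC with the cut-offs quoted in the text."""
    if auc >= 0.9:
        return "high"
    if auc >= 0.7:
        return "moderate"
    if auc >= 0.5:
        return "low"
    return "not graded"  # values below 0.5 are outside the quoted scale


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=45, fp=8, fn=5, tn=42)
measures = accuracy_measures(example)
```

For these hypothetical counts the sketch gives a sensitivity of 0.90, a specificity of 0.84, and a DOR of 47.25.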

Study characteristics, quality assessment, and publication bias
The QUADAS-2 diagram is shown in Figure 2. Most studies conformed to the QUADAS-2 criteria. In some studies, the patient selection and index test items were rated as "unclear", and in some the samples were not selected randomly or assessed in a double-blinded manner, leading to a high or unclear risk of bias. Deeks' funnel plot asymmetry test gave no evidence of publication bias (p = 0.82; Figure 3).

Threshold effect and heterogeneity
We used MetaDiSc 1.4 to analyze the diagnostic threshold effect, and the Spearman correlation coefficient was −0.316 (p = 0.187), indicating that there was no threshold effect between the studies.
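The threshold-effect check reported above is commonly computed as the Spearman correlation between the logit of sensitivity and the logit of (1 − specificity) across studies, with a strong positive correlation suggesting a threshold effect. The sketch below shows that calculation in plain Python. The function names and the four study values are hypothetical, the p-value is not computed, and studies with a sensitivity or specificity of exactly 0 or 1 would need a continuity correction before the logit is taken.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between logit(TPR) and logit(FPR) across studies."""
    logit_tpr = [logit(s) for s in sensitivities]
    logit_fpr = [logit(1 - s) for s in specificities]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(logit_tpr), average_ranks(logit_fpr))


# Hypothetical studies in which sensitivity rises as specificity falls,
# which is the pattern a threshold effect would produce.
rho = threshold_effect_rho([0.80, 0.85, 0.90, 0.95], [0.95, 0.90, 0.85, 0.80])
```

In this constructed example the ranks agree perfectly, so the coefficient is 1. The negative, non-significant coefficient reported in the text is the opposite situation.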
However, the heterogeneity for sensitivity (Q = 166.12, I² = 88.56) and specificity (Q = 514.39, I² = 96.31) was significant according to the Q test and I² index. Because the data showed great heterogeneity, multivariate meta-regression was conducted to explore the source of heterogeneity. The countries, diagnostic algorithm, type of study design, type of RS, sample type, and spectra were used as covariates. Table 2 shows the results of the meta-regression. Type of study design (in vivo or in vitro) and type of RS (SERS, MLRM, and others) were the sources of heterogeneity.

The PRISMA flowchart.

FIGURE 2
Risks of bias assessment for each included study (n = 12). Risk of bias summary (A). Risk of bias graph (B).
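The I² values reported in this section are consistent with the usual definition I² = max(0, (Q − df)/Q) × 100%, where df is the number of estimates minus one. The short sketch below applies that definition. The function name `i_squared` is hypothetical, and the degrees of freedom are not stated in the text: df = 19 is an assumption used here only because it reproduces the reported percentages from the reported Q values.

```python
def i_squared(q: float, df: int) -> float:
    """I-squared as a percentage, computed from Cochran's Q and its df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


ASSUMED_DF = 19  # assumption: not reported in the text

i2_sensitivity = i_squared(166.12, ASSUMED_DF)  # Q for sensitivity from the text
i2_specificity = i_squared(514.39, ASSUMED_DF)  # Q for specificity from the text
```

Under this assumption the two values round to 88.56 and 96.31, matching the figures given for sensitivity and specificity.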

Pooled diagnostic value of Raman spectroscopy in oral squamous cell carcinoma diagnosis
A total of 2,051 samples were tested across all of the studies, and 914 patients were included. The sensitivity of the 12 included studies ranged from 0.77 to 1. The pooled sensitivity was 0.89 (95% CI, 0.85-0.92), which suggests that RS can effectively avoid missed diagnoses. The specificity of the studies ranged from 0.77 to 1, except for two studies using saliva or oral cells as samples. The pooled specificity was 0.84 (95% CI, 0.78-0.89), which suggests that RS can also largely avoid misdiagnosis (Figure 3). The pooled sensitivity and specificity determined the SROC curve, and the overall AUC was 0.93 (95% CI, 0.91-0.95), indicating a high overall diagnostic value of RS in OSCC (Figure 3). With the pretest probability defined as 0.5, the positive posterior probability (PPP) and negative posterior probability (NPP) were 0.85 and 0.12, respectively (Figure 3).
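The posterior probabilities above can be approximated from the pooled estimates with Bayes' rule in odds form: post-test odds = pretest odds × likelihood ratio. The sketch below is a simplification that derives the likelihood ratios directly from the pooled sensitivity and specificity; the statistical software used in the study may derive them differently, and the function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled estimates and pretest probability quoted in the text.
sensitivity, specificity, pretest = 0.89, 0.84, 0.5

lr_pos = sensitivity / (1 - specificity)   # about 5.56
lr_neg = (1 - sensitivity) / specificity   # about 0.13

ppp = post_test_probability(pretest, lr_pos)  # probability of OSCC after a positive test
npp = post_test_probability(pretest, lr_neg)  # probability of OSCC after a negative test
```

With these inputs the results round to 0.85 and 0.12, in agreement with the reported PPP and NPP.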

Subgroup analysis
Type of study design (in vivo or in vitro) and type of RS (SERS, MLRM, and others) tend to be the sources of heterogeneity that may affect the accuracy of the test, so we conducted a subgroup analysis. All of the pooled diagnostic values of subgroup analysis are shown in Table 3.
In conclusion, in vitro detections showed better sensitivity and specificity but lower AUC than in vivo ones. The performance of SERS and MLRM was relatively close.

Discussion
As previous studies revealed, RS has shown its capability in the diagnosis of various types of cancer such as OSCC, lung cancer, and breast cancer. Investigations on the overall performance of RS in the diagnosis of OSCC were of great importance. However, no large-scale study has been carried out to evaluate the value of RS in the diagnosis of OSCC. Therefore, we conducted this meta-analysis to explore the value of RS in the diagnosis of OSCC.
The pooled sensitivity and specificity of RS in diagnosing OSCC were 0.89 and 0.84, respectively. The results of different studies evaluating the same diagnostic indicator can be summarized in an SROC curve. In this meta-analysis, the AUC was 0.93 (95% CI, 0.91-0.95), indicating a high overall diagnostic value of RS in OSCC. Overall, RS has reliable diagnostic performance in differentiating OSCC from normal oral samples according to the pooled sensitivity, specificity, and AUC. When we defined the pretest probability as 0.5, the PPP and NPP were 0.85 and 0.12, which demonstrates that RS can raise the probability of OSCC to 85% when the result is positive and lower it to 12% when the result is negative. Therefore, RS is an effective method for the diagnosis of OSCC. Subgroup analysis showed that the pooled sensitivity and specificity of the SERS and MLRM groups were over 0.80 and 0.75, respectively. Thus, SERS and MLRM both had good diagnostic performance in OSCC. The specificity for the in vivo group was slightly lower than that of the in vitro group, which may be explained by the limitations of imaging technology and the limited number of studies. However, in vitro detection may not be as convenient for clinical operation because of its complicated pretreatment process, and in vivo real-time detection of OSCC is a highly promising technology that deserves further exploration.
RS can discriminate normal healthy oral tissue from oral disease states with a fiber-optic probe in the clinical setting, but a major limitation of RS is its dependence on visual detection of morphological or structural lesions. Moreover, Raman signals are inherently weak and may not be sensitive enough to merit clinical value. In our study, a technology called surface-enhanced Raman spectroscopy (SERS) showed good diagnostic accuracy. By adsorbing molecules onto nanostructured metal surfaces, SERS can enhance the Raman signal by more than 1,000 times, making it possible to detect even a single molecule (29). It can also overcome the strong autofluorescence background of conventional RS. SERS is considered a non-invasive method for OSCC diagnosis that uses saliva or serum as samples. In addition, the acquisition and pretreatment of samples are relatively simple. Screening based on blood or saliva tests is more practical and cost-saving for mass screening in countries with large populations such as China and India (27). The studies included in this meta-analysis that used SERS (18,21,27) had a pooled sensitivity and specificity of 0.819 and 0.786, respectively, showing little difference from traditional MLRM. Thus, SERS holds great potential for the practical application of RS technology in the clinic and is worthy of further study and development.
RS not only can be used as a diagnostic tool to differentiate tumors from healthy tissues but also has the potential to distinguish different kinds of oral lesions. Krishna et al. (25) successfully classified the Raman spectra of oral tissue sites into four classes [normal, OSCC, oral submucous fibrosis (OSMF), and oral leukoplakia (OLK)] using the MRDF-SMLR-based diagnostic algorithm. The MRDF-SMLR algorithm showed accuracies of 85%, 89%, 85%, and 82% in classifying the spectra into the four categories. In the study by Tan et al. (27), which used SERS, the PCA-LDA algorithm could classify OSCC, mucoepidermoid carcinoma (MEC), and normal groups with a sensitivity and a specificity of more than 80%. While the results were encouraging, limitations remain. Firstly, these results have not been confirmed by large, well-designed clinical trials and need further study. Secondly, oral diseases are varied and complicated, and using RS to distinguish among many kinds of lesions simultaneously, rather than only between cancerous and healthy tissue, may result in a greater error rate. Therefore, how to promote RS, and whether RS can guarantee high accuracy in the diagnosis of various types and stages of oral lesions, still needs to be explored.
Automatic diagnosis using RS is inseparable from the application of algorithms, of which PCA-LDA is the most common. PCA is a statistical technique that simplifies complicated data by reducing its dimensionality while retaining as much of the variance as possible. However, PCA alone is not practical for classification because it cannot use any class information from the original data. LDA is a method that applies a linear transformation to feature clusters so that they become separable. PCA and LDA are usually combined for dimension reduction and classification of data sets (27). Jeng et al. (23) used two algorithms, PCA-LDA and PCA-QDA, to test the potential of RS in diagnosing OSCC. LDA and QDA were used to identify the boundaries among different classes. LDA is a good classifier for equal class samples, and QDA performs well in unequal class samples (24). The results revealed that the PCA-QDA model had greater classification efficiency than the PCA-LDA model and could be promoted further. In the study by Cals et al. (20), a two-step PCA-hLDA model was developed. The spectra of adipose tissue and nerve were first distinguished from all of the other spectra, and then the spectra of surface squamous epithelium, CT, muscle, and gland were distinguished from the spectra of OSCC. The PCA-hLDA model performed better than the traditional one-step model. Moreover, RS-based recognition and diagnosis using deep learning has emerged in recent years. Ding et al. (22) created a new classification framework called DSB-ResNet and successfully distinguished tongue squamous cell carcinoma (TSCC) from non-cancerous tissue. Laboratory-level RS can obtain high-quality data, but it may not be suitable for clinical operation because of its complicated pretreatment process. However, deep learning can combine preprocessing, feature extraction, and classification into one architecture that automatically learns the characteristics of data at different levels of abstraction without manual adjustment, greatly simplifying the diagnosis process and achieving higher precision (22). Combined with the DSB-ResNet framework, a Raman probe device can be used for living samples, and the method proposed in that paper is likely to have great potential in the future operation process. However, there are some problems with that study. Firstly, the small amount of data may lead to bias. Secondly, the model has so far achieved only binary classification, but oral cancer comprises many types of cancer, of which TSCC is only one. Finally, there is still a long way to go from research on real-time identification to real clinical use. Thus, further detailed biochemical experiments on RS in diagnosis are needed to make deep learning-based RS technology more practical and smarter.
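To make the PCA-LDA combination discussed above concrete, the following is a deliberately reduced sketch in plain Python, not the pipeline of any cited study. It keeps only the first principal component (found by power iteration) and then applies LDA to that single score under the assumptions of equal priors and a shared variance, in which case the LDA rule reduces to choosing the class with the nearest mean score. All function names and the four-point "spectra" are hypothetical; real analyses use many components, full spectra, and cross-validation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_principal_component(rows, iterations=100):
    """Return (mean, unit vector) of the leading principal component.

    Uses power iteration on X^T X of the mean-centred data; assumes the
    data are not constant and the start vector is not orthogonal to the
    leading component.
    """
    n_features = len(rows[0])
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, mean)] for row in rows]
    v = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(iterations):
        scores = [dot(row, v) for row in centred]
        w = [sum(s * row[j] for s, row in zip(scores, centred))
             for j in range(n_features)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return mean, v


def fit_pca_lda(rows, labels):
    """PCA step (one component), then per-class mean scores for 1-D LDA."""
    mean, v = first_principal_component(rows)
    scores = [dot([x - m for x, m in zip(row, mean)], v) for row in rows]
    class_means = {}
    for label in sorted(set(labels)):
        member_scores = [s for s, l in zip(scores, labels) if l == label]
        class_means[label] = sum(member_scores) / len(member_scores)
    return mean, v, class_means


def predict(model, row):
    mean, v, class_means = model
    score = dot([x - m for x, m in zip(row, mean)], v)
    # With one feature, equal priors and a shared variance, LDA assigns
    # the class whose mean score is nearest.
    return min(class_means, key=lambda c: abs(score - class_means[c]))


# Hypothetical intensities at four wavenumbers, for illustration only.
spectra = [
    [1.0, 2.0, 1.0, 0.5], [1.1, 2.1, 0.9, 0.6], [0.9, 1.9, 1.1, 0.4],
    [1.0, 3.0, 1.0, 1.5], [1.1, 3.1, 0.9, 1.6], [0.9, 2.9, 1.1, 1.4],
]
labels = ["normal"] * 3 + ["OSCC"] * 3

model = fit_pca_lda(spectra, labels)
prediction_a = predict(model, [1.0, 2.05, 1.0, 0.55])
prediction_b = predict(model, [1.0, 2.95, 1.0, 1.45])
```

In this toy data the two classes differ mainly at the second and fourth wavenumbers, so the first component captures the class difference and the two unseen spectra are assigned to "normal" and "OSCC", respectively. When the direction of greatest variance does not coincide with the class difference, more components are needed before the LDA step.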
Although our study suggested that RS has great potential as a non-invasive, high-accuracy tool for diagnosing OSCC, it has some limitations. Firstly, the gold standard in this research is histopathology, but histopathology can only assess the site where the sample was taken; thus, it is an imperfect gold standard with a corresponding risk of bias. It would be better to combine histopathology with other standards such as follow-up to assess diagnostic accuracy in clinical practice. However, histopathology is still the most effective diagnostic test in clinical practice (30) and is widely used as the gold standard in our included studies and other original studies; only one study used "histopathology + follow up" as its gold standard. We therefore still used histopathology as the gold standard in this study, which leads to certain limitations. Secondly, diagnosis in the included studies was not based on predefined criteria or cutoff values, leading to a high risk of overestimating diagnostic accuracy through data-driven cutoff selection (31). A characteristic of Raman spectrum analysis is that the training set is analyzed by PCA or a similar algorithm to identify the differences between the spectra of normal and cancerous tissues, and then the test set can be diagnosed. Therefore, there is no clear diagnostic threshold, and the original studies cannot provide the relevant original data. This is an inevitable limitation of diagnostic meta-analyses of Raman spectroscopy and needs further consideration and research in the future. Thirdly, the evidence level of our study might be affected by the small number of included studies and significant heterogeneity. Although our study included 2,051 samples from 914 patients, the number of samples was still insufficient in some groups (such as saliva samples and in vivo studies) because of the limited research in this field, which made the subgroup analysis difficult and may even have influenced the accuracy of the results. Finally, not all of the studies stated that their samples were chosen randomly and in a double-blinded manner, which may have introduced selection bias into the final conclusions. In conclusion, to further verify the role of RS in OSCC diagnosis and promote its clinical application, randomized, double-blind original studies with sufficient sample sizes are required in the future.

Conclusion
This meta-analysis revealed that RS is a non-invasive diagnostic technology with high specificity and sensitivity for detecting OSCC and has the potential to be applied clinically. Further investigations are also needed to focus on real-time detection using RS with deep learning in vivo. Moreover, sufficient sample size and randomized and double-blind original studies are still required in the future to confirm this conclusion.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.