Diagnosis of Lung Cancer by FTIR Spectroscopy Combined With Raman Spectroscopy Based on Data Fusion and Wavelet Transform

Lung cancer is a fatal tumor threatening human health. It is of great significance to explore a diagnostic method with wide application range, high specificity, and high sensitivity for the detection of lung cancer. In this study, data fusion and wavelet transform were used in combination with Fourier transform infrared (FTIR) spectroscopy and Raman spectroscopy to study the serum samples of patients with lung cancer and healthy people. The Raman spectra of serum samples can provide more biological information than the FTIR spectra of serum samples. After selecting the optimal wavelet parameters for wavelet threshold denoising (WTD) of spectral data, the partial least squares–discriminant analysis (PLS-DA) model showed 93.41% accuracy, 96.08% specificity, and 90% sensitivity for the fusion data processed by WTD in the prediction set. The results showed that the combination of FTIR spectroscopy and Raman spectroscopy based on data fusion and wavelet transform can effectively diagnose patients with lung cancer, and it is expected to be applied to clinical screening and diagnosis in the future.


INTRODUCTION
Lung cancer is a malignant tumor with high incidence and mortality rates that threatens human health (Sung et al., 2021). Because reliable biomarkers for lung cancer are lacking, most patients are already at the middle or advanced stage when treatment begins (Stapelfeld et al., 2020). At present, the screening methods for lung cancer mainly include X-ray examination, low-dose computed tomography, and magnetic resonance imaging (MRI), but these technologies have disadvantages, such as being unsuitable for specific populations, a high false-positive rate, and low sensitivity (Thakur et al., 2020; Xu et al., 2021). Therefore, there is a need for an early diagnostic method with a wide application range, high specificity, and high sensitivity.
Vibrational spectroscopy is an important tool in the field of analytical chemistry and bioanalysis, of which Fourier transform infrared (FTIR) spectroscopy and Raman spectroscopy have been widely used in cancer diagnosis in recent years (Auner et al., 2018; Christensen et al., 2019; Baiz et al., 2020). In our previous work, we studied the serum samples of patients with lung cancer and healthy people using FTIR spectroscopy and found that the concentrations of protein, lipid, and nucleic acid molecules in the serum of patients with lung cancer were higher than those of healthy people (Yang et al., 2021a). Song et al. classified the tissues of healthy people and patients with lung squamous cell carcinoma using Raman spectroscopy combined with principal component analysis-linear discriminant analysis (PCA-LDA) (Song et al., 2020). These reports demonstrate the potential of FTIR spectroscopy and Raman spectroscopy in the diagnosis of lung cancer.
Data fusion has been widely used in the analysis and determination of biological and pharmaceutical components in recent years because it integrates multiple methods to obtain more effective and comprehensive data (Haware et al.; Comino et al., 2018; Feng et al., 2020; Zhang et al., 2020; Zhao et al., 2020; Azcarate et al., 2021). Data fusion has been reported in combination with FTIR spectroscopy and Raman spectroscopy for the diagnosis of thyroid dysfunction and cervical cancer. Chen et al. studied the blood of patients with thyroid dysfunction and healthy people using FTIR spectroscopy and Raman spectroscopy combined with data fusion and achieved an accuracy of 83.48%. Zhang et al. studied tissue samples from patients with cervical cancer using Raman spectroscopy and obtained an accuracy of 93.51% using characteristic data after fusion of the first and second derivatives. Therefore, data fusion combined with FTIR spectroscopy and Raman spectroscopy has the potential to diagnose various diseases and is expected to be applicable to the diagnosis of lung cancer.
As a powerful signal processing technology, wavelet transform has been widely used in imaging, chromatography, vibrational spectroscopy, and so on (Sudarshan et al., 2016; Jiang and Ma, 2020; Wahab and O'Haver, 2020; Sun et al., 2017; Godinho et al., 2014; Martyna et al., 2015; Dinç and Yazan, 2018). Wavelet transform and data fusion have been reported in combination with other techniques to detect prostate cancer and neurocysticercosis. Tiwari et al. fused magnetic resonance (MR) imaging (MRI) and spectroscopy (MRS) data using multimodal wavelets (MaWERiC) and found that MaWERiC gave better detection results for prostate cancer than either type of data alone (Tiwari et al., 2012). Chavan et al. proposed a non-subsampled rotated complex wavelet transform (NSRCxWT) image fusion algorithm based on computed tomography (CT) and MRI features and found that the quality of images processed with this algorithm was much better than that of the original images, which was more conducive to the diagnosis of neurocysticercosis (Chavan et al., 2017). However, the combination of wavelet transform and data fusion with FTIR spectroscopy and Raman spectroscopy has not been studied in the diagnosis of lung cancer.
FIGURE 1 | FTIR spectra of serum from patients with lung cancer (A) and healthy people (B). Raman spectra of serum from patients with lung cancer (C) and healthy people (D). (The corresponding average spectra are shown in bold).
Frontiers in Chemistry | www.frontiersin.org | January 2022 | Volume 10 | Article 810837
In this study, data fusion and wavelet transform were used in combination with FTIR spectroscopy and Raman spectroscopy to make full use of the FTIR and Raman spectral information of serum samples and then to distinguish the serum of patients with lung cancer from that of healthy people. The purpose is to explore a widely applicable and highly accurate diagnostic method for lung cancer and to lay the foundation for the future clinical application of FTIR spectroscopy and Raman spectroscopy in the diagnosis of lung cancer.

Serum Samples
Serum samples from 92 patients with lung cancer and 155 samples from healthy people were obtained from The First People's Hospital of Yunnan Province, and all research content was conducted according to the Declaration of Helsinki.

Spectra Acquisition
FTIR spectra of serum were measured in the range of 4000-600 cm⁻¹ by a Frontier spectrometer (Perkin Elmer) with the same test method as in our previous work (Yang et al., 2021a). Each IR spectrum was an accumulation of 32 scans at a resolution of 4 cm⁻¹. Serum samples (30 μL) were dropped on clean glass slides to measure Raman spectra using a confocal micro-Raman spectrometer (ANDOR SR-500 type) in the range of 800-1800 cm⁻¹ through a ×50 objective lens with a laser excitation wavelength of 532 nm. The laser power at the serum sample was 12 mW. Each Raman spectrum was scanned for 10 s and accumulated three times.

Wavelet Threshold Denoising
Wavelet transform is the projection of a signal onto a wavelet basis (Rameshnath and Bora, 2019). It gradually refines the signal at multiple scales through dilation and translation operations and, finally, achieves fine time resolution at high frequencies and fine frequency resolution at low frequencies so that any detail of the signal can be brought into focus. In the wavelet domain, the wavelet coefficients of the signal are larger than those of the noise. The basic principle of wavelet threshold denoising is to set an appropriate threshold. The wavelet coefficients larger than the threshold are considered to be generated by the signal and are preserved. Those smaller than the threshold are considered to be generated by noise and are set to zero, thus achieving the purpose of denoising (Donoho and Johnstone, 1995). Wavelet denoising removes noise while maintaining the details of the signal by using the multi-scale and multi-resolution characteristics of wavelet transform. Compared with a low-pass filter based on Fourier transform, wavelet denoising has a better effect (Peng et al., 2021).
The effect of the WTD algorithm on FTIR and Raman spectral data mainly depends on the wavelet function, the wavelet decomposition level (DL), and the wavelet threshold. Choosing an appropriate wavelet function helps to maximize the coefficient values in the wavelet domain. Generally, the appropriate wavelet function is determined by the specific practical requirements. In wavelet decomposition, the choice of DL is also a very important step. The larger the DL is, the more distinct the characteristics of noise and signal become, which is more conducive to separating them. However, the larger the DL is, the greater the distortion of the reconstructed signal is. The selection of the threshold is divided into two parts: the selection of the threshold function and the selection of the threshold value. The commonly used threshold functions are the soft threshold function and the hard threshold function. The result of the soft threshold function is smoother than that of the hard threshold function, so the soft threshold function was selected (Sanam and Shahnaz, 2013).
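The thresholding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB routine used in this study: it substitutes the Haar wavelet and the universal (sqtwolog) threshold for the wavelet functions and threshold rules tested here, the noise level is estimated from the finest detail coefficients as median(|d|)/0.6745 (a common convention, assumed here), and all function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from one decomposition level."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def soft(c, t):
    """Soft threshold: shrink the coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def hard(c, t):
    """Hard threshold: keep the coefficient unchanged if it exceeds t."""
    return c if abs(c) > t else 0.0

def median(values):
    v = sorted(values)
    m = len(v) // 2
    return v[m] if len(v) % 2 else 0.5 * (v[m - 1] + v[m])

def wtd(signal, level, shrink=soft):
    """Denoise with a `level`-deep Haar decomposition (illustrative stand-in)."""
    if len(signal) % (2 ** level):
        raise ValueError("signal length must be a multiple of 2**level")
    approx, details = list(signal), []
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Noise level from the finest-scale detail coefficients (assumed convention).
    sigma = median([abs(c) for c in details[0]]) / 0.6745
    # Universal threshold, i.e. the "sqtwolog" rule.
    t = sigma * math.sqrt(2 * math.log(len(signal)))
    # Reconstruct from the coarsest level back to the finest.
    for d in reversed(details):
        approx = haar_idwt(approx, [shrink(c, t) for c in d])
    return approx
```

Passing `shrink=hard` switches to the hard threshold function, which makes the smoothness difference between the two functions easy to compare on the same spectrum. Increasing `level` applies the threshold to coarser scales as well, which is where the trade-off between noise separation and signal distortion appears.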

Spectral Data Preprocessing
In the process of spectral measurement, there are some inevitable interference factors, such as background disturbance, light scattering, and particle size, which influence the quality of raw spectra and decrease the accuracy of classification models (Chen et al., 2004). Therefore, several different preprocessing methods were used in order to reduce the unnecessary signal variations, such as normalization, Savitzky-Golay (SG) filter, first derivative (FD), second derivative (SD), and standard normal variate (SNV) (Roy, 2015;Everard et al., 2016).
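Three of the preprocessing steps named above can be written compactly, as in the following sketch. It rests on assumptions: the source does not state which normalization was used, so min-max scaling is shown as one possibility; the first derivative is a plain finite difference rather than a Savitzky-Golay derivative; and the function names are hypothetical.

```python
import math

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mean = sum(spectrum) / len(spectrum)
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (len(spectrum) - 1))
    return [(x - mean) / sd for x in spectrum]

def min_max(spectrum):
    """Min-max normalization to [0, 1] (assumed form of 'normalization')."""
    lo, hi = min(spectrum), max(spectrum)
    return [(x - lo) / (hi - lo) for x in spectrum]

def first_derivative(spectrum, step=1.0):
    """Central differences inside the spectrum, one-sided differences at the ends."""
    n = len(spectrum)
    out = [(spectrum[1] - spectrum[0]) / step]
    for i in range(1, n - 1):
        out.append((spectrum[i + 1] - spectrum[i - 1]) / (2 * step))
    out.append((spectrum[-1] - spectrum[-2]) / step)
    return out
```

Each function acts on a single spectrum, so applying it to a data matrix means mapping it over the rows (one row per serum sample). A second derivative can be obtained in this sketch by applying `first_derivative` twice.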

Data Fusion
Data fusion is the process of integrating data from different sources. The main purpose of data fusion is to obtain a more valuable data set, which might improve the accuracy of prediction and allow a better interpretation of the results. In this study, the matrices of FTIR and Raman spectral data were integrated into a single matrix. The FTIR matrix and the Raman matrix were concatenated column-wise, forming a two-dimensional merged data matrix with one row per analyzed sample.
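The concatenation step can be illustrated as follows. This is a minimal sketch with a hypothetical function name and toy values; it assumes the rows of both matrices are already ordered by the same samples, and it does not apply any block scaling because the source does not mention one.

```python
def fuse(ftir, raman):
    """Column-wise concatenation: row i of the result is FTIR row i followed by Raman row i."""
    if len(ftir) != len(raman):
        raise ValueError("both matrices need one row per sample, in the same order")
    return [list(f) + list(r) for f, r in zip(ftir, raman)]

# Toy example: 2 samples, 3 FTIR variables and 2 Raman variables each.
ftir = [[0.10, 0.20, 0.30], [0.15, 0.25, 0.35]]
raman = [[5.0, 6.0], [5.5, 6.5]]
fused = fuse(ftir, raman)  # 2 rows, 5 columns
```

The merged matrix keeps the number of rows and has as many columns as the two blocks combined, so any model that accepts a single spectral matrix can be trained on it without modification.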

Partial Least Squares-Discriminant Analysis
PLS-DA is a linear pattern classification method, which is widely used to deal with complicated data by reducing dimension. In this study, all samples were divided into a calibration set (60%) and a prediction set (40%) by the Kennard-Stone algorithm. Samples of patients with lung cancer were coded 1, while those of healthy people were coded 2, and the discriminant threshold of the model was set to 1.5. The performance of the PLS-DA model was evaluated in terms of accuracy, specificity, and sensitivity of calibration (Acc_cv, Spe_cv, and Sen_cv) and prediction (Acc_p, Spe_p, and Sen_p) (Yang et al., 2021b). The PLS-DA model, spectral data preprocessing, and WTD algorithm were performed using the MATLAB software (version R2019a, MathWorks).
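The sample split and the decision rule can be sketched as below. The Kennard-Stone routine follows the usual textbook description (start from the two most distant samples, then repeatedly add the sample farthest from those already chosen) with Euclidean distance, which is an assumption because the source does not state the metric. The handling of a predicted value of exactly 1.5 is also an assumption, and the function names are hypothetical.

```python
import math

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices; assumes 2 <= n_select <= len(X)."""
    n = len(X)
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbor is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

def assign_class(y_hat, threshold=1.5):
    """Map a continuous PLS prediction to class 1 (lung cancer) or 2 (healthy)."""
    return 1 if y_hat < threshold else 2
```

For a 60/40 split, `n_select` would be about 60% of the number of samples: the selected indices form the calibration set and the remaining indices the prediction set.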

RESULTS AND DISCUSSION
Fourier Transform Infrared Spectra and Raman Spectra of Serum
Figure 1 shows the FTIR spectra and Raman spectra of serum samples. The main peaks and their assignments are listed in Table 1. Table 1 shows that the Raman spectra of serum samples can provide more biological information than the FTIR spectra, such as information on porphyrin, phospholipids, and glucose. It can be seen from Figure 1A and Figure 1B that the IR spectra of serum from patients with lung cancer are extremely similar to those of healthy people. Figure 1C and Figure 1D show the Raman spectra of serum from patients with lung cancer and from healthy people, respectively. Although some differences between patients with lung cancer and healthy people can be seen in the original Raman spectra of serum samples, the original FTIR and Raman spectral data still need to be optimized and processed to distinguish the two groups.

Model Performances for Spectral Data Processed by Wavelet Threshold Denoising
In order to optimize the FTIR and Raman spectral data and improve the classification performance of the PLS-DA model, ten commonly used wavelet functions (bior2.2, coif1, coif3, db02, db08, fk4, fk8, haar, sym5, and sym8) were tested. Each wavelet function was applied at wavelet DLs of 1-8 to study the effect of DL on denoising. At the same time, four threshold selection methods (heursure, minimaxi, rigrsure, and sqtwolog) were tested to further improve the denoising performance. The optimal wavelet function, DL, and wavelet threshold were determined by calculating the accuracy (Acc_cv), specificity (Spe_cv), and sensitivity (Sen_cv) of the PLS-DA model under 7-fold cross-validation.
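The size of this parameter search can be made concrete with a short sketch. It enumerates the full grid of 10 × 8 × 4 = 320 combinations for simplicity, although the source reports the wavelet/threshold choice and the DL choice in separate figures, so the actual search may have been staged. The callback `score_fn` is hypothetical and stands for "denoise, fit PLS-DA, return the cross-validated accuracy"; the folds are contiguous, whereas a real analysis would normally shuffle or stratify them.

```python
from itertools import product

WAVELETS = ["bior2.2", "coif1", "coif3", "db02", "db08",
            "fk4", "fk8", "haar", "sym5", "sym8"]
LEVELS = range(1, 9)  # decomposition levels 1-8
THRESHOLDS = ["heursure", "minimaxi", "rigrsure", "sqtwolog"]

def k_fold_indices(n_samples, k=7):
    """Split indices 0..n_samples-1 into k contiguous folds differing in size by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(score_fn):
    """Return the (wavelet, level, threshold) combination with the highest score."""
    return max(product(WAVELETS, LEVELS, THRESHOLDS),
               key=lambda combo: score_fn(*combo))

n_combinations = len(list(product(WAVELETS, LEVELS, THRESHOLDS)))
```

In each cross-validation round, one fold from `k_fold_indices` is held out and the model is fitted on the other six; averaging the seven accuracies gives the value that `score_fn` would return for one parameter combination.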

Model Performances for Fourier Transform Infrared Spectral Data Processed by Wavelet Threshold Denoising
Figure 2 shows the Acc_cv (mean value + error bar) of the PLS-DA model using FTIR spectral data processed by WTD with the four thresholds. Different combinations of threshold and wavelet function have different effects on the same FTIR spectral data. The combination of heursure and db08 (heursure-db08) has the same, highest Acc_cv as the combination of sqtwolog and fk8 (sqtwolog-fk8), but heursure-db08 has a higher Sen_cv than sqtwolog-fk8 (Table 2). Figure 4 shows the choice of the best DL: DL = 6 gives the best performance for WTD (heursure-db08) of FTIR spectral data. Therefore, heursure-db08 and DL = 6 were selected as the optimal wavelet parameters for WTD of FTIR spectral data.

Model Performances for Raman Spectral Data Processed by Wavelet Threshold Denoising
Figure 3 shows the Acc_cv (mean value + error bar) of the PLS-DA model using Raman spectral data processed by WTD with the four thresholds. The combination of minimaxi and bior2.2 (minimaxi-bior2.2) has a higher Acc_cv than any other combination of threshold and wavelet function (Figure 3D). Figure 4 shows that DL = 6 is the best DL for WTD (minimaxi-bior2.2) of Raman spectral data. Therefore, minimaxi-bior2.2 and DL = 6 were selected as the optimal wavelet parameters for WTD of Raman spectral data.

Comparison of Wavelet Threshold Denoising With Other Preprocessing Methods
After obtaining the optimal wavelet parameters, the spectral data processed by WTD and other preprocessing methods were analyzed with the PLS-DA model. Table 3 and Table 4 show the accuracy, specificity, and sensitivity of FTIR and Raman spectral data in the PLS-DA model, respectively. Compared with the original spectral data and the data processed by other preprocessing methods, the spectral data processed by WTD, especially the Raman spectral data, obtained a good performance in the PLS-DA model.

Data Fusion Combined with Wavelet Threshold Denoising
In order to further improve the performance of the model, data fusion was applied to combine the FTIR spectral data with the Raman spectral data and thereby obtain more information. Table 5 shows the performance of the PLS-DA model using data fusion combined with different preprocessing methods. Data fusion improves the results for each preprocessed data set. Moreover, the fusion data processed by WTD has the highest accuracy, sensitivity, and specificity in the PLS-DA model. Figure 5 shows the score plot of the PLS-DA model using data fusion combined with WTD. The samples from patients with lung cancer (coded 1) are separated from those of healthy people (coded 2) at threshold = 1.5. The PLS-DA model shows good results, with 93.41% Acc_p, 96.08% Spe_p, and 90% Sen_p for the fusion data processed by WTD. The results show that FTIR spectroscopy combined with Raman spectroscopy based on data fusion and wavelet transform can effectively distinguish the serum samples of patients with lung cancer from those of healthy people.
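The three figures of merit follow from a confusion matrix, as in the sketch below. It assumes that lung cancer is the positive class, so that sensitivity refers to patients and specificity to healthy people. The counts are not reported in the source: they are hypothetical values chosen only because they reproduce the reported percentages, and the function name is illustrative.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, specificity, and sensitivity (in %) from confusion-matrix counts."""
    sensitivity = 100 * tp / (tp + fn)   # patients correctly identified
    specificity = 100 * tn / (tn + fp)   # healthy people correctly identified
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    return round(accuracy, 2), round(specificity, 2), round(sensitivity, 2)

# Hypothetical counts (an inference, not data from the study):
# 36 of 40 patients and 49 of 51 healthy people classified correctly.
acc_p, spe_p, sen_p = metrics(tp=36, fn=4, tn=49, fp=2)
# -> 93.41, 96.08, 90.0
```

Because sensitivity and specificity are computed on different groups, a model can score well on one and poorly on the other, which is why all three values are reported together.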

CONCLUSION
Data fusion and wavelet transform were used in combination with FTIR spectroscopy and Raman spectroscopy to study the serum samples of patients with lung cancer and healthy people. The results showed that the Raman spectra of serum samples can provide more biological information than the FTIR spectra. WTD filtered invalid information out of the original spectral data, thus improving the performance of the PLS-DA model. In the model, the FTIR spectral data processed by WTD gave higher accuracy than the others. Although the addition of Raman spectral data may introduce information that is not conducive to diagnosis by the PLS-DA model and may thus reduce the performance of the fusion data processed by WTD, its combination with FTIR spectral data can provide better biological information. Finally, the PLS-DA model using the fusion data processed by WTD showed good results, with 93.41% accuracy, 96.08% specificity, and 90% sensitivity in the prediction set, indicating that FTIR spectroscopy combined with Raman spectroscopy based on data fusion and wavelet transform could effectively distinguish the serum of patients with lung cancer from that of healthy people. In our future work, we will use richer wavelet denoising methods to improve the performance of Raman spectral data in the model, develop new methods that are more conducive to data fusion by assigning different weights to different spectral datasets, and then provide new methods for clinical screening and diagnosis of lung cancer and other diseases.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Yunnan Normal University (Number: ynnuethic2021-13). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XY designed the project and completed all the research. ZW performed the calculation and analysis of the model. XY and ZW share first authorship. KQ and WY provided medical instruction. LJ and YS made the charts. XY, ZW, QO, and GL wrote the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Natural Science Foundation of China (Grant No. 31760341) and Zunyi Science and Technology Bureau (ZSKHHZZ-2019-5). The authors are grateful for this financial support.