Plasma Metabolomics Reveals Diagnostic Biomarkers and Risk Factors for Esophageal Squamous Cell Carcinoma

Esophageal squamous cell carcinoma (ESCC) has high morbidity and mortality. Identifying risk metabolites associated with its progression is essential for the early prevention and treatment of ESCC. A total of 373 subjects with ESCC, 40 with esophageal squamous dysplasia (ESD), and 218 healthy controls (HC) were enrolled in this study. Gas chromatography-mass spectrometry (GC/MS) was used to acquire plasma metabolic profiles. Receiver operating characteristic (ROC) curves and adjusted odds ratios (OR) were calculated to evaluate the diagnostic and predictive ability of candidate markers. The levels of alpha-tocopherol and cysteine progressively decreased, while the levels of aminomalonic acid progressively increased, across the stages of ESCC progression (from precancerous lesions to advanced stage). Alpha-tocopherol performed well for the differential diagnosis of HC and ESD/ESCC (AUROC>0.90). OR calculations showed that a high level of aminomalonic acid was a risk factor not only for progression from ESD to ESCC (OR>13.0) but also for lymphatic metastasis in ESCC patients (OR>3.0). A low level of alpha-tocopherol was an independent risk factor for ESCC (OR<0.5). The panel consisting of glycolic acid, oxalic acid, glyceric acid, malate and alpha-tocopherol performed well in distinguishing ESD/ESCC from HC in the training and validation sets (AUROC>0.95). In conclusion, the oxidative stress function was impaired in ESCC patients, and improving the body’s antioxidant function may help reduce the early occurrence of ESCC.


INTRODUCTION
Esophageal cancer (EC) is the seventh most common cancer and the sixth most common cause of cancer death globally, accounting for about 572 000 new cases and 509 000 deaths worldwide (1). Esophageal squamous cell carcinoma (ESCC) is the most common histological type of EC, accounting for approximately 90% of esophageal cancer cases worldwide (2). China has the highest incidence of ESCC, accounting for approximately 50% of all ESCC cases worldwide (3,4). ESCC has no specific clinical symptoms in its early stages, and most patients are diagnosed at an advanced stage, resulting in a 5-year survival rate of less than 15% (5). Esophageal squamous dysplasia (ESD) is a primary precancerous lesion for ESCC, with a significantly increased risk of developing into ESCC (6). Although endoscopy and histopathological testing can effectively improve the early diagnosis of ESCC (7,8), these two methods are invasive and require trained physicians and expensive equipment, making them challenging to use widely in the early screening of ESCC. Therefore, characterizing the metabolic changes and associated risk factors during ESCC progression and developing suitable non-invasive adjunctive assays could inform early diagnosis and potential therapeutic strategies.
With its capacity to screen and identify small-molecule metabolites, metabolomics has become a powerful tool to identify metabolic changes in cancer progression and discover non-invasive biomarkers for cancer prediction and diagnosis (9-12). Currently, metabolomics techniques based on nuclear magnetic resonance (NMR) (13), gas chromatography-mass spectrometry (GC-MS) (14), liquid chromatography-mass spectrometry (LC-MS) (15), and other platforms have been widely used in studies related to ESCC. Many non-invasive auxiliary assays related to ESCC have been established using plasma (16), serum (17) and urine (18). However, these studies have mainly focused on differences in small-molecule metabolites between healthy controls (HC) and ESCC patients. Less attention has been paid to screening metabolic changes and associated risk factors during the progression of ESCC from early to advanced stages.
In this work, a two-phase development strategy (training set and validation set) was applied in 631 subjects, including clinically relevant controls covering the whole progression of ESCC. Based on the GC/MS metabolomics platform, we propose establishing a suitable non-invasive diagnostic approach and screening for risk factors associated with ESCC progression. This work could help discover new biomarkers for risk prediction and early detection of ESCC.

Sample Pretreatment
Plasma samples were processed, extracted, and derivatized following our previously developed methods (19,20). An aliquot of plasma (50 µL) was added to 200 µL methanol (containing the internal standard (IS), 5.0 µg/mL). The specimens were vigorously extracted for 5.0 min and centrifuged at 20 000×g for 10.0 min at 4°C. A 100.0 µL aliquot of the resulting supernatant was transferred to a GC vial and evaporated to dryness in a Speed-Vac concentrator (Savant Instruments, Farmingdale, NY, USA). Then 30.0 µL of methoxyamine in pyridine (10.0 mg/mL) was added to each GC vial, and the solution was vigorously vortexed for 5.0 min. After methoximation for 16.0 hours at room temperature, the samples were trimethylsilylated for another 1.0 hour by adding 30.0 µL of MSTFA with 1% TMCS as the catalyst. Finally, 30.0 µL n-heptane with methyl myristate (15.0 µg/mL) as the quality control reference standard was added to each GC vial. The quality control (QC) samples were prepared by pooling and mixing small aliquots of the plasma samples in the study set.

GC/MS Analysis, Instrumental Setting, and Parameters
To reduce the chance of systematic variation, all the samples were analyzed by GC/MS in random order. A 0.5 µL portion of the derivatized samples was injected into a Shimadzu GC/MS QP2010Ultra/SE (Kyoto, Japan) equipped with a 30 m × 0.25 mm ID fused silica capillary column chemically bonded with a 0.25 µm DB1-MS stationary phase (J&W Scientific, Folsom, CA, USA).
The column temperature was initially kept at 80°C for 3.0 min, then increased from 80 to 300°C at 20°C/min, where it was held for 5.0 min. The transfer line temperature was set at 220°C and the ion source temperature at 200°C. Ions were generated by a 70-eV electron beam at a current of 3.2 mA. The mass spectra were acquired over the mass range of 50-700 m/z at a rate of 25 spectra/s after a solvent delay of 160 s.
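As a quick check of the oven program described above, the total run time can be worked out from the hold times and the ramp. The sketch below is only illustrative arithmetic; the function name is hypothetical, and the scan count assumes that spectra are acquired from the end of the solvent delay until the end of the oven program, which the text does not state.

```python
def oven_program_minutes(initial_hold, start_c, end_c, ramp_c_per_min, final_hold):
    """Total run time (min) of a single-ramp GC oven program."""
    ramp_time = (end_c - start_c) / ramp_c_per_min
    return initial_hold + ramp_time + final_hold


# Values taken from the method description: 80 C for 3 min,
# 20 C/min to 300 C, then a 5 min hold.
run_min = oven_program_minutes(3.0, 80.0, 300.0, 20.0, 5.0)

# ASSUMPTION: acquisition runs from the end of the 160 s solvent delay
# to the end of the oven program, at 25 spectra/s.
acquisition_s = run_min * 60.0 - 160.0
n_spectra = acquisition_s * 25.0

print(run_min)        # 19.0
print(acquisition_s)  # 980.0
print(n_spectra)      # 24500.0
```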

Statistical Analysis
After normalization against the IS, all the semiquantitative data were log10-transformed. The transformed data were imported into SIMCA-P 14.1 software (Umetrics, Umea, Sweden) and preprocessed for multivariate statistical analysis using unit variance (UV) scaling. Principal component analysis (PCA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) models were built and plotted to show the clustering or separation of samples from different groups. The goodness of fit for the OPLS-DA models was evaluated using three quantitative parameters: R²X, R²Y and Q². R²X and R²Y are the explained variations, and Q² is the predicted variation, with higher R²Y and Q² indicating better model fit and predictive performance (21). To rule out chance classification by the supervised method and to test whether the model was reproducible and not over-fitted, the validity of each model was examined by 7-fold cross-validation and a permutation test (200 permutations). The intercepts of the R² and Q² regression lines with the axis were used to measure overfitting, and the model was considered valid when the intercept of Q² was negative (22).
To determine the difference between groups, the independent-samples t-test and the Mann-Whitney U test were applied for normally and non-normally distributed data, respectively. The diagnostic performance of each metabolite was evaluated by the receiver operating characteristic (ROC) curve. The optimal cut-point was selected as the threshold that maximized the Youden index (23).
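The preprocessing steps described above (IS normalization, log10 transformation and unit variance scaling) can be sketched for a single metabolite as follows. This is a minimal illustration, not the SIMCA-P implementation; the function names and peak areas are made up, and the use of the sample standard deviation (n - 1) for UV scaling is an assumption.

```python
import math
import statistics


def is_normalize_log10(metabolite_areas, is_areas):
    """Divide each peak area by the IS area of the same sample, then take log10."""
    return [math.log10(m / s) for m, s in zip(metabolite_areas, is_areas)]


def uv_scale(values):
    """Unit variance scaling: mean-center, then divide by the standard deviation.

    ASSUMPTION: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical peak areas for one metabolite in three samples.
areas = [100.0, 1000.0, 10000.0]
is_areas = [10.0, 10.0, 10.0]

logged = is_normalize_log10(areas, is_areas)
scaled = uv_scale(logged)
print(logged)
print(scaled)
```

After scaling, every metabolite has zero mean and unit variance, so abundant and scarce metabolites contribute equally to the PCA and OPLS-DA models.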
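The ROC analysis and Youden index selection described above can be illustrated with a small, self-contained sketch. The study used SPSS for these calculations; the code below is only a plain-Python illustration with made-up scores, and the function names are hypothetical. A sample is called positive when its score is at or above the threshold, so a marker that decreases with disease (such as alpha-tocopherol) would need its values negated first.

```python
def auc_mann_whitney(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutpoint(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


# Made-up scores (1 = case, 0 = control), for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

print(auc_mann_whitney(scores, labels))  # 0.875
print(youden_cutpoint(scores, labels))   # (0.4, 1.0, 0.75)
```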
Metabolite variability analysis, logistic regression analysis, ROC curve analysis and (adjusted) OR calculations were performed using SPSS 26.0 (SPSS Inc., Chicago, IL, USA), bar graphs were produced using GraphPad Prism 8.0, and heatmap and pathway analysis were performed using the online software MetaboAnalyst (https://www.metaboanalyst.ca/).

Patients and Healthy Controls
Samples for this study were collected at the First Affiliated Hospital of Nanjing Medical University, and the sampling period was from June 2019 to June 2021. Blood samples were collected before 8:30 am after overnight fasting to eliminate the disturbance of diet, and samples were kept under 4°C temperature before being stored at -80°C within 6 hours after plasma isolation (24). A total of 631 subjects were included in this study, including 218 healthy controls (HC), 373 with esophageal squamous cell carcinoma (ESCC) and 40 with esophageal squamous dysplasia (ESD). The distribution of subjects is shown in Table 1. We set stage 0 and stage I as early-stage, stage II and stage III as intermediate-stage, and stage IV as advanced-stage, taking into account the progression of ESCC and cTNM staging.
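The stage grouping defined above amounts to a simple lookup. The sketch below is illustrative; the names are hypothetical, and sub-stages (for example IIA) are not handled because the text does not describe how they were recorded.

```python
# Grouping of cTNM stages as defined in the text.
STAGE_GROUPS = {
    "0": "early",
    "I": "early",
    "II": "intermediate",
    "III": "intermediate",
    "IV": "advanced",
}


def stage_group(ctnm_stage):
    """Map a cTNM stage label (e.g. "II") to its progression group."""
    return STAGE_GROUPS[ctnm_stage.strip().upper()]


print(stage_group("I"))    # early
print(stage_group("iii"))  # intermediate
print(stage_group("IV"))   # advanced
```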
Subjects included in this study were free of metabolic abnormalities such as hypoproteinemia, weight loss, and negative nitrogen balance. The ethics committee of the First Affiliated Hospital of Nanjing Medical University approved this study, and informed consent was obtained from all subjects. The study flowchart is shown in Figure 1A.

Clustering Analysis
Pooled QC samples clustered well in the PCA score plots (Figure 1B), indicating stable instrument operation and good reproducibility of the assay throughout the experiment. The supervised OPLS-DA models revealed that the samples in the HC and ESD/ESCC groups each clustered closely, with little overlap between the groups (Figures 1C, D), indicating significant metabolic differences between the HC and ESD/ESCC groups. The parameters of these two OPLS-DA models were R²X=0.296, R²Y=0.789, Q²=0.635 and R²X=0.375, R²Y=0.808, Q²=0.774, respectively, indicating that the models had good fit and prediction accuracy. There was a partial overlap region between the ESD and ESCC groups (Figure 1E), indicating some similarity of metabolic phenotypes between the ESD and ESCC groups (R²X=0.338, R²Y=0.234, Q²=0.435). The permutation test results showed that the intercept of Q² was negative in all groups (Supplementary Figures 1A-C), indicating that our OPLS-DA models were not over-fitted and the models were valid. These results indicated significant differences in metabolic patterns between the HC and ESD groups and between the HC and ESCC groups. At the same time, there were some similarities in the metabolic changes between the ESD and ESCC groups.

Metabolic Difference Analysis
GC/MS analysis of the plasma samples aligned the metabolites in typical chromatograms (Supplementary Figure 1). Deconvolution of the GC/MS chromatograms produced 135 independent peaks from the plasma samples, 57 of which were authentically identified as metabolites (Supplementary Table 1). Quantitative data were acquired for each metabolite in the plasma samples of the HC, ESD and ESCC cases.
There were 35, 46, and 9 differential metabolites among HC, ESD, and ESCC groups (Table 2), and 3, 6, and 4 differential Table 3). Changes in the levels of three metabolites, alpha-tocopherol, aminomalonic acid and cysteine, correlated with the continuous progression of disease in ESCC patients. The levels of alpha-tocopherol and cysteine gradually decreased and the levels of aminomalonic acid gradually increased as the disease progressed in ESCC patients (Figures 2A, B). These findings indicate that the above metabolites are involved in the development of ESCC (from precancerous lesions to advanced-stage).
ROC analysis showed (Supplementary Tables 2-4 and Figures 2C, D) that alpha-tocopherol performed well for the differentiation of HC and ESD/ESCC (AUROC>0.90). This suggests that alpha-tocopherol may be a diagnostic biomarker for the differentiation of HC and ESD/ESCC. However, for the differentiation of ESD and ESCC, each metabolite performed poorly (AUROC<0.72).
Metabolic pathway analysis (Figures 2E, F) showed that the HC and ESD groups were affected mainly by amino acid metabolism (urea cycle, glutamate metabolism, arginine and proline metabolism, etc.) and energy metabolism (citric acid cycle and Warburg effect). Metabolic pathways such as purine metabolism, alanine metabolism and carnitine synthesis may be further affected as ESD progresses to ESCC.

Risk Metabolites Associated With ESCC
To assess the role of the above metabolites as risk factors for predicting ESD/ESCC occurrence, OR values were calculated. The data were log10 transformed and expressed as mean ± SD. "/" represents the statistical significance of p-values more than 0.05.
Plasma glyceric acid, oxalic acid, hexadecanoic acid and 4-hydroxybutanoic acid all had ORs > 1 (ESD vs. HC) (Table 4). Moreover, these metabolites were significantly higher in the ESD and ESCC groups than in the HC group (Table 2). Creatinine and aminomalonic acid also had ORs > 1 for ESCC vs. ESD (Table 4). These two metabolites were significantly increased in ESCC relative to ESD. In particular, aminomalonic acid increased with the progression of ESCC. These results suggested that higher plasma levels of glyceric acid, oxalic acid, hexadecanoic acid, and 4-hydroxybutanoic acid increase the risk of ESD in healthy individuals, and that higher plasma levels of creatinine and aminomalonic acid increase the risk of ESD progressing to ESCC. Meanwhile, plasma alpha-tocopherol was significantly inversely associated with the risk of ESD and ESCC after adjusting for age and sex (OR<1) (Table 4). Lower plasma concentrations of cysteine were associated with a significantly increased risk of ESCC.
Relative to the group without lymphatic metastases, there were five differential metabolites in ESCC patients with lymphatic metastases, with decreased succinate and glyceric acid levels and increased aminomalonic acid, pyrophosphoric acid, and uric acid (Supplementary Table 5). Aminomalonic acid, pyrophosphoric acid, and uric acid had ORs > 1 and may be risk factors for developing lymphatic metastases in patients with ESCC (Table 4).
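To make the interpretation of OR > 1 and OR < 1 concrete, the sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval from a 2×2 table. This is a simplification: the ORs reported in this study were adjusted (for example for age and sex) and were obtained in SPSS, whereas the function name and the counts below are hypothetical and serve only as illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    a: cases with a high metabolite level      b: controls with a high level
    c: cases with a low metabolite level       d: controls with a low level
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from this study.
odds_ratio, lower, upper = odds_ratio_ci(30, 10, 20, 40)
print(odds_ratio)  # 6.0
print(lower, upper)
```

An interval that lies entirely above 1 points to a risk factor, while one entirely below 1 points to an inverse (protective) association.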

Predictive Modeling
To construct effective diagnostic models, we applied logistic regression analysis using the data from the training set. First, binary logistic regression analysis with the forward stepwise (Wald) method was applied to establish the best model using the above differential metabolites. Eventually, a combination of five metabolites was defined as the ideal biomarker panel to discriminate ESCC and ESD from HC. These five metabolites are glycolic acid, oxalic acid, glyceric acid, malate and alpha-tocopherol.
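Once a binary logistic regression model has been fitted, the panel yields a predicted probability for each subject through the logistic function. The sketch below shows only this scoring step. The intercept and coefficients are placeholders chosen for illustration and are not the fitted values from this study, which are not given in the text; the function name is also hypothetical.

```python
import math

PANEL = ("glycolic acid", "oxalic acid", "glyceric acid", "malate", "alpha-tocopherol")


def panel_probability(intercept, coefficients, levels):
    """Predicted probability of ESD/ESCC from a fitted logistic model."""
    z = intercept + sum(coefficients[m] * levels[m] for m in PANEL)
    return 1.0 / (1.0 + math.exp(-z))


# PLACEHOLDER coefficients: signs and magnitudes are assumptions, not study results.
coefficients = {
    "glycolic acid": 1.0,
    "oxalic acid": 1.0,
    "glyceric acid": 1.0,
    "malate": 1.0,
    "alpha-tocopherol": -1.0,
}

# Hypothetical scaled metabolite levels for two subjects.
baseline = {m: 0.0 for m in PANEL}
low_tocopherol = dict(baseline, **{"alpha-tocopherol": -2.0})

print(panel_probability(0.0, coefficients, baseline))  # 0.5
print(panel_probability(0.0, coefficients, low_tocopherol))
```

The resulting probabilities can then be used as the score in a ROC analysis of the panel.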

DISCUSSION
This study found that ESD and ESCC have similar metabolic phenotypes. From a metabolomic perspective, we suggest that ESD may be an early manifestation of ESCC, and prevention of ESD may be beneficial in preventing the development of ESCC. Meanwhile, as the disease worsened, the plasma levels of oxidative stress-related metabolites (alpha-tocopherol, cysteine, and aminomalonic acid) continued to change abnormally across the stages of ESCC. The development of esophageal cancer is associated with abnormal levels of oxidative stress. The use of antioxidants and the regulation of oxidative stress levels in the body may help prevent and control early-stage esophageal cancer (25-27).
Traditionally, alpha-tocopherol is considered the most active form of vitamin E in humans and is a powerful biological antioxidant. In the present study, lower plasma concentrations of alpha-tocopherol were associated with a significantly increased risk of ESCC. Previous large-scale intervention studies have shown that alpha-tocopherol deficiency is associated with the development of ESCC (28). Hui Yang et al. found that supplementation with alpha-tocopherol may prevent ESCC by modulating the PPARγ-Akt signaling pathway and attenuating NF-κB activation and CXCR3-mediated inflammation, but had no effect in the late stage of ESCC carcinogenesis (29,30). Therefore, we believe that early supplementation with alpha-tocopherol may have a preventive effect on ESCC.
Cysteine plays an essential role in the metabolic rewiring of cancer cells: it participates in glutathione synthesis, contributing to oxidative stress control; acts as a substrate for hydrogen sulfide (H2S) production, stimulating cellular bioenergetics; and serves as a carbon source for biomass and energy production. Gwen Murphy et al. found that higher serum concentrations of cysteine were associated with a significantly reduced risk of oesophageal squamous cell carcinomas (31). Moreover, the level of cysteine in tumor tissue of ESCC patients was significantly higher than that in adjacent tissue (32). Therefore, we hypothesize that tumor tissues of ESCC patients may increase the uptake of plasma cysteine to maintain oxidative stress homeostasis and meet bioenergy requirements in tumors.
We found that aminomalonic acid levels increased across the exacerbation stages in ESCC patients, which may make it a risk factor for ESCC. However, aminomalonic acid has never been suggested to play a role in esophageal diseases. Previously, several studies have found that altered aminomalonic acid levels in blood were associated with colorectal cancer, abdominal aortic aneurysm and type 2 diabetes (33-35). Moreover, aminomalonic acid is closely associated with oxidative damage biomarkers (8-isopropanedioic acid and 8-OHdG), and its origin may be related to free radical-mediated protein oxidation (36). Therefore, the elevated aminomalonic acid levels in ESCC patients may be associated with impaired esophageal, gastrointestinal and hepatic functions due to long-term poor diet.
Although alpha-tocopherol showed potential in distinguishing HC from ESD (AUC=0.92, sensitivity=0%, specificity=100%) and ESCC (AUC=0.91, sensitivity=0%, specificity=100%), its sensitivity was poor. To improve the diagnostic performance of alpha-tocopherol, we used a combined biosignature of glycolic acid, oxalic acid, glyceric acid, malate and alpha-tocopherol. This combination greatly improved the ability to differentiate between HC and ESD/ESCC (AUC>0.95) and had good sensitivity and specificity. Unfortunately, we did not find a good metabolite or combination of metabolites to distinguish ESD from ESCC in this study.

Limitations
We collected patients with ESCC and performed a comprehensive analysis of their metabolic phenotypes and metabolic characteristics, but there are still some limitations. The major limitation of the present study is that it is a singlecenter study, and it is unclear whether the findings apply to other regions and populations. Although many metabolic changes associated with ESCC were identified in this study, further mechanistic studies are lacking. In the future, we will combine multiple centers, expand the sample size to validate our experimental results, and conduct mechanistic studies on the metabolic characteristics of ESCC.

CONCLUSION
The development of ESCC is accompanied by persistent abnormal changes in oxidative stress in patients. Improving

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
WZ and JA were responsible for the concept of the study. GW and JA provided the GC/MS platform. WW collected the blood samples, recorded the medical history of the volunteers, and prepared the plasma samples. MY performed the untargeted metabolomics. MY and XY analyzed the data. MY and WW wrote the manuscript. All authors contributed to the article and approved the submitted version.