Protein Glycopatterns in Bronchoalveolar Lavage Fluid as Novel Potential Biomarkers for Diagnosis of Lung Cancer

Lung cancer is one of the most prevalent and life-threatening neoplasias worldwide, partly because ideal diagnostic biomarkers are lacking. Although aberrant glycosylation has been observed in human serum and tissue, little is known about the glycosylation alterations in bronchoalveolar lavage fluid (BALF) that are closely associated with lung cancer. In this study, we aimed to systematically investigate the alterations of protein glycopatterns in BALF and to assess their potential as biomarkers for the diagnosis of lung cancer. Lectin microarrays and lectin blotting analysis were used to detect the differential expression of BALF glycoproteins from 80 patients with adenocarcinoma (ADC), 77 with squamous carcinoma (SCC), 51 with small cell lung cancer (SCLC), and 73 with benign pulmonary diseases (BPD). These 281 specimens were then randomly divided into a training cohort and a validation cohort for constructing and verifying diagnostic models based on the glycopattern abundances. Moreover, an independent test was performed with 120 newly collected BALF samples enrolled in a double-blind cohort to further assess the clinical application potential of the diagnostic models. According to the results, 15 lectins (e.g., PHA-E, EEL, and BPL) and 14 lectins (e.g., PTL-II, LCA, and SJA) showed significant variations in different types and different stages of lung cancer, respectively, compared to BPD. Notably, the diagnostic models achieved better discriminatory power in the validation cohort and exhibited accuracies of 0.917, 0.864, 0.712, 0.671, and 0.781 in the double-blind cohort for the diagnosis of lung cancer, early stage lung cancer, ADC, SCC, and SCLC, respectively. Taken together, the present study revealed that the abnormally altered protein glycopatterns in BALF are potential novel biomarkers for the identification and early diagnosis of lung cancer, and they may help explain the mechanism of lung cancer development from the perspective of glycobiology.


INTRODUCTION
Lung cancer is the most frequent malignancy, with the highest incidence and mortality in both sexes worldwide. It accounts for approximately 2.1 million new cases (11.6% of all tumors) and 1.8 million deaths (18.4% of all tumors) per year globally, which seriously endangers human health (1). According to its pathological characteristics, lung cancer is clinically classified into small cell lung cancer (SCLC) and non-SCLC (NSCLC) (2). The latter can be further subdivided into three main histological subtypes, adenocarcinoma (ADC), squamous cell carcinoma (SCC), and large cell carcinoma, along with rarer variants such as mixed or undifferentiated pulmonary carcinomas (2). The poor prognosis of lung cancer is primarily related to diagnosis at an advanced stage (stage III or IV) in the majority of patients, by which point the optimal treatment opportunity has been missed and therapeutic options are limited (3). Therefore, early diagnosis is pivotal, and it is estimated that 36%-73% of patients will survive longer than 5 years if they are identified at an early stage (stage I or II) (4,5).
To improve the early detection and outcome of lung cancer patients, several systematic randomized clinical trials have recommended using high-sensitivity imaging technology such as low-dose computed tomography (LDCT) as a screening tool for monitoring high-risk individuals (6). Although LDCT scan can reduce mortality, its clinical utility is restricted due to the high false positive rate with multiple screenings and unnecessary radiation exposure (7). Furthermore, substantial progress has been made in our understanding of tumor biological processes and advancement in treatment strategies, which has led to the development of targeted therapy for lung cancer with significant improvement in the overall progression-free survival rate (8). Despite this array of new targeted immunotherapy treatments, a cure remains elusive for the majority of patients because of the inevitable drug resistance (9). Consequently, to overcome the obstacles and complement current diagnosis and screening methods, the discovery of potential protein biomarkers with high sensitivity and specificity for early detection, therapy guidance as well as prognosis monitoring is an urgent priority. To date, numerous studies have focused on searching for indicators in blood, yet the clinical application of traditional serum-based biomarkers is still far from satisfactory owing to its low diagnostic efficacy (10)(11)(12). Currently, protein biomarker detection using body fluids such as urine, saliva, exhaled breath condensate, and pleural effusion has emerged as a promising modality for cancer diagnosis and monitoring of disease progression mainly because of its minimal invasiveness and easy accessibility (13)(14)(15)(16). 
Bronchoalveolar lavage fluid (BALF), a type of proximal biofluid routinely obtained from the segmental bronchus of interest during flexible fiberoptic bronchoscopy in individuals with suspected pulmonary disease, has also been considered a useful, safe, and minimally invasive biological specimen for lung cancer biomarker discovery (17). With this approach, airway components can be recovered from a large area of lung parenchyma (18). This is particularly important for research on preinvasive and early cancer, because these primary lesions may show no visible histological changes under bronchoscopy or may be out of reach of the biopsy needle, which makes BALF potentially useful in the clinic for the early diagnosis of lung cancer (19). Conveniently, the fraction of BALF that is not required for standard pathological procedures can be used for proteomics analysis and lung cancer biomarker detection. BALF also has the merit of providing varied information on immunologic, inflammatory, and infectious processes, and it can directly reflect the physiological or pathological status of the patient (17). Detection of proteins in BALF from patients with lung cancer can therefore provide direct information on exposure within the lung.
Glycosylation, one of the most critical and heterogeneous posttranslational modifications in protein biosynthesis, is an enzyme-directed, site-specific process that is essential for a wide range of biological processes, including microbial infection, cell differentiation, tumor metastasis, and cell carcinogenesis (20,21). To our knowledge, more than 50% of cellular proteins, including most secreted proteins, cell surface proteins, and intracellular proteins, are glycoproteins modified by different types of glycan structures that closely reflect the physiological status of the cell (22). Hence, the effect of a disease state on glycan biosynthesis may be more direct and evident than cancer-related alterations of the proteins themselves, and studying it can contribute to the diagnosis and understanding of disease (23). It is now well established that abnormal glycosylation is a key feature of the malignant transformation of tumor cells (24,25). Altered protein glycosylation often occurs early in tumor development, and the expression of certain tumor-associated glycans in precursor lesions of different types of cancer has provided powerful early diagnostic markers (23). Aberrant glycosylation has been observed during the development and progression of lung cancer, including changes in expression, fucosylation, N-glycan branching types, and increased sialylation on proteins or glycolipids (26)(27)(28). Systemic glycomics strategies allow disease-related changes in glycoproteins to be examined further. For example, researchers have detected the differential glycopatterns of lung cancer tissue and nonmalignant tissue at the level of individual glycan structures by nLC-chip-TOF-MS (29).
In addition, a related study developed a serum mass profile-based signature to identify patients with early stage lung cancer, which revealed that the abundances of several components could distinguish patients with early-stage lung cancer from healthy high-risk smokers (30)(31)(32). Over the past two decades, lectins have been one of the main tools for studying glycosylation (33). A lectin microarray is composed of a group of lectins with unique glycan-binding properties printed on a solid support. These lectins are immobilized in a high-density matrix and exhibit a multivalent display (34). With the emergence of high-throughput glycomic techniques, lectin microarrays are now capable of quantitative analysis of N- and O-linked glycans simultaneously based on subtle differences and with minimal sample preparation. They have become a primary and valuable approach for investigating the glycosylation of original intact samples without the need for glycan release, separation, or purification (34,35). Furthermore, lectin microarray analysis of various types of biological specimens, such as cells, tissues, and body fluids, has been developed for different diseases (36)(37)(38)(39)(40)(41). For instance, Hirao et al. (42) performed lectin microarray analysis on lung cancer tissue and cell lines and found AAL, HHL, and ConA to be lectin probes specific to NSCLC.
In this exploratory study, to investigate lectin-specific glycosylation changes in BALF associated with lung cancer, lectin microarrays were applied to compare different or similar alterations in glycopatterns between benign pulmonary disease (BPD) and lung cancer with different types (including ADC, SCC, and SCLC) as well as different stages [including early stage lung cancer (LC-ES) and advanced stage lung cancer (LC-AS)]. In addition, we also assessed the possibility of aberrant glycopatterns in BALF as novel potential biomarkers for the identification and early diagnosis of lung cancer.

Ethics Statements
The collection and use of all human BALF samples for research presented here were approved by the Ethical Committee of the First Affiliated Hospital of Xi'an Jiao Tong University in Xi'an, China. Written informed consent was received from patients for the collection of their BALF. The study methodologies were conducted in accordance with the ethical guidelines of the Declaration of Helsinki.
Training Cohort and Validation Cohort
BALF samples were obtained from patients who were undergoing fiberoptic bronchoscopy examination at the First Affiliated Hospital of Xi'an Jiaotong University from July 2018 to March 2019. A total of 208 patients diagnosed with lung cancer (80 ADC, 77 SCC, and 51 SCLC) and 73 clinical controls with BPD, that is, non-malignant lung disease consisting of pneumonia, tuberculosis, and bronchiectasis as confirmed by biopsy, were selected for the current study. All 281 subjects were randomly divided into a training cohort (n=163) and a validation cohort (n=118) for the construction and verification of the diagnostic models. The baseline clinicopathological characteristics of the patients are presented in Table 1. The enrolled patients were newly diagnosed by histopathology, and those who had received any treatment, such as preoperative radiotherapy, chemotherapy, chemoradiotherapy, or curative treatment, were excluded. BALF samples were collected by instillation and aspiration of 10 to 20 ml of sterile saline (0.9%) in the appropriate bronchopulmonary segment during fiberoptic bronchoscopy. After extraction from the respiratory airways, the majority of BALF samples appeared clear, and any samples with a slightly reddish appearance due to blood contamination were excluded. Approximately 10 ml of collected BALF was immediately placed on ice, and Protease Cocktail Inhibitor was then added at a concentration of 1 µl/ml BALF to minimize protein degradation. The total volume was then centrifuged at 4,000 rpm and 4°C for 20 min to remove the cellular fraction and macromolecular insoluble materials. The supernatant was collected and concentrated using 4 ml 3 kDa Amicon centrifugal filters. After the protein concentration was determined by the BCA assay (Beyotime Institute of Biotechnology, China), the resultant BALF was aliquoted into 1.5 ml cryotubes and stored at -80°C until subsequent analysis.
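The random division into training and validation cohorts can be illustrated with a minimal sketch. The text does not state whether the split was simple or stratified by diagnosis, so the simple shuffle below, the fixed seed, and the placeholder sample IDs are all assumptions made for illustration only.

```python
import random

def split_cohort(sample_ids, n_train, seed=0):
    """Shuffle sample IDs and split them into training and validation lists."""
    rng = random.Random(seed)   # fixed seed (an assumption) so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 281 placeholder IDs standing in for the BALF specimens (hypothetical names)
ids = [f"S{i:03d}" for i in range(1, 282)]
train, valid = split_cohort(ids, n_train=163)
print(len(train), len(valid))  # 163 118
```

A stratified variant would apply the same shuffle within each diagnostic group so that the BPD, ADC, SCC, and SCLC proportions are preserved in both cohorts.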

Double-Blind Cohort
To confirm the applicability of the diagnostic models established in the training cohort, another independent cohort of 120 BALF specimens was collected between April 2019 and July 2019 at the same hospital, using selection criteria and sample processing strategies similar to those described above, to serve as the double-blind cohort for this study. The results were compared with the final clinical diagnosis to evaluate the diagnostic value for lung cancer. A summary of the patients' clinical characteristics in each group is also provided in Table 1.

Fluorescent Labeling of BALF Proteins
First, 100 µg of BALF proteins were labeled with Cy3 fluorescent dye (GE Healthcare, Buckinghamshire, UK). Next, labeled proteins were separated from the excess free dye by Sephadex G-25 columns (GE Healthcare) according to the manufacturer's instructions. Finally, the purified Cy3-labeled BALF proteins were quantified and stored at -20°C in the dark until processing.

Lectin Microarray
The Cy3-labeled proteins were incubated on a lectin microarray to detect different glycoproteins among clinical samples. The lectin microarray was produced using 37 lectins (Vector Laboratories, Sigma-Aldrich, and Calbiochem) with different binding preferences covering N- and O-linked glycans, which were spotted on homemade epoxysilane-coated slides with Stealth micro spotting pins (SMP-10B) (TeleChem, USA) using a Capital smart microarrayer (CapitalBio, Beijing, China). The glycan structures specifically recognized by each lectin are summarized in the Supplementary Material, Table S1. The concentration of each lectin was 1 mg/ml in a buffer recommended by the manufacturer containing 1 mM of the appropriate monosaccharide. As shown in Figure 1A, each lectin was spotted in triplicate per block, with quadruplicate blocks on one slide. BSA and BSA conjugated with Cy3 were used as negative and positive controls, respectively, to verify the feasibility of the lectin microarray. The slides were placed in a humidity-controlled incubator at 50% humidity overnight to immobilize the lectins. After immobilization, the slides were blocked with blocking buffer containing 2% BSA in 1× PBS (0.01 mol/L phosphate buffer containing 0.15 mol/L NaCl, pH 7.4) for 1 h and rinsed twice with 1× PBS. Then, 6 µg of Cy3-labeled BALF proteins diluted in 120 µl of hybridization buffer (2%, w/v, BSA, 500 mM glycine and 0.1%, v/v, Tween-20 in PBS, pH 7.4) was incubated on the blocked slide within the chamber for 3 h at room temperature in the dark. After incubation, the microarray was rinsed twice with 1× PBST (0.2%, v/v, Tween 20 in 1× PBS, pH 7.4) for 5 min each, followed by a final rinse in 1× PBS, and dried via centrifugation at 600 rpm for 5 min. The microarrays were scanned at 70% photomultiplier tube and 100% laser power settings using a Genepix 4000B confocal scanner. The acquired images were analyzed at 532 nm for Cy3 detection by Genepix 3.0 software.

SDS-PAGE and Lectin Blotting Analysis
Glycosylation alterations detected by lectin microarrays between BPD controls and patients with different stages of lung cancer were further verified by SDS-PAGE and lectin blotting. To normalize the differences between subjects and to tolerate individual variation, 100 µl of each sample from the BPD, LC-ES, and LC-AS groups were pooled. The pooled BALF proteins of each group were analyzed by SDS-PAGE and lectin blotting. For SDS-PAGE, samples were mixed with 5× loading buffer, boiled for 5 min at 100°C, and then electrophoresed on a 3% polyacrylamide stacking gel and a 10% resolving gel. After electrophoresis, some gels were stained directly with silver nitrate.
For lectin blotting, the proteins in the gels were transferred onto a polyvinylidene difluoride membrane (Immobilon-P; Millipore Corp., Bedford, MA, U.S.A.) with a wet transfer unit (Hoefer Scientific) for 1.5 h at 300 mA. After transfer, the membranes were washed four times with TTBS (150 mM NaCl, 10 mM Tris-HCl, 0.05% v/v Tween 20, pH 7.5) and then blocked for 1 h with Carbo-Free Blocking Solution (Vector, Burlingame, CA) at room temperature. The membranes were then washed again and incubated with Cy5 (GE Healthcare, Buckinghamshire, UK)-labeled RCA120, AAL, LCA, and WFA (2 µg/ml in Carbo-Free Blocking Solution) with gentle shaking overnight at 4°C in the dark. The membranes were then washed twice for 10 min each with TTBS and scanned in the red fluorescence channel (635 nm excitation/650 LP emission) at a PMT voltage of 800 using a phosphor imager (Storm 840, Molecular Dynamics).

Statistical Analysis
To minimize possible systematic variation, the original lectin microarray data were median-normalized as follows. The net fluorescence intensity value of each spot was calculated by subtracting the average background value, and values that were less than the average background ±2 standard deviations (SD) were removed. The median of the effective data points for each lectin was globally normalized to the sum of the medians of all effective data points for each lectin in a block, and we named these the normalized fluorescent intensities (NFIs). The NFI data were further analyzed by Expander 8.0 (http://acgt.cs.tau.ac.il/expander/) to perform an unsupervised average hierarchical cluster analysis (HCA).
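The normalization described above can be sketched in a few lines. This is only an illustrative reading of the procedure: the cutoff is interpreted here as the average background plus 2 SD, and the function name, data layout, and example intensities are hypothetical rather than taken from the study.

```python
from statistics import median

def normalized_intensities(block, bg_mean, bg_sd):
    """Compute NFIs for one block.

    block maps lectin name -> list of raw spot intensities (replicate spots).
    """
    threshold = bg_mean + 2 * bg_sd          # assumed reading of the cutoff
    medians = {}
    for lectin, spots in block.items():
        # keep spots above the cutoff and subtract the average background
        kept = [s - bg_mean for s in spots if s >= threshold]
        if kept:
            medians[lectin] = median(kept)
    total = sum(medians.values())
    # each lectin's median as a fraction of the summed medians in the block
    return {lectin: m / total for lectin, m in medians.items()}

# Hypothetical raw intensities for three lectins spotted in triplicate
block = {
    "AAL": [1200.0, 1100.0, 1300.0],
    "RCA120": [700.0, 650.0, 600.0],
    "DBA": [110.0, 400.0, 420.0],   # 110 falls below the cutoff and is dropped
}
nfi = normalized_intensities(block, bg_mean=100.0, bg_sd=10.0)
```

Because every NFI is a fraction of the block total, the values within a block sum to 1, which makes arrays scanned at different overall brightness comparable.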
Statistical differences between two arbitrary data sets or multiple data sets were first evaluated using a Kruskal-Wallis test, followed by a Dunn's Multiple Comparison Test to correct for multiple comparisons through GraphPad Prism 8.0 software (GraphPad, La Jolla, CA, USA), and values of *p < 0.05, ** p < 0.01 or *** p < 0.001 were considered statistically significant. Five diagnostic models including Model LC, Model ADC, Model SCC, Model SCLC, and Model LC-ES were constructed according to the glycopattern abundances based on a forward binary stepwise logistic regression analysis using SPSS version 22.0. The discriminatory performances of candidate lectins and diagnostic models were measured using the area under the curve (AUC) on receiver operating characteristic (ROC) curve analysis by Origin 8.0 software.
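As an illustration of the first step of this analysis, the Kruskal-Wallis H statistic can be computed directly from pooled ranks. The sketch below is a plain-Python version written for clarity, not the GraphPad Prism implementation used in the study, and the NFI values in the example are invented.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}     # value -> average rank (tied values share the mean rank)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_term = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)     # equals 1 when there are no ties
    return h / correction

# Hypothetical NFIs of one lectin in three groups
bpd = [0.031, 0.035, 0.040]
lc_es = [0.022, 0.025, 0.028]
lc_as = [0.010, 0.015, 0.018]
h = kruskal_wallis_h([bpd, lc_es, lc_as])
print(round(h, 3))  # 7.2
```

Statistical packages typically convert H to a p-value by comparing it with a chi-square distribution with k - 1 degrees of freedom for k groups, and then apply a post hoc procedure such as Dunn's test to the pairwise comparisons.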

Alterations in BALF Glycopatterns Among BPD, ADC, SCC, and SCLC Detected by Lectin Microarrays
To identify the abnormal glycopatterns associated with lung cancer, all samples from BPD, ADC, SCC, and SCLC were separately detected using lectin microarrays. The layout of the lectin microarrays and Cy3-labeled BALF glycoproteins from four subjects bound to the lectin microarrays are shown in Figures 1A, B. The data generated from three biological replicates of each sample were imported into Expander 8.0 software and analyzed by HCA to obtain the hierarchical relationship based on the similarities and differences among all glycopattern abundances. As shown in Figure 1C, the normalized data from 281 samples were displayed as a heat map, in which the expression levels of BALF glycoproteins among BPD, ADC, SCC, and SCLC showed obvious differences, as indicated by the different colours. The NFIs of each candidate lectin were further represented in box plots and compared by the Kruskal-Wallis test to show the variable expression levels of BALF glycopatterns. In total, 15 lectins revealed significant alterations in glycan expression among BPD, ADC, SCC, and SCLC (Figure 2). As shown in Figure 2A, the Siaα2-3Galβ1-4Glc(NAc)/Glc binder MAL-II, the Galβ-1,4GlcNAc (type II) and Galβ1-3GlcNAc (type I) binders RCA120 and ECA, and the high-mannose binder HHL exhibited significantly decreased NFIs in all patients with lung cancer compared with BPD (all p < 0.001). In contrast, the αGalNAc binders GSL-I and DBA, the β-D-GlcNAc and (GlcNAcβ1-4)n binder DSA, and the Fucα1-6GlcNAc (core fucose) binder AAL exhibited significantly increased NFIs in all patients with lung cancer compared with BPD (all p < 0.001). However, for these lectins there was no significant difference among the three subtypes of lung cancer. Meanwhile, other lectins revealed significant differences among ADC, SCC, and SCLC, as shown in Figure 2B.
The bisecting GlcNAc binder PHA-E, the Galα1-3(Fucα1-2)Gal (blood group B antigen) binder EEL, and the Galβ1-3GalNAc binder BPL were associated with decreased NFIs in ADC compared with BPD and SCC (all p < 0.05). In contrast, the GlcNAc binder GSL-II, the GalNAcα-Ser/Thr (Tn) binder VVA, and the Galβ1-3GalNAcα-Ser/Thr (T) binder PNA showed significantly increased NFIs in patients with ADC compared with BPD and SCC (all p < 0.05). Notably, a decrease in the NFIs of PNA was observed in SCC compared with ADC and SCLC (all p < 0.05). In addition, the Galβ1-3GalNAc binder MPL showed decreased NFIs in patients with SCC compared with SCLC (p < 0.001).

Alterations in BALF Glycopatterns During the Development and Progression of Lung Cancer
In our study, we hoped to identify which glycans emerged and how the glycans were differentially expressed in BALF during the development of lung cancer. Therefore, the 162 of the 208 lung cancer patients who had a definite clinical stage were further divided into early stage (including stage I/II NSCLC and limited stage SCLC) lung cancer (LC-ES) and advanced stage (including stage III/IV NSCLC and extensive stage SCLC) lung cancer (LC-AS). Similarly, analyses of aberrant glycosylation among BPD, LC-ES, and LC-AS were performed by lectin microarrays. The fluorescent images are displayed in Figure 3A. The data generated from three biological replicates of each sample were analyzed by HCA using Expander 8.0 software to obtain the hierarchical relationship based on the similarities and differences among all glycopattern abundances (Figure 3B). The NFIs of each candidate lectin were further represented in a scatter plot. In this differential analysis, 14 lectins revealed significant alterations in BALF glycopatterns among BPD, LC-ES, and LC-AS. As shown in Figure 4A, the NFIs of RCA120, MAL-II, EEL, and PHA-E in LC-ES and LC-AS were significantly lower than those in BPD (all p < 0.001). In contrast, both DBA and AAL exhibited significantly increased NFIs in all stages of lung cancer compared with BPD (all p < 0.001). However, the NFIs of these lectins were not significantly different between LC-ES and LC-AS. As shown in Figure 4B, the Gal binder PTL-II, the α-D-Man binder LCA, the αGalNAc binder SJA, and the terminal GalNAcα/β1-3/6Gal binder WFA exhibited decreased NFIs in patients with LC-ES compared with BPD and LC-AS (all p < 0.05). In contrast, an increase in the NFIs of WGA was observed in patients with LC-ES compared with BPD and LC-AS (all p < 0.01). In addition, VVA and the branched (LacNAc)n binder PWM showed significantly decreased or increased NFIs in patients with LC-AS compared with BPD and LC-ES (all p < 0.01).
In addition, with the development of lung cancer, the NFIs of the αGalNAc binder GSL-I showed a gradually increasing trend from BPD to LC-AS (all p < 0.05).

Validation of the Differential Expression Levels in the BALF Glycopatterns During the Development and Progression of Lung Cancer by SDS-PAGE and Lectin Blotting Analysis
To further validate the different abundances of glycoproteins in BALF from BPD, LC-ES, and LC-AS subjects, SDS-PAGE with silver staining and lectin blotting with Cy5-labeled RCA120, AAL, LCA, and WFA were performed.
with MWs of approximately 60 to 42 kDa in LC-ES and LC-AS than that in BPD subjects. Notably, WFA staining displayed weaker binding to three apparent bands (red arrows) with MWs of approximately 50 kDa, 25 kDa, and 15 kDa in LC-ES than that in BPD and LC-AS subjects. These results were generally consistent with the results from the lectin microarrays.

Construction of the Diagnostic Models in the Training Cohort Based on BALF Glycopattern Abundances
The BALF glycopatterns of BPD, ADC, SCC, SCLC, LC-ES, and LC-AS subjects were assessed based on the above lectin microarray data for the different types and stages of lung cancer. Detailed information on the ROC analysis of the constructed models and candidate lectins in the training cohort is shown in Figure 5A and Table 2.
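For readers unfamiliar with the ROC metrics reported in this section, the sketch below shows one way to compute the AUC and the sensitivity and specificity at a chosen cutoff from predicted scores. It is written in plain Python for illustration and is not the Origin workflow used in the study; the scores and the cutoff of 0.5 are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case outscores a negative case.

    Tied pairs count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sens_spec(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(s >= cutoff for s in pos_scores)
    tn = sum(s < cutoff for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)

# Hypothetical model outputs for lung cancer (positive) and BPD (negative) cases
cancer = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2]
auc = roc_auc(cancer, benign)                      # 11 of 12 pairs ranked correctly
sensitivity, specificity = sens_spec(cancer, benign, cutoff=0.5)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the models and single lectins are compared.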
First, the Model LC mathematic formula was established to distinguish lung cancer from BPD using binary logistic regression analysis. The diagnosis accuracy of Model LC that referred to three lectins (ECA, GSL-I, and RCA120) in the training cohort was appraised by ROC analysis. When judging the types of lung cancer, the results indicated that Model LC had higher discriminatory power for differentiating lung cancer from BPD (AUC: 0.926, sensitivity: 0.837, and specificity: 0.875) than that of these single candidate lectins, such as ECA (AUC: 0.859, sensitivity: 0.894, and specificity: Simultaneously, the ROC curve indicated that Model LC had higher discriminatory power for differentiating different stages of lung cancer from BPD (AUC: 0.962, sensitivity: 0.908, and specificity: 0.875) than that of three candidate lectins, such as ECA (AUC: 0.893, sensitivity: 0.949, and specificity: 0.775), RCA120 (AUC: 0.915, sensitivity: 0.735, and specificity: 1.000), and HHL (AUC: 0.892, sensitivity: 0.908, and specificity: 0.750).
Then, diagnostic models for identifying different types of lung cancer were constructed separately. The Model ADC mathematic formula was built to distinguish ADC from SCC and SCLC using binary logistic regression analysis. The diagnosis accuracy of Model ADC that referred to four lectins (DBA, STL, UEA-I, and BPL) in the training cohort was appraised by ROC analysis, indicating that Model ADC had higher discriminatory power for differentiating ADC from SCC and SCLC (AUC: 0.776, sensitivity: 0.787, and specificity: 0.729) than that of the two single candidate lectins, such as DBA (AUC: 0.634, sensitivity: 0.520, and specificity: 0.729), and PHA-E (AUC: 0.634, sensitivity: 0.720, and specificity: 0.583).
The Model SCC mathematic formula was established to diagnose SCC from other subtypes of lung cancer using binary logistic regression analysis.  The diagnostic accuracy of Model SCLC that referred to six lectins (STL, BS-I, PTL-II, SBA, PSA, and GNA) in the training cohort was appraised by ROC analysis, indicating that Model SCLC had higher discriminatory power for differentiating SCLC from NSCLC (AUC: 0.846, sensitivity: 0.710, and specificity: 0.867) than two single candidate lectins, such as STL (AUC: 0.652, sensitivity: 0.591, and specificity: 0.700) and BS-I (AUC: 0.618, sensitivity: 0.785, and specificity: 0.467).
Furthermore, a diagnostic model for distinguishing lung cancer patients at different stages was also constructed by binary logistic regression analysis.
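The way a binary logistic regression model of this kind turns lectin NFIs into a predicted probability can be sketched as follows. The lectin names are those reported for Model LC, but the intercept, coefficients, and NFI values are invented placeholders, since the fitted formulas are not given in this section; the stepwise variable selection itself is not shown.

```python
import math

def logistic_probability(intercept, coefs, nfis):
    """Predicted probability from a fitted binary logistic regression model."""
    z = intercept + sum(coefs[name] * nfis[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: negative for lectins whose NFIs fall in lung
# cancer, positive for the lectin whose NFI rises. Not the study's values.
coefs = {"ECA": -50.0, "GSL-I": 60.0, "RCA120": -40.0}
sample = {"ECA": 0.02, "GSL-I": 0.05, "RCA120": 0.03}   # hypothetical NFIs
p = logistic_probability(intercept=0.0, coefs=coefs, nfis=sample)
print(round(p, 2))  # 0.69
```

A sample is then assigned to the positive class when its probability reaches the cutoff chosen from the ROC curve.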

Verification of the Diagnostic Models in the Validation Cohort
To assess their discriminatory efficiencies, the diagnostic models constructed in the training cohort were then applied to a validation cohort including BPD (n = 33), ADC (n = 32), SCC (n = 32), SCLC (n = 21), LC-ES (n = 22), and LC-AS (n = 42). All subjects were first classified as lung cancer or BPD by Model LC, and those who were diagnosed with lung cancer by Model LC were then assessed with the other constructed models to determine the pathological type and stage of their cancer. Similarly, ROC analyses were carried out to evaluate the diagnostic accuracy of the constructed models. The detailed results are shown in Figure 5B and Table 2.
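The two-step procedure described above, Model LC first and the remaining models only for samples called lung cancer, can be sketched as a simple cascade. The function and dictionary layout are hypothetical, the subtype and stage cutoffs of 0.5 are placeholders, and the text does not say how disagreements between subtype models were resolved, so the sketch simply reports every model whose cutoff is reached.

```python
def classify(sample_scores, lc_cutoff, other_cutoffs):
    """Cascade: screen with Model LC, then apply the remaining models.

    sample_scores maps model name -> predicted probability for one sample.
    """
    if sample_scores["LC"] < lc_cutoff:
        return {"diagnosis": "BPD"}
    flags = sorted(
        name for name, cut in other_cutoffs.items()
        if sample_scores[name] >= cut
    )
    return {"diagnosis": "lung cancer", "flags": flags}

# 0.754 is the Model LC cutoff reported for the validation cohort;
# the other cutoffs and all scores below are hypothetical.
other_cutoffs = {"ADC": 0.5, "SCC": 0.5, "SCLC": 0.5, "LC-ES": 0.5}
benign_like = {"LC": 0.30, "ADC": 0.9, "SCC": 0.1, "SCLC": 0.1, "LC-ES": 0.2}
cancer_like = {"LC": 0.91, "ADC": 0.8, "SCC": 0.2, "SCLC": 0.1, "LC-ES": 0.6}
result_benign = classify(benign_like, 0.754, other_cutoffs)
result_cancer = classify(cancer_like, 0.754, other_cutoffs)
```

Note that in such a cascade a sample rejected by Model LC is never seen by the downstream models, so the sensitivity of the first step bounds the performance of the whole procedure.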
When judging the types of lung cancer, first, the ROC curve showed that Model LC (cutoff value: 0.754, AUC: 0.961, sensitivity: 0.918, and specificity: 0.939) had superior diagnostic accuracy in distinguishing lung cancer from BPD, and it could correctly classify 78 of 85 lung cancer cases and 31 of 33 BPD cases. Next, the ROC curve showed that Model ADC

Evaluation of the Clinical Application Potential of the Diagnostic Models in the Double-Blind Cohort
Another independent test was developed in a double-blind cohort with 120 subjects to further appraise the clinical application potential of these diagnostic models for differential diagnosis between lung cancer at different types and stages and BPD. A comparison of the double-blind test results and clinical final diagnosis is shown in

DISCUSSION
Lung cancer is a severe health problem that prevails around the world (43). At present, fiberoptic bronchoscopy is a standard procedure of the diagnostic work-up of patients with suspected respiratory system lesions to obtain specimens for histological or cytological examination. Naturally, BALF is sampled before biopsy during this process, gathering specimens that are difficult to detect by bronchoscopy. However, the cytological examination for BALF is greatly influenced by artificial factors and its diagnostic accuracy is relatively poor. Proteins are the major component of BALF and have been recognized to play valuable roles in the discovery of biomarkers for lung cancer.
Comparative analyses have documented that certain proteins are present at higher levels in BALF than in plasma, suggesting that they are specifically produced in the respiratory tract (44,45). Protein glycosylation is the enzymatic addition of sugars or oligosaccharides to proteins, which leads to the functional diversity of proteins and participates in the diversity of their biological activities, particularly in cancer genesis and progression (46). Due to the complexity of glycan structures and the heterogeneity of glycosylation sites, the complete characterization of tumor glycomics and glycoproteomics remains a challenge (47,48). To date, aberrant glycosylation has been observed in patients with various types of cancer, such as gastric, breast, and colorectal cancer (25,49,50). In recent years, research on the changes in glycosylation in lung cancer has made extensive progress, providing a new thread for its diagnosis and therapy (23). The lectin microarray, as a high-throughput analytical glycoscience strategy, allows rapid observation of different glycans following minimal sample preparation, which helps ensure that the real state of protein glycosylation in clinical body fluid samples from healthy individuals, patients with benign lesions, and patients with cancer is accurately reflected (35). Moreover, lectins, which bind to the glycans of glycoproteins, can be exploited to identify abnormal glycopatterns, which in turn would contribute to increasing the specificity of cancer diagnosis. In our study, the alterations in glycoproteins in BALF samples from 281 individuals were first systematically probed by lectin microarrays and lectin blotting analysis; then, those participants were randomly assigned to a training cohort and a validation cohort for the construction and verification of diagnostic models, respectively.
Moreover, an additional 120 newly collected BALF samples enrolled in the double-blind cohort were independently tested to examine the diagnostic accuracy of the diagnostic models. According to the results, statistical analysis identified 15 lectins (e.g., PHA-E, EEL, and BPL) that showed significant alterations in the BALF glycopatterns among BPD, ADC, SCC, and SCLC. Meanwhile, 14 lectins (e.g., PTL-II, LCA, and SJA) revealed noticeable alterations in the BALF glycopatterns between the control group and lung cancer patients at different stages, and the validation results of lectin blotting were generally consistent with the results from lectin microarrays. The findings indicated that the expression levels of Tn antigen and its derived structure T antigen, recognized by GSL-I, VVA, and DBA, were up-regulated in BALF in different subtypes and stages of lung cancer compared with BPD, and the level of VVA was significantly higher in ADC than in BPD and SCC. Tn antigen is one of the most specific tumor-associated carbohydrate structures; it is not normally expressed in peripheral tissues or blood cells but can promote tumor cell invasion (51). The expression of this antigen, found in most human carcinomas, results from a blockade of the normal O-glycosylation pathway, in which glycans extend from the common precursor GalNAcα1-O-Ser/Thr (Tn antigen) (52). Similar to our results, an earlier study using immunohistochemical staining found that T and Tn antigens were detected at a higher frequency in ADC than in SCC (53). Moreover, the present study also observed that the expression levels of GSL-I and VVA gradually increased with the stage of lung cancer, which may reflect the tumor burden and may be related to the poor prognosis of pulmonary disease. Our previous study also demonstrated that the level of T antigen in serum was increased in patients with stage III and stage IV ADC compared with levels in healthy controls (40).
Based on the above results, the differential expression levels of T and Tn antigens may have potential as biomarkers that not only recognize lung cancer and distinguish the histological subtypes of NSCLC but also serve as prognostic indicators for lung cancer.
In mammals, core fucosylation, a typical terminal modification of proteins, is the addition of α1-6-linked fucose to the innermost GlcNAc residue of N-glycans, which is catalyzed only by fucosyltransferase 8 (FUT8) (54). Studies have frequently reported that FUT8 is highly expressed in many malignant diseases, such as lung, breast, and colorectal carcinomas, but that it is negatively correlated with the development of gastric cancer (55-58). Hirao et al. (42) performed lectin microarray analysis of lung cancer tissues and cell lines and identified AAL as a lectin probe specific to NSCLC. In line with these findings, the level of core fucosylation recognized by AAL was elevated in different types and stages of lung cancer compared with BPD in this study. Of particular relevance to our research, the binding of Gal and GalNAc glycans recognized by BPL, PNA, MPL, SJA, WFA, WGA, and PWM deserves attention, as it may help with the pathological typing and early diagnosis of lung cancer. Among these lectins, the expression of BPL decreased significantly in ADC compared with SCC. The expression of PNA and MPL decreased significantly in SCC compared with expression in ADC and SCLC. In addition, the expression of SJA and WFA was down-regulated in LC-ES in comparison with LC-AS, whereas the expression of WGA and PWM was up-regulated in LC-ES and showed the same trend as in BPD. This finding suggests that the glycans recognized by these four lectins may provide important information for the early diagnosis of lung cancer. Furthermore, the expression levels of Gal and GalNAc structures recognized by RCA120, EEL, ECA, PTL-II, and PHA-E, as well as of the sialylated structures bound by MAL-II, were significantly decreased in different types and stages of lung cancer, while the expression levels of DSA and GSL-II increased significantly in lung cancer compared with levels in BPD, especially in ADC. 
In summary, our results suggest that MAL-II, RCA120, ECA, HHL, DBA, DSA, and AAL have the potential to become biomarkers for the diagnosis of lung cancer. Moreover, ADC and SCC may be distinguished by PHA-E, EEL, BPL, GSL-II, and VVA, and PNA and MPL may be used to distinguish SCC from SCLC. Notably, PTL-II, LCA, SJA, WFA, WGA, and PWM are expected to be valuable biomarkers for the early diagnosis of lung cancer.
Furthermore, we constructed five diagnostic models (Model LC, Model ADC, Model SCC, Model SCLC, and Model LC-ES) based on BALF glycopattern abundances for the differential diagnosis between benign and malignant lung diseases, as well as for the classification and staging of lung cancer. The discriminating performance of all models was better than that of single lectins. Model LC, Model SCLC, and Model LC-ES achieved the desired diagnostic power, with AUC values greater than 0.700 (p < 0.01) for the diagnosis of lung cancer, SCLC, and LC-ES in both the training and validation cohorts. In addition, Model LC and Model LC-ES exhibited high accuracies of 0.917 and 0.864 in the double-blind cohort, respectively, which is clinically valuable for distinguishing benign from malignant pulmonary diseases and for the early diagnosis of lung cancer with stable and reliable BALF glycopattern biomarkers. However, the discrimination abilities of Model ADC, Model SCC, and Model SCLC were not as good as those of the above two diagnostic models. Therefore, the subtle differences in glycosylation underlying the different pathological types of lung cancer need to be further explored.
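For readers less familiar with the two metrics reported for the models, the following minimal sketch shows how an AUC and an accuracy can be computed from model scores and true labels. It is an illustration only, not the procedure used in this study: the scores and labels are made up, the 0.5 decision threshold is an assumption, and the function names `roc_auc` and `accuracy` are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly when a score at or above
    the threshold is called positive. The threshold is an assumed value."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


# Made-up example: 1 = lung cancer, 0 = benign pulmonary disease.
toy_scores = [0.91, 0.80, 0.65, 0.40, 0.55, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_auc = roc_auc(toy_scores, toy_labels)
toy_acc = accuracy(toy_scores, toy_labels, threshold=0.5)
print(f"AUC = {toy_auc:.4f}, accuracy = {toy_acc:.3f}")
```

Because the AUC is threshold-free while accuracy depends on the chosen cutoff, the two metrics can rank models differently, which is one reason both are worth reporting.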
This study still has some limitations. First, the clinical sample size of patients with LC-ES was relatively small. Second, our research did not address the detailed molecular mechanisms that cause aberrant glycosylation during the progression of lung cancer. Further investigations in larger cohorts are required to assess the clinical application potential of these BALF glycopattern biomarkers in diagnosing lung cancer and possibly in distinguishing it from other cancers. We will also focus on the glycosylation pathways related to the development of lung cancer and intend to elucidate the correlation between abnormal glycosylation alterations and malignant biological behaviors.
In conclusion, our current study systematically explored the lung cancer-related changes in BALF glycosylation and detected differentially expressed glycoproteins among patients with BPD, ADC, SCC, SCLC, LC-ES, and LC-AS by lectin microarrays and blotting analysis, which indicated that different combinations of lectins can be used to detect the type of lung cancer and even its pathological stage. Further, five diagnostic models with better discrimination were constructed to distinguish different types and stages of lung cancer, and Model LC and Model LC-ES achieved accuracies greater than 0.850 in the double-blind test, which may contribute to distinguishing benign from malignant pulmonary diseases and to diagnosing lung cancer at an early stage. This study provides insight into the discovery of promising biomarkers for the diagnosis of lung cancer based primarily on the precise alterations in BALF glycopatterns.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee of the First Affiliated Hospital of Xi'an Jiao Tong University in Xi'an, China. The patients/participants provided their written informed consent to participate in this study.