Identification of Novel Biomarkers for Pre-diabetic Diagnosis Using a Combinational Approach

Reliable protein markers for pre-diabetes in humans are not clinically available. In order to identify novel and reliable protein markers for pre-diabetes in humans, healthy volunteers and patients diagnosed with pre-diabetes and stroke were recruited for blood collection. Blood samples were collected from healthy and pre-diabetic subjects 12 h after fasting. BMI was calculated from body weight and height. Fasting blood glucose (FBG), glycated hemoglobin (HbA1C), triglyceride (TG), total cholesterol, high-density lipoprotein, low-density lipoprotein (LDL), insulin and albumin were assayed by automated clinical laboratory methods. We used a quantitative proteomics approach to identify 1074 proteins from the sera of pre-diabetic and healthy subjects. Among them, 500 proteins were then selected using Mascot analysis scores. Further, 70 out of 500 proteins were selected via volcano plot analysis according to their statistical significance and average relative protein ratio. Eventually, 7 serum proteins were singled out as candidate markers for pre-diabetes due to their diabetic relevance and statistical significance. Immunoblotting data demonstrated that laminin subunit alpha 2 (LAMA2), mixed-lineage leukemia 4 (MLL4), and plexin domain containing 2 (PLXDC2) were expressed in pre-diabetic patients but not healthy volunteers. Receiver operating characteristic curve analysis indicated that the combination of the three proteins has greater diagnostic efficacy than any individual protein. Thus, LAMA2, MLL4 and PLXDC2 are novel and reliable serum protein markers for pre-diabetic diagnosis in humans.


INTRODUCTION
Diabetes is a metabolic disease characterized by hyperglycemia and defective carbohydrate metabolism resulting from insulin resistance/deficiency with β-cell dysfunction (1). In 2019, the International Diabetes Federation estimated that 9% of adults worldwide, 463 million people, lived with diabetes (2). Moreover, this disease caused 5 million deaths in the same year (2). Total health expenditure for diabetes and its complications was estimated to be 760 billion US dollars in 2019 (2). Of note, type 2 diabetes (T2D) accounted for over 90% of all diagnosed cases (2).
Despite advanced diagnosis and significant improvement in therapies, T2D remains an incurable disease. Accumulating evidence shows that early prevention and intervention can significantly reduce the incidence of T2D (3,4) and its complications (5), suggesting that a strategy of prevention may well be better than a cure for diabetes. Indeed, diet control, exercise, and bariatric surgery could prevent T2D in high-risk subjects (4,6,7). Consistently, prophylaxis with metformin also decreased the incidence of T2D (3). Therefore, identification of subjects at high risk for T2D before its clinical onset is the key to preventing this disease. To date, great efforts have been made to identify genetic and protein markers for future T2D (8-12). Although genetic markers have high reliability, they are not satisfactory because they show up across the whole lifespan of patients with T2D, not specifically at the onset of the disease or early in its progression, and sometimes have only modest sensitivity and specificity (13). On the other hand, protein markers have high sensitivity and specificity for T2D because they reflect the progression of the disease systemically and dynamically (13). Furthermore, proteins are tightly regulated by cellular stimulation, whereas genes are not (12). Thus, protein markers are practical and have potential for diabetic diagnosis.
T2D is an inflammatory disease primarily caused by obesity, insufficient physical activity and an unhealthy lifestyle (14). The role of inflammation in prevention and control of T2D is intriguing and not completely understood (14). Serum proteins related to inflammation have potential for use as diabetic markers because blood sampling is minimally invasive, feasible for any patient and poses minimal risk, and because serum proteins can reflect T2D pathophysiology. For example, cluster of differentiation 14 (CD14), a monocyte differentiation antigen, has been reported to modulate inflammation-driven insulin resistance and has been identified as an inflammatory marker in women with diabetes and impaired glucose tolerance (15,16). Serum amyloid A (SAA) proteins such as SAA2 and SAA4 are serum proteins that are upregulated during inflammation. SAA2 was reported to be increased in the plasma of people with obesity and insulin resistance. Similarly, SAA2 was also shown to be a marker of insulin resistance in mice (17). The SAA2 gene is adjacent to the SAA4 gene on chromosome 11. Read-through transcription was found to occur naturally, leading to the generation of the SAA2-SAA4 fusion protein (Gene ID: 100528017).
Additionally, some proteins unrelated to inflammation are also used as diabetic markers. For instance, cluster of differentiation 99 (CD99) and clusterin (CLU) have been patented as diabetic markers (3,10). CD99 was patented as a marker for insulin resistance (WO2006063733A1). Two isoforms of CD99 have been reported. A long form, CD99wt, contains 185 amino acids (32 kDa), whereas an alternatively spliced isoform, CD99sh, contains 161 amino acids (28 kDa) (18). CD99wt is expressed on thymocytes, pancreatic islet cells, peripheral T cells, endothelial cells, and hematopoietic stem cells (19,20). Moreover, it plays an essential role in many biological functions, including cell adhesion, migration, apoptosis, death, differentiation, intracellular membrane protein trafficking, endocytosis, and exocytosis (18,20). CLU has been reported to be a serum marker for T2D (US8673644B2). CLU is expressed ubiquitously in the cytosol of cells, plasma, and body fluids (21). It is a multifunctional protein involved in the regulation of proliferation, differentiation, and survival of cells including epithelial cells, smooth muscle cells and synoviocytes (22). CLU has several isoforms with molecular weights of 37 kDa (23), 49 kDa (21), and 60 kDa to 75 kDa (21). Some isoforms might have different glycosylation sites (21).
Although biomarkers for diabetic diagnosis may not be related to diabetes, we selected the serum proteins associated with inflammation and/or diabetes for further confirmation. In our study, we identified 7 proteins as potential markers. Four of them, CD14, SAA2, CD99 and CLU, were published diabetic markers. The other three, mixed-lineage leukemia 4 (MLL4), laminin subunit alpha 2 (LAMA2) and plexin domain containing 2 (PLXDC2), have not previously been described for diabetic diagnosis. MLL4 was reported to interact with transcription factors to regulate islet β-cell function (24). LAMA2 mutation was shown to cause merosin-deficient congenital muscular dystrophy (25), and PLXDC2 is known to regulate differentiation and proliferation during the development of the nervous system (26). These three proteins may be related to diabetes and its complications.
Proteomics approaches are emerging as a straightforward strategy to identify biomarkers for disease diagnosis. Mass spectrometry (MS)-based quantitative proteomics analysis is a highly sensitive technique to measure biomolecules at the femtomolar level as exemplified in cancer and cardiovascular diseases (27,28). The isobaric tags for relative and absolute quantitation (iTRAQ) method can further label proteins in an isobaric manner and assist in simultaneously quantifying the amount of proteins from different sources. Over recent years, many publications have used proteomics approaches to identify the markers for T2D or its complications in human patients and/or diabetic mice (29-38). However, markers for pre-diabetes or before the onset of diabetes have not been identified.
In this study, we combined iTRAQ with MS techniques to globally characterize serum proteins of human and mouse origin at the pre-diabetic stage. Statistical significance, expression ratio, functional analysis, and molecular novelty reduced the number of human serum proteins from 1074 to 7. Ingenuity pathway analysis (IPA) was used to predict the likely interaction network and pathways of these 7 proteins. Furthermore, immunoblotting analysis and receiver operating characteristic (ROC) curve analysis were used to assess the serum expression level and diagnostic power, respectively, of these proteins in healthy and pre-diabetic subjects.

Human Serum Sample Collection
All participants signed informed consent approved by the IRB/REC of the China Medical University hospital (IRB No. CMUH105REC2001). Healthy volunteers (n=20) and patients diagnosed with pre-diabetes (n=19) and stroke (n=10) were recruited at the China Medical University hospital for blood collection. There were 23 females and 26 males, all of whom were Taiwanese Han. The age range was from 23 to 69 years old. Blood samples were collected from healthy and pre-diabetic subjects after 12 h of fasting. The serum samples were separated from whole blood, aliquoted to avoid repeated freeze-thaw cycles and then stored at -80°C. Body mass index (BMI) was calculated from body weight and height. Fasting blood glucose (FBG), glycated hemoglobin (HbA1C), triglyceride (TG), total cholesterol, high-density lipoprotein, low-density lipoprotein (LDL), insulin and albumin were assayed by automated clinical laboratory methods.

Depletion of Abundant Proteins
To improve the detection and identification of low-abundance proteins, the ProteoPrep Immunoaffinity Albumin and IgG Depletion Kit from Sigma-Aldrich was used for high-abundance protein depletion from serum samples according to the manufacturer's protocol. The protein concentration was determined using the BCA protein assay kit from Thermo Fisher Scientific.

Protein Digestion and iTRAQ Labeling
An equal amount of total protein (100 µg) per depleted sample was diluted with 0.5 M TEAB, reduced with 5 mM TCEP at 60°C for 1 h, alkylated using 10 mM MMTS at room temperature for 10 minutes and then digested with 10 µg trypsin (Promega, WI, USA) at 37°C for 16 h. Subsequently, each sample from humans was labeled with a different iTRAQ tag (Applied Biosystems, MA, USA) as follows: iTRAQ-114 was used to label the pooled serum of 3 healthy volunteers, and iTRAQ-115, 116 and 117 were used to label the sera of three individual pre-diabetic subjects, respectively. The four samples from humans were combined, dried by SpeedVac, dissolved in 200 µl of 5% ACN solution containing 0.5% TFA, and desalted using a C18 spin column (Thermo Fisher Scientific). After drying with SpeedVac again, each sample was dissolved in 400 µl of 25% ACN solution containing formic acid (FA).
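The labeling scheme above implies that each pre-diabetic channel (115, 116, 117) is quantified against the pooled healthy reference (114). The following is a minimal Python sketch of that arithmetic, assuming reporter-ion intensities have already been summed per protein. The function name `relative_ratios` and the intensity values are hypothetical and only illustrate the calculation; they are not taken from the study's data or software.

```python
import math

REFERENCE = 114             # pooled healthy serum channel
SAMPLES = (115, 116, 117)   # individual pre-diabetic serum channels

def relative_ratios(intensities):
    """Return per-channel ratios to the reference, their mean, and log2(mean)."""
    ref = intensities[REFERENCE]
    if ref <= 0:
        raise ValueError("reference channel intensity must be positive")
    ratios = {ch: intensities[ch] / ref for ch in SAMPLES}
    mean_ratio = sum(ratios.values()) / len(ratios)
    return ratios, mean_ratio, math.log2(mean_ratio)

# Hypothetical reporter-ion intensities for one protein.
example = {114: 1000.0, 115: 2000.0, 116: 1500.0, 117: 2500.0}
ratios, mean_ratio, log2_mean = relative_ratios(example)
```

In this illustration the protein would be reported as upregulated on average, since the mean ratio to the healthy pool is above 1.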

Fractionation With Strong Cation Exchange Chromatography
The iTRAQ-labeled samples were fractionated separately via strong cation exchange chromatography using a PolySULFOETHYL A column [2.1 x 200 mm, 5 µm particle size (PolyLC, MD, USA)] with a flow rate of 0.3 ml/min and mobile phase (A) 10 mM KH2PO4 in 25% ACN (pH 3) and (B) 1 M KCl and 10 mM KH2PO4 in 25% ACN (pH 3). The gradient of fractionation was set as follows: 0% B for 5 minutes, 0-20% B for 55 minutes, 20-60% B for 10 minutes, 60% B for 10 minutes and 60-0% B for 20 minutes. The fractions were collected and dried with SpeedVac.
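The gradient program above can be written down as a small table of segments. The sketch below, in Python, encodes the five segments from the text and interpolates the percentage of mobile phase B at a given time. Linear ramps within each segment are an assumption (the text gives only the start and end values), and the names `SCX_GRADIENT` and `percent_b` are hypothetical.

```python
# (duration in minutes, start %B, end %B) for each segment, as listed in the text
SCX_GRADIENT = [
    (5, 0, 0),
    (55, 0, 20),
    (10, 20, 60),
    (10, 60, 60),
    (20, 60, 0),
]

def percent_b(t_min, segments=SCX_GRADIENT):
    """Linearly interpolate %B at t_min minutes after the start of the run."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")

# Total run time implied by the segment durations.
total_runtime = sum(duration for duration, _, _ in SCX_GRADIENT)
```

Summing the segment durations is a quick consistency check on a transcribed gradient before it is entered into an instrument method.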

LC-MS/MS Analysis and iTRAQ Data Analysis
The dried fractions were dissolved in 200 µl of 5% ACN solution containing 0.5% TFA and desalted using a C18 spin column. Following drying with SpeedVac, the pellet was dissolved in 40 µl of 5% ACN solution containing 0.1% FA for LC-MS/MS analysis. A Q Exactive mass spectrometer (Thermo Fisher Scientific) operated in HCD fragmentation mode was used to generate MS and MS/MS spectra. An Ultimate 3000 RSLC system (Thermo Fisher Scientific) equipped with a C18 column (Acclaim PepMap RSLC, 75 µm x 150 mm, 2 µm, 100 Å) was used for LC separation with a flow rate of 0.25 µl/min and the mobile phase (A) 0.1% FA and (B) 95% ACN/0.1% FA. The gradient of analysis was as follows: 1% B for 5 minutes, 1-25% B for 25 minutes, 25-60% B for 15 minutes, 60-80% B for 5 minutes, 80% B for 10 minutes, 80-99% B for 5 minutes and 99% B for 5 minutes. Relative protein ratios and peptide identifications were processed by Proteome Discoverer 1.4 with a Mascot database search. All tandem mass spectra were searched for species of Homo sapiens and Mus musculus against the International Protein Index human and mouse V 3.87 database. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD017472 (http://proteomecentral.proteomexchange.org/cgi/GetDataset).

Protein Signaling Pathways and Functional Analysis
Functions and signaling pathways of serum proteins with differential expression between the healthy and pre-diabetic humans were analyzed by IPA (Ingenuity Systems, at http://www.ingenuity.com) and PubMed (at https://www.ncbi.nlm.nih.gov/pubmed).

Immunoblotting Analysis
Serum samples were collected from healthy and pre-diabetic human subjects and then lysed with RIPA lysis buffer. Total protein (20 µg) of each serum from control and pre-diabetic human subjects and mice was resolved by 6% and 10% sodium dodecyl sulfate polyacrylamide gel electrophoresis, transferred onto nitrocellulose membrane (Schleicher and Schuell, NH, USA), and immunoblotted with antibodies against LAMA2 (1:500, LifeSpan BioSciences, WA, USA), MLL4 (1:500, Abcam, Cambs., UK), PLXDC2 (1:500, Novus Biologicals, CO, USA), CD99 (1:200, Invitrogen, MA, USA), CLU (1:500, Novus Biologicals), CD14 (1:1000, Abcam), SAA2 (1:1000, Abcam) and horseradish peroxidase (HRP)-conjugated goat and rabbit anti-mouse IgG as secondary antibody. The membranes were detected using the FluorChem HD2 system (Bio-Techne, MN, USA) after developing with enhanced chemiluminescence (ECL) substrate (EMD Millipore, MA, USA).
Diagnostic performance was analyzed as published previously (39,40). Briefly, the accuracy, sensitivity and specificity were calculated from the number of controls or patients divided by the total number of subjects. ROC curves were used to represent the sensitivity and specificity for (pre-)diabetic diagnosis at different cut-off values. The cut-off values of biomarkers for diagnosis were based on the best Youden index. A p value less than 0.05 was considered statistically significant.
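The cut-off selection described above can be illustrated with a short Python sketch. It assumes the standard definitions (sensitivity = true positives / patients, specificity = true negatives / controls, accuracy = correct calls / all subjects, Youden index J = sensitivity + specificity - 1) and treats a signal at or above the cut-off as a positive call. The band-intensity values and function names are hypothetical; this is not the authors' analysis code.

```python
def confusion_metrics(healthy, patients, cutoff):
    """Return (sensitivity, specificity, accuracy); signal >= cutoff is positive."""
    tp = sum(1 for x in patients if x >= cutoff)
    tn = sum(1 for x in healthy if x < cutoff)
    sensitivity = tp / len(patients)
    specificity = tn / len(healthy)
    accuracy = (tp + tn) / (len(patients) + len(healthy))
    return sensitivity, specificity, accuracy

def best_youden_cutoff(healthy, patients):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1."""
    best = None
    # Every observed value is tried as a candidate cut-off.
    for cutoff in sorted(set(healthy) | set(patients)):
        sens, spec, _ = confusion_metrics(healthy, patients, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best

# Hypothetical normalised band intensities.
healthy_signal = [0.0, 0.1, 0.0, 0.2, 0.1]
patient_signal = [0.9, 1.4, 0.1, 2.0, 1.1]
cutoff, youden_j = best_youden_cutoff(healthy_signal, patient_signal)
sens, spec, acc = confusion_metrics(healthy_signal, patient_signal, cutoff)
```

When several cut-offs tie on J, this sketch keeps the lowest one; a real analysis would state its tie-breaking rule explicitly.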

Enzyme-Linked Immunosorbent Assay Analysis
Human LAMA2 ELISA kits were purchased from Cusabio (CSB EL012726HU, TX, US). One hundred microliters of sera from healthy volunteers and patients with pre-diabetes were added to wells of a 96-well microplate. Following incubation for 2 h at 25°C, the plate was incubated with biotin-antibody and peroxidase-conjugated avidin. After extensive washing, the plate was incubated with TMB and measured for absorbance at 450 nm using an ELISA reader.

Statistical Analysis
The data are expressed as mean ± standard deviation of the mean. Student's t-test was used to compare the difference between healthy volunteers and patients. A p value less than 0.05 was considered statistically significant. False discovery rate (FDR) was used to adjust proteomic p values for multiple comparisons.
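The text does not state which FDR procedure was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg adjustment in plain Python; that choice, the function name `benjamini_hochberg`, and the example p values are all assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

A protein would then be called significant only if its adjusted value, rather than its raw p value, falls below the chosen threshold.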

Proteins Differentially Expressed in Healthy and Pre-diabetic Sera of Human and Mouse Origin
To characterize novel and reliable markers for pre-diabetes, we first used a combination of iTRAQ and MS techniques to analyze the serum proteins of healthy (diabetes-free) subjects and pre-diabetic subjects as described (Discovery, Figure 1). BMI and serum biochemistry of the human subjects were analyzed (Supplementary Table 1) (41). We found that age, FBG, HbA1C, fasting insulin, and albumin were significantly different (p ≤ 0.05) between the healthy and pre-diabetic groups (Supplementary Table 1) (41). Serum samples from both groups were collected and their abundant proteins were then depleted, followed by trypsin digestion. Next, three serum samples from healthy subjects were pooled together to minimize individual variability and then labeled with iTRAQ 114. Three serum samples from pre-diabetic patients were labeled with iTRAQ 115, 116 and 117, respectively. Finally, the four iTRAQ samples were mixed and analyzed by LC-MS/MS (Figure 1). The identity of the serum proteins from healthy and pre-diabetic subjects was confirmed using the Mascot software (Figure 1). A total of 1074 human serum proteins were identified (ProteomeXchange, PXD017472). Five hundred proteins were selected based on the following criteria: peptide Mascot score > 12 and number of unique peptide matches ≥ 2 (Figure 1).
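The two identification criteria above amount to a simple filter. The Python sketch below shows one way to express it; the `ProteinHit` record, its field names, and the accession strings are hypothetical stand-ins for whatever the search software exports.

```python
from dataclasses import dataclass

@dataclass
class ProteinHit:
    accession: str
    mascot_score: float
    unique_peptides: int

def passes_identification_filter(hit, min_score=12.0, min_unique=2):
    """Keep hits with Mascot score > min_score and unique peptides >= min_unique."""
    return hit.mascot_score > min_score and hit.unique_peptides >= min_unique

# Hypothetical search results.
hits = [
    ProteinHit("HYPOTHETICAL_A", 45.2, 3),
    ProteinHit("HYPOTHETICAL_B", 12.0, 4),   # score is not strictly above 12
    ProteinHit("HYPOTHETICAL_C", 30.1, 1),   # only one unique peptide
]
kept = [h.accession for h in hits if passes_identification_filter(h)]
```

Note that the score threshold is strict (>) while the peptide-count threshold is inclusive (≥), matching the criteria as written.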
To further evaluate the potential of serum proteins as pre-diabetic markers, volcano plot analysis of 500 human serum proteins was performed based on both the average relative protein ratio and the p value (Figure 2). The transformed volcano plot data indicated that among the 500 proteins, 70 proteins with p < 0.05 could be candidate markers for pre-diabetes (Figure 2) and need to be verified. In parallel, we followed the same approach to characterize the serum proteins of diabetes-free and pre-diabetic db/db mice in an attempt to compare the markers in humans and mice (Supplementary Figure 1) (41). Body weight (BW) and serum biochemistry are shown (Supplementary Table 2) (41). Healthy and pre-diabetic mice were grouped based on their FBG. Accordingly, we found that BW, FBG, HbA1C, TRIG, LDL, and fasting insulin were significantly different in both groups (Supplementary Table 2

Gene Ontology and Pathway Analysis of the Selected Serum Proteins
To gain information about the biological function of the 70 human proteins (Figure 2) that had been selected using statistical significance (p < 0.05) and average relative protein ratio, these proteins were analyzed by gene ontology and by searching PubMed (Supplementary Figure 2) (41). The proteins could be classified into 6 functional categories: diabetes, diabetic complications, obesity, inflammatory immunity, coagulation and others (Supplementary Figure 2) (41).
Next, we narrowed down the number of candidate markers by selecting proteins with lower p values (< 0.01) and functions associated with diabetes, diabetic complications and obesity. Seven proteins, LAMA2, MLL4, PLXDC2, CD14, CLU, CD99 and SAA2/4, stood out when these stringent selection criteria were used (Figure 2 and Supplementary Table 4). This discovery strategy for potential markers of (pre-)diabetes uncovered several markers already known to be associated with diabetes. For example, CD99 (3,10) and CLU (3,10) have been patented as diabetic markers. SAA2 (17) has been reported to increase in the plasma of obese and insulin-resistant humans and was a marker of insulin resistance in mice. CD14 (15,16) has been reported to modulate inflammation-driven insulin resistance and was identified as an inflammatory marker in women with diabetes and impaired glucose tolerance. Thus, CD99, CLU, CD14, and SAA2 are human serum proteins that have already been reported to be associated with diabetes. The data suggest the feasibility of the identification strategy using a combination of quantitative proteomics, statistical analysis, and functional analysis to seek protein markers for diabetes. However, whether LAMA2, MLL4 and PLXDC2 can be applied as (pre-)diabetic markers remains unclear.
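The statistical part of this two-stage selection (p < 0.05 for the volcano plot, p < 0.01 for the shortlist) can be sketched in Python as follows. The protein names and values are illustrative only, the function name `classify` is hypothetical, and the functional-relevance criterion, which relied on literature review, is deliberately not modeled.

```python
import math

def classify(mean_ratio, p_value):
    """Label a protein by direction and significance tier (* p<0.05, ** p<0.01)."""
    if p_value >= 0.05:
        return "not significant"
    # A ratio of exactly 1 (log2 = 0) is treated as "down" in this sketch.
    direction = "up" if math.log2(mean_ratio) > 0 else "down"
    stars = "**" if p_value < 0.01 else "*"
    return f"{direction}{stars}"

# (mean ratio pre-diabetic / healthy, p value); illustrative values only
proteins = {
    "protein_1": (2.7, 0.004),
    "protein_2": (0.7, 0.030),
    "protein_3": (1.1, 0.400),
}
labels = {name: classify(r, p) for name, (r, p) in proteins.items()}
# Only the ** tier moves forward to functional review.
shortlist = [name for name, label in labels.items() if label.endswith("**")]
```

In practice the shortlist from this step would still need the manual check against diabetes, diabetic complications and obesity described in the text.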
To better understand the biological meaning of the changes in the levels of these proteins before and during T2D, web-based IPA (42) and PubMed database searches were used to predict the protein signaling pathways. IPA generated a network of a total of 35 proteins related to connective tissue disorders, dermatological diseases and conditions, and developmental disorders (Figure 3). The putative signaling pathways need to be confirmed by further experiments.
FIGURE 1 | A flow chart indicating the experimental designs for identification of serum proteins of human origin, followed by confirmation of their presence in human sera. Serum samples were collected from 3 healthy volunteers and 3 pre-diabetic patients after fasting for 12 h. The serum from the 3 healthy subjects was pooled together and labeled with iTRAQ 114. The serum from 3 pre-diabetic subjects was labeled with iTRAQ 115, 116 and 117, respectively. In the first step (Discovery stage), the iTRAQ-labeled sera were pooled, depleted of serum abundant proteins, and subjected to MS analysis. As a result, 1074 serum proteins were identified. Seven serum proteins were selected from 1074 proteins by using average relative protein ratio, P value and novelty. In the second step (Confirmation stage), immunoblotting analysis showed that 3 out of the 7 serum proteins are promising markers for (pre-)diabetes.

Confirmation of the Selected Proteins as Potential Markers for Pre-diabetes
To confirm the feasibility of using the 7 human serum proteins (LAMA2, MLL4, PLXDC2, CD14, CLU, CD99 and SAA2) as pre-diabetic markers, we conducted immunoblotting analysis. The immunoblotting data showed that LAMA2, MLL4 and PLXDC2 were undetectable in the sera of 5 healthy volunteers and that their expression was highly upregulated in pre-diabetic patients, although their levels varied (LAMA2, MLL4 and PLXDC2, Figure 4). Additionally, we examined the expression level of CD14, CLU, CD99, and SAA2 in the sera of healthy subjects and pre-diabetic patients. No significant difference in the serum level of CLU, CD99, CD14 and SAA2 was observed between the two human cohorts (Figure 4). Moreover, we tested the specificity of the 7 selected proteins in the sera of stroke patients. The immunoblotting showed that LAMA2, MLL4, PLXDC2 and CD99 were not detected in the sera of 10 stroke patients (Supplementary Figure 3) (41). However, we found that CD14, CLU and SAA2 were expressed in the sera of stroke patients at various levels (Supplementary Figure 3).
FIGURE 2 | Transformed volcano plot analysis of the selected proteins from human sera. Total proteins from human sera were labeled with iTRAQ tags, followed by MS analysis and Mascot identification. A total of 1074 human serum proteins were identified (ProteomeXchange, PXD017472). Five hundred proteins were selected based on a peptide Mascot score > 12 and number of unique peptide matches ≥ 2 (Figure 1). A transformed volcano plot was used to analyze log2 (ratio of the level of one serum protein in a pre-diabetic patient to its average level in healthy subjects). The serum proteins detected in humans are labeled with circles. Upregulated and downregulated proteins with p < 0.05 (*) are labeled in red and green respectively. The proteins with p < 0.01 (**) that were associated with diabetes, diabetic complications and obesity were selected for further analysis using IPA. Student's t-test was used to compare the differences between healthy and pre-diabetic humans.
The ROC curve was used to illustrate the diagnostic efficacy of serum LAMA2, MLL4, PLXDC2, CD14, CLU, CD99 and SAA2 as their discrimination thresholds were varied. The AUC of the ROC curves was used to evaluate the diagnostic value of each protein. The AUC of LAMA2, MLL4, PLXDC2, CD14, CLU, CD99, and SAA2 was 0.9, 1, 1, 0.9, 0.5, 0.7, and 0.5, respectively (Figure 5 and Table 1). LAMA2, MLL4 and PLXDC2, alone and in combination, had a higher AUC than the currently used markers FBG and HbA1C, alone or together (Table 1). The sensitivity, specificity and accuracy of LAMA2 were 80%, 100% and 100%, respectively (Table 1). The sensitivity, specificity, and accuracy were all 100% for both MLL4 and PLXDC2 (Table 1). In contrast, CLU, CD99 and SAA2 had the same sensitivity (100%), specificity (0%), and accuracy (50%) as CD14 (Table 1). Of note, LAMA2, MLL4 and PLXDC2 had better specificity and accuracy than the others (Table 1). Furthermore, the sensitivity, specificity, and accuracy of LAMA2, MLL4 and PLXDC2, alone and in combination, were better than those of FBG, HbA1C and both together (Table 1). Taken together, the three novel markers, alone and in combination, had better diagnostic value than the currently existing markers FBG and HbA1C. To further validate the clinical potential of the above novel biomarkers, we measured the level of LAMA2 in the sera of healthy volunteers (n=12) versus pre-diabetic patients (n=11). We found that patients with pre-diabetes had a 4-fold higher serum LAMA2 level than healthy subjects (Table 2).
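For readers who want to see how an AUC can be obtained from two small groups, the Python sketch below uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen patient has a higher signal than a randomly chosen healthy subject, with ties counted as one half. The signal values are hypothetical, and combining markers by summing their signals is an assumption made purely for illustration; the text does not specify how the combination was computed.

```python
def auc_rank(healthy, patients):
    """AUC as P(patient signal > healthy signal), ties counted as 0.5."""
    wins = 0.0
    for p in patients:
        for h in healthy:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(patients) * len(healthy))

# Hypothetical signals: markers undetectable (0.0) in all healthy subjects.
healthy = [0.0, 0.0, 0.0, 0.0, 0.0]
marker_a = [1.2, 0.8, 0.0, 2.1, 1.5]   # one patient with no detectable signal
marker_b = [0.4, 0.9, 0.7, 0.3, 1.1]
auc_a = auc_rank(healthy, marker_a)
auc_b = auc_rank(healthy, marker_b)
# Assumed combination rule: sum the two marker signals per patient.
combined = [a + b for a, b in zip(marker_a, marker_b)]
auc_combined = auc_rank(healthy, combined)
```

With groups this small, a single patient without detectable signal moves the AUC noticeably, which is one reason larger cohorts are needed before such values are interpreted clinically.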
Overall, the data suggest that a combination of iTRAQ and MS techniques is able to identify serum proteins as potential markers for (pre-)diabetes. LAMA2, MLL4 and PLXDC2 may be suitable diagnostic markers for (pre-)diabetes (Table 1). According to our results, among these proteins, MLL4 and PLXDC2 are the most promising potential markers for diagnosis because of their higher AUC (Table 1).

DISCUSSION
Mass spectrometry (MS) is an essential methodology in proteomics for basic research and applications. For instance, it can be used for identification of functional molecules and pathways for mechanistic study and for disease diagnosis (43). The iTRAQ labeling of multiple samples is used to measure the relative level of differential proteins simultaneously (43). Therefore, the iTRAQ-LC-MS/MS technique is widely used for quantitative proteomics because this technique saves time and has fewer experimental steps than conventional proteomics (43). A pool of three control serum samples was used to compare three individual sera of patients with pre-diabetes because this pooling could reduce control variability. However, a caveat of this pooling was neglect of variability of three individual control samples.
To date, several markers and tests have been developed for diabetic diagnosis, including FBG, HbA1C, fructosamine, glycated albumin (GA) and the oral glucose tolerance test (OGTT) (44). Moreover, genetic markers have also been proposed for diabetic diagnosis (13). However, such markers are not suitable for the diagnosis of pre-diabetes or diagnosis prior to the onset of diabetes because their expression is not proportional to the progression of the disease. Furthermore, their specificity and sensitivity may not be satisfactory (13,44). Hence, this work demonstrated the feasibility of identifying and using protein markers with high specificity and sensitivity for pre-diabetic diagnosis. That is to say, up- and down-regulation of the protein markers preceded the symptoms of diabetes. Here, we identified 7 candidate proteins, LAMA2, MLL4, PLXDC2, CD14, CLU, CD99 and SAA2, from 1074 human serum proteins using quantitative proteomics, statistics and pathway analyses. After immunoblotting analysis of healthy subjects as well as patients with pre-diabetes and brain stroke, LAMA2, MLL4, and PLXDC2 emerged as potential (pre-)diabetic markers. The use of these markers has some advantages over traditional markers such as blood glucose and HbA1C (Figures 4 and 5). First, LAMA2, MLL4 and PLXDC2 are novel diagnostic markers for pre-diabetes. Second, the 3 serum proteins are undetectable in healthy subjects and largely increased in pre-diabetic subjects (Figure 4). Third, these markers have certain specificity for pre-diabetes. Fourth, they have higher sensitivity and accuracy than the known markers, CD99 and CLU. Moreover, their specificity is 0-20% higher than that of FBG and HbA1C and their sensitivity is 20-40% higher than that of FBG and HbA1C (44). Thus, these 3 markers have great potential to detect pre-diabetes. In the future, a large number of clinical specimens from healthy and (pre-)diabetic subjects need to be tested to verify the clinical use of these markers alone and in combination with other markers. In addition, whether or not MLL4, LAMA2, and PLXDC2 can be used as therapeutic molecules needs to be addressed.
FIGURE 4 | Immunoblotting analysis of LAMA2, MLL4, PLXDC2, CD14, CLU, CD99, and SAA2 in healthy and pre-diabetic sera of human origin. Serum samples of healthy and pre-diabetic subjects were collected and then lysed with lysis buffer. After centrifugation, total lysates were prepared for immunoblotting analysis with the antibodies as indicated. Transferrin (TF) was used as an internal control of human sera.
FIGURE 5 | Diagnostic efficacy and value of LAMA2, MLL4, PLXDC2, CD14, CLU, CD99 and SAA2 in healthy and pre-diabetic sera of human origin. After immunoblotting analysis, diagnosis efficacy was analyzed using a ROC curve. Sensitivity, specificity, and accuracy were evaluated for diagnostic value.
The parameters of healthy volunteers and pre-diabetic patients are indicated as mean ± standard error. The parameters with significant change (p ≤ 0.05) between the healthy and pre-diabetic subjects are indicated with asterisk(s).
LAMA2, a basement membrane protein, plays an important role in muscle function (45). Deficiency of LAMA2 is associated with muscular dystrophy and demyelinating neuropathy (46). Moreover, abnormal skeletal muscle metabolic function causes insulin resistance in T2D patients (47). People in the late stage of diabetes have several diabetic complications including neuropathy, nephropathy and retinopathy (48). Furthermore, it has been reported that transient neonatal diabetes mellitus (TNDM) is typically caused by imprinting aberrations in the chromosomal region where LAMA2 is located (49). In our study, the LAMA2 level in pre-diabetic patients was higher than in healthy subjects, implying that LAMA2 could be a promising marker in (pre-)diabetes and its complications. MLL4 plays a central role in transcription activation (50). Some evidence has shown that MLL4 is associated with cancer regulation (50). Furthermore, MLL4 was reported to bind to the MAFA and MAFB transcription factors to regulate islet β-cell function (24). The pre-diabetic patients had a higher MLL4 level than the healthy group. Therefore, MLL4 might be a good candidate for (pre-)diabetic diagnosis. PLXDC2 is a transmembrane protein related to the developing central nervous system (51). It has been reported that a genetic variant near the PLXDC2 gene has an impact on the risk of primary open-angle glaucoma by increasing intraocular pressure in the Japanese population (52).
Moreover, the PLXDC2 level in the serum of pre-diabetic patients was higher than in the healthy group. Thus, the results suggest that PLXDC2 may be a potential marker in (pre-)diabetes and its complications. Although CD14, CLU, CD99, and SAA2 have been reported to be protein markers for insulin resistance, obesity, and inflammation, our data showed that they only had a modest difference in expression level between healthy and pre-diabetic sera. Overall, the data also illustrate the usefulness of the strategy of identifying human serum proteins as (pre-)diabetic markers. Although LAMA2 and CLU were commonly increased in pre-diabetic patients and mice, expression changes of MLL4, PLXDC2, CD14, CD99, and SAA2 were found in human sera, but not in their mouse counterparts (Supplementary Tables 4 and 5). Moreover, the seven mouse serum proteins selected as biomarkers were not statistically significant because their FDR-adjusted p values were over 0.05 (Supplementary Table 5). The data revealed that mouse serum proteins were less useful for human clinical prediction than human serum proteins. The data suggest the possible unreliability of using diabetic mouse models to seek the diagnostic markers for human (pre-)diabetes ( Figure 1; Supplementary Figure 1; Supplementary  . We did not test the 13 serum proteins that were down-regulated in pre-diabetic patients for two reasons: (1) the 13 proteins are functionally irrelevant to diabetes; and (2) the relative expression changes of the 13 proteins (0.6- to 0.8-fold decrease) are smaller than those of the top 7 serum proteins (1.3- to 2.7-fold increase). Western blotting analysis and ELISA assays confirmed that LAMA2, MLL4 and/or PLXDC2 can be applied for (pre-)diabetic diagnosis (Figure 4 and Table 2). LAMA2, MLL4 and PLXDC2 have higher sensitivity, specificity and accuracy than FBG (Table 1). MLL4 and PLXDC2 have higher sensitivity, specificity and accuracy than HbA1C (Table 1).
However, HbA1C had lower sensitivity (35%) in pre-diabetes in a clinical trial (44). Therefore, the combination of these three novel proteins has potential for development of a novel and reliable method of (pre-)diabetic diagnosis. Furthermore, one advantage of using these (pre-)diabetic markers was that we could detect pre-diabetes in patients whose FBG (100-125 mg/dL) and HbA1C (5.7-6.5%) were relatively low and unstable compared to those during diabetes. The data support the superiority of our novel biomarkers in (pre-)diabetic diagnosis in comparison with the two traditional markers, FBG and HbA1C. Moreover, LAMA2, MLL4 and PLXDC2 may be worth investigating for their role in diabetes development.

DATA AVAILABILITY STATEMENT
The proteomics data is deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD017472.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the IRB/REC of the China Medical University hospital (IRB No. CMUH105REC2001). The patients/ participants provided their written informed consent to participate in this study. The animal study was reviewed and approved by the protocol of the Institutional Animal Care and Use Committee of Academia Sinica (Protocol no. 12-12-478). Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

ACKNOWLEDGMENTS
The authors thank the metabolomics and animal core facilities of ABRC and the Taiwan Mouse Clinic for their excellent technical assistance. They thank the Data Science Statistical Cooperation Center of Academia Sinica (AS-CFII-108-117) for statistical support. They also thank Ms. Miranda Loney for manuscript editing. All authors have read the journal's authorship agreement and the manuscript has been reviewed by and approved by all named authors.