Identification of Key LncRNAs and Pathways in Prediabetes and Type 2 Diabetes Mellitus for Hypertriglyceridemia Patients Based on Weighted Gene Co-Expression Network Analysis

Aims Prevalence of prediabetes and type 2 diabetes mellitus(T2DM) are increasing worldwide. Key lncRNAs were detected to provide a reference for searching potential biomarkers of prediabetes and T2DM in hypertriglyceridemia patients. Methods The study included 18 hypertriglyceridemia patients: 6 newly diagnosed type 2 diabetes patients, 6 samples with prediabetes and 6 samples with normal blood glucose. Weighted gene co-expression network analysis (WGCNA) was conducted to construct co‐expression network and obtain modules related to blood glucose, thus detecting key lncRNAs. Results The green, yellow and yellow module was significantly related to blood glucose in T2DM versus normal controls, T2DM versus prediabetes, prediabetes versus normal controls, respectively. ENST00000503273, ENST00000462720, ENST00000480633 and ENST00000485392 were detected as key lncRNAs for the above three groups, respectively. Conclusions For hypertriglyceridemia patients with different blood glucose levels, ENST00000503273, ENST00000462720 and ENST00000480633 could be potential biomarkers of T2DM.


INTRODUCTION
Hyperglycemia promotes a variety of reactions, including oxidative stress and the formation of advanced glycosylated end products, which have been associated with structural and functional changes in blood vessels that eventually cause dysfunction of several organs, especially the heart, nerves, eyes, and kidneys (1). Prevalence of prediabetes is increasing worldwide and more than 470 million people will have prediabetes by 2030 (2). Previous studies indicated that prediabetes was associated with an increased risk of coronary heart disease, stroke, and all-cause mortality (3). Meanwhile, it is estimated that the number of patients with diabetes were expected to increase to 693 million by 2045. Type 2 diabetes mellitus accounts for about 90%-95% of patients with diabetes (4). Compared with people who do not have diabetes, T2DM patients have a 15% increased risk of all-cause mortality (5). Specially, hypertriglyceridemia can reduce peripheral insulin sensitivity. Meanwhile, insulin resistance leads to the occurrence of hypertriglyceridemia. Therefore, a vicious circle is established (6). Furthermore, T2DM patients with hypertriglyceridemia usually have a higher prevalence of diabetic complications (7). Therefore, whether from a public health perspective or a clinical perspective, prediabetes and T2DM patients with high triglycerides should be paid more attention.
Current research is still searching for novel reliable non-invasive biomarkers to replace the invasive sampling methods for the diagnostic protocols (8,9). Meanwhile, healthy individuals release exosomes with a cargo of different RNA, DNA, and protein contents into the circulation. Therefore, the above molecules can be measured non-invasively as biomarkers of healthy and diseased states (10). Long noncoding RNAs (lncRNAs) represent a class of transcripts longer than 200 nucleotides with limited protein-coding potential (11). Increasing evidence indicated that lncRNAs could interfere with gene expressions and signaling pathways at various stages, play important roles in the regulation of tissue homeostasis and pathophysiological conditions (12). Existing research have indicated that lncRNA were involved in the entire prediabetes biological process (13). Therefore, as a biomarker, lncRNA plays an important role in the prediction and diagnosis of diseases. Li et al. reported that lncRNA ENST00000550337.1 could be a potential diagnostic biomarker for prediabetes (14). In addition, studies have shown that lncRNAs were related to T2DM via producing a complex regulatory network through interactions with transcription factors (15). The abnormal expression of lncRNA NONRATT021972 participates in the occurrence and progression of T2DM, in which it was a potential clinical biomarker (16). The emerging evidence indicated that the role of MALAT1 in diabetes complications is both pro-inflammatory and apoptosis in different cell types (17). However, conventional method only focuses on the role of the single gene, the external sample traits cannot be combined (18). Fortunately, weighted gene co-expression network analysis (WGCNA) could solve the problem.
WGCNA was a method of scale-free network analysis proposed by Peter Langfelder and Steve Horvath in 2008. WGCNA can be used for detecting modules of highly correlated genes, and summarizing such modules via the module eigengene or an intramodular hub gene, and relating modules to one another and to external sample traits (18). Meanwhile, WGCNA also alleviates the multiple testing problems inherent in microarray data analysis (19). In addition, WGCNA focused on the whole genome information to overview of the signature of gene networks in phenotypes which can avoid bias and subject judgement (20). Based on the above characteristics of WGCNA, we used it to construct a co-expression network and obtain modules related to blood glucose, thus detecting key genes, and providing a reference for searching potential biomarkers of prediabetes and T2DM in hypertriglyceridemia patients. We present the following article in accordance with the STROBE reporting checklist.

Participants
The study included 18 hypertriglyceridemia patients: six newly diagnosed type 2 diabetes patients, six samples with prediabetes and six samples with normal blood glucose. All participants were Han Chinese aged 40-65 years and were recruited at the First Hospital of Jilin University from July to September 2020. Patients were diagnosed based on the guidelines for the prevention and control of type 2 diabetes in China (2017 Edition): Patients with type 2 diabetes were defined as fasting plasma glucose (FPG)≥7.0 mmol/L or oral glucose tolerance test (OGTT) two-hour blood glucose ≥11.1 mmol/L. The range of FPG from 6.1-7.0 or the range of OGTT from 7.8-11.1 were regarded as patients with prediabetes. Besides, FPG<6.1 mmol/L and OGTT<7.8 mmol/L could be regarded as the normal controls. Meanwhile, the level of triglycerides (TG) in all participants was >1.7 mmol/L based on the guidelines for prevention and treatment of dyslipidemia in China(2016 Edition). Additionally, all participants had not controlled their blood glucose through drugs or other treatments previously. Meanwhile, all patients with the history of coronary artery disease (CAD), hypertension, atrial fibrillation, myocardial infarction, tumor, acute infectious disease, immune disease, and hematological disease were excluded from the study. All participants have written informed consent and the study was approved by Ethics Committee of the Public Health of the Jilin University, and the privacy of the participants are strictly confidential.

Blood Sample Collection and RNA Sequencing
For each sample, 9 ml trizol (TAKARA BIO INC., CA, Japan) was added into the whole blood immediately after the blood samples (3 ml) were collected. The ratio of trizol to whole blood is 3:1. And total RNA was isolated and purified using total RNA extraction kit. NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA) was used to detect the RNA purity. Meanwhile, RNA integrity was evaluated using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). The chain-specific library was constructed by removing the ribosomal RNA. After the library was qualified, Illumina PE150 sequencing was performed according to pooling of the effective concentration of the library and the data output requirements. Followed by the sequencing, we removed reads with adapter and N (N means that the nucleobase information cannot be determined) ≥ 0.002, and low-quality reads from raw data. Meanwhile, Q20, Q30, and GC content were calculated. Finally, we obtained the clean reads. All analyses in the study were based on the clean data.

Weighted Gene Co-Expression Network Analysis
We used R version 4.0.4 and the package 'WGCNA' for data analysis. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis (18). We selected the 5000 genes using the Median Absolute Deviation (MAD) algorithm to ensure heterogeneity and accuracy of bioinformatics for co-expression network analysis (21). Then, to make the constructed network more consistent with the characteristics of scale-free network and amplify the correlation between genes, an appropriate soft threshold b is selected (20,22). Subsequently, we converted the adjacency matrix into a topological overlap matrix (TOM) to evaluate gene connectivity in the network. Finally, an average linkage hierarchical clustering was performed based on TOM-based dissimilarity, with a gene dendrogram>30 and cutting height < 0.25 to construct module dendrograms for further analysis (21,23).

Screening for Key Modules and Differentially Expressed LncRNAs
The principal component analysis was performed for module eigengenes (ME). Meanwhile, we can assess the relation between MEs and blood glucose and determine the T2DM-related module by combining the clinical data of participants. Meanwhile, the limma R package, based on empirical Bayes methods and linear models, was used to get differentially expressed lncRNAs. The threshold was P<0.05 (22).

Identification of Key lncRNAs and Functional Enrichment Analysis
We take the intersection of all genes in T2DM-related module and differentially expressed lncRNAs to obtain key lncRNAs. Subsequently, these key lncRNAs were mapped to the DAVID (Database for Annotation, Visualization, and Integrated Discovery) dataset (https://david.ncifcrf.gov/) to convert gene symbol to gene ID. Then, we used KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/) to conduct the Functional Enrichment Analysis.

Statistical Analysis
All statistical analyses were performed by IBM SPSS 24.0 and R version 4.0.4. The package 'WGCNA' was used to construct a weighted gene co-expression network. Mean and standard deviation were used to describe the normal continues variables, and the analysis of variance (ANOVA) was performed for the comparison. Meanwhile, median and quartiles were calculated to describe the skewed continues variables, and the Kruskal-Wallis test was used for the comparison. Moreover, Chi-square tests were used to compare categorical variables. A 2-sided P value less than 0.05 was considered significant.

Basic Situation of the Transcriptome Data and the Determination of Soft Threshold
Basic information of six newly diagnosed type 2 diabetes patients, six samples with prediabetes and six normal controls was shown in Table 1. And the transcriptome data were used to perform the study. To ensure the quality of analysis, the cleaned data were used after eliminating low-quality data. The detailed situation was shown in Table 2. In order to assess the effect of lncRNA in people with different blood glucose levels, we performed pairwise analysis on the three group: type 2 diabetes versus normal controls, type 2 diabetes versus prediabetes, and prediabetes versus normal controls. Meanwhile, the appropriate soft threshold b were ·determined based on the selected criteria of power value, which was 16, 18 and 16, respectively (Figures S1-S3).

Type 2 Diabetes Versus Normal Controls
Eight models (black, blue, brown, green, grey, red, turquoise, yellow) were obtained through constructing co-expression networks ( Figure 1A). The number of genes in different modules was shown in Table 3. Specially, the genes classified in grey model indicated that they were not assigned to any module. The correlation between genes was displayed in Figures 2A, B indicated the correlation between different modules. Meanwhile, by combining clinical data (gender, age, blood glucose and BMI) of T2DM patients and normal controls, we could get the relevant modules and corresponding genes. Based on the aims of the study, modules that are highly related to blood glucose have received attention. As shown in Figure 3A, the green module (r=0.7, P=0.01) was significantly related to blood glucose. Additionally, the results of differentially expressed lncRNAs between T2DM patients and normal controls were shown in Figures 4A, B, which revealing that 528 lncRNAs were up-regulated and 622 lncRNAs were downregulated. Subsequently, we take the intersection of all genes in green module and differentially expressed lncRNAs to obtain key lncRNAs (Table S1). Then, the pathway analysis to the target genes corresponding to these lncRNAs was performed. Figure 5A indicated that Glycolysis/Gluconeogenesis, Metabolic pathways, Type II diabetes mellitus and other signaling pathway were enriched. We found hexokinase-3(HK3) was involved in all above pathways. The corresponding lncRNA of HK3 was ENST00000503273. All enriched signaling pathways of HK3 were displayed in Tables S2 and S5.

Type 2 Diabetes Versus Prediabetes
By constructing co-expression networks, eleven models (black, blue, brown, green, grey, magenta, pink, purple, red, turquoise, yellow) were obtained ( Figure 1B). The number of genes in different modules was shown in Table 3. The correlation between genes was shown in Figures 2C, D indicated the correlation between different modules. Meanwhile, we could get the blood glucose related modules and corresponding genes via combining clinical data of T2DM and prediabetes patients. As shown in Figure 3B, the yellow module (r=0.69, P=0.01) was significantly related to blood glucose. Additionally, the results of differentially expressed lncRNAs between T2DM and prediabetes patients were displayed in Figures 4C, D, which containing 807 up-regulated lncRNAs and 809 down-regulated lncRNAs. Similarly, we take the intersection of all genes in yellow module and differentially expressed lncRNAs to obtain key lncRNAs (Table S1), and the pathway analysis was performed. Figure 5B indicated that Type II diabetes mellitus, Insulin resistance, Fc gamma R-mediated phagocytosis, Inflammatory mediator regulation of TRP channels and other signaling pathway were enriched. Protein kinase C-epsilon (PRKCE) was involved in all above pathways and corresponding lncRNA was ENST00000462720 and ENST00000480633. All enriched signaling pathways of PRKCE were shown in Tables S3 and S5.

Prediabetes Versus Normal Controls
Eight models were obtained by using WGCNA ( Figure 1C). The number of genes in each module was shown in Table 3. The correlation between genes was shown in Figures 2E, F shown the correlation between different modules. Similarly, as shown in Figure 3C, the yellow module (r=-0.61, P=0.03) was significantly   (Table S1). Figure 5C indicated that PI3K-Akt signaling pathway, Cortisol synthesis and secretion, TNF signaling pathway and other signaling pathway were enriched. Activating transcription factor 6-b (ATF6B) was enriched in all above pathways and the corresponding lncRNA was ENST00000485392. All enriched signaling pathways of ATF6B were displayed in Tables S4 and S5.

Validation via qRT-PCR, GEO Data Set and ROC Curve
As shown in Figure 6A, there were significant differences for ENST00000503273 between the T2DM patients and normal controls via the validation of qRT-PCR (z=-2.472, P=0.013). Meanwhile, significant differences for ENST00000462720(z=-2.389, P=0.017) and ENST00000480633(z=-5.477, P<0.001) were observed between the T2DM and prediabetes patients based on Figures 6B, C, respectively. However, no significant difference was found for ENST00000485392 between the prediabetes and normal controls based on qRT-PCR. Meanwhile, the corresponding genes of above lncRNAs were verified via GSE 130991 data set. 74 T2DM, 23 prediabetes patients and 112 controls were selected from the data set. However, the corresponding gene of ENST00000485392 was not detected in this array. The p-value and log2 fold change of all genes were shown in Table S6. Moreover, the ROC curve was used to evaluate the diagnostic power of above lncRNAs. Detailed situation was shown in Table 4 and Figure 7.

DISCUSSION
The prevalence of diabetes in adults aged 18-99 years was estimated to be 8.4% in 2017 and predicted to rise to 9.9% in 2045. Meanwhile, there were 374 million people, equaling 7.7% of the world population, who have impaired glucose tolerance. Based on this, the healthcare expenditure due to hyperglycemia has brought a large social, financial and health system burden to the world (4). We used WGCNA to find the key lncRNA was ENST00000503273 and the corresponding mRNA was HK3 between T2DM patients and normal controls. Similarly, the key lncRNAs were ENST00000462720 and ENST00000480633, and the corresponding mRNA was PRKCE between T2DM and prediabetes patients. Moreover, the key lncRNA was ENST00000485392 and the corresponding mRNA was ATF6B between prediabetes and normal controls. Hexokinase was involved in phosphorylation of glucose to produce glucose-6-phosphate, the initial step in glycolysis and most glucose metabolism pathways (24). When hungry, the body mobilizes stored fat to decompose and oxidize to produce large amounts of acetyl-CoA, which can be condensed with oxaloacetic acid to form citric acid to reduce glycolysis. Meanwhile, under fed conditions, glycolytic substrates appear to contribute at most 50% of the acetyl-CoA requirements for oxidation to fat (25). Previous studies have indicated that PRKCE could regulate Protein kinase C-delta (PRKCD), and PRKCD is related to the accumulation of triglycerides and the production of lipogenic enzymes in liver for mouse (26). Meanwhile, existing researches have shown that along with increased triglyceride and FFA levels, ATF6 protein was elevated after infusion (27). Therefore, these genes seem to be related to triglycerides or lipids.
Glycolysis pathway is one of the most critical pathways of glucose metabolism, and it is also a pathway that links the metabolism of glucose, fat and amino acids. For diabetes, the activities of hexokinase and glycogen synthase in the liver and skeletal muscle are reduced, resulting in hepatic glycogen increased and glycogen synthesis decreased, thus promoting blood glucose. Meanwhile, for patients with T2DM, insulin is relatively insufficient due to decreased insulin sensitivity in the liver, leading to the synthesis of glycolysis-related enzyme reduced and glycolysis weakened. Subsequently, the ability to  metabolize glucose is lessened. Finally, it has a certain impact on the blood glucose balance (28,29). Notably, there are a number of genes are related to insulin sensitivity. For example, an increase in HK3 could contribute to the improvement of insulin sensitivity (30). Meanwhile, previous studies have indicated that some synthase appears to stimulate HK3 expression to improve insulin sensitivity (31). Therefore, through the glycolysis pathway, the mechanism of HK3 affecting the development of T2DM could be improving insulin sensitivity. Early study found that overexpression of PRKCE could be associated with the development of insulin resistance by decreasing the insulin receptors in animals (32). The current literature indicated that increased PRKCE activation was related to marked increases TG in liver, which further led to insulin resistance (33). Meanwhile, PRKCE could also impairs insulin  signaling and its ability to activate glycogen synthesis and inhibit neoglucogenesis, resulting in insulin resistance (34). It is well known that insulin resistance is a core defect in T2DM (35). Besides, PRKCE is a critical gene in steatosis for NAFLD patients, and NAFLD is a risk factor for T2DM (36,37). Moreover, the association of PRKCE with insulin granules is essential for insulin secretion (38). Therefore, it seems that PRKCE can cause the further development of T2DM through a variety of ways. Meanwhile, the activation of protein kinase C (PKC) is considered to be one of the ways that hyperglycemia leads to the development of diabetic vascular complications and other diabetic complications (39,40). PI3K-Akt signaling pathway is related to endoplasmic reticulum (ER) stress (41). ER stress is a key phenomenon in the obesity-and T2DM-associated adverse metabolic outcomes, including insulin resistance in key metabolic organs (42). Studies have indicated that ATF6 is a key transcription factor regulating ER stress (43). Mammals express two homologous ATF6(ATF6a and ATF6b), and ATF6b contributes to adipogenic processes (44). Correspondingly, PI3K-AKT signaling pathway promotes lipid biosynthesis and inhibits lipolysis (45). Therefore, ATF6b could promote the production of adipocytokines, and then the adipocytokines lead to insulin resistance via blocking PI3K-AKT-mediated inhibition of lipolysis attenuating the capacity of glucose utilization (45). Meanwhile, ATF6 could increase expression of TNF-a and other inflammatory cytokines in response to ER stress (46). The inflammatory cytokines enhance lipolysis by reducing perilipin and fat-specific protein 27 levels, and then the hepatic insulin-AKT signaling was impaired (47)(48)(49). Therefore, ATF6b could involve in insulin resistance through PI3K-Akt signaling pathway, which also included the role of TNF-a. and in turn, insulin resistance aggravates the PI3K-AKT pathway, forming a vicious circle (45). Specially, literatures have indicated that the genetic variation in ATF6 is related to prediabetes in the Chinese Han population (50). However, no significant difference was found for its corresponding lncRNA based on qRT-PCR in our study, more sample was needed in the future. This research has some limitations. In order to get reliable results, a larger sample size will be needed. Moreover, although we have used qRT-PCR to verify the key lncRNA, more molecular biology experiments and functional studies are required to help validate our findings in the future.

CONCLUSION
For hypertriglyceridemia patients with different blood glucose levels, E NST00000503273, ENST00000462720 an d ENST00000480633 could be potential biomarkers of T2DM.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available. This data can be found here: NCBI, GEO, GSE193436.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Public Health of the Jilin University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
BL, WC, and SY made the study design. SY, MS, LG, and NY conducted the study. SY, MS, TF, and YY analyzed the data and wrote the manuscript. SY, MS, XL, and WH participated amending the manuscript. All authors contributed to the article and approved the submitted version.