Weighted Gene Co-expression Network Analysis for RNA-Sequencing Data of the Varicose Veins Transcriptome

Objective Varicose veins are a common problem worldwide and can cause significant impairments in health-related quality of life, but the etiology and pathogenesis remain not well defined. This study aims to elucidate transcriptomic regulations of varicose veins by detecting differentially expressed genes, pathways and regulator genes. Methods We harvested great saphenous veins (GSV) from patients who underwent coronary artery bypass grafting (CABG) and varicose veins from conventional stripping surgery. RNA-Sequencing (RNA-Seq) technique was used to obtain the complete transcriptomic data of both GSVs from CABG patients and varicose veins. Weighted Gene Co-expression network analysis (WGCNA) and further analyses were then carried out with the aim to elucidate transcriptomic regulations of varicose veins by detecting differentially expressed genes, pathways and regulator genes. Results From January 2015 to December 2016, 7 GSVs from CABG patients and 13 varicose veins were obtained. WGCNA identified 4 modules. In the brown module, gene ontology (GO) analysis showed that the biological processes were focused on response to stimulus, immune response and inflammatory response, etc. Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis showed that the biological processes were focused on cytokine-cytokine receptor interaction and TNF signaling pathway, etc. In the gray module, GO analysis showed that the biological processes were skeletal myofibril assembly related. The immunohistochemistry staining showed that the expression of ASC, Caspase-1 and NLRP3 were increased in GSVs from CABG patients compared with varicose veins. Histopathological analysis showed that in the varicose veins group, the thickness of vascular wall, tunica intima, tunica media and collagen/smooth muscle ratio were significantly increased, and that the elastic fiber/internal elastic lamina ratio was decreased. Conclusion This study shows that there are clear differences in transcriptomic information between varicose veins and GSVs from CABG patients. Some inflammatory RNAs are down-regulated in varicose veins compared with GSVs from CABG patients. Skeletal myofibril assembly pathway may play a crucial role in the pathogenesis of varicose veins. Characterization of these RNAs may provide new targets for understanding varicose veins diagnosis, progression, and treatment.


INTRODUCTION
Varicose veins are a common problem worldwide, affecting approximately 25% of the population (Callam, 1994;Evans et al., 1999) and causing significant impairments in healthrelated quality of life (HRQoL) (Smith et al., 1999;Kurz et al., 2001;Eberhardt and Raffetto, 2005). Symptoms in varicose vein patients span from cosmetic worries to severe refractory ulcer (Rabe and Pannier, 2012). Until now, the etiology and pathogenesis of varicose vein remain not well defined (Li et al., 2014).
Transcriptomic alteration of great saphenous vein can lead to vascular wall structure and morphological changes, which may contribute to the pathogenesis of varicose veins. RNA-Sequencing (RNA-Seq) is a relatively novel approach to transcriptome profiling, which can obtain the complete transcriptomic information and allow for more extensive analysis compared with microarray platforms . Gene expression profiling with co-expression analysis has been widely used in identifying gene expression level in various diseases, such as neurological disorders, cancer and some metabolic disorders (Sundarrajan and Arumugam, 2016). Weighted gene co-expression network analysis (WGCNA) is a powerful method to identify co-expressed groups of genes from large heterogeneous messenger RNA expression data sets (Clarke et al., 2013), and it is widely used to illuminate transcriptomic alterations in some diseases such as cancer and acute aortic dissection (Zhang and Horvath, 2005;Wang et al., 2017). Compared to partial correlation and information theory (PCIT) methods, WGCNA has proven its superiority and it may identify higher-order relationships by focusing on the integrated function of gene modules (Zhao et al., 2010;Kadarmideen and Watson-Haigh, 2012).
In this study, we harvested GSVs from CABG patients and varicose great saphenous veins (GSV) from conventional stripping surgery. RNA-Seq technique was used to obtain the complete transcriptomic data of both GSVs from CABG patients and varicose GSV. WGCNA and further analyses were then carried out with the aim to elucidate transcriptomic regulations of varicose veins by detecting differentially expressed genes, pathways and regulatory genes. We integrated RNA-Seq to associated gene networks and pathways with pathological process in varicose veins.

Study Population and Specimen Collection
The study population consists of patients who underwent CABG because of coronary artery disease and patients who underwent conventional great saphenous vein stripping surgery because of varicose veins. The GSVs from CABG patients were harvested during the CABG surgery and varicose great saphenous vein samples were harvested during the conventional GSV stripping surgery. The GSV phlebectomy was carried out about 5 cm above the knee and the specimens were 2 cm in length. When harvested, the surrounding tissue was removed and each specimen was divided into 2 segments, one stored in liquid nitrogen with a frozen tube and the other one in 10% formalin solution, both immediately. This study was carried out in accordance with the recommendations of the ethics committee of China-Japan Friendship Hospital with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the ethics committee of China-Japan Friendship Hospital.

Histopathological Analysis of Varicose Veins
The GSVs from CABG patients and varicose veins were fixed in 10% formalin for at least 24 h. Specimens were dehydrated, embedded with paraffin and sectioned (Tissue-Tek and IVS-410, Sakura Company, Japan) for further investigations. Hematoxylin-Eosin (HE), Masson trichrome and elastic fiber staining for morphologic analysis were carried out for both GSVs from CABG patients and varicose veins. We measured the thickness at 4 points around the vessel circumference, including 12-o'clock, 3-o'clock, 6-o'clock and 9-o'clock. Then, the histopathological analysis was carried out under an optical microscope (DM4000 B, Leica, German) with Image J software (Version 1.48). The person who performed the histopathological examination was blinded to whether the specimens were GSVs from CABG patients or varicose veins. The immunohistochemistry staining for ASC, Caspase-1 and NLRP3 (NOD-like receptor family, pyrin domaincontaining 3) was carried out with antibodies for human ASC (1:200, Proteintech, IL, United States), Caspase-1 (1:200, Proteintech, IL, United States) and NLRP3 (1:100, Abcam, Cambridge, United Kingdom) using EnVison method. Brown or brown-black particles in the cytoplasm were considered as positive expressions.

RNA Isolation and Quality Examination
RNA was isolated with TRIzol Reagent (Life Technologies, CA, United States). RNA degradation and contamination were detected by 1% agarose gels electrophoresis. RNA purity was checked using the Qubit R 3.0 Fluorometer (Life Technologies, CA, United States). RNA integrity and concentration were assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, United States).

Library Preparation, Construction, and Examination
A total amount of 2 µg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext R Ultra TM RNA Library Prep Kit for Illumina R (#E7530L, NEB, United States) following the manufacturer's recommendations. Index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and RNase H. Second strand cDNA synthesis was subsequently performed using buffer, dNTPs, DNA polymerase I and RNase H. The library fragments were purified with QIAquick PCR kits and elution with EB buffer, subsequently, terminal repair, A-tailing and adapter were implemented. The aimed products were retrieved by agarose gel electrophoresis and PCR was performed. Then the library was completed.
The main procedure for library construction included mRNA concentration, fragmentation, cDNA synthesis, end repairing, add A-tailing and adapter, fragments selection and library purification. The reagent used included NEBNext super speed RNA Library Prep Kit for Illumina; Beckman AM Pure XP beads, Beckman; 80% Ethanol; Promega magnetic frame PCR instrument, ABI 9700.
RNA concentration of library was measured using Qubit R RNA Assay Kit in Qubit R 3.0 to preliminary quantify and then dilute to 1 ng/µl. Insert size was assessed using the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, United States), and qualified insert size was accurately quantized using StepOnePlus TM Real-Time PCR System (Library valid concentration >10 nM).

Library Clustering, Sequencing, and Quality Control
The clustering of the index-coded samples was performed on a cBot cluster generation system using HiSeq PE Cluster Kit v4-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the libraries were sequenced on an Illumina Hiseq 4000 platform and 150 bp paired-end reads were generated. The Sequencing quality was controlled by MultiQC tool (Ewels et al., 2016).

RNA-Seq Data Normalization and Differentially Expressed Genes (DEGs) Selection
The number of reads mapped to each gene was counted using HTSeq v0.6.0. Fragments per kilobase of exon per million reads mapped (FPKM) was then calculated based on length of the gene and reads mapped to the gene. Then, raw RNA-Seq data of GSVs from CABG patients and varicose veins was subjected to DEGs analysis. DEGs were identified by DESeq2 (Love et al., 2014) package in R language. We applied false discovery rate (FDR) criterion proposed by Benjamini and Hochberg (Reiner-Benaim, 2007) for multiple testing corrections of the raw P-value. The threshold of DEGs was set as FDR < 0.05.

Weighted Gene Co-expression Network Analysis
We selected genes for network analysis based on their variation (SD > 0.25) and built the co-expression network using WGCNA package in R language . Pearson's correlations between each gene were calculated to build an adjacency matrix. The power parameter (β) was set to 14 based on the scale-free topology criterion (Zhang and Horvath, 2005). Then the topological overlap measure (TOM) and corresponding dissimilarity (1-TOM) were calculated using adjacency matrix.
(1-TOM) was used as a distance for gene hierarchical cluster, and then DynamicTree Cut algorithm  was used to identify the modules (defined as clusters of highly interconnected genes). Each module has a different color. We use Z-score summary predicted by the module preservation function implemented in the WGCNA package to assess the module conservation. The first principle gene in the module was defined as module eigengene (ME). Then, we used Pearson's correlation between expression profile of each gene and ME to identify the module membership (MM). The summing of connectivity for one gene with other genes in the module was defined as intra-module connectivity (Kin).

Module-Traits Relationships and Functional Categorization of Modules
Using the ME, the module-traits relationships were estimated by calculating the Pearson's correlations between the ME and the traits of interest. Those module-traits relationships were used to select potential biologically interesting modules for downstream analysis. Modules were selected when they had a correlation >0.5 with at least one of the selected traits. Genes in the module were selected when their intra-modular connectivity with that particular module was >0.6, and the intramodular connectivity with all other modules <0.6. The intramodular connectivity is calculated as the correlation between the gene's expression profile and the expression profile of the ME. Another gene characteristic is the gene trait correlation: the correlation between the gene's expression profile and the phenotype of interest. The uniqueness of the 4 modules was characterized based on gene ontology (GO), including the biological process, cellular components and molecular functions using clusterProfiler packages (Yu et al., 2012).

Statistical Analysis
Continuous variables were present as mean and standard deviation. Discrete variables were present as percentages. For continuous variables, we carried out the normality test. For normally distributed variables, we performed independent t-test. For non-normally distributed variables, we performed a non-parametric test. For discrete variables, the chi-square test or Fisher's exact test were performed. Data analysis was performed using SPSS version 22 (SPSS Inc., Chicago, IL, United States) and R software version 3.2.3 (package "WGCNA" "DESeq2"). A P-value of <0.05 was considered statistically significant.

Demographics and Clinical Features of the Patients
From January 2015 to December 2016, 7 GSVs from CABG patients and 13 varicose GSV were obtained. In the CABG group, there were 5 males and 2 females, with the mean age being 62.9 years old. In the varicose vein group, there were 10 males and 3 females, with the mean age being 55.7 years old. The clinical features of the patients are shown in Table 1.

Global Identification of Differentially Expressed RNAs in Varicose Veins
We characterized global transcriptional RNA expression between GSVs from CABG patients and those from varicose veins. Before alignment, low-quality reads and those containing adapter or poly-N were removed using MultiQC. The remaining reads  were mapped to the assembly hg19 genome using the default parameters in STAR (v2.5.1b) aligner. This resulted in an average of 19033962 uniquely mapped reads per sample, of which an average of 97.56% were mapped in the intragenic region (within introns or exons). On average 19572270 reads were detected among the mapped reads. Read counts were estimated at the gene level using HTSeq. Finally we identified 333 differentially expressed genes, including 87 up-regulated and 246 down-regulated (Supplementary Data Sheet S1). The volcano plot in Figure 1 showed the differentially expressed genes.

Weight Gene Co-expression Network Analysis (WGCNA)
Weight gene co-expression network analysis approach was applied to the data count obtained from RNA-Seq of 7 GSVs from CABG patients and 13 varicose vein samples. We used the differentially expressed genes to construct the co-expression network. Gene modules (clusters of highly co-expressed genes) were detected and assigned different colors. Finally, we identified 4 modules containing 285 genes, labeled by yellow, brown, blue and gray (Figure 1). Among these modules, there were yellow modules containing 41 genes, brown module containing 143 genes, gray module containing 40 genes, blue module containing 61 genes.

Module-Trait Relationship Analysis
In these identified modules, an eigengene was calculated. Then, we correlated the selected traits and modules' eigengene to calculate the module-traits relationships (Figure 1). The results showed that several modules were highly correlated with one or more selected traits. The most relevant in moduletrait relationship analysis was found between gray module and status of the sample (r = 0.92, P = 1 × 10 −8 ). Therefore, for further analyses, we selected the gray module as a module of interest. The gray module was also associated with CEAP (Clinical, Etiological, Anatomical and Pathological) categories, the presence of coronary artery disease, side of the great saphenous vein, vascular wall thickness and collagen/smooth muscle ratio.

Functional Enrichment of Modules
Genes in the modules were subjected to GO functional analysis and KEGG (Kyoto encyclopedia of genes and genomes) pathway analysis. Gene functional analysis of DE genes can be found in Supplementary Figure S1. In the brown module, GO analysis showed that the biological processes were enriched in response to stimulus, immune response and inflammatory response, etc. KEGG pathway analysis demonstrated that the difference expression genes may function through cytokinecytokine receptor interaction and TNF signaling pathway (Figure 2). In the gray module, GO analysis showed that the biological processes were skeletal myofibril assembly related (Figure 3). Gene functional analysis of module yellow genes can be found in Supplementary Figure S2.

Immunohistochemistry Staining of ASC, Caspase-1 and NLRP3 Expression
The immunohistochemistry staining showed that the NLRP3 expression in varicose veins was decreased compared with GSVs from CABG patients (Figure 4). The expression of Caspase-1 and ASC had a downward tendency, but the difference was not statistically significant (Figure 4).

Histopathological Characteristics of the Samples
The histopathological analysis showed that in varicose veins, the thickness of vascular wall, tunica intima, tunica media and the collagen/smooth muscle ratio were significantly increased, and the ratio of elastic fiber/internal elastic lamina was decreased ( Figure 5). The details are shown in Table 2.

DISCUSSION
In the present study, we carried out WGCNA based on RNA-Seq data, clinical information and the histopathologic  characteristics of GSVs from CABG patients and varicose veins from conventional surgery and identified several candidate genes and biological processes which may contribute to the pathogenesis of varicose veins. It is commonly considered that weakness of venous wall, venous valve dysfunction, and increased intravenous pressure are the main causes of varicose veins (Meissner et al., 2007;Oklu et al., 2012). Though venous valve dysfunction and increased intra-venous pressure were thought to be the initiating factors, in clinical practice, some varicose vein patients have functional venous valves and venous wall distension appear before valve incompetence (Alexander, 1972). The histology of venous walls was reported to be primitive, with incomplete vein wall formation and decreased elastic component in recurrent varicose veins (Brake et al., 2013). These phenomena indicated that the primary variation of venous wall components and structure may be vital in the pathogenesis of varicose veins (Clarke et al., 1992;Elsharawy et al., 2007). Therefore, an increasing number of studies began to focus on the primary changes of venous wall components and structure in patients with varicose veins.  Gomez reported (Gomez et al., 2014) that decreased prostaglandin E2 reduced matrix metalloproteinases-1 activity during vascular wall remodeling and, consequently, increased collagen density in varicose veins. Primary vascular smooth muscle cell (VSMC) variation may be the initiating factor for varicose veins (Venturi et al., 1996), and down-regulation of desmuslin promotes VSMC switch to a synthetic phenotype from a contractile phenotype (Xiao et al., 2009). Desmuslin knockdown also leads to an increase in matrix metalloproteinase-2 and collagen activity (Xiao et al., 2010), which may cause the formation of varicose veins.
Some studies have investigated the differentially expressed genes in primary varicose veins. Kim et al. (2005) recognized 3 cDNAs and Görmüs et al. (2014) found that up-regulation of elastin and related genes may play an important role in the pathogenesis of varicose veins. Cui et al. (2012) carried out microRNAs (miRNAs) profiling in the GSV in patients with chronic venous insufficiency (CVI) and found that miR-34a, miR-155, and miR-202 might play crucial rules in CVI and varicose veins. Li et al. (2014) reported the aberrantly expressed long noncoding RNAs (lncRNAs) involved in varicose veins and predicted their potential functions. Then further investigations (Li et al., 2015) showed that low expressed lncRNA-GAS5 may facilitate VSMC proliferation and migration through Annexin A2, which may promote the pathogenesis of varicose vein. Anwar et al. (2017) applied nuclear magnetic resonance spectroscopy and ultra-performance liquid chromatography-mass spectrometry to analyze metabolic profiling of varicose veins, and also reported 4 differentially expressed miRNAs. All the previous studies have investigated some aspects of RNA expression, but few of them have evaluated transcriptomic alterations. RNA-Seq is a relatively novel technique utilizing a high-throughput sequencing method to evaluate transcriptomic alterations, including total RNA, pre-mRNA, and non-coding RNA (Kukurba and Montgomery, 2015). WGCNA is also novel and powerful in identifying co-expression modules and is widely used to distinguish transcriptomic variations. In our study, we applied WGCNA on RNA-Seq data, and further investigations showed that some inflammatory RNA may be down-regulated in varicose veins compared with GSVs from CABG patients. The skeletal myofibril assembly pathway may play a crucial role in the pathogenesis of varicose veins.
NLRP3 inflammasome is a key component mediating sterile inflammation. This is a complex consisting of NLRP3, ASC (apoptosis-associated speck-like protein containing a CARD), and caspase-1. It has been reported to be closely related to vascular dysfunction. NLRP3 inflammasome can be activated by cholesterol crystal and is essential for atherogenesis (Duewell et al., 2010), and has been proven to be required for VSMC calcification in atherosclerosis (Wen et al., 2013). It has been reported that obesity can accelerate vascular endothelial dysfunction via the activation of NLRP3 inflammasome and mitochondrial dysfunction (Liu et al., 2015). Bruder et al. reported that NLRP3 inflammasome plays a central role in aldosterone-induced vascular dysfunction (Bruder-Nascimento et al., 2016), and this complex also contributes to VSMC phenotypic transformation (Sun et al., 2017). It has also been reported that activation of NLRP3 inflammasome is a key determinant of venous thrombosis during hypoxic condition in venous disease (Gupta et al., 2017). In our immunohistochemistry staining, the expression of NLRP3, ASC and caspase-1 was decreased in varicose veins compared with GSVs from CABG patients, except for 1 patient with high blood glucose. In conjugation with the WGCNA results, we speculate that NLRP3 inflammasome expression was associated with high blood glucose and may play a role in the pathogenesis of coronary artery atherosclerosis. As for varicose veins, NLRP3 inflammasome has a downward tendency, which suggested that NLRP3 inflammasome mediated inflammation may not contribute to the pathogenesis of varicose veins. However, GO showed that the skeletal myofibril assembly pathway may be vital in the pathogenesis of varicose veins.
It has been reported that insulin-like growth factor I stimulates myofibril development and decreases smooth muscle α-actin (Donath et al., 1994). In varicose veins, the ratio of collagen and smooth muscle significantly increased. Further investigations of differentially expressed genes in gray module may help to elucidate the underlying mechanism of these changes. Therefore, characterization of these RNAs may provide new targets for understanding varicose veins diagnosis, progression, and treatment.
Our study has some limitations. First, the sample size is relatively small. GSVs from CABG patients are not strictly normal, which may cause a bias. Increased cardiovascular risk in CABG patients might increase the expression of NLRP3 inflammasome markers, since those patients often presented hypertension, diabetes and obesity, which have been associated with NLRP3 inflammasome activation. Second, though WGCNA allows for more extensive analysis, over analysis of RNA-Seq data may lead to type I error. Third, this is a preliminary study, and further studies, such as RT-qPCR, western blotting, gene knocking down/out and overexpression should be carried out to shade light of the underlying mechanisms.

CONCLUSION
This study shows that there are clear differences in transcriptomic information between varicose veins and GSVs from CABG patients. Some inflammatory RNAs are down-regulated in varicose veins compared with GSVs from CABG patients. Skeletal myofibril assembly pathway may play a crucial role in the pathogenesis of varicose veins. Characterization of these RNAs may provide new targets for understanding varicose veins diagnosis, progression, and treatment.