Bioinformatic Analysis Identifies Potential Key Genes in the Pathogenesis of Turner Syndrome

Background: Turner syndrome (TS) is a sex chromosome aneuploidy with a variable spectrum of symptoms including short stature, ovarian failure and skeletal abnormalities. The etiology of TS is complex, and the mechanisms driving its pathogenesis remain unclear. Methods: In our study, we used the online Gene Expression Omnibus (GEO) microarray expression profiling dataset GSE46687 to identify differentially expressed genes (DEGs) between monosomy X TS patients and normal female individuals. The relevant data on 26 subjects with TS (45,XO) and 10 subjects with the normal karyotype (46,XX) was investigated. Then, tissue-specific gene expression, functional enrichment, and protein-protein interaction (PPI) network analyses were performed, and the key modules were identified. Results: In total, 25 upregulated and 60 downregulated genes were identified in the differential expression analysis. The tissue-specific gene expression analysis of the DEGs revealed that the system with the most highly enriched tissue-specific gene expression was the hematologic/immune system, followed by the skin/skeletal muscle and neurologic systems. The PPI network analysis, construction of key modules and manual screening of tissue-specific gene expression resulted in the identification of the following five genes of interest: CD99, CSF2RA, MYL9, MYLPF, and IGFBP2. CD99 and CSF2RA are involved in the hematologic/immune system, MYL9 and MYLPF are related to the circulatory system, and IGFBP2 is related to skeletal abnormalities. In addition, several genes of interest with possible roles in the pathogenesis of TS were identified as being associated with the hematologic/immune system or metabolism. Conclusion: This discovery-driven analysis may be a useful method for elucidating novel mechanisms underlying TS. However, more experiments are needed to further explore the relationships between these genes and TS in the future.


INTRODUCTION
Turner syndrome (TS) is a common genetic condition caused by abnormal sex chromosomes that affects 1 in 2500 female live births (1). Since initially described by Henry Turner in 1938, TS was gradually recognized as a syndrome characterized by the complete absence or partial loss of an X chromosome in phenotypic females. The clinical signs of TS include short stature, gonadal dysgenesis, lymphedema webbed neck and other more than 400 types of dysmorphic features. In addition, cardiovascular disease are more prevalent in women with TS, including congenital cardiac abnormalities, aortic dilation and dissection, hypertension and ischemic heart disease (2). In addition, ∼30% of individuals with TS have bicuspid aortic valve (BAV) (3). Typically, nearly half the patients with TS have a 45,X karyotype, 20-30% of patients with TS present with mosaicism (45,XO/46,XX), and the remainder have X chromosome structural abnormalities (4). Females with TS present with highly variable clinical features, which are caused by the haploinsufficiency of genes involved in multiple systems.
TS is a multiple-system disease, and the etiology is complex. However, the mechanisms underlying the pathogenesis of TS remain unclear. Previous studies indicated that haploinsufficiency of the short stature homeobox (SHOX) gene leads to occurrence of short stature and many specific skeletal anomalies in TS individuals (5). However, as a mainly developmental disorder, the pathogenesis of congenital heart defects in TS stills unclear. In previous research, mutations in NKX2.5 (6), GATA5 (7), and NOTCH1 (8) have been identified as the causative factor in non-syndromic patients with inherited BAV. Moreover, chromosome structural variants and potential pathogenic genes such as TIMP3 and TIMP1 may be associated with TS patients with congenital cardiac abnormalities (3,9). In addition, sex chromosome imbalance and dysregulation of certain genes on the X chromosome (such as FMR1, PDIAPH2, and BMP15, etc.) may result in accelerated oocyte atresia, leading to gonadal dysgenesis later in life (10,11). And haploinsufficiency of a lymphatic gene is related to the development of lymphedema and webbed neck (12). Recently, haploinsufficiency of immune-associated genes on the X chromosome was reported to result in the development of autoimmune diseases, including autoimmune thyroiditis, type 1 diabetes and autoimmune enteritis (13).
Altered autosomal gene expression as well as chrX gene expression has been observed in females with TS (45,X monosomy) in different samples as human fibroblast cell line, peripheral blood mononuclear cells, as well as in the induced pluripotent human cell lines, with inconsistent results. However, further data analysis and data mining are still absent, especially the derivation of X chromosome of the patients who inherited the monosomy X chromosome from mother or father (TS with Xm and Xp). Therefore, we used the statistical analysis and some data mining techniques to reveal patterns of genes responsible for TS. Here, we used the peripheral blood mononuclear cell (PBMC) microarray dataset GSE46687 created by Bondy et al. to perform a genome-wide gene expression analysis to investigate the differentially expressed genes (DEGs) between monosomy X TS patients and normal female 46,XX individuals to understand postnatal differences. Our results will contribute to our understanding of the genetic etiology of TS and provide new insights into the clinical diagnosis and treatment of TS.

Microarray Data
The microarray expression profiling dataset GSE46687, deposited by Bondy et al., was downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). The dataset was based on the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array platform. The experiment contained 36 samples consisting of 16 subjects with TS who were identified as having a maternally inherited X chromosome (45,Xm), 10 subjects with TS who were identified as having a paternally inherited X chromosome (45,Xp) and 10 subjects with the normal female karyotype (46,XX). Since it was public dataset, the information of age and health condition as well as the usage of the medication of the individuals was unavailable, which appears to be a potential limitation. The annotation file for GPL570 was also downloaded from the GEO.

Differential Expression Analysis
Differential expression analysis was performed using the online analysis tool GEO2R; the expression profiles of monosomy X TS patients and normal 46,XX females were compared to identify the DEGs. P-values and adjusted P-values were calculated using t-tests. Genes from each sample with the following criteria were retained: (1) a |log2 (fold-change)| >1 and (2) an adjusted P < 0.05. We selected the most significant genes when the DEGs were duplicated. We divided the TS patients into two groups depending on the parental origin of the existing X chromosome. Analyses were performed independently for the 45,Xm and 45,Xp TS samples, and the DEGs were determined by the intersection of the two datasets. A Venn diagram of DEGs was drawn using the online tool Venny 2.1 (http://bioinfogp.cnb.csic.es/tools/venny/ index.html), and the heatmap for the DEGs was created using Heml software.

Tissue-Specific Gene Expression Analysis
We used the online resource BioGPS (http://biogps.org) to analyze the tissue-specific expression of the DEGs. Transcripts mapped to a single tissue with the following criteria were identified as highly tissue specific: (1) the tissue-specific expression level was >10 times the median, and (2) the second highest expression level was less than one-third as high as the highest level (14).

Functional Enrichment Analysis of DEGs
We used DAVID 6.8 (https://david.ncifcrf.gov/tools.jsp) to perform the functional enrichment analysis of DEGs; this analysis included the functional categories, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The GO analysis included 3 categories, namely, biological process (BP), cellular component  *Genes that were involved in the immune system and on the X chromosome.
Frontiers in Endocrinology | www.frontiersin.org (CC) and molecular function (MF), which were used to predict protein functions (15). KEGG pathway analysis was used to assign sets of DEGs to specific pathways to enable the construction of the molecular interaction, reaction and relationship networks (16). The functional categories included COG_ONTOLOGY, UP_KEYWORDS and UP_SEQ_FEATURE. Benjamini-adjusted P < 0.05 and an enriched gene count >5 were chosen as the criteria for significance.

Protein-Protein Interaction (PPI) Network Analysis
The PPI network analysis was conducted using STRING (https://string-db.org/), which is an online database of known and predicted protein-protein interactions. These interactions include physical and functional associations, and the data are mainly derived from computational predictions, high-throughput experiments, automated text mining and co-expression networks. We mapped the DEGs onto the PPI network and set an interaction score of >0.4 as the threshold value. In addition, Cytoscape v3.6.0 software was used to visualize and construct the PPI network. Nodes with the greatest numbers of interactions with neighboring nodes were considered hub nodes.
To identify the key PPI network modules, the app ClusterOne from the Cytoscape software suite was used to perform the gene network clustering analysis. A P < 0.05 was set as the significance threshold for identifying key modules.

Differentially Expressed Genes
We downloaded the microarray expression dataset GSE46687 from the GEO database and analyzed the DEGs between monosomy X TS patients and normal female 46,XX individuals using the online analysis tool GEO2R. In total, 42 upregulated and 91 downregulated genes were identified between Xm TS patients and normal individuals. In addition, 279 upregulated and 234 downregulated genes were identified between Xp TS patients and normal individuals (Supplemental Table 1). We identified the intersection of these two datasets and obtained a total of 25 upregulated and 60 downregulated genes. As shown in Table 1, 15 (17.6%, one upregulated and 14 downregulated) of the 85 DEGs were on the X chromosome. Most of these genes are involved in basic cellular activities, such as the structural maintenance of chromosomes and the mediation of transcription. Three of the downregulated genes (AP1S2, CSF2RA, and CD99) on the X chromosome were related to the immune system. The X inactive specific transcript (XIST) gene was downregulated, and the adjusted P-value was highly significant (P < 0.0001). The Venn diagram and heatmap for the DEGs are presented in Figure 1. As shown in Figures 1A,B, 25 upregulated and 60 downregulated genes were identified through the comparison of TS patients and normal individuals. LMF1, a protein-coding gene involved in the maturation and transport of lipoprotein lipase, was upregulated in Xp TS patients, whereas it was downregulated in Xm TS patients ( Figure 1C).

Tissue-Specific Expression of Genes
We identified 23 genes that were expressed in a specific tissue or organ system using BioGPS (Supplemental Table 2). The most highly tissue-specific expression system was the hematologic/immune system (69.6%, 16/23). The neurologic and skin/skeletal muscle systems had similar levels of enrichment (8.7%, 2/23), while the respiratory, digestive and circulatory systems had the lowest enrichment levels (4.3%, 1/23) ( Table 2).

PPI Network Analysis of DEGs
A PPI network with 42 nodes and 49 edges was obtained; the network had an interaction score >0.4 according to the STRING online database (Figure 3A). The nodes correspond to genes, and the edges represent the links between genes. Red represents upregulated genes, and green represents downregulated genes. We used ClusterOne in Cytoscape to perform network gene clustering to identify the key PPI network modules. As shown in Figures 3B,C, two key modules with one upregulated gene (UBE2O) and seven downregulated genes (CDC27, HECTD1, JAK1, ASMTL, CD99, SLC25A6, and CSF2RA) were identified. Furthermore, functional enrichment analysis indicated that these eight genes were mainly involved in protein binding, phosphoprotein, methylation, the nucleus and the cytoplasm ( Table 3).

Identification of Genes of Interest
We identified three downregulated X chromosome genes (AP1S2, CSF2RA, and CD99) that are involved in the immune system. PPI network analysis detected two key modules consisting of eight genes (UBE2O, CDC27, HECTD1, JAK1, ASMTL, CD99, SLC25A6, and CSF2RA). The tissue-specific gene expression  analysis revealed that CDC27 and CD99 were specifically expressed in the hematologic/immune system. In addition, using the GeneCards database, we manually identified three additional genes of interest (UBE2O, HECTD1, and CSF2RA) that are potentially related to the pathogenesis of TS. Furthermore, we identified several other potentially involved genes among the genes with tissue-specific expression using the GeneCards database. All the genes of interest are shown in Table 4.

DISCUSSION
In this study, we analyzed the DEGs in PBMCs from monosomy X TS patients and normal females. We performed independently for the 45,Xm and 45,Xp TS samples to narrow and strengthen the potential pathogenesis genes in TS. Several novel genes that had not been reported to be associated with this condition were identified by comparing Xm and Xp TS patients. This discoverydriven analysis included a genome-wide search, and these novel genes provide insight into the pathogenesis of TS.
TS is a common genetic condition caused by abnormal sex chromosomes where in the affected female individuals lose an entire copy or a portion of the X chromosome (17). The X chromosome contains many genes, and these genes are mainly involved in ovarian development and the immune and skeletal systems. Loss of the entire X chromosome or a portion of it can lead to haploinsufficiency of these genes, causing short stature (18,19), gonadal insufficiency (20,21), or maldevelopment of the lymphatic system (12,22). In addition, cardiovascular abnormalities (23), metabolic syndrome (24) and intellectual disability (25) are also present in TS patients.
The XIST gene is involved in the X-inactivation process, which ensures gene dosage equivalence between males and females (26). The XIST gene is universally expressed in all cells; therefore, it can function as a positive control. In our study, the expression level of the XIST gene was lower in TS individuals than in normal individuals, which demonstrated the accuracy of this research method. We identified a total of 85 DEGs, consisting of 25 upregulated and 60 downregulated genes in monosomy X TS patients. But, the SHOX gene which is known to be involved in skeletal abnormalities (5) was down-regulated only in TS patients with paternally inherited X chromosome (Supplemental Table 1), probably due to its only  Phosphoprotein  57  KDM6A, CXORF38, IQGAP1, ZBTB38, TOP1, NLRC4,  ANK1, TPP2, ASMTL, KIAA1033, USP10, VPS13B,  INO80D, AHNAK, ZFX, UBE2O, ACVR2B, KHSRP,  FKBP15, SRCAP, KDM6B, BID, PLXNC1, AGFG1,  STK11, SSFA2, HADHA, ZNF652, MYL9, CHD9,  BCL11B, VPS35, BAZ2A, HECTD1, DHX9, TAOK1,  EPRS, CD99, SAMHD1, DGKK, AFF3, MYLPF, AGER,  CDC27, PTPN12, FXR1, SAFB2, HDAC5, ATRX, CYBA,  B3GAT1, SON, FCGR2C, JAK1, MAP4, SMC1A, TOB1   6.04E-09   GOTERM_MF_DIRECT  GO:0005515  Protein  binding   57  IQGAP1, ZBTB38, TOP1, NLRC4, AP1S2, ANK1, TPP2  the tissue-specific expression analysis, revealing that the most highly specific system in terms of the expression of the DEGs was the hematologic/immune system, which could explain the common occurrence of autoimmune diseases in TS patients. In TS patients, the main autoimmune diseases are autoimmune thyroiditis, type 1 diabetes and autoimmune enteritis. Among these disorders, the most common is autoimmune thyroiditis (13). Two possible mechanisms may explain the increased prevalence of autoimmune diseases in patients with TS. One plausible mechanism is that the X chromosome contains a large number of immune-related genes, and the altered X-linked gene dosage results in the loss of immune tolerance (27,28). The other potential mechanism is that autoimmune diseases are induced by chromosome aneuploidy (22). Although the dataset is derived from PMBC, similar results were also observed in leukocyte RNA-expression profile and human fibroblast cell line, as well as by comparing amniotic fluid RNA expression of profile of Turner syndrome fetuses and female fetuses. The PPI network and key module analyses identified two genes of interest (CD99 and CSF2RA) that are involved in the hematologic/immune system. CD99, a cell surface glycoprotein, is encoded by the pseudoautosomal gene MIC2 (29) and is involved in critical biological processes. In studies on CD99deficiency in fetuses, Shin et al. (30) demonstrated that CD99 plays a key role in lymphocyte development. In addition, other studies have suggested that CD99 is important in peripheral immune responses and hematopoietic precursor cell differentiation (31). CSF2RA is a protein-coding gene that is located in the pseudoautosomal region of the X chromosome. The protein encoded by this gene is the alpha subunit of the heterodimeric receptor for colony-stimulating factor 2, a cytokine that controls the production, differentiation, and functions of granulocytes and macrophages. The data suggest that dysregulation of CD99 and CSF2RA might underlie the increased frequency of autoimmune diseases in females with TS. Cardiovascular abnormalities, bicuspid aortic valve and associated aortic disease including coarctation of the aorta are common in TS (24). Our manual screening of tissue-specific gene expression identified one upregulated gene, MYL9, and one downregulated gene, MYLPF. The protein encoded by MYL9 is the myosin light chain, which may regulate skeletal muscle contraction and is also associated with smooth muscle contraction (32). MYLPF, which was downregulated, encodes the regulatory light chain of striated muscle (33). These data suggest that both MYL9 and MYLPF may function in vascular muscles, and their dysregulation could play important roles in increasing the risk of aortic coarctation in females with TS.
Another characteristic of TS is short stature, and the SHOX gene is known to be involved in skeletal abnormalities (5). An upregulated gene, IGFBP2, encoding insulin-like growth factor binding protein 2, was identified in our trial. Insulin-like growth factor (IGF) regulates cartilage and bone development through the integrated action of IGF ligands, receptors and binding proteins (34,35). Fisher et al. (36) found that overexpression of IGFBP2 inhibits IGF-mediated proliferation and reduces the proliferation of maturing chondrocytes and the formation of the periosteal bony collar, thus disrupting the balance of IGF/IGFBP2 activity and bone development. Further experiments should be performed to confirm the action of IGFBP2 in the pathogenesis of TS.
Genes involved in chromatin regulator were identified from our enriched GO terms analysis, including ATRX, HDAC5, CHD9, KDM6A, SRCAP, BAZ2A, and KDM6B. SRCAP, a SNF2-related chromatin-remodeling ATPase, was found to be elevated 4.563-and 6.930-folds in TS patients with Xm and Xp compared with normal female. SRCAP has various crucial roles in chromatin remodeling, gene expression, DNA damage response and cell division. Although lack of in vivo and/or in vitro experiments, by whole-exome sequencing, SRCAP was recently found to be the causative gene for Floating-Harbor syndrome (FHS), a rare human disease characterized by delayed bone mineralization and growth deficiency, mental retardation and skeletal and craniofacial abnormalities (37). ATRX (ATP-dependent helicase ATRX, Xlinked helicase I), located in X, plays a key role in deposition of the histone variant H3.3 at telomeres and other genomic repeat, maintaining the heterochromatic modifications at these sites. Inherited mutations of the ATRX gene cause diverse changes in the pattern of DNA methylation, leading to an X-linked mental retardation (XLMR) syndrome most often accompanied by alpha-thalassemia syndrome. Trolle et al. (10) found genome wide hypomethylation with most differentially methylated positions in TS patients, suggesting the dysfunction of chromatin regulator. Further study need to be done illustrate whether the dosage of ATRX underlie the mechanism of TS.
In this study, we used discovery-driven analysis to identify DEGs and found five genes of interest (CD99, CSF2RA, MYL9, MYLPF, and IGFBP2) by constructing a PPI network and identifying key modules. In the future, more research should be conducted to study the relationship between these genes and TS.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the GEO dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE46687, using code GSE46687.