Bioinformatic analysis reveals lysosome-related biomarkers and molecular subtypes in preeclampsia: novel insights into the pathogenesis of preeclampsia

Background: The process of lysosomal biogenesis and exocytosis in preeclamptic placentae plays a role in causing maternal endothelial dysfunction. However, the specific lysosome-associated markers relevant to preeclampsia (PE) are not well-defined. Our objective is to discover new biomarkers and molecular subtypes associated with lysosomes that could improve the diagnosis and treatment of PE. Methods: We obtained four microarray datasets related to PE from the Gene Expression Omnibus (GEO) database. The limma package was utilized to identify genes that were differentially expressed between individuals with the disease and healthy controls. The logistic regression analysis was used to identify core diagnostic biomarkers, which were subsequently validated by independent datasets and clinical samples. Additionally, a consensus clustering method was utilized to distinguish between different subtypes of PE. Following this, functional enrichment analysis, GSEA, GSVA, and immune cell infiltration were conducted to compare the two subtypes and identify any differences in their functional characteristics and immune cell composition. Results: We identified 16 PE-specific lysosome-related genes. Through regression analysis, two genes, GNPTG and CTSC, were identified and subsequently validated in the external validation cohort GSE60438 and through qRT-PCR experiment. A nomogram model for the diagnosis of PE was developed and evaluated using these two genes. The model had a remarkably high predictive power (AUC values of the training set, validation set, and clinical samples were 0.897, 0.788, and 0.979, respectively). Additionally, two different molecular subtypes (C1 and C2) were identified, and we found notable variations in the levels of immune cells present in the two subtypes. Conclusion: Our results not only offered a classification system but also identified novel diagnostic biomarkers for PE patients. Our findings offered an additional understanding of how to categorize PE patients and also highlighted potential avenues for creating treatments for individuals with PE.


Introduction
Preeclampsia (PE) is a pregnancy disorder that usually occurs after 20 weeks of pregnancy, and is characterized by high proteinuria and blood pressure (Ma'ayeh and Costantine, 2020;Ramos et al., 2017). This condition is linked to significant maternal morbidity and mortality, as well as fetal growth restriction, iatrogenic premature delivery, and organ dysfunction (Mol et al., 2016). Moreover, infants born to mothers with PE are at risk for adverse neurological outcomes, such as schizophrenia, autism, epilepsy, and intellectual disability (Byrne et al., 2007;Wu et al., 2008;Griffith et al., 2011;Walker et al., 2015;Ursini et al., 2018). Currently, there are limited clinical interventions available that are effective in treating PE. Even after delivery, both infant and mother are still susceptible to chronic renal or hypertension later in life (Irgens et al., 2001;Skjaerven et al., 2012). Therefore, identifying pregnant women at high risk of developing PE or severe PE promptly is crucial to prevent adverse pregnancy outcomes. Currently, PE is diagnosed based on proteinuria and hypertension. However, this approach lacks specificity and sensitivity, resulting in a poor prognosis for both the mother and fetus (Zhang et al., 2001). Therefore, it is imperative to identify novel diagnostic biomarkers that can be used for monitoring and diagnosis of PE.
The development of PE is a multifaceted process that involves various mechanisms such as inflammatory processes, immune imbalance, and oxidative stress (Taysi et al., 2019;Wang et al., 2022;Deer et al., 2023). A previous study has highlighted the significance of autophagy in vascular remodeling and trophoblast invasion (Saito and Nakashima, 2014). Lysosomes are crucial organelles within cells that serve as both degradation centers and signaling hubs. They play integral roles in maintaining cellular homeostasis, promoting development, and influencing the aging process (Yang and Wang, 2021). Lysosomes are also known to play a crucial role in autophagy and endocytosis and can originate from pre-existing autolysosomes or endolysosomes (Mahapatra et al., 2021). They are involved in different types of autophagy, such as macroautophagy and microautophagy, as well as various cell death pathways including autophagic cell death, apoptotic cell death, and lysosomal cell death (Radulovic et al., 2018;Wang et al., 2018). The pathogenesis and treatment of PE involve immune and inflammatory reactions, and lysosomes with immunogenicity may implicate the pathophysiological process of PE (Visser et al., 2007;Watts, 2022). Therefore, identifying lysosome-related genes (LRs) is crucial for early diagnosis of PE and may aid in early intervention for affected patients. However, the exact mechanism of lysosomes in the PE is still unclear.
The pathophysiology and etiology of PE are currently not well understood and are being extensively researched. It is widely recognized that the placenta has a significant impact on the development of PE, and research focused on the changes in placental pathophysiology is particularly valuable (van der Merwe et al., 2010;Travaglino et al., 2019;Kim et al., 2021). In the present study, we utilized a data-mining approach to identify the differentially expressed LRs in placenta samples from PE and healthy controls (HC). We employed logistic regression analysis to identify potential genes that serve as risk signatures for predicting the likelihood of PE occurrence. We then created a nomogram model using these diagnostic genes. Furthermore, we examined the relationship between immune infiltration and the diagnostic genes and performed a GSEA analysis on them. Furthermore, to identify subtypes associated with lysosome phenotypes in PE samples,

FIGURE 1
The flow chart of the process of the analysis. consensus clustering was performed. A theoretical basis for understanding the lysosomes associated with PE development will be provided through our findings. The workflow of this study is illustrated in Figure 1.

Materials and methods
Collection of the datasets and data processing Four microarray datasets (GSE10588, GSE35574, GSE25906, and GSE60438) related to PE were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). Table 1 presents the relevant information for these datasets. To remove any batch effects, the "sva" package in R was used to merge three of the datasets (GSE10588, GSE35574, and GSE25906). To categorize the samples, the merged dataset was divided into HC (n = 103) and PE (n = 59) groups. The merged dataset was then used as a training set, while GSE60438 was used as a validation set. Lysosome-related gene sets were collected from the Molecular Signatures Database. After removing duplicates, 180 LRs were obtained.

Identification of differentially expressed LRs
We utilized the "limma" package in R to screen for differentially expressed LRs between HC and PE samples, applying a threshold of p.adjust <0.05 and |log fold change (FC)| ≥ 0.5. To create the heatmap, we used the "pheatmap" package, and for the volcano plot, we used the "ggplot2" package.

Identification of diagnostic markers and construction of nomogram model
To identify important genes associated with PE, logistic regression analysis was used to screen for these genes from the differentially expressed LRs. The accuracy of the diagnostic model was evaluated using receiver operating characteristic (ROC) curve analysis, both in the merged dataset and in the GSE60438 dataset. To investigate the specific role of important LRs in diagnosing PE, we utilized the "rms" package to develop a nomogram model for predicting the incidence of PE. The model's accuracy was evaluated using calibration curves and decision curve analysis (DCA) (Iasonos et al., 2008).

Single-gene gene set enrichment analysis (GSEA)
For single-gene GSEA, we utilized the "clusterProfiler" and "enrichplot" packages to classify samples into high and low expression groups using the median values of gene expression levels. We obtained the subset "c2.cp.kegg.v7.4.symbols.gmt" from the Molecular Signatures Database to investigate molecular mechanisms of genes based on phenotypic grouping.

Analysis of immune cell infiltration
The xCell algorithm was utilized in this study to compare the enrichment scores of infiltrating immune cells between PE and HC groups. The algorithm is a novel method for assessing the relative fraction of immune cells based on gene signatures (Aran, 2020). The correlation between the immune cells and the signature genes was analyzed using the "ggplot2" package and visualized in a lollipop diagram.

Identification of molecular subtypes
Unsupervised clustering analysis of PE patients was performed using ConsensusClusterPlus based on differentially expressed LRs. This helped to classify the PE samples into several subtypes. Subsequently, we used the "limma" package to screen differentially expressed genes (DEGs) between the two subtypes. To investigate the distinct signaling pathways and possible roles of DEGs, we performed Kyoto encyclopedia of genes and genomes (KEGG) and gene ontology (GO) enrichment analyses using the R package "clusterProfiler". A q-value of less than 0.05 was deemed significant.  In this study, we obtained the "c5.go.symbols" subsets from the MSigDB database to examine variations in biological pathways. To analyze the differences in pathways and biological functions among various subtypes, we employed the "Limma" and "GSVA" packages.

Validation of the bioinformatics results
Placenta specimens from 12 HC to 12 PE patients were collected from the Department of Obstetrics, The First Peple's Hospital of Chenzhou. The study obtained written informed consent from all participants and was approved by the Ethics Committee of The First People's Hospital of Chenzhou. Tissue total RNA was extracted using TRIzol reagent (Invitrogen; Thermo Fisher) following the manufacturer's protocols. Additionally, the RNA was quantified spectrophotometrically at an absorbance of 260 nm, and its purity was estimated by calculating the absorbance ratio at 260/ 280 nm. In RNA samples, the absorbance ratios (260/280 nm) should ideally range from 1.8 to 2.0. Approximately 2 µg total RNA was converted into cDNA using cDNA synthesis kits (Invitrogen; Thermo Fisher). Subsequently, quantitative real-time polymerase chain reaction (qRT-PCR) analysis was performed on a CFX96 ™ real-time PCR system (Bio-Rad Laboratories, Inc).
GAPDH was used as the reference mRNA. The 2 −ΔΔCt method was used to evaluate mRNA gene expression relative to GAPDH expression. The list of primers used can be found in Table 2.

Identification of differentially expressed LRs in PE
The integrated dataset used in this study included 59 PE samples and 103 HC samples, which were integrated after removing batch effects from three GEO datasets (GSE10588, GSE35574, and GSE25906). After conducting a differential analysis between PE and HC groups, 1375 differentially expressed genes were identified. Among them, 627 genes were upregulated, while 748 genes were downregulated (Figure 2A and Supplementary Table S1). The genes obtained from the DEGs and LRs were intersected, and 16 differentially expressed LRs were obtained ( Figure 2B). To analyze the differential expression of differentially expressed LRs between the PE samples and HC samples, a boxplot, and heatmap were generated to display the expression levels of these genes genes were downregulated in PE group compared to HC group (p < 0.05). In addition, we analysed the gene expression levels of 16 LRs using the GSE60438 dataset. As shown in Supplementary Figure S1, except for SLC11A1, NEU4, DNAJC6, and TYR, the expression levels of the other 12 LRs were consistent with the trend of the integrated dataset. The logistic analysis identifies diagnostic markers for PE patients A logistic regression analysis was conducted using 16 LRs, resulting in the identification of two hub genes (GNPTG and CTSC) that exhibited p values <0.05 (Table 3). These genes were found to be essential for the diagnostic model. In the training set ( Figure 3A) and validation set ( Figure 3C), the GNPTG gene was significantly upregulated in the PE group compared to the HC group, while CTSC was significantly downregulated in the PE group (p < 0.05). The model achieved an area under the curve (AUC) value of 0.897 on the training set ( Figure 3B) and 0.788 on the external validation set ( Figure 3D) based on ROC analysis. To validate the results of the bioinformatics analysis, clinical samples were collected. The results were consistent with those described above ( Figures 3E, F).

Establishment of the nomogram model
A nomogram model was developed using the projected risk score and two trait genes of the patient to predict the occurrence of PE ( Figure 4A). The calibration curves showed that the predictions made by the nomogram model were almost identical to those made by the ideal model ( Figure 4B). The decision curve analysis (DCA) demonstrated that the utilization of the nomogram model may be beneficial for patients with PE as the bottle green line (model) consistently outperformed the bottle blue (all) and orange-red (none) lines from 0 to 1 ( Figure 4C).

The landscape of the immune cell infiltrates in PE
As demonstrated in Figure 6A, we utilized xCell to measure the proportion of immune cell types. The findings revealed that the proportion of CD4 + Tcm (p < 0.01), eosinophils (p < 0.05), and neutrophils (p < 0.001) was notably greater in the PE group compared to the HC group; however, the proportion of B cells (p < 0.05), basophils (p < 0.05), CD8 + Tcm (p < 0.05), macrophages (p < 0.01), monocytes (p < 0.05), NK cells (p < 0.01), and Tregs (p < 0.05) was notably lower in the PE group compared to the HC group. In addition, a correlation analysis was conducted to examine the relationship between trait gene expression and levels of immune cell infiltration. The findings indicated that the expression of these genes was significantly correlated with infiltration levels in various immune cells, including neutrophils, eosinophils, CD4 + memory T cells, and CD4 + T cells, indicating that these diagnostic genes may play a role in immune regulation in the development of PE ( Figures 6B, C).

Identification of lysosome-related subtypes in PE
Using consensus clustering analysis, we were able to classify PE samples into two distinct molecular subtypes based on the LRs expression profiles. These subtypes were labeled as C1 (N = 30) and C2 (N = 29) ( Figure 7A). To compare the two subtypes, a total of 1821 DEGs were identified between C1 and C2. Among these, 1060 DEGs were upregulated and 761 DEGs were downregulated in C2 ( Figure 7B). The heatmap demonstrated that these DEGs were able to differentiate between the two lysosome patterns ( Figure 7C).

Functional enrichment analysis between two subgroups
In this study, we analyzed the DEGs between two lysosome modalities in patients with PE and investigated their involvement in biologically significant functions. Our findings revealed that these DEGs were mainly enriched in BP of blood Frontiers in Genetics frontiersin.org vessel diameter maintenance, regulation of tube diameter, positive regulation of JNK cascade, regulation of homotypic cell-cell adhesion, etc. The enriched CC of these genes were actin cytoskeleton, sarcolemma, cell-cell junction, sarcoplasmic reticulum, etc ( Figure 8A). The enriched KEGG of these genes were the calcium signaling pathway, platelet activation, cytokine-cytokine receptor interaction, oxytocin signaling pathway, inflammatory mediator regulator of TRP channels, lysosome, etc ( Figure 8B). In addition, we performed GSEA analysis on all genes linked to two distinct lysosome patterns and observed notable variations in pathways, particularly in glutathione metabolism, NABA Ecm glycoproteins, glycosaminoglycan metabolism, sarscov2 innate immunity evasion and cell-specific immune response, type interferon signaling, chemokine receptors bind chemokines, CXCR3 pathway, etc ( Figure 9A). Furthermore, GSVA results indicated that negative regulation of macrophage cytokine production, negative regulation of dendritic cell differentiation, regulation of macrophage chemotaxis, peripheral nervous system development, and lymphangiogenesis were upregulated in the C1 subtype, while positive regulation of response to interferongamma was downregulated in the C1 subtype ( Figure 9B).

Analysis of immune cell infiltration between the two subtypes
As shown in Figures 10A, B, our findings that the proportion of CD4 + memory T cells (p < 0.05) and iDC (p < 0.05) was notably greater in the C1 subtype compared to the C2 subtype; however, the proportion of CD4 + naïve T cells (p < 0.01), CD8 + naïve T cells (p < 0.05), CD8 + T cells (p < 0.01), NKT (p < 0.05), and Th1 cells (p < 0.05) was notably lower in the C1 subtype compared to the C2 subtype.

Discussion
Lysosomes are organelles responsible for digestion in both the autophagic and endocytic pathways. By increasing the activity of lysosomal enzymes, it may be possible to clear pathological waste from cells (Saftig and Haas, 2016). Lysosomal dysfunction has been demonstrated in the pathology of malignant tumors, pancreatitis, neurodegenerative diseases, and atherosclerosis (Zhang et al., 2021). Recent evidence suggested that lysosome is a significant factor in the onset and progression of PE (Nakashima et al., 2020;Ermini et al., 2021). As a result, biomarkers related to lysosomes could serve as valuable diagnostic tools and therapeutic targets for PE. Our study extracted the placental transcriptome from the GEO database of PE patients. Analysis of gene expression revealed 16 differentially expressed LRs in the PE group compared to the HC group. Subsequently, logistic regression analysis was performed on all the LRs, resulting in the identification of the key genes GNPTG and CTSC. A diagnostic model was constructed using these genes to predict the occurrence of PE.
The GNPTG gene is responsible for encoding the alpha and beta subunits, as well as the gamma subunit of the UDPGlcNAc-1phosphotransferase enzyme, which is crucial for the correct transportation of lysosomal acid hydrolases (Leroy et al., 2014). Mucolipidosis types II and III are severe forms of autosomal recessive lysosomal storage diseases that can be caused by mutations in the GNPTG gene (Chen et al., 2015;Pasumarthi et al., 2020). A study of individuals with stuttering from Pakistan and North America revealed multiple mutations in the GNPTG gene associated with lysosomal enzyme targeting pathways (Kang et al., 2010). CTSC is a lysosomal cysteine proteinase that belongs to the papain superfamily (Diao et al., 2022). It is expressed in various inflammatory cells in mammals, such as epithelial cells, natural killer cells, and cytotoxic T cells (Diao et al., 2022). Deficiency of CTSC in mammals leads to the development of autoimmune diseases such as periodontitis, Halm-Munk syndrome, and Papillon-Lefevre syndrome (Hart et al., 2000;Nakano et al., 2001;Loos et al., 2005). Additionally, it plays a crucial role in the development and progression of tumors (Xiao et al., 2021;Cheng et al., 2022;Dang et al., 2023). Our study is the first to report on the potential of GNPTG and CTSC as diagnostic biomarkers for PE progression. As shown in GSE60438 dataset ( Figure 3C) and in-house experiments ( Figure 3E), the GNPTG gene was significantly upregulated in the PE group compared to the HC group, while CTSC was significantly

FIGURE 10
Immune characteristics between the two subtypes. Heatmap (A) and boxplot (B) depicted the differential infiltration of the immune cells.
Frontiers in Genetics frontiersin.org downregulated in the PE group (p < 0.05). Our findings suggested that GNPTG and CTSC genes may serve as effective diagnostic biomarkers for PE patients. According to our research, GNPTG and CTSC genes be correlated with immune cell infiltrations during the development of PE. PE is a condition where immune cells, including regulatory T cells, macrophages, natural killer cells, and neutrophils, play a significant role in its pathology (Aneman et al., 2020;Deer et al., 2023). Our study revealed that patients with PE can be classified into two distinct subgroups based on unsupervised clustering analysis, with varying levels of immune cell infiltration. Notably, in the C2 subgroup, there was an upregulation of CD4 + naive T cells, CD8 + naïve T cells, CD8 + T cells, Th1 cells, and natural killer T cells (NKT), whereas CD4+memory T cells and iDC were downregulated. Consistent with our results, a previous study has revealed that NKT cells may be regulators of the Th1/Th2 balance at the maternal-fetal interface (Tsuda et al., 2001). NKT cells secrete both IL-4 and IFN-γ cytokines, which are believed to play a role in regulating the balance between Th1 and Th2 cells within the endometrial tissue (Chen and Paul, 1997). In addition, NKT cells play an important role in the pathogenesis of PE (Hashemi et al., 2017). At the feto-maternal interface, CD8 + T cells play a critical role in immune tolerance. Severe PE is distinguished by changes in the expression of cytotoxic CD8 + T cells in the maternal peripheral blood and decidua, indicating their involvement in the feto-maternal immune tolerance and pathophysiology of PE (Soljic et al., 2021). Moreover, modifying the expression of certain miRNAs associated with T cells could serve as a promising avenue for the treatment of PE (Zolfaghari et al., 2021). In our study, a total of 1821 genes showed significant differences in expression between the two lysosome-related subgroups. These genes were enriched in the cytokine-cytokine receptor interaction, inflammatory mediator regulator of TRP channels, lysosome, and regulation of macrophage chemotaxis in further functional enrichment analyses including GO, KEGG, GSEA, and GSVA. These findings indicated that the progression of PE is influenced by pathways associated with the immune system and inflammation, which is consistent with previous studies (Zhao et al., 2021;Deer et al., 2023).

Conclusion
Overall, GNPTG and CTSC were identified as diagnostic markers for PE patients. We have developed a diagnostic model for patients with PE that has shown promising diagnostic value. Our analysis of 16 LRs has revealed significant differences in the immune microenvironment between C1 and C2 subtypes. Our findings could facilitate the identification of early diagnostic indicators and the development of effective immunotherapy approaches against PE.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.