Abstract
Genome-wide association studies (GWAS) involving increasing sample sizes have identified hundreds of genetic variants associated with complex diseases, such as type 2 diabetes (T2D); however, it is unclear how GWAS hits form unique topological structures in protein–protein interaction (PPI) networks. Using persistent homology, this study explores the evolution and persistence of the topological features of T2D GWAS hits in the PPI network with increasing p-value thresholds. We define an n-dimensional persistent disease module as a higher-order generalization of the largest connected component (LCC). The 0-dimensional persistent T2D disease module is the LCC of the T2D GWAS hits, which is significantly larger than expected by chance in the PPI network (196 nodes and 235 edges, P < 0.05). In the 1-dimensional homology group analysis, all 18 1-dimensional holes (loops) of the T2D GWAS hits persist over all p-value thresholds. The 1-dimensional persistent T2D disease module comprising these 18 persistent 1-dimensional holes is significantly larger than that expected by chance (59 nodes and 83 edges, P < 0.001), indicating a significant topological structure in the PPI network. Our computational topology framework may be broadly applicable to other complex phenotypes for identifying topological features that play an important role in disease pathobiology.
1 Introduction
Understanding genotype–phenotype relationships is challenging owing to their polygenicity and nonlinearity. Complex diseases result from interactions between diverse cellular processes and genes. Elucidating the genetic basis of complex diseases in the context of protein–protein interaction (PPI) networks is essential. In the PPI network, genes (or gene products) that have similar biological functions are likely to interact closely with each other. Thus, genes associated with a specific phenotype tend to be clustered into a connected component called a disease module in the PPI network. Disease modules that significantly overlap with each other exhibit similar pathobiological pathways, co-expression patterns, and clinical manifestations. This disease module concept is useful in identifying novel disease–disease or disease–drug relationships (Guney et al., 2016), enabling the implementation of network-based drug repurposing for complex traits/diseases.
Genome-wide association studies (GWAS) have identified numerous genetic variants associated with various complex diseases and can be used to characterize disease-associated modules in the PPI network. Genes associated with GWAS loci or GWAS hits tend to be mapped onto coherent network modules in the PPI network. As the p-value threshold increases from 0 to the standard genome-wide significance threshold of 5 × 10⁻⁸, the GWAS hits mapped to the PPI network tend to gradually form a single, connected component. This largest connected component (LCC) of disease genes is occasionally called an observable disease module. However, owing to the limited sample size and coverage of current GWAS data as well as the interactome incompleteness, disease-associated seed genes are often scattered in the PPI network. To detect disease modules, various seed-expanding and/or heuristic-based algorithms have been developed to expand and merge the scattered seed genes in the PPI network. In addition, machine learning and graph embedding algorithms have been used to predict disease-associated genes and disease treatment mechanisms in the context of biological networks. Most studies have focused on identifying connected components by mapping disease-associated seed genes onto the PPI network and expanding these seed genes. However, the mechanism by which GWAS hits are mapped onto the LCC or other unique topological structures in the PPI network as the p-value threshold increases remains unclear. Therefore, the topological features of GWAS hits mapped onto the PPI network warrant investigation.
One mathematical method for analyzing the topological features of complex networks is simplicial homology. Simplicial homology is an algebraic topology tool used to analyze the topological features of a simplicial complex, which is a collection of higher-order interactions called simplices, including points (0-simplices), line segments (1-simplices), triangles (2-simplices), and higher-dimensional simplices. Simplicial homology can be used to examine the connectivity patterns within biological networks, such as gene-regulatory networks or brain connectivity networks. It can identify topological features, such as connected components (0-holes), loops (1-holes), voids (2-holes), and higher-dimensional holes in the data. For example, the LCC is the largest 0th homology class (connected component). Persistent homology is a method for capturing the persistence of simplicial homology features across multiple thresholds corresponding to a filtration of the simplicial complex. It can identify important topological features that are persistent across different levels of interaction, rather than artifacts of noise or parameter uncertainty. Persistent homology features of biological networks potentially correspond to biologically relevant components that play a crucial role in disease mechanisms. The mathematical details of simplicial complex and homology concepts are described in the Methods section.
This study analyzes the persistent homology features of GWAS hits in the PPI network to identify important topological structures that potentially play a significant role in disease pathobiology. We analyze the simplicial homology features of GWAS hits in the PPI network as the p-value threshold increases from 0 to 5 × 10⁻⁸. For example, the LCC of the mapped GWAS hits, which is occasionally called an observable disease module, can be considered a connected component (0th homology class) that lives forever. This study aims to expand the LCC concept using higher-order topological structure analysis. We use GWAS summary statistics data of type 2 diabetes (T2D) because T2D has undergone extensive genetic study across diverse ancestry populations with large sample sizes. GWAS with increasing sample sizes have recently identified more than 300 genetic loci associated with T2D (Vujkovic et al., 2020); however, many of these GWAS loci have small effect sizes of unclear pathobiological meaning. Therefore, this study systematically explores the evolution and persistence of the topological features of T2D GWAS hits in the PPI network as the p-value threshold increases from 0 to 5 × 10⁻⁸. We also analyze biological pathways, transcription factors, and microRNAs associated with the persistent homology features.
2 Methods
2.1 Overview of the computational topology framework
This study analyzes the topological features of GWAS hits in the human PPI network. Using persistent homology, we systematically explore n-dimensional holes associated with a specific phenotype, as follows.
1. Map GWAS hits onto the human PPI network.
2. Using persistent homology, identify n-dimensional holes of GWAS hits in the PPI network as the p-value threshold increases from 0 to 5 × 10⁻⁸.
3. Detect nth persistent disease modules, which we define as unions of n-dimensional holes that live forever over all p-value thresholds.
4. Compute the statistical significance of nth persistent disease modules by comparing the result with the randomized distribution of a set of randomly selected nodes in the PPI network.
Since the LCC can be considered a connected component (0th homology class) that lives forever, the nth persistent disease module can be viewed as a higher-order generalization of the LCC concept. We test our computational framework using T2D GWAS summary statistics data, and perform functional enrichment analysis to validate the pathobiological significance of the persistent homology features.
2.2 Consolidated human protein–protein interactome
We used a consolidated human PPI network constructed previously by Wang and Loscalzo. Briefly, the protein–protein interactome was compiled from various sources, including high-throughput yeast-two-hybrid studies, the Center for Cancer Systems Biology (CCSB) human interactome, binary PPIs from other laboratories, protein–protein co-complex interactions, signaling interactions, kinase–substrate interactions, and the Human Reference Interactome (HuRI) binary PPIs. This network possesses a scale-free topology. The LCC of the protein–protein interactome, comprising 16,422 proteins (nodes) and 233,940 interactions (links), was used for the downstream analyses.
2.3 T2D GWAS hits
We used a GWAS meta-analysis summary statistics dataset of 228,499 T2D cases and 1,178,783 controls encompassing multi-ancestral groups (Vujkovic et al., 2020), downloaded from the GWAS catalog (https://www.ebi.ac.uk/gwas/). The standard genome-wide significance threshold of 5 × 10⁻⁸ was applied. Each genetic variant was annotated with the closest gene(s) via GWAS catalog gene-mapping data. Only GWAS loci that had been annotated with at most two genes were included. Some genes were linked to multiple GWAS loci with multiple p-values. To extract GWAS hits, for each gene, we assigned the lowest p-value from the different GWAS loci mapped onto that gene. We only considered genes (or proteins) in the human PPI network.
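The gene-level assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name, the input layout (a list of p-value and gene-list pairs), the strict inequality at the significance threshold, and all toy p-values and placeholder gene symbols other than TCF7L2 are assumptions made for the example.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # standard genome-wide significance threshold


def gene_level_p_values(loci, network_genes, max_genes_per_locus=2):
    """Return {gene: lowest p-value} for loci passing the filters.

    loci: iterable of (p_value, [mapped gene symbols]) pairs (hypothetical layout).
    network_genes: set of genes present in the PPI network.
    """
    best = {}
    for p_value, genes in loci:
        # Assumption: a locus is significant when p < 5e-8.
        if p_value >= GENOME_WIDE_THRESHOLD:
            continue
        # Keep only loci annotated with at most two genes.
        if not 1 <= len(genes) <= max_genes_per_locus:
            continue
        for gene in genes:
            if gene not in network_genes:
                continue
            # Assign the lowest p-value over all loci mapped to this gene.
            if gene not in best or p_value < best[gene]:
                best[gene] = p_value
    return best


# Toy input with made-up p-values and placeholder gene names.
toy_loci = [
    (1e-20, ["TCF7L2"]),
    (3e-9, ["TCF7L2", "GENE_A"]),
    (1e-12, ["GENE_A", "GENE_B", "GENE_C"]),  # dropped: three genes
    (2e-7, ["GENE_D"]),                       # dropped: not significant
    (4e-10, ["GENE_X"]),                      # dropped: not in the network
]
toy_network = {"TCF7L2", "GENE_A", "GENE_B", "GENE_D"}
hits = gene_level_p_values(toy_loci, toy_network)
```

In this toy input, TCF7L2 keeps the smaller of its two p-values, while GENE_A is assigned the p-value of the only eligible locus that maps to it.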
2.4 Simplicial complex and homology theory
Here, we briefly describe the fundamentals of the simplicial complex and persistent homology theory. An n-dimensional simplex (n-simplex) is formed by n + 1 nodes with an assigned orientation. For example, a 0-simplex is a vertex (node), a 1-simplex is an edge (link), and a 2-simplex is a triangle. An n′-face of an n-simplex (n′ < n) is a proper subset of the nodes of the simplex with order n′ + 1. A simplicial complex K is a set of simplices closed under the inclusion of the faces of each simplex. Given a set of n-simplices of a simplicial complex K, an n-dimensional chain (n-chain) is defined as a finite linear combination of n-simplices of K, as follows:
$$c = \sum_i a_i \sigma_i,$$
where the $\sigma_i$ are n-simplices of K and $a_i \in \mathbb{Z}_2$. In this study, we restrict our analysis to homology with $\mathbb{Z}_2$ coefficients. The set of n-chains forms an abelian group denoted by $C_n$ (n-chain group). For any n-simplex $\sigma = [v_0, v_1, \ldots, v_n]$, the boundary operator $\partial_n: C_n \to C_{n-1}$ is the homomorphism defined as follows:
$$\partial_n \sigma = \sum_{i=0}^{n} (-1)^i \, [v_0, \ldots, \hat{v}_i, \ldots, v_n],$$
where $\hat{v}_i$ indicates that the node $v_i$ is omitted. An n-chain is said to be an n-cycle if its boundary is zero; that is, elements of the subgroup $Z_n := \ker \partial_n \subseteq C_n$ are n-cycles. Similarly, elements of the subgroup $B_n := \operatorname{im} \partial_{n+1} \subseteq C_n$ are said to be n-boundaries. From the definition of the boundary operator, it follows that any boundary has no boundary (i.e., $\partial_n \partial_{n+1} = 0$). Thus, $B_n \subseteq Z_n \subseteq C_n$. Hence, the nth simplicial homology group $H_n$ of the simplicial complex K can be defined as the quotient abelian group:
$$H_n = Z_n / B_n = \ker \partial_n / \operatorname{im} \partial_{n+1}.$$
The rank of the nth homology group $H_n$ is called the nth Betti number $\beta_n$. The nth homology group $H_n$ is isomorphic to $\mathbb{Z}_2^{\beta_n}$, with the basis of independent n-cycles on $Z_n$ modulo boundaries. Intuitively, it represents n-dimensional holes in the simplicial complex K. For example, $\beta_0$, $\beta_1$, and $\beta_2$ represent the number of connected components, loops, and voids, respectively.
Persistent homology is a method for analyzing simplicial topological features at different resolutions of a given simplicial complex. Formally, a filtration of the simplicial complex K is a finite sequence of subcomplexes such that
$$K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K.$$
For 0 ≤ i ≤ j ≤ m, the inclusion $K_i \hookrightarrow K_j$ induces a homomorphism $f_n^{i,j}: H_n(K_i) \to H_n(K_j)$, and the nth persistent homology groups are defined as the images of these homomorphisms:
$$H_n^{i,j} = \operatorname{im} f_n^{i,j}.$$
Intuitively, the nth persistent homology groups represent n-dimensional holes that persist from $K_i$ to $K_j$. We can track when n-dimensional holes appear (birth) and disappear (death) at different threshold values of the filtration. Persistence diagrams, representations of persistent homology, can be constructed by plotting the birth and death times of topological features.
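As a worked illustration of these definitions, the Betti numbers of a small complex can be computed from the ranks of its boundary matrices, since $\beta_n = \dim C_n - \operatorname{rank} \partial_n - \operatorname{rank} \partial_{n+1}$. The sketch below is plain Python rather than the Ripser-based pipeline used in the study; it assumes coefficients in the two-element field, handles simplices only up to dimension 2, and all function names are hypothetical.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over the two-element field of a matrix given as int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Eliminate that bit from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_rank(simplices, faces):
    """Rank of the boundary map from `simplices` to their codimension-1 `faces`."""
    index = {face: i for i, face in enumerate(faces)}
    rows = []
    for simplex in simplices:
        mask = 0
        for face in combinations(simplex, len(simplex) - 1):
            mask ^= 1 << index[face]  # signs vanish modulo 2
        rows.append(mask)
    return rank_gf2(rows)


def betti_numbers(vertices, edges, triangles):
    """Betti numbers (b0, b1, b2) of a complex with simplices up to dimension 2."""
    v = [(x,) for x in vertices]
    e = [tuple(sorted(x)) for x in edges]
    t = [tuple(sorted(x)) for x in triangles]
    r1 = boundary_rank(e, v)
    r2 = boundary_rank(t, e)
    # No 3-simplices are present, so the rank of the third boundary map is 0.
    return len(v) - r1, len(e) - r1 - r2, len(t) - r2


# A square 1-2-3-4 with diagonal 1-3; only the triangle (1, 2, 3) is filled,
# so the loop 1-3-4 remains as a single 1-dimensional hole.
square = betti_numbers([1, 2, 3, 4],
                       [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)],
                       [(1, 2, 3)])

# A hollow tetrahedron: connected, no loops, one enclosed void.
tetra = betti_numbers([1, 2, 3, 4],
                      list(combinations([1, 2, 3, 4], 2)),
                      list(combinations([1, 2, 3, 4], 3)))
```

The two toy complexes show how filling a triangle removes a loop, and how a closed surface of triangles creates a void.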
2.5 Persistent homology analysis of GWAS hits
In this study, the PPI network G = (V, E) is considered a simplicial complex K: genes (or proteins) are regarded as 0-simplices (nodes), PPIs as 1-simplices (links), and higher-order connections (or cliques) as high-dimensional simplices. The T2D disease module was identified as the LCC of the PPI subnetwork induced by the T2D GWAS hits. The statistical significance of the LCC was calculated by comparing the observed LCC size with the randomized LCC distribution of a set of randomly selected nodes of the same size in a degree-preserving manner over 1,000 repetitions. The z-score was estimated as
$$z = \frac{\mathrm{LCC}_{\mathrm{obs}} - \langle \mathrm{LCC} \rangle_{\mathrm{rnd}}}{\sigma_{\mathrm{rnd}}},$$
where $\mathrm{LCC}_{\mathrm{obs}}$ is the observed LCC size, and $\langle \mathrm{LCC} \rangle_{\mathrm{rnd}}$ and $\sigma_{\mathrm{rnd}}$ are the mean and SD of the randomized LCC distribution, respectively.
Each T2D GWAS hit’s p-value was used as a varying threshold to obtain a filtration of the PPI subnetwork induced by the T2D GWAS hits as a function of the p-value. As the threshold value increases from 0 to 5 × 10⁻⁸, each node appears at the p-value assigned to that gene. Formally, we define the δ-simplicial complex for the p-value threshold δ ≥ 0 as follows:
$$K_\delta = \{\sigma \in K : p(v) \le \delta \ \text{for all } v \in \sigma\},$$
where p(v) ∈ [0, 1] is the GWAS hit p-value assigned to the node v ∈ V. Using this δ-simplicial complex, we define the filtration as $\{K_\delta\}_{\delta \ge 0}$, where $K_{\delta_1} \subseteq K_{\delta_2}$ whenever $\delta_1 \le \delta_2$. We subsequently examined the persistent homology features (n-dimensional holes) of this filtration for each dimension as a function of the p-value threshold. Persistence diagrams were used to visualize the birth and death times of topological features. For each dimension, we also computed the Betti numbers (ranks) of the simplicial homology groups as a function of the p-value threshold.
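The LCC significance test can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the NetworkX code used in the study: "degree-preserving" is interpreted here as replacing each seed by a random node of exactly the same degree, the empirical p-value is taken as the fraction of random sets whose LCC is at least as large as the observed one, and the ring-shaped toy network and all function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict, deque


def lcc_size(adj, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search restricted to `nodes`
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def degree_matched_sample(adj, seeds, rng):
    """Draw one distinct random node of the same degree for every seed (assumed scheme)."""
    by_degree = defaultdict(list)
    for node, neighbours in adj.items():
        by_degree[len(neighbours)].append(node)
    chosen = set()
    for s in seeds:
        pool = [n for n in by_degree[len(adj[s])] if n not in chosen]
        chosen.add(rng.choice(pool))
    return chosen


def lcc_z_score(adj, seeds, n_random=1000, seed=0):
    """Observed LCC size, z-score, and empirical p-value against random node sets."""
    rng = random.Random(seed)
    observed = lcc_size(adj, seeds)
    null = [lcc_size(adj, degree_matched_sample(adj, seeds, rng))
            for _ in range(n_random)]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    p = sum(x >= observed for x in null) / n_random
    return observed, z, p


# Toy network: a ring of 20 nodes, with four adjacent seeds and one isolated seed.
ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
toy_seeds = [0, 1, 2, 3, 10]
observed, z, p = lcc_z_score(ring, toy_seeds, n_random=200, seed=1)
```

Matching on exact degree is the simplest choice for a sketch; on a real interactome with sparsely populated high-degree values, degree bins would be a natural alternative.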
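For dimension 0, the births and deaths along such a node-based filtration can be tracked with a union-find structure. The sketch below is a plain-Python illustration rather than the Ripser-based analysis used in the study. It assumes that an edge enters the filtration as soon as both of its endpoints are present, applies the usual elder rule (when two components merge, the one born later dies), and drops features that die at the threshold at which they are born. The function name and the toy p-values are hypothetical.

```python
import math


def h0_persistence(p_value, edges):
    """(birth, death) pairs of connected components in a node-based filtration.

    p_value: {node: p-value at which the node appears}
    edges:   iterable of (u, w) pairs between those nodes
    """
    adj = {v: set() for v in p_value}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    parent, birth = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    pairs = []
    for v in sorted(p_value, key=p_value.get):  # increasing threshold
        parent[v] = v
        birth[v] = p_value[v]
        for w in adj[v]:
            if w not in parent:  # neighbour has not appeared yet
                continue
            older, younger = find(v), find(w)
            if older == younger:
                continue
            if birth[older] > birth[younger]:
                older, younger = younger, older
            # Elder rule: the younger component dies at the current threshold.
            if birth[younger] < p_value[v]:
                pairs.append((birth[younger], p_value[v]))
            parent[younger] = older
    # Components that never merge live forever.
    pairs.extend((birth[r], math.inf) for r in {find(v) for v in parent})
    return sorted(pairs)


# Toy example with made-up p-values.
toy_p = {"A": 1e-30, "B": 1e-20, "C": 1e-15, "D": 1e-12, "E": 1e-9}
toy_edges = [("A", "C"), ("B", "D"), ("C", "D")]
diagram = h0_persistence(toy_p, toy_edges)
```

In the toy example, the component born with B merges into the older component of A when D appears, while the isolated node E and the component of A are never merged and therefore have infinite death times.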
We define an nth persistent disease module as a union of n-dimensional holes that live forever over all p-value thresholds. This definition is concordant with the conventional disease module concept: the 0th persistent disease module is the LCC, which is the persistent 0-dimensional hole (connected component) that lives forever. The statistical significance of the nth persistent disease module was calculated by comparing the observed persistent disease module size with the randomized persistent disease module distribution of a set of randomly selected nodes of the same size in a degree-preserving manner. The persistent homology features of randomly selected nodes were analyzed over 1,000 repetitions. The network and homology analyses were performed using the NetworkX and Ripser packages in Python 3.8 (https://www.python.org/), and networks were visualized using Cytoscape 3.9.1 (https://cytoscape.org/). The core code for analyzing persistent homology is publicly available in our GitHub repository (https://github.com/esong0/PHGWAS).
2.6 Functional enrichment analysis
To infer the biological significance of the persistent disease module, a pathway enrichment analysis was performed based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2021 database using the GSEApy Python package (Fang et al., 2022) with the Enrichr web server (Kuleshov et al., 2016). In addition, the transcription factor target enrichment analysis was conducted based on the ENCODE and ChEA Consensus databases. The microRNA target enrichment analysis was also performed based on the miRTarBase database. Adjusted p-values were computed using the Benjamini–Hochberg method, and statistical significance was set at P < 0.05.
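For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal plain-Python sketch is given below. It is an illustration of the standard step-up procedure, not the implementation inside GSEApy or Enrichr; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj_p]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the raw p-values.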
3 Results
3.1 The LCC of GWAS hits
We compiled the T2D GWAS hits using the large-scale T2D GWAS summary statistics data, 565 of which are present in the human PPI network. As the p-value threshold increased from 0 to 5 × 10⁻⁸, the LCC of the subnetwork induced by the T2D GWAS hits increased (Figure 1A). When the standard genome-wide significance threshold of 5 × 10⁻⁸ was applied, we identified the LCC comprising 196 nodes and 235 edges, which is significantly larger than that expected by chance (p = 0.0487, Figure 1B). We defined a T2D observable disease module as this LCC of the T2D GWAS hits (Supplementary Table S1). Other connected components of the subnetwork induced by the T2D GWAS hits comprised ≤5 nodes, which were excluded from the downstream analyses.
FIGURE 1
3.2 Persistent homology analysis
We examined how the topological features of the T2D disease module evolve and persist in the PPI network as the p-value threshold increases from 0 to 5 × 10⁻⁸. In our framework, each node appears at the p-value assigned to that gene. We used the p-value of each T2D GWAS hit as a varying threshold and determined the timing of the appearance (birth) and disappearance (death) of n-dimensional holes at different threshold values. In the 0th homology group (H0) analysis, 61 0-dimensional holes (connected components) were identified, of which only one persisted over all p-value thresholds (Figure 2A). This persistent 0-dimensional hole is the LCC, that is, the T2D observable disease module. In the 1st homology group (H1) analysis, 18 1-dimensional holes (loops or 1-cycles) were identified, all of which persisted over all p-value thresholds (Figure 2A). No higher-dimensional hole (n ≥ 2) existed in the T2D GWAS hit data. The Betti numbers (ranks) of the simplicial homology groups are shown in Figure 2B. As the p-value threshold increases, the number of 0-dimensional holes converges to 1 (i.e., the LCC), while the number of 1-dimensional holes increases and converges to 18.
FIGURE 2
We identified the nth persistent disease modules, which were defined as unions of persistent n-dimensional holes that live forever over all p-value thresholds. The 0th persistent disease module is the LCC, which is the persistent 0-dimensional hole that lives forever. As shown in Figure 1B, the LCC is significantly larger than that expected by chance. Since the LCC concept has been extensively investigated in various complex diseases, this study focused on the 1st persistent disease module. In our T2D GWAS data analysis, we identified 18 persistent 1-dimensional holes (loops or 1-cycles), which constitute the 1st persistent T2D disease module comprising 59 nodes and 83 edges (Figure 3A). This 1st persistent T2D disease module is significantly larger than that expected by chance (P < 0.001, Figure 3B), indicating a significant topological feature of the T2D GWAS hits in the PPI network. Since the lowest p-value in the T2D GWAS data is extremely small (3 × 10⁻⁶⁹⁵ for rs7903146 in TCF7L2), we repeated our analysis using the log p-value scale. The same 61 0-dimensional holes and 18 1-dimensional holes were also identified (Supplementary Figure S1).
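One practical reason to work on a log scale is that a p-value of this magnitude cannot be represented as a double-precision float: it underflows to zero. The sketch below shows one way to keep such values usable by parsing them directly into log10 units. It assumes the p-values are available as text in scientific notation (as in a summary statistics file); the function name and the second and third example values are hypothetical. Because the logarithm is monotone, the order in which nodes enter the filtration is unchanged, which is consistent with the same holes being recovered on both scales.

```python
import math


def log10_p(text):
    """Parse a p-value written as text (e.g. '3e-695') into log10 units."""
    mantissa, _, exponent = text.lower().partition("e")
    # log10(m * 10**k) = log10(m) + k, which avoids forming the tiny number itself.
    return math.log10(float(mantissa)) + (int(exponent) if exponent else 0)


underflowed = float("3e-695")      # too small for a double; evaluates to 0.0
smallest = log10_p("3e-695")       # about -694.52
threshold = log10_p("5e-8")        # about -7.30

# The filtration order is the same on the log scale.
raw_text = ["4e-10", "3e-695", "2e-30"]
order = sorted(raw_text, key=log10_p)
```

Sorting by the log10 values yields the same node order that exact arithmetic on the raw p-values would give, without any loss of the smallest values.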
FIGURE 3
3.3 Biological pathways, transcription factors, and microRNAs
To infer the pathobiological significance of the 1st persistent T2D disease module, we identified over-represented KEGG pathways. The top 10 enriched KEGG pathways included mTOR signaling, FoxO signaling, AMPK signaling, the longevity regulating pathway, PI3K-Akt signaling, the transcriptional misregulation pathway in cancer, and several cancer pathways (Figure 4A). In addition, the 1st persistent T2D disease module was enriched with targets of transcription factors, including UBTF, YY1, RUNX1, ZBTB7A, KLF4, RCOR1, GATA1, PBX3, E2F1, and CREB1 (Figure 4B). The 1st persistent T2D disease module was also enriched with targets of microRNAs, including hsa-miR-152-3p and hsa-miR-320a (Figure 4C).
FIGURE 4
4 Discussion
Using persistent homology, this study explored the evolution and persistence of the topological features of T2D GWAS hits in the PPI network as the p-value threshold increased from 0 to 5 × 10⁻⁸. The nth persistent disease module was defined as a union of persistent n-dimensional holes that live forever over all p-value thresholds. This is a higher-order generalization of the conventional disease module concept. The 0th persistent T2D disease module is the LCC of the T2D GWAS hits, which is significantly larger than that expected by chance. In the 1st homology group analysis, all 18 1-dimensional holes (loops) of the T2D GWAS hits persist over all p-value thresholds. The 1st persistent T2D disease module comprising these 18 persistent 1-dimensional holes is significantly larger than that expected by chance, indicating a significant topological structure in the PPI network. The 1st persistent T2D disease module is enriched with the mTOR, FoxO, AMPK, and PI3K-Akt signaling pathways; longevity regulating pathway; and cancer pathways. The mechanisms of T2D, aging, and cancer are known to be closely related to each other (Wei et al., 2017). The pathobiological significance of this persistent disease module is subject to subsequent experimental validation.
Our computational topology framework potentially has broad applicability to other complex phenotypes. By analyzing the persistent homology features, the higher-order topological features that may be closely associated with a specific phenotype can be identified. We plan to expand this preliminary study to systematically analyze the topological features of the large-scale disease–gene networks (Guney et al., 2016). We expect that there are several mathematical ways to expand our persistent homology approach in PPI networks. The weighted topology (Baccini et al., 2022) of weighted PPI networks reflecting proteome-wide binding affinity and concentration information should provide more biologically plausible and reliable information. In addition, relational persistent homology (Stolz et al., 2023) may be a useful tool for dissecting multispecies data, such as multiomics data or multilayer biological networks.
Notwithstanding, this study has several potential limitations. While the conventional disease module concept typically relies on connected components (0th homology class) of disease seeds, the proposed persistent disease module concept is a higher-order generalization of the LCC. Hence, it is hard to directly compare our homology approach to most other disease module identification algorithms. Therefore, it is essential to develop higher-order versions of seed-expanding algorithms to detect robust and reliable persistent disease modules. The role of seed connectors (Wang and Loscalzo, 2018) in homology features also warrants elucidation. As the uncertainty and incompleteness of GWAS and PPI network data are inevitable, how these errors and uncertainty affect the robustness of persistent disease modules remains unclear. Although no higher-dimensional hole (n ≥ 2) was present in our T2D GWAS hit data, higher-order interactions may play a significant role in disease pathobiology. Dynamic topological data analysis approaches based on sequential data would provide more rigorous and robust results. Tissue- or cell-type-specific networks should provide more biological information regarding persistent disease modules. Determining whether oncogenic mutations perturb PPI or higher-order interactions in the PPI network is a worthwhile endeavor (Cheng et al., 2021).
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Author contributions
ES: Conceptualization, Formal Analysis, Investigation, Methodology, Visualization, Writing–original draft.
Acknowledgments
The author would like to thank Ruisheng Wang (Brigham and Women’s Hospital) for kindly sharing the consolidated human protein–protein interactome data and the reviewers for their valuable comments.
Conflict of interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2023.1270185/full#supplementary-material
References
Baccini, F., Geraci, F., and Bianconi, G. (2022). Weighted simplicial complexes and their representation power of higher-order network data and topology. Phys. Rev. E 106, 034319. doi:10.1103/PhysRevE.106.034319
Barabasi, A. L., Gulbahce, N., and Loscalzo, J. (2011). Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68. doi:10.1038/nrg2918
Barrio-Hernandez, I., Schwartzentruber, J., Shrivastava, A., del Toro, N., Gonzalez, A., Zhang, Q., et al. (2023). Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat. Genet. 55, 389–398. doi:10.1038/s41588-023-01327-9
Cheng, F., Zhao, J., Wang, Y., Lu, W., Liu, Z., Zhou, Y., et al. (2021). Comprehensive characterization of protein-protein interactions perturbed by disease mutations. Nat. Genet. 53, 342–353. doi:10.1038/s41588-020-00774-y
Choobdar, S., Ahsen, M. E., Crawford, J., Tomasoni, M., Fang, T., Lamparter, D., et al. (2019). Assessment of network module identification across complex diseases. Nat. Methods 16, 843–852. doi:10.1038/s41592-019-0509-5
Ciocanel, M.-V., Juenemann, R., Dawes, A. T., and McKinley, S. A. (2021). Topological data analysis approaches to uncovering the timing of ring structure onset in filamentous networks. Bull. Math. Biol. 83, 21. doi:10.1007/s11538-020-00847-3
Fang, Z., Liu, X., and Peltz, G. (2022). GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757. doi:10.1093/bioinformatics/btac757
Ghiassian, S. D., Menche, J., and Barabasi, A. L. (2015). A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome. PLoS Comput. Biol. 11, e1004120. doi:10.1371/journal.pcbi.1004120
Goh, K. I., Cusick, M. E., Valle, D., Childs, B., Vidal, M., and Barabasi, A. L. (2007). The human disease network. Proc. Natl. Acad. Sci. U. S. A. 104, 8685–8690. doi:10.1073/pnas.0701361104
Greene, C. S., Krishnan, A., Wong, A. K., Ricciotti, E., Zelaya, R. A., Himmelstein, D. S., et al. (2015). Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576. doi:10.1038/ng.3259
Guney, E., Menche, J., Vidal, M., and Barabasi, A. L. (2016). Network-based in silico drug efficacy screening. Nat. Commun. 7, 10331. doi:10.1038/ncomms10331
Hatcher, A. (2002). Algebraic topology. Cambridge University Press.
Hou, S., Zhang, P., Yang, K., Wang, L., Ma, C., Li, Y., et al. (2022). Decoding multilevel relationships with the human tissue-cell-molecule network. Briefings Bioinforma. 23, bbac170. doi:10.1093/bib/bbac170
Kuleshov, M. V., Jones, M. R., Rouillard, A. D., Fernandez, N. F., Duan, Q., Wang, Z., et al. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97. doi:10.1093/nar/gkw377
Leopold, J. A., and Loscalzo, J. (2018). Emerging role of precision medicine in cardiovascular disease. Circ. Res. 122, 1302–1315. doi:10.1161/CIRCRESAHA.117.310782
Masoomy, H., Askari, B., Tajik, S., Rizi, A. K., and Jafari, G. R. (2021). Topological analysis of interaction patterns in cancer-specific gene regulatory network: persistent homology approach. Sci. Rep. 11, 16414. doi:10.1038/s41598-021-94847-5
Menche, J., Sharma, A., Kitsak, M., Ghiassian, S. D., Vidal, M., Loscalzo, J., et al. (2015). Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601. doi:10.1126/science.1257601
Otter, N., Porter, M. A., Tillmann, U., Grindrod, P., and Harrington, H. A. (2017). A roadmap for the computation of persistent homology. EPJ Data Sci. 6, 17. doi:10.1140/epjds/s13688-017-0109-5
Paci, P., Fiscon, G., Conte, F., Wang, R. S., Farina, L., and Loscalzo, J. (2021). Gene co-expression in the interactome: moving from correlation toward causation via an integrated approach to disease module discovery. NPJ Syst. Biol. Appl. 7, 3. doi:10.1038/s41540-020-00168-0
Ratnakumar, A., Weinhold, N., Mar, J. C., and Riaz, N. (2020). Protein-protein interactions uncover candidate 'core genes' within omnigenic disease networks. PLoS Genet. 16, e1008903. doi:10.1371/journal.pgen.1008903
Ruiz, C., Zitnik, M., and Leskovec, J. (2021). Identification of disease treatment mechanisms through the multiscale interactome. Nat. Commun. 12, 1796. doi:10.1038/s41467-021-21770-8
Salnikov, V., Cassese, D., and Lambiotte, R. (2019). Simplicial complexes and complex systems. Eur. J. Phys. 40, 014001. doi:10.1088/1361-6404/aae790
Sizemore, A. E., Phillips-Cremins, J. E., Ghrist, R., and Bassett, D. S. (2019). The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3, 656–673. doi:10.1162/netn_a_00073
Song, E., Wang, R. S., Leopold, J. A., and Loscalzo, J. (2020). Network determinants of cardiovascular calcification and repositioned drug treatments. FASEB J. 34, 11087–11100. doi:10.1096/fj.202001062R
Stolz, B. J., Dhesi, J., Bull, J. A., Harrington, H. A., Byrne, H. M., and Yoon, I. H. R. (2023). Relational persistent homology for multispecies data with application to the tumor microenvironment. arXiv 2308.06205. doi:10.48550/arXiv.2308.06205
Vlaic, S., Conrad, T., Tokarski-Schnelle, C., Gustafsson, M., Dahmen, U., Guthke, R., et al. (2018). ModuleDiscoverer: identification of regulatory modules in protein-protein interaction networks. Sci. Rep. 8, 433. doi:10.1038/s41598-017-18370-2
Vujkovic, M., Keaton, J. M., Lynch, J. A., Miller, D. R., Zhou, J., Tcheandjieu, C., et al. (2020). Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691. doi:10.1038/s41588-020-0637-y
Wang, R. S., and Loscalzo, J. (2018). Network-based disease module discovery by a novel seed connector algorithm with pathobiological implications. J. Mol. Biol. 430, 2939–2950. doi:10.1016/j.jmb.2018.05.016
Wang, R. S., and Loscalzo, J. (2021). Network module-based drug repositioning for pulmonary arterial hypertension. CPT Pharmacometrics Syst. Pharmacol. 10, 994–1005. doi:10.1002/psp4.12670
Wang, R. S., and Loscalzo, J. (2023). Uncovering common pathobiological processes between COVID-19 and pulmonary arterial hypertension by integrating omics data. Pulm. Circ. 13, e12191. doi:10.1002/pul2.12191
Wei, M., Brandhorst, S., Shelehchi, M., Mirzaei, H., Cheng, C. W., Budniak, J., et al. (2017). Fasting-mimicking diet and markers/risk factors for aging, diabetes, cancer, and cardiovascular disease. Sci. Transl. Med. 9, eaai8700. doi:10.1126/scitranslmed.aai8700
Yang, K., Wang, R., Liu, G., Shu, Z., Wang, N., Zhang, R., et al. (2019). HerGePred: heterogeneous network embedding representation for disease gene prediction. IEEE J. Biomed. Health Inf. 23, 1805–1815. doi:10.1109/JBHI.2018.2870728
Yang, K., Zheng, Y., Lu, K., Chang, K., Wang, N., Shu, Z., et al. (2022). PDGNet: predicting disease genes using a deep neural network with multi-view features. IEEE/ACM Trans. Comput. Biol. Bioinforma. 19, 575–584. doi:10.1109/TCBB.2020.3002771
Zitnik, M., Agrawal, M., and Leskovec, J. (2018). Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34, i457–i466. doi:10.1093/bioinformatics/bty294
Summary
Keywords
persistent homology, topological data analysis, disease module, genome-wide association study, network medicine, systems biology, algebraic topology algorithms
Citation
Song E (2023) Persistent homology analysis of type 2 diabetes genome-wide association studies in protein–protein interaction networks. Front. Genet. 14:1270185. doi: 10.3389/fgene.2023.1270185
Received
01 August 2023
Accepted
12 September 2023
Published
26 September 2023
Volume
14 - 2023
Edited by
Zhi-Ping Liu, Shandong University, China
Reviewed by
Kuo Yang, Beijing Jiaotong University, China
Bing Li, China Academy of Chinese Medical Sciences, China
Copyright
© 2023 Song.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Euijun Song, drjunsong@gmail.com
ORCID: Euijun Song, orcid.org/0000-0002-5886-4210