ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Integrative Bioinformatics
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1651623
This article is part of the Research Topic: AI in Integrative Bioinformatics
Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks
Provisionally accepted
- 1 Texas Woman's University, Denton, United States
- 2 The University of Texas Rio Grande Valley, Brownsville, United States
Accurate prediction of protein-protein interactions (PPIs) is crucial to understanding cellular functions and advancing drug discovery. While some in-silico methods leverage direct sequence embeddings from Protein Language Models (PLMs), others apply Graph Neural Networks (GNNs) to topological features from PPI networks or 3D protein structures. Here, we introduce a novel approach, ProtGram, that models protein primary structure through a hierarchy of globally inferred n-gram graphs. For each n-gram level, residue transition probabilities, aggregated from a large sequence corpus, define the edge weights of an underlying directed graph of paired residues. We propose a custom directed graph convolutional neural network, DirectGCN, featuring a convolutional layer that processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations. These components are then combined via a learnable gating mechanism that scores each aggregated path. We first evaluate DirectGCN on standard node classification benchmarks, where it performs on par with established methods on general datasets while showing a specialization for complex, directed, limited and dense heterophilic graph structures. Subsequently, DirectGCN is applied to the ProtGram hierarchy of n-gram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism to generate protein-level embeddings. We name our two-stage graph representation learning framework ProtGram-DirectGCN. The main focus of this study is to investigate less computationally intensive alternatives to PLMs for the downstream task of PPI prediction, framed as link prediction. Our method achieves robust predictive power, suggesting that our globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for PPI prediction.
Future work includes testing ProtGram-DirectGCN on a wide range of bioinformatics tasks.
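To make the graph construction described in the abstract more concrete, the following is a minimal sketch of how a directed n-gram transition graph could be inferred from a sequence corpus. It is not the authors' implementation. The function name `ngram_transition_graph`, the toy corpus, the use of overlapping n-grams shifted by one residue, and the row-normalisation of counts into probabilities are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngram_transition_graph(sequences, n=1):
    """Build a directed graph {source: {target: probability}} over n-grams.

    Assumption (not from the article): nodes are overlapping n-grams, an
    edge links each n-gram to the one starting one residue later, and edge
    weights are counts normalised over each source node's outgoing edges.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Overlapping n-grams, e.g. "MKVL" with n=2 -> MK, KV, VL
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        # Count every observed transition between consecutive n-grams
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        graph[src] = {dst: c / total for dst, c in targets.items()}
    return graph


# Hypothetical toy corpus; the article aggregates over a large sequence corpus.
corpus = ["MKVLA", "MKVMK", "AKVLA"]

# One graph per n-gram level gives a small "hierarchy" of graphs.
hierarchy = {n: ngram_transition_graph(corpus, n) for n in (1, 2)}
graph = hierarchy[1]

for src in sorted(graph):
    print(src, graph[src])
```

In this toy corpus the edge V→M exists while M→V does not, which illustrates why the resulting graph is directed. Because the statistics are pooled over the whole corpus rather than computed per protein, a single graph per n-gram level is shared by all sequences.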
Keywords: UniProt, BioGrid, russellab, graph theory, graph representation learning, graph neural networks, graph convolution networks, link prediction
Received: 22 Jun 2025; Accepted: 22 Sep 2025.
Copyright: © 2025 Ebeid, Tang and Gu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Islam Akef Ebeid, iebeid@twu.edu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.